Sample records for raw illumina beadarray

  1. BeadArray Expression Analysis Using Bioconductor

    PubMed Central

    Ritchie, Matthew E.; Dunning, Mark J.; Smith, Mike L.; Shi, Wei; Lynch, Andy G.

    2011-01-01

    Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there exists a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. The key steps of importing data, performing quality assessments, preprocessing, and annotation in the common setting of assessing differential expression in designed experiments will be covered. PMID:22144879

  2. Generalization of the normal-exponential model: exploration of a more accurate parametrisation for the signal distribution on Illumina BeadArrays.

    PubMed

    Plancade, Sandra; Rozenholc, Yves; Lund, Eiliv

    2012-12-11

    Illumina BeadArray technology includes nonspecific negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to a substantial loss of information by generating negative values, a background correction method modeling the observed intensities as the sum of an exponentially distributed signal and normally distributed noise has been developed. Nevertheless, Wang and Ye (2012) present a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would be a better model of the signal density. Hence, the normal-exponential model may not be appropriate for Illumina data, and background corrections derived from this model may lead to wrong estimation. We propose a more flexible model based on a gamma-distributed signal and normally distributed background noise and develop the associated background correction, implemented in the R package NormalGamma. Our model proves to be markedly more accurate for modeling Illumina BeadArrays: on the one hand, it is shown on two types of Illumina BeadChips that this model offers a better fit of the observed intensities. On the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in and on normal-gamma simulated data shows high similarities, reinforcing the validation of the normal-gamma model. The performance of the background corrections based on the normal-gamma and normal-exponential models is compared on two dilution data sets, through testing procedures which represent various experimental designs. Surprisingly, we observe that the implementation of a more accurate parametrisation in the model-based background correction does not increase the sensitivity.
    These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision. This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to a higher sensitivity in differential analysis. Nevertheless, this realistic modeling makes way for future investigations, in particular to examine the characteristics of pre-processing strategies.
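
    As a rough illustration of the normal-exponential baseline that this abstract argues against, the sketch below computes the background-corrected signal as the conditional mean E[S | X = x], assuming X = S + B with S exponential (mean alpha) and B normal (mean mu, standard deviation sigma). The function and parameter names are illustrative assumptions and are not taken from the NormalGamma package; the gamma-based correction proposed by the authors is not reproduced here.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normexp_signal(x, mu, sigma, alpha):
    """E[S | X = x] for X = S + B, S ~ Exp(mean alpha), B ~ N(mu, sigma^2).

    Given X = x, the signal S follows a normal distribution with mean
    mu_sx = x - mu - sigma^2 / alpha and standard deviation sigma,
    truncated to S >= 0; this returns the mean of that truncated normal.
    Hypothetical helper: for x far below the background mean the CDF
    underflows, which this sketch does not guard against.
    """
    mu_sx = x - mu - sigma * sigma / alpha
    z = mu_sx / sigma
    return mu_sx + sigma * normal_pdf(z) / normal_cdf(z)


# An intensity below the background mean still yields a positive signal,
# unlike plain subtraction (90 - 100 = -10).
low = normexp_signal(90.0, mu=100.0, sigma=10.0, alpha=200.0)
high = normexp_signal(1000.0, mu=100.0, sigma=10.0, alpha=200.0)
print(low, high)
```

    For bright probes the correction approaches x - mu - sigma^2 / alpha, so it behaves like an ordinary subtraction there and differs mainly near the background level.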

  3. Enhanced identification and biological validation of differential gene expression via Illumina whole-genome expression arrays through the use of the model-based background correction methodology

    PubMed Central

    Ding, Liang-Hao; Xie, Yang; Park, Seongmi; Xiao, Guanghua; Story, Michael D.

    2008-01-01

    Despite the tremendous growth of microarray usage in scientific studies, there is a lack of standards for background correction methodologies, especially in single-color microarray platforms. Traditional background subtraction methods often generate negative signals and thus cause large amounts of data loss. Hence, some researchers prefer to avoid background corrections, which typically result in the underestimation of differential expression. Here, by utilizing nonspecific negative control features integrated into Illumina whole genome expression arrays, we have developed a method of model-based background correction for BeadArrays (MBCB). We compared the MBCB with a method adapted from the Affymetrix robust multi-array analysis algorithm and with no background subtraction, using a mouse acute myeloid leukemia (AML) dataset. We demonstrated that differential expression ratios obtained by using the MBCB had the best correlation with quantitative RT–PCR. MBCB also achieved better sensitivity in detecting differentially expressed genes with biological significance. For example, we demonstrated that the differential regulation of Tnfr2, Ikk and NF-kappaB, the death receptor pathway, in the AML samples, could only be detected by using data after MBCB implementation. We conclude that MBCB is a robust background correction method that will lead to more precise determination of gene expression and better biological interpretation of Illumina BeadArray data. PMID:18450815
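
    The idea of using negative control features to characterise the background can be sketched as follows. This is a simple moment-style estimate under the assumption that observed intensity is signal plus normally distributed background; it is not the MBCB package's code, and the function name and toy numbers are hypothetical.

```python
import statistics


def estimate_background(negative_controls, observed):
    """Moment-style estimates (assumed, for illustration only).

    mu, sigma: mean and standard deviation of the negative control beads,
               taken as the background distribution.
    alpha:     mean signal, estimated as mean(observed) - mu and kept
               strictly positive.
    """
    mu = statistics.mean(negative_controls)
    sigma = statistics.stdev(negative_controls)
    alpha = max(statistics.mean(observed) - mu, 1e-8)
    return mu, sigma, alpha


negative_controls = [95.0, 100.0, 105.0]   # toy negative control intensities
observed = [90.0, 150.0, 250.0, 710.0]     # toy probe intensities

mu, sigma, alpha = estimate_background(negative_controls, observed)

# Plain subtraction produces a negative value for the probe below background,
# which is the data loss the abstract refers to.
subtracted = [x - mu for x in observed]
print(mu, sigma, alpha, subtracted)
```

    The three estimates are the inputs a model-based correction needs; how they are then combined depends on the chosen signal model.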

  4. An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray.

    PubMed

    Salas, Lucas A; Koestler, Devin C; Butler, Rondi A; Hansen, Helen M; Wiencke, John K; Kelsey, Karl T; Christensen, Brock C

    2018-05-29

    Genome-wide methylation arrays are powerful tools for assessing cell composition of complex mixtures. We compare three approaches to select reference libraries for deconvoluting neutrophil, monocyte, B-lymphocyte, natural killer, and CD4+ and CD8+ T-cell fractions based on blood-derived DNA methylation signatures assayed using the Illumina HumanMethylationEPIC array. The IDOL algorithm identifies a library of 450 CpGs, resulting in an average R² = 99.2 across cell types when applied to EPIC methylation data collected on artificial mixtures constructed from the above cell types. Of the 450 CpGs, 69% are unique to EPIC. This library has the potential to reduce unintended technical differences across array platforms.
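
    Reference-based deconvolution can be pictured as expressing a mixture's methylation profile as a non-negative combination of cell-type reference profiles. The sketch below solves that problem with projected gradient descent on a toy reference; it is a stand-in chosen for brevity, not the IDOL algorithm or the constrained projection used in published pipelines, and all values and names are made up for illustration.

```python
def deconvolve(reference, mixture, iterations=2000):
    """Estimate non-negative cell-type weights w minimising ||R w - y||^2.

    reference: list of rows, one per CpG, each holding the reference
               methylation level for every cell type (hypothetical values).
    mixture:   observed methylation level per CpG in the mixed sample.
    """
    n_cpg = len(reference)
    n_types = len(reference[0])
    # trace(R^T R) bounds the largest eigenvalue of R^T R, so 1 / trace
    # is a safe (if conservative) gradient step size.
    step = 1.0 / sum(v * v for row in reference for v in row)
    weights = [0.0] * n_types
    for _ in range(iterations):
        residual = [
            sum(r * w for r, w in zip(row, weights)) - y
            for row, y in zip(reference, mixture)
        ]
        gradient = [
            sum(reference[i][k] * residual[i] for i in range(n_cpg))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto w >= 0.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Toy reference: 6 CpGs x 3 cell types.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.2],
    [0.2, 0.8, 0.2],
    [0.2, 0.2, 0.8],
]
true_fractions = [0.6, 0.3, 0.1]
mixture = [sum(r * f for r, f in zip(row, true_fractions)) for row in reference]

estimated = deconvolve(reference, mixture)
print(estimated)
```

    Comparing such estimates with known fractions in artificial mixtures is the kind of check that an R² summary across cell types reports.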

  5. Haematobia irritans dataset of raw sequence reads from Illumina and Pac Bio sequencing of genomic DNA

    USDA-ARS's Scientific Manuscript database

    The genome of the horn fly, Haematobia irritans, was sequenced using Illumina- and Pac Bio-based protocols. Following quality filtering, the raw reads have been deposited at NCBI under the BioProject and BioSample accession numbers PRJNA30967 and SAMN07830356, respectively. The Illumina reads are un...

  6. CNV-WebStore: online CNV analysis, storage and interpretation.

    PubMed

    Vandeweyer, Geert; Reyniers, Edwin; Wuyts, Wim; Rooms, Liesbeth; Kooy, R Frank

    2011-01-05

    Microarray technology allows the analysis of genomic aberrations at an ever increasing resolution, making functional interpretation of these vast amounts of data the main bottleneck in routine implementation of high resolution array platforms, and emphasising the need for a centralised and easy to use CNV data management and interpretation system. We present CNV-WebStore, an online platform to streamline the processing and downstream interpretation of microarray data in a clinical context, tailored towards but not limited to the Illumina BeadArray platform. Provided analysis tools include CNV analysis, parent of origin and uniparental disomy detection. Interpretation tools include data visualisation, gene prioritisation, automated PubMed searching, linking data to several genome browsers and annotation of CNVs based on several public databases. Finally a module is provided for uniform reporting of results. CNV-WebStore is able to present copy number data in an intuitive way to both lab technicians and clinicians, making it a useful tool in daily clinical practice.

  7. A Practical Platform for Blood Biomarker Study by Using Global Gene Expression Profiling of Peripheral Whole Blood

    PubMed Central

    Schmid, Patrick; Yao, Hui; Galdzicki, Michal; Berger, Bonnie; Wu, Erxi; Kohane, Isaac S.

    2009-01-01

    Background: Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to the variability of genome-wide gene expression profiling using peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data to meet clinical requirements for biomarker study. Methods and Findings: We applied peripheral whole blood samples with globin reduction and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of array data and elucidate the mode in which hemoglobin interferes in gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of BeadArray data. When compared to their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lowered variance and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples is the result of the high concentration of hemoglobin derived from red blood cells either interfering with target mRNA binding or producing a pseudo-binding background signal. Conclusion: We therefore recommend the combination of performing globin mRNA reduction in peripheral whole blood samples and hybridizing on Illumina BeadChips as the practical approach for biomarker study. PMID:19381341

  8. Transcriptome sequencing of newly molted adult female cattle ticks, Rhipicephalus microplus: Raw Illumina reads.

    USDA-ARS's Scientific Manuscript database

    Illumina paired end oligo-dT sequencing technology was used to sequence the transcriptome from newly molted adult females from the cattle tick, Rhipicephalus microplus. These samples include newly molted unfed whole adult females, newly molted whole adult females feeding for 2 hours on a bovine host...

  9. Digital detection of multiple minority mutants and expression levels of multiple colorectal cancer-related genes using digital-PCR coupled with bead-array.

    PubMed

    Huang, Huan; Li, Shuo; Sun, Lizhou; Zhou, Guohua

    2015-01-01

    To simultaneously analyze mutations and expression levels of multiple genes on one detection platform, we proposed a method termed "multiplex ligation-dependent probe amplification-digital amplification coupled with hydrogel bead-array" (MLPA-DABA) and applied it to diagnose colorectal cancer (CRC). CRC cells and tissues were sampled to extract nucleic acid, perform MLPA with sequence-tagged probes, perform digital emulsion polymerase chain reaction (PCR), and produce a hydrogel bead-array to immobilize beads and form a single bead layer on the array. After hybridization with fluorescent probes, the number of colored beads, which reflects the abundance of expressed genes and the mutation rate, was counted for diagnosis. Only red or green beads occurred on the chips in the mixed samples, indicating the success of single-molecule PCR. When a one-source sample was analyzed using mixed MLPA probes, beads of only one color occurred, suggesting the high specificity of the method in analyzing CRC mutation and gene expression. In gene expression analysis of a CRC tissue from one CRC patient, the mutant percentage was 3.1%, and the expression levels of CRC-related genes were much higher than those of normal tissue. The highly sensitive MLPA-DABA succeeds in the relative quantification of mutations and gene expressions of exfoliated cells in stool samples of CRC patients on the same chip platform. MLPA-DABA coupled with hydrogel bead-array is a promising method in the non-invasive diagnosis of CRC.

  10. Multilayer-omics analysis of renal cell carcinoma, including the whole exome, methylome and transcriptome.

    PubMed

    Arai, Eri; Sakamoto, Hiromi; Ichikawa, Hitoshi; Totsuka, Hirohiko; Chiku, Suenori; Gotoh, Masahiro; Mori, Taisuke; Nakatani, Tamao; Ohnami, Sumiko; Nakagawa, Tohru; Fujimoto, Hiroyuki; Wang, Linghua; Aburatani, Hiroyuki; Yoshida, Teruhiko; Kanai, Yae

    2014-09-15

    The aim of this study was to identify pathways that have a significant impact during renal carcinogenesis. Sixty-seven paired samples of both noncancerous renal cortex tissue and cancerous tissue from patients with clear cell renal cell carcinomas (RCCs) were subjected to whole-exome, methylome and transcriptome analyses using Agilent SureSelect All Exon capture followed by sequencing on an Illumina HiSeq 2000 platform, Illumina Infinium HumanMethylation27 BeadArray and Agilent SurePrint Human Gene Expression microarray, respectively. Sanger sequencing and quantitative reverse transcription-PCR were performed for technical verification. MetaCore software was used for pathway analysis. Somatic nonsynonymous single-nucleotide mutations, insertions/deletions and intragenic breaks of 2,153, 359 and 8 genes were detected, respectively. Mutations of GCN1L1, MED12 and CCNC, which are members of CDK8 mediator complex directly regulating β-catenin-driven transcription, were identified in 16% of the RCCs. Mutations of MACF1, which functions in the Wnt/β-catenin signaling pathway, were identified in 4% of the RCCs. A combination of methylome and transcriptome analyses further highlighted the significant role of the Wnt/β-catenin signaling pathway in renal carcinogenesis. Genetic aberrations and reduced expression of ERC2 and ABCA13 were frequent in RCCs, and MTOR mutations were identified as one of the major disrupters of cell signaling during renal carcinogenesis. Our results confirm that multilayer-omics analysis can be a powerful tool for revealing pathways that play a significant role in carcinogenesis. © 2014 The Authors. Published by Wiley Periodicals, Inc. on behalf of UICC.

  11. Multilayer-omics analysis of renal cell carcinoma, including the whole exome, methylome and transcriptome

    PubMed Central

    Arai, Eri; Sakamoto, Hiromi; Ichikawa, Hitoshi; Totsuka, Hirohiko; Chiku, Suenori; Gotoh, Masahiro; Mori, Taisuke; Nakatani, Tamao; Ohnami, Sumiko; Nakagawa, Tohru; Fujimoto, Hiroyuki; Wang, Linghua; Aburatani, Hiroyuki; Yoshida, Teruhiko; Kanai, Yae

    2014-01-01

    The aim of this study was to identify pathways that have a significant impact during renal carcinogenesis. Sixty-seven paired samples of both noncancerous renal cortex tissue and cancerous tissue from patients with clear cell renal cell carcinomas (RCCs) were subjected to whole-exome, methylome and transcriptome analyses using Agilent SureSelect All Exon capture followed by sequencing on an Illumina HiSeq 2000 platform, Illumina Infinium HumanMethylation27 BeadArray and Agilent SurePrint Human Gene Expression microarray, respectively. Sanger sequencing and quantitative reverse transcription-PCR were performed for technical verification. MetaCore software was used for pathway analysis. Somatic nonsynonymous single-nucleotide mutations, insertions/deletions and intragenic breaks of 2,153, 359 and 8 genes were detected, respectively. Mutations of GCN1L1, MED12 and CCNC, which are members of CDK8 mediator complex directly regulating β-catenin-driven transcription, were identified in 16% of the RCCs. Mutations of MACF1, which functions in the Wnt/β-catenin signaling pathway, were identified in 4% of the RCCs. A combination of methylome and transcriptome analyses further highlighted the significant role of the Wnt/β-catenin signaling pathway in renal carcinogenesis. Genetic aberrations and reduced expression of ERC2 and ABCA13 were frequent in RCCs, and MTOR mutations were identified as one of the major disrupters of cell signaling during renal carcinogenesis. Our results confirm that multilayer-omics analysis can be a powerful tool for revealing pathways that play a significant role in carcinogenesis. PMID:24504440

  12. Haematobia irritans dataset of raw sequence reads from Illumina-based transcriptome sequencing of specific tissues and life stages

    USDA-ARS's Scientific Manuscript database

    Illumina HiSeq technology was used to sequence the transcriptome from various dissected tissues and life stages from the horn fly, Haematobia irritans. These samples include eggs (0, 2, 4, and 9 hours post-oviposition), adult fly gut, adult fly legs, adult fly malpighian tubule, adult fly ovary, adu...

  13. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data processing.

    PubMed

    Almeida, Diogo; Skov, Ida; Lund, Jesper; Mohammadnejad, Afsaneh; Silva, Artur; Vandin, Fabio; Tan, Qihua; Baumbach, Jan; Röttger, Richard

    2016-10-01

    Measuring differential methylation of DNA is currently the most common approach to linking epigenetic modifications to diseases (in so-called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency, and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.
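
    To make the notion of a permutation test on methylation values concrete, the sketch below runs an exact two-group test on the difference in mean methylation at a single locus. It is written in Python purely for illustration and does not use or mirror the Jllumina API; the function name and the beta values are hypothetical.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated; the p-value is the share of splits whose absolute
    mean difference is at least as large as the observed one.
    Only practical for small samples, since the number of splits grows fast.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_difference(indices):
        sum_a = sum(pooled[i] for i in indices)
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_difference(range(n_a))
    splits = list(combinations(range(len(pooled)), n_a))
    # Small tolerance so floating-point noise does not drop ties.
    extreme = sum(1 for s in splits if abs_mean_difference(s) >= observed - 1e-12)
    return extreme / len(splits)


# Toy methylation beta values at one CpG for controls and cases.
controls = [0.10, 0.20, 0.15, 0.12]
cases = [0.80, 0.85, 0.90, 0.82]
print(exact_permutation_pvalue(controls, cases))
```

    With larger cohorts the full enumeration is usually replaced by a random sample of label shuffles, and the test is repeated per locus before correcting for multiple testing.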

  14. Jllumina - A comprehensive Java-based API for statistical Illumina Infinium HumanMethylation450 and MethylationEPIC data processing.

    PubMed

    Almeida, Diogo; Skov, Ida; Lund, Jesper; Mohammadnejad, Afsaneh; Silva, Artur; Vandin, Fabio; Tan, Qihua; Baumbach, Jan; Röttger, Richard

    2016-12-18

    Measuring differential methylation of DNA is currently the most common approach to linking epigenetic modifications to diseases (in so-called epigenome-wide association studies, EWAS). Owing to their low cost, efficiency, and easy handling, the Illumina HumanMethylation450 BeadChip and its successor, the Infinium MethylationEPIC BeadChip, are by far the most popular techniques for conducting EWAS in large patient cohorts. Despite the popularity of this chip technology, raw data processing and statistical analysis of the array data remain far from trivial and still lack dedicated software libraries enabling high-quality and statistically sound downstream analyses. As of yet, only R-based solutions are freely available for low-level processing of the Illumina chip data. However, the lack of alternative libraries poses a hurdle for the development of new bioinformatic tools, in particular when it comes to web services or applications where run time and memory consumption matter, or where EWAS data analysis is an integral part of a bigger framework or data analysis pipeline. We have therefore developed and implemented Jllumina, an open-source Java library for raw data manipulation of Illumina Infinium HumanMethylation450 and Infinium MethylationEPIC BeadChip data, supporting the developer with Java functions covering reading and preprocessing the raw data, down to statistical assessment, permutation tests, and identification of differentially methylated loci. Jllumina is fully parallelizable and publicly available at http://dimmer.compbio.sdu.dk/download.html.

  15. A Single Meal Containing Raw, Crushed Garlic Influences Expression of Immunity- and Cancer-Related Genes in Whole Blood of Humans

    PubMed Central

    Charron, Craig S; Dawson, Harry D; Albaugh, George P; Solverson, Patrick M; Vinyard, Bryan T; Solano-Aguilar, Gloria I; Molokin, Aleksey; Novotny, Janet A

    2015-01-01

    Background: Preclinical and epidemiologic studies suggest that garlic intake is inversely associated with the progression of cancer and cardiovascular disease. Objective: We designed a study to probe the mechanisms of garlic action in humans. Methods: We conducted a randomized crossover feeding trial in which 17 volunteers consumed a garlic-containing meal (100 g white bread, 15 g butter, and 5 g raw, crushed garlic) or a garlic-free control meal (100 g white bread and 15 g butter) after 10 d of consuming a controlled, garlic-free diet. Blood was collected before and 3 h after test meal consumption for gene expression analysis in whole blood. Illumina BeadArray was used to screen for genes of interest, followed by real-time quantitative reverse transcriptase–polymerase chain reaction (qRT-PCR) on selected genes. To augment human study findings, Mono Mac 6 cells were treated with a purified garlic extract (0.5 μL/mL), and mRNA was measured by qRT-PCR at 0, 3, 6, and 24 h. Results: The following 7 genes were found to be upregulated by garlic intake: aryl hydrocarbon receptor (AHR), aryl hydrocarbon receptor nuclear translocator (ARNT), hypoxia-inducible factor 1α (HIF1A), proto-oncogene c-Jun (JUN), nuclear factor of activated T cells (NFAT) activating protein with immunoreceptor tyrosine-based activation motif 1 (NFAM1), oncostatin M (OSM), and V-rel avian reticuloendotheliosis viral oncogene homolog (REL). Fold-increases in mRNA transcripts ranged from 1.6 (HIF1A) to 3.0 (NFAM1) (P < 0.05). The mRNA levels of 5 of the 7 genes that were upregulated in the human trial were also upregulated in cell culture at 3 and 6 h: AHR, HIF1A, JUN, OSM, and REL. Fold-increases in mRNA transcripts in cell culture ranged from 1.7 (HIF1A) to 12.1 (JUN) (P < 0.01). OSM protein was measured by ELISA and was significantly higher than the control at 3, 6, and 24 h (24 h: 19.5 ± 1.4 and 74.8 ± 1.4 pg/mL for control and garlic, respectively). 
    OSM is a pleiotropic cytokine that inhibits several tumor cell lines in culture. Conclusion: These data indicate that the bioactivity of garlic is multifaceted and includes activation of genes related to immunity, apoptosis, and xenobiotic metabolism in humans and Mono Mac 6 cells. This trial is registered at clinicaltrials.gov as NCT01293591. PMID:26423732

  16. A Single Meal Containing Raw, Crushed Garlic Influences Expression of Immunity- and Cancer-Related Genes in Whole Blood of Humans.

    PubMed

    Charron, Craig S; Dawson, Harry D; Albaugh, George P; Solverson, Patrick M; Vinyard, Bryan T; Solano-Aguilar, Gloria I; Molokin, Aleksey; Novotny, Janet A

    2015-11-01

    Preclinical and epidemiologic studies suggest that garlic intake is inversely associated with the progression of cancer and cardiovascular disease. We designed a study to probe the mechanisms of garlic action in humans. We conducted a randomized crossover feeding trial in which 17 volunteers consumed a garlic-containing meal (100 g white bread, 15 g butter, and 5 g raw, crushed garlic) or a garlic-free control meal (100 g white bread and 15 g butter) after 10 d of consuming a controlled, garlic-free diet. Blood was collected before and 3 h after test meal consumption for gene expression analysis in whole blood. Illumina BeadArray was used to screen for genes of interest, followed by real-time quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR) on selected genes. To augment human study findings, Mono Mac 6 cells were treated with a purified garlic extract (0.5 μL/mL), and mRNA was measured by qRT-PCR at 0, 3, 6, and 24 h. The following 7 genes were found to be upregulated by garlic intake: aryl hydrocarbon receptor (AHR), aryl hydrocarbon receptor nuclear translocator (ARNT), hypoxia-inducible factor 1α (HIF1A), proto-oncogene c-Jun (JUN), nuclear factor of activated T cells (NFAT) activating protein with immunoreceptor tyrosine-based activation motif 1 (NFAM1), oncostatin M (OSM), and V-rel avian reticuloendotheliosis viral oncogene homolog (REL). Fold-increases in mRNA transcripts ranged from 1.6 (HIF1A) to 3.0 (NFAM1) (P < 0.05). The mRNA levels of 5 of the 7 genes that were upregulated in the human trial were also upregulated in cell culture at 3 and 6 h: AHR, HIF1A, JUN, OSM, and REL. Fold-increases in mRNA transcripts in cell culture ranged from 1.7 (HIF1A) to 12.1 (JUN) (P < 0.01). OSM protein was measured by ELISA and was significantly higher than the control at 3, 6, and 24 h (24 h: 19.5 ± 1.4 and 74.8 ± 1.4 pg/mL for control and garlic, respectively). OSM is a pleiotropic cytokine that inhibits several tumor cell lines in culture. 
    These data indicate that the bioactivity of garlic is multifaceted and includes activation of genes related to immunity, apoptosis, and xenobiotic metabolism in humans and Mono Mac 6 cells. This trial is registered at clinicaltrials.gov as NCT01293591. © 2015 American Society for Nutrition.

  17. Epigenomics of Alzheimer’s Disease

    PubMed Central

    Bennett, David A.; Yu, Lei; Yang, Jingyun; Srivastava, Gyan P.; Aubin, Cristin; De Jager, Philip L.

    2014-01-01

    Alzheimer’s disease (AD) is a large and growing public health problem. It is characterized by the accumulation of amyloid-β peptides and abnormally phosphorylated tau proteins that are associated with cognitive decline and dementia. Much has been learned about the genomics of AD from linkage analyses and more recently, genome-wide association studies. Several but not all aspects of the genomic landscape are involved in amyloid-metabolism. The moderate concordance of disease among twins suggests other factors, potentially epigenomic factors, are related to AD. We are at the earliest stages of examining the relation of the epigenome to the clinical and pathologic phenotypes that characterize AD. Our literature review suggests that there is some evidence of age-related changes in human brain methylation. Unfortunately, studies of AD have been relatively small with limited coverage of methylation sites and microRNA, let alone other epigenomic marks. We are in the midst of two large studies of human brains including coverage of more than 420,000 autosomal cytosine-guanine dinucleotides (CGs) with the Illumina Infinium HumanMethylation 450K BeadArray, and histone acetylation with chromatin immunoprecipitation-sequencing. We present descriptive data to help inform other researchers what to expect from these approaches in order to better design and power their studies. We then discuss future directions to inform on the epigenomic architecture of AD. PMID:24905038

  18. Preprocessing of gene expression data by optimally robust estimators

    PubMed Central

    2010-01-01

    Background: The preprocessing of gene expression data obtained from several platforms routinely includes the aggregation of multiple raw signal intensities to one expression value. Examples are the computation of a single expression measure based on the perfect match (PM) and mismatch (MM) probes for the Affymetrix technology, the summarization of bead level values to bead summary values for the Illumina technology or the aggregation of replicated measurements in the case of other technologies including real-time quantitative polymerase chain reaction (RT-qPCR) platforms. The summarization of technical replicates is also performed in other "-omics" disciplines like proteomics or metabolomics. Preprocessing methods like MAS 5.0, Illumina's default summarization method, RMA, or VSN show that the use of robust estimators is widely accepted in gene expression analysis. However, the selection of robust methods seems to be mainly driven by their high breakdown point and not by efficiency. Results: We describe how optimally robust radius-minimax (rmx) estimators, i.e. estimators that minimize an asymptotic maximum risk on shrinking neighborhoods about an ideal model, can be used for the aggregation of multiple raw signal intensities to one expression value for Affymetrix and Illumina data. With regard to the Affymetrix data, we have implemented an algorithm which is a variant of MAS 5.0. Using datasets from the literature and Monte-Carlo simulations we provide some reasoning for assuming approximate log-normal distributions of the raw signal intensities by means of the Kolmogorov distance, at least for the discussed datasets, and compare the results of our preprocessing algorithms with the results of Affymetrix's MAS 5.0 and Illumina's default method. The numerical results indicate that when using rmx estimators an accuracy improvement of about 10-20% is obtained compared to Affymetrix's MAS 5.0 and about 1-5% compared to Illumina's default method. The improvement is also visible in the analysis of technical replicates where the reproducibility of the values (in terms of Pearson and Spearman correlation) is increased for all Affymetrix and almost all Illumina examples considered. Our algorithms are implemented in the R package named RobLoxBioC which is publicly available via CRAN, The Comprehensive R Archive Network (http://cran.r-project.org/web/packages/RobLoxBioC/). Conclusions: Optimally robust rmx estimators have a high breakdown point and are computationally feasible. They can lead to a considerable gain in efficiency for well-established bioinformatics procedures and thus can increase the reproducibility and power of subsequent statistical analysis. PMID:21118506
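
    The summarization of bead-level values to one bead summary value can be sketched with a simple robust rule of the kind commonly described as Illumina's default: discard beads more than three MADs from the median, then average the rest. The sketch below is not the rmx estimator of this paper nor the RobLoxBioC implementation, and the use of the scaled MAD (constant 1.4826) is an assumption.

```python
import statistics


def summarize_beads(intensities, n_mads=3.0):
    """Summarise replicate bead intensities for one probe.

    Beads further than n_mads scaled MADs from the median are treated as
    outliers and dropped; the mean of the remaining beads is returned
    together with the number of beads kept.
    """
    center = statistics.median(intensities)
    # 1.4826 rescales the MAD to match the standard deviation under normality.
    mad = 1.4826 * statistics.median([abs(v - center) for v in intensities])
    kept = [v for v in intensities if abs(v - center) <= n_mads * mad]
    return statistics.mean(kept), len(kept)


# Five consistent beads and one artefact.
beads = [100.0, 102.0, 98.0, 101.0, 99.0, 500.0]
summary, n_kept = summarize_beads(beads)
print(summary, n_kept)
```

    A rule like this has a high breakdown point but discards information, which is the efficiency trade-off the abstract raises.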

  19. Transcriptomic Identification of ADH1B as a Novel Candidate Gene for Obesity and Insulin Resistance in Human Adipose Tissue in Mexican Americans from the Veterans Administration Genetic Epidemiology Study (VAGES)

    PubMed Central

    Winnier, Deidre A.; Fourcaudot, Marcel; Norton, Luke; Abdul-Ghani, Muhammad A.; Hu, Shirley L.; Farook, Vidya S.; Coletta, Dawn K.; Kumar, Satish; Puppala, Sobha; Chittoor, Geetha; Dyer, Thomas D.; Arya, Rector; Carless, Melanie; Lehman, Donna M.; Curran, Joanne E.; Cromack, Douglas T.; Tripathy, Devjit; Blangero, John; Duggirala, Ravindranath; Göring, Harald H. H.; DeFronzo, Ralph A.; Jenkinson, Christopher P.

    2015-01-01

    Type 2 diabetes (T2D) is a complex metabolic disease that is more prevalent in ethnic groups such as Mexican Americans, and is strongly associated with the risk factors obesity and insulin resistance. The goal of this study was to perform whole genome gene expression profiling in adipose tissue to detect common patterns of gene regulation associated with obesity and insulin resistance. We used phenotypic and genotypic data from 308 Mexican American participants from the Veterans Administration Genetic Epidemiology Study (VAGES). Basal fasting RNA was extracted from adipose tissue biopsies from a subset of 75 unrelated individuals, and gene expression data generated on the Illumina BeadArray platform. The number of gene probes with significant expression above baseline was approximately 31,000. We performed multiple regression analysis of all probes with 15 metabolic traits. Adipose tissue had 3,012 genes significantly associated with the traits of interest (false discovery rate, FDR ≤ 0.05). The significance of gene expression changes was used to select 52 genes with significant (FDR ≤ 10⁻⁴) gene expression changes across multiple traits. Gene sets/Pathways analysis identified one gene, alcohol dehydrogenase 1B (ADH1B) that was significantly enriched (P < 10⁻⁶⁰) as a prime candidate for involvement in multiple relevant metabolic pathways. Illumina BeadChip derived ADH1B expression data was consistent with quantitative real time PCR data. We observed significant inverse correlations with waist circumference (2.8 × 10⁻⁹), BMI (5.4 × 10⁻⁶), and fasting plasma insulin (P < 0.001). These findings are consistent with a central role for ADH1B in obesity and insulin resistance and provide evidence for a novel genetic regulatory mechanism for human metabolic diseases related to these traits. PMID:25830378
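
    The abstract does not state which false discovery rate procedure was used. As one common possibility, the sketch below applies the Benjamini-Hochberg adjustment to a list of per-probe p-values; the choice of procedure, the function name, and the toy p-values are all assumptions made for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by n / rank (rank 1 = smallest), and a running
    minimum taken from the largest rank downwards keeps the adjusted
    values monotone.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Toy p-values for four probes.
p_values = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, significant)
```

    Probes whose adjusted value falls at or below the chosen threshold (0.05 in the first screen described above) would be reported as significantly associated.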

  20. Digital Detection of Multiple Minority Mutants and Expression Levels of Multiple Colorectal Cancer-Related Genes Using Digital-PCR Coupled with Bead-Array

    PubMed Central

    Huang, Huan; Li, Shuo; Sun, Lizhou; Zhou, Guohua

    2015-01-01

    To simultaneously analyze mutations and expression levels of multiple genes on one detection platform, we proposed a method termed “multiplex ligation-dependent probe amplification–digital amplification coupled with hydrogel bead-array” (MLPA–DABA) and applied it to diagnose colorectal cancer (CRC). CRC cells and tissues were sampled to extract nucleic acid, perform MLPA with sequence-tagged probes, perform digital emulsion polymerase chain reaction (PCR), and produce a hydrogel bead-array to immobilize beads and form a single bead layer on the array. After hybridization with fluorescent probes, the number of colored beads, which reflects the abundance of expressed genes and the mutation rate, was counted for diagnosis. Only red or green beads occurred on the chips in the mixed samples, indicating the success of single-molecule PCR. When a one-source sample was analyzed using mixed MLPA probes, beads of only one color occurred, suggesting the high specificity of the method in analyzing CRC mutation and gene expression. In gene expression analysis of a CRC tissue from one CRC patient, the mutant percentage was 3.1%, and the expression levels of CRC-related genes were much higher than those of normal tissue. The highly sensitive MLPA–DABA succeeds in the relative quantification of mutations and gene expressions of exfoliated cells in stool samples of CRC patients on the same chip platform. MLPA–DABA coupled with hydrogel bead-array is a promising method in the non-invasive diagnosis of CRC. PMID:25880764

  1. Epigenetic Patterns in Successful Weight Loss Maintainers: A Pilot Study

    PubMed Central

    Hawley, Nicola L.; Wing, Rena R.; Kelsey, Karl T.; McCaffery, Jeanne M.

    2014-01-01

    DNA methylation changes occur in animal models of calorie restriction, simulating human dieting, and in human subjects undergoing behavioral weight loss interventions. This suggests that obese individuals may possess unique epigenetic patterns that may vary with weight loss. Here, we examine whether methylation patterns in leukocytes differ in individuals who lost sufficient weight to go from obese to normal weight (successful weight loss maintainers; SWLM) vs currently obese (OB) or normal weight (NW) individuals. This study examined peripheral blood mononuclear cell (PBMC) methylation patterns in NW (n=16, current/lifetime BMI 18.5-24.9) and OB individuals (n=16, current BMI≥30), and SWLM (n=16, current BMI 18.5-24.9, lifetime maximum BMI ≥30, average weight loss 57.4 lbs) using an Illumina Infinium HumanMethylation450 BeadArray. No leukocyte population-adjusted epigenome-wide analyses were significant; however, potentially differentially methylated loci across groups were observed in RYR1 (p=1.54E-6), MPZL3 (p=4.70E-6), and TUBA3C (p=4.78E-6). In 32 obesity-related candidate genes, differential methylation patterns were found in BDNF (gene-wide p=0.00018). In RYR1, TUBA3C and BDNF, SWLM differed from OB but not NW. In this preliminary investigation, leukocyte SWLM DNA methylation patterns more closely resembled NW than OB individuals in three gene regions. These results suggest that PBMC methylation is associated with weight status. PMID:25520250

  2. Analysis of differential gene expression by bead-based fiber-optic array in nonfunctioning pituitary adenomas.

    PubMed

    Jiang, Z; Gui, S; Zhang, Y

    2011-05-01

    Nonfunctioning pituitary adenomas (NFPAs) are relatively common, accounting for 30% of all pituitary adenomas; however, their pathogenesis remains enigmatic. To explore the possible pathogenesis of NFPAs, we used fiber-optic BeadArray to examine gene expression in 5 NFPAs compared with 3 normal pituitaries. 4 differentially expressed genes were chosen randomly for validation by reverse transcriptase-real time quantitative polymerase chain reaction (RT-qPCR). We then analyzed the differentially expressed gene profile with Kyoto Encyclopedia of Genes and Genomes (KEGG). The array analysis identified significant increases in the expression of 1,402 genes and 383 expressed sequence tags (ESTs), and decreases in 1,697 genes and 113 ESTs in the NFPAs. Bioinformatic and pathway analysis showed that the genes HIGD1B, FAM5C, PMAIP1 and the pathway cell-cycle regulation may play an important role in tumorigenesis and progression of NFPAs. Our data suggest fiber-optic BeadArray combined with pathway analysis of differential gene expression profile appears to be a valid approach for investigating the pathogenesis of tumors. © Georg Thieme Verlag KG Stuttgart · New York.

  3. First report of bacterial community from a Bat Guano using Illumina next-generation sequencing.

    PubMed

    De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu

    2015-06-01

    The V4 hypervariable region of 16S rDNA was analyzed to identify the bacterial communities present in Bat Guano from the unexplored cave - Pnahkyndeng, Meghalaya, Northeast India. The metagenome comprised 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomical units) comprising 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize the Bat Guano bacterial community using a next-generation sequencing approach.

  4. First report of bacterial community from a Bat Guano using Illumina next-generation sequencing

    PubMed Central

    De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu

    2015-01-01

    The V4 hypervariable region of 16S rDNA was analyzed to identify the bacterial communities present in Bat Guano from the unexplored cave - Pnahkyndeng, Meghalaya, Northeast India. The metagenome comprised 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomical units) comprising 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize the Bat Guano bacterial community using a next-generation sequencing approach. PMID:26484190

  5. Epigenetic patterns in successful weight loss maintainers: a pilot study.

    PubMed

    Huang, Yen-Tsung; Maccani, Jennifer Z J; Hawley, Nicola L; Wing, Rena R; Kelsey, Karl T; McCaffery, Jeanne M

    2015-05-01

    DNA methylation changes occur in animal models of calorie restriction, simulating human dieting, and in human subjects undergoing behavioral weight loss interventions. This suggests that obese (OB) individuals may possess unique epigenetic patterns that may vary with weight loss. Here, we examine whether methylation patterns in leukocytes differ in individuals who lost sufficient weight to go from OB to normal weight (NW; successful weight loss maintainers; SWLMs) vs currently OB or NW individuals. This study examined peripheral blood mononuclear cell (PBMC) methylation patterns in NW (n=16, current/lifetime BMI 18.5-24.9) and OB individuals (n=16, current body mass index (BMI)⩾30), and SWLM (n=16, current BMI 18.5-24.9, lifetime maximum BMI ⩾30, average weight loss 57.4 lbs) using an Illumina Infinium HumanMethylation450 BeadArray. No leukocyte population-adjusted epigenome-wide analyses were significant; however, potentially differentially methylated loci across the groups were observed in ryanodine receptor-1 (RYR1; P=1.54E-6), myelin protein zero-like 3 (MPZL3; P=4.70E-6) and alpha 3c tubulin (TUBA3C; P=4.78E-6). In 32 obesity-related candidate genes, differential methylation patterns were found in brain-derived neurotrophic factor (BDNF; gene-wide P=0.00018). In RYR1, TUBA3C and BDNF, SWLM differed from OB but not NW. In this preliminary investigation, leukocyte SWLM DNA methylation patterns more closely resembled NW than OB individuals in three gene regions. These results suggest that PBMC methylation is associated with weight status.

  6. A high-density intraspecific SNP linkage map of pigeonpea (Cajanas cajan L. Millsp.)

    PubMed Central

    Mandal, Paritra; Bhutani, Shefali; Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram Pratap; Chaudhary, A. K.; Yadav, Rekha; Gaikwad, K.; Sevanthi, Amitha Mithra; Datta, Subhojit; Raje, Ranjeet S.; Sharma, Tilak R.; Singh, Nagendra Kumar

    2017-01-01

    Pigeonpea (Cajanus cajan (L.) Millsp.) is a major food legume cultivated in semi-arid tropical regions including the Indian subcontinent, Africa, and Southeast Asia. It is an important source of protein, minerals, and vitamins for nearly 20% of the world population. Due to high carbon sequestration and drought tolerance, pigeonpea is an important crop for the development of climate resilient agriculture and nutritional security. However, pigeonpea productivity has remained low for decades because of limited genetic and genomic resources, and sparse utilization of landraces and wild pigeonpea germplasm. Here, we present a dense intraspecific linkage map of pigeonpea comprising 932 markers that span a total adjusted map length of 1,411.83 cM. The consensus map is based on three different linkage maps that incorporate a large number of single nucleotide polymorphism (SNP) markers derived from next generation sequencing data, using Illumina GoldenGate bead arrays, and genotyping with restriction site associated DNA (RAD) sequencing. The genotyping-by-sequencing enhanced the marker density but was met with limited success due to lack of common markers across the genotypes of mapping population. The integrated map has 547 bead-array SNP, 319 RAD-SNP, and 65 simple sequence repeat (SSR) marker loci. We also show here correspondence between our linkage map and published genome pseudomolecules of pigeonpea. The availability of a high-density linkage map will help improve the anchoring of the pigeonpea genome to its chromosomes and the mapping of genes and quantitative trait loci associated with useful agronomic traits. PMID:28654689

  7. Farm-to-fork investigation of an outbreak of Shiga toxin-producing Escherichia coli O157

    PubMed Central

    Wilson, Deborah; Dolan, Gayle; Aird, Heather; Sorrell, Shirley; Dallman, Timothy J.; Jenkins, Claire; Robertson, Lucy; Gorton, Russell

    2018-01-01

    Fifteen cases of Shiga toxin-producing Escherichia coli (STEC) O157 infection were associated with the consumption of contaminated food from two related butchers’ premises in the north-east of England. Ten cases were admitted to hospital and seven cases developed haemolytic uraemic syndrome. A case control study found a statistically significant association with the purchase of raw and/or ready-to-eat (RTE) food supplied by the implicated butchers’ shops. Isolates of STEC O157 were detected in two raw lamb burgers taken from one of the butchers’ premises. Subsequent environmental sampling identified STEC O157 in bovine faecal samples on the farm supplying cattle to the implicated butchers for slaughter. Whole genome sequencing (WGS) was performed on the Illumina HiSeq 2500 platform on all cultures isolated from humans, food and cattle during the investigation. Quality trimmed Illumina reads were mapped to the STEC O157 reference genome Sakai using bwa-mem, and single nucleotide polymorphisms (SNPs) were identified using gatk2. Analysis of the core genome SNP positions (>90 % consensus, minimum depth 10×, mapping quality (MQ)≥30) revealed that all isolates from humans, food and cattle differed by two SNPs. WGS analysis provided forensic-level microbiological evidence to support the epidemiological links between the farm, the butchers’ premises and the clinical cases. Cross-contamination from raw meat to RTE foods at the butchers’ premises was the most plausible transmission route. The evidence presented here highlights the importance of taking measures to mitigate the risks of cross-contamination in this setting. PMID:29488865
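
    The site-level thresholds quoted in the record above (>90 % consensus, minimum depth 10×, MQ ≥ 30) can be expressed as a simple filter. The following is a minimal sketch only: the `SnpSite` record, its field names, and the reading of "core" as positions that pass in every isolate are assumptions made for illustration, not the pipeline the authors ran.

```python
from dataclasses import dataclass


@dataclass
class SnpSite:
    position: int           # coordinate on the reference genome
    depth: int              # number of reads covering the position
    consensus: float        # fraction of reads supporting the called base
    mapping_quality: float  # mapping quality (MQ) at the position


def passes_filters(site, min_consensus=0.90, min_depth=10, min_mq=30):
    """Apply the thresholds from the abstract: >90 % consensus, depth >= 10, MQ >= 30."""
    return (
        site.consensus > min_consensus
        and site.depth >= min_depth
        and site.mapping_quality >= min_mq
    )


def core_snp_positions(sites_by_isolate):
    """Positions that pass the filters in every isolate (assumed meaning of 'core')."""
    passing = [
        {site.position for site in sites if passes_filters(site)}
        for sites in sites_by_isolate.values()
    ]
    return sorted(set.intersection(*passing)) if passing else []


# Hypothetical isolates and values, for illustration only.
example_sites = {
    "human_isolate": [SnpSite(100, 25, 0.98, 60), SnpSite(200, 8, 0.99, 60)],
    "food_isolate": [SnpSite(100, 40, 0.95, 45), SnpSite(200, 30, 0.99, 60)],
}
core = core_snp_positions(example_sites)
```

    In this example position 200 is dropped because one isolate covers it with only 8 reads, below the 10× minimum.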

  8. Impact of thistle rennet from Carlina acanthifolia All. subsp. acanthifolia on bacterial diversity and dynamics of a specialty Italian raw ewes' milk cheese.

    PubMed

    Cardinali, Federica; Osimani, Andrea; Taccari, Manuela; Milanović, Vesna; Garofalo, Cristiana; Clementi, Francesca; Polverigiani, Serena; Zitti, Silvia; Raffaelli, Nadia; Mozzon, Massimo; Foligni, Roberta; Franciosi, Elena; Tuohy, Kieran; Aquilanti, Lucia

    2017-08-16

    Caciofiore della Sibilla is an Italian specialty soft cheese manufactured with Sopravissana raw ewes' milk and thistle rennet prepared with young fresh leaves and stems of Carlina acanthifolia All. subsp. acanthifolia, according to an ancient tradition deeply rooted in the territory of origin (mountainous hinterland of the Marche region, Central Italy). In this study, the impact of thistle rennet on the bacterial dynamics and diversity of Caciofiore della Sibilla cheese was investigated by applying a polyphasic approach based on culture and DNA-based techniques (Illumina sequencing and PCR-DGGE). A control cheese manufactured with the same batch of ewes' raw milk and commercial animal rennet was analyzed in parallel. Overall, a large number of bacterial taxa were identified, including spoilage, environmental and pro-technological bacteria, primarily ascribed to Lactobacillales. Thistle rennet was observed clearly to affect the early bacterial dynamics of Caciofiore della Sibilla cheese with Lactobacillus alimentarius/paralimentarius and Lactobacillus plantarum/paraplantarum/pentosus being detected in the phyllosphere of C. acanthifolia All., thistle rennet and curd obtained with thistle rennet. Other bacterial taxa, hypothetically originating from the vegetable coagulant (Enterococcus faecium, Lactobacillus brevis, Lactobacillus delbrueckii, Leuconostoc mesenteroides/pseudomesenteroides), were exclusively found in Caciofiore della Sibilla cheese by PCR-DGGE. At the end of the maturation period, Illumina sequencing demonstrated that both cheeses were dominated by Lactobacillales; however curd and cheese produced with thistle rennet were co-dominated by Lactobacillus and Leuconostoc, whereas Lactococcus prevailed in curd and cheese produced with commercial animal rennet followed by Lactobacillus. Differences in the bacterial composition between the two cheeses at the end of their maturation period were confirmed by PCR-DGGE analysis. Copyright © 2017 Elsevier B.V. All rights reserved.

  9. Role of genetic & environment risk factors in the aetiology of colorectal cancer in Malaysia.

    PubMed

    Ramzi, Nurul Hanis; Chahil, Jagdish Kaur; Lye, Say Hean; Munretnam, Khamsigan; Sahadevappa, Kavitha Itagi; Velapasamy, Sharmila; Hashim, Nikman Adli Nor; Cheah, Soon Keat; Lim, Gerard Chin Chye; Hussein, Heselynn; Haron, Mohd Roslan; Alex, Livy; Ler, Lian Wee

    2014-06-01

    Colorectal cancer (CRC) is second only to breast cancer as the leading cause of cancer-related deaths in Malaysia. In the Asia-Pacific area, it is the highest emerging gastrointestinal cancer. The aim of this study was to identify single nucleotide polymorphisms (SNPs) and environmental factors associated with CRC risk in Malaysia from a panel of cancer associated SNPs. In this case-control study, 160 Malaysian subjects were recruited, including both with CRC and controls. A total of 768 SNPs were genotyped and analyzed to distinguish risk and protective alleles. Genotyping was carried out using Illumina's BeadArray platform. Information on blood group, occupation, medical history, family history of cancer, intake of red meat and vegetables, exposure to radiation, smoking and drinking habits, etc. was collected. Odds ratios (OR) and 95% confidence intervals (CI) were calculated. A panel of 23 SNPs significantly associated with colorectal cancer risk was identified (P<0.01). Of these, 12 SNPs increased the risk of CRC and 11 reduced the risk. Among the environmental risk factors investigated, high intake of red meat (more than 50% daily proportion) was found to be significantly associated with increased risk of CRC (OR=6.52, 95% CI: 1.93-2.04, P=0.003). Two SNPs including rs2069521 and rs10046 in genes of cytochrome P450 (CYP) superfamily were found significantly associated with CRC risk. For gene-environment analysis, the A allele of rs2069521 showed a significant association with CRC risk when stratified by red meat intake. In this preliminary study, a panel of SNPs significantly associated with CRC in the Malaysian population was identified. Also, red meat consumption and lack of physical exercise were risk factors for CRC, while consumption of fruits and vegetables served as protective factor.
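
    The record above reports odds ratios with 95% confidence intervals but does not say how the intervals were computed. As a minimal sketch, assuming a plain 2×2 case-control table and the standard Wald (log-scale) interval, the calculation could look like this. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed,
                       control_exposed, control_unexposed, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    cells = (case_exposed, case_unexposed, control_exposed, control_unexposed)
    if any(n <= 0 for n in cells):
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 20 exposed and 10 unexposed cases, 10 exposed and 20 unexposed controls.
estimate, ci_lower, ci_upper = odds_ratio_with_ci(20, 10, 10, 20)
```

    Because the interval is symmetric on the log scale, the point estimate always lies between the two bounds and equals their geometric mean.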

  10. Coding Complete Genome for the Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil

    DTIC Science & Technology

    2017-05-04

    sequences for all four genome segments. We downloaded the raw Illumina sequence reads from the NCBI Short Read Archive (GenBank...MGTV genome segments through sequence similarity (BLASTN) to the published genome of Jingmen tick virus (JMTV) isolate SY84 (GenBank: KJ001579-KJ001582...2014. Standards for sequencing viral genomes in the era of high-throughput sequencing . MBio 5:e01360–14. 8. Bankevich A, Nurk S, Antipov

  11. Large-scale contamination of microbial isolate genomes by Illumina PhiX control.

    PubMed

    Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos C; Pati, Amrita

    2015-01-01

    With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges such as searching for alternative energy sources and exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in literature and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources and are usually eliminated during downstream quality control steps. Detection of PhiX contaminated genomes indicates a lapse in either the application or effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases have far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.

  12. Composition and immuno-stimulatory properties of extracellular DNA from mouse gut flora.

    PubMed

    Qi, Ce; Li, Ya; Yu, Ren-Qiang; Zhou, Sheng-Li; Wang, Xing-Guo; Le, Guo-Wei; Jin, Qing-Zhe; Xiao, Hang; Sun, Jin

    2017-11-28

    To demonstrate that specific bacteria might release bacterial extracellular DNA (eDNA) to exert immunomodulatory functions in the mouse small intestine. Extracellular DNA was extracted using phosphate buffered saline with 0.5 mmol/L dithiothreitol combined with two phenol extractions. TOTO-1 iodide, a cell-impermeant and high-affinity nucleic acid stain, was used to confirm the existence of eDNA in the mucus layers of the small intestine and colon in healthy male C57BL/6 mice. Composition difference of eDNA and intracellular DNA (iDNA) of the small intestinal mucus was studied by Illumina sequencing and terminal restriction fragment length polymorphism (T-RFLP). Stimulation of cytokine production by eDNA was studied in RAW264.7 cells in vitro. TOTO-1 iodide staining confirmed existence of eDNA in loose mucus layer of the mouse colon and thin surface mucus layer of the small intestine. Illumina sequencing analysis and T-RFLP revealed that the composition of the eDNA in the small intestinal mucus was significantly different from that of the iDNA of the small intestinal mucus bacteria. Illumina Miseq sequencing showed that the eDNA sequences came mainly from Gram-negative bacteria of Bacteroidales S24-7. By contrast, predominant bacteria of the small intestinal flora comprised Gram-positive bacteria. Both eDNA and iDNA were added to native or lipopolysaccharide-stimulated RAW264.7 macrophages, respectively. The eDNA induced significantly lower tumor necrosis factor-α/interleukin-10 (IL-10) and IL-6/IL-10 ratios than iDNA, suggesting the predominance for maintaining immune homeostasis of the gut. Our results indicated that degraded bacterial genomic DNA was mainly released by Gram-negative bacteria, especially Bacteroidales-S24-7 and Stenotrophomonas genus in gut mucus of mice. They decreased pro-inflammatory activity compared to total gut flora genomic DNA.

  13. Bacterial Pathogens and Community Composition in Advanced Sewage Treatment Systems Revealed by Metagenomics Analysis Based on High-Throughput Sequencing

    PubMed Central

    Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying

    2015-01-01

    This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that Arcobacter genus occupied over 43.42% of total abundance of potential pathogens in the STP. At species level, potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumoniae dominated in raw sewage, which was also confirmed by quantitative real time PCR. Illumina sequencing also revealed prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency mainly depended on oxidation ditch. Compared with sand filtration, magnetic resin seemed to have higher removals in most of the potential pathogens and virulence factors. However, presence of the residual A. butzleri in the final effluent still deserves more concern. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for deep and comprehensive overview of environmental bacterial virulence. PMID:25938416

  14. Characterization of the indigenous microflora in raw and pasteurized buffalo milk during storage at refrigeration temperature by high-throughput sequencing.

    PubMed

    Li, Ling; Renye, John A; Feng, Ling; Zeng, Qingkun; Tang, Yan; Huang, Li; Ren, Daxi; Yang, Pan

    2016-09-01

    The effect of refrigeration on bacterial communities within raw and pasteurized buffalo milk was studied using high-throughput sequencing. High-quality samples of raw buffalo milk were obtained from 3 dairy farms in the Guangxi province in southern China. Five liters of each milk sample were pasteurized (72°C; 15 s); and both raw and pasteurized milks were stored at refrigeration temperature (1-4°C) for various times with their microbial communities characterized using the Illumina Miseq platform (Novogene, Beijing, China). Results showed that both raw and pasteurized milks contained a diverse microbial population and that the populations changed over time during storage. In raw buffalo milk, Lactococcus and Streptococcus dominated the population within the first 24h; however, when stored for up to 72h the dominant bacteria were members of the Pseudomonas and Acinetobacter genera, totaling more than 60% of the community. In pasteurized buffalo milk, the microbial population shifted from a Lactococcus-dominated community (7d), to one containing more than 84% Paenibacillus by 21d of storage. To increase the shelf-life of buffalo milk and its products, raw milk needs to be refrigerated immediately after milking and throughout transport, and should be monitored for the presence of Paenibacillus. Results from this study suggest pasteurization should be performed within 24h of raw milk collection, when the number of psychrotrophic bacteria are low; however, as Paenibacillus spores are resistant to pasteurization, additional antimicrobial treatments may be required to extend shelf-life. The findings from this study are expected to aid in improving the quality and safety of raw and pasteurized buffalo milk. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Digital detection of multiple minority mutants in stool DNA for noninvasive colorectal cancer diagnosis.

    PubMed

    Deng, Lili; Qi, Zongtai; Zou, Binjie; Wu, Haiping; Huang, Huan; Kajiyama, Tomoharu; Kambara, Hideki; Zhou, Guohua

    2012-07-03

    Somatic mutations in stool DNA are quite specific to colorectal cancer (CRC), but detecting the extraordinarily low amounts of mutants demands a highly sensitive method. We proposed a hydrogel bead-array to digitally count CRC-specific mutants in stool at a low cost. At first, multiplex amplification of targets containing multiple mutation loci of interest is carried out by a target enriched multiplex PCR (Tem-PCR), yielding the templates qualified for emulsion PCR (emPCR). Then, after immobilizing the beads from emPCR on a glass surface, the incorporation of Cy3-dUTP into the mutant-specific probes, which are specifically hybridized with the amplified beads from emPCR, is used to color the beads coated with mutants. As all amplified beads are hybridized with the Cy5-labeled universal probe, a mutation rate is readily obtained by digitally counting the beads with different colors (yellow and red). A high specificity of the method is achieved by removing the mismatched probes in a bead-array with electrophoresis. The approach has been used to simultaneously detect 8 mutation loci within the APC, TP53, and KRAS genes in stools from eight CRC patients, and 50% of CRC patients were positively diagnosed; therefore, our method can be a potential tool for the noninvasive diagnosis of CRC.
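
    The digital counting step in the record above reduces to a ratio of bead counts. The sketch below assumes that beads carrying both the Cy3 mutant signal and the Cy5 universal signal are scored as "yellow" and that beads carrying only the Cy5 signal are scored as "red"; that reading of the colors, the function name, and the example counts are assumptions for illustration.

```python
from collections import Counter


def mutation_rate(bead_colors):
    """Fraction of amplified beads scored as mutant.

    bead_colors is an iterable of per-bead color calls, where "yellow" is
    assumed to mark a mutant bead and "red" a non-mutant amplified bead.
    Any other call is ignored.
    """
    counts = Counter(bead_colors)
    mutant = counts["yellow"]
    total = mutant + counts["red"]
    if total == 0:
        raise ValueError("no amplified beads were counted")
    return mutant / total


# Hypothetical chip: 3 mutant beads among 100 amplified beads.
calls = ["yellow"] * 3 + ["red"] * 97
rate = mutation_rate(calls)
```

    Because each bead originates from a single template molecule in emulsion PCR, the precision of such a rate depends on how many beads are counted.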

  16. Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.

    PubMed

    Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua

    2018-03-01

    More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
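
    The "bases scoring Q30 and above" figure quoted in the record above is a per-base fraction over read quality strings. As a minimal sketch, assuming FASTQ quality strings in the common Phred+33 encoding (an assumption, since the abstract does not state the encoding), it could be computed as follows. The function name and the example strings are hypothetical.

```python
def q30_fraction(quality_strings, offset=33, threshold=30):
    """Fraction of bases whose Phred score is at or above the threshold.

    quality_strings is an iterable of FASTQ quality lines; offset=33
    assumes Phred+33 encoding.
    """
    total = 0
    high = 0
    for quality in quality_strings:
        for symbol in quality:
            total += 1
            # Decode the ASCII symbol into its Phred quality score.
            if ord(symbol) - offset >= threshold:
                high += 1
    if total == 0:
        raise ValueError("no bases to score")
    return high / total


# Hypothetical reads: 'I' is Q40, '?' is Q30, '5' is Q20, '#' is Q2.
fraction = q30_fraction(["II??", "55##"])
```

    A Phred score of 30 corresponds to an estimated error probability of 1 in 1,000 for that base call.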

  17. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    PubMed

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.

  18. Digital analysis of the expression levels of multiple colorectal cancer-related genes by multiplexed digital-PCR coupled with hydrogel bead-array.

    PubMed

    Qi, Zongtai; Ma, Yinjiao; Deng, Lili; Wu, Haiping; Zhou, Guohua; Kajiyama, Tomoharu; Kambara, Hideki

    2011-06-07

    To digitally analyze expression levels of multiple genes in one reaction, we proposed a method termed as 'MDHB' (Multiplexed Digital-PCR coupled with Hydrogel Bead-array). The template for bead-based emulsion PCR (emPCR) was prepared by reverse transcription using sequence-tagged primers. The beads recovered from emPCR were immobilized with hydrogel to form a single-bead layer on a chip, and then decoded by gene-specific probe hybridization and Cy3-dUTP based primer extension reaction. The specificity of probe hybridization was improved by using electrophoresis to remove mismatched probes on the bead's surface. The number of positive beads reflects the abundance of expressed genes; the expression levels of target genes were normalized to a housekeeping gene and expressed as the number ratio of green beads to red beads. The discrimination limit of MDHB is 0.1% (i.e., one target molecule from 1000 background molecules), and the sensitivity of the method is below 100 cells when using the β-actin gene as the detection target. We have successfully employed MDHB to detect the relative expression levels of four colorectal cancer (CRC)-related genes (c-myc, COX-2, MMP7, and DPEP1) in 8 tissue samples and 9 stool samples from CRC patients, giving the detection rates of 100% and 77%, respectively. The results suggest that MDHB could be a potential tool for early non-invasive diagnosis of CRC.

  19. Role of genetic & environment risk factors in the aetiology of colorectal cancer in Malaysia

    PubMed Central

    Ramzi, Nurul Hanis; Chahil, Jagdish Kaur; Lye, Say Hean; Munretnam, Khamsigan; Sahadevappa, Kavitha Itagi; Velapasamy, Sharmila; Hashim, Nikman Adli Nor; Cheah, Soon Keat; Lim, Gerard Chin Chye; Hussein, Heselynn; Haron, Mohd Roslan; Alex, Livy; Ler, Lian Wee

    2014-01-01

    Background & objectives: Colorectal cancer (CRC) is second only to breast cancer as the leading cause of cancer-related deaths in Malaysia. In the Asia–Pacific area, it is the highest emerging gastrointestinal cancer. The aim of this study was to identify single nucleotide polymorphisms (SNPs) and environmental factors associated with CRC risk in Malaysia from a panel of cancer associated SNPs. Methods: In this case-control study, 160 Malaysian subjects were recruited, including both CRC cases and controls. A total of 768 SNPs were genotyped and analyzed to distinguish risk and protective alleles. Genotyping was carried out using Illumina's BeadArray platform. Information on blood group, occupation, medical history, family history of cancer, intake of red meat and vegetables, exposure to radiation, smoking and drinking habits, etc. was collected. Odds ratios (OR) and 95% confidence intervals (CI) were calculated. Results: A panel of 23 SNPs significantly associated with colorectal cancer risk was identified (P<0.01). Of these, 12 SNPs increased the risk of CRC and 11 reduced the risk. Among the environmental risk factors investigated, high intake of red meat (more than 50% daily proportion) was found to be significantly associated with increased risk of CRC (OR=6.52, 95% CI: 1.93 - 2.04, P=0.003). Two SNPs, rs2069521 and rs10046, in genes of the cytochrome P450 (CYP) superfamily were found to be significantly associated with CRC risk. For gene-environment analysis, the A allele of rs2069521 showed a significant association with CRC risk when stratified by red meat intake. Interpretation & conclusions: In this preliminary study, a panel of SNPs significantly associated with CRC in the Malaysian population was identified. Also, red meat consumption and lack of physical exercise were risk factors for CRC, while consumption of fruits and vegetables served as a protective factor. PMID:25109722
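
    The abstract does not say how the odds ratios and confidence intervals were obtained. As a minimal sketch, assuming the common Wald (Woolf) interval on the log scale for a 2x2 exposure table, the calculation looks like the following; the function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald (Woolf) confidence interval.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    z=1.96 corresponds to a 95% interval. All cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only.
or_, lo, hi = odds_ratio_ci(a=20, b=10, c=30, d=60)
print(f"OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

    Because the interval is built symmetrically around log(OR), the point estimate always lies between the two bounds.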

  20. Up-Regulation of MicroRNA-190b Plays a Role for Decreased IGF-1 That Induces Insulin Resistance in Human Hepatocellular Carcinoma

    PubMed Central

    Hung, Tzu-Min; Ho, Cheng-Maw; Liu, Yen-Chun; Lee, Jia-Ling; Liao, Yow-Rong; Wu, Yao-Ming; Ho, Ming-Chih; Chen, Chien-Hung; Lai, Hong-Shiee; Lee, Po-Huang

    2014-01-01

    Background & Aims: Insulin-like growth factor (IGF)-1 is produced mainly by the liver and plays important roles in promoting growth and regulating metabolism. A previous study reported that development of hepatocellular carcinoma (HCC) was accompanied by a significant reduction in serum IGF-1 levels. Here, we hypothesized that dysregulation of microRNAs (miRNA) in HCC can modulate IGF-1 expression post-transcriptionally. Methods: The miRNA expression profiles in a dataset of 29 HCC patients were examined using Illumina BeadArray. Specific miRNA (miR)-190b, which was significantly up-regulated in HCC tumor tissues when compared with paired non-tumor tissues, was among those predicted to interact with the 3′-untranslated region (UTR) of IGF-1. In order to explore the regulatory effects of miR-190b on IGF-1 expression, luciferase reporter assay, quantitative real-time PCR, western blotting and immunofluorescence analysis were performed in HCC cells. Results: Overexpression of miR-190b in Huh7 cells attenuated the expression of IGF-1, whereas inhibition of miR-190b resulted in up-regulation of IGF-1. Restoration of IGF-1 expression reversed miR-190b-mediated impaired insulin signaling in Huh7 cells, supporting that IGF-1 was a direct and functional target of miR-190b. Additionally, low serum IGF-1 level was associated with insulin resistance and poor overall survival in HCC patients. Conclusions: Increased expression of miR-190 may cause decreased IGF-1 in HCC development. Insulin resistance appears to be a part of the physiopathologic significance of decreased IGF-1 levels in HCC progression. This study provides a novel miRNA-mediated regulatory mechanism for controlling IGF-1 expression in HCC and elucidates the biological relevance of this interaction in HCC. PMID:24586785

  1. Powerful Identification of Cis-regulatory SNPs in Human Primary Monocytes Using Allele-Specific Gene Expression

    PubMed Central

    Almlöf, Jonas Carlsson; Lundmark, Per; Lundmark, Anders; Ge, Bing; Maouche, Seraya; Göring, Harald H. H.; Liljedahl, Ulrika; Enström, Camilla; Brocheton, Jessy; Proust, Carole; Godefroy, Tiphaine; Sambrook, Jennifer G.; Jolley, Jennifer; Crisp-Hihn, Abigail; Foad, Nicola; Lloyd-Jones, Heather; Stephens, Jonathan; Gwilliam, Rhian; Rice, Catherine M.; Hengstenberg, Christian; Samani, Nilesh J.; Erdmann, Jeanette; Schunkert, Heribert; Pastinen, Tomi; Deloukas, Panos; Goodall, Alison H.; Ouwehand, Willem H.; Cambien, François; Syvänen, Ann-Christine

    2012-01-01

    A large number of genome-wide association studies have been performed during the past five years to identify associations between SNPs and human complex diseases and traits. The assignment of a functional role for the identified disease-associated SNP is not straight-forward. Genome-wide expression quantitative trait locus (eQTL) analysis is frequently used as the initial step to define a function while allele-specific gene expression (ASE) analysis has not yet gained a wide-spread use in disease mapping studies. We compared the power to identify cis-acting regulatory SNPs (cis-rSNPs) by genome-wide allele-specific gene expression (ASE) analysis with that of traditional expression quantitative trait locus (eQTL) mapping. Our study included 395 healthy blood donors for whom global gene expression profiles in circulating monocytes were determined by Illumina BeadArrays. ASE was assessed in a subset of these monocytes from 188 donors by quantitative genotyping of mRNA using a genome-wide panel of SNP markers. The performance of the two methods for detecting cis-rSNPs was evaluated by comparing associations between SNP genotypes and gene expression levels in sample sets of varying size. We found that up to 8-fold more samples are required for eQTL mapping to reach the same statistical power as that obtained by ASE analysis for the same rSNPs. The performance of ASE is insensitive to SNPs with low minor allele frequencies and detects a larger number of significantly associated rSNPs using the same sample size as eQTL mapping. An unequivocal conclusion from our comparison is that ASE analysis is more sensitive for detecting cis-rSNPs than standard eQTL mapping. Our study shows the potential of ASE mapping in tissue samples and primary cells which are difficult to obtain in large numbers. PMID:23300628

  2. De novo transcriptome assembly of 'Angeleno' and 'Lamoon' Japanese plum cultivars (Prunus salicina).

    PubMed

    González, Máximo; Maldonado, Jonathan; Salazar, Erika; Silva, Herman; Carrasco, Basilio

    2016-09-01

    Japanese plum (Prunus salicina L.) is a fruit tree of the Rosaceae family, which is an economically important stone fruit around the world. Currently, Japanese plum breeding programs combine traditional breeding and plant physiology strategies with genetic and genomic analysis. In order to understand the flavonoid pathway regulation and to develop molecular markers associated with the fruit skin color (EST-SSRs), we performed next-generation sequencing on the Illumina HiSeq 2000 platform. A total of 22.4 GB and 21 GB of raw data were obtained from 'Lamoon' and 'Angeleno' respectively, corresponding to 85,404,726 raw reads for 'Lamoon' and 79,781,666 for 'Angeleno'. A total of 139,775,975 reads remained after removing low-quality reads and trimming the adapter sequences. De novo transcriptome assembly was performed using CLC Genome Workbench software and a total of 54,584 unique contigs were generated, with an N50 of 1343 base pairs (bp) and a mean length of 829 bp. This work contributes a specific Japanese plum skin transcriptome, providing two libraries of contrasting fruit skin color phenotype (yellow and red) and substantially increasing the amount of raw data available for this species.
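
    The N50 reported above is a length-weighted summary of an assembly: the contig length at which the contigs of that length or longer cover at least half of all assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the contig lengths are hypothetical, not values from this assembly.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("n50 requires at least one contig")

# Hypothetical contig lengths in bp.
contigs = [100, 200, 300, 400, 500]
print(n50(contigs), sum(contigs) / len(contigs))
```

    In this toy example the N50 (400 bp) exceeds the mean length (300 bp), the same direction as the 1343 bp versus 829 bp reported in the abstract, because N50 weights long contigs more heavily.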

  3. Efficient detection of differentially methylated regions using DiMmeR.

    PubMed

    Almeida, Diogo; Skov, Ida; Silva, Artur; Vandin, Fabio; Tan, Qihua; Röttger, Richard; Baumbach, Jan

    2017-02-15

    Epigenome-wide association studies (EWAS) generate big epidemiological datasets. They aim to detect differentially methylated DNA regions that are likely to influence transcriptional gene activity and, thus, the regulation of metabolic processes. By far the most widely used technology is the Illumina Methylation BeadChip, which measures the methylation levels of 450 (850) thousand cytosines in the CpG dinucleotide context in a set of patients compared to a control group. Many bioinformatics tools exist for raw data analysis. However, most of them require some knowledge of the programming language R, have no user interface, and do not offer all necessary steps to guide users from raw data all the way down to statistically significant differentially methylated regions (DMRs) and the associated genes. Here, we present DiMmeR (Discovery of Multiple Differentially Methylated Regions), the first free standalone software that interactively guides scientists the whole way through EWAS data analysis with a user-friendly graphical user interface (GUI). It offers parallelized statistical methods for efficiently identifying DMRs in both Illumina 450K and 850K EPIC chip data. DiMmeR computes empirical P-values through randomization tests, even for big datasets of hundreds of patients and thousands of permutations, within a few minutes on a standard desktop PC. It is independent of any third-party libraries, computes regression coefficients, P-values and empirical P-values, and it corrects for multiple testing. DiMmeR is publicly available at http://dimmer.compbio.sdu.dk. Contact: diogoma@bmb.sdu.dk. Supplementary data are available at Bioinformatics online.
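
    The abstract mentions empirical P-values from randomization tests without describing the procedure. The sketch below shows a generic two-group permutation test for a single CpG site, assuming the absolute difference in mean methylation as the test statistic and the usual add-one correction; the function name `empirical_p` and the beta values are hypothetical, and this is not DiMmeR's actual implementation.

```python
import random

def empirical_p(cases, controls, n_perm=1000, seed=0):
    """Permutation P-value for the absolute difference in group means."""
    rng = random.Random(seed)

    def stat(x, y):
        return abs(sum(x) / len(x) - sum(y) / len(y))

    observed = stat(cases, controls)
    pooled = list(cases) + list(controls)
    k = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if stat(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical methylation beta values at one CpG site.
cases = [0.80, 0.82, 0.79, 0.85, 0.81, 0.83]
controls = [0.40, 0.42, 0.39, 0.45, 0.41, 0.43]
p = empirical_p(cases, controls)
print(p)
```

    The smallest attainable value is 1 / (n_perm + 1), so the number of permutations limits how small an empirical P-value can be before multiple-testing correction.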

  4. Transcriptome of the Caribbean stony coral Porites astreoides from three developmental stages.

    PubMed

    Mansour, Tamer A; Rosenthal, Joshua J C; Brown, C Titus; Roberson, Loretta M

    2016-08-02

    Porites astreoides is a ubiquitous species of coral on modern Caribbean reefs that is resistant to increasing temperatures, overfishing, and other anthropogenic impacts that have threatened most other coral species. We assembled and annotated a transcriptome from this coral using Illumina sequences from three different developmental stages collected over several years: free-swimming larvae, newly settled larvae, and adults (>10 cm in diameter). This resource will aid understanding of coral calcification, larval settlement, and host-symbiont interactions. A de novo transcriptome for the P. astreoides holobiont (coral plus algal symbiont) was assembled using 594 Mbp of raw Illumina sequencing data generated from five age-specific cDNA libraries. The new transcriptome consists of 867 255 transcript elements with an average length of 685 bases. The isolated P. astreoides assembly consists of 129 718 transcript elements with an average length of 811 bases, and the isolated Symbiodinium sp. assembly had 186 177 transcript elements with an average length of 1105 bases. This contribution to coral transcriptome data provides a valuable resource for researchers studying the ontogeny of gene expression patterns within both the coral and its dinoflagellate symbiont.

  5. Comparative analysis of the microbial communities in raw milk produced in different regions of Korea.

    PubMed

    Kim, In Seon; Hur, Yoo Kyung; Kim, Eun Ji; Ahn, Young-Tae; Kim, Jong Geun; Choi, Yun-Jaie; Huh, Chul Sung

    2017-11-01

    The control of psychrotrophic bacteria causing milk spoilage and illness due to toxic compounds is an important issue in the dairy industry. In South Korea, Gangwon-do province is one of the coldest regions; eighty percent of its area is mountainous, and it plays an important role in the agriculture and dairy industries. The purposes of this study were to analyze the indigenous microbiota of raw milk in Gangwon-do and accurately investigate a putative microbial group causing deterioration in milk quality. We collected raw milk from the bulk tanks of 18 dairy farms in the Hoengseong and Pyeongchang regions of Gangwon-do. Milk components were analyzed and the number of viable bacteria was confirmed. The V3 and V4 regions of the 16S rRNA gene were amplified and sequenced on an Illumina MiSeq platform. Sequences were then assigned to operational taxonomic units, followed by the selection of representative sequences using the QIIME software package. The milk samples from Pyeongchang were higher in fat, protein, lactose, total solid, and solid non-fat, and bacterial cell counts were observed only for the Hoengseong samples. The phylum Proteobacteria was detected most frequently in both the Hoengseong and Pyeongchang samples, followed by the phyla Firmicutes and Actinobacteria. Notably, Corynebacterium, Pediococcus, Macrococcus, and Acinetobacter differed significantly between the two regions. Although the predominant phylum in raw milk is the same, the abundances of major genera in milk samples were different between Hoengseong and Pyeongchang. We assumed that these differences are caused by regionally dissimilar farming environments such as soil, forage, and dairy farming equipment, so that the quality of raw milk from Pyeongchang is higher than that from Hoengseong. These results could provide crucial information for identifying the microbiota in raw milk of South Korea.

  6. De Novo Assembly and Characterization of Fruit Transcriptome in Black Pepper (Piper nigrum)

    PubMed Central

    Hu, Lisong; Hao, Chaoyun; Fan, Rui; Wu, Baoduo; Tan, Lehe; Wu, Huasong

    2015-01-01

    Black pepper is one of the most popular and oldest spices in the world and is valued for its pungent constituent alkaloids. Piperine is the main bioactive compound in pepper alkaloids, which perform unique physiological functions. However, the mechanisms of piperine synthesis are poorly understood. This study is the first to describe the fruit transcriptome of black pepper by sequencing on the Illumina HiSeq 2000 platform. A total of 56,281,710 raw reads were obtained and assembled. From these raw reads, 44,061 unigenes with an average length of 1,345 nt were generated. During functional annotation, 40,537 unigenes were annotated in Gene Ontology categories, Kyoto Encyclopedia of Genes and Genomes pathways, Swiss-Prot database, and Nucleotide Collection (NR/NT) database. In addition, 8,196 simple sequence repeats (SSRs) were detected. In a detailed analysis of the transcriptome, housekeeping genes for quantitative polymerase chain reaction internal control, polymorphic SSRs, and lysine/ornithine metabolism-related genes were identified. These results validated the availability of our database. Our study could provide useful data for further research on piperine synthesis in black pepper. PMID:26121657

  7. De Novo Assembly and Characterization of Fruit Transcriptome in Black Pepper (Piper nigrum).

    PubMed

    Hu, Lisong; Hao, Chaoyun; Fan, Rui; Wu, Baoduo; Tan, Lehe; Wu, Huasong

    2015-01-01

    Black pepper is one of the most popular and oldest spices in the world and is valued for its pungent constituent alkaloids. Piperine is the main bioactive compound in pepper alkaloids, which perform unique physiological functions. However, the mechanisms of piperine synthesis are poorly understood. This study is the first to describe the fruit transcriptome of black pepper by sequencing on the Illumina HiSeq 2000 platform. A total of 56,281,710 raw reads were obtained and assembled. From these raw reads, 44,061 unigenes with an average length of 1,345 nt were generated. During functional annotation, 40,537 unigenes were annotated in Gene Ontology categories, Kyoto Encyclopedia of Genes and Genomes pathways, Swiss-Prot database, and Nucleotide Collection (NR/NT) database. In addition, 8,196 simple sequence repeats (SSRs) were detected. In a detailed analysis of the transcriptome, housekeeping genes for quantitative polymerase chain reaction internal control, polymorphic SSRs, and lysine/ornithine metabolism-related genes were identified. These results validated the availability of our database. Our study could provide useful data for further research on piperine synthesis in black pepper.

  8. A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana.

    PubMed

    Nowell, Reuben W; Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J; Wheat, Christopher W; Saastamoinen, Marjo; Saccheri, Ilik J; Van't Hof, Arjen E; Wasik, Bethany R; Connahs, Heidi; Aslam, Muhammad L; Kumar, Sujai; Challis, Richard J; Monteiro, Antónia; Brakefield, Paul M; Blaxter, Mark

    2017-07-01

    The mycalesine butterfly Bicyclus anynana, the "Squinting bush brown," is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html).
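
    The coverage figure quoted above follows from dividing the filtered sequence volume by the assembly size. A minimal sketch of that arithmetic, assuming Gb means 10^9 bases and Mb means 10^6 bases (the function name `fold_coverage` is hypothetical):

```python
def fold_coverage(sequenced_bases, assembly_size):
    """Average number of times each assembled base is covered by sequence data."""
    return sequenced_bases / assembly_size

raw_bases = 128e9       # raw Illumina data, from the abstract
filtered_bases = 124e9  # after filtering, from the abstract
assembly_bases = 475e6  # final assembly size, from the abstract

cov = fold_coverage(filtered_bases, assembly_bases)
retained = filtered_bases / raw_bases
print(f"coverage: about {cov:.0f}x, data retained after filtering: {retained:.1%}")
```

    The result is close to 261-fold, in line with the approximate value given in the abstract.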

  9. A high-coverage draft genome of the mycalesine butterfly Bicyclus anynana

    PubMed Central

    Elsworth, Ben; Oostra, Vicencio; Zwaan, Bas J.; Wheat, Christopher W.; Saastamoinen, Marjo; Saccheri, Ilik J.; van’t Hof, Arjen E.; Wasik, Bethany R.; Connahs, Heidi; Aslam, Muhammad L.; Kumar, Sujai; Challis, Richard J.; Monteiro, Antónia; Brakefield, Paul M.

    2017-01-01

    The mycalesine butterfly Bicyclus anynana, the “Squinting bush brown,” is a model organism in the study of lepidopteran ecology, development, and evolution. Here, we present a draft genome sequence for B. anynana to serve as a genomics resource for current and future studies of this important model species. Seven libraries with insert sizes ranging from 350 bp to 20 kb were constructed using DNA from an inbred female and sequenced using both Illumina and PacBio technology; 128 Gb of raw Illumina data was filtered to 124 Gb and assembled to a final size of 475 Mb (∼×260 assembly coverage). Contigs were scaffolded using mate-pair, transcriptome, and PacBio data into 10 800 sequences with an N50 of 638 kb (longest scaffold 5 Mb). The genome is comprised of 26% repetitive elements and encodes a total of 22 642 predicted protein-coding genes. Recovery of a BUSCO set of core metazoan genes was almost complete (98%). Overall, these metrics compare well with other recently published lepidopteran genomes. We report a high-quality draft genome sequence for Bicyclus anynana. The genome assembly and annotated gene models are available at LepBase (http://ensembl.lepbase.org/index.html). PMID:28486658

  10. NG6: Integrated next generation sequencing storage and processing environment.

    PubMed

    Mariette, Jérôme; Escudié, Frédéric; Allias, Nicolas; Salin, Gérald; Noirot, Céline; Thomas, Sylvain; Klopp, Christophe

    2012-09-09

    Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the produced reads. We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment already containing pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies, …) and, on the other hand, a secured web site giving access to the results. The connected user will be able to download raw and processed data and browse through the analysis result statistics. The provided workflows can easily be modified or extended and new ones can be added. Ergatis is used as a workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine. NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.

  11. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

    PubMed

    Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-07-01

    Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to Metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to proper computational systems for the large-scale bioinformatic analysis of HTS data currently represents one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities by a completely automated workflow comprising all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented into a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills.
A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

  12. Identification of an Epigenetic Signature of Osteoporosis in Blood DNA of Post-menopausal Women.

    PubMed

    Cheishvili, David; Parashar, Surabhi; Mahmood, Niaz; Arakelian, Ani; Kremer, Richard; Goltzman, David; Szyf, Moshe; Rabbani, Shafaat A

    2018-06-20

    Osteoporosis (OP) is one of the most common age-related progressive bone diseases in elderly people. Approximately one in three women and one in five men are predisposed to developing OP. In postmenopausal women a reduction in bone mineral density (BMD) leads to an increased risk of fractures. In the current study we delineated the DNA methylation signatures in whole blood samples of postmenopausal osteoporotic women. We obtained whole blood DNA from 22 normal women and 22 postmenopausal osteoporotic women (51-89 years) from the Canadian Multicenter Osteoporosis Study (CaMos) cohort. These DNA samples were subjected to Illumina Infinium Human Methylation 450 K analysis. Illumina 450K raw data were analyzed with GenomeStudio software. Analysis of the female participants with early and advanced osteoporosis resulted in the generation of a list of 1233 differentially methylated CpG sites when compared with age-matched normal females. T-test, ANOVA and post-hoc statistical analyses were performed and 77 significantly differentially methylated CpG sites were identified. From the 13 most significant genes, ZNF267, ABLIM2, RHOJ, CDKL5, and PDCD1 were selected for their potential role in bone biology. A weighted polygenic DNA methylation score of these genes predicted osteoporosis at an early stage with high sensitivity and specificity and correlated with measures of bone density. Pyrosequencing analysis of these genes was performed to validate the results obtained from Illumina 450 K methylation analysis. The current study provides proof of principle for the role of DNA methylation in osteoporosis. Using whole blood DNA methylation analysis, women at risk of developing osteoporosis can be identified before a diagnosis of osteoporosis is made using BMD as a screening method. Early diagnosis will help to select patients that might benefit from early therapeutic intervention.

  13. The fungal composition of natural biofinishes on oil-treated wood.

    PubMed

    van Nieuwenhuijzen, Elke J; Houbraken, Jos A M P; Punt, Peter J; Roeselers, Guus; Adan, Olaf C G; Samson, Robert A

    2017-01-01

    Biofinished wood is considered to be a decorative and protective material for outdoor constructions, showing advantages compared to traditional treated wood in terms of sustainability and self-repair. Natural dark wood staining fungi are essential to biofinish formation on wood. Although all sorts of outdoor situated timber are subjected to fungal staining, the homogenous dark staining called biofinish has only been detected on specific vegetable oil-treated substrates. Revealing the fungal composition of various natural biofinishes on wood is a first step to understand and control biofinish formation for industrial application. A culture-based survey of fungi in natural biofinishes on oil-treated wood samples showed the common wood stain fungus Aureobasidium and the recently described genus Superstratomyces to be predominant constituents. A culture-independent approach, based on amplification of the internal transcribed spacer regions, cloning and Sanger sequencing, resulted in clone libraries of two types of biofinishes. Aureobasidium was present in both biofinish types, but was only predominant in biofinishes on pine sapwood treated with raw linseed oil. Most cloned sequences of the other biofinish type (pine sapwood treated with olive oil) could not be identified. In addition, a more in-depth overview of the fungal composition of biofinishes was obtained with Illumina amplicon sequencing that targeted the internal transcribed spacer region 1. All investigated samples, that varied in wood species, (oil) treatments and exposure times, contained Aureobasidium and this genus was predominant in the biofinishes on pine sapwood treated with raw linseed oil. Lapidomyces was the predominant genus in most of the other biofinishes and present in all other samples. 
Surprisingly, Superstratomyces, which was predominantly detected by the cultivation-based approach, could not be found with the Illumina sequencing approach, while Lapidomyces was not detected in the culture-based approach. Overall, the culture-based approach and two culture-independent methods that were used in this study revealed that natural biofinishes were composed of multiple fungal genera always containing the common wood staining mould Aureobasidium. Besides Aureobasidium, the use of other fungal genera for the production of biofinished wood has to be considered.

  14. De novo assembly and annotation of the Antarctic copepod (Tigriopus kingsejongensis) transcriptome.

    PubMed

    Kim, Hui-Su; Lee, Bo-Young; Han, Jeonghoon; Lee, Young Hwan; Min, Gi-Sik; Kim, Sanghee; Lee, Jae-Seong

    2016-08-01

    The whole transcriptome of the Antarctic copepod (Tigriopus kingsejongensis) was sequenced using Illumina RNA-seq. De novo assembly was performed with 64,785,098 raw reads using Trinity, which assembled into 81,653 contigs. TransDecoder found 38,250 candidate coding contigs which showed homology to other species by BLAST analysis. Functional gene annotation was performed by Gene Ontology (GO), InterProScan, and KEGG pathway analyses. Finally, we identified a catalog of expressed genes for T. kingsejongensis, a useful model animal for gene information-based polar research to uncover molecular mechanisms of adaptation to harsh environments. In particular, we observed highly developed lipid metabolism in T. kingsejongensis when directly compared with that of the Far East Pacific coast copepod Tigriopus japonicus at the transcriptome level.

  15. Improving cell mixture deconvolution by identifying optimal DNA methylation libraries (IDOL).

    PubMed

    Koestler, Devin C; Jones, Meaghan J; Usset, Joseph; Christensen, Brock C; Butler, Rondi A; Kobor, Michael S; Wiencke, John K; Kelsey, Karl T

    2016-03-08

    Confounding due to cellular heterogeneity represents one of the foremost challenges currently facing Epigenome-Wide Association Studies (EWAS). Statistical methods leveraging the tissue-specificity of DNA methylation for deconvoluting the cellular mixture of heterogeneous biospecimens offer a promising solution; however, the performance of such methods depends entirely on the library of methylation markers being used for deconvolution. Here, we introduce a novel algorithm for Identifying Optimal Libraries (IDOL) that dynamically scans a candidate set of cell-specific methylation markers to find libraries that optimize the accuracy of cell fraction estimates obtained from cell mixture deconvolution. Application of IDOL to a training set consisting of samples with both whole-blood DNA methylation data (Illumina HumanMethylation450 BeadArray (HM450)) and flow cytometry measurements of cell composition revealed an optimized library comprising 300 CpG sites. When compared with existing libraries, the library identified by IDOL demonstrated significantly better overall discrimination of the entire immune cell landscape (p = 0.038), and resulted in improved discrimination of 14 out of the 15 pairs of leukocyte subtypes. Estimates of cell composition across the samples in the training set using the IDOL library were highly correlated with their respective flow cytometry measurements, with all cell-specific R² > 0.99 and root mean square errors (RMSEs) ranging from 0.97% to 1.33% across leukocyte subtypes. Independent validation of the optimized IDOL library using two additional HM450 data sets showed similarly strong prediction performance, with all cell-specific R² > 0.90 and RMSE < 4.00%. 
In simulation studies, adjustments for cell composition using the IDOL library resulted in uniformly lower false positive rates compared to competing libraries, while also demonstrating an improved capacity to explain epigenome-wide variation in DNA methylation within two large publicly available HM450 data sets. Despite consisting of half as many CpGs compared to existing libraries for whole blood mixture deconvolution, the optimized IDOL library identified herein resulted in outstanding prediction performance across all considered data sets and demonstrated potential to improve the operating characteristics of EWAS involving adjustments for cell distribution. In addition to providing the EWAS community with an optimized library for whole blood mixture deconvolution, our work establishes a systematic and generalizable framework for the assembly of libraries that improve the accuracy of cell mixture deconvolution.
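
    The two accuracy measures quoted in this abstract (R² and RMSE between estimated cell fractions and flow cytometry measurements) can be illustrated with a minimal sketch. It assumes R² means the squared Pearson correlation, which the abstract does not state, and the function names and toy values are hypothetical; it is not the IDOL algorithm itself.

```python
import math

def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences (assumed definition)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (here: percent of cells)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy data: flow cytometry percentages vs. deconvolution estimates for one cell type.
flow = [10.0, 20.0, 30.0, 40.0]
estimated = [11.0, 21.0, 31.0, 41.0]
print(r_squared(flow, estimated), rmse(flow, estimated))
```

    In this toy case the estimates are offset by exactly one percentage point, so the correlation is perfect while the RMSE is 1.0, which shows why the abstract reports both measures.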

  16. Identification of neglected cestode Taenia multiceps microRNAs by illumina sequencing and bioinformatic analysis

    PubMed Central

    2013-01-01

    Background Worldwide, but especially in developing countries, coenurosis of sheep and other livestock is caused by Taenia multiceps larvae, and zoonotic infections occur in humans. Infections frequently lead to host death, resulting in huge socioeconomic losses. MicroRNAs (miRNAs) have important roles in the post-transcriptional regulation of a large number of animal genes by imperfectly binding target mRNAs. To date, there have been no reports of miRNAs in T. multiceps. Results In this study, we obtained 12.8 million high quality raw reads from an adult T. multiceps small RNA library using Illumina sequencing technology. A total of 796 conserved miRNA families (containing 1,006 miRNAs) from 170,888 unique miRNAs were characterized using miRBase (Release 17.0). Here, we selected three conserved miRNA/miRNA* (antisense strand) duplexes at random and amplified their corresponding precursors using a PCR-based method. Furthermore, 20 candidate novel miRNA precursors were verified by genomic PCR. Among these, six corresponding T. multiceps miRNAs are considered specific for Taeniidae because no homologs were found in other species annotated in miRBase. In addition, 181,077 target sites within the T. multiceps transcriptome were predicted for the 20 candidate novel miRNAs. Conclusions Our large-scale investigation of miRNAs in adult T. multiceps provides a substantial platform for improving our understanding of the molecular regulation of the development of T. multiceps and other cestodes. PMID:23941076

  17. nuID: a universal naming scheme of oligonucleotides for Illumina, Affymetrix, and other microarrays

    PubMed Central

    Du, Pan; Kibbe, Warren A; Lin, Simon M

    2007-01-01

    Background Oligonucleotide probes that are sequence identical may have different identifiers between manufacturers and even between different versions of the same company's microarray; and sometimes the same identifier is reused and represents a completely different oligonucleotide, resulting in ambiguity and potentially mis-identification of the genes hybridizing to that probe. Results We have devised a unique, non-degenerate encoding scheme that can be used as a universal representation to identify an oligonucleotide across manufacturers. We have named the encoded representation 'nuID', for nucleotide universal identifier. Inspired by the fact that the raw sequence of the oligonucleotide is the true definition of identity for a probe, the encoding algorithm uniquely and non-degenerately transforms the sequence itself into a compact identifier (a lossless compression). In addition, we added a redundancy check (checksum) to validate the integrity of the identifier. These two steps, encoding plus checksum, result in an nuID, which is a unique, non-degenerate, permanent, robust and efficient representation of the probe sequence. For commercial applications that require the sequence identity to be confidential, we have an encryption schema for nuID. We demonstrate the utility of nuIDs for the annotation of Illumina microarrays, and we believe it has universal applicability as a source-independent naming convention for oligomers. Reviewers This article was reviewed by Itai Yanai, Rong Chen (nominated by Mark Gerstein), and Gregory Schuler (nominated by David Lipman). PMID:17540033
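
    The idea of a lossless sequence-derived identifier with a checksum can be sketched as follows. This is an illustration only and not the published nuID encoding: the 64-character alphabet, the way the checksum and padding count share the first character, and the function names are all assumptions made for this example.

```python
# Hypothetical 64-symbol alphabet; the real nuID alphabet may differ.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_."
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def encode(seq):
    """Pack three bases (2 bits each) into one symbol; prefix a checksum/padding symbol."""
    seq = seq.upper()
    pad = (-len(seq)) % 3                  # bases added to complete the last triplet
    padded = seq + "A" * pad
    body = []
    for i in range(0, len(padded), 3):
        a, b, c = (CODE[x] for x in padded[i:i + 3])
        body.append(a * 16 + b * 4 + c)    # 6-bit value, 0..63
    checksum = sum(body) % 21              # assumed checksum rule, 0..20
    head = checksum * 3 + pad              # 0..62, fits in one symbol
    return ALPHABET[head] + "".join(ALPHABET[v] for v in body)

def decode(ident):
    """Recover the original sequence, raising ValueError if the checksum fails."""
    checksum, pad = divmod(ALPHABET.index(ident[0]), 3)
    body = [ALPHABET.index(ch) for ch in ident[1:]]
    if sum(body) % 21 != checksum:
        raise ValueError("checksum mismatch: identifier is corrupted")
    seq = "".join(BASES[v // 16] + BASES[(v // 4) % 4] + BASES[v % 4] for v in body)
    return seq[:len(seq) - pad]

probe = "ACGTACGTAC"
ident = encode(probe)
print(ident, decode(ident) == probe)
```

    Because the identifier is computed from the sequence alone, two manufacturers encoding the same oligonucleotide would arrive at the same string, which is the property the abstract emphasizes. A simple modular checksum like the one assumed here catches many but not all corruptions.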

  18. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P<0.001). Illumina sequencing detected 2.4-fold more nucleotide MVs and 2.9-fold more amino acid MVs than 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant, found by Illumina sequencing but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  19. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. 
TABSAT is freely available under a GNU General Public License version 3.0 (GPLv3) at https://github.com/tadkeys/tabsat/ and http://demo.platomics.com/. PMID:27467908

  20. Rapid Threat Organism Recognition Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, Kelly P.; Solberg, Owen D.; Schoeniger, Joseph S.

    2013-05-07

    The RAPTOR computational pipeline identifies microbial nucleic acid sequences present in sequence data from clinical samples. It takes as input raw short-read genomic sequence data (in particular, the type generated by the Illumina sequencing platforms) and outputs taxonomic evaluation of detected microbes in various human-readable formats. This software was designed to assist in the diagnosis or characterization of infectious disease, by detecting pathogen sequences in nucleic acid sequence data from clinical samples. It has also been applied in the detection of algal pathogens, when algal biofuel ponds became unproductive. RAPTOR first trims and filters genomic sequence reads based on quality and related considerations, then performs a quick alignment to the human (or other host) genome to filter out host sequences, then performs a deeper search against microbial genomes. Alignment to a protein sequence database is optional. Alignment results are summarized and placed in a taxonomic framework using the Lowest Common Ancestor algorithm.
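
    The Lowest Common Ancestor step mentioned at the end of this abstract can be sketched on a toy taxonomy. The parent table, function names and example taxa below are hypothetical and chosen for illustration; nothing here is taken from the RAPTOR code.

```python
def lineage(taxon, parent):
    """Return the path from the root down to `taxon`, following child -> parent links."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]  # root first

def lowest_common_ancestor(taxa, parent):
    """Deepest node shared by the lineages of all taxa, or None if there is none."""
    paths = [lineage(t, parent) for t in taxa]
    lca = None
    for ranks in zip(*paths):              # walk all lineages from the root in lockstep
        if all(r == ranks[0] for r in ranks):
            lca = ranks[0]
        else:
            break
    return lca

# Toy taxonomy: each key's parent is its value; "Bacteria" is the root.
PARENT = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}

# A read that aligns equally well to both species is assigned to their shared family.
print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], PARENT))
```

    Assigning an ambiguous read to the deepest shared node keeps the call conservative: the read still contributes evidence at the family level without being forced onto a single species.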

  1. TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

    PubMed

    Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud

    2011-09-01

    Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome that can yield better quality sequence reads, SNP-calls, variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in soft- and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at: http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/ Contact: fabian.menges@nyu.edu.

  2. SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses.

    PubMed

    Vetrovský, Tomáš; Baldrian, Petr; Morais, Daniel; Berger, Bonnie

    2018-02-14

    Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, requires experience of the command-line environment and extensive training, and has a steep learning curve, limiting its use. We created SEED 2, a graphical user interface for handling high-throughput amplicon-sequencing data under Windows operating systems. SEED 2 is the only sequence visualizer that empowers users with tools to handle amplicon-sequencing data of microbial community markers. It is suitable for any marker gene sequences obtained through Illumina, IonTorrent or Sanger sequencing. SEED 2 allows the user to process raw sequencing data, identify specific taxa, produce OTU tables, create sequence alignments and construct phylogenetic trees. Standard dual core laptops with 8 GB of RAM can handle ca. 8 million Illumina PE 300 bp sequences, ca. 4 GB of data. SEED 2 was implemented in Object Pascal and uses internal functions and external software for amplicon data processing. SEED 2 is freeware, available at http://www.biomed.cas.cz/mbu/lbwrf/seed/ as a self-contained file, including all the dependencies, and does not require installation. Supplementary data contain a comprehensive list of supported functions. Contact: daniel.morais@biomed.cas.cz. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.

  3. De novo assembly and transcriptome analysis of the rubber tree (Hevea brasiliensis) and SNP markers development for rubber biosynthesis pathways.

    PubMed

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., which is native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.

  4. De Novo Transcriptome Analysis of an Aerial Microalga Trentepohlia jolithus: Pathway Description and Gene Discovery for Carbon Fixation and Carotenoid Biosynthesis

    PubMed Central

    Li, Qianqian; Liu, Jianguo; Zhang, Litao; Liu, Qian

    2014-01-01

    Background Algae in the order Trentepohliales have a broad geographic distribution and are generally characterized by the presence of abundant β-carotene. The many monographs published to date have mainly focused on their morphology, taxonomy, phylogeny, distribution and reproduction; molecular studies of this order are still rare. High-throughput RNA sequencing (RNA-Seq) technology provides a powerful and efficient method for transcript analysis and gene discovery in Trentepohlia jolithus. Methods/Principal Findings Illumina HiSeq 2000 sequencing generated 55,007,830 Illumina PE raw reads, which were assembled into 41,328 assembled unigenes. Based on NR annotation, 53.28% of the unigenes (22,018) could be assigned to gene ontology classes with 54 subcategories and 161,451 functional terms. A total of 26,217 (63.44%) assembled unigenes were mapped to 128 KEGG pathways. Furthermore, a set of 5,798 SSRs in 5,206 unigenes and 131,478 putative SNPs were identified. Moreover, the fact that all of the C4 photosynthesis genes exist in T. jolithus suggests a complex carbon acquisition and fixation system. Similarities and differences between T. jolithus and other algae in carotenoid biosynthesis are also described in depth. Conclusions/Significance This is the first broad transcriptome survey for T. jolithus, increasing the amount of molecular data available for the class Ulvophyceae. As well as providing resources for functional genomics studies, the functional genes and putative pathways identified here will contribute to a better understanding of carbon fixation and fatty acid and carotenoid biosynthesis in T. jolithus. PMID:25254555

  5. CIDR

    Science.gov Websites

    they have high Illumina design scores or have worked in other experiments. For Golden Gate experiments design files returned by Illumina. Efficient SNP selection: The most efficient way to select your SNPs is to get Illumina design scores on all of your possible SNPs prior to narrowing down your list. You can

  6. Monitoring Error Rates In Illumina Sequencing.

    PubMed

    Manley, Leigh J; Ma, Duanduan; Levine, Stuart S

    2016-12-01

    Guaranteeing high-quality next-generation sequencing data in a rapidly changing environment is an ongoing challenge. The introduction of the Illumina NextSeq 500 and the deprecation of specific metrics from Illumina's Sequencing Analysis Viewer (SAV; Illumina, San Diego, CA, USA) have made it more difficult to determine directly the baseline error rate of sequencing runs. To improve our ability to measure base quality, we have created an open-source tool to construct the Percent Perfect Reads (PPR) plot, previously provided by the Illumina sequencers. The PPR program is compatible with HiSeq 2000/2500, MiSeq, and NextSeq 500 instruments and provides an alternative to Illumina's quality value (Q) scores for determining run quality. Whereas Q scores are representative of run quality, they are often overestimated and are sourced from different look-up tables for each platform. The PPR's unique capabilities as a cross-instrument comparison device, as a troubleshooting tool, and as a tool for monitoring instrument performance can provide an increase in clarity over SAV metrics that is often crucial for maintaining instrument health. These capabilities are highlighted.
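
    A Percent Perfect Reads curve can be sketched as below. The abstract does not define the metric, so this example assumes that a read counts as perfect at cycle n when it has no mismatch against a reference in cycles 1 through n; the input format and function name are hypothetical and do not describe the published PPR program.

```python
def percent_perfect_reads(first_error_cycles, n_cycles):
    """Percentage of reads with no mismatch up to and including each cycle.

    first_error_cycles: one entry per read, holding the 1-based cycle of the read's
    first mismatch against the reference, or None if the read is error-free.
    """
    total = len(first_error_cycles)
    if total == 0:
        return []
    curve = []
    for cycle in range(1, n_cycles + 1):
        perfect = sum(1 for c in first_error_cycles if c is None or c > cycle)
        curve.append(100.0 * perfect / total)
    return curve

# Four toy reads: two error-free, one with its first error at cycle 3, one at cycle 1.
print(percent_perfect_reads([None, 3, 1, None], n_cycles=3))  # [75.0, 75.0, 50.0]
```

    Under this assumed definition the curve can only stay level or fall as cycles increase, so a sudden drop at a particular cycle points to a problem at that stage of the run.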

  7. Analysis of the Transcriptome of Erigeron breviscapus Uncovers Putative Scutellarin and Chlorogenic Acids Biosynthetic Genes and Genetic Markers

    PubMed Central

    Zhang, Jia-Jin; Shu, Li-Ping; Zhang, Wei; Long, Guang-Qiang; Liu, Tao; Meng, Zheng-Gui; Chen, Jun-Wen; Yang, Sheng-Chao

    2014-01-01

    Background Erigeron breviscapus (Vant.) Hand-Mazz. is a famous medicinal plant. Scutellarin and chlorogenic acids are the primary active components in this herb. However, the mechanisms of biosynthesis and regulation for scutellarin and chlorogenic acids in E. breviscapus are largely unknown. In addition, genomic information for this herb is unavailable. Principal Findings Using Illumina sequencing on the GAIIx platform, a total of 64,605,972 raw sequencing reads were generated and assembled into 73,092 non-redundant unigenes. Among them, 44,855 unigenes (61.37%) were annotated in the public databases Nr, Swiss-Prot, KEGG, and COG. The transcripts encoding the known enzymes involved in flavonoid and chlorogenic acid biosynthesis were discovered in the Illumina dataset. Three candidate cytochrome P450 genes were discovered which might encode flavone 6-hydroxylase, converting apigenin to scutellarein. Furthermore, 4 unigenes encoding homologues of maize P1 (R2R3-MYB transcription factors) were defined, which might regulate the biosynthesis of scutellarin. Additionally, a total of 11,077 simple sequence repeats (SSRs) were identified from 9,255 unigenes. Among the SSRs, tri-nucleotide motifs were the most abundant. Thirty-six primer pairs for SSRs were randomly selected for validation of amplification and polymorphism. The results revealed that 34 (94.40%) primer pairs amplified successfully and 19 (52.78%) primer pairs exhibited polymorphisms. Conclusion Using next generation sequencing (NGS) technology, this study is the first to provide abundant genomic data for E. breviscapus. The candidate genes involved in the biosynthesis and transcriptional regulation of scutellarin and chlorogenic acids were obtained in this study. Additionally, plenty of genetic markers were generated by identification of SSRs, which are a powerful tool for molecular breeding and genetics applications in this herb. PMID:24956277
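
    Identifying simple sequence repeats in assembled unigenes, as described in this abstract, can be illustrated with a small regular-expression scan. The abstract names neither the tool nor the thresholds used, so the minimum repeat counts, the function names and the toy sequence below are assumptions.

```python
import re

# Assumed minimum number of repeats per motif length (di- to hexanucleotide).
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(unit):
    """True if the motif is not itself a repetition of a shorter motif (e.g. not 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(thresholds.items()):
        # A captured motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)

toy = "GG" + "AT" * 6 + "CC" + "CAG" * 5 + "T"
print(find_ssrs(toy))  # [(2, 'AT', 6), (16, 'CAG', 5)]
```

    Once SSR positions are known, primer pairs are usually designed in the flanking sequence so that the number of repeats can be compared between individuals, which is the validation step the abstract reports for 36 primer pairs.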

  8. De Novo Assembly and Transcriptome Analysis of the Rubber Tree (Hevea brasiliensis) and SNP Markers Development for Rubber Biosynthesis Pathways

    PubMed Central

    Mantello, Camila Campos; Cardoso-Silva, Claudio Benicio; da Silva, Carla Cristina; de Souza, Livia Moura; Scaloppi Junior, Erivaldo José; de Souza Gonçalves, Paulo; Vicentini, Renato; de Souza, Anete Pereira

    2014-01-01

    Hevea brasiliensis (Willd. Ex Adr. Juss.) Muell.-Arg., which is native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection. PMID:25048025

  9. Analysis of the transcriptome of Erigeron breviscapus uncovers putative scutellarin and chlorogenic acids biosynthetic genes and genetic markers.

    PubMed

    Jiang, Ni-Hao; Zhang, Guang-Hui; Zhang, Jia-Jin; Shu, Li-Ping; Zhang, Wei; Long, Guang-Qiang; Liu, Tao; Meng, Zheng-Gui; Chen, Jun-Wen; Yang, Sheng-Chao

    2014-01-01

    Erigeron breviscapus (Vant.) Hand-Mazz. is a famous medicinal plant. Scutellarin and chlorogenic acids are the primary active components in this herb. However, the mechanisms of biosynthesis and regulation for scutellarin and chlorogenic acids in E. breviscapus are largely unknown. In addition, genomic information for this herb is unavailable. Using Illumina sequencing on the GAIIx platform, a total of 64,605,972 raw sequencing reads were generated and assembled into 73,092 non-redundant unigenes. Among them, 44,855 unigenes (61.37%) were annotated in the public databases Nr, Swiss-Prot, KEGG, and COG. The transcripts encoding the known enzymes involved in flavonoid and chlorogenic acid biosynthesis were discovered in the Illumina dataset. Three candidate cytochrome P450 genes were discovered which might encode flavone 6-hydroxylase, converting apigenin to scutellarein. Furthermore, 4 unigenes encoding homologues of maize P1 (R2R3-MYB transcription factors) were defined, which might regulate the biosynthesis of scutellarin. Additionally, a total of 11,077 simple sequence repeats (SSRs) were identified from 9,255 unigenes. Among the SSRs, tri-nucleotide motifs were the most abundant. Thirty-six primer pairs for SSRs were randomly selected for validation of amplification and polymorphism. The results revealed that 34 (94.40%) primer pairs amplified successfully and 19 (52.78%) primer pairs exhibited polymorphisms. Using next generation sequencing (NGS) technology, this study is the first to provide abundant genomic data for E. breviscapus. The candidate genes involved in the biosynthesis and transcriptional regulation of scutellarin and chlorogenic acids were obtained in this study. Additionally, plenty of genetic markers were generated by identification of SSRs, which are a powerful tool for molecular breeding and genetics applications in this herb.

  10. Arkas: Rapid reproducible RNAseq analysis

    PubMed Central

    Colombo, Anthony R.; J. Triche Jr, Timothy; Ramsingh, Giridharan

    2017-01-01

    The recently introduced Kallisto pseudoaligner has radically simplified the quantification of transcripts in RNA-sequencing experiments. We offer cloud-scale RNAseq pipelines, Arkas-Quantification and Arkas-Analysis, available within Illumina's BaseSpace cloud application platform, which expedite Kallisto preparatory routines, reliably calculate differential expression, and perform gene-set enrichment of REACTOME pathways. Due to inherent inefficiencies of scale, Illumina's BaseSpace computing platform offers a massively parallel distributive environment improving data management services and data importing. Arkas-Quantification deploys Kallisto for parallel cloud computations and is conveniently integrated downstream from the BaseSpace Sequence Read Archive (SRA) import/conversion application titled SRA Import. Arkas-Analysis annotates the Kallisto results by extracting structured information directly from source FASTA files with per-contig metadata, and calculates differential expression and gene-set enrichment on both coding genes and transcripts. The Arkas cloud pipeline supports ENSEMBL transcriptomes and can be used downstream from SRA Import, facilitating raw sequencing importing, SRA FASTQ conversion, RNA quantification and analysis steps. PMID:28868134

  11. Fungal community and cellulose-degrading genes in the composting process of Chinese medicinal herbal residues.

    PubMed

    Tian, Xueping; Yang, Tao; He, Jingzhong; Chu, Qian; Jia, Xiaojun; Huang, Jun

    2017-10-01

    The fungal community and the population of 16S rRNA, 18S rRNA and cellulose-degrading genes during the 30-day composting process of Chinese medicinal herbal residues were investigated using Illumina MiSeq and quantitative real-time PCR. An obvious succession of fungal communities occurred during the composting process. Unidentified fungi predominated in the raw materials. As composting progressed, Ascomycota became the most dominant phylum, with Aspergillus being the most dominant genus, and Aspergillus fumigatus making up 99.65% of that genus. Because of the inoculation of cellulolytic fungi in the mature stage, the cellulose degradation rate in inoculation groups was faster and the relative abundances of Aspergillus and the glycoside hydrolase family 7 genes were significantly higher than those in the control groups. These indicated that the fungal inoculants facilitated the degradation of cellulose, increased cellulolytic fungi and optimized the community structure. Copyright © 2017. Published by Elsevier Ltd.

  12. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing.

    PubMed

    Hargreaves, Adam D; Mulley, John F

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0-2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5' and 3' UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species.

  13. Assessing the utility of the Oxford Nanopore MinION for snake venom gland cDNA sequencing

    PubMed Central

    Hargreaves, Adam D.

    2015-01-01

    Portable DNA sequencers such as the Oxford Nanopore MinION device have the potential to be truly disruptive technologies, facilitating new approaches and analyses and, in some cases, taking sequencing out of the lab and into the field. However, the capabilities of these technologies are still being revealed. Here we show that single-molecule cDNA sequencing using the MinION accurately characterises venom toxin-encoding genes in the painted saw-scaled viper, Echis coloratus. We find the raw sequencing error rate to be around 12%, improved to 0–2% with hybrid error correction and 3% with de novo error correction. Our corrected data provides full coding sequences and 5′ and 3′ UTRs for 29 of 33 candidate venom toxins detected, far superior to Illumina data (13/40 complete) and Sanger-based ESTs (15/29). We suggest that, should the current pace of improvement continue, the MinION will become the default approach for cDNA sequencing in a variety of species. PMID:26623194

  14. De novo transcriptome analysis and microsatellite marker development for population genetic study of a serious insect pest, Rhopalosiphum padi (L.) (Hemiptera: Aphididae).

    PubMed

    Duan, Xinle; Wang, Kang; Su, Sha; Tian, Ruizheng; Li, Yuting; Chen, Maohua

    2017-01-01

    The bird cherry-oat aphid, Rhopalosiphum padi (L.), is one of the most abundant aphid pests of cereals and has a global distribution. Next-generation sequencing (NGS) is a rapid and efficient method for developing molecular markers. However, transcriptomic and genomic resources of R. padi have not been investigated. In this study, we used transcriptome information obtained by RNA-Seq to develop polymorphic microsatellites for investigating population genetics in this species. The transcriptome of R. padi was sequenced on an Illumina HiSeq 2000 platform. A total of 114.4 million raw reads with a GC content of 40.03% was generated. The raw reads were cleaned and assembled into 29,467 unigenes with an N50 length of 1,580 bp. Using several public databases, 82.47% of these unigenes were annotated. Of the annotated unigenes, 8,022 were assigned to COG pathways, 9,895 were assigned to GO pathways, and 14,586 were mapped to 257 KEGG pathways. A total of 7,936 potential microsatellites were identified in 5,564 unigenes, 60 of which were selected randomly and amplified using specific primer pairs. Fourteen loci were found to be polymorphic in the four R. padi populations. The transcriptomic data presented herein will facilitate gene discovery, gene analyses, and development of molecular markers for future studies of R. padi and other closely related aphid species.
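
    The N50 length quoted for assemblies in records such as this one is a simple summary statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. A minimal sketch follows, assuming a plain list of unigene or contig lengths as input; the function name and the example lengths are illustrative and are not taken from any of the cited studies.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted longest first; the N50 is the length at which
    the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Illustrative lengths only (not data from the study above).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

    Because the statistic is weighted by length, a few long sequences can raise N50 considerably, which is why it is usually reported alongside the number of sequences and the mean length.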

  15. An 18S rRNA Workflow for Characterizing Protists in Sewage, with a Focus on Zoonotic Trichomonads.

    PubMed

    Maritz, Julia M; Rogers, Krysta H; Rock, Tara M; Liu, Nicole; Joseph, Susan; Land, Kirkwood M; Carlton, Jane M

    2017-11-01

    Microbial eukaryotes (protists) are important components of terrestrial and aquatic environments, as well as animal and human microbiomes. Their relationships with metazoa range from mutualistic to parasitic and zoonotic (i.e., transmissible between humans and animals). Despite their ecological importance, our knowledge of protists in urban environments lags behind that of bacteria, largely due to a lack of experimentally validated high-throughput protocols that produce accurate estimates of protist diversity while minimizing non-protist DNA representation. We optimized protocols for detecting zoonotic protists in raw sewage samples, with a focus on trichomonad taxa. First, we investigated the utility of two commonly used variable regions of the 18S rRNA marker gene, V4 and V9, by amplifying and Sanger sequencing 23 different eukaryotic species, including 16 protist species such as Cryptosporidium parvum, Giardia intestinalis, Toxoplasma gondii, and species of trichomonad. Next, we optimized wet-lab methods for sample processing and Illumina sequencing of both regions from raw sewage collected from a private apartment building in New York City. Our results show that both regions are effective at identifying several zoonotic protists that may be present in sewage. A combination of small extractions (1 mL volumes) performed on the same day as sample collection, and the incorporation of a vertebrate blocking primer, is ideal to detect protist taxa of interest and combat the effects of metazoan DNA. We expect that the robust, standardized methods presented in our workflow will be applicable to investigations of protists in other environmental samples, and will help facilitate large-scale investigations of protistan diversity.

  16. Transcriptomic Analysis of Multipurpose Timber Yielding Tree Neolamarckia cadamba during Xylogenesis Using RNA-Seq.

    PubMed

    Ouyang, Kunxi; Li, Juncheng; Zhao, Xianhai; Que, Qingmin; Li, Pei; Huang, Hao; Deng, Xiaomei; Singh, Sunil Kumar; Wu, Ai-Min; Chen, Xiaoyang

    2016-01-01

    Neolamarckia cadamba is a fast-growing tropical hardwood tree that is used extensively for plywood and pulp production, light furniture fabrication, building materials, and as a raw material for the preparation of certain indigenous medicines. Lack of genomic resources hampers progress in the molecular breeding and genetic improvement of this multipurpose tree species. In this study, transcriptome profiling of differentiating stems was performed to understand N. cadamba xylogenesis. The N. cadamba transcriptome was sequenced using Illumina paired-end sequencing technology. This generated 42.49 G of raw data that was then de novo assembled into 55,432 UniGenes with a mean length of 803.2 bp. Approximately 47.8% of the UniGenes (26,487) were annotated against publicly available protein databases, among which 21,699 and 7,754 UniGenes were assigned to Gene Ontology categories (GO) and Clusters of Orthologous Groups (COG), respectively. 5,589 UniGenes could be mapped onto 116 pathways using the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. Among 6,202 UniGenes exhibiting differential expression during xylogenesis, 1,634 showed significantly higher levels of expression in the basal and middle stem segments compared to the apical stem segment. These genes included NAC and MYB transcription factors related to secondary cell wall biosynthesis, genes related to most metabolic steps of lignin biosynthesis, and CesA genes involved in cellulose biosynthesis. This study lays the foundation for further screening of key genes associated with xylogenesis in N. cadamba as well as enhancing our understanding of the mechanism of xylogenesis in fast-growing trees.

  17. Identification of 28 cytochrome P450 genes from the transcriptome of the marine rotifer Brachionus plicatilis and analysis of their expression.

    PubMed

    Kim, Hui-Su; Han, Jeonghoon; Kim, Hee-Jin; Hagiwara, Atsushi; Lee, Jae-Seong

    2017-09-01

    Whole transcriptomes of the rotifer Brachionus plicatilis were analyzed using an Illumina sequencer. De novo assembly was performed with 49,122,780 raw reads using Trinity software. Among the assembled 42,820 contigs, 27,437 putative open reading frame contigs were identified (average length 1235bp; N50=1707bp). Functional gene annotation with Gene Ontology and InterProScan, in addition to Kyoto Encyclopedia of Genes and Genomes pathway analysis, highlighted the metabolism of xenobiotics by cytochrome P450 (CYP). In addition, 28 CYP genes were identified, and their transcriptional responses to benzo[α]pyrene (B[α]P) were investigated. Most of the CYPs were significantly upregulated or downregulated (P<0.05) in response to B[α]P, suggesting that Bp-CYP genes play a crucial role in detoxification mechanisms in response to xenobiotics. This study sheds light on the molecular defense mechanisms of the rotifer B. plicatilis in response to exposure to various chemicals. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Atropos: specific, sensitive, and speedy trimming of sequencing reads.

    PubMed

    Didion, John P; Martin, Marcel; Collins, Francis S

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.
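
    The two operations named in this abstract, removing adapter sequence and removing low-quality bases, can be illustrated with a deliberately simplified sketch. It is not Atropos's algorithm and does not use its API: it assumes exact (error-free) adapter matches and a plain list of Phred scores, and the function names, the example adapter string and the threshold of 20 are all hypothetical choices made for illustration.

```python
def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from a read using exact matching only.

    A full copy of the adapter anywhere in the read is cut off together
    with everything after it. Otherwise, a prefix of the adapter of at
    least min_overlap bases at the very end of the read is removed.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


def trim_low_quality_tail(read, quals, threshold=20):
    """Drop trailing bases whose Phred score is below the threshold."""
    end = len(read)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return read[:end], quals[:end]


# Hypothetical adapter and reads, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
full_hit = trim_adapter("ACGTACGT" + ADAPTER, ADAPTER)
partial_hit = trim_adapter("ACGTACGTAGATC", ADAPTER)
clean_read, clean_quals = trim_low_quality_tail("ACGTAC", [30, 30, 30, 30, 10, 5])
```

    Production trimmers additionally tolerate mismatches and indels in the adapter match and handle paired-end reads, which is where most of the accuracy differences between tools arise.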

  19. Atropos: specific, sensitive, and speedy trimming of sequencing reads

    PubMed Central

    Collins, Francis S.

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos. PMID:28875074

  20. Comparative microRNA-seq Analysis Depicts Candidate miRNAs Involved in Skin Color Differentiation in Red Tilapia.

    PubMed

    Wang, Lanmei; Zhu, Wenbin; Dong, Zaijie; Song, Feibiao; Dong, Juanjuan; Fu, Jianjun

    2018-04-16

    Differentiation and variation in body color has been a growing limitation to the commercial value of red tilapia. Limited microRNA (miRNA) information is available on skin color differentiation and variation in fish so far. In this study, a high-throughput Illumina sequencing of sRNAs was conducted on three color varieties of red tilapia and 81,394,491 raw reads were generated. A total of 158 differentially expressed miRNAs (|log₂(fold change)| ≥ 1 and q-value ≤ 0.001) were identified. Target prediction and functional analysis of color-related miRNAs showed that a variety of putative target genes, including slc7a11, mc1r and asip, played potential roles in pigmentation. Moreover, the miRNA-mRNA regulatory network was illustrated to elucidate the pigmentation differentiation, in which miR-138-5p and miR-722 were predicted to play important roles in regulating the pigmentation process. These results advance our understanding of the molecular mechanisms of skin pigmentation differentiation in red tilapia.
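
    The selection criterion stated in this abstract (|log₂(fold change)| ≥ 1 and q-value ≤ 0.001) amounts to a two-condition filter. The sketch below applies it to made-up records; the labels and numbers are placeholders rather than results from the study, and the function name is an assumption.

```python
def passes_de_filter(log2_fc, q_value, min_abs_log2_fc=1.0, max_q=0.001):
    """Return True when both thresholds from the abstract are met."""
    return abs(log2_fc) >= min_abs_log2_fc and q_value <= max_q


# Made-up records for illustration: (label, log2 fold change, q-value).
records = [
    ("miRNA_A", 2.3, 0.0002),    # large change, significant: kept
    ("miRNA_B", -1.0, 0.001),    # exactly on both thresholds: kept
    ("miRNA_C", 0.4, 0.00001),   # significant but small change: dropped
    ("miRNA_D", 3.1, 0.02),      # large change, not significant: dropped
]
selected = [label for label, fc, q in records if passes_de_filter(fc, q)]
```

    Taking the absolute value keeps both up-regulated and down-regulated candidates, and a |log₂(fold change)| of 1 corresponds to a two-fold change in either direction.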

  1. Genome Sequence of the Freshwater Yangtze Finless Porpoise.

    PubMed

    Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jingsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang; Li, Songhai

    2018-04-16

    The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603.

  2. Genome Sequence of the Freshwater Yangtze Finless Porpoise

    PubMed Central

    Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jinsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang

    2018-01-01

    The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603. PMID:29659530

  3. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake

    USGS Publications Warehouse

    Castoe, Todd A.; Poole, Alexander W.; de Koning, A. P. Jason; Jones, Kenneth L.; Tomback, Diana F.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Lance, Stacey L.; Streicher, Jeffrey W.; Smith, Eric N.; Pollock, David D.

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample – a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable.
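
    The first step of a Seq-to-SSR style analysis, scanning reads for perfect tandem repeats without choosing motifs in advance, can be sketched with a regular expression. This is an illustrative approximation and not the authors' pipeline: the motif length range (2 to 6 bp), the minimum of four repeats and the function name are assumptions, and the sketch does not check for the flanking sequence that a potentially amplifiable locus would need for primer design.

```python
import re


def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Find perfect tandem repeats in a DNA sequence.

    Returns a list of (start, motif, repeat_count) tuples. The lazy
    quantifier makes the regex prefer the shortest repeating motif, and
    single-base runs (homopolymers) are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # e.g. AAAAAAAA would otherwise be reported as (AA)n
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Toy sequences for illustration only.
dinucleotide_hits = find_ssrs("GGACACACACACTT")
trinucleotide_hits = find_ssrs("TTCAGCAGCAGCAGTT")
```

    No motif list is supplied, so any di- to hexanucleotide repeat present in the reads is reported, which mirrors the advantage over enrichment-based strategies described in the abstract.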

  4. Rapid microsatellite identification from illumina paired-end genomic sequencing in two birds and a snake

    USGS Publications Warehouse

    Castoe, T.A.; Poole, A.W.; de Koning, A. P. J.; Jones, K.L.; Tomback, D.F.; Oyler-McCance, S.J.; Fike, J.A.; Lance, S.L.; Streicher, J.W.; Smith, E.N.; Pollock, D.D.

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample - a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable. © 2012 Castoe et al.

  5. A user-friendly workflow for analysis of Illumina gene expression bead array data available at the arrayanalysis.org portal.

    PubMed

    Eijssen, Lars M T; Goelela, Varshna S; Kelder, Thomas; Adriaens, Michiel E; Evelo, Chris T; Radonjic, Marijana

    2015-06-30

    Illumina whole-genome expression bead arrays are a widely used platform for transcriptomics. Most of the tools available for the analysis of the resulting data are not easily applicable by less experienced users. ArrayAnalysis.org provides researchers with an easy-to-use and comprehensive interface to the functionality of R and Bioconductor packages for microarray data analysis. As a modular open source project, it allows developers to contribute modules that provide support for additional types of data or extend workflows. To enable data analysis of Illumina bead arrays for a broad user community, we have developed a module for ArrayAnalysis.org that provides a free and user-friendly web interface for quality control and pre-processing for these arrays. This module can be used together with existing modules for statistical and pathway analysis to provide a full workflow for Illumina gene expression data analysis. The module accepts data exported from Illumina's GenomeStudio, and provides the user with quality control plots and normalized data. The outputs are directly linked to the existing statistics module of ArrayAnalysis.org, but can also be downloaded for further downstream analysis in third-party tools. The Illumina bead arrays analysis module is available at http://www.arrayanalysis.org. A user guide, a tutorial demonstrating the analysis of an example dataset, and R scripts are available. The module can be used as a starting point for statistical evaluation and pathway analysis provided on the website or to generate processed input data for a broad range of applications in life sciences research.

  6. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake.

    PubMed

    Castoe, Todd A; Poole, Alexander W; de Koning, A P Jason; Jones, Kenneth L; Tomback, Diana F; Oyler-McCance, Sara J; Fike, Jennifer A; Lance, Stacey L; Streicher, Jeffrey W; Smith, Eric N; Pollock, David D

    2012-01-01

    Identification of microsatellites, or simple sequence repeats (SSRs), can be a time-consuming and costly investment requiring enrichment, cloning, and sequencing of candidate loci. Recently, however, high throughput sequencing (with or without prior enrichment for specific SSR loci) has been utilized to identify SSR loci. The direct "Seq-to-SSR" approach has an advantage over enrichment-based strategies in that it does not require a priori selection of particular motifs, or prior knowledge of genomic SSR content. It has been more expensive per SSR locus recovered, however, particularly for genomes with few SSR loci, such as bird genomes. The longer but relatively more expensive 454 reads have been preferred over less expensive Illumina reads. Here, we use Illumina paired-end sequence data to identify potentially amplifiable SSR loci (PALs) from a snake (the Burmese python, Python molurus bivittatus), and directly compare these results to those from 454 data. We also compare the python results to results from Illumina sequencing of two bird genomes (Gunnison Sage-grouse, Centrocercus minimus, and Clark's Nutcracker, Nucifraga columbiana), which have considerably fewer SSRs than the python. We show that direct Illumina Seq-to-SSR can identify and characterize thousands of potentially amplifiable SSR loci for as little as $10 per sample--a fraction of the cost of 454 sequencing. Given that Illumina Seq-to-SSR is effective, inexpensive, and reliable even for species such as birds that have few SSR loci, it seems that there are now few situations for which prior hybridization is justifiable.

  7. My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing.

    PubMed

    Van Neste, Christophe; Vandewoestyne, Mado; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2014-03-01

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE such as virtually unlimited multiplexy of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with automatically determined regions of interest (ROIs) by a generic maximal flanking algorithm which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to automatically make allele calls starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, whose sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single contributor samples and multiple person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the framework gave excellent results for most loci. The results show a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS affords great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as molecular autopsy in legal medicine and in mitochondrial DNA research. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  8. Transcriptome analysis of stem development in the tumourous stem mustard Brassica juncea var. tumida Tsen et Lee by RNA sequencing.

    PubMed

    Sun, Quan; Zhou, Guanfan; Cai, Yingfan; Fan, Yonghong; Zhu, Xiaoyan; Liu, Yihua; He, Xiaohong; Shen, Jinjuan; Jiang, Huaizhong; Hu, Daiwen; Pan, Zheng; Xiang, Liuxin; He, Guanghua; Dong, Daiwen; Yang, Jianping

    2012-04-21

    Tumourous stem mustard (Brassica juncea var. tumida Tsen et Lee) is an economically and nutritionally important vegetable crop of the Cruciferae family that also provides the raw material for Fuling mustard. The genetics breeding, physiology, biochemistry and classification of mustards have been extensively studied, but little information is available on tumourous stem mustard at the molecular level. To gain greater insight into the molecular mechanisms underlying stem swelling in this vegetable and to provide additional information for molecular research and breeding, we sequenced the transcriptome of tumourous stem mustard at various stem developmental stages and compared it with that of a mutant variety lacking swollen stems. Using Illumina short-read technology with a tag-based digital gene expression (DGE) system, we performed de novo transcriptome assembly and gene expression analysis. In our analysis, we assembled genetic information for tumourous stem mustard at various stem developmental stages. In addition, we constructed five DGE libraries, which covered the strains Yong'an and Dayejie at various development stages. Illumina sequencing identified 146,265 unigenes, including 11,245 clusters and 135,020 singletons. The unigenes were subjected to a BLAST search and annotated using the GO and KO databases. We also compared the gene expression profiles of three swollen stem samples with those of two non-swollen stem samples. A total of 1,042 genes with significantly different expression levels occurring simultaneously in the six comparison groups were screened out. Finally, the altered expression levels of a number of randomly selected genes were confirmed by quantitative real-time PCR. Our data provide comprehensive gene expression information at the transcriptional level and the first insight into the understanding of the molecular mechanisms and regulatory pathways of stem swelling and development in this plant, and will help define new mechanisms of stem development in non-model plant organisms.

  9. Transcriptome analysis of Brassica juncea var. tumida Tsen responses to Plasmodiophora brassicae primed by the biocontrol strain Zhihengliuella aestuarii.

    PubMed

    Luo, Yuanli; Dong, Daiwen; Su, Yu; Wang, Xuyi; Peng, Yumei; Peng, Jiang; Zhou, Changyong

    2018-05-01

    Mustard clubroot, caused by Plasmodiophora brassicae, is a serious disease that affects Brassica juncea var. tumida Tsen, a mustard plant that is the raw material for a traditional fermented food manufactured in Chongqing, China. In our laboratory, we screened the antagonistic bacteria Zhihengliuella aestuarii against P. brassicae. To better understand the biocontrol mechanism, three transcriptome analyses of B. juncea var. tumida Tsen were conducted using Illumina HiSeq 4000, one from B. juncea only inoculated with P. brassicae (P), one inoculated with P. brassicae and the biocontrol agent Z. aestuarii at the same time (P + B), and the other was the control (H), in which P. brassicae was replaced by sterile water. A total of 19.94 Gb was generated by Illumina HiSeq sequencing. The sequence data were de novo assembled, and 107,617 unigenes were obtained. In total, 5629 differentially expressed genes between biocontrol-treated (P + B) and infected (P) samples were assigned to 126 KEGG pathways. Using multiple testing corrections, 20 pathways were significantly enriched with Qvalue ≤ 0.05. The resistance-related genes, involved in the production of pathogenesis-related proteins, pathogen-associated molecular pattern-triggered immunity, and effector-triggered immunity signaling pathways, calcium influx, salicylic acid pathway, reactive oxygen intermediates, and mitogen-activated protein kinase cascades, and cell wall modification, were obtained. The various defense responses induced by the biocontrol strain combatted the P. brassicae infection. The genes and pathways involved in plant resistance were induced by a biocontrol strain. The transcriptome data explained the molecular mechanism of the potential biocontrol strain against P. brassicae. The data will also serve as an important public information platform to study B. juncea var. tumida Tsen and will be useful for breeding mustard plants resistant to P. brassicae.

  10. RNA sequencing analysis to capture the transcriptome landscape during skin ulceration syndrome progression in sea cucumber Apostichopus japonicus.

    PubMed

    Yang, Aifu; Zhou, Zunchun; Pan, Yongjia; Jiang, Jingwei; Dong, Ying; Guan, Xiaoyan; Sun, Hongjuan; Gao, Shan; Chen, Zhong

    2016-06-14

    Sea cucumber Apostichopus japonicus is an important economic species in China, which is affected by various diseases; skin ulceration syndrome (SUS) is the most serious. In this study, we characterized the transcriptomes in A. japonicus challenged with Vibrio splendidus to elucidate the changes in gene expression throughout the three stages of SUS progression. RNA sequencing of 21 cDNA libraries from various tissues and developmental stages of SUS-affected A. japonicus yielded 553 million raw reads, of which 542 million high-quality reads were generated by deep-sequencing using the Illumina HiSeq™ 2000 platform. The reference transcriptome comprised a combination of the Illumina reads, 454 sequencing data and Sanger sequences obtained from the public database to generate 93,163 unigenes (average length, 1,052 bp; N50 = 1,575 bp); 33,860 were annotated. Transcriptome comparisons between healthy and SUS-affected A. japonicus revealed greater differences in gene expression profiles in the body walls (BW) than in the intestines (Int), respiratory trees (RT) and coelomocytes (C). Clustering of expression models revealed stable up-regulation as the main pattern occurring in the BW throughout the three stages of SUS progression. Significantly affected pathways were associated with signal transduction, immune system, cellular processes, development and metabolism. Ninety-two differentially expressed genes (DEGs) were divided into four functional categories: attachment/pathogen recognition (17), inflammatory reactions (38), oxidative stress response (7) and apoptosis (30). Using quantitative real-time PCR, twenty representative DEGs were selected to validate the sequencing results. The Pearson's correlation coefficient (R) of the 20 DEGs ranged from 0.811 to 0.999, which confirmed the consistency and accuracy between these two approaches. Dynamic changes in global gene expression occur during SUS progression in A. japonicus. Elucidation of these changes is important in clarifying the molecular mechanisms associated with the development of SUS in sea cucumber.
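
    The validation step in this record (Pearson's R between sequencing and qPCR measurements of the same genes) can be illustrated with a minimal Python sketch. The function name `pearson_r` and the example values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up log2 fold changes for one gene across four conditions (illustration only).
rnaseq_log2fc = [1.0, 2.0, 3.0, 4.0]
qpcr_log2fc = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(rnaseq_log2fc, qpcr_log2fc)
```

    Values of R close to 1 indicate that the two measurement methods rank and scale the expression changes similarly.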

  11. Illumina Unamplified Indexed Library Construction: An Automated Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hack, Christopher A.; Sczyrba, Alexander; Cheng, Jan-Fang

    Manual library construction is a limiting factor in Illumina sequencing. Constructing libraries by hand is costly, time-consuming, low-throughput, and ergonomically hazardous, and constructing multiple libraries introduces risk of library failure due to pipetting errors. The ability to construct multiple libraries simultaneously in automated fashion represents significant cost and time savings. Here we present a strategy to construct up to 96 unamplified indexed libraries using Illumina TruSeq reagents and a Biomek FX robotic platform. We also present data to indicate that this library construction method has little or no risk of cross-contamination between samples.

  12. Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries.

    PubMed

    Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W

    2018-05-01

    The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
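
    Several records in this listing report N50 values. As a reminder of what that statistic measures, here is a minimal Python sketch of the usual definition (the length L such that sequences of length L or longer contain at least half of the total assembled bases). The function name `n50` and the toy lengths are assumptions for illustration, not values from any of the studies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half of the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly: total is 54, and 10 + 9 + 8 = 27 reaches half of it.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)  # 8
```

    Under this definition, scaffolding that joins contigs into longer sequences raises N50 without adding any new bases.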

  13. Improved Protocols for Illumina Sequencing

    PubMed Central

    Bronner, Iraad F.; Quail, Michael A.; Turner, Daniel J.; Swerdlow, Harold

    2013-01-01

    In this unit, we describe a set of improvements we have made to the standard Illumina protocols to make the sequencing process more reliable in a high-throughput environment, reduce amplification bias, narrow the distribution of insert sizes, and reliably obtain high yields of data. PMID:19582764

  14. Sequencing the Genome of the Heirloom Watermelon Cultivar Charleston Gray

    USDA-ARS's Scientific Manuscript database

    The genome of the watermelon cultivar Charleston Gray, a major heirloom which has been used in breeding programs of many watermelon cultivars, was sequenced. Our strategy involved a hybrid approach using the Illumina and 454/Titanium next-generation sequencing technologies. For Illumina, shotgun g...

  15. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay

    2018-01-04

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  16. Development and Applications of a Bovine 50,000 SNP Chip

    USDA-ARS's Scientific Manuscript database

    To develop an Illumina iSelect high density single nucleotide polymorphism (SNP) assay for cattle, the collaborative iBMC (Illumina, USDA ARS Beltsville, University of Missouri, USDA ARS Clay Center) Consortium first performed a de novo SNP discovery project in which genomic reduced representation l...

  17. High-throughput Illumina strand-specific RNA sequencing library preparation

    USDA-ARS's Scientific Manuscript database

    Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...

  18. Development of genomic microsatellites in Gleditsia triacanthos (Fabaceae) using Illumina sequencing

    Treesearch

    Sandra A. Owusu; Margaret Staton; Tara N. Jennings; Scott Schlarbaum; Mark V. Coggeshall; Jeanne Romero-Severson; John E. Carlson; Oliver Gailing

    2013-01-01

    Premise of the study: Fourteen genomic microsatellite markers were developed and characterized in honey locust, Gleditsia triacanthos, using Illumina sequencing. Due to their high variability, these markers can be applied in analyses of genetic diversity and structure, and in mating system and gene flow studies.

  19. Additional annotation of the pig transcriptome using integrated Iso-seq and Illumina RNA-seq analysis

    USDA-ARS's Scientific Manuscript database

    Alternative splicing is a well-known phenomenon that dramatically increases eukaryotic transcriptome diversity. The extent of mRNA isoform diversity among porcine tissues was assessed using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq) and Illumina short read sequencing ...

  20. OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data

    PubMed Central

    Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y. Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technology’s Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences. PMID:24824529

  1. OTG-snpcaller: an optimized pipeline based on TMAP and GATK for SNP calling from ion torrent data.

    PubMed

    Zhu, Pengyuan; He, Lingyu; Li, Yaqiao; Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technology's Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences.

  2. Investigation of bacterial and archaeal communities: novel protocols using modern sequencing by Illumina MiSeq and traditional DGGE-cloning.

    PubMed

    Kraková, Lucia; Šoltys, Katarína; Budiš, Jaroslav; Grivalský, Tomáš; Ďuriš, František; Pangallo, Domenico; Szemes, Tomáš

    2016-09-01

    Different protocols based on Illumina high-throughput DNA sequencing and denaturing gradient gel electrophoresis (DGGE)-cloning were developed and applied for investigating hot spring related samples. The study was focused on three target genes: archaeal and bacterial 16S rRNA and mcrA of methanogenic microflora. Shorter read lengths of the currently most popular technology of sequencing by Illumina do not allow analysis of the complete 16S rRNA region, or of longer gene fragments, as was the case of Sanger sequencing. Here, we demonstrate that there is no need for special indexed or tailed primer sets dedicated to short variable regions of 16S rRNA since the presented approach allows the analysis of complete bacterial 16S rRNA amplicons (V1-V9) and longer archaeal 16S rRNA and mcrA sequences. Sample augmented with transposon is represented by a set of approximately 300 bp long fragments that can be easily sequenced by Illumina MiSeq. Furthermore, a low proportion of chimeric sequences was observed. DGGE-cloning based strategies were performed combining semi-nested PCR, DGGE and clone library construction. Comparing both investigation methods, a certain degree of complementarity was observed confirming that the DGGE-cloning approach is not obsolete. Novel protocols were created for several types of laboratories, utilizing the traditional DGGE technique or using the most modern Illumina sequencing.

  3. Overview of Next-generation Sequencing Platforms Used in Published Draft Plant Genomes in Light of Genotypization of Immortelle Plant (Helichrysium Arenarium)

    PubMed Central

    Hodzic, Jasin; Gurbeta, Lejla; Omanovic-Miklicanin, Enisa; Badnjevic, Almir

    2017-01-01

    Introduction: Major advancements in DNA sequencing methods introduced in the first decade of the new millennium initiated a rapid expansion of sequencing studies, which yielded a tremendous amount of DNA sequence data, including whole sequenced genomes of various species, plants among them. A set of novel sequencing platforms, often collectively named “next-generation sequencing” (NGS), completely transformed the life sciences by allowing extensive throughput while greatly reducing the time, labor and cost of any sequencing endeavor. Purpose: The purpose of this paper is to present an overview of the NGS platforms used to produce the current compendium of published draft genomes of various plants, namely Roche/454, ABI/SOLiD, and Solexa/Illumina, and to determine the most frequently used platform for whole genome sequencing of plants in light of genotypization of the immortelle plant. Materials and methods: 45 papers were selected (presenting 47 plant genome draft sequences), and the sequencing techniques and NGS platforms (Roche/454, ABI/SOLiD and Illumina/Solexa) used in the selected papers were determined. Subsequently, the frequency of usage of each platform or combination of platforms was calculated. Results: Illumina/Solexa platforms were used either as the sole sequencing tool, in 40.42% of published genomes, or in combination with other platforms, in an additional 48.94% of published genomes, followed by Roche/454 platforms, used in combination with the traditional Sanger sequencing method (10.64%) and never as a sole tool. ABI/SOLiD was only used in combination with Illumina/Solexa and Roche/454, in 4.25% of publications. Conclusions: Illumina/Solexa platforms are by far the most preferred by researchers, most probably because they offer the most affordable sequencing costs. Taking into consideration the current economic situation in the Balkans region, Illumina/Solexa is the best (if not the only) platform choice if sequencing of the immortelle plant (Helichrysium arenarium) is to be performed by researchers in this region. PMID:28974852

  4. Unravelling molecular mechanisms from floral initiation to lipid biosynthesis in a promising biofuel tree species, Pongamia pinnata using transcriptome analysis

    PubMed Central

    Sreeharsha, Rachapudi V.; Mudalkar, Shalini; Singha, Kambam T.; Reddy, Attipalli R.

    2016-01-01

    Pongamia pinnata (L.) (Fabaceae) is a promising biofuel tree species which is underexploited in the areas of both fundamental and applied research, due to the lack of information either on transcriptome or genomic data. To investigate the possible metabolic pathways, we performed whole transcriptome analysis of Pongamia through Illumina NextSeq platform and generated 2.8 GB of paired end sequence reads. The de novo assembly of raw reads generated 40,000 contigs and 35,000 transcripts, representing leaf, flower and seed unigenes. Spatial and temporal expression profiles of photoperiod and floral homeotic genes in Pongamia, identified GIGANTEA (GI) - CONSTANS (CO) - FLOWERING LOCUS T (FT) as active signal cascade for floral initiation. Four prominent stages of seed development were selected in a high yielding Pongamia accession (TOIL 1) to follow the temporal expression patterns of key fatty acid biosynthetic genes involved in lipid biosynthesis and accumulation. Our results provide insights into an array of molecular events from flowering to seed maturity in Pongamia which will provide substantial basis for modulation of fatty acid composition and enhancing oil yields which should serve as a potential feedstock for biofuel production. PMID:27677333

  5. ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

    PubMed

    He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

    2013-12-04

    Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.

  6. De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris

    PubMed Central

    Niu, Shan-Ce; Xu, Qing; Zhang, Guo-Qiang; Zhang, Yong-Qiang; Tsai, Wen-Chieh; Hsu, Jui-Ling; Liang, Chieh-Kai; Luo, Yi-Bo; Liu, Zhong-Jian

    2016-01-01

    Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues representing the root, stem, leaf, flower buds, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms driving the analysed tissue characteristics and to enrich the available data for P. equestris. Here, we present three databases. The first dataset is the RNA-Seq raw reads, which can be used to execute new experiments with different analysis approaches. The other two datasets allow different types of searches for candidate homologues. The second dataset includes the sets of assembled unigenes and predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results of the aligned unigenes versus the Nonredundant (Nr) protein database, Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) databases with low e-values, enabling a name-based search. PMID:27673730

  7. The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis.

    PubMed

    Kang, Seunghyun; Ahn, Do-Hwan; Lee, Jun Hyuck; Lee, Sung Gu; Shin, Seung Chul; Lee, Jungeun; Min, Gi-Sik; Lee, Hyoungseok; Kim, Hyun-Woo; Kim, Sanghee; Park, Hyun

    2017-01-01

    The Antarctic intertidal zone is continuously subjected to extremely fluctuating biotic and abiotic stressors. The West Antarctic Peninsula is the most rapidly warming region on Earth. Organisms living in Antarctic intertidal pools are therefore interesting for research into evolutionary adaptation to extreme environments and the effects of climate change. We report the whole genome sequence of the Antarctic-endemic harpacticoid copepod Tigriopus kingsejongensis. The 37 Gb raw DNA sequence was generated using the Illumina MiSeq platform. Libraries were prepared with 65-fold coverage and a total length of 295 Mb. The final assembly consists of 48 368 contigs with an N50 contig length of 17.5 kb, and 27 823 scaffolds with an N50 contig length of 159.2 kb. A total of 12 772 coding genes were inferred using the MAKER annotation pipeline. Comparative genome analysis revealed that T. kingsejongensis-specific genes are enriched in transport and metabolism processes. Furthermore, rapidly evolving genes related to energy metabolism showed positive selection signatures. The T. kingsejongensis genome provides an interesting example of an evolutionary strategy for Antarctic cold adaptation, and offers new genetic insights into Antarctic intertidal biota. © The Author 2017. Published by Oxford University Press.

  8. The genome of the Antarctic-endemic copepod, Tigriopus kingsejongensis

    PubMed Central

    Kang, Seunghyun; Ahn, Do-Hwan; Lee, Jun Hyuck; Lee, Sung Gu; Shin, Seung Chul; Lee, Jungeun; Min, Gi-Sik; Lee, Hyoungseok

    2017-01-01

    Abstract Background: The Antarctic intertidal zone is continuously subjected to extremely fluctuating biotic and abiotic stressors. The West Antarctic Peninsula is the most rapidly warming region on Earth. Organisms living in Antarctic intertidal pools are therefore interesting for research into evolutionary adaptation to extreme environments and the effects of climate change. Findings: We report the whole genome sequence of the Antarctic-endemic harpacticoid copepod Tigriopus kingsejongensis. The 37 Gb raw DNA sequence was generated using the Illumina MiSeq platform. Libraries were prepared with 65-fold coverage and a total length of 295 Mb. The final assembly consists of 48 368 contigs with an N50 contig length of 17.5 kb, and 27 823 scaffolds with an N50 contig length of 159.2 kb. A total of 12 772 coding genes were inferred using the MAKER annotation pipeline. Comparative genome analysis revealed that T. kingsejongensis-specific genes are enriched in transport and metabolism processes. Furthermore, rapidly evolving genes related to energy metabolism showed positive selection signatures. Conclusions: The T. kingsejongensis genome provides an interesting example of an evolutionary strategy for Antarctic cold adaptation, and offers new genetic insights into Antarctic intertidal biota. PMID:28369352

  9. Self-organizing approach for meta-genomes.

    PubMed

    Zhu, Jianfeng; Zheng, Wei-Mou

    2014-12-01

    We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
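
    The building blocks named in this abstract (a frequency table over the 64 triplet types, and assigning a segment to the reading phase that best matches a given codon usage) can be sketched in Python. This is a simplified illustration under stated assumptions, not the authors' implementation: it handles only the three forward-strand offsets, uses a pseudocount chosen here for convenience, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# The 64 triplet types over the DNA alphabet.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
VALID = set(TRIPLETS)


def triplet_counts(segment, offset=0):
    """Count non-overlapping triplets in `segment`, starting at `offset` (0, 1 or 2)."""
    counts = Counter()
    for i in range(offset, len(segment) - 2, 3):
        triplet = segment[i:i + 3]
        if triplet in VALID:  # skip triplets containing N or other symbols
            counts[triplet] += 1
    return counts


def usage_table(counts, pseudocount=1.0):
    """Turn triplet counts into a smoothed frequency table (a 'codon usage')."""
    total = sum(counts[t] for t in TRIPLETS) + pseudocount * len(TRIPLETS)
    return {t: (counts[t] + pseudocount) / total for t in TRIPLETS}


def log_likelihood(segment, offset, usage):
    """Log-probability of the segment's triplets at `offset` under `usage`."""
    counts = triplet_counts(segment, offset)
    return sum(n * math.log(usage[t]) for t, n in counts.items())


def best_offset(segment, usage):
    """Pick the forward offset whose triplets best match the codon usage."""
    return max(range(3), key=lambda off: log_likelihood(segment, off, usage))


# Toy example: learn a usage table from an ATG-rich string, then phase two segments.
usage = usage_table(triplet_counts("ATGATGATG"))
in_frame = best_offset("ATGATGATGATG", usage)
shifted = best_offset("CATGATGATGATG", usage)
```

    An iterative scheme of the kind the abstract describes would alternate between re-estimating the usage tables from the current assignments and re-assigning segments; that loop, the noncoding phase and the reverse-strand phases are omitted here.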

  10. PipeCraft: Flexible open-source toolkit for bioinformatics analysis of custom high-throughput amplicon sequencing data.

    PubMed

    Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho

    2017-11-01

    High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.

  11. Accuracy of genotype imputation in Swiss cattle breeds

    USDA-ARS's Scientific Manuscript database

    The objective of this study was to evaluate the accuracy of imputation from Illumina Bovine3k Bead Chip (3k) and Illumina BovineLD (6k) to 54k chip information in Swiss dairy cattle breeds. Genotype data comprised of 54k SNP chip data of Original Braunvieh (OB), Brown Swiss (BS), Swiss Fleckvieh (SF...

  12. Illumina sequencing of green stink bug nymph and adult cDNA to identify potential RNAi gene targets

    USDA-ARS's Scientific Manuscript database

    Whole-body transcriptomes for nymphs and adults of the green stink bug, Acrosternum hilare (Say), were sequenced on an Illumina® Genome Analyzer IIx sequencer. The insects were collected from sites in North Carolina and Virginia, USA. The cDNA library for each sample was sequenced on one lane of an...

  13. New features to the night sky radiance model Illumina: Hyperspectral support, improved obstacles and cloud reflection

    NASA Astrophysics Data System (ADS)

    Aubé, M.; Simoneau, A.

    2018-05-01

    Illumina is one of the most physically detailed artificial night sky brightness models to date. It has been in continuous development since 2005 [1]. In 2016-17, many improvements were made to the Illumina code including an overhead cloud scheme, an improved blocking scheme for subgrid obstacles (trees and buildings), and most importantly, a full hyperspectral modeling approach. Code optimization resulted in a significant reduction in execution time, enabling users to run the model on standard personal computers for some applications. After describing the new schemes introduced in the model, we give some examples of applications for a peri-urban and a rural site both located inside the International Dark Sky reserve of Mont-Mégantic (QC, Canada).

  14. Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.

    PubMed

    Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong

    2018-05-04

    Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.

  15. Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Tremblay, Julien

    2018-01-22

    Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  16. Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tremblay, Julien

    2012-06-01

    Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platform" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  17. Design of the Illumina Porcine 50K+ SNP Iselect(TM) Beadchip and Characterization of the Porcine HapMap Population

    USDA-ARS's Scientific Manuscript database

    Using next generation sequencing technology the International Swine SNP Consortium has identified 500,000 SNPs and used these to design an Illumina Infinium iSelect™ SNP BeadChip with a selection of 60,218 SNPs. The selected SNPs include previously validated SNPs and SNPs identified de novo using se...

  18. Evaluation of Different Normalization and Analysis Procedures for Illumina Gene Expression Microarray Data Involving Small Changes

    PubMed Central

    Johnstone, Daniel M.; Riveros, Carlos; Heidari, Moones; Graham, Ross M.; Trinder, Debbie; Berretta, Regina; Olynyk, John K.; Scott, Rodney J.; Moscato, Pablo; Milward, Elizabeth A.

    2013-01-01

    While Illumina microarrays can be used successfully for detecting small gene expression changes due to their high degree of technical replicability, there is little information on how different normalization and differential expression analysis strategies affect outcomes. To evaluate this, we assessed concordance across gene lists generated by applying different combinations of normalization strategy and analytical approach to two Illumina datasets with modest expression changes. In addition to using traditional statistical approaches, we also tested an approach based on combinatorial optimization. We found that the choice of both normalization strategy and analytical approach considerably affected outcomes, in some cases leading to substantial differences in gene lists and subsequent pathway analysis results. Our findings suggest that important biological phenomena may be overlooked when there is a routine practice of using only one approach to investigate all microarray datasets. Analytical artefacts of this kind are likely to be especially relevant for datasets involving small fold changes, where inherent technical variation—if not adequately minimized by effective normalization—may overshadow true biological variation. This report provides some basic guidelines for optimizing outcomes when working with Illumina datasets involving small expression changes. PMID:27605185

  19. Library preparation and data analysis packages for rapid genome sequencing.

    PubMed

    Pomraning, Kyle R; Smith, Kristina M; Bredeweg, Erin L; Connolly, Lanelle R; Phatale, Pallavi A; Freitag, Michael

    2012-01-01

    High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for "multiplexing," i.e. the analysis of several samples in a single flowcell lane by generating "barcoded" or "indexed" Illumina sequencing libraries in a way that is independent from Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches but end users should be aware that this is a quickly evolving field and that currently many alignment (or "mapping") and counting algorithms are being developed and tested.
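
    The idea of "multiplexing" with barcoded libraries mentioned in this abstract can be illustrated with a small Python sketch that sorts reads into per-sample bins. It assumes, purely for illustration, that each read begins with an exact-match barcode of fixed length; the abstract does not state where the barcode sits, and the function name `demultiplex`, the barcodes and the reads below are all hypothetical.

```python
def demultiplex(reads, barcodes):
    """Group reads by their leading barcode.

    `barcodes` maps barcode sequence -> sample name; all barcodes must have
    the same length. Reads with an unknown barcode go to "unassigned".
    """
    lengths = {len(b) for b in barcodes}
    if len(lengths) != 1:
        raise ValueError("all barcodes must have the same length")
    k = lengths.pop()
    bins = {sample: [] for sample in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        sample = barcodes.get(read[:k])
        if sample is None:
            bins["unassigned"].append(read)
        else:
            # Strip the barcode so only the insert sequence is kept.
            bins[sample].append(read[k:])
    return bins


# Made-up barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "sample_A", "TGCA": "sample_B"}
example_reads = ["ACGTGGGTTT", "TGCACCCAAA", "NNNNGATTACA"]
example_bins = demultiplex(example_reads, example_barcodes)
```

    Real demultiplexing tools usually also tolerate a limited number of barcode mismatches, which this exact-match sketch does not attempt.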

  20. Differences in the gut microbiota of dogs (Canis lupus familiaris) fed a natural diet or a commercial feed revealed by the Illumina MiSeq platform.

    PubMed

    Kim, Junhyung; An, Jae-Uk; Kim, Woohyun; Lee, Soomin; Cho, Seongbeom

    2017-01-01

    Recent advances in next-generation sequencing technologies have enabled comprehensive analysis of the gut microbiota, which is closely linked to the health of the host. Consequently, several studies have explored the factors affecting gut microbiota composition. In recent years, an increasing number of dog owners are feeding their pets a natural diet, i.e., one consisting of bones, raw meat (such as chicken and beef), and vegetables, instead of commercial feed. However, the effect of these diets on the microbiota of dogs (Canis lupus familiaris) is unclear. Six dogs fed a natural diet and five dogs fed a commercial feed were selected; dog fecal metagenomic DNA samples were analyzed using the Illumina MiSeq platform. Pronounced differences in alpha and beta diversities, and taxonomic composition of the core gut microbiota were observed between the two groups. According to alpha diversity, the number of operational taxonomic units, the richness estimates, and diversity indices of microbiota were significantly higher (p < 0.05) in the natural diet group than in the commercial feed group. Based on beta diversity, most samples clustered together according to the diet type (p = 0.004). Additionally, the core microbiota between the two groups was different at the phylum, family, and species levels. Marked differences in the taxonomic composition of the core microbiota of the two groups were observed at the species level; Clostridium perfringens (p = 0.017) and Fusobacterium varium (p = 0.030) were more abundant in the natural diet group. The gut microbiota of dogs is significantly influenced by diet type (i.e., natural diet and commercial feed). Specifically, dogs fed a natural diet have more diverse and abundant microbial composition in the gut microbiota than dogs fed a commercial feed. In addition, this study suggests that in dogs fed a natural diet, the potential risk of opportunistic infection could be higher than in dogs fed a commercial feed.
The type of diet might therefore play a key role in animal health by affecting the gut microbiota. This study could be the basis for future gut microbiota research in dogs.

  1. Combining independent de novo assemblies optimizes the coding transcriptome for nonconventional model eukaryotic organisms.

    PubMed

    Cerveau, Nicolas; Jackson, Daniel J

    2016-12-09

    Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information. 
Our approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than any of the three individual assembly packages we investigated. This approach (freely available as a simple perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.

  2. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

    PubMed

    Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

    2010-07-02

    An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, provides INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating such reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Achieving such analysis is executed in an ultrafast and highly efficient manner, whether the analysis be a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.
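
    The tag-count measure described above, the number of reads found within the genomic range of each annotation, can be sketched in a few lines. This is an illustration of the general idea only, not TASE's implementation (which the abstract describes as Java with a SQL Server backend); the function `count_tags`, the gene names and the coordinates are hypothetical.

```python
def count_tags(read_positions, annotations):
    """Count reads whose mapped position lies inside each annotated range.

    read_positions -- list of (chromosome, position) tuples for aligned reads
    annotations    -- list of (gene, chromosome, start, end); ends inclusive
    Returns a dict: gene -> number of reads falling inside its range.
    """
    counts = {gene: 0 for gene, _, _, _ in annotations}
    for chrom, pos in read_positions:
        for gene, a_chrom, start, end in annotations:
            if chrom == a_chrom and start <= pos <= end:
                counts[gene] += 1
    return counts

# Toy annotation table and read positions (all values are made up).
annotations = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 500, 800)]
mapped_reads = [("chr1", 150), ("chr1", 199), ("chr1", 650),
                ("chr2", 150), ("chr1", 300)]
tag_counts = count_tags(mapped_reads, annotations)
```

    The nested loop is quadratic and only suitable for toy input; at sequencing scale an indexed lookup (a database index or an interval tree) would be the natural choice.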

  3. Complete Genome Sequence of a Streptococcus pyogenes Serotype M12 Scarlet Fever Outbreak Isolate from China, Compiled Using Oxford Nanopore and Illumina Sequencing

    PubMed Central

    You, Yuanhai; Kou, Yongjun; Niu, Longfei; Jia, Qiong; Liu, Yahui; Walker, Mark J.; Zhu, Jiaqiang

    2018-01-01

    ABSTRACT The incidence of scarlet fever cases remains high in China. Here, we report the complete genome sequence of a Streptococcus pyogenes isolate of serotype M12, which has been confirmed as the predominant serotype in recent outbreaks. Genome sequencing was achieved by a combination of Oxford Nanopore MinION and Illumina methodologies. PMID:29724853

  4. Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources

    DTIC Science & Technology

    2014-11-01

    genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate gene...and genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate...PRISM 7900 Sequence Detection System (Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2−ΔΔCt method. TRIzol (Invitrogen

  5. Dissection of the Octoploid Strawberry Genome by Deep Sequencing of the Genomes of Fragaria Species

    PubMed Central

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N.

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis, by de novo whole-genome sequencing on Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb, with an N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence, to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021
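
    The N50 length quoted above is a standard assembly statistic: the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the toy lengths are illustrative and unrelated to the strawberry assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")

# Toy assembly: total 54 bases, so half is 27; 10 + 9 + 8 reaches 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_lengths)
```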

  6. Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

    PubMed Central

    Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

    2018-01-01

    The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646

  7. Transcriptome sequencing and identification of cold tolerance genes in hardy Corylus species (C. heterophylla Fisch) floral buds.

    PubMed

    Chen, Xin; Zhang, Jin; Liu, Qingzhong; Guo, Wei; Zhao, Tiantian; Ma, Qinghua; Wang, Guixi

    2014-01-01

    The genus Corylus is an important woody species in Northeast China. Its products, hazelnuts, constitute one of the most important raw materials for the pastry and chocolate industry. However, limited genetic research has focused on Corylus because of the lack of genomic resources. The advent of high-throughput sequencing technologies provides a turning point for Corylus research. In the present study, we performed de novo transcriptome sequencing for the first time to produce a comprehensive database for the Corylus heterophylla Fisch floral buds. The C. heterophylla Fisch floral buds transcriptome was sequenced using the Illumina paired-end sequencing technology. We produced 28,930,890 raw reads and assembled them into 82,684 contigs. A total of 40,941 unigenes were identified, among which 30,549 were annotated in the NCBI Non-redundant (Nr) protein database and 18,581 were annotated in the Swiss-Prot database. Of these annotated unigenes, 25,311 and 10,514 unigenes were assigned to gene ontology (GO) categories and clusters of orthologous groups (COG), respectively. We could map 17,207 unigenes onto 128 pathways using the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database. Additionally, based on the transcriptome, we constructed a candidate cold tolerance gene set of C. heterophylla Fisch floral buds. The expression patterns of selected genes during four stages of cold acclimation suggested that these genes might be involved in different cold responsive stages in C. heterophylla Fisch floral buds. The transcriptome of C. heterophylla Fisch floral buds was deep sequenced, de novo assembled, and annotated, providing abundant data to better understand the C. heterophylla Fisch floral buds transcriptome. Candidate genes potentially involved in cold tolerance were identified, providing a material basis for future molecular mechanism analysis of C. heterophylla Fisch floral buds tolerant to cold stress.

  8. Complete Genome Sequence of a Streptococcus pyogenes Serotype M12 Scarlet Fever Outbreak Isolate from China, Compiled Using Oxford Nanopore and Illumina Sequencing.

    PubMed

    You, Yuanhai; Kou, Yongjun; Niu, Longfei; Jia, Qiong; Liu, Yahui; Davies, Mark R; Walker, Mark J; Zhu, Jiaqiang; Zhang, Jianzhong

    2018-05-03

    The incidence of scarlet fever cases remains high in China. Here, we report the complete genome sequence of a Streptococcus pyogenes isolate of serotype M12, which has been confirmed as the predominant serotype in recent outbreaks. Genome sequencing was achieved by a combination of Oxford Nanopore MinION and Illumina methodologies. Copyright © 2018 You et al.

  9. Analysis of Illumina Microbial Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clum, Alicia; Foster, Brian; Froula, Jeff

    2010-05-28

    Since the emergence of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.

  10. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  11. Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing

    PubMed Central

    Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.

    2015-01-01

    The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization probe based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient and high throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379

  12. An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets

    PubMed Central

    2010-01-01

    Background: An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, provides INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating such reads, is therefore of significant value. Findings: We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. Conclusions: TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Achieving such analysis is executed in an ultrafast and highly efficient manner, whether the analysis be a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease. PMID:20598141

  13. Developing cDNA Libraries of Receptors Involved in the Recruitment of the Biofouling Tubeworm Hydroides elegans

    DTIC Science & Technology

    2014-06-12

    Transcriptome, Hydroides elegans, Next Generation Sequencing, Illumina HiSeq, PacBio SMRT, Biofilm, Metamorphosis 16. SECURITY CLASSIFICATION OF: a...to a bacterial cue from a bacterial biofilm. Recently, this cue has been identified to be a phage-tail like bacteriocin produced by the bacterium...submitted to the Huntsman Cancer Institute at the University of Utah and the subsequent isolation of mRNA was used for Illumina HiSeq 101 paired end

  14. Estimating genotype error rates from high-coverage next-generation sequence data.

    PubMed

    Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

    2014-11-01

    Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.
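
    The replicate design described above lends itself to a simple discordance calculation: wherever two replicate call sets for the same individual disagree, at least one call is wrong, while an error repeated identically in both replicates goes unnoticed. That is why such estimates are lower bounds. The sketch below is an assumption about the general approach, not the authors' code; the function `discordance_rate` and the toy genotypes are hypothetical.

```python
def discordance_rate(calls_a, calls_b):
    """Fraction of sites with different genotypes in two replicate call sets.

    calls_a, calls_b -- equal-length lists of genotype strings, with None
                        marking a site that was not called in that replicate.
    Only sites called in both replicates are compared.
    """
    compared = 0
    mismatched = 0
    for a, b in zip(calls_a, calls_b):
        if a is None or b is None:
            continue
        compared += 1
        if a != b:
            mismatched += 1
    if compared == 0:
        raise ValueError("no sites called in both replicates")
    return mismatched / compared

# Toy replicate calls for one individual (values are made up).
blood = ["0/0", "0/1", "1/1", None, "0/1"]
saliva = ["0/0", "0/1", "0/1", "0/0", "0/1"]
rate = discordance_rate(blood, saliva)  # 1 mismatch among 4 compared sites
```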

  15. Model-based variance-stabilizing transformation for Illumina microarray data.

    PubMed

    Lin, Simon M; Du, Pan; Huber, Wolfgang; Kibbe, Warren A

    2008-02-01

    Variance stabilization is a step in the preprocessing of microarray data that can greatly benefit the performance of subsequent statistical modeling and inference. Due to the often limited number of technical replicates for Affymetrix and cDNA arrays, achieving variance stabilization can be difficult. Although the Illumina microarray platform provides a larger number of technical replicates on each array (usually over 30 randomly distributed beads per probe), these replicates have not been leveraged in the current log2 data transformation process. We devised a variance-stabilizing transformation (VST) method that takes advantage of the technical replicates available on an Illumina microarray. We have compared VST with log2 and Variance-stabilizing normalization (VSN) by using the Kruglyak bead-level data (2006) and Barnes titration data (2005). The results of the Kruglyak data suggest that VST stabilizes variances of bead-replicates within an array. The results of the Barnes data show that VST can improve the detection of differentially expressed genes and reduce false-positive identifications. We conclude that although both VST and VSN are built upon the same model of measurement noise, VST stabilizes the variance better and more efficiently for the Illumina platform by leveraging the availability of a larger number of within-array replicates. The algorithms and Supplementary Data are included in the lumi package of Bioconductor, available at: www.bioconductor.org.
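
    To make the contrast with the log2 transform concrete, the sketch below uses an arcsinh (generalized-log) transform, the functional form commonly associated with this additive-plus-multiplicative noise model. It is an illustration under stated assumptions rather than the lumi implementation: the function `arcsinh_transform` and the parameter values `a` and `b` are hypothetical, whereas in the published method the parameters are estimated from the bead-level mean and standard deviation of each probe.

```python
import math

def arcsinh_transform(intensity, a=0.0, b=1.0):
    """Generalized-log style transform: asinh(a + b * intensity).

    a and b are placeholder values; a real analysis would estimate them
    from the within-array mean/variance relationship.
    """
    return math.asinh(a + b * intensity)

# Compare with log2 across a range of intensities, including values at or
# below zero (as can arise after background subtraction).
for y in (-5.0, 0.0, 10.0, 1000.0, 100000.0):
    log2_text = f"{math.log2(y):.3f}" if y > 0 else "undefined"
    print(f"y={y:>9}: asinh={arcsinh_transform(y):.3f}  log2={log2_text}")
```

    For large intensities asinh(y) approaches ln(2y), so the transform behaves like a logarithm at the high end while remaining defined and roughly linear near zero.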

  16. Diversity and community composition of methanogenic archaea in the rumen of Scottish upland sheep assessed by different methods.

    PubMed

    Snelling, Timothy J; Genç, Buğra; McKain, Nest; Watson, Mick; Waters, Sinéad M; Creevey, Christopher J; Wallace, R John

    2014-01-01

    Ruminal archaeomes of two mature sheep grazing in the Scottish uplands were analysed by different sequencing and analysis methods in order to compare the apparent archaeal communities. All methods revealed that the majority of methanogens belonged to the Methanobacteriales order containing the Methanobrevibacter, Methanosphaera and Methanobacteria genera. Sanger sequenced 1.3 kb 16S rRNA gene amplicons identified the main species of Methanobrevibacter present to be a SGMT Clade member Mbb. millerae (≥ 91% of OTUs); Methanosphaera comprised the remainder of the OTUs. The primers did not amplify ruminal Thermoplasmatales-related 16S rRNA genes. Illumina sequenced V6-V8 16S rRNA gene amplicons identified similar Methanobrevibacter spp. and Methanosphaera clades and also identified the Thermoplasmatales-related order as 13% of total archaea. Unusually, both methods concluded that Mbb. ruminantium and relatives from the same clade (RO) were almost absent. Sequences mapping to rumen 16S rRNA and mcrA gene references were extracted from Illumina metagenome data. Mapping of the metagenome data to 16S rRNA gene references produced taxonomic identification to Order level including 2-3% Thermoplasmatales, but was unable to discriminate to species level. Mapping of the metagenome data to mcrA gene references resolved 69% to unclassified Methanobacteriales. Only 30% of sequences were assigned to species level clades: of the sequences assigned to Methanobrevibacter, most mapped to SGMT (16%) and RO (10%) clades. The Sanger 16S amplicon and Illumina metagenome mcrA analyses showed similar species richness (Chao1 Index 19-35), while Illumina metagenome and amplicon 16S rRNA analysis gave lower richness estimates (10-18). The values of the Shannon Index were low in all methods, indicating low richness and uneven species distribution. 
Thus, although much information may be extracted from the other methods, Illumina amplicon sequencing of the V6-V8 16S rRNA gene would be the method of choice for studying rumen archaeal communities.
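
    The two diversity measures cited above can be computed directly from a vector of OTU counts. The sketch below uses the natural-log Shannon index and the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of singleton and doubleton OTUs. Which Chao1 variant and log base the study used is not stated, so both choices are assumptions, and the counts are made up.

```python
import math

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over OTUs with count > 0."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate from OTU counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Toy community: 7 observed OTUs, 3 singletons, 2 doubletons.
otu_counts = [10, 5, 2, 2, 1, 1, 1]
richness = chao1(otu_counts)        # 7 + (3 * 2) / (2 * 3)
evenness_check = shannon_index([5, 5, 5, 5])  # perfectly even: ln(4)
```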

  17. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform.

    PubMed

    Schirmer, Melanie; Ijaz, Umer Z; D'Amore, Rosalinda; Hall, Neil; Sloan, William T; Quince, Christopher

    2015-03-31

    With read lengths of currently up to 2 × 300 bp, high throughput and low sequencing costs Illumina's MiSeq is becoming one of the most utilized sequencing platforms worldwide. The platform is manageable and affordable even for smaller labs. This enables quick turnaround on a broad range of applications such as targeted gene sequencing, metagenomics, small genome sequencing and clinical molecular diagnostics. However, Illumina error profiles are still poorly understood and programs are therefore not designed for the idiosyncrasies of Illumina data. A better knowledge of the error patterns is essential for sequence analysis and vital if we are to draw valid conclusions. Studying true genetic variation in a population sample is fundamental for understanding diseases, evolution and origin. We conducted a large study on the error patterns for the MiSeq based on 16S rRNA amplicon sequencing data. We tested state-of-the-art library preparation methods for amplicon sequencing and showed that the library preparation method and the choice of primers are the most significant sources of bias and cause distinct error patterns. Furthermore we tested the efficiency of various error correction strategies and identified quality trimming (Sickle) combined with error correction (BayesHammer) followed by read overlapping (PANDAseq) as the most successful approach, reducing substitution error rates on average by 93%. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
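
    The first step of the best-performing strategy, quality trimming, can be sketched as a sliding-window trim of the 3' end of a read. This is a generic illustration and not a reimplementation of Sickle; the function `trim_3prime`, the window size and the threshold are hypothetical choices.

```python
def trim_3prime(seq, quals, threshold=20, window=4):
    """Trim a read where the windowed mean quality first drops too low.

    seq    -- read sequence
    quals  -- list of per-base Phred quality scores, same length as seq
    The read is cut at the start of the first window whose mean quality
    is below `threshold`; if no window fails, the read is returned whole.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    for start in range(max(len(seq) - window + 1, 0)):
        mean_quality = sum(quals[start:start + window]) / window
        if mean_quality < threshold:
            return seq[:start], quals[:start]
    return seq, quals

# Toy read whose quality decays towards the 3' end.
read = "ACGTACGTAC"
phred = [38, 38, 37, 36, 35, 30, 12, 10, 8, 5]
trimmed_seq, trimmed_quals = trim_3prime(read, phred)
```

    In the workflow named in the abstract, trimming of this kind is followed by error correction and read overlapping, which are handled there by separate dedicated tools.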

  18. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    USGS Publications Warehouse

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
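
    The third module's task, finding sequences that contain microsatellites matching user-specified parameters, can be approximated with a back-referencing regular expression. The abstract does not describe SSR_pipeline's own search algorithm, so the following is a stand-in sketch: the function `find_microsatellites`, its parameters and the test sequence are hypothetical, and compound repeats are not handled.

```python
import re

def find_microsatellites(seq, min_motif=2, max_motif=3, min_repeats=4):
    """Find simple tandem repeats in a DNA sequence.

    Returns a sorted list of (start, motif, repeat_count) tuples for motifs
    of min_motif..max_motif bases repeated at least min_repeats times.
    Runs of a single base (homopolymers) are skipped.
    """
    hits = []
    for k in range(min_motif, max_motif + 1):
        # Capture a k-base motif, then require it to repeat immediately.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # e.g. "AAAAAAAA" is not treated as an SSR here
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

# Toy sequence: (AC)x5 at position 2 and (AGT)x4 at position 14.
sequence = "GG" + "AC" * 5 + "CC" + "AGT" * 4 + "CC"
repeats = find_microsatellites(sequence)
```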

  19. Application of allflex conservation buffer in illumina genotyping.

    PubMed

    de Groot, M; Ras, T; van Haeringen, W A

    2016-12-01

    This experiment was designed to study whether the liquid conservation buffer used in the novel Tissue Sampling Technology (TST) from Allflex can be used for Illumina BeadChip genotyping. Ear punches were collected from 6 bovine samples, using both the Tissue Sampling Unit (TSU) and the Total Tagger Universal (TTU) collection system. The stability of the liquid conservation buffer was tested by genotyping samples on Illumina BeadChips, incubated at 0, 3, 15, 24, 48, 72, 168, 336, 720 h after sample collection. Additionally, a replenishment study was designed to test how often the liquid conservation buffer could be completely replenished before a significant call rate drop could be observed. Results from the stability study showed an average call rate of 0.993 for samples collected with the TSU system and 0.953 for samples collected with the TTU system, both exceeding the inclusion threshold call rate of 0.85. As an additional control, the identity of the individual animals was confirmed using the International Society of Animal Genetics (ISAG) recommended SNP panel. The replenishment study revealed a slight drop in the sample call rate after replenishing the conservation buffer for the fourth time for the TSU as well as the TTU samples. In routine analysis, this application allows for multiple experiments to be performed on the liquid conservation buffer, while maintaining the tissue samples for future use. The data collected in this study show that the liquid conservation buffer used in the TST system can be used for Illumina BeadChip genotyping applications.

  20. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  1. Illuminating choices for library prep: a comparison of library preparation methods for whole genome sequencing of Cryptococcus neoformans using Illumina HiSeq.

    PubMed

    Rhodes, Johanna; Beale, Mathew A; Fisher, Matthew C

    2014-01-01

    The industry of next-generation sequencing is constantly evolving, with novel library preparation methods and new sequencing machines being released by the major sequencing technology companies annually. The Illumina TruSeq v2 library preparation method was the most widely used kit and the market leader; however, it has now been discontinued, and in 2013 was replaced by the TruSeq Nano and TruSeq PCR-free methods, leaving a gap in knowledge regarding which is the most appropriate library preparation method to use. Here, we used isolates from the pathogenic fungi Cryptococcus neoformans var. grubii and sequenced them using the existing TruSeq DNA v2 kit (Illumina), along with two new kits: the TruSeq Nano DNA kit (Illumina) and the NEBNext Ultra DNA kit (New England Biolabs) to provide a comparison. Compared to the original TruSeq DNA v2 kit, both newer kits gave equivalent or better sequencing data, with increased coverage. When comparing the two newer kits, we found little difference in cost and workflow, with the NEBNext Ultra both slightly cheaper and faster than the TruSeq Nano. However, the quality of data generated using the TruSeq Nano DNA kit was superior due to higher coverage at regions of low GC content, and more SNPs identified. Researchers should therefore evaluate their resources and the type of application (and hence data quality) being considered when ultimately deciding on which library prep method to use.

  2. Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification

    PubMed Central

    2013-01-01

    Background: Next-generation-sequencing (NGS) technologies combined with a classic DNA barcoding approach have enabled fast and credible measurement for biodiversity of mixed environmental samples. However, the PCR amplification involved in nearly all existing NGS protocols inevitably introduces taxonomic biases. In the present study, we developed new Illumina pipelines without PCR amplifications to analyze terrestrial arthropod communities. Results: Mitochondrial enrichment directly followed by Illumina shotgun sequencing, at an ultra-high sequence volume, enabled the recovery of Cytochrome c Oxidase subunit 1 (COI) barcode sequences, which allowed for the estimation of species composition at high fidelity for a terrestrial insect community. With 15.5 Gbp Illumina data, approximately 97% and 92% were detected out of the 37 input Operational Taxonomic Units (OTUs), whether the reference barcode library was used or not, respectively, while only 1 novel OTU was found for the latter. Additionally, relatively strong correlation between the sequencing volume and the total biomass was observed for species from the bulk sample, suggesting a potential solution to reveal relative abundance. Conclusions: The ability of the new Illumina PCR-free pipeline for DNA metabarcoding to detect small arthropod specimens and its tendency to avoid most, if not all, false positives suggests its great potential in biodiversity-related surveillance, such as in biomonitoring programs. However, further improvement for mitochondrial enrichment is likely needed for the application of the new pipeline in analyzing arthropod communities at higher diversity. PMID:23587339

  3. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

    PubMed

    Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).

  4. Portable point-of-care blood analysis system for global health (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Dou, James J.; Aitchison, James Stewart; Chen, Lu; Nayyar, Rakesh

    2016-03-01

    In this paper we present a portable blood analysis system based on a disposable cartridge and hand-held reader. The platform can perform all the sample preparation, detection and waste collection required to complete a clinical test. To demonstrate the utility of this approach, a CD4 T cell enumeration was carried out, and a handheld, point-of-care CD4 T cell system was developed on this platform. In particular we will describe a pneumatic, active pumping method to control the on-chip fluidic actuation. Reagents for the CD4 T cell counting assay were dried on a reagent plug to eliminate the need for cold chain storage when used in the field. A micromixer based on the active fluidic actuation was designed to complete sample staining with fluorescent dyes that were dried on the reagent plugs. A novel image detection and analysis algorithm was developed to detect and track the flight of target particles and cells during each analysis. The handheld, point-of-care CD4 testing system was benchmarked against a clinical flow cytometer, and its results closely matched those obtained by flow cytometry. The same platform can be further expanded into a bead-array detection system where other types of biomolecules such as proteins can be detected using the same detection system.

  5. Identification of Methylated Genes Associated with Aggressive Bladder Cancer

    PubMed Central

    Marsit, Carmen J.; Houseman, E. Andres; Christensen, Brock C.; Gagne, Luc; Wrensch, Margaret R.; Nelson, Heather H.; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K.; Andrew, Angeline S.; Schned, Alan R.; Karagas, Margaret R.; Kelsey, Karl T.

    2010-01-01

    Approximately 500,000 individuals diagnosed with bladder cancer in the U.S. require routine cystoscopic follow-up to monitor for disease recurrences or progression, resulting in over $2 billion in annual expenditures. Identification of new diagnostic and monitoring strategies is clearly needed, and markers related to DNA methylation alterations hold great promise due to their stability, objective measurement, and known associations with the disease and with its clinical features. To identify novel epigenetic markers of aggressive bladder cancer, we utilized a high-throughput DNA methylation bead-array in two distinct population-based series of incident bladder cancer (n = 73 and n = 264, respectively). We then validated the association between methylation of these candidate loci with tumor grade in a third population (n = 245) through bisulfite pyrosequencing of candidate loci. Array based analyses identified 5 loci for further confirmation with bisulfite pyrosequencing. We identified and confirmed that increased promoter methylation of HOXB2 is significantly and independently associated with invasive bladder cancer and methylation of HOXB2, KRT13 and FRZB together significantly predict high-grade non-invasive disease. Methylation of these genes may be useful as clinical markers of the disease and may point to genes and pathways worthy of additional examination as novel targets for therapeutic treatment. PMID:20808801

  6. Identification of methylated genes associated with aggressive bladder cancer.

    PubMed

    Marsit, Carmen J; Houseman, E Andres; Christensen, Brock C; Gagne, Luc; Wrensch, Margaret R; Nelson, Heather H; Wiemels, Joseph; Zheng, Shichun; Wiencke, John K; Andrew, Angeline S; Schned, Alan R; Karagas, Margaret R; Kelsey, Karl T

    2010-08-23

    Approximately 500,000 individuals diagnosed with bladder cancer in the U.S. require routine cystoscopic follow-up to monitor for disease recurrences or progression, resulting in over $2 billion in annual expenditures. Identification of new diagnostic and monitoring strategies is clearly needed, and markers related to DNA methylation alterations hold great promise due to their stability, objective measurement, and known associations with the disease and with its clinical features. To identify novel epigenetic markers of aggressive bladder cancer, we utilized a high-throughput DNA methylation bead-array in two distinct population-based series of incident bladder cancer (n = 73 and n = 264, respectively). We then validated the association between methylation of these candidate loci with tumor grade in a third population (n = 245) through bisulfite pyrosequencing of candidate loci. Array based analyses identified 5 loci for further confirmation with bisulfite pyrosequencing. We identified and confirmed that increased promoter methylation of HOXB2 is significantly and independently associated with invasive bladder cancer and methylation of HOXB2, KRT13 and FRZB together significantly predict high-grade non-invasive disease. Methylation of these genes may be useful as clinical markers of the disease and may point to genes and pathways worthy of additional examination as novel targets for therapeutic treatment.

  7. Genomic dissection of small RNAs in wild rice (Oryza rufipogon): lessons for rice domestication.

    PubMed

    Wang, Yu; Bai, Xuefei; Yan, Chenghai; Gui, Yiejie; Wei, Xinghua; Zhu, Qian-Hao; Guo, Longbiao; Fan, Longjiang

    2012-11-01

    The lack of a MIRNA set and genome sequence of wild rice (Oryza rufipogon) has prevented us from determining the role of MIRNA genes in rice domestication. In this study, a genome, three small RNA populations and a degradome of O. rufipogon were sequenced on the Illumina platform and the expression levels of microRNAs (miRNAs) were investigated by miRNA chips. A de novo O. rufipogon genome was assembled using c. 55× coverage of raw sequencing data and a total of 387 MIRNAs were identified in the O. rufipogon genome based on c. 5.2 million unique small RNA reads from three different tissues of O. rufipogon. Of these O. rufipogon MIRNAs, 259 were not found in the cultivated rice, suggesting a loss of these MIRNAs in the cultivated rice. We also found that 48 MIRNAs were novel in the cultivated rice, suggesting that they were potential targets of domestication selection. Some miRNAs showed significant expression differences between wild and cultivated rice, suggesting that expression of miRNA could also be a target of domestication, as demonstrated for the miR164 family. Our results illustrated that MIRNA genes, like protein-coding genes, might have been significantly shaped during rice domestication and could be one of the driving forces that contributed to rice domestication. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

  8. Draft genome of the Northern snakehead, Channa argus.

    PubMed

    Xu, Jian; Bian, Chao; Chen, Kunci; Liu, Guiming; Jiang, Yanliang; Luo, Qing; You, Xinxin; Peng, Wenzhu; Li, Jia; Huang, Yu; Yi, Yunhai; Dong, Chuanju; Deng, Hua; Zhang, Songhao; Zhang, Hanyuan; Shi, Qiong; Xu, Peng

    2017-04-01

    The Northern snakehead (Channa argus), a member of the Channidae family of the Perciformes, is an economically important freshwater fish native to East Asia. In North America, it has become notorious as an intentionally released invasive species. Its ability to breathe air with gills and migrate short distances over land makes it a good model for bimodal breathing research. Therefore, recent research has focused on the identification of relevant candidate genes. Here, we performed whole genome sequencing of C. argus to construct its draft genome, aiming to offer useful information for further functional studies and identification of target genes related to its unusual facultative air breathing. Findings: We assembled the C. argus genome with a total of 140.3 Gb of raw reads, which were sequenced using the Illumina HiSeq2000 platform. The final draft genome assembly was approximately 615.3 Mb, with a contig N50 of 81.4 kb and scaffold N50 of 4.5 Mb. The identified repeat sequences account for 18.9% of the whole genome. A total of 19 877 protein-coding genes were predicted from the genome assembly, with an average of 10.5 exons per gene. Conclusion: We generated a high-quality draft genome of C. argus, which will provide a valuable genetic resource for further biomedical investigations of this economically important teleost fish. © The Author 2017. Published by Oxford University Press.
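
    The contig and scaffold N50 values quoted above follow a standard definition: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that calculation is given below; the function name and the toy lengths are illustrative and are not taken from the C. argus assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Toy example with made-up lengths.
    print(n50([2, 3, 4, 5, 6]))
```

    The same function applies to contigs or scaffolds; only the input list of lengths changes.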

  9. Functional Characterization of Novel Sesquiterpene Synthases from Indian Sandalwood, Santalum album

    PubMed Central

    Srivastava, Prabhakar Lal; Daramwar, Pankaj P.; Krithika, Ramakrishnan; Pandreka, Avinash; Shankar, S. Shiva; Thulasiram, Hirekodathakallu V.

    2015-01-01

    Indian Sandalwood, Santalum album L. is highly valued for its fragrant heartwood oil and is dominated by a blend of sesquiterpenes. Sesquiterpenes are formed through cyclization of farnesyl diphosphate (FPP), catalyzed by metal dependent terpene cyclases. This report describes the cloning and functional characterization of five genes, which encode two sesquisabinene synthases (SaSQS1, SaSQS2), bisabolene synthase (SaBS), santalene synthase (SaSS) and farnesyl diphosphate synthase (SaFDS) using the transcriptome sequencing of S. album. Using Illumina next generation sequencing, 33.32 million high quality raw reads were generated, which were assembled into 84,094 unigenes with an average length of 494.17 bp. Based on the transcriptome sequencing, five sesquiterpene synthases SaFDS, SaSQS1, SaSQS2, SaBS and SaSS involved in the biosynthesis of FPP, sesquisabinene, β-bisabolene and santalenes, respectively, were cloned and functionally characterized. Novel sesquiterpene synthases (SaSQS1 and SaSQS2) were characterized as isoforms of sesquisabinene synthase with varying kinetic parameters and expression levels. Furthermore, the feasibility of microbial production of sesquisabinene from both the unigenes, SaSQS1 and SaSQS2 in non-optimized bacterial cell for the preparative scale production of sesquisabinene has been demonstrated. These results may pave the way for in vivo production of sandalwood sesquiterpenes in genetically tractable heterologous systems. PMID:25976282

  10. Functional Characterization of Novel Sesquiterpene Synthases from Indian Sandalwood, Santalum album.

    PubMed

    Srivastava, Prabhakar Lal; Daramwar, Pankaj P; Krithika, Ramakrishnan; Pandreka, Avinash; Shankar, S Shiva; Thulasiram, Hirekodathakallu V

    2015-05-15

    Indian Sandalwood, Santalum album L. is highly valued for its fragrant heartwood oil and is dominated by a blend of sesquiterpenes. Sesquiterpenes are formed through cyclization of farnesyl diphosphate (FPP), catalyzed by metal dependent terpene cyclases. This report describes the cloning and functional characterization of five genes, which encode two sesquisabinene synthases (SaSQS1, SaSQS2), bisabolene synthase (SaBS), santalene synthase (SaSS) and farnesyl diphosphate synthase (SaFDS) using the transcriptome sequencing of S. album. Using Illumina next generation sequencing, 33.32 million high quality raw reads were generated, which were assembled into 84,094 unigenes with an average length of 494.17 bp. Based on the transcriptome sequencing, five sesquiterpene synthases SaFDS, SaSQS1, SaSQS2, SaBS and SaSS involved in the biosynthesis of FPP, sesquisabinene, β-bisabolene and santalenes, respectively, were cloned and functionally characterized. Novel sesquiterpene synthases (SaSQS1 and SaSQS2) were characterized as isoforms of sesquisabinene synthase with varying kinetic parameters and expression levels. Furthermore, the feasibility of microbial production of sesquisabinene from both the unigenes, SaSQS1 and SaSQS2 in non-optimized bacterial cell for the preparative scale production of sesquisabinene has been demonstrated. These results may pave the way for in vivo production of sandalwood sesquiterpenes in genetically tractable heterologous systems.

  11. StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

    PubMed

    Roosaare, Märt; Vaher, Mihkel; Kaplinski, Lauris; Möls, Märt; Andreson, Reidar; Lepamets, Maarja; Kõressaar, Triinu; Naaber, Paul; Kõljalg, Siiri; Remm, Maido

    2017-01-01

    Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. A tool named StrainSeeker was developed that constructs a list of specific k-mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k-mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
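
    To make the idea of an observed fraction of node-specific k-mers concrete, the Python sketch below counts how many k-mers from a hypothetical node-specific list appear in a set of reads. It is only an illustration of the quantity named in the abstract, not StrainSeeker's algorithm: the function names are assumptions, the statistical comparison with the expected fraction is omitted, and reverse-complement k-mers are ignored.

```python
def kmers(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def observed_fraction(reads, node_kmers, k):
    """Fraction of node-specific k-mers seen at least once in the reads.

    node_kmers is a hypothetical, non-empty set of k-mers assumed to be
    specific to one node of a guide tree.
    """
    if not node_kmers:
        raise ValueError("node_kmers must not be empty")
    seen = set()
    for read in reads:
        seen |= kmers(read, k) & node_kmers
    return len(seen) / len(node_kmers)


if __name__ == "__main__":
    node = {"ACG", "CGT", "TTT", "GGG"}
    print(observed_fraction(["ACGT"], node, 3))
```

    A fraction close to the value expected at the given sequencing depth would support the presence of that node in the sample, whereas a much lower fraction would argue against it.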

  12. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    PubMed

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive agricultural and ornamental crop pest. It damages host plants through both phloem feeding and vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rates, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole genome shotgun sequencing, and 13 BAC pooling libraries were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb, and a total length of 658 Mb. Annotation of repetitive elements and coding regions resulted in 265.0 Mb TEs (40.3%) and 20 786 protein-coding genes with putative gene family expansions, respectively. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q is clustered into a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated using the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in return, will open up new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.

  13. De Novo Transcriptome Analysis of Allium cepa L. (Onion) Bulb to Identify Allergens and Epitopes.

    PubMed

    Rajkumar, Hemalatha; Ramagoni, Ramesh Kumar; Anchoju, Vijayendra Chary; Vankudavath, Raju Naik; Syed, Arshi Uz Zaman

    2015-01-01

    Allium cepa (onion) is a diploid plant with one of the largest nuclear genomes among all diploids. Onion is an example of an under-researched crop which has a complex heterozygous genome. There are no allergenic proteins and genomic data available for onions. This study was conducted to establish a transcriptome catalogue of onion bulb that will enable us to study onion related genes involved in medicinal use and allergies. Transcriptome dataset generated from onion bulb using the Illumina HiSeq 2000 technology showed a total of 99,074,309 high quality raw reads (~20 Gb). Based on sequence homology onion genes were categorized into 49 different functional groups. Most of the genes, however, were classified under 'unknown' in all three gene ontology categories. Of the categorized genes, 61.2% showed metabolic functions followed by cellular components such as binding, cellular processes; catalytic activity and cell part. With BLASTx top hit analysis, a total of 2,511 homologous allergenic sequences were found, which had 37-100% similarity with 46 different types of allergens existing in the database. From the 46 contigs or allergens, 521 B-cell linear epitopes were identified using BepiPred linear epitope prediction tool. This is the first comprehensive insight into the transcriptome of onion bulb tissue using the NGS technology, which can be used to map IgE epitopes and prediction of structures and functions of various proteins.

  14. Novel transcriptome resources for three scleractinian coral species from the Indo-Pacific

    PubMed Central

    Kenkel, Carly D.; Bay, Line K

    2017-01-01

    Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the coral triangle and Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92–102 million raw Illumina Hiseq 2 × 150 bp reads. Host-specific assemblies contained 65 460–72 405 contigs, representing 26 693–37 894 isogroups (∼genes) with an average N50 of 2254. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. Transcriptomes contained 93.1–94.3% of EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98–91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with novel morphology. PMID:28938722

  15. Novel transcriptome resources for three scleractinian coral species from the Indo-Pacific.

    PubMed

    Kenkel, Carly D; Bay, Line K

    2017-09-01

    Transcriptomic resources for coral species can provide insight into coral evolutionary history and stress-response physiology. Goniopora columna, Galaxea astreata, and Galaxea acrhelia are scleractinian corals of the Indo-Pacific, representing a diversity of morphologies and life-history traits. G. columna and G. astreata are common and cosmopolitan, while G. acrhelia is largely restricted to the coral triangle and Great Barrier Reef. Reference transcriptomes for these species were assembled from replicate colony fragments exposed to elevated (31°C) and ambient (27°C) temperatures. Trinity was used to create de novo assemblies for each species from 92-102 million raw Illumina Hiseq 2 × 150 bp reads. Host-specific assemblies contained 65 460-72 405 contigs, representing 26 693-37 894 isogroups (∼genes) with an average N50 of 2254. Gene name and/or gene ontology annotations were possible for 58% of isogroups on average. Transcriptomes contained 93.1-94.3% of EuKaryotic Orthologous Groups comprising the core eukaryotic gene set, and 89.98-91.92% of the single-copy metazoan core gene set orthologs were complete, indicating fairly comprehensive assemblies. This work expands the complement of transcriptomic resources available for scleractinian coral species, including the first reference for a representative of Goniopora spp. as well as species with novel morphology. © The Authors 2017. Published by Oxford University Press.

  16. Nanopore DNA Sequencing and Genome Assembly on the International Space Station.

    PubMed

    Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S

    2017-12-21

    We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.

  17. Strategies for genotype imputation in composite beef cattle.

    PubMed

    Chud, Tatiane C S; Ventura, Ricardo V; Schenkel, Flavio S; Carvalheiro, Roberto; Buzanskas, Marcos E; Rosa, Jaqueline O; Mudadu, Maurício de Alvarenga; da Silva, Marcos Vinicius G B; Mokry, Fabiana B; Marcondes, Cintia R; Regitano, Luciana C A; Munari, Danísio P

    2015-08-07

    Genotype imputation has been used to increase genomic information, allow more animals in genome-wide analyses, and reduce genotyping costs. In Brazilian beef cattle production, many animals result from crossbreeding, and this may alter linkage disequilibrium patterns. Thus, the challenge is to obtain accurately imputed genotypes in crossbred animals. The objective of this study was to evaluate the best fitting and most accurate imputation strategy on the MA genetic group (the progeny of a Charolais sire mated with crossbred Canchim X Zebu cows) and Canchim cattle. The data set contained 400 animals (born between 1999 and 2005) genotyped with the Illumina BovineHD panel. Imputation accuracy of genotypes from the Illumina-Bovine3K (3K), Illumina-BovineLD (6K), GeneSeek-Genomic-Profiler (GGP) BeefLD (GGP9K), GGP-IndicusLD (GGP20Ki), Illumina-BovineSNP50 (50K), GGP-IndicusHD (GGP75Ki), and GGP-BeefHD (GGP80K) to Illumina-BovineHD (HD) SNP panels was investigated. Seven scenarios for reference and target populations were tested; the animals were grouped according to birth year (S1), genetic groups (S2 and S3), genetic groups and birth year (S4 and S5), gender (S6), and gender and birth year (S7). Analyses were performed using FImpute and BEAGLE software and computation run-time was recorded. Genotype imputation accuracy was measured by concordance rate (CR) and allelic R square (R(2)). The highest imputation accuracy scenario consisted of a reference population with males and females and a target population with young females. Among the SNP panels in the tested scenarios, the 50K, GGP75Ki and GGP80K were the most adequate to impute to HD in Canchim cattle. FImpute reduced the computation run-time needed to impute genotypes by 20 to 100 times when compared to BEAGLE. Genotyping panels with at least 50 thousand markers are suitable for genotype imputation to HD with acceptable accuracy. The FImpute algorithm demonstrated a higher efficiency of imputed markers, especially in lower density panels. These considerations may help to increase genotypic information, reduce genotyping costs, and aid in genomic selection evaluations in crossbred animals.
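
    The two accuracy measures named in this abstract, concordance rate (CR) and allelic R square, can be illustrated with a minimal Python sketch. It assumes genotypes are coded as allele counts (0, 1, 2) and takes allelic R square to be the squared Pearson correlation between true and imputed allele counts, which is one common definition; the abstract does not state the exact formula used. The function names and the example genotypes are hypothetical.

```python
import math


def concordance_rate(true_gt, imputed_gt):
    """Proportion of genotypes that were imputed exactly right."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype lists must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def allelic_r2(true_gt, imputed_gt):
    """Squared Pearson correlation between true and imputed allele counts.

    Assumption: this is one common definition of allelic R square; it is
    not necessarily the exact formula used in the study.
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("genotype lists must be non-empty and of equal length")
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        # Correlation is undefined when either vector has no variation.
        return math.nan
    return (cov * cov) / (var_t * var_i)


# Hypothetical genotypes for one SNP across six animals (allele counts).
true_genotypes = [0, 1, 2, 1, 0, 2]
imputed_genotypes = [0, 1, 2, 2, 0, 2]
print(concordance_rate(true_genotypes, imputed_genotypes))
print(allelic_r2(true_genotypes, imputed_genotypes))
```

    CR counts only exact matches, whereas the correlation-based measure is less dependent on allele frequency, which may be why studies of this kind tend to report both.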

  18. De novo transcriptome assembly for a non-model species, the blood-sucking bug Triatoma brasiliensis, a vector of Chagas disease.

    PubMed

    Marchant, A; Mougel, F; Almeida, C; Jacquin-Joly, E; Costa, J; Harry, M

    2015-04-01

    High throughput sequencing (HTS) provides new research opportunities for work on non-model organisms, such as differential expression studies between populations exposed to different environmental conditions. However, such transcriptomic studies first require the production of a reference assembly. The choice of sampling procedure, sequencing strategy and assembly workflow is crucial. To develop a reliable reference transcriptome for Triatoma brasiliensis, the major Chagas disease vector in Northeastern Brazil, different de novo assembly protocols were generated using various datasets and software. Both 454 and Illumina sequencing technologies were applied on RNA extracted from antennae and mouthparts from single or pooled individuals. The 454 library yielded 278 Mb. Fifteen Illumina libraries were constructed and yielded nearly 360 million RNA-seq single reads and 46 million RNA-seq paired-end reads for nearly 45 Gb. For the 454 reads, we used three assemblers, Newbler, CAP3 and/or MIRA and for the Illumina reads, the Trinity assembler. Ten assembly workflows were compared using these programs separately or in combination. To compare the assemblies obtained, quantitative and qualitative criteria were used, including contig length, N50, contig number and the percentage of chimeric contigs. Completeness of the assemblies was estimated using the CEGMA pipeline. The best assembly (57,657 contigs, completeness of 80 %, <1 % chimeric contigs) was a hybrid assembly leading to recommend the use of (1) a single individual with large representation of biological tissues, (2) merging both long reads and short paired-end Illumina reads, (3) several assemblers in order to combine the specific advantages of each.
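
    N50, one of the quantitative criteria listed in this abstract, is straightforward to compute from contig lengths. The sketch below uses the usual definition (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size). The function name and the example lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the largest length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the bases are covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bases.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 400: 500 + 400 = 900 covers half of 1500
```

    Because N50 rises when contigs are wrongly joined, it is usually read alongside the other criteria the abstract mentions, such as the percentage of chimeric contigs and completeness.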

  19. Direct comparisons of Illumina vs. Roche 454 sequencing technologies on the same microbial community DNA sample.

    PubMed

    Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T

    2012-01-01

    Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remains unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

  20. Bacterial diversity in typical Italian salami at different ripening stages as revealed by high-throughput sequencing of 16S rRNA amplicons.

    PubMed

    Połka, Justyna; Rebecchi, Annalisa; Pisacane, Vincenza; Morelli, Lorenzo; Puglisi, Edoardo

    2015-04-01

    The bacterial diversity involved in food fermentations is one of the most important factors shaping the final characteristics of traditional foods. Knowledge about this diversity can be greatly improved by the application of high-throughput sequencing technologies (HTS) coupled to the PCR amplification of the 16S rRNA subunit. Here we investigated the bacterial diversity in batches of Salame Piacentino PDO (Protected Designation of Origin), a dry fermented sausage that is typical of a regional area of Northern Italy. Salami samples from 6 different local factories were analysed at 0, 21, 49 and 63 days of ripening; raw meat at time 0 and casing samples at 21 days of ripening were also analysed, and the effect of starter addition was included in the experimental set-up. Culture-based microbiological analyses and PCR-DGGE were carried out in order to be compared with HTS results. A total of 722,196 high quality sequences were obtained after trimming, paired-reads assembly and quality screening of raw reads obtained by Illumina MiSeq sequencing of the two bacterial 16S hypervariable regions V3 and V4; manual curation of the 16S database allowed a correct taxonomical classification at the species level for 99.5% of these reads. Results confirmed the presence of the main bacterial species involved in the fermentation of salami as assessed by PCR-DGGE, but with a greater extent of resolution and quantitative assessments that are not possible by the mere analyses of gel banding patterns. Thirty-two different Staphylococcus and 33 Lactobacillus species were identified in the salami from different producers, while the whole data set obtained accounted for 13 main families and 98 rare ones, 23 of which were present in at least 10% of the investigated samples, with casings being the major sources of the observed diversity. Multivariate analyses also showed that batches from 6 local producers tend to cluster altogether after 21 days of ripening, thus indicating that HTS has the potential for fine scale differentiation of local fermented foods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  1. Identification and Differential Abundance of Mitochondrial Genome Encoding Small RNAs (mitosRNA) in Breast Muscles of Modern Broilers and Unselected Chicken Breed

    PubMed Central

    Bottje, Walter G.; Khatri, Bhuwan; Shouse, Stephanie A.; Seo, Dongwon; Mallmann, Barbara; Orlowski, Sara K.; Pan, Jeonghoon; Kong, Seongbae; Owens, Casey M.; Anthony, Nicholas B.; Kim, Jae K.; Kong, Byungwhi C.

    2017-01-01

    Background: Although small non-coding RNAs are mostly encoded by the nuclear genome, thousands of small non-coding RNAs encoded by the mitochondrial genome, termed mitosRNAs, were recently reported in human, mouse and trout. In this study, we first identified chicken mitosRNAs in breast muscle using small RNA sequencing, and the differential abundance was analyzed between modern pedigree male (PeM) broilers (characterized by rapid growth and large muscle mass) and the foundational Barred Plymouth Rock (BPR) chickens (characterized by slow growth and small muscle mass). Methods: Small RNA sequencing was performed with total RNAs extracted from breast muscles of PeM and BPR (n = 6 per group) using the 1 × 50 bp single end read method of Illumina sequencing. Raw reads were processed by quality assessment, adapter trimming, and alignment to the chicken mitochondrial genome (GenBank Accession: X52392.1) using the NGen program. Further statistical analyses were performed using JMP Genomics 8. Differentially expressed (DE) mitosRNAs between PeM and BPR were confirmed by quantitative PCR. Results: A total of 183,416 unique small RNA sequences were identified as potential chicken mitosRNAs. After stringent filtering processes, 117 mitosRNAs showing >100 raw read counts were abundantly produced from all 37 mitochondrial genes (except D-loop region) and the length of mitosRNAs ranged from 22 to 46 nucleotides. Of those, the abundance of 44 mitosRNAs was significantly altered in breast muscles of PeM compared to those of BPR: all mitosRNAs were higher in PeM breast except those produced from the 16S-rRNA gene. Possibly, the higher mitosRNA abundance in PeM breast may be due to a higher mitochondrial content compared to BPR. Our data demonstrate that in addition to 37 known mitochondrial genes, the mitochondrial genome also encodes abundant mitosRNAs that may play an important regulatory role in muscle growth via mitochondrial gene expression control. PMID:29104541

  2. Highly diverse microbiota in dental root canals in cases of apical periodontitis (data of illumina sequencing).

    PubMed

    Vengerfeldt, Veiko; Špilka, Katerina; Saag, Mare; Preem, Jens-Konrad; Oopkaup, Kristjan; Truu, Jaak; Mändar, Reet

    2014-11-01

    Chronic apical periodontitis (CAP) is a frequent condition that has a considerable effect on a patient's quality of life. We aimed to reveal root canal microbial communities in antibiotic-naive patients by applying Illumina sequencing (Illumina Inc, San Diego, CA). Samples were collected under strict aseptic conditions from 12 teeth (5 with primary CAP, 3 with secondary CAP, and 4 with a periapical abscess [PA]) and characterized by profiling the microbial community on the basis of the V6 hypervariable region of the 16S ribosomal RNA gene by using Illumina HiSeq2000 sequencing combinatorial sequence-tagged polymerase chain reaction products. Root canal specimens displayed highly polymicrobial communities in all 3 patient groups. One sample contained 5-8 (mean = 6.5) phyla of bacteria. The most numerous were Firmicutes and Bacteroidetes, but Actinobacteria, Fusobacteria, Proteobacteria, Spirochaetes, Tenericutes, and Synergistetes were also present in most of the patients. One sample contained 30-70 different operational taxonomic units; the mean (± standard deviation) was lower in the primary CAP group (36 ± 4) than in the PA (45 ± 4) and secondary CAP (43 ± 13) groups (P < .05). The communities were individually different, but anaerobic bacteria predominated as the rule. Enterococcus faecalis was found only in patients with secondary CAP. One PA sample displayed a significantly high proportion (47%) of Proteobacteria, mainly at the expense of Janthinobacterium lividum. This study provided an in-depth characterization of the microbiota of periapical tissues, revealing highly polymicrobial communities and minor differences between the study groups. A full understanding of the etiology of periodontal disease will only be possible through further in-depth systems-level analyses of the host-microbiome interaction. Copyright © 2014 American Association of Endodontists. Published by Elsevier Inc. All rights reserved.

  3. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  4. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background: Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results: To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions: We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  5. nextPARS: parallel probing of RNA structures in Illumina

    PubMed Central

    Saus, Ester; Willis, Jesse R.; Pryszcz, Leszek P.; Hafez, Ahmed; Llorens, Carlos; Himmelbauer, Heinz

    2018-01-01

    RNA molecules play important roles in virtually every cellular process. These functions are often mediated through the adoption of specific structures that enable RNAs to interact with other molecules. Thus, determining the secondary structures of RNAs is central to understanding their function and evolution. In recent years several sequencing-based approaches have been developed that allow probing structural features of thousands of RNA molecules present in a sample. Here, we describe nextPARS, a novel Illumina-based implementation of in vitro parallel probing of RNA structures. Our approach achieves comparable accuracy to previous implementations, while enabling higher throughput and sample multiplexing. PMID:29358234

  6. Evaluation and optimisation of preparative semi-automated electrophoresis systems for Illumina library preparation.

    PubMed

    Quail, Michael A; Gu, Yong; Swerdlow, Harold; Mayho, Matthew

    2012-12-01

    Size selection can be a critical step in preparation of next-generation sequencing libraries. Traditional methods employing gel electrophoresis lack reproducibility, are labour intensive, do not scale well and employ hazardous intercalating dyes. In a high-throughput setting, solid-phase reversible immobilisation beads are commonly used for size-selection, but result in quite a broad fragment size range. We have evaluated and optimised the use of two semi-automated preparative DNA electrophoresis systems, the Caliper Labchip XT and the Sage Science Pippin Prep, for size selection of Illumina sequencing libraries. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  7. RELIC: a novel dye-bias correction method for Illumina Methylation BeadChip.

    PubMed

    Xu, Zongli; Langie, Sabine A S; De Boever, Patrick; Taylor, Jack A; Niu, Liang

    2017-01-03

    The Illumina Infinium HumanMethylation450 BeadChip and its successor, Infinium MethylationEPIC BeadChip, have been extensively utilized in epigenome-wide association studies. Both arrays use two fluorescent dyes (Cy3-green/Cy5-red) to measure methylation level at CpG sites. However, performance differences between dyes can result in biased estimates of methylation levels. Here we describe a novel method, called REgression on Logarithm of Internal Control probes (RELIC), to correct for dye bias on the whole array by utilizing the intensity values of paired internal control probes that monitor the two color channels. We evaluate the method in several datasets against other widely used dye-bias correction methods. Results on data quality improvement showed that RELIC correction statistically significantly outperforms alternative dye-bias correction methods. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website (https://www.bioconductor.org/packages/release/bioc/html/ENmix.html). RELIC is an efficient and robust method to correct for dye-bias in Illumina Methylation BeadChip data. It outperforms other alternative methods and is conveniently implemented in the R package ENmix to facilitate DNA methylation studies.

  8. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    PubMed Central

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background. PMID:23193365

  9. RNA-Seq of the Caribbean reef-building coral Orbicella faveolata (Scleractinia-Merulinidae) under bleaching and disease stress expands models of coral innate immunity.

    PubMed

    Anderson, David A; Walz, Marcus E; Weil, Ernesto; Tonellato, Peter; Smith, Matthew C

    2016-01-01

    Climate change-driven coral disease outbreaks have led to widespread declines in coral populations. Early work on coral genomics established that corals have a complex innate immune system, and whole-transcriptome gene expression studies have revealed mechanisms by which the coral immune system responds to stress and disease. The present investigation expands bioinformatic data available to study coral molecular physiology through the assembly and annotation of a reference transcriptome of the Caribbean reef-building coral, Orbicella faveolata. Samples were collected during a warm water thermal anomaly, coral bleaching event and Caribbean yellow band disease outbreak in 2010 in Puerto Rico. Multiplex sequencing of RNA on the Illumina GAIIx platform and de novo transcriptome assembly by Trinity produced 70,745,177 raw short-sequence reads and 32,463 O. faveolata transcripts, respectively. The reference transcriptome was annotated with gene ontologies, mapped to KEGG pathways, and a predicted proteome of 20,488 sequences was generated. Protein families and signaling pathways that are essential in the regulation of innate immunity across Phyla were investigated in-depth. Results were used to develop models of evolutionarily conserved Wnt, Notch, Rig-like receptor, Nod-like receptor, and Dicer signaling. O. faveolata is a coral species that has been studied widely under climate-driven stress and disease, and the present investigation provides new data on the genes that putatively regulate its immune system.

  10. RNA-Seq of the Caribbean reef-building coral Orbicella faveolata (Scleractinia-Merulinidae) under bleaching and disease stress expands models of coral innate immunity

    PubMed Central

    Walz, Marcus E.; Weil, Ernesto; Smith, Matthew C.

    2016-01-01

    Climate change-driven coral disease outbreaks have led to widespread declines in coral populations. Early work on coral genomics established that corals have a complex innate immune system, and whole-transcriptome gene expression studies have revealed mechanisms by which the coral immune system responds to stress and disease. The present investigation expands bioinformatic data available to study coral molecular physiology through the assembly and annotation of a reference transcriptome of the Caribbean reef-building coral, Orbicella faveolata. Samples were collected during a warm water thermal anomaly, coral bleaching event and Caribbean yellow band disease outbreak in 2010 in Puerto Rico. Multiplex sequencing of RNA on the Illumina GAIIx platform and de novo transcriptome assembly by Trinity produced 70,745,177 raw short-sequence reads and 32,463 O. faveolata transcripts, respectively. The reference transcriptome was annotated with gene ontologies, mapped to KEGG pathways, and a predicted proteome of 20,488 sequences was generated. Protein families and signaling pathways that are essential in the regulation of innate immunity across Phyla were investigated in-depth. Results were used to develop models of evolutionarily conserved Wnt, Notch, Rig-like receptor, Nod-like receptor, and Dicer signaling. O. faveolata is a coral species that has been studied widely under climate-driven stress and disease, and the present investigation provides new data on the genes that putatively regulate its immune system. PMID:26925311

  11. Secure and robust cloud computing for high-throughput forensic microsatellite sequence analysis and databasing.

    PubMed

    Bailey, Sarah F; Scheible, Melissa K; Williams, Christopher; Silva, Deborah S B S; Hoggan, Marina; Eichman, Christopher; Faith, Seth A

    2017-11-01

    Next-generation Sequencing (NGS) is a rapidly evolving technology with demonstrated benefits for forensic genetic applications, and the strategies to analyze and manage the massive NGS datasets are currently in development. Here, the computing, data storage, connectivity, and security resources of the Cloud were evaluated as a model for forensic laboratory systems that produce NGS data. A complete front-to-end Cloud system was developed to upload, process, and interpret raw NGS data using a web browser dashboard. The system was extensible, demonstrating analysis capabilities of autosomal and Y-STRs from a variety of NGS instrumentation (Illumina MiniSeq and MiSeq, and Oxford Nanopore MinION). NGS data for STRs were concordant with standard reference materials previously characterized with capillary electrophoresis and Sanger sequencing. The computing power of the Cloud was implemented with on-demand auto-scaling to allow multiple file analysis in tandem. The system was designed to store resulting data in a relational database, amenable to downstream sample interpretations and databasing applications following the most recent guidelines in nomenclature for sequenced alleles. Lastly, a multi-layered Cloud security architecture was tested and showed that industry standards for securing data and computing resources were readily applied to the NGS system without disadvantageous effects for bioinformatic analysis, connectivity or data storage/retrieval. The results of this study demonstrate the feasibility of using Cloud-based systems for secured NGS data analysis, storage, databasing, and multi-user distributed connectivity. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Next-generation transcriptome sequencing, SNP discovery and validation in four market classes of peanut, Arachis hypogaea L.

    PubMed

    Chopra, Ratan; Burow, Gloria; Farmer, Andrew; Mudge, Joann; Simpson, Charles E; Wilkins, Thea A; Baring, Michael R; Puppala, Naveen; Chamberlin, Kelly D; Burow, Mark D

    2015-06-01

    Single-nucleotide polymorphisms, which can be identified in the thousands or millions from comparisons of transcriptome or genome sequences, are ideally suited for making high-resolution genetic maps, investigating population evolutionary history, and discovering marker-trait linkages. Despite significant results from their use in human genetics, progress in identification and use in plants, and particularly polyploid plants, has lagged. As part of a long-term project to identify and use SNPs suitable for these purposes in cultivated peanut, which is tetraploid, we generated transcriptome sequences of four peanut cultivars, namely OLin, New Mexico Valencia C, Tamrun OL07 and Jupiter, which represent the four major market classes of peanut grown in the world, and which are important economically to the US southwest peanut growing region. CopyDNA libraries of each genotype were used to generate 2 × 54 paired-end reads using an Illumina GAIIx sequencer. Raw reads were mapped to a custom reference consisting of Tifrunner 454 sequences plus peanut ESTs in GenBank, comprising 43,108 contigs; 263,840 SNP and indel variants were identified among four genotypes compared to the reference. A subset of 6 variants was assayed across 24 genotypes representing four market types using KASP chemistry to assess the criteria for SNP selection. Results demonstrated that transcriptome sequencing can identify SNPs usable as selectable DNA-based markers in complex polyploid species such as peanut. Criteria for effective use of SNPs as markers are discussed in this context.

  13. The same ELA class II risk factors confer equine insect bite hypersensitivity in two distinct populations.

    PubMed

    Andersson, Lisa S; Swinburne, June E; Meadows, Jennifer R S; Broström, Hans; Eriksson, Susanne; Fikse, W Freddy; Frey, Rebecka; Sundquist, Marie; Tseng, Chia T; Mikko, Sofia; Lindgren, Gabriella

    2012-03-01

    Insect bite hypersensitivity (IBH) is a chronic allergic dermatitis common in horses. Affected horses mainly react against antigens present in the saliva from the biting midges, Culicoides ssp, and occasionally black flies, Simulium ssp. Because of this insect dependency, the disease is clearly seasonal and prevalence varies between geographical locations. For two distinct horse breeds, we genotyped four microsatellite markers positioned within the MHC class II region and sequenced the highly polymorphic exons two from DRA and DRB3, respectively. Initially, 94 IBH-affected and 93 unaffected Swedish born Icelandic horses were tested for genetic association. These horses had previously been genotyped on the Illumina Equine SNP50 BeadChip, which made it possible to ensure that our study did not suffer from the effects of stratification. The second population consisted of 106 unaffected and 80 IBH-affected Exmoor ponies. We show that variants in the MHC class II region are associated with disease susceptibility (p (raw) = 2.34 × 10(-5)), with the same allele (COR112:274) associated in two separate populations. In addition, we combined microsatellite and sequencing data in order to investigate the pattern of homozygosity and show that homozygosity across the entire MHC class II region is associated with a higher risk of developing IBH (p = 0.0013). To our knowledge this is the first time in any atopic dermatitis suffering species, including man, where the same risk allele has been identified in two distinct populations.

  14. Next-generation sequencing showing potential leachate influence on bacterial communities around a landfill in China.

    PubMed

    Rajasekar, Adharsh; Sekar, Raju; Medina-Roldán, Eduardo; Bridge, Jonathan; Moy, Charles K S; Wilkinson, Stephen

    2018-04-10

    The impact of contaminated leachate on groundwater from landfills is well known, but the specific effects on bacterial consortia are less well-studied. Bacterial communities in a landfill and an urban site located in Suzhou, China, were studied using Illumina high-throughput sequencing. A total of 153 944 good-quality reads were produced and sequences assigned to 6388 operational taxonomic units. Bacterial consortia consisted of up to 16 phyla, including Proteobacteria (31.9%-94.9% at landfill, 25.1%-43.3% at urban sites), Actinobacteria (0%-28.7% at landfill, 9.9%-34.3% at urban sites), Bacteroidetes (1.4%-25.6% at landfill, 5.6%-7.8% at urban sites), Chloroflexi (0.4%-26.5% at urban sites only), and unclassified bacteria. Pseudomonas was the dominant (67%-93%) genus in landfill leachate. Arsenic concentrations in landfill raw leachate (RL) (1.11 × 10^3 μg/L) and fresh leachate (FL2) (1.78 × 10^3 μg/L) and mercury concentrations in RL (10.9 μg/L) and FL2 (7.37 μg/L) exceeded Chinese State Environmental Protection Administration standards for leachate in landfills. The Shannon diversity index and Chao1 richness estimate showed RL and FL2 lacked richness and diversity when compared with other samples. This is consistent with stresses imposed by elevated arsenic and mercury and has implications for ecological site remediation by bioremediation or natural attenuation.
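
    The two diversity measures named in this abstract can be illustrated with a minimal sketch. The OTU counts below are hypothetical and are not taken from the study; the Chao1 form used here is the common bias-corrected estimator, which is an assumption, since the abstract does not say which variant was applied.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Hypothetical read counts per OTU (illustration only).
dominated = [930, 40, 20, 5, 3, 1, 1]        # one taxon dominates the sample
even = [150, 140, 160, 145, 155, 130, 120]   # reads spread evenly over taxa

print("Shannon:", shannon_index(dominated), shannon_index(even))
print("Chao1:  ", chao1(dominated), chao1(even))
```

    A sample dominated by a single genus, as reported for the leachate, gives a lower Shannon index than an evenly distributed sample with the same number of OTUs.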

  15. De Novo Transcriptome Analysis of Allium cepa L. (Onion) Bulb to Identify Allergens and Epitopes

    PubMed Central

    Rajkumar, Hemalatha; Ramagoni, Ramesh Kumar; Anchoju, Vijayendra Chary; Vankudavath, Raju Naik; Syed, Arshi Uz Zaman

    2015-01-01

    Allium cepa (onion) is a diploid plant with one of the largest nuclear genomes among all diploids. Onion is an example of an under-researched crop which has a complex heterozygous genome. There are no allergenic proteins and genomic data available for onions. This study was conducted to establish a transcriptome catalogue of onion bulb that will enable us to study onion related genes involved in medicinal use and allergies. Transcriptome dataset generated from onion bulb using the Illumina HiSeq 2000 technology showed a total of 99,074,309 high quality raw reads (~20 Gb). Based on sequence homology onion genes were categorized into 49 different functional groups. Most of the genes however, were classified under 'unknown' in all three gene ontology categories. Of the categorized genes, 61.2% showed metabolic functions followed by cellular components such as binding, cellular processes; catalytic activity and cell part. With BLASTx top hit analysis, a total of 2,511 homologous allergenic sequences were found, which had 37–100% similarity with 46 different types of allergens existing in the database. From the 46 contigs or allergens, 521 B-cell linear epitopes were identified using BepiPred linear epitope prediction tool. This is the first comprehensive insight into the transcriptome of onion bulb tissue using the NGS technology, which can be used to map IgE epitopes and prediction of structures and functions of various proteins. PMID:26284934

  16. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics.

    PubMed

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

  17. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics

    PubMed Central

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. PMID:26840129

  18. Draft genome of the gayal, Bos frontalis

    PubMed Central

    Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping

    2017-01-01

    Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
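
    Several records in this listing report scaffold and contig N50 values. As a minimal sketch of how that statistic is computed (the lengths in the example are made up, and `n50` is an illustrative helper, not a function from any assembler):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")

# Hypothetical scaffold lengths in kb: total 290, so half is 145.
# 80 + 70 = 150 is the first running sum to reach 145, so N50 = 70.
print(n50([80, 70, 50, 40, 30, 20]))
```

    Because N50 depends only on the sorted length distribution, merging scaffolds raises it even when the total assembly size is unchanged.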

  19. Genome sequence of the olive tree, Olea europaea.

    PubMed

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.

  20. Illumina Production Sequencing at the DOE Joint Genome Institute - Workflow and Optimizations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tarver, Angela; Fern, Alison; Diego, Matthew San

    2010-06-18

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the DOE mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, the Illumina Genome Analyzer pipeline has been established as one of three sequencing platforms, along with Roche/454 and ABI/Sanger. Optimization of the Illumina pipeline has been ongoing with the aim of continual process improvement of the laboratory workflow. These process improvement projects are being led by the JGI's Process Optimization, Sequencing Technologies, Instrumentation & Engineering, and the New Technology Production groups. Primary focus has been on improving the procedural ergonomics and the technicians' operating environment, reducing manually intensive technician operations with different tools, reducing associated production costs, and improving the overall process and generated sequence quality. The U.S. DOE JGI was established in 1997 in Walnut Creek, CA, to unite the expertise and resources of five national laboratories (Lawrence Berkeley, Lawrence Livermore, Los Alamos, Oak Ridge, and Pacific Northwest) along with HudsonAlpha Institute for Biotechnology. JGI is operated by the University of California for the U.S. DOE.

  1. Exon-Specific QTLs Skew the Inferred Distribution of Expression QTLs Detected Using Gene Expression Array Data

    PubMed Central

    Veyrieras, Jean-Baptiste; Gaffney, Daniel J.; Pickrell, Joseph K.; Gilad, Yoav; Stephens, Matthew; Pritchard, Jonathan K.

    2012-01-01

    Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3′ untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation. PMID:22359548

  2. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    PubMed

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from the chloroplast genome, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our result demonstrates that Illumina sequencing technology is a rapid and efficient approach to obtaining angiosperm mt genome sequences. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects.

  3. Rapid Sequencing of the Bamboo Mitochondrial Genome Using Illumina Technology and Parallel Episodic Evolution of Organelle Genomes in Grasses

    PubMed Central

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Background: Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. Methodology/Principal Findings: We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from the chloroplast genome, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Conclusions/Significance: Our result demonstrates that Illumina sequencing technology is a rapid and efficient approach to obtaining angiosperm mt genome sequences. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects. PMID:22272330

  4. Genome-Wide SNP Detection, Validation, and Development of an 8K SNP Array for Apple

    PubMed Central

    Chagné, David; Crowhurst, Ross N.; Troggio, Michela; Davey, Mark W.; Gilmore, Barbara; Lawley, Cindy; Vanderzande, Stijn; Hellens, Roger P.; Kumar, Satish; Cestaro, Alessandro; Velasco, Riccardo; Main, Dorrie; Rees, Jasper D.; Iezzoni, Amy; Mockler, Todd; Wilhelm, Larry; Van de Weg, Eric; Gardiner, Susan E.; Bassil, Nahla; Peace, Cameron

    2012-01-01

    As high-throughput genetic marker screening systems are essential for a range of genetics studies and plant breeding applications, the International RosBREED SNP Consortium (IRSC) has utilized the Illumina Infinium® II system to develop a medium- to high-throughput SNP screening tool for genome-wide evaluation of allelic variation in apple (Malus×domestica) breeding germplasm. For genome-wide SNP discovery, 27 apple cultivars were chosen to represent worldwide breeding germplasm and re-sequenced at low coverage with the Illumina Genome Analyzer II. Following alignment of these sequences to the whole genome sequence of ‘Golden Delicious’, SNPs were identified using SoapSNP. A total of 2,113,120 SNPs were detected, corresponding to one SNP to every 288 bp of the genome. The Illumina GoldenGate® assay was then used to validate a subset of 144 SNPs with a range of characteristics, using a set of 160 apple accessions. This validation assay enabled fine-tuning of the final subset of SNPs for the Illumina Infinium® II system. The set of stringent filtering criteria developed allowed choice of a set of SNPs that not only exhibited an even distribution across the apple genome and a range of minor allele frequencies to ensure utility across germplasm, but also were located in putative exonic regions to maximize genotyping success rate. A total of 7867 apple SNPs was established for the IRSC apple 8K SNP array v1, of which 5554 were polymorphic after evaluation in segregating families and a germplasm collection. This publicly available genomics resource will provide an unprecedented resolution of SNP haplotypes, which will enable marker-locus-trait association discovery, description of the genetic architecture of quantitative traits, investigation of genetic variation (neutral and functional), and genomic selection in apple. PMID:22363718
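
    The headline figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above; the "implied" sequence length is a back-calculation for illustration, not a figure reported by the authors.

```python
# Figures quoted in the abstract.
total_snps = 2_113_120   # SNPs detected by re-sequencing
bp_per_snp = 288         # reported density: one SNP per 288 bp
array_snps = 7867        # SNPs placed on the IRSC 8K array v1
polymorphic = 5554       # SNPs found polymorphic after evaluation

# Sequence length implied by the reported SNP density (back-calculation).
implied_bp = total_snps * bp_per_snp

# Share of array SNPs that turned out to be polymorphic.
polymorphic_fraction = polymorphic / array_snps

print(f"implied sequence length: {implied_bp / 1e6:.1f} Mb")
print(f"polymorphic on array:    {polymorphic_fraction:.1%}")
```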

  5. Development of High Throughput Process for Constructing 454 Titanium and Illumina Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deshpande, Shweta; Hack, Christopher; Tang, Eric

    2010-05-28

    We have developed two processes with the Biomek FX robot to construct 454 Titanium and Illumina libraries in order to meet the increasing library demands. All modifications in the library construction steps were made to adapt the entire processes to the 96-well plate format. The key modifications include the shearing of DNA with the Covaris E210 and the enzymatic reaction cleaning and fragment size selection with SPRI beads and magnetic plate holders. The construction of 96 Titanium libraries takes about 8 hours from sheared DNA to ssDNA recovery. The processing of 96 Illumina libraries takes less time than that of the Titanium library process. Although both processes still require manual transfer of plates from the robot to other work stations such as thermocyclers, these robotic processes represent about a 12- to 24-fold increase in library capacity compared with the manual processes. To enable the sequencing of many libraries in parallel, we have also developed sets of molecular barcodes for both library types. The requirements for the 454 library barcodes include 10 bases, 40-60% GC, no consecutive identical bases, and at least 3 bases of difference between barcodes. We have used 96 of the resulting 270 barcodes to construct and pool libraries in order to test how accurately reads can be assigned to the right samples. When allowing 1 base error in the 10-base barcodes, we could assign 99.6% of the total reads, and 100% of them were uniquely assigned. As for the Illumina barcodes, the requirements include 4 bases, balanced GC, and at least 2 bases of difference between barcodes. We have begun to assess the ability to assign reads after pooling different numbers of libraries. We will discuss the progress and the challenges of these scale-up processes.
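
    The 454 barcode requirements listed above (10 bases, 40-60% GC, no consecutive identical bases, at least 3 differences between barcodes) and the one-mismatch read assignment can be sketched as follows. This is an illustrative check, not the authors' software; the function names and the three example barcodes are hypothetical.

```python
from itertools import combinations

def gc_fraction(barcode):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in barcode) / len(barcode)

def has_adjacent_repeat(barcode):
    """True if any base is immediately followed by the same base."""
    return any(a == b for a, b in zip(barcode, barcode[1:]))

def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def is_valid_barcode(barcode, length=10, gc_min=0.40, gc_max=0.60):
    """Check the per-barcode rules: length, alphabet, GC range, no repeats."""
    return (len(barcode) == length
            and set(barcode) <= set("ACGT")
            and gc_min <= gc_fraction(barcode) <= gc_max
            and not has_adjacent_repeat(barcode))

def is_valid_set(barcodes, min_distance=3):
    """Check per-barcode rules plus the pairwise minimum-difference rule."""
    return (all(is_valid_barcode(b) for b in barcodes)
            and all(hamming(a, b) >= min_distance
                    for a, b in combinations(barcodes, 2)))

def assign_read(read_barcode, barcodes, max_mismatches=1):
    """Return the single barcode within max_mismatches, else None."""
    hits = [b for b in barcodes if hamming(read_barcode, b) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None

# Hypothetical barcodes for illustration only.
barcodes = ["ACGTACGTAC", "TGCATGCATG", "AGTCAGTCAG"]
print(is_valid_set(barcodes))
print(assign_read("ACGTACGTAA", barcodes))  # one error in the last base
```

    With a minimum pairwise difference of 3, a read carrying a single error cannot lie within one mismatch of two different barcodes, which is consistent with the report that every assigned read was assigned uniquely.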

  6. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.

    PubMed

    Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D

    2013-02-28

    Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, which is more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.
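
    The "~200,000 true SNPs" estimate is consistent with scaling the candidate set by the observed validation rate. The sketch below shows that arithmetic; treating the estimate as a simple product is an assumption, since the abstract does not spell out how the figure was derived.

```python
# Figures quoted in the abstract.
assayed = 8067        # SNPs placed on the Infinium array
validated = 5847      # called successfully and polymorphic
candidates = 278_979  # unique putative SNPs in the database

# Observed validation rate on the assayed sample.
rate = validated / assayed

# Assumed extrapolation: apply the same rate to every candidate SNP.
estimated_true = candidates * rate

print(f"validation rate: {rate:.1%}")
print(f"estimated true SNPs: {estimated_true:,.0f}")
```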

  7. De novo genome assembly and annotation of Australia's largest freshwater fish, the Murray cod (Maccullochella peelii), from Illumina and Nanopore sequencing read.

    PubMed

    Austin, Christopher M; Tan, Mun Hua; Harrisson, Katherine A; Lee, Yin Peng; Croft, Laurence J; Sunnucks, Paul; Pavlova, Alexandra; Gan, Han Ming

    2017-08-01

    One of the most iconic Australian fish is the Murray cod, Maccullochella peelii (Mitchell 1838), a freshwater species that can grow to ∼1.8 metres in length and live to age ≥48 years. The Murray cod is of conservation concern as a result of strong population contractions, but it is also popular for recreational fishing and is of growing aquaculture interest. In this study, we report the whole genome sequence of the Murray cod to support ongoing population genetics, conservation, and management research, as well as to better understand the evolutionary ecology and history of the species. A draft Murray cod genome of 633 Mbp (N50 = 109 974 bp; BUSCO and CEGMA completeness of 94.2% and 91.9%, respectively) with an estimated 148 Mbp of putative repetitive sequences was assembled from the combined sequencing data of 2 fish individuals with an identical maternal lineage; 47.2 Gb of Illumina HiSeq data and 804 Mb of Nanopore data were generated from the first individual while 23.2 Gb of Illumina MiSeq data were generated from the second individual. The inclusion of Nanopore reads for scaffolding followed by subsequent gap-closing using Illumina data led to a 29% reduction in the number of scaffolds and a 55% and 54% increase in the scaffold and contig N50, respectively. We also report the first transcriptome of Murray cod that was subsequently used to annotate the Murray cod genome, leading to the identification of 26 539 protein-coding genes. We present the whole genome of the Murray cod and anticipate this will be a catalyst for a range of genetic, genomic, and phylogenetic studies of the Murray cod and more generally other fish species of the Percichthyidae family. © The Authors 2017. Published by Oxford University Press.

  8. Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly

    PubMed Central

    Austin, Christopher M; Hammer, Michael P; Lee, Yin Peng; Gan, Han Ming

    2018-01-01

    Background: Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics. Results: We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches. Conclusions: We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae. PMID:29342277
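
    The quoted coverage figures follow from dividing sequenced bases by genome size. The sketch below reproduces that calculation for both ends of the reported genome-size range; interpreting "Gb" as 10^9 bases is an assumption, and `coverage` is an illustrative helper.

```python
def coverage(total_bases, genome_size):
    """Approximate fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size

illumina_bases = 43e9   # 43 Gb of short reads (Gb taken as 1e9 bases)
nanopore_bases = 9e9    # 9 Gb of long reads

# Lower and upper k-mer-based genome-size estimates from the abstract.
for genome_size in (791e6, 967e6):
    print(f"genome {genome_size / 1e6:.0f} Mbp: "
          f"Illumina {coverage(illumina_bases, genome_size):.1f}x, "
          f"Nanopore {coverage(nanopore_bases, genome_size):.1f}x")
```

    Under these assumptions the reported 54× and 11× match the lower genome-size estimate (791 Mbp); the upper estimate (967 Mbp) gives roughly 44× and 9×.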

  9. A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation

    PubMed Central

    2013-01-01

    Background: Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results: We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions: Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, which is more SNPs than are needed to use genomic selection in tree breeding programs. Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change. PMID:23445355

  10. Finding Nemo: hybrid assembly with Oxford Nanopore and Illumina reads greatly improves the clownfish (Amphiprion ocellaris) genome assembly.

    PubMed

    Tan, Mun Hua; Austin, Christopher M; Hammer, Michael P; Lee, Yin Peng; Croft, Laurence J; Gan, Han Ming

    2018-03-01

    Some of the most widely recognized coral reef fishes are clownfish or anemonefish, members of the family Pomacentridae (subfamily: Amphiprioninae). They are popular aquarium species due to their bright colours, adaptability to captivity, and fascinating behavior. Their breeding biology (sequential hermaphrodites) and symbiotic mutualism with sea anemones have attracted much scientific interest. Moreover, there are some curious geographic-based phenotypes that warrant investigation. Leveraging on the advancement in Nanopore long read technology, we report the first hybrid assembly of the clown anemonefish (Amphiprion ocellaris) genome utilizing Illumina and Nanopore reads, further demonstrating the substantial impact of modest long read sequencing data sets on improving genome assembly statistics. We generated 43 Gb of short Illumina reads and 9 Gb of long Nanopore reads, representing approximate genome coverage of 54× and 11×, respectively, based on the range of estimated k-mer-predicted genome sizes of between 791 and 967 Mbp. The final assembled genome is contained in 6404 scaffolds with an accumulated length of 880 Mb (96.3% BUSCO-calculated genome completeness). Compared with the Illumina-only assembly, the hybrid approach generated 94% fewer scaffolds with an 18-fold increase in N50 length (401 kb) and increased the genome completeness by an additional 16%. A total of 27 240 high-quality protein-coding genes were predicted from the clown anemonefish, 26 211 (96%) of which were annotated functionally with information from either sequence homology or protein signature searches. We present the first genome of any anemonefish and demonstrate the value of low coverage (∼11×) long Nanopore read sequencing in improving both genome assembly contiguity and completeness. The near-complete assembly of the A. ocellaris genome will be an invaluable molecular resource for supporting a range of genetic, genomic, and phylogenetic studies specifically for clownfish and more generally for other related fish species of the family Pomacentridae.
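
    The coverage figures quoted in this abstract can be checked with a small sketch. It assumes the usual approximation that fold coverage is the total sequenced bases divided by the genome size; the function name `coverage` is illustrative and is not taken from the paper.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Approximate fold coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


# Figures quoted in the abstract (43 Gb Illumina, 9 Gb Nanopore,
# k-mer-predicted genome size between 791 and 967 Mbp).
illumina_bases = 43e9
nanopore_bases = 9e9
genome_low, genome_high = 791e6, 967e6

for label, bases in [("Illumina", illumina_bases), ("Nanopore", nanopore_bases)]:
    upper = coverage(bases, genome_low)   # smaller genome gives higher coverage
    lower = coverage(bases, genome_high)  # larger genome gives lower coverage
    print(f"{label}: about {lower:.1f}x to {upper:.1f}x")
```

    Under this approximation, the quoted 54× and 11× appear to correspond to the lower bound of the genome-size range (791 Mbp); the upper bound would give somewhat lower coverage.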

  11. Isolation and characterization of microsatellite markers for Jasminum sambac (Oleaceae) using Illumina shotgun sequencing.

    PubMed

    Li, Yong; Zhang, Weirui

    2015-10-01

    Microsatellite markers of Jasminum sambac (Oleaceae) were isolated to investigate wild germplasm resources and provide markers for breeding. Illumina sequencing was used to isolate microsatellite markers from the transcriptome of J. sambac. A total of 1322 microsatellites were identified from 49,772 assembled unigenes. One hundred primer pairs were randomly selected to verify primer amplification efficiency. Out of these tested primer pairs, 31 were successfully amplified: 18 primer pairs yielded a single allele, seven exhibited fixed heterozygosity with two alleles, and only six displayed polymorphisms. This study obtained the first set of microsatellite markers for J. sambac, which will be helpful for the assessment of wild germplasm resources and the development of molecular marker-assisted breeding.

  12. Acclimatization of a mixed-animal manure inoculum to the anaerobic digestion of Axonopus compressus reveals the putative importance of Mesotoga infera and Methanosaeta concilii as elucidated by DGGE and Illumina MiSeq.

    PubMed

    Lee, Jonathan T E; He, Jianzhong; Tong, Yen Wah

    2017-12-01

    In this study, a multifarious microbial mix from different sources is acclimatized over a period of three months to digesting cowgrass, and the changes in the community structure are examined with both a traditional denaturing gradient gel electrophoresis method as well as a next generation sequencing MiSeq method. It is shown that the much more in depth analysis by Illumina gives more information about the relative abundance and thus putative importance of the role of various microbes, in particular the bacterium Mesotoga infera and the archaeon Methanosaeta concilii. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. CIDR

    Science.gov Websites

    Consortium Developed Arrays Infinium Human Drug Core Array The Illumina Infinium DrugDev Consortium array drug target discovery, validation and treatment response. Detailed Information on Array Infinium Human

  14. High Throughput Plasmid Sequencing with Illumina and CLC Bio

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Athavale, Ajay

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  15. Population and performance analyses of four major populations with Illumina's FGx Forensic Genomics System.

    PubMed

    Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Seah, Lay Hong; Budowle, Bruce

    2017-09-01

    The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace®-application.

    PubMed

    Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2015-03-01

    Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. In the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition, and can subsequently be analyzed using the published MyFLq BaseSpace application. Additional features were implemented such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analysis of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.

  17. Fungal communities from the calcareous deep-sea sediments in the Southwest India Ridge revealed by Illumina sequencing technology.

    PubMed

    Zhang, Likui; Kang, Manyu; Huang, Yangchao; Yang, Lixiang

    2016-05-01

    The diversity and ecological significance of bacteria and archaea in deep-sea environments have been thoroughly investigated, but eukaryotic microorganisms in these areas, such as fungi, are poorly understood. To elucidate fungal diversity in calcareous deep-sea sediments in the Southwest India Ridge (SWIR), the internal transcribed spacer (ITS) regions of rRNA genes from two sediment metagenomic DNA samples were amplified and sequenced using the Illumina sequencing platform. The results revealed that 58-63 % and 36-42 % of the ITS sequences (97 % similarity) belonged to Basidiomycota and Ascomycota, respectively. These findings suggest that Basidiomycota and Ascomycota are the predominant fungal phyla in the two samples. We also found that Agaricomycetes, Leotiomycetes, and Pezizomycetes were the major fungal classes in the two samples. At the species level, Thelephoraceae sp. and Phialocephala fortinii were major fungal species in the two samples. Despite the low relative abundance, unidentified fungal sequences were also observed in the two samples. Furthermore, we found that there were slight differences in fungal diversity between the two sediment samples, although both were collected from the SWIR. Thus, our results demonstrate that calcareous deep-sea sediments in the SWIR harbor diverse fungi, which augment the fungal groups in deep-sea sediments. This is the first report of fungal communities in calcareous deep-sea sediments in the SWIR revealed by Illumina sequencing.

  18. A comprehensive insight into bacterial virulence in drinking water using 454 pyrosequencing and Illumina high-throughput sequencing.

    PubMed

    Huang, Kailong; Zhang, Xu-Xiang; Shi, Peng; Wu, Bing; Ren, Hongqiang

    2014-11-01

    In order to comprehensively investigate bacterial virulence in drinking water, 454 pyrosequencing and Illumina high-throughput sequencing were used to detect potential pathogenic bacteria and virulence factors (VFs) in a full-scale drinking water treatment and distribution system. 16S rRNA gene pyrosequencing revealed high bacterial diversity in the drinking water (441-586 operational taxonomic units). Bacterial diversity decreased after chlorine disinfection, but increased after pipeline distribution. α-Proteobacteria was the most dominant taxonomic class. Alignment against the established pathogen database showed that several types of putative pathogens were present in the drinking water and Pseudomonas aeruginosa had the highest abundance (over 11‰ of total sequencing reads). Many pathogens disappeared after chlorine disinfection, but P. aeruginosa and Leptospira interrogans were still detected in the tap water. High-throughput sequencing revealed prevalence of various pathogenicity islands and virulence proteins in the drinking water, and translocases, transposons, Clp proteases and flagellar motor switch proteins were the predominant VFs. Both diversity and abundance of the detectable VFs increased after the chlorination, and decreased after the pipeline distribution. This study indicates that joint use of 454 pyrosequencing and Illumina sequencing can comprehensively characterize environmental pathogenesis, and several types of putative pathogens and various VFs are prevalent in drinking water. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Barcoded NS31/AML2 primers for sequencing of arbuscular mycorrhizal communities in environmental samples

    PubMed Central

    Morgan, Benjamin S. T.; Egerton-Warburton, Louise M.

    2017-01-01

    Premise of the study: Arbuscular mycorrhizal fungi (AMF) are globally important root symbioses that enhance plant growth and nutrition and influence ecosystem structure and function. To better characterize levels of AMF diversity relevant to ecosystem function, deeper sequencing depth in environmental samples is needed. In this study, Illumina barcoded primers and a bioinformatics pipeline were developed and applied to study AMF diversity and community structure in environmental samples. Methods: Libraries of small subunit ribosomal RNA fragment amplicons were amplified from environmental DNA using a single-step PCR reaction with barcoded NS31/AML2 primers. Amplicons were sequenced on an Illumina MiSeq sequencer using version 2, 2 × 250-bp paired-end chemistry, and analyzed using QIIME and RDP Classifier. Results: Sequencing captured 196 to 6416 operational taxonomic units (OTUs; depending on clustering parameters) representing nine AMF genera. Regardless of clustering parameters, ∼20 OTUs dominated AMF communities (78–87% reads) with the remaining reads distributed among other OTUs. Analyses also showed significant biogeographic differences in AMF communities and that community composition could be linked to specific edaphic factors. Discussion: Barcoded NS31/AML2 primers and Illumina MiSeq sequencing provide a powerful approach to address AMF diversity and variations in fungal assemblages across host plants, ecosystems, and responses to environmental drivers including global change. PMID:28924511

  20. CpG island methylation profile in non-invasive oral rinse samples is predictive of oral and pharyngeal carcinoma.

    PubMed

    Langevin, Scott M; Eliot, Melissa; Butler, Rondi A; Cheong, Agnes; Zhang, Xiang; McClean, Michael D; Koestler, Devin C; Kelsey, Karl T

    2015-01-01

    There are currently no screening tests in routine use for oral and pharyngeal cancer beyond visual inspection and palpation, which are provided on an opportunistic basis, indicating a need for development of novel methods for early detection, particularly in high-risk populations. We sought to address this need through comprehensive interrogation of CpG island methylation in oral rinse samples. We used the Infinium HumanMethylation450 BeadArray to interrogate DNA methylation in oral rinse samples collected from 154 patients with incident oral or pharyngeal carcinoma prior to treatment and 72 cancer-free control subjects. Subjects were randomly allocated to either a training or a testing set. For each subject, average methylation was calculated for each CpG island represented on the array. We applied a semi-supervised recursively partitioned mixture model to the CpG island methylation data to identify a classifier for prediction of case status in the training set. We then applied the resultant classifier to the testing set for validation and to assess the predictive accuracy. We identified a methylation classifier comprised of 22 CpG islands, which predicted oral and pharyngeal carcinoma with a high degree of accuracy (AUC = 0.92, 95 % CI 0.86, 0.98). This novel methylation panel is a strong predictor of oral and pharyngeal carcinoma case status in oral rinse samples and may have utility in early detection and post-treatment follow-up.

  1. Comparative Transcriptome Analysis of Latex Reveals Molecular Mechanisms Underlying Increased Rubber Yield in Hevea brasiliensis Self-Rooting Juvenile Clones

    PubMed Central

    Li, Hui-Liang; Guo, Dong; Zhu, Jia-Hong; Wang, Ying; Chen, Xiong-Ting; Peng, Shi-Qing

    2016-01-01

    Rubber tree (Hevea brasiliensis) self-rooting juvenile clones (JCs) are promising planting materials for rubber production. In a comparative trial between self-rooting JCs and donor clones (DCs), self-rooting JCs exhibited better performance in rubber yield. To study the molecular mechanism associated with higher rubber yield in self-rooting JCs, we sequenced and comparatively analyzed the latex of rubber tree self-rooting JCs and DCs at the transcriptome level. Total raw reads of 34,632,012 and 35,913,020 bp were obtained from the library of self-rooting JCs and DCs, respectively, by using Illumina HiSeq 2000 sequencing technology. De novo assemblies yielded 54689 unigenes from the library of self-rooting JCs and DCs. Among 54689 genes, 1716 genes were identified as differentially expressed between self-rooting JCs and DCs via comparative transcript profiling. Functional analysis showed that the genes related to the mass of categories were differentially enriched between the two clones. Several genes involved in carbohydrate metabolism, hormone metabolism and reactive oxygen species scavenging were up-regulated in self-rooting JCs, suggesting that the self-rooting JCs provide sufficient molecular basis for the increased rubber yielding, especially in the aspects of improved latex metabolisms and latex flow. Some genes encoding epigenetic modification enzymes were also differentially expressed between self-rooting JCs and DCs. Epigenetic modifications may lead to gene differential expression between self-rooting JCs and DCs. These data will provide new cues to understand the molecular mechanism underlying the improved rubber yield of H. brasiliensis self-rooting clones. PMID:27555864

  2. Comparative Transcriptome Analysis of Latex Reveals Molecular Mechanisms Underlying Increased Rubber Yield in Hevea brasiliensis Self-Rooting Juvenile Clones.

    PubMed

    Li, Hui-Liang; Guo, Dong; Zhu, Jia-Hong; Wang, Ying; Chen, Xiong-Ting; Peng, Shi-Qing

    2016-01-01

    Rubber tree (Hevea brasiliensis) self-rooting juvenile clones (JCs) are promising planting materials for rubber production. In a comparative trial between self-rooting JCs and donor clones (DCs), self-rooting JCs exhibited better performance in rubber yield. To study the molecular mechanism associated with higher rubber yield in self-rooting JCs, we sequenced and comparatively analyzed the latex of rubber tree self-rooting JCs and DCs at the transcriptome level. Total raw reads of 34,632,012 and 35,913,020 bp were obtained from the library of self-rooting JCs and DCs, respectively, by using Illumina HiSeq 2000 sequencing technology. De novo assemblies yielded 54689 unigenes from the library of self-rooting JCs and DCs. Among 54689 genes, 1716 genes were identified as differentially expressed between self-rooting JCs and DCs via comparative transcript profiling. Functional analysis showed that the genes related to the mass of categories were differentially enriched between the two clones. Several genes involved in carbohydrate metabolism, hormone metabolism and reactive oxygen species scavenging were up-regulated in self-rooting JCs, suggesting that the self-rooting JCs provide sufficient molecular basis for the increased rubber yielding, especially in the aspects of improved latex metabolisms and latex flow. Some genes encoding epigenetic modification enzymes were also differentially expressed between self-rooting JCs and DCs. Epigenetic modifications may lead to gene differential expression between self-rooting JCs and DCs. These data will provide new cues to understand the molecular mechanism underlying the improved rubber yield of H. brasiliensis self-rooting clones.

  3. Is Drosophila-microbe association species-specific or region specific? A study undertaken involving six Indian Drosophila species.

    PubMed

    Singhal, Kopal; Khanna, Radhika; Mohanty, Sujata

    2017-06-01

    The present work aims to identify the microbial diversity associated with six Indian Drosophila species using next generation sequencing (NGS) technology and to discover the nature of their distribution across species and eco-geographic regions. Whole fly gDNA of six Drosophila species was used to generate sequences on an Illumina platform using NGS technology. De novo assembled raw reads were searched against the NCBI NR database using BLASTn to identify their bacterial loads. To determine both species-wise and region-wise distribution, we included Drosophila species from different taxonomic groups and subgroups and from three different eco-climatic regions of India; four species are from Central India, while the remaining two, D. melanogaster and D. ananassae, are from West and South India. We detected 33 bacterial genera across all six study species, predominated by the class Proteobacteria. Among all species, D. melanogaster was found to be the most diverse, carrying around 85% of the bacterial diversity. Our findings suggest that the bacterial species inhabiting the Drosophila host are both species-specific and environment-specific. Though the present results are consistent with most of the earlier studies, they are inconsistent with some. The present findings on the host-bacteria association and its species-specific adaptation may provide some insight into host-microbial interactions and the phenotypic implications of microbes for host physiology. The knowledge gained may be applied to recent insect and pest population control strategies based on gut microflora that are to be implemented in India and abroad.

  4. A systematic assessment of normalization approaches for the Infinium 450K methylation platform.

    PubMed

    Wu, Michael C; Joubert, Bonnie R; Kuan, Pei-fen; Håberg, Siri E; Nystad, Wenche; Peddada, Shyamal D; London, Stephanie J

    2014-02-01

    The Illumina Infinium HumanMethylation450 BeadChip has emerged as one of the most popular platforms for genome wide profiling of DNA methylation. While the technology is wide-spread, systematic technical biases are believed to be present in the data. For example, this array incorporates two different chemical assays, i.e., Type I and Type II probes, which exhibit different technical characteristics and potentially complicate the computational and statistical analysis. Several normalization methods have been introduced recently to adjust for possible biases. However, there is considerable debate within the field on which normalization procedure should be used and indeed whether normalization is even necessary. Yet despite the importance of the question, there has been little comprehensive comparison of normalization methods. We sought to systematically compare several popular normalization approaches using the Norwegian Mother and Child Cohort Study (MoBa) methylation data set and the technical replicates analyzed with it as a case study. We assessed both the reproducibility between technical replicates following normalization and the effect of normalization on association analysis. Results indicate that the raw data are already highly reproducible, some normalization approaches can slightly improve reproducibility, but other normalization approaches may introduce more variability into the data. Results also suggest that differences in association analysis after applying different normalizations are not large when the signal is strong, but when the signal is more modest, different normalizations can yield very different numbers of findings that meet a weaker statistical significance threshold. Overall, our work provides useful, objective assessment of the effectiveness of key normalization methods.

  5. Reference-guided assembly of four diverse Arabidopsis thaliana genomes

    PubMed Central

    Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef

    2011-01-01

    We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520

  6. Whole-Body Microbiota of Sea Cucumber (Apostichopus japonicus) from South Korea for Improved Seafood Management.

    PubMed

    Kim, Tae-Yoon; Lee, Jin-Jae; Kim, Bong-Soo; Choi, Sang Ho

    2017-10-28

    Sea cucumber (Apostichopus japonicus) is a popular seafood source in Asia, including South Korea, and its consumption has recently increased with recognition of its medicinal properties. However, because raw sea cucumber contains various microbes, its ingestion can cause foodborne illness. Therefore, analysis of the microbiota in the whole body of sea cucumber can extend our understanding of foodborne illness caused by microorganisms and help to better manage products. We collected 40 sea cucumbers from four different sites in August and November, which are known as the maximum production areas in Korea. The microbiota was analyzed by an Illumina MiSeq system, and bacterial amounts were quantified by real-time PCR. The diversity and bacterial amounts in sea cucumber were higher in August than in November. Alpha-, Beta-, and Gammaproteobacteria were common dominant classes in all samples. However, the microbiota composition differed according to sampling time and site. Staphylococcus warneri and Propionibacterium acnes were commonly detected potential pathogens in August and November samples, respectively. The effect of experimental Vibrio parahaemolyticus infection on the indigenous microbiota of sea cucumber was analyzed at different temperatures, revealing clear alterations of Psychrobacter and Moraxella; thus, these shifts can be used as indicators for monitoring infection of sea cucumber. Although further studies are needed to clarify and understand the virulence and mechanisms of the identified pathogens of sea cucumber, our study provides a valuable reference for determining the potential of foodborne illness caused by sea cucumber ingestion and to develop monitoring strategies of products using microbiota information.

  7. Draft genome of the leopard gecko, Eublepharis macularius.

    PubMed

    Xiong, Zijun; Li, Fang; Li, Qiye; Zhou, Long; Gamble, Tony; Zheng, Jiao; Kui, Ling; Li, Cai; Li, Shengbin; Yang, Huanming; Zhang, Guojie

    2016-10-26

    Geckos are among the most species-rich reptile groups and the sister clade to all other lizards and snakes. Geckos possess a suite of distinctive characteristics, including adhesive digits, nocturnal activity, hard, calcareous eggshells, and a lack of eyelids. However, one gecko clade, the Eublepharidae, appears to be the exception to most of these 'rules' and lacks adhesive toe pads, has eyelids, and lays eggs with soft, leathery eggshells. These differences make eublepharids an important component of any investigation into the underlying genomic innovations contributing to the distinctive phenotypes in 'typical' geckos. We report high-depth genome sequencing, assembly, and annotation for a male leopard gecko, Eublepharis macularius (Eublepharidae). Illumina sequence data were generated from seven insert libraries (ranging from 170 bp to 20 kb), representing a raw sequencing depth of 136X from 303 Gb of data, reduced to 84X and 187 Gb after filtering. The assembled genome of 2.02 Gb was close to the 2.23 Gb estimated by k-mer analysis. Scaffold and contig N50 sizes of 664 and 20 kb, respectively, were comparable to the previously published Gekko japonicus genome. Repetitive elements accounted for 42 % of the genome. Gene annotation yielded 24,755 protein-coding genes, of which 93 % were functionally annotated. CEGMA and BUSCO assessment showed that our assembly captured 91 % (225 of 248) of the core eukaryotic genes, and 76 % of vertebrate universal single-copy orthologs. Assembly of the leopard gecko genome provides a valuable resource for future comparative genomic studies of geckos and other squamate reptiles.
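
    The depth and completeness figures in this abstract are mutually consistent, which a short sketch can illustrate. It assumes that sequencing depth is simply data volume divided by genome size, so dividing the data volume by the depth recovers the genome size the authors presumably used; the helper name `implied_genome_size_gb` is hypothetical.

```python
def implied_genome_size_gb(data_gb: float, depth: float) -> float:
    """Genome size (Gb) implied by a data volume (Gb) and a fold depth."""
    return data_gb / depth


# Raw and filtered data volumes and depths quoted in the abstract.
raw_size = implied_genome_size_gb(303, 136)
filtered_size = implied_genome_size_gb(187, 84)

# CEGMA completeness: 225 of 248 core eukaryotic genes.
cegma_percent = 100 * 225 / 248

print(f"Implied genome size (raw data):      {raw_size:.2f} Gb")
print(f"Implied genome size (filtered data): {filtered_size:.2f} Gb")
print(f"CEGMA completeness:                  {cegma_percent:.0f} %")
```

    Both implied sizes round to the 2.23 Gb k-mer estimate rather than the 2.02 Gb assembly length, which suggests (as an inference, not a statement from the paper) that the depths were computed against the k-mer estimate.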

  8. Draft genome sequence of an extensively drug-resistant Pseudomonas aeruginosa isolate belonging to ST644 isolated from a footpad infection in a Magellanic penguin (Spheniscus magellanicus).

    PubMed

    Sellera, Fábio P; Fernandes, Miriam R; Moura, Quézia; Souza, Tiago A; Nascimento, Cristiane L; Cerdeira, Louise; Lincopan, Nilton

    2018-03-01

    The incidence of multidrug-resistant bacteria in wildlife animals has been investigated to improve our knowledge of the spread of clinically relevant antimicrobial resistance genes. The aim of this study was to report the first draft genome sequence of an extensively drug-resistant (XDR) Pseudomonas aeruginosa ST644 isolate recovered from a Magellanic penguin with a footpad infection (bumblefoot) undergoing rehabilitation process. The genome was sequenced on an Illumina NextSeq® platform using 150-bp paired-end reads. De novo genome assembly was performed using Velvet v.1.2.10, and the whole genome sequence was evaluated using bioinformatics approaches from the Center of Genomic Epidemiology, whereas an in-house method (mapping of raw whole genome sequence reads) was used to identify chromosomal point mutations. The genome size was calculated at 6,436,450 bp, with 6357 protein-coding sequences and the presence of genes conferring resistance to aminoglycosides, β-lactams, phenicols, sulphonamides, tetracyclines, quinolones and fosfomycin; in addition, mutations in the genes gyrA (Thr83Ile), parC (Ser87Leu), phoQ (Arg61His) and pmrB (Tyr345His), conferring resistance to quinolones and polymyxins, respectively, were confirmed. This draft genome sequence can provide useful information for comparative genomic analysis regarding the dissemination of clinically significant antibiotic resistance genes and XDR bacterial species at the human-animal interface. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.

  9. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    PubMed

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. To further understand the ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.

  10. De novo assembly and comparative analysis of root transcriptomes from different varieties of Panax ginseng C. A. Meyer grown in different environments.

    PubMed

    Zhen, Gang; Zhang, Lei; Du, YaNan; Yu, RenBo; Liu, XinMin; Cao, FangRui; Chang, Qi; Deng, XingWang; Xia, Mian; He, Hang

    2015-11-01

    Panax ginseng C. A. Meyer is an important traditional herb in eastern Asia. It contains ginsenosides, which are primary bioactive compounds with medicinal properties. Although ginseng has been cultivated since at least the Ming dynasty to increase production, cultivated ginseng has lower quantities of ginsenosides and lower disease resistance than ginseng grown under natural conditions. We extracted root RNA from six varieties of fifth-year P. ginseng cultivars representing four different growth conditions, and performed Illumina paired-end sequencing. In total, 163,165,706 raw reads were obtained and used to generate a de novo transcriptome that consisted of 151,763 contigs (76,336 unigenes), of which 100,648 contigs (66.3%) were successfully annotated. Differential expression analysis revealed that most differentially expressed genes (DEGs) were upregulated (246 out of 258, 95.3%) in ginseng grown under natural conditions compared with that grown under artificial conditions. These DEGs were enriched in gene ontology (GO) terms including response to stimuli and localization. In particular, some key ginsenoside biosynthesis-related genes, including HMG-CoA synthase (HMGS), mevalonate kinase (MVK), and squalene epoxidase (SE), were upregulated in wild-grown ginseng. Moreover, a high proportion of disease resistance-related genes were upregulated in wild-grown ginseng. This study is the first transcriptome analysis to compare wild-grown and cultivated ginseng, and identifies genes that may produce higher ginsenoside content and better disease resistance in the wild; these genes may have the potential to improve cultivated ginseng grown in artificial environments.

  11. Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium

    PubMed Central

    Grove, Megan L.; Yu, Bing; Cochran, Barbara J.; Haritunians, Talin; Bis, Joshua C.; Taylor, Kent D.; Hansen, Mark; Borecki, Ingrid B.; Cupples, L. Adrienne; Fornage, Myriam; Gudnason, Vilmundur; Harris, Tamara B.; Kathiresan, Sekar; Kraaij, Robert; Launer, Lenore J.; Levy, Daniel; Liu, Yongmei; Mosley, Thomas; Peloso, Gina M.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Siscovick, David S.; Smith, Albert V.; Uitterlinden, Andre; van Duijn, Cornelia M.; Wilson, James G.; O’Donnell, Christopher J.; Rotter, Jerome I.; Boerwinkle, Eric

    2013-01-01

    Genotyping arrays are a cost effective approach when typing previously-identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty that automated clustering algorithms have to accurately detect and assign genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling allows the ability to accurately genotype rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip. PMID:23874508

  12. DeepSAGE Based Differential Gene Expression Analysis under Cold and Freeze Stress in Seabuckthorn (Hippophae rhamnoides L.)

    PubMed Central

    Chaudhary, Saurabh; Sharma, Prakash C.

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries including control (CON), cold stress (CS) and freeze stress (FS). After discarding low quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up regulated and 5383 down regulated genes were identified in three comparative setups i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analysis were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at selecting candidate genes for development of abiotic stress tolerant transgenic plants. PMID:25803684

  13. DeepSAGE based differential gene expression analysis under cold and freeze stress in seabuckthorn (Hippophae rhamnoides L.).

    PubMed

    Chaudhary, Saurabh; Sharma, Prakash C

    2015-01-01

    Seabuckthorn (Hippophae rhamnoides L.), an important plant species of the Indian Himalayas, is well known for its immense medicinal and nutritional value. The plant has the ability to sustain growth in harsh environments of extreme temperatures, drought and salinity. We employed DeepSAGE, a tag based approach, to identify differentially expressed genes under cold and freeze stress in seabuckthorn. In total 36.2 million raw tags including 13.9 million distinct tags were generated using the Illumina sequencing platform for three leaf tissue libraries including control (CON), cold stress (CS) and freeze stress (FS). After discarding low quality tags, 35.5 million clean tags including 7 million distinct clean tags were obtained. In all, 11922 differentially expressed genes (DEGs) including 6539 up regulated and 5383 down regulated genes were identified in three comparative setups i.e. CON vs CS, CON vs FS and CS vs FS. Gene ontology and KEGG pathway analysis were performed to assign gene ontology terms to DEGs and ascertain their biological functions. DEGs were mapped back to our existing seabuckthorn transcriptome assembly comprising 88,297 putative unigenes, leading to the identification of 428 cold and freeze stress responsive genes. Expression of 22 randomly selected DEGs was validated using qRT-PCR, which further supported our DeepSAGE results. The present study provided a comprehensive view of the global gene expression profile of seabuckthorn under cold and freeze stresses. The DeepSAGE data could also serve as a valuable resource for further functional genomics studies aimed at selecting candidate genes for development of abiotic stress tolerant transgenic plants.

  14. Draft genome of the reindeer (Rangifer tarandus).

    PubMed

    Li, Zhipeng; Lin, Zeshan; Ba, Hengxing; Chen, Lei; Yang, Yongzhi; Wang, Kun; Qiu, Qiang; Wang, Wen; Li, Guangyu

    2017-12-01

    The reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family, and it is the only cervid with a circumpolar distribution. Unlike all other cervids, female reindeer, as well as males, regularly grow cranial appendages (antlers, the defining characteristics of cervids). Moreover, reindeer milk contains more protein and less lactose than bovids' milk. A high-quality reference genome of this species will assist efforts to elucidate these and other important features in the reindeer. We obtained 615 Gb (Gigabase) of usable sequences by filtering the low-quality reads of the raw data generated from the Illumina Hiseq 4000 platform, and a 2.64-Gb final assembly, representing 95.7% of the estimated genome (2.76 Gb according to k-mer analysis), including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilo base (kb) and 0.94 mega base (Mb), respectively. We annotated 21 555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1339 snRNA, and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and ancestors of Bos taurus and Capra hircus is estimated to be about 29.5 million years ago. Our results provide the first high-quality reference genome for the reindeer and a valuable resource for studying the evolution, domestication, and other unusual characteristics of the reindeer. © The Authors 2017. Published by Oxford University Press.

  15. A filtering method to generate high quality short reads using illumina paired-end technology.

    PubMed

    Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L

    2013-01-01

    Consensus between independent reads improves the accuracy of genome and transcriptome analyses; however, lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in the Python programming language, with user instructions, can be obtained from https://github.com/meren/illumina-utils.

  16. ChAMP: updated methylation analysis pipeline for Illumina BeadChips.

    PubMed

    Tian, Yuan; Morris, Tiffany J; Webster, Amy P; Yang, Zhen; Beck, Stephan; Feber, Andrew; Teschendorff, Andrew E

    2017-12-15

    The Illumina Infinium HumanMethylationEPIC BeadChip is the new platform for high-throughput DNA methylation analysis, effectively doubling the coverage compared to the older 450k array. Here we present a significantly updated and improved version of the Bioconductor package ChAMP, which can be used to analyze EPIC and 450k data. Many enhanced functionalities have been added, including correction for cell-type heterogeneity, network analysis and a series of interactive graphical user interfaces. Availability: ChAMP is a BioC package available from https://bioconductor.org/packages/release/bioc/html/ChAMP.html. Contact: a.teschendorff@ucl.ac.uk or s.beck@ucl.ac.uk or a.feber@ucl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  17. Isolation and characterization of microsatellite markers for Jasminum sambac (Oleaceae) using Illumina shotgun sequencing

    PubMed Central

    Li, Yong; Zhang, Weirui

    2015-01-01

    Premise of the study: Microsatellite markers of Jasminum sambac (Oleaceae) were isolated to investigate wild germplasm resources and provide markers for breeding. Methods and Results: Illumina sequencing was used to isolate microsatellite markers from the transcriptome of J. sambac. A total of 1322 microsatellites were identified from 49,772 assembled unigenes. One hundred primer pairs were randomly selected to verify primer amplification efficiency. Out of these tested primer pairs, 31 were successfully amplified: 18 primer pairs yielded a single allele, seven exhibited fixed heterozygosity with two alleles, and only six displayed polymorphisms. Conclusions: This study obtained the first set of microsatellite markers for J. sambac, which will be helpful for the assessment of wild germplasm resources and the development of molecular marker–assisted breeding. PMID:26504683

  18. Analysis of microbial community variation during the mixed culture fermentation of agricultural peel wastes to produce lactic acid.

    PubMed

    Liang, Shaobo; Gliniewicz, Karol; Gerritsen, Alida T; McDonald, Armando G

    2016-05-01

    Mixed culture fermentation can be used to convert organic wastes into various chemicals and fuels. This study examined the fermentation performance of four batch reactors fed with different agricultural (orange, banana, and potato (mechanical and steam)) peel wastes using mixed cultures, and monitored the interval variation of reactor microbial communities with 16S rRNA genes using Illumina sequencing. All four reactors produced a similar chemical profile, with lactic acid (LA) as the dominant compound. Acetic acid and ethanol were also observed in small fractions. The Illumina sequencing results revealed that the diversity of the microbial community decreased during fermentation and that a community of largely lactic acid producing bacteria, dominated by species of Lactobacillus, developed. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Transcriptome-Based Differentiation of Closely-Related Miscanthus Lines

    DOE PAGES

    Chouvarine, Philippe; Cooksey, Amanda M.; McCarthy, Fiona M.; ...

    2012-01-10

    Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations.

  20. Characterization of microsatellite loci for an Australian epiphytic orchid, Dendrobium calamiforme, using Illumina sequencing

    PubMed Central

    Trapnell, Dorset W.; Beasley, Rochelle R.; Lance, Stacey L.; Field, Ashley R.; Jones, Kenneth L.

    2015-01-01

    Premise of the study: Microsatellite loci were developed for the epiphytic pencil orchid Dendrobium calamiforme for population genetic and phylogeographic investigation of this Australian taxon. Methods and Results: Nineteen microsatellite loci were identified from an Illumina paired-end shotgun library of D. calamiforme. Polymorphism and genetic diversity were assessed in 24 individuals from five populations separated by a maximum distance of ∼80 km. All loci were polymorphic with two to 14 alleles per locus, expected heterozygosity ranging from 0.486 to 0.902, and probability of identity values ranging from 0.018 to 0.380. Conclusions: These novel markers will serve as valuable tools for investigation of levels of genetic diversity as well as patterns of gene flow, genetic structure, and phylogeographic history. PMID:26082878

  1. Characterization of microsatellite loci for an Australian epiphytic orchid, Dendrobium calamiforme, using illumina sequencing

    DOE PAGES

    Trapnell, Dorset W.; Beasley, Rochelle R.; Lance, Stacey L.; ...

    2015-06-05

    Our premise describes how microsatellite loci were developed for the epiphytic pencil orchid Dendrobium calamiforme for population genetic and phylogeographic investigation of this Australian taxon. Nineteen microsatellite loci were identified from an Illumina paired-end shotgun library of D. calamiforme. Polymorphism and genetic diversity were assessed in 24 individuals from five populations separated by a maximum distance of ~80 km. All loci were polymorphic with two to 14 alleles per locus, expected heterozygosity ranging from 0.486 to 0.902, and probability of identity values ranging from 0.018 to 0.380. In conclusion, these novel markers will serve as valuable tools for investigation of levels of genetic diversity as well as patterns of gene flow, genetic structure, and phylogeographic history.

  2. 40 CFR 63.1348 - Standards for affected sources other than kilns; in-line kiln/raw mills; clinker coolers; new and...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... than kilns; in-line kiln/raw mills; clinker coolers; new and reconstructed raw material dryers; and raw...; in-line kiln/raw mills; clinker coolers; new and reconstructed raw material dryers; and raw and finish mills. The owner or operator of each new or existing raw material, clinker, or finished product...

  3. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing.

    PubMed

    Zhang, Jin; Ruhlman, Tracey A; Mower, Jeffrey P; Jansen, Robert K

    2013-12-29

    Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. 
In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants.

  4. Comparative analyses of two Geraniaceae transcriptomes using next-generation sequencing

    PubMed Central

    2013-01-01

    Background Organelle genomes of Geraniaceae exhibit several unusual evolutionary phenomena compared to other angiosperm families including accelerated nucleotide substitution rates, widespread gene loss, reduced RNA editing, and extensive genomic rearrangements. Since most organelle-encoded proteins function in multi-subunit complexes that also contain nuclear-encoded proteins, it is likely that the atypical organellar phenomena affect the evolution of nuclear genes encoding organellar proteins. To begin to unravel the complex co-evolutionary interplay between organellar and nuclear genomes in this family, we sequenced nuclear transcriptomes of two species, Geranium maderense and Pelargonium x hortorum. Results Normalized cDNA libraries of G. maderense and P. x hortorum were used for transcriptome sequencing. Five assemblers (MIRA, Newbler, SOAPdenovo, SOAPdenovo-trans [SOAPtrans], Trinity) and two next-generation technologies (454 and Illumina) were compared to determine the optimal transcriptome sequencing approach. Trinity provided the highest quality assembly of Illumina data with the deepest transcriptome coverage. An analysis to determine the amount of sequencing needed for de novo assembly revealed diminishing returns of coverage and quality with data sets larger than sixty million Illumina paired end reads for both species. The G. maderense and P. x hortorum transcriptomes contained fewer transcripts encoding the PLS subclass of PPR proteins relative to other angiosperms, consistent with reduced mitochondrial RNA editing activity in Geraniaceae. In addition, transcripts for all six plastid targeted sigma factors were identified in both transcriptomes, suggesting that one of the highly divergent rpoA-like ORFs in the P. x hortorum plastid genome is functional. 
Conclusions The findings support the use of the Illumina platform and assemblers optimized for transcriptome assembly, such as Trinity or SOAPtrans, to generate high-quality de novo transcriptomes with broad coverage. In addition, results indicated no major improvements in breadth of coverage with data sets larger than six billion nucleotides or when sampling RNA from four tissue types rather than from a single tissue. Finally, this work demonstrates the power of cross-compartmental genomic analyses to deepen our understanding of the correlated evolution of the nuclear, plastid, and mitochondrial genomes in plants. PMID:24373163

  5. Short read Illumina data for the de novo assembly of a non-model snail species transcriptome (Radix balthica, Basommatophora, Pulmonata), and a comparison of assembler performance

    PubMed Central

    2011-01-01

    Background Until recently, read lengths on the Solexa/Illumina system were too short to reliably assemble transcriptomes without a reference sequence, especially for non-model organisms. However, with read lengths up to 100 nucleotides available in the current version, an assembly without reference genome should be possible. For this study we created an EST data set for the common pond snail Radix balthica by Illumina sequencing of a normalized transcriptome. Performance of three different short read assemblers was compared with respect to: the number of contigs, their length, depth of coverage, their quality in various BLAST searches and the alignment to mitochondrial genes. Results A single sequencing run of a normalized RNA pool resulted in 16,923,850 paired end reads with median read length of 61 bases. The assemblies generated by VELVET, OASES, and SeqMan NGEN differed in the total number of contigs, contig length, the number and quality of gene hits obtained by BLAST searches against various databases, and contig performance in the mt genome comparison. While VELVET produced the highest overall number of contigs, a large fraction of these were of small size (< 200bp), and gave redundant hits in BLAST searches and the mt genome alignment. The best overall contig performance resulted from the NGEN assembly. It produced the second largest number of contigs, which on average were comparable to the OASES contigs but gave the highest number of gene hits in two out of four BLAST searches against different reference databases. A subsequent meta-assembly of the four contig sets resulted in larger contigs, less redundancy and a higher number of BLAST hits. Conclusion Our results document the first de novo transcriptome assembly of a non-model species using Illumina sequencing data. 
We show that de novo transcriptome assembly using this approach yields results useful for downstream applications, in particular if a meta-assembly of contig sets is used to increase contig quality. These results highlight the ongoing need for improvements in assembly methodology. PMID:21679424

  6. Comparative performance of the BGISEQ-500 vs Illumina HiSeq2500 sequencing platforms for palaeogenomic sequencing

    PubMed Central

    Mak, Sarah Siu Tze; Gopalakrishnan, Shyam; Carøe, Christian; Geng, Chunyu; Liu, Shanlin; Sinding, Mikkel-Holger S; Kuderna, Lukas F K; Zhang, Wenwei; Fu, Shujin; Vieira, Filipe G; Germonpré, Mietje; Bocherens, Hervé; Fedorov, Sergey; Petersen, Bent; Sicheritz-Pontén, Thomas; Marques-Bonet, Tomas; Zhang, Guojie; Jiang, Hui; Gilbert, M Thomas P

    2017-01-01

    Ancient DNA research has been revolutionized following development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series are the dominant choice today, mainly because of high production capacities and short read production. Recently a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable with the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (v. 0.309), and sequence clonality (P = 0.093). Small significant differences were found in single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for BGISEQ-500, P = 0.012). This may result from the differences in amplification cycles used to polymerase chain reaction–amplify the libraries. A significant difference was also observed in the mitochondrial DNA percentages recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondria that were sequenced from 3 of the samples with overall very low levels of endogenous DNA. 
Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 holds the potential to represent a valid and potentially valuable alternative platform for palaeogenomic data generation that is worthy of future exploration by those interested in the sequencing and analysis of degraded DNA. PMID:28854615

  7. missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform.

    PubMed

    Phipson, Belinda; Maksimovic, Jovana; Oshlack, Alicia

    2016-01-15

    DNA methylation is one of the most commonly studied epigenetic modifications due to its role in both disease and development. The Illumina HumanMethylation450 BeadChip is a cost-effective way to profile >450 000 CpGs across the human genome, making it a popular platform for profiling DNA methylation. Here we introduce missMethyl, an R package with a suite of tools for performing normalization, removal of unwanted variation in differential methylation analysis, differential variability testing and gene set analysis for the 450K array. Availability: missMethyl is an R package available from the Bioconductor project at www.bioconductor.org. Contact: alicia.oshlack@mcri.edu.au. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.

  8. The correlation of methylation levels measured using Illumina 450K and EPIC BeadChips in blood samples

    PubMed Central

    Logue, Mark W; Smith, Alicia K; Wolf, Erika J; Maniates, Hannah; Stone, Annjanette; Schichman, Steven A; McGlinchey, Regina E; Milberg, William; Miller, Mark W

    2017-01-01

    Aim: We examined concordance of methylation levels across the Illumina Infinium HumanMethylation450 BeadChip and the Infinium MethylationEPIC BeadChip. Methods: We computed the correlation for 145 whole blood DNA samples at each of the 422,524 CpG sites measured by both chips. Results: The correlation at some sites was high (up to r = 0.95), but many sites had low correlation (55% had r < 0.20). The low correspondence between 450K and EPIC measured methylation values at many loci was largely due to the low variability in methylation values for the majority of the CpG sites in blood. Conclusion: Filtering out probes based on the observed correlation or low variability may increase reproducibility of BeadChip-based epidemiological studies. PMID:28809127

  9. Assessment of DNA extracted from FTA® cards for use on the Illumina iSelect BeadChip

    PubMed Central

    McClure, Matthew C; McKay, Stephanie D; Schnabel, Robert D; Taylor, Jeremy F

    2009-01-01

    Background As FTA® cards provide an ideal medium for the field collection of DNA we sought to assess the quality of genomic DNA extracted from this source for use on the Illumina BovineSNP50 iSelect BeadChip which requires unbound, relatively intact (fragment sizes ≥ 2 kb), and high-quality DNA. Bovine blood and nasal swab samples collected on FTA cards were extracted using the commercially available GenSolve kit with a minor modification. The call rate and concordance of genotypes from each sample were compared to those obtained from whole blood samples extracted by standard PCI extraction. Findings An ANOVA analysis indicated no significant difference (P > 0.72) in BovineSNP50 genotype call rate between DNA extracted from FTA cards by the GenSolve kit or extracted from whole blood by PCI. Two sample t-tests demonstrated that the DNA extracted from the FTA cards produced genotype call and concordance rates that were not different to those produced by assaying DNA samples extracted by PCI from whole blood. Conclusion We conclude that DNA extracted from FTA cards by the GenSolve kit is of sufficiently high quality to produce results comparable to those obtained from DNA extracted from whole blood when assayed by the Illumina iSelect technology. Additionally, we validate the use of nasal swabs as an alternative to venous blood or buccal samples from animal subjects for reliably producing high quality genotypes on this platform. PMID:19531223

  10. Assessment of DNA extracted from FTA cards for use on the Illumina iSelect BeadChip.

    PubMed

    McClure, Matthew C; McKay, Stephanie D; Schnabel, Robert D; Taylor, Jeremy F

    2009-06-16

    As FTA cards provide an ideal medium for the field collection of DNA we sought to assess the quality of genomic DNA extracted from this source for use on the Illumina BovineSNP50 iSelect BeadChip which requires unbound, relatively intact (fragment sizes ≥ 2 kb), and high-quality DNA. Bovine blood and nasal swab samples collected on FTA cards were extracted using the commercially available GenSolve kit with a minor modification. The call rate and concordance of genotypes from each sample were compared to those obtained from whole blood samples extracted by standard PCI extraction. An ANOVA analysis indicated no significant difference (P > 0.72) in BovineSNP50 genotype call rate between DNA extracted from FTA cards by the GenSolve kit or extracted from whole blood by PCI. Two sample t-tests demonstrated that the DNA extracted from the FTA cards produced genotype call and concordance rates that were not different to those produced by assaying DNA samples extracted by PCI from whole blood. We conclude that DNA extracted from FTA cards by the GenSolve kit is of sufficiently high quality to produce results comparable to those obtained from DNA extracted from whole blood when assayed by the Illumina iSelect technology. Additionally, we validate the use of nasal swabs as an alternative to venous blood or buccal samples from animal subjects for reliably producing high quality genotypes on this platform.

  11. Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

    PubMed Central

    2013-01-01

    Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silico chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206

  12. "Gap hunting" to characterize clustered probe signals in Illumina methylation array data.

    PubMed

    Andrews, Shan V; Ladd-Acosta, Christine; Feinberg, Andrew P; Hansen, Kasper D; Fallin, M Daniele

    2016-01-01

    The Illumina 450k array has been widely used in epigenetic association studies. Current quality-control (QC) pipelines typically remove certain sets of probes, such as those containing a SNP or with multiple mapping locations. An additional set of potentially problematic probes are those with DNA methylation distributions characterized by two or more distinct clusters separated by gaps. Data-driven identification of such probes may offer additional insights for downstream analyses. We developed a procedure, termed "gap hunting," to identify probes showing clustered distributions. Among 590 peripheral blood samples from the Study to Explore Early Development, we identified 11,007 "gap probes." The vast majority (9199) are likely attributed to an underlying SNP(s) or other variant in the probe, although SNP-affected probes exist that do not produce gap signals. Specific factors predict which SNPs lead to gap signals, including type of nucleotide change, probe type, DNA strand, and overall methylation state. These expected effects are demonstrated in paired genotype and 450k data on the same samples. Gap probes can also serve as a surrogate for the local genetic sequence on a haplotype scale and can be used to adjust for population stratification. The characteristics of gap probes reflect potentially informative biology. QC pipelines may benefit from an efficient data-driven approach that "flags" gap probes, rather than filtering such probes, followed by careful interpretation of downstream association analyses. Our results should translate directly to the recently released Illumina EPIC array given the similar chemistry and content design.
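
    The idea of a probe whose methylation values fall into two or more clusters separated by gaps can be illustrated with a short sketch. This is a simplified illustration only, not the authors' published procedure: the function names and the 0.05 gap threshold are assumptions chosen for the example, and any outlier handling the real method applies is omitted.

```python
from typing import List, Sequence


def find_clusters(betas: Sequence[float], gap_threshold: float = 0.05) -> List[List[float]]:
    """Split one probe's beta values into clusters separated by gaps.

    betas: methylation beta values (0..1) for a single probe across samples.
    gap_threshold: hypothetical minimum distance between neighbouring sorted
        values that counts as a gap (an assumption for this sketch).
    """
    ordered = sorted(betas)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > gap_threshold:
            clusters.append([current])    # gap found: start a new cluster
        else:
            clusters[-1].append(current)  # no gap: extend the current cluster
    return clusters


def is_gap_probe(betas: Sequence[float], gap_threshold: float = 0.05) -> bool:
    """Flag a probe when its values form two or more clusters."""
    return len(find_clusters(betas, gap_threshold)) >= 2


# Toy data: three groups of samples, as an underlying SNP might produce.
snp_like = [0.10, 0.12, 0.11, 0.48, 0.50, 0.52, 0.90, 0.88]
unimodal = [0.50, 0.52, 0.54, 0.51]
print(len(find_clusters(snp_like)), is_gap_probe(snp_like), is_gap_probe(unimodal))
```

    In line with the abstract's recommendation, a pipeline built on such a check would flag the probe for later interpretation rather than drop it.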

  13. In Vivo Isotopic Labeling of Symbiotic Bacteria Involved in Cellulose Degradation and Nitrogen Recycling within the Gut of the Forest Cockchafer (Melolontha hippocastani).

    PubMed

    Alonso-Pernas, Pol; Bartram, Stefan; Arias-Cordero, Erika M; Novoselov, Alexey L; Halty-deLeon, Lorena; Shao, Yongqi; Boland, Wilhelm

    2017-01-01

    The guts of insects harbor symbiotic bacterial communities. However, due to their complexity, it is challenging to relate a specific symbiotic phylotype to its corresponding function. In the present study, we focused on the forest cockchafer (Melolontha hippocastani), a phytophagous insect with a dual life cycle, consisting of a root-feeding larval stage and a leaf-feeding adult stage. By combining in vivo stable isotope probing (SIP) with ¹³C cellulose and ¹⁵N urea as trophic links, with Illumina MiSeq (Illumina-SIP), we unraveled bacterial networks processing recalcitrant dietary components and recycling nitrogenous waste. The bacterial communities behind these processes change between larval and adult stages. In ¹³C cellulose-fed insects, the bacterial families Lachnospiraceae and Enterobacteriaceae were isotopically labeled in larvae and adults, respectively. In ¹⁵N urea-fed insects, the genera Burkholderia and Parabacteroides were isotopically labeled in larvae and adults, respectively. Additionally, the PICRUSt-predicted metagenome suggested that the ¹³C cellulose- and ¹⁵N urea-labeled bacteria may be able to degrade hemicellulose and to produce amino acids, respectively. The incorporation of ¹⁵N from ingested urea back into the insect body was confirmed, in larvae and adults, by isotope ratio mass spectrometry (IRMS). Besides highlighting key bacterial symbionts of the gut of M. hippocastani, this study provides an example of how Illumina-SIP with multiple trophic links can be used to target microorganisms embracing different roles within an environment.

  14. 5' Rapid Amplification of cDNA Ends and Illumina MiSeq Reveals B Cell Receptor Features in Healthy Adults, Adults With Chronic HIV-1 Infection, Cord Blood, and Humanized Mice.

    PubMed

    Waltari, Eric; Jia, Manxue; Jiang, Caroline S; Lu, Hong; Huang, Jing; Fernandez, Cristina; Finzi, Andrés; Kaufmann, Daniel E; Markowitz, Martin; Tsuji, Moriya; Wu, Xueling

    2018-01-01

    Using 5' rapid amplification of cDNA ends, Illumina MiSeq, and basic flow cytometry, we systematically analyzed the expressed B cell receptor (BCR) repertoire in 14 healthy adult PBMCs, 5 HIV-1+ adult PBMCs, 5 cord blood samples, and 3 HIS-CD4/B mice, examining the full-length variable region of μ, γ, α, κ, and λ chains for V-gene usage, somatic hypermutation (SHM), and CDR3 length. Adding to the known repertoire of healthy adults, Illumina MiSeq consistently detected small fractions of reads with high mutation frequencies including hypermutated μ reads, and reads with long CDR3s. Additionally, the less studied IgA repertoire displayed similar characteristics to that of IgG. Compared to healthy adults, the five HIV-1 chronically infected adults displayed elevated mutation frequencies for all μ, γ, α, κ, and λ chains examined and slightly longer CDR3 lengths for γ, α, and λ. To evaluate the reconstituted human BCR sequences in a humanized mouse model, we analyzed cord blood and HIS-CD4/B mice, which all lacked the typical SHM seen in the adult reference. Furthermore, MiSeq revealed identical unmutated IgM sequences derived from separate cell aliquots, thus for the first time demonstrating rare clonal members of unmutated IgM B cells by sequencing.

  15. ALLPATHS: Assembling Large Genomes with Short Illumina Reads

    ScienceCinema

    Gnerre, Sante

    2018-02-06

    Sante Gnerre from the Broad Institute speaks on the challenge of developing high quality assemblies of large genomes using short reads at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  16. Food safety hazards associated with consumption of raw milk.

    PubMed

    Oliver, Stephen P; Boor, Kathryn J; Murphy, Steven C; Murinda, Shelton E

    2009-09-01

    An increasing number of people are consuming raw unpasteurized milk. Enhanced nutritional qualities, taste, and health benefits have all been advocated as reasons for increased interest in raw milk consumption. However, science-based data to substantiate these claims are limited. People continue to consume raw milk even though numerous epidemiological studies have shown clearly that raw milk can be contaminated by a variety of pathogens, some of which are associated with human illness and disease. Several documented milkborne disease outbreaks occurred from 2000-2008 and were traced back to consumption of raw unpasteurized milk. Numerous people were found to have infections, some were hospitalized, and a few died. In the majority of these outbreaks, the organism associated with the milkborne outbreak was isolated from the implicated product(s) or from subsequent products made at the suspected dairy or source. In contrast, fewer milkborne disease outbreaks were associated with consumption of pasteurized milk during this same time period. Twenty nine states allow the sale of raw milk by some means. Direct purchase, cow-share or leasing programs, and the sale of raw milk as pet food have been used as means for consumers to obtain raw milk. Where raw milk is offered for sale, strategies to reduce risks associated with raw milk and products made from raw milk are needed. Developing uniform regulations including microbial standards for raw milk to be sold for human consumption, labeling of raw milk, improving sanitation during milking, and enhancing and targeting educational efforts are potential approaches to this issue. Development of pre- and postharvest control measures to effectively reduce contamination is critical to the control of pathogens in raw milk. One sure way to prevent raw milk-associated foodborne illness is for consumers to refrain from drinking raw milk and from consuming dairy products manufactured using raw milk.

  17. Illumina GA IIx & HiSeq 2000 Production Sequencing and QC Analysis Pipelines at the DOE Joint Genome Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daum, Christopher; Zane, Matthew; Han, James

    2011-01-31

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.

  18. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping.

    PubMed

    Brumme, Chanson J; Poon, Art F Y

    2017-07-15

    Genetic sequencing ("genotyping") plays a critical role in the modern clinical management of HIV infection. This virus evolves rapidly within patients because of its error-prone reverse transcriptase and short generation time. Consequently, HIV variants with mutations that confer resistance to one or more antiretroviral drugs can emerge during sub-optimal treatment. There are now multiple HIV drug resistance interpretation algorithms that take the region of the HIV genome encoding the major drug targets as inputs; expert use of these algorithms can significantly improve clinical outcomes in HIV treatment. Next-generation sequencing has the potential to revolutionize HIV resistance genotyping by lowering the threshold at which rare but clinically significant HIV variants can be detected reproducibly, and by conferring improved cost-effectiveness in high-throughput scenarios. In this review, we discuss the relative merits and challenges of deploying the Illumina MiSeq instrument for clinical HIV genotyping. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Impact of Compounding Error on Strategies for Subtyping Pathogenic Bacteria

    PubMed Central

    Orfe, Lisa; Davis, Margaret A.; Lafrentz, Stacey; Kang, Min-Su

    2008-01-01

    Comparative-omics will identify a multitude of markers that can be used for intraspecific discrimination between strains of bacteria. It seems intuitive that with this plethora of markers we can construct higher resolution subtyping assays using discrete markers to define strain “barcodes.” Unfortunately, with each new marker added to an assay, overall assay robustness declines because errors are compounded exponentially. For example, the difference in accuracy of strain classification for an assay with 60 markers will change from 99.9% to 54.7% when average probe accuracy declines from 99.999% to 99.0%. To illustrate this effect empirically, we constructed a 19 probe bead-array for subtyping Listeria monocytogenes and showed that despite seemingly reliable individual probe accuracy (>97%), our best classification results at the strain level were <75%. A more robust strategy would use as few markers as possible to achieve strain discrimination. Consequently, we developed two variable number of tandem repeat (VNTR) assays (Vibrio parahaemolyticus and L. monocytogenes) and demonstrate that these assays along with a published assay (Salmonella enterica) produce robust results when products were machine scored. The discriminatory ability with four to seven VNTR loci was comparable to pulsed-field gel electrophoresis. Passage experiments showed some instability with ca. 5% of passaged lines showing evidence for new alleles within 30 days (V. parahaemolyticus and S. enterica). Changes were limited to a single locus and allele so conservative rules can be used to determine strain matching. Most importantly, VNTRs appear robust and portable and can clearly discriminate between strains with relatively few loci thereby limiting effects of compounding error. PMID:18713065
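
    The 60-marker figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that probe errors are independent and that a strain is classified correctly only when every marker is scored correctly; the function name is illustrative and does not come from the paper.

```python
def assay_accuracy(probe_accuracy: float, n_markers: int) -> float:
    """Probability that all markers are scored correctly.

    Assumes independent errors across probes, so the per-probe
    accuracies multiply: accuracy ** n_markers.
    """
    return probe_accuracy ** n_markers


# Reproduce the 60-marker example from the abstract.
for p in (0.99999, 0.99):
    print(f"probe accuracy {p:.3%}, 60 markers -> assay accuracy {assay_accuracy(p, 60):.1%}")

# The same model applied to a 19-probe assay at exactly 97% per probe.
print(f"19 probes at 97% -> {assay_accuracy(0.97, 19):.1%}")
```

    Under the same independence assumption, a 19-probe assay at exactly 97% per-probe accuracy would be expected to classify only a little over half of strains correctly, which is consistent with the reported strain-level result of below 75%.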

  20. Assembly of highly repetitive genomes using short reads: the genome of discrete typing unit III Trypanosoma cruzi strain 231.

    PubMed

    Baptista, Rodrigo P; Reis-Cunha, Joao Luis; DeBarry, Jeremy D; Chiari, Egler; Kissinger, Jessica C; Bartholomeu, Daniella C; Macedo, Andrea M

    2018-02-14

    Next-generation sequencing (NGS) methods are low-cost high-throughput technologies that produce thousands to millions of sequence reads. Despite the high number of raw sequence reads, their short length, relative to Sanger, PacBio or Nanopore reads, complicates the assembly of genomic repeats. Many genome tools are available, but the assembly of highly repetitive genome sequences using only NGS short reads remains challenging. Genome assembly of organisms responsible for important neglected diseases such as Trypanosoma cruzi, the aetiological agent of Chagas disease, is known to be challenging because of their repetitive nature. Only three of six recognized discrete typing units (DTUs) of the parasite have their draft genomes published and therefore genome evolution analyses in the taxon are limited. In this study, we developed a computational workflow to assemble highly repetitive genomes via a combination of de novo and reference-based assembly strategies to better overcome the intrinsic limitations of each, based on Illumina reads. The highly repetitive genome of the human-infecting parasite T. cruzi 231 strain was used as a test subject. The combined-assembly approach shown in this study benefits from the reference-based assembly ability to resolve highly repetitive sequences and from the de novo capacity to assemble genome-specific regions, improving the quality of the assembly. The acceptable confidence obtained by analyzing our results showed that our combined approach is an attractive option to assemble highly repetitive genomes with NGS short reads. Phylogenomic analysis including the 231 strain, the first representative of DTU III whose genome was sequenced, was also performed and provides new insights into T. cruzi genome evolution.

  1. Genomic Characterization of a Novel Phage Found in Black Abalone (Haliotis cracherodii) Infected with Withering Syndrome

    NASA Astrophysics Data System (ADS)

    Closek, C. J.; Langevin, S.; Burge, C. A.; Crosson, L.; White, S.; Friedman, C. S.

    2016-02-01

    Withering syndrome (WS), caused by the bacterium Candidatus Xenohaliotis californiensis, a Rickettsia-like organism (RLO), infects many species of abalone. Black abalone (Haliotis cracherodii), one of two endangered species of abalone, has experienced high population losses along the California coast due to WS. Recently, we observed reduced pathogenicity and mortality events in RLO-infected abalone when a novel bacteriophage (phage) was also present. To better understand phage-bacterium dynamics and develop more informative diagnostic tools, we sequenced the genome of the novel phage associated with the RLO responsible for WS. Metagenomic sequencing libraries were prepared with extracted genomic DNA from two experimentally infected H. cracherodii and phage sequences were enriched using hydroxyapatite chromatography normalization. Normalized libraries were individually barcoded and sequenced with Illumina MiSeq. Raw sequence reads were processed using VIrominer and de novo assembly produced one single phage-like contig (35.7Kb) from the experimentally infected abalone. This highly divergent genome had closest homology with a virus associated with abalone shriveling syndrome (SS). Of the 34 predicted ORFs, overlapping homology with the SS virus ranged from 20-72%, demonstrating the phage sequenced is genetically distinct from any known phage. The phage-like sequences represented a significant portion of the total reads sequenced (2 million of the 12 million paired-end reads; 17%) and we obtained 94,000X coverage across the novel phage genome. Beyond characterization of this novel phage, which appears to reduce pathogenicity of the RLO, the genome enabled us to develop quantitative PCR and in situ hybridization assays as diagnostic tools. These tools allow us to detect and quantify this phage in the endangered H. cracherodii.

  2. Blood transcriptomic comparison of individuals with and without autism spectrum disorder: A combined-samples mega-analysis.

    PubMed

    Tylee, Daniel S; Hess, Jonathan L; Quinn, Thomas P; Barve, Rahul; Huang, Hailiang; Zhang-James, Yanli; Chang, Jeffrey; Stamova, Boryana S; Sharp, Frank R; Hertz-Picciotto, Irva; Faraone, Stephen V; Kong, Sek Won; Glatt, Stephen J

    2017-04-01

    Blood-based microarray studies comparing individuals affected with autism spectrum disorder (ASD) and typically developing individuals help characterize differences in circulating immune cell functions and offer potential biomarker signal. We sought to combine the subject-level data from previously published studies by mega-analysis to increase the statistical power. We identified studies that compared ex vivo blood or lymphocytes from ASD-affected individuals and unrelated comparison subjects using Affymetrix or Illumina array platforms. Raw microarray data and clinical meta-data were obtained from seven studies, totaling 626 affected and 447 comparison subjects. Microarray data were processed using uniform methods. Covariate-controlled mixed-effect linear models were used to identify gene transcripts and co-expression network modules that were significantly associated with diagnostic status. Permutation-based gene-set analysis was used to identify functionally related sets of genes that were over- and under-expressed among ASD samples. Our results were consistent with diminished interferon-, EGF-, PDGF-, PI3K-AKT-mTOR-, and RAS-MAPK-signaling cascades, and increased ribosomal translation and NK-cell related activity in ASD. We explored evidence for sex-differences in the ASD-related transcriptomic signature. We also demonstrated that machine-learning classifiers using blood transcriptome data perform with moderate accuracy when data are combined across studies. Comparing our results with those from blood-based studies of protein biomarkers (e.g., cytokines and trophic factors), we propose that ASD may feature decoupling between certain circulating signaling proteins (higher in ASD samples) and the transcriptional cascades which they typically elicit within circulating immune cells (lower in ASD samples). These findings provide insight into ASD-related transcriptional differences in circulating immune cells. © 2016 Wiley Periodicals, Inc. 

  3. High-coverage sequencing and annotated assembly of the genome of the Australian dragon lizard Pogona vitticeps.

    PubMed

    Georges, Arthur; Li, Qiye; Lian, Jinmin; O'Meally, Denis; Deakin, Janine; Wang, Zongji; Zhang, Pei; Fujita, Matthew; Patel, Hardip R; Holleley, Clare E; Zhou, Yang; Zhang, Xiuwen; Matsubara, Kazumi; Waters, Paul; Graves, Jennifer A Marshall; Sarre, Stephen D; Zhang, Guojie

    2015-01-01

    The lizards of the family Agamidae are one of the most prominent elements of the Australian reptile fauna. Here, we present a genomic resource built on the basis of a wild-caught male ZZ central bearded dragon Pogona vitticeps. The genomic sequence for P. vitticeps, generated on the Illumina HiSeq 2000 platform, comprised 317 Gbp (179X raw read depth) from 13 insert libraries ranging from 250 bp to 40 kbp. After filtering for low-quality and duplicated reads, 146 Gbp of data (83X) was available for assembly. Exceptionally high levels of heterozygosity (0.85 % of single nucleotide polymorphisms plus sequence insertions or deletions) complicated assembly; nevertheless, 96.4 % of reads mapped back to the assembled scaffolds, indicating that the assembly included most of the sequenced genome. Length of the assembly was 1.8 Gbp in 545,310 scaffolds (69,852 longer than 300 bp), the longest being 14.68 Mbp. N50 was 2.29 Mbp. Genes were annotated on the basis of de novo prediction, similarity to the green anole Anolis carolinensis, Gallus gallus and Homo sapiens proteins, and P. vitticeps transcriptome sequence assemblies, to yield 19,406 protein-coding genes in the assembly, 63 % of which had intact open reading frames. Our assembly captured 99 % (246 of 248) of core CEGMA genes, with 93 % (231) being complete. The quality of the P. vitticeps assembly is comparable or superior to that of other published squamate genomes, and the annotated P. vitticeps genome can be accessed through a genome browser available at https://genomics.canberra.edu.au.
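
    Two of the summary statistics in this abstract, N50 and read depth, follow from simple definitions. The sketch below is illustrative only: the function names are hypothetical, N50 is computed with the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), and the depth check just divides the sequenced bases by the fold coverage quoted above.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached at least half of the total length
            return length
    raise AssertionError("unreachable for positive lengths")


def implied_genome_size_gbp(sequenced_gbp: float, fold_coverage: float) -> float:
    """Genome size implied by depth = sequenced bases / genome size."""
    return sequenced_gbp / fold_coverage


# Toy scaffold lengths (not data from the paper).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))

# Figures quoted in the abstract: 317 Gbp at 179X raw, 146 Gbp at 83X filtered.
print(round(implied_genome_size_gbp(317, 179), 2), round(implied_genome_size_gbp(146, 83), 2))
```

    Both depth figures imply a genome of roughly 1.8 Gbp, which agrees with the reported assembly length.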

  4. Microbial community composition and electricity generation in cattle manure slurry treatment using microbial fuel cells: effects of inoculum addition.

    PubMed

    Xie, Binghan; Gong, Weijia; Ding, An; Yu, Huarong; Qu, Fangshu; Tang, Xiaobin; Yan, Zhongsen; Li, Guibai; Liang, Heng

    2017-10-01

    Microbial fuel cell (MFC) is a sustainable technology to treat cattle manure slurry (CMS) for converting chemical energy to bioelectricity. In this work, two types of allochthonous inoculum including activated sludge (AS) and domestic sewage (DS) were added into the MFC systems to enhance anode biofilm formation and electricity generation. Results indicated that MFCs (AS + CMS) obtained the maximum electricity output with voltage approaching 577 ± 7 mV (~ 196 h), followed by MFCs (DS + CMS) (520 ± 21 mV, ~ 236 h) and then MFCs with autochthonous inoculum (429 ± 62 mV, ~ 263.5 h). Though the raw cattle manure slurry (RCMS) could facilitate electricity production in MFCs, the addition of allochthonous inoculum (AS/DS) significantly reduced the startup time and enhanced the output voltage. Moreover, the maximum power (1.259 ± 0.015 W/m²) and the highest COD removal (84.72 ± 0.48%) were obtained in MFCs (AS + CMS). With regard to microbial community, Illumina HiSeq of the 16S rRNA gene was employed in this work and the exoelectrogens (Geobacter and Shewanella) were identified as the dominant members on all anode biofilms in MFCs. For anode microbial diversity, the MFCs (AS + CMS) outperformed MFCs (DS + CMS) and MFCs (RCMS), allowing the occurrence of the fermentative (e.g., Bacteroides) and nitrogen fixation bacteria (e.g., Azoarcus and Sterolibacterium) which enabled the efficient degradation of the slurry. This study provided a feasible strategy to analyze the anode biofilm formation by adding allochthonous inoculum and some implications for quick startup of MFC reactors for CMS treatment.

  5. Draft genome of the lined seahorse, Hippocampus erectus.

    PubMed

    Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa; Xu, Junming; Fu, Hongtuo; Shi, Qiong

    2017-06-01

    The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. © The Authors 2017. Published by Oxford University Press.

  6. Draft genome of the sea cucumber Apostichopus japonicus and genetic polymorphism among color variants.

    PubMed

    Jo, Jihoon; Oh, Jooseong; Lee, Hyun-Gwan; Hong, Hyun-Hee; Lee, Sung-Gwon; Cheon, Seongmin; Kern, Elizabeth M A; Jin, Soyeong; Cho, Sung-Jin; Park, Joong-Ki; Park, Chungoo

    2017-01-01

    The Japanese sea cucumber (Apostichopus japonicus Selenka 1867) is an economically important species as a source of seafood and ingredient in traditional medicine. It is mainly found off the coasts of northeast Asia. Recently, substantial exploitation and widespread biotic diseases in A. japonicus have generated increasing conservation concern. However, the genomic knowledge base and resources available for researchers to use in managing this natural resource and to establish genetically based breeding systems for sea cucumber aquaculture are still in a nascent stage. A total of 312 Gb of raw sequences were generated using the Illumina HiSeq 2000 platform and assembled to a final size of 0.66 Gb, which is about 80.5% of the estimated genome size (0.82 Gb). We observed nucleotide-level heterozygosity within the assembled genome to be 0.986%. The resulting draft genome assembly comprising 132 607 scaffolds with an N50 value of 10.5 kb contains a total of 21 771 predicted protein-coding genes. We identified 6.6-14.5 million heterozygous single nucleotide polymorphisms in the assembled genome of the three natural color variants (green, red, and black), resulting in an estimated nucleotide diversity of 0.00146. We report the first draft genome of A. japonicus and provide a general overview of the genetic variation in the three major color variants of A. japonicus. These data will help provide a comprehensive view of the genetic, physiological, and evolutionary relationships among color variants in A. japonicus, and will be invaluable resources for sea cucumber genomic research. © The Author 2017. Published by Oxford University Press.

  7. De Novo RNA Sequencing and Transcriptome Analysis of Colletotrichum gloeosporioides ES026 Reveal Genes Related to Biosynthesis of Huperzine A

    PubMed Central

    Zhang, Xiangmei; Xia, Qianqian; Zhao, Xinmei; Ahn, Youngjoon; Ahmed, Nevin; Cosoveanu, Andreea; Wang, Mo; Wang, Jialu; Shu, Shaohua

    2015-01-01

    Huperzine A is important in the treatment of Alzheimer’s disease. There are major challenges for the mass production of huperzine A from plants due to the limited number of huperzine-A-producing plants, as well as the low content of huperzine A in these plants. Various endophytic fungi produce huperzine A. Colletotrichum gloeosporioides ES026 was previously isolated from a huperzine-A-producing plant Huperzia serrata, and this fungus also produces huperzine A. In this study, de novo RNA sequencing of C. gloeosporioides ES026 was carried out with an Illumina HiSeq2000. A total of 4,324,299,051 bp from 50,442,617 high-quality sequence reads of ES026 were obtained. These raw data were assembled into 24,998 unigenes, 40,536,684 residues and 19,790 genes. The majority of the unique sequences were assigned to corresponding putative functions based on BLAST searches of public databases. The molecular functions, biological processes and biochemical pathways of these unique sequences were determined using gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) assignments. A gene encoding copper amine oxidase (CAO) (unigene 9322) was annotated for the conversion of cadaverine to 5-aminopentanal in the biosynthesis of huperzine A. This gene was also detected in the root, stem and leaf of H. serrata. Furthermore, a close relationship was observed between expression of the CAO gene (unigene 9322) and quantity of crude huperzine A extracted from ES026. Therefore, CAO might be involved in the biosynthesis of huperzine A and it most likely plays a key role in regulating the content of huperzine A in ES026. PMID:25799531

  8. Transcriptome analysis of hexaploid hulless oat in response to salinity stress

    PubMed Central

    Wu, Bin; Hu, Yani; Huo, Pengjie; Zhang, Qian; Chen, Xin; Zhang, Zongwen

    2017-01-01

    Background: Oat is a cereal crop of global importance used for food, feed, and forage. Understanding salinity stress tolerance mechanisms in plants is an important step towards generating crop varieties that can cope with environmental stresses. To date, little is known about the salt tolerance of oat at the molecular level. To better understand the molecular mechanisms underlying salt tolerance in oat, we investigated the transcriptomes of control and salt-treated oat using RNA-Seq. Results: Using Illumina HiSeq 4000 platform, we generated 72,291,032 and 356,891,432 reads from non-stressed control and salt-stressed oat, respectively. Assembly of 64 Gb raw sequence data yielded 128,414 putative unique transcripts with an average length of 1,189 bp. Analysis of the assembled unigenes from the salt stressed and control libraries indicated that about 65,000 unigenes were differentially expressed at different stages. Functional annotation showed that ABC transporters, plant hormone signal transduction, plant-pathogen interactions, starch and sucrose metabolism, arginine and proline metabolism, and other secondary metabolite pathways were enriched under salt stress. Based on the RPKM values of assembled unigenes, 24 differentially expressed genes under salt stress were selected for quantitative RT-PCR validation, which successfully confirmed the results of RNA-Seq. Furthermore, we identified 18,039 simple sequence repeats, which may help further elucidate salt tolerance mechanisms in oat. Conclusions: Our global survey of transcriptome profiles of oat plants in response to salt stress provides useful insights into the molecular mechanisms underlying salt tolerance in this crop. These findings also represent a rich resource for further analysis of salt tolerance and for breeding oat with improved salt tolerance through the use of salt-related genes. PMID:28192458
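
    The RPKM values mentioned above follow the standard definition: reads mapped to a transcript, scaled per kilobase of transcript length and per million mapped reads in the library. The sketch below is a generic illustration of that formula with made-up numbers; it is not the authors' pipeline, and the function and parameter names are assumptions.

```python
def rpkm(reads_on_transcript: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 1e9 / (transcript length in bp * total mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return reads_on_transcript * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical unigene: 500 reads on a 2,000 bp transcript in a library
# of 10 million mapped reads.
print(rpkm(500, 2_000, 10_000_000))
```

    Because the value is normalised by library size, it allows libraries of very different depth, such as the control and salt-stressed libraries described here, to be compared on a common scale.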

  9. De novo Assembly and Characterization of Cajanus scarabaeoides (L.) Thouars Transcriptome by Paired-End Sequencing

    PubMed Central

    Nigam, Deepti; Saxena, Swati; Ramakrishna, G.; Singh, Archana; Singh, N. K.; Gaikwad, Kishor

    2017-01-01

    Pigeonpea [Cajanus cajan (L.) Millsp.] is a heat and drought resilient legume crop grown mostly in Asia and Africa. Pigeonpea is affected by various biotic (diseases and insect pests) and abiotic stresses (salinity and water logging) which limit the yield potential of this crop. However, resistance to all these constraints is not readily available in the cultivated genotypes and some of the wild relatives have been found to withstand these stresses. Thus, the utilization of crop wild relatives (CWR) in pigeonpea breeding has been effective in conferring resistance, quality and breeding efficiency traits to this crop. Bud and leaf tissue of Cajanus scarabaeoides, a wild relative of pigeonpea, were used for transcriptome profiling. Approximately 30 million clean reads filtered from raw reads by removal of adaptors, ambiguous reads and low-quality reads (3.02 gigabase pairs) were generated by Illumina paired-end RNA-seq technology. All of these clean reads were pooled and assembled de novo into 117,007 transcripts using Trinity. Finally, a total of 98,664 unigenes were derived with mean length of 396 bp and N50 values of 1393. The assembly produced significant mapping results (73.68%) in BLASTN searches of the Glycine max CDS sequence database (Ensembl). Further, uniprot database of Viridiplantae was used for unigene annotation; 81,799 of 98,664 (82.90%) unigenes were finally annotated with gene descriptions or conserved protein domains. Further, a total of 23,475 SSRs were identified in 27,321 unigenes. This data will provide useful information for mining of functionally important genes and SSR markers for pigeonpea improvement. PMID:28748187

  10. De novo sequencing and analysis of the cranberry fruit transcriptome to identify putative genes involved in flavonoid biosynthesis, transport and regulation.

    PubMed

    Sun, Haiyue; Liu, Yushan; Gai, Yuzhuo; Geng, Jinman; Chen, Li; Liu, Hongdi; Kang, Limin; Tian, Youwen; Li, Yadong

    2015-09-02

    Cranberries (Vaccinium macrocarpon Ait.), renowned for their excellent health benefits, are an important berry crop. Here, we performed transcriptome sequencing of one cranberry cultivar, from fruits at two different developmental stages, on the Illumina HiSeq 2000 platform. Our main goals were to identify putative genes for major metabolic pathways of bioactive compounds and compare the expression patterns between white fruit (W) and red fruit (R) in cranberry. In this study, two cDNA libraries of W and R were constructed. Approximately 119 million raw sequencing reads were generated and assembled de novo, yielding 57,331 high quality unigenes with an average length of 739 bp. Using BLASTx, 38,460 unigenes were identified as putative homologs of annotated sequences in public protein databases, including NCBI NR, NT, Swiss-Prot, KEGG, COG and GO. Of these, 21,898 unigenes mapped to 128 KEGG pathways, with the metabolic pathways, secondary metabolites, glycerophospholipid metabolism, ether lipid metabolism, starch and sucrose metabolism, purine metabolism, and pyrimidine metabolism being well represented. Among them, many candidate genes were involved in flavonoid biosynthesis, transport and regulation. Furthermore, digital gene expression (DEG) analysis identified 3,257 unigenes that were differentially expressed between the two fruit developmental stages. In addition, 14,473 simple sequence repeats (SSRs) were detected. Our results present comprehensive gene expression information about the cranberry fruit transcriptome that could facilitate our understanding of the molecular mechanisms of fruit development in cranberries. Although it will be necessary to validate the functions carried out by these genes, these results could be used to improve the quality of breeding programs for the cranberry and related species.

  11. Validation and Implementation of Clinical Laboratory Improvements Act-Compliant Whole-Genome Sequencing in the Public Health Microbiology Laboratory

    PubMed Central

    Kozyreva, Varvara K.; Truong, Chau-Linda; Greninger, Alexander L.; Crandall, John; Mukhopadhyay, Rituparna

    2017-01-01

    ABSTRACT Public health microbiology laboratories (PHLs) are on the cusp of unprecedented improvements in pathogen identification, antibiotic resistance detection, and outbreak investigation by using whole-genome sequencing (WGS). However, considerable challenges remain due to the lack of common standards. Here, we describe the validation of WGS on the Illumina platform for routine use in PHLs according to Clinical Laboratory Improvements Act (CLIA) guidelines for laboratory-developed tests (LDTs). We developed a validation panel comprising 10 Enterobacteriaceae isolates, 5 Gram-positive cocci, 5 Gram-negative nonfermenting species, 9 Mycobacterium tuberculosis isolates, and 5 miscellaneous bacteria. The genome coverage range was 15.71× to 216.4× (average, 79.72×; median, 71.55×); the limit of detection (LOD) for single nucleotide polymorphisms (SNPs) was 60×. The accuracy, reproducibility, and repeatability of base calling were >99.9%. The accuracy of phylogenetic analysis was 100%. The specificity and sensitivity inferred from multilocus sequence typing (MLST) and genome-wide SNP-based phylogenetic assays were 100%. The following objectives were accomplished: (i) the establishment of the performance specifications for WGS applications in PHLs according to CLIA guidelines, (ii) the development of quality assurance and quality control measures, (iii) the development of a reporting format for end users with or without WGS expertise, (iv) the availability of a validation set of microorganisms, and (v) the creation of a modular template for the validation of WGS processes in PHLs. The validation panel, sequencing analytics, and raw sequences could facilitate multilaboratory comparisons of WGS data. Additionally, the WGS performance specifications and modular template are adaptable for the validation of other platforms and reagent kits. PMID:28592550

  12. Draft genome of the lined seahorse, Hippocampus erectus

    PubMed Central

    Lin, Qiang; Qiu, Ying; Gu, Ruobo; Xu, Meng; Li, Jia; Bian, Chao; Zhang, Huixian; Qin, Geng; Zhang, Yanhong; Luo, Wei; Chen, Jieming; You, Xinxin; Fan, Mingjun; Sun, Min; Xu, Pao; Venkatesh, Byrappa

    2017-01-01

    Abstract Background: The lined seahorse, Hippocampus erectus, is an Atlantic species and mainly inhabits shallow sea beds or coral reefs. It has become very popular in China for its wide use in traditional Chinese medicine. In order to improve the aquaculture yield of this valuable fish species, we are trying to develop genomic resources for assistant selection in genetic breeding. Here, we provide whole genome sequencing, assembly, and gene annotation of the lined seahorse, which can enrich genome resource and further application for its molecular breeding. Findings: A total of 174.6 Gb (Gigabase) raw DNA sequences were generated by the Illumina Hiseq2500 platform. The final assembly of the lined seahorse genome is around 458 Mb, representing 94% of the estimated genome size (489 Mb by k-mer analysis). The contig N50 and scaffold N50 reached 14.57 kb and 1.97 Mb, respectively. Quality of the assembled genome was assessed by BUSCO with prediction of 85% of the known vertebrate genes and evaluated using the de novo assembled RNA-seq transcripts to prove a high mapping ratio (more than 99% transcripts could be mapped to the assembly). Using homology-based, de novo and transcriptome-based prediction methods, we predicted 20 788 protein-coding genes in the generated assembly, which is less than our previously reported gene number (23 458) of the tiger tail seahorse (H. comes). Conclusion: We report a draft genome of the lined seahorse. These generated genomic data are going to enrich genome resource of this economically important fish, and also provide insights into the genetic mechanisms of its iconic morphology and male pregnancy behavior. PMID:28444302

  13. Fast, accurate and easy-to-pipeline methods for amplicon sequence processing

    NASA Astrophysics Data System (ADS)

    Antonielli, Livio; Sessitsch, Angela

    2016-04-01

    Next generation sequencing (NGS) technologies have been established for years as an essential resource in microbiology. While on the one hand metagenomic studies can benefit from the continuously increasing throughput of the Illumina (Solexa) technology, on the other hand the spread of third generation sequencing technologies (PacBio, Oxford Nanopore) is taking whole genome sequencing beyond the assembly of fragmented draft genomes, making it now possible to finish bacterial genomes even without short read correction. Besides (meta)genomic analysis, next-gen amplicon sequencing is still fundamental for microbial studies. Amplicon sequencing of the 16S rRNA gene and ITS (Internal Transcribed Spacer) remains a well-established, widespread method for a multitude of different purposes concerning the identification and comparison of archaeal/bacterial (16S rRNA gene) and fungal (ITS) communities occurring in diverse environments. Numerous different pipelines have been developed in order to process NGS-derived amplicon sequences, among which Mothur, QIIME and USEARCH are the most well-known and cited ones. The entire process from initial raw sequence data through read error correction, paired-end read assembly, primer stripping, quality filtering, clustering, OTU taxonomic classification and BIOM table rarefaction as well as alternative "normalization" methods will be addressed. An effective and accurate strategy will be presented using the state-of-the-art bioinformatic tools and the example of a straightforward one-script pipeline for 16S rRNA gene or ITS MiSeq amplicon sequencing will be provided. Finally, instructions on how to automatically retrieve nucleotide sequences from NCBI and therefore apply the pipeline to targets other than 16S rRNA gene (Greengenes, SILVA) and ITS (UNITE) will be discussed.

  14. Blood Transcriptomic Comparison of Individuals with and without Autism Spectrum Disorder: A Combined-Samples Mega-Analysis

    PubMed Central

    Tylee, Daniel S.; Hess, Jonathan L.; Quinn, Thomas P.; Barve, Rahul; Huang, Hailiang; Zhang-James, Yanli; Chang, Jeffrey; Stamova, Boryana S.; Sharp, Frank R.; Hertz-Picciotto, Irva; Faraone, Stephen V.; Kong, Sek Won; Glatt, Stephen J.

    2017-01-01

    Blood-based microarray studies comparing individuals affected with autism spectrum disorder (ASD) and typically developing individuals help characterize differences in circulating immune cell functions and offer potential biomarker signal. We sought to combine the subject-level data from previously published studies by mega-analysis to increase the statistical power. We identified studies that compared ex-vivo blood or lymphocytes from ASD-affected individuals and unrelated comparison subjects using Affymetrix or Illumina array platforms. Raw microarray data and clinical meta-data were obtained from seven studies, totaling 626 affected and 447 comparison subjects. Microarray data were processed using uniform methods. Covariate-controlled mixed-effect linear models were used to identify gene transcripts and co-expression network modules that were significantly associated with diagnostic status. Permutation-based gene-set analysis was used to identify functionally related sets of genes that were over- and under-expressed among ASD samples. Our results were consistent with diminished interferon-, EGF-, PDGF-, PI3K-AKT-mTOR-, and RAS-MAPK-signaling cascades, and increased ribosomal translation and NK-cell related activity in ASD. We explored evidence for sex-differences in the ASD-related transcriptomic signature. We also demonstrated that machine-learning classifiers using blood transcriptome data perform with moderate accuracy when data are combined across studies. Comparing our results with those from blood-based studies of protein biomarkers (e.g., cytokines and trophic factors), we propose that ASD may feature decoupling between certain circulating signaling proteins (higher in ASD samples) and the transcriptional cascades which they typically elicit within circulating immune cells (lower in ASD samples). These findings provide insight into ASD-related transcriptional differences in circulating immune cells. PMID:27862943

  15. Identification and characterization of EBV genomes in spontaneously immortalized human peripheral blood B lymphocytes by NGS technology.

    PubMed

    Lei, Haiyan; Li, Tianwei; Hung, Guo-Chiuan; Li, Bingjie; Tsai, Shien; Lo, Shyh-Ching

    2013-11-19

    We conducted genomic sequencing to identify Epstein Barr Virus (EBV) genomes in 2 human peripheral blood B lymphocytes that underwent spontaneous immortalization promoted by mycoplasma infections in culture, using the high-throughput sequencing (HTS) Illumina MiSeq platform. The purpose of this study was to examine if rapid detection and characterization of a viral agent could be effectively achieved by HTS using a platform that has become readily available in general biology laboratories. Raw read sequences, averaging 175 bps in length, were mapped with DNA databases of human, bacteria, fungi and virus genomes using the CLC Genomics Workbench bioinformatics tool. Overall 37,757 out of 49,520,834 total reads in one lymphocyte line (# K4413-Mi) and 28,178 out of 45,335,960 reads in the other lymphocyte line (# K4123-Mi) were identified as EBV sequences. The two EBV genomes with estimated 35.22-fold and 31.06-fold sequence coverage respectively, designated K4413-Mi EBV and K4123-Mi EBV (GenBank accession number KC440852 and KC440851 respectively), are characteristic of type-1 EBV. Sequence comparison and phylogenetic analysis among K4413-Mi EBV, K4123-Mi EBV and the EBV genomes previously reported to GenBank as well as the NA12878 EBV genome assembled from database of the 1000 Genome Project showed that these 2 EBVs are most closely related to B95-8, an EBV previously isolated from a patient with infectious mononucleosis and WT-EBV. They are less similar to EBVs associated with nasopharyngeal carcinoma (NPC) from Hong Kong and China as well as the Akata strain of a case of Burkitt's lymphoma from Japan. They are most different from type 2 EBV found in Western African Burkitt's lymphoma.

  16. Transcriptome and Proteome Exploration to Provide a Resource for the Study of Agrocybe aegerita

    PubMed Central

    Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Background Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. Methodology/Principal Findings To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequence tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from the A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. In addition, many metabolic enzymes involved in glycolysis were identified in the protein database. Conclusions/Significance This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry. PMID:23418592

  17. Transcriptome and proteome exploration to provide a resource for the study of Agrocybe aegerita.

    PubMed

    Wang, Man; Gu, Bianli; Huang, Jie; Jiang, Shuai; Chen, Yijie; Yin, Yalin; Pan, Yongfu; Yu, Guojun; Li, Yamu; Wong, Barry Hon Cheung; Liang, Yi; Sun, Hui

    2013-01-01

    Agrocybe aegerita, the black poplar mushroom, has been highly valued as a functional food for its medicinal and nutritional benefits. Several bioactive extracts from A. aegerita have been found to exhibit antitumor and antioxidant activities. However, limited genetic resources for A. aegerita have hindered exploration of this species. To facilitate the research on A. aegerita, we established a deep survey of the transcriptome and proteome of this mushroom. We applied high-throughput sequencing technology (Illumina) to sequence A. aegerita transcriptomes from mycelium and fruiting body. The raw clean reads were de novo assembled into a total of 36,134 expressed sequence tags (ESTs) with an average length of 663 bp. These ESTs were annotated and classified according to Gene Ontology (GO), Clusters of Orthologous Groups (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways. Gene expression profile analysis showed that 18,474 ESTs were differentially expressed, with 10,131 up-regulated in mycelium and 8,343 up-regulated in fruiting body. Putative genes involved in polysaccharide and steroid biosynthesis were identified from the A. aegerita transcriptome, and these genes were differentially expressed at the two stages of A. aegerita. Based on one-dimensional gel electrophoresis (1-DGE) coupled with electrospray ionization liquid chromatography tandem MS (LC-ESI-MS/MS), we identified a total of 309 non-redundant proteins. In addition, many metabolic enzymes involved in glycolysis were identified in the protein database. This is the first study on transcriptome and proteome analyses of A. aegerita. The data in this study serve as a resource of A. aegerita transcripts and proteins, and offer clues to the applications of this mushroom in nutrition, pharmacy and industry.

  18. Validation and Implementation of Clinical Laboratory Improvements Act-Compliant Whole-Genome Sequencing in the Public Health Microbiology Laboratory.

    PubMed

    Kozyreva, Varvara K; Truong, Chau-Linda; Greninger, Alexander L; Crandall, John; Mukhopadhyay, Rituparna; Chaturvedi, Vishnu

    2017-08-01

    Public health microbiology laboratories (PHLs) are on the cusp of unprecedented improvements in pathogen identification, antibiotic resistance detection, and outbreak investigation by using whole-genome sequencing (WGS). However, considerable challenges remain due to the lack of common standards. Here, we describe the validation of WGS on the Illumina platform for routine use in PHLs according to Clinical Laboratory Improvements Act (CLIA) guidelines for laboratory-developed tests (LDTs). We developed a validation panel comprising 10 Enterobacteriaceae isolates, 5 Gram-positive cocci, 5 Gram-negative nonfermenting species, 9 Mycobacterium tuberculosis isolates, and 5 miscellaneous bacteria. The genome coverage range was 15.71× to 216.4× (average, 79.72×; median, 71.55×); the limit of detection (LOD) for single nucleotide polymorphisms (SNPs) was 60×. The accuracy, reproducibility, and repeatability of base calling were >99.9%. The accuracy of phylogenetic analysis was 100%. The specificity and sensitivity inferred from multilocus sequence typing (MLST) and genome-wide SNP-based phylogenetic assays were 100%. The following objectives were accomplished: (i) the establishment of the performance specifications for WGS applications in PHLs according to CLIA guidelines, (ii) the development of quality assurance and quality control measures, (iii) the development of a reporting format for end users with or without WGS expertise, (iv) the availability of a validation set of microorganisms, and (v) the creation of a modular template for the validation of WGS processes in PHLs. The validation panel, sequencing analytics, and raw sequences could facilitate multilaboratory comparisons of WGS data. Additionally, the WGS performance specifications and modular template are adaptable for the validation of other platforms and reagent kits. Copyright © 2017 Kozyreva et al.

  19. Transcriptome analysis of Ruditapes philippinarum hepatopancreas provides insights into immune signaling pathways under Vibrio anguillarum infection.

    PubMed

    Ren, Yipeng; Xue, Junli; Yang, Huanhuan; Pan, Baoping; Bu, Wenjun

    2017-05-01

    The Manila clam, Ruditapes philippinarum, is one of the most economically important aquatic clams that are harvested on a large scale by the mariculture industry in China. However, increasing reports of bacterial pathogenic diseases have had a negative effect on the aquaculture industry of R. philippinarum. In the present study, the two transcriptome libraries of untreated (termed H) and challenged Vibrio anguillarum (termed HV) hepatopancreas were constructed and sequenced from Manila clam using an Illumina-based paired-end sequencing platform. In total, 75,302,886 and 66,578,976 high-quality clean reads were assembled from 101,080,746 and 99,673,538 raw data points from the two transcriptome libraries described above, respectively. Furthermore, 156,116 unigenes were generated from 210,685 transcripts, with an N50 length of 1125 bp, and from the annotated SwissProt, NR, NT, KO, GO, KOG and KEGG databases. Moreover, a total of 4071 differentially expressed unigenes (HV vs H) were detected, including 903 up-regulated and 3168 down-regulated genes. Among these differentially expressed unigenes, 226 unigenes were annotated using KEGG annotation in 16 immune-related signaling pathways, including Toll-like receptor, NF-kappa B, MAPK, NOD-like receptor, RIG-I-like receptor, and the TNF and chemokine signaling pathways. Finally, 20,341 simple sequence repeats (SSRs) and 214,430 potential single nucleotide polymorphisms (SNPs) were detected from the H and HV transcriptome libraries. In conclusion, these studies identified many candidate immune-related genes and signaling pathways and conducted a comparative analysis of the differentially expressed unigenes from Manila clam hepatopancreas in response to V. anguillarum stimulation. These data laid the foundation for studying the innate immune systems and defense mechanisms in R. philippinarum. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Transcriptomic immune response of Tenebrio molitor pupae to parasitization by Scleroderma guani.

    PubMed

    Zhu, Jia-Ying; Yang, Pu; Zhang, Zhong; Wu, Guo-Xing; Yang, Bin

    2013-01-01

    Host and parasitoid interaction is one of the most fascinating relationships of insects and is currently receiving increasing interest. Understanding the mechanisms evolved by the parasitoids to evade or suppress the host immune system is important for dissecting this interaction, yet these mechanisms remain poorly understood. In order to gain insight into the immune response of Tenebrio molitor to parasitization by Scleroderma guani, the transcriptome of T. molitor pupae was sequenced with a focus on immune-related genes, and the non-parasitized and parasitized T. molitor pupae were analyzed by digital gene expression (DGE) analysis with special emphasis on parasitoid-induced immune-related genes using Illumina sequencing. In a single run, 264,698 raw reads were obtained. De novo assembly generated 71,514 unigenes with a mean length of 424 bp. Of those unigenes, 37,373 (52.26%) showed similarity to the known proteins in the NCBI nr database. Via analysis of the transcriptome data in depth, 430 unigenes related to immunity were identified. DGE analysis revealed that parasitization by S. guani had considerable impacts on the transcriptome profile of T. molitor pupae, as indicated by the significant up- or down-regulation of 3,431 parasitism-responsive transcripts. The expression of a total of 74 unigenes involved in immune response of T. molitor was significantly altered after parasitization. The obtained T. molitor transcriptome, in addition to establishing a fundamental resource for further research on functional genomics, has allowed the discovery of a large group of immune genes that might provide a meaningful framework to better understand the immune response in this species and other beetles. The DGE profiling data provide comprehensive T. molitor immune gene expression information at the transcriptional level following parasitization, and shed valuable light on the molecular understanding of the host-parasitoid interaction.

  1. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    PubMed

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N(50) = 694 kb, including contigs with N(50) = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (K(s)) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.

  2. Analysis of Litopenaeus vannamei Transcriptome Using the Next-Generation DNA Sequencing Technique

    PubMed Central

    Li, Chaozheng; Weng, Shaoping; Chen, Yonggui; Yu, Xiaoqiang; Lü, Ling; Zhang, Haiqing; He, Jianguo; Xu, Xiaopeng

    2012-01-01

    Background Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The currently available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. Methodology/Principal Findings This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAPdenovo software. 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and Blast2GO software, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned into 51 Gene ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To preliminarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. Conclusions/Significance The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei. PMID:23071809

  3. Design of a 9K Illumina BeadChip for polar bears (Ursus maritimus) from RAD and transcriptome sequencing.

    PubMed

    Malenfant, René M; Coltman, David W; Davis, Corey S

    2015-05-01

    Single-nucleotide polymorphisms (SNPs) offer numerous advantages over anonymous markers such as microsatellites, including improved estimation of population parameters, finer-scale resolution of population structure and more precise genomic dissection of quantitative traits. However, many SNPs are needed to equal the resolution of a single microsatellite, and reliable large-scale genotyping of SNPs remains a challenge in nonmodel species. Here, we document the creation of a 9K Illumina Infinium BeadChip for polar bears (Ursus maritimus), which will be used to investigate: (i) the fine-scale population structure among Canadian polar bears and (ii) the genomic architecture of phenotypic traits in the Western Hudson Bay subpopulation. To this end, we used restriction-site associated DNA (RAD) sequencing from 38 bears across their circumpolar range, as well as blood/fat transcriptome sequencing of 10 individuals from Western Hudson Bay. Six-thousand RAD SNPs and 3000 transcriptomic SNPs were selected for the chip, based primarily on genomic spacing and gene function respectively. Of the 9000 SNPs ordered from Illumina, 8042 were successfully printed, and - after genotyping 1450 polar bears - 5441 of these SNPs were found to be well clustered and polymorphic. Using this array, we show rapid linkage disequilibrium decay among polar bears, we demonstrate that in a subsample of 78 individuals, our SNPs detect known genetic structure more clearly than 24 microsatellites genotyped for the same individuals and that these results are not driven by the SNP ascertainment scheme. Here, we present one of the first large-scale genotyping resources designed for a threatened species. © 2014 John Wiley & Sons Ltd.

  4. RNA Deep Sequencing Reveals Differential MicroRNA Expression during Development of Sea Urchin and Sea Star

    PubMed Central

    Kadri, Sabah; Hinman, Veronica F.; Benos, Panayiotis V.

    2011-01-01

    microRNAs (miRNAs) are small (20–23 nt), non-coding single stranded RNA molecules that act as post-transcriptional regulators of mRNA gene expression. They have been implicated in regulation of developmental processes in diverse organisms. The echinoderms, Strongylocentrotus purpuratus (sea urchin) and Patiria miniata (sea star) are excellent model organisms for studying development with well-characterized transcriptional networks. However, to date, nothing is known about the role of miRNAs during development in these organisms, except that the genes that are involved in the miRNA biogenesis pathway are expressed during their developmental stages. In this paper, we used Illumina Genome Analyzer (Illumina, Inc.) to sequence small RNA libraries in mixed stage population of embryos from one to three days after fertilization of sea urchin and sea star (total of 22,670,000 reads). Analysis of these data revealed the miRNA populations in these two species. We found that 47 and 38 known miRNAs are expressed in sea urchin and sea star, respectively, during early development (32 in common). We also found 13 potentially novel miRNAs in the sea urchin embryonic library. miRNA expression is generally conserved between the two species during development, but 7 miRNAs are highly expressed in only one species. We expect that our two datasets will be a valuable resource for everyone working in the field of developmental biology and the regulatory networks that affect it. The computational pipeline to analyze Illumina reads is available at http://www.benoslab.pitt.edu/services.html. PMID:22216218

  5. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
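
    The quality values discussed above are Phred scores, where a score Q corresponds to an error probability of 10^(-Q/10). The sketch below decodes a FASTQ quality string and applies an expected-error filter. It assumes Phred+33 encoding; the function names and the threshold of 1.0 are hypothetical and are not the filtering criteria used in the study.

```python
from typing import List


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 by default, an assumption)."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(quality_string: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities across one read."""
    return sum(
        phred_to_error_probability(q)
        for q in decode_qualities(quality_string, offset)
    )


def passes_filter(quality_string: str, max_expected_errors: float = 1.0) -> bool:
    """Keep a read only if its expected number of errors is at or below
    the threshold (the threshold here is a hypothetical choice)."""
    return expected_errors(quality_string) <= max_expected_errors


# 'I' encodes Q40, '5' encodes Q20 and '+' encodes Q10 under Phred+33.
print(decode_qualities("I5+"))
print(passes_filter("IIII"), passes_filter("!!!!"))
```

    An expected-error filter is only one possible criterion; the abstract's point that reported quality values may not be fully reliable applies to any filter built on them.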

  6. Genomics of Compositae crops: reference transcriptome assemblies and evidence of hybridization with wild relatives.

    PubMed

    Hodgins, Kathryn A; Lai, Zhao; Oliveira, Luiz O; Still, David W; Scascitelli, Moira; Barker, Michael S; Kane, Nolan C; Dempewolf, Hannes; Kozik, Alex; Kesseli, Richard V; Burke, John M; Michelmore, Richard W; Rieseberg, Loren H

    2014-01-01

    Although the Compositae harbours only two major food crops, sunflower and lettuce, many other species in this family are utilized by humans and have experienced various levels of domestication. Here, we have used next-generation sequencing technology to develop 15 reference transcriptome assemblies for Compositae crops or their wild relatives. These data allow us to gain insight into the evolutionary and genomic consequences of plant domestication. Specifically, we performed Illumina sequencing of Cichorium endivia, Cichorium intybus, Echinacea angustifolia, Iva annua, Helianthus tuberosus, Dahlia hybrida, Leontodon taraxacoides and Glebionis segetum, as well as 454 sequencing of Guizotia scabra, Stevia rebaudiana, Parthenium argentatum and Smallanthus sonchifolius. Illumina reads were assembled using Trinity, and 454 reads were assembled using MIRA and CAP3. We evaluated the coverage of the transcriptomes using BLASTX analysis of a set of ultra-conserved orthologs (UCOs) and recovered most of these genes (88-98%). We found a correlation between contig length and read length for the 454 assemblies, and greater contig lengths for the 454 compared with the Illumina assemblies. This suggests that longer reads can aid in the assembly of more complete transcripts. Finally, we compared the divergence of orthologs at synonymous sites (Ks) between Compositae crops and their wild relatives and found greater divergence when the progenitors were self-incompatible. We also found greater divergence between pairs of taxa that had some evidence of postzygotic isolation. For several more distantly related congeners, such as chicory and endive, we identified a signature of introgression in the distribution of Ks values. © 2013 John Wiley & Sons Ltd.

  7. Cocoa/Cotton Comparative Genomics

    USDA-ARS's Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  8. Epigenome-Wide DNA Methylation in Hearing Ability: New Mechanisms for an Old Problem

    PubMed Central

    Wolber, Lisa E.; Steves, Claire J.; Tsai, Pei-Chien; Deloukas, Panos; Spector, Tim D.

    2014-01-01

    Epigenetic regulation of gene expression has been shown to change over time and may be associated with environmental exposures in common complex traits. Age-related hearing impairment is a complex disorder, known to be heritable, with heritability estimates of 57–70%. Epigenetic regulation might explain the observed difference in age of onset and magnitude of hearing impairment with age. Epigenetic epidemiology studies using unrelated samples can be limited in their ability to detect small effects, and recent epigenetic findings in twins underscore the power of this well matched study design. We investigated the association between venous blood DNA methylation epigenome-wide and hearing ability. Pure-tone audiometry (PTA) and Illumina HumanMethylation array data were obtained from female twin volunteers enrolled in the TwinsUK register. Two study groups were explored: first, an epigenome-wide association scan (EWAS) was performed in a discovery sample (n = 115 subjects, age range: 47–83 years, Illumina 27 k array), then replication of the top ten associated probes from the discovery EWAS was attempted in a second unrelated sample (n = 203, age range: 41–86 years, Illumina 450 k array). Finally, a set of monozygotic (MZ) twin pairs (n = 21 pairs) within the discovery sample (Illumina 27 k array) was investigated in more detail in an MZ discordance analysis. Hearing ability was strongly associated with DNA methylation levels in the promoter regions of several genes, including TCF25 (cg01161216, p = 6.6×10−6), FGFR1 (cg15791248, p = 5.7×10−5) and POLE (cg18877514, p = 6.3×10−5). Replication of these results in a second sample confirmed the presence of differential methylation at TCF25 (p(replication) = 6×10−5) and POLE (p(replication) = 0.016). In the MZ discordance analysis, twins' intrapair difference in hearing ability correlated with DNA methylation differences at ACP6 (cg01377755, r = −0.75, p = 1.2×10−4) and MEF2D (cg08156349, r = −0.75, p = 1.4×10−4). 
    Examination of gene expression in skin suggests an influence of differential methylation on expression, which may account for the variation in hearing ability with age. PMID:25184702

  9. Detection and assessment of copy number variation using PacBio long-read and Illumina sequencing in New Zealand dairy cattle.

    PubMed

    Couldrey, C; Keehan, M; Johnson, T; Tiplady, K; Winkelman, A; Littlejohn, M D; Scott, A; Kemper, K E; Hayes, B; Davis, S R; Spelman, R J

    2017-07-01

    Single nucleotide polymorphisms have been the DNA variant of choice for genomic prediction, largely because of the ease of single nucleotide polymorphism genotype collection. In contrast, structural variants (SV), which include copy number variants (CNV), translocations, insertions, and inversions, have eluded easy detection and characterization, particularly in nonhuman species. However, evidence increasingly shows that SV not only contribute a substantial proportion of genetic variation but also have significant influence on phenotypes. Here we present the discovery of CNV in a prominent New Zealand dairy bull using long-read PacBio (Pacific Biosciences, Menlo Park, CA) sequencing technology and the Sniffles SV discovery tool (version 0.0.1; https://github.com/fritzsedlazeck/Sniffles). The CNV identified from long reads were compared with CNV discovered in the same bull from Illumina sequencing using CNVnator (read depth-based tool; Illumina Inc., San Diego, CA) as a means of validation. Subsequently, further validation was undertaken using whole-genome Illumina sequencing of 556 cattle representing the wider New Zealand dairy cattle population. Very limited overlap was observed in CNV discovered from the 2 sequencing platforms, in part because of the differences in size of CNV detected. Only a few CNV were therefore able to be validated using this approach. However, the ability to use CNVnator to genotype the 557 cattle for copy number across all regions identified as putative CNV allowed a genome-wide assessment of transmission level of copy number based on pedigree. The more highly transmissible a putative CNV region was observed to be, the more likely the distribution of copy number was multimodal across the 557 sequenced animals. Furthermore, visual assessment of highly transmissible CNV regions provided evidence supporting the presence of CNV across the sequenced animals. 
    This transmission-based approach was able to confirm a subset of CNV that segregates in the New Zealand dairy cattle population. Genome-wide identification and validation of CNV is an important step toward their inclusion in genomic selection strategies. The Authors. Published by the Federation of Animal Science Societies and Elsevier Inc. on behalf of the American Dairy Science Association®. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).

  10. Genome sequence of Stachybotrys chartarum Strain 51-11

    EPA Science Inventory

    Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina Hiseq 2000 and PacBio long read technology. Since Stachybotrys chartarum has been implicated in health impacts within water-damaged buildings, any information extracted from the geno...

  11. System simulation application for determining the size of daily raw material purchases at PT XY

    NASA Astrophysics Data System (ADS)

    Napitupulu, H. L.

    2018-02-01

    Every manufacturing company needs to implement green production, including PT XY, a marine catch processing company in Sumatera Utara Province. The company processes squid for export. Its problem is the absence of a decision rule for the daily purchase amount of squid. Purchasing daily raw materials in varying quantities has left the company with either an excess or a shortage of raw materials. Purchasing too little raw material reduces productivity, while purchasing too much increases cooling costs for storing the excess and risks losses from damaged raw material. It is therefore necessary to determine the optimal amount of raw material to purchase each day. This can be determined by simulation. Applying system simulation can provide the expected optimal amount of raw material purchases.

  12. The draft genome of Globodera ellingtonae

    USDA-ARS's Scientific Manuscript database

    Globodera ellingtonae is a newly described potato cyst nematode found in Idaho, Oregon, and Argentina. Here we present a genome assembly for G. ellingtonae, a relative of the quarantine nematodes G. pallida and G. rostochiensis, produced using data from Illumina and Pacific Biosciences sequencing te...

  13. Occurrence of Campylobacter spp. in raw and ready-to-eat foods and in a Canadian food service operation.

    PubMed

    Medeiros, Diane T; Sattar, Syed A; Farber, Jeffrey M; Carrillo, Catherine D

    2008-10-01

    The occurrence of Campylobacter spp. in a variety of foods from Ottawa, Ontario, Canada, and raw milk samples from across Canada was determined over a 2-year period. The samples consisted of 55 raw foods (chicken, pork, and beef), 126 raw milk samples from raw milk cheese manufacturers, and 135 ready-to-eat foods (meat products, salads, and raw milk cheeses). Campylobacter jejuni was detected in 4 of the 316 samples analyzed: 1 raw beef liver sample and 3 raw chicken samples. An isolation rate of 9.7% was observed among the raw chicken samples tested. This study also investigated the role of cross-contamination in disseminating Campylobacter from raw poultry within a food service operation specializing in poultry dishes. Accordingly, kitchen surfaces within a restaurant in Ottawa, Ontario, were sampled between March and August 2001. Tests of the sampling method indicated that as few as 100 Campylobacter cells could be detected if sampling was done within 45 min of inoculation; however, Campylobacter spp. were not detected in 125 swabs of surfaces within the kitchens of this food service operation. Despite the reported high prevalence of Campylobacter spp. in raw poultry, this organism was not detected on surfaces within a kitchen of a restaurant specializing in poultry dishes.

  14. 78 FR 73561 - Raw Flexible Magnets From China and Taiwan; Scheduling of Expedited Five-Year Reviews Concerning...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-06

    ... INTERNATIONAL TRADE COMMISSION [Investigation Nos. 701-TA-452 and 731-TA-1129-1130 (Review)] Raw... Countervailing Duty Order on Raw Flexible Magnets From China and the Antidumping Duty Orders on Raw Flexible... the countervailing duty order on raw flexible magnets from China and the antidumping duty orders on...

  15. 21 CFR 161.175 - Frozen raw breaded shrimp.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 2 2010-04-01 2010-04-01 false Frozen raw breaded shrimp. 161.175 Section 161.175... § 161.175 Frozen raw breaded shrimp. (a) Frozen raw breaded shrimp is the food prepared by coating one..., other than those provided for in this paragraph, are not suitable ingredients of frozen raw breaded...

  16. 21 CFR 161.175 - Frozen raw breaded shrimp.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 2 2011-04-01 2011-04-01 false Frozen raw breaded shrimp. 161.175 Section 161.175... § 161.175 Frozen raw breaded shrimp. (a) Frozen raw breaded shrimp is the food prepared by coating one..., other than those provided for in this paragraph, are not suitable ingredients of frozen raw breaded...

  17. Effect of torrefaction on the properties of rice straw high temperature pyrolysis char: Pore structure, aromaticity and gasification activity.

    PubMed

    Chen, Handing; Chen, Xueli; Qin, Yueqiang; Wei, Juntao; Liu, Haifeng

    2017-03-01

    The influence of torrefaction on the physicochemical characteristics of char during raw and water washed rice straw pyrolysis at 800-1200°C is investigated. Pore structure, aromaticity and gasification activity of pyrolysis chars are compared between raw and torrefied samples. For raw straw, BET specific surface area decreases with increasing torrefaction temperature at the same pyrolysis temperature and it increases approximately linearly with weight loss during pyrolysis. The different pore structure evolutions relate to the different volatile matters and pore structures between raw and torrefied straw. Torrefaction at higher temperature would bring about a lower graphitization degree of char during pyrolysis of raw straw. Pore structure and carbon crystalline structure evolutions of raw and torrefied water washed straw are different from those of raw straw during pyrolysis. For both raw and water washed straw, CO2 gasification activities of pyrolysis chars are different between raw and torrefied samples. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. RNA-seq analysis of Rubus idaeus cv. Nova: transcriptome sequencing and de novo assembly for subsequent functional genomics approaches.

    PubMed

    Hyun, Tae Kyung; Lee, Sarah; Kumar, Dhinesh; Rim, Yeonggil; Kumar, Ritesh; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-10-01

    Using Illumina sequencing technology, we have generated large-scale transcriptome sequencing data containing abundant information on genes involved in the metabolic pathways in R. idaeus cv. Nova fruits. Rubus idaeus (red raspberry) is an economically important crop that contains numerous nutrients, micronutrients and phytochemicals with essential health benefits to humans. The molecular mechanism underlying the ripening process and phytochemical biosynthesis in red raspberry is attributed to changes in gene expression, but very limited transcriptomic and genomic information is available in public databases. To address this issue, we generated more than 51 million sequencing reads from R. idaeus cv. Nova fruit using Illumina RNA-Seq technology. After de novo assembly, we obtained 42,604 unigenes with an average length of 812 bp. At the protein level, the Nova fruit transcriptome showed 77 and 68 % sequence similarities with Rubus coreanus and Fragaria vesca, respectively, indicating the evolutionary relationship between them. In addition, 69 % of assembled unigenes were annotated using public databases including the NCBI non-redundant, Cluster of Orthologous Groups and Gene Ontology databases, suggesting that our transcriptome dataset provides a valuable resource for investigating metabolic processes in red raspberry. To analyze the relationship between several novel transcripts and the amounts of metabolites such as γ-aminobutyric acid and anthocyanins, real-time PCR and target metabolite analysis were performed on two different ripening stages of Nova. This is the first attempt to use the Illumina sequencing platform for RNA sequencing and de novo assembly of Nova fruit without a reference genome. Our data provide the most comprehensive transcriptome resource available for Rubus fruits, and will be useful for understanding the ripening process and for breeding R. idaeus cultivars with improved fruit quality.

  19. Genomics of a revived breed: Case study of the Belgian campine cattle

    PubMed Central

    Wijnrocx, Katrien; Colinet, Frédéric G.; Gengler, Nicolas; Hulsegge, Bettine; Windig, Jack J.; Buys, Nadine

    2017-01-01

    Through centuries of both natural and artificial selection, a variety of local cattle populations arose with highly specific phenotypes. However, the intensification and expansion of scale in animal production systems led to the predominance of a few highly productive cattle breeds. The loss of local populations is often considered irreversible and with them specific qualities and rare variants could be lost as well. Over these last years, the interest in these local breeds has increased again leading to increasing efforts to conserve these breeds or even revive lost populations, e.g. through the use of crosses with similar breeds. However, the remaining populations are expected to contain crossbred individuals resulting from introgressions. They are likely to carry exogenous genes that affect the breed’s authenticity on a genomic level. Using the revived Campine breed as a case study, 289 individuals registered as purebreds were genotyped on the Illumina BovineSNP50. In addition, genomic information on the Illumina BovineHD and Illumina BovineSNP50 of ten breeds was available to assess the current population structure, genetic diversity, and introgression with phenotypically similar and/or historically related breeds. Introgression with Holstein and beef cattle genotypes was limited to only a few farms. While the current population shows a substantial amount of within-breed variation, the majority of genotypes can be separated from other breeds in the study, supporting the re-establishment of the Campine breed. The majority of the population is genetically close to the Deep Red (NL), Improved Red (NL) and Eastern Belgium Red and White (BE) cattle, breeds known for their historical ties to the Campine breed. This would support an open herdbook policy, thereby increasing the population size and consequently providing a more secure future for the breed. PMID:28426822

  20. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

    PubMed

    Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong

    2012-07-24

    Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
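
    The comparison above is organised around the mean GC content of the sequenced genomes (19.3 to 67.7%). For readers unfamiliar with the metric, a minimal sketch of how GC content is commonly computed from a sequence follows. The function name and the decision to ignore non-ACGT symbols are assumptions made for this illustration, not details taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases among the A, C, G, T bases of a sequence.

    Ambiguous symbols such as N are ignored (an assumption for this sketch).
    """
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / total


if __name__ == "__main__":
    # An AT-rich toy sequence versus a GC-rich one.
    print(gc_content("ATATATTAGC"))
    print(gc_content("GCGCGGCCAT"))
```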

  1. Low Diversity in the Mitogenome of Sperm Whales Revealed by Next-Generation Sequencing

    PubMed Central

    Alexander, Alana; Steel, Debbie; Slikas, Beth; Hoekzema, Kendra; Carraher, Colm; Parks, Matthew; Cronn, Richard; Baker, C. Scott

    2013-01-01

    Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20 mitogenomes from 17 sperm whales representative of worldwide diversity using Next Generation Sequencing (NGS) technologies (Illumina GAIIx, Roche 454 GS Junior). Resequencing of three individuals with both NGS platforms and partial Sanger sequencing showed low discrepancy rates (454-Illumina: 0.0071%; Sanger-Illumina: 0.0034%; and Sanger-454: 0.0023%) confirming suitability of both NGS platforms for investigating low mitogenomic diversity. Using the 17 sperm whale mitogenomes in a phylogenetic reconstruction with 41 other species, including 11 new dolphin mitogenomes, we tested two hypotheses for the low CR diversity. First, the hypothesis that CR-specific constraints have reduced diversity solely in the CR was rejected as diversity was low throughout the mitogenome, not just in the CR (overall diversity π = 0.096%; protein-coding 3rd codon = 0.22%; CR = 0.35%), and CR phylogenetic signal was congruent with protein-coding regions. Second, the hypothesis that slow substitution rates reduced diversity throughout the sperm whale mitogenome was rejected as sperm whales had significantly higher rates of CR evolution and no evidence of slow coding region evolution relative to other cetaceans. The estimated time to most recent common ancestor for sperm whale mitogenomes was 72,800 to 137,400 years ago (95% highest probability density interval), consistent with previous hypotheses of a bottleneck or selective sweep as likely causes of low mitogenome diversity. PMID:23254394
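
    The diversity figures quoted above (for example π = 0.096%) are nucleotide diversities. As a hedged illustration, the sketch below computes π in its textbook form: the proportion of differing sites, averaged over all pairs of aligned sequences. The function name and the toy alignment are hypothetical, and the handling of gaps or ambiguous bases used by the authors is not described in the abstract, so none is attempted here.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs: list[str]) -> float:
    """Average proportion of differing sites over all pairs of aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")

    total = 0.0
    pair_count = 0
    for seq_a, seq_b in combinations(aligned_seqs, 2):
        # Count mismatching positions for this pair and scale by alignment length.
        differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
        total += differences / length
        pair_count += 1
    return total / pair_count


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACGT"]  # hypothetical data
    print(f"pi = {nucleotide_diversity(toy_alignment):.4f}")
```

    Multiplying the returned value by 100 gives the percentage form used in the abstract.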

  2. De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

    PubMed Central

    2010-01-01

    Background: De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results: We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions: De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097

  3. Illumina Amplicon Sequencing of 16S rRNA Tag Reveals Bacterial Community Development in the Rhizosphere of Apple Nurseries at a Replant Disease Site and a New Planting Site

    PubMed Central

    Sun, Jian; Zhang, Qiang; Zhou, Jia; Wei, Qinping

    2014-01-01

    We used a next-generation, Illumina-based sequencing approach to characterize the bacterial community development of apple rhizosphere soil in a replant site (RePlant) and a new planting site (NewPlant) in Beijing. Dwarfing apple nurseries of ‘Fuji’/SH6/Pingyitiancha trees were planted in the spring of 2013. Before planting, soil from the apple rhizosphere of the replant site (ReSoil) and from the new planting site (NewSoil) was sampled for analysis on the Illumina MiSeq platform. In late September, the rhizosphere soil from both sites was resampled (RePlant and NewPlant). More than 16,000 valid reads were obtained for each replicate, and the community was composed of five dominant groups (Proteobacteria, Acidobacteria, Bacteroidetes, Gemmatimonadetes and Actinobacteria). The bacterial diversity decreased after apple planting. Principal component analyses revealed that the rhizosphere samples were significantly different among treatments. Apple nursery planting showed a large impact on the soil bacterial community, and the community development was significantly different between the replanted and newly planted soils. Verrucomicrobia were less abundant in RePlant soil, while Pseudomonas and Lysobacter were increased in RePlant compared with ReSoil and NewPlant. Both RePlant and ReSoil showed relatively higher invertase and cellulase activities than NewPlant and NewSoil, but only NewPlant soil showed higher urease activity, and this soil also showed higher plant growth. Our experimental results suggest that planting apple nurseries has a significant impact on soil bacterial community development at both replant and new planting sites, and planting on a new site resulted in significantly higher soil urease activity and a different bacterial community composition. PMID:25360786

  4. An initiator codon mutation in SDE2 causes recessive embryonic lethality in Holstein cattle.

    PubMed

    Fritz, Sébastien; Hoze, Chris; Rebours, Emmanuelle; Barbat, Anne; Bizard, Méline; Chamberlain, Amanda; Escouflaire, Clémentine; Vander Jagt, Christy; Boussaha, Mekki; Grohs, Cécile; Allais-Bonnet, Aurélie; Philippe, Maëlle; Vallée, Amélie; Amigues, Yves; Hayes, Benjamin J; Boichard, Didier; Capitan, Aurélien

    2018-04-18

    Researching depletions in homozygous genotypes for specific haplotypes among the large cohorts of animals genotyped for genomic selection is a very efficient strategy to map recessive lethal mutations. In this study, by analyzing real or imputed Illumina BovineSNP50 (Illumina Inc., San Diego, CA) genotypes from more than 250,000 Holstein animals, we identified a new locus called HH6 showing significant negative effects on conception rate and nonreturn rate at 56 d in at-risk versus control mating. We fine-mapped this locus in a 1.1-Mb interval and analyzed genome sequence data from 12 carrier and 284 noncarrier Holstein bulls. We report the identification of a strong candidate mutation in the gene encoding SDE2 telomere maintenance homolog (SDE2), a protein essential for genomic stability in eukaryotes. This A-to-G transition changes the initiator ATG (methionine) codon to ACG because the gene is transcribed on the reverse strand. Using RNA sequencing and quantitative reverse-transcription PCR, we demonstrated that this mutation does not significantly affect SDE2 splicing and expression level in heterozygous carriers compared with control animals. Initiation of translation at the closest in-frame methionine codon would truncate the SDE2 precursor by 83 amino acids, including the cleavage site necessary for its activation. Finally, no homozygote for the G allele was observed in a large population of nearly 29,000 individuals genotyped for the mutation. The low frequency (1.3%) of the derived allele in the French population and the availability of a diagnostic test on the Illumina EuroG10K SNP chip routinely used for genomic evaluation will enable rapid and efficient selection against this deleterious mutation. Copyright © 2018 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
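
    The statement above that an A-to-G transition turns the initiator ATG into ACG follows from the gene being transcribed on the reverse strand: the variant is reported on the reference (forward) strand, and the codon is read from its reverse complement. A minimal sketch of that reasoning follows; the helper names are hypothetical, and the three-base string stands in for the real genomic context, which is not given in the abstract.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_substitution(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace the base at 0-based position pos, checking the expected reference base."""
    if seq[pos] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos]}")
    return seq[:pos] + alt + seq[pos + 1:]


# The start codon ATG on the transcribed (reverse) strand appears as CAT on the forward strand.
forward_reference = reverse_complement("ATG")                           # "CAT"
# The A-to-G transition is described on the forward strand.
forward_mutant = apply_substitution(forward_reference, 1, "A", "G")     # "CGT"
# Reading the mutant back on the reverse strand gives the altered codon.
mutant_codon = reverse_complement(forward_mutant)                       # "ACG"

if __name__ == "__main__":
    print(forward_reference, forward_mutant, mutant_codon)
```

    Seen from the transcript, the same change is a T-to-C substitution at the second codon position, which is why a forward-strand A-to-G call yields ACG rather than GTG.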

  5. Illumina-based analysis of endophytic bacterial diversity and space-time dynamics in sugar beet on the north slope of Tianshan mountain.

    PubMed

    Shi, YingWu; Yang, Hongmei; Zhang, Tao; Sun, Jian; Lou, Kai

    2014-01-01

    Plants harbor complex and variable microbial communities. Endophytic bacteria play an important role, with potential for more effective development of sustainable crop production systems. To examine how endophytic bacteria in sugar beet (Beta vulgaris L.) vary across both host growth period and location, PCR-based Illumina sequencing was applied to reveal the diversity and stability of endophytic bacteria in sugar beet on the north slope of Tianshan mountain, China. A total of 60.84 M effective sequences of the 16S rRNA gene V3 region were obtained from sugar beet samples. These sequences revealed a large number of operational taxonomic units (OTUs) in sugar beet, that is, 19-121 OTUs per beet sample, at a 3 % cutoff level and a sequencing depth of 30,000 sequences. We identified 13 classes from the resulting 449,585 sequences. Alphaproteobacteria were the dominant class in all sugar beets, followed by Acidobacteria, Gemmatimonadetes and Actinobacteria. A marked difference in the diversity of endophytic bacteria in sugar beet for different growth periods was evident. The greatest number of OTUs was detected during rosette formation (109 OTUs) and tuber growth (146 OTUs). Endophytic bacterial diversity was reduced during seedling growth (66 OTUs) and sucrose accumulation (95 OTUs). Forty-three OTUs were common to all four periods. There were more tags of Alphaproteobacteria and Gammaproteobacteria in Shihezi than in Changji. The dynamics of endophytic bacterial communities were influenced by plant genotype and plant growth stage. To the best of our knowledge, this study is the first application of PCR-based Illumina sequencing to characterize and compare multiple sugar beet samples.

  6. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    PubMed

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although on average of high quality, need to be viewed critically, and sources of error in them should be considered prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  7. Low diversity in the mitogenome of sperm whales revealed by next-generation sequencing.

    PubMed

    Alexander, Alana; Steel, Debbie; Slikas, Beth; Hoekzema, Kendra; Carraher, Colm; Parks, Matthew; Cronn, Richard; Baker, C Scott

    2013-01-01

    Large population sizes and global distributions generally associate with high mitochondrial DNA control region (CR) diversity. The sperm whale (Physeter macrocephalus) is an exception, showing low CR diversity relative to other cetaceans; however, diversity levels throughout the remainder of the sperm whale mitogenome are unknown. We sequenced 20 mitogenomes from 17 sperm whales representative of worldwide diversity using Next Generation Sequencing (NGS) technologies (Illumina GAIIx, Roche 454 GS Junior). Resequencing of three individuals with both NGS platforms and partial Sanger sequencing showed low discrepancy rates (454-Illumina: 0.0071%; Sanger-Illumina: 0.0034%; and Sanger-454: 0.0023%) confirming suitability of both NGS platforms for investigating low mitogenomic diversity. Using the 17 sperm whale mitogenomes in a phylogenetic reconstruction with 41 other species, including 11 new dolphin mitogenomes, we tested two hypotheses for the low CR diversity. First, the hypothesis that CR-specific constraints have reduced diversity solely in the CR was rejected as diversity was low throughout the mitogenome, not just in the CR (overall diversity π = 0.096%; protein-coding 3rd codon = 0.22%; CR = 0.35%), and CR phylogenetic signal was congruent with protein-coding regions. Second, the hypothesis that slow substitution rates reduced diversity throughout the sperm whale mitogenome was rejected as sperm whales had significantly higher rates of CR evolution and no evidence of slow coding region evolution relative to other cetaceans. The estimated time to most recent common ancestor for sperm whale mitogenomes was 72,800 to 137,400 years ago (95% highest probability density interval), consistent with previous hypotheses of a bottleneck or selective sweep as likely causes of low mitogenome diversity.
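
    The diversity figure quoted above (π) is conventionally the average proportion of sites that differ between pairs of aligned sequences. A minimal sketch of that calculation follows, assuming equal-length, gap-free alignments and an unweighted average over all pairs; the function names and the toy sequences are illustrative only and are not taken from the study, whose exact estimator is not stated in the abstract.

```python
from itertools import combinations


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Unweighted mean of pairwise differences (pi) over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment (not real mitogenome data).
toy = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi = nucleotide_diversity(toy)
print(f"pi = {pi:.3f}")
```

    Multiplying the result by 100 gives a percentage comparable in form to the values reported in the abstract.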

  8. Sequencing on the SOLiD 5500xl System - in-depth characterization of the GC bias.

    PubMed

    Roeh, Simone; Weber, Peter; Rex-Haffner, Monika; Deussing, Jan M; Binder, Elisabeth B; Jakovcevski, Mira

    2017-07-04

    Different types of sequencing biases have been described and subsequently improved for a variety of sequencing systems, mostly focusing on the widely used Illumina systems. Similar studies are missing for the SOLiD 5500xl system, a sequencer which produced many data sets available to researchers today. Describing and understanding the bias is important to accurately interpret and integrate these published data in various ongoing research projects. We report a particularly strong GC bias for this sequencing system when analyzing a defined gDNA mix of 5 microbes with a wide range of different GC contents (20-72%) when comparing to the expected distribution and Illumina MiSeq data from the same DNA pool. Since we observed this bias already under PCR-free conditions, changing the PCR conditions during library preparation - a common strategy to handle bias in the Illumina system - was not relevant. The source of the bias appeared to be an uneven heat distribution during the SOLiD emulsion PCR (ePCR) - for enrichment of libraries prior to loading - since ePCR in either small pouches or in 96-well plates improved the GC bias. Sequencing of chromatin immunoprecipitated DNA (ChIP-seq) is a common approach in epigenetics. ChIP-seq of the mixed source histone mark H3K9ac (acetyl Histone H3 lysine 9), typically found on promoter regions and on gene bodies, including CpG islands, performed on a SOLiD 5500xl machine, resulted in major loss of reads at GC rich loci (GC content ≥ 62%), not explained by low sequencing depth. This was improved with adaptations of the ePCR.
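
    One simple way to express the comparison against an expected distribution is an observed-to-expected ratio of read fractions per genome in the mix. The sketch below assumes that approach; the genome labels, mix proportions, and read counts are hypothetical placeholders, not values from the study.

```python
def coverage_bias(expected_fraction: dict[str, float],
                  observed_reads: dict[str, int]) -> dict[str, float]:
    """Observed/expected read fraction per genome; 1.0 means no bias."""
    total = sum(observed_reads.values())
    if total == 0:
        raise ValueError("no reads observed")
    return {
        name: (observed_reads[name] / total) / expected_fraction[name]
        for name in expected_fraction
    }


# Hypothetical three-genome mix labelled by GC content (illustrative only).
expected = {"gc20": 0.25, "gc45": 0.50, "gc72": 0.25}
observed = {"gc20": 300, "gc45": 600, "gc72": 100}
bias = coverage_bias(expected, observed)
for name, ratio in bias.items():
    print(f"{name}: observed/expected = {ratio:.2f}")
```

    A ratio well below 1.0 for the GC-rich genome would correspond to the kind of read loss at GC-rich loci that the abstract describes.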

  9. A review of the Forest Service Remote Automated Weather Station (RAWS) network

    Treesearch

    John Zachariassen; Karl F. Zeller; Ned Nikolov; Tom McClelland

    2003-01-01

    The RAWS network and RAWS data-use systems are closely reviewed and summarized in this report. RAWS is an active program created by the many land-management agencies that share a common need for accurate and timely weather data from remote locations for vital operational and program decisions specific to wildland and prescribed fires. A RAWS measures basic observable...

  10. 76 FR 44855 - Common or Usual Name for Raw Meat and Poultry Products Containing Added Solutions

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-27

    .... FSIS-2010-0012] RIN 0583-AD41 Common or Usual Name for Raw Meat and Poultry Products Containing Added... name for raw meat and poultry products that do not meet standard of identity regulations and to which... description of the raw meat or poultry component, the percentage of added solution incorporated into the raw...

  11. 21 CFR 101.44 - What are the 20 most frequently consumed raw fruits, vegetables, and fish in the United States?

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 2 2011-04-01 2011-04-01 false What are the 20 most frequently consumed raw... raw fruits, vegetables, and fish in the United States? (a) The 20 most frequently consumed raw fruits..., and watermelon. (b) The 20 most frequently consumed raw vegetables are: Asparagus, bell pepper...

  12. 40 CFR 63.1345 - Emissions limits for affected sources other than kilns; in-line kiln/raw mills; clinker coolers...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... other than kilns; in-line kiln/raw mills; clinker coolers; new and reconstructed raw material dryers; and raw and finish mills, and open clinker piles. 63.1345 Section 63.1345 Protection of Environment... for affected sources other than kilns; in-line kiln/raw mills; clinker coolers; new and reconstructed...

  13. 21 CFR 101.44 - What are the 20 most frequently consumed raw fruits, vegetables, and fish in the United States?

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 2 2010-04-01 2010-04-01 false What are the 20 most frequently consumed raw... raw fruits, vegetables, and fish in the United States? (a) The 20 most frequently consumed raw fruits..., and watermelon. (b) The 20 most frequently consumed raw vegetables are: Asparagus, bell pepper...

  14. An Immunoassay for Quantification of Contamination by Raw Meat Juice on Food Contact Surfaces.

    PubMed

    Chen, Fur-Chi; Godwin, Sandria; Chambers, Edgar

    2016-11-01

    Raw chicken products often are contaminated with Salmonella and Campylobacter, which can be transmitted from packages to contact surfaces. Raw meat juices from these packages also provide potential media for cross-contamination. There are limited quantitative data on the levels of consumer exposure to raw meat juice during shopping for and handling of chicken products. An exposure assessment is needed to quantify the levels of transmission and to assess the risk. An enzyme-linked immunosorbent assay (ELISA) was developed and validated for quantitative detection of raw meat juice on hands and various food contact surfaces. Analytical procedures were designed to maximize the recovery of raw meat juice from various surfaces: hands, plastic, wood, stainless steel, laminated countertops, glass, and ceramics. The ELISA was based on the detection of a soluble muscle protein, troponin I (TnI), in the raw meat juice. The assay can detect levels as low as 1.25 ng of TnI, which is equivalent to less than 1 μl of the raw meat juice. The concentrations of TnI in the raw meat juices from 10 retail chicken packages, as determined by ELISA, were between 0.46 and 3.56 ng/μl, with an average of 1.69 ng/μl. The analytical procedures, which include swabbing, extraction, and concentration, enable the detection of TnI from various surfaces. The recoveries of raw meat juice from surfaces of hands were 92%, and recoveries from other tested surfaces were from 55% on plastic cutting boards to 75% on laminated countertops. The ELISA developed has been used for monitoring the transfer of raw meat juice during shopping for and handling of raw chicken products in our studies. The assay also can be applied to other raw meat products, such as pork and beef.
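
    The statement that 1.25 ng of TnI corresponds to less than 1 μl of juice can be checked by dividing the detection limit by the reported average concentration. The short sketch below does that arithmetic using only figures quoted in the abstract; the function name is an illustrative assumption.

```python
DETECTION_LIMIT_NG = 1.25   # lowest detectable TnI mass quoted in the abstract
MEAN_CONC_NG_PER_UL = 1.69  # average TnI concentration quoted in the abstract


def juice_volume_ul(tni_ng: float, conc_ng_per_ul: float) -> float:
    """Volume of raw meat juice (in microlitres) that contains tni_ng of TnI."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return tni_ng / conc_ng_per_ul


volume_at_mean = juice_volume_ul(DETECTION_LIMIT_NG, MEAN_CONC_NG_PER_UL)
print(f"{volume_at_mean:.2f} ul of juice at the mean concentration")
```

    At the average concentration this works out to roughly three quarters of a microlitre; at the lower end of the reported concentration range the equivalent volume would be larger.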

  15. The draft genome of a diploid cotton Gossypium raimondii

    USDA-ARS's Scientific Manuscript database

    We have sequenced and assembled the draft genome of Gossypium raimondii, whose progenitor is considered the contributor of the D-subgenome to the economically important natural textile fiber producer, G. hirsutum. Next-generation Illumina pair-end (PE) sequencing strategies were employed to obtain ...

  16. Assembly and analysis of a male sterile rubber tree mitochondrial genome reveals DNA rearrangement events and a novel transcript.

    PubMed

    Shearman, Jeremy R; Sangsrakru, Duangjai; Ruang-Areerate, Panthita; Sonthirod, Chutima; Uthaipaisanwong, Pichahpuk; Yoocha, Thippawan; Poopear, Supannee; Theerawattanasuk, Kanikar; Tragoonrung, Somvong; Tangphatsornruang, Sithichoke

    2014-02-10

    The rubber tree, Hevea brasiliensis, is an important plant species that is commercially grown to produce latex rubber in many countries. The rubber tree variety BPM 24 exhibits cytoplasmic male sterility, inherited from the variety GT 1. We constructed the rubber tree mitochondrial genome of a cytoplasmic male sterile variety, BPM 24, using 454 sequencing, including 8 kb paired-end libraries, plus Illumina paired-end sequencing. We annotated this mitochondrial genome with the aid of Illumina RNA-seq data and performed comparative analysis. We then compared the sequence of BPM 24 to the contigs of the published rubber tree, variety RRIM 600, and identified a rearrangement that is unique to BPM 24 resulting in a novel transcript containing a portion of atp9. The novel transcript is consistent with changes that cause cytoplasmic male sterility through a slight reduction to ATP production efficiency. The exhaustive nature of the search rules out alternative causes and supports previous findings of novel transcripts causing cytoplasmic male sterility.

  17. Effect of two seaweed polysaccharides on intestinal microbiota in mice evaluated by illumina PE250 sequencing.

    PubMed

    Zhang, Zhongshan; Wang, Xiaomei; Han, Shuwen; Liu, Chundong; Liu, Feng

    2018-06-01

    Effect of polysaccharides from two seaweeds, Porphyra haitanensis and Ulva prolifera, on intestinal microbiota in mice was evaluated by Illumina PE250 sequencing. Analysis showed significant structural changes in fecal microbiota among the three sample groups. There were significant differences in the composition of fecal microbiota among the three groups at phylum and genus levels. At the phylum level, the most predominant phylum was Bacteroidetes contributing 58.76%, 73.39%, 75.38% and 64.40% of the fecal microbiota in K, Z, H and D groups respectively, followed by Firmicutes, contributing 37.61%, 23.99%, 21.87% and 30.82% respectively. Many genera were significantly higher in the Z and H group than in the K group, including Prevotellaceae UCG-001 (p<0.05) and Rikenellaceae RC9 (p<0.01). In conclusion, our results suggest that polysaccharide type and glycoside may contribute to shaping mice gut microbiota. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. High coverage of the complete mitochondrial genome of the rare Gray's beaked whale (Mesoplodon grayi) using Illumina next generation sequencing.

    PubMed

    Thompson, Kirsten F; Patel, Selina; Williams, Liam; Tsai, Peter; Constantine, Rochelle; Baker, C Scott; Millar, Craig D

    2016-01-01

    Using an Illumina platform, we shot-gun sequenced the complete mitochondrial genome of Gray's beaked whale (Mesoplodon grayi) to an average coverage of 152X. We performed a de novo assembly using SOAPdenovo2 and determined the total mitogenome length to be 16,347 bp. The nucleotide composition was asymmetric (33.3% A, 24.6% C, 12.6% G, 29.5% T) with an overall GC content of 37.2%. The gene organization was similar to that of other cetaceans with 13 protein-coding genes, 2 rRNAs (12S and 16S), 22 predicted tRNAs and 1 control region or D-loop. We found no evidence of heteroplasmy or nuclear copies of mitochondrial DNA in this individual. Beaked whales within the genus Mesoplodon are rarely seen at sea and their basic biology is poorly understood. These data will contribute to resolving the phylogeography and population ecology of this speciose group.
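
    The overall GC content quoted above is the sum of the G and C percentages in the nucleotide composition. A minimal sketch of that relationship follows; the reported percentages come from the abstract, while the helper names and the short toy sequence are illustrative assumptions.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(composition: dict[str, float]) -> float:
    """GC content as the sum of the G and C percentages."""
    return composition["G"] + composition["C"]


# Percentages reported in the abstract.
reported = {"A": 33.3, "C": 24.6, "G": 12.6, "T": 29.5}
reported_gc = gc_content(reported)
print(f"GC content from reported composition: {reported_gc:.1f}%")

# Hypothetical toy sequence to show the calculation from raw bases.
toy_gc = gc_content(base_composition("ATGCATTACA"))
print(f"GC content of toy sequence: {toy_gc:.1f}%")
```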

  19. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  20. ChIP-seq.

    PubMed

    Kim, Tae Hoon; Dekker, Job

    2018-05-01

    Owing to its digital nature, ChIP-seq has become the standard method for genome-wide ChIP analysis. Using next-generation sequencing platforms (notably the Illumina Genome Analyzer), millions of short sequence reads can be obtained. The densities of recovered ChIP sequence reads along the genome are used to determine the binding sites of the protein. Although a relatively small amount of ChIP DNA is required for ChIP-seq, the current sequencing platforms still require amplification of the ChIP DNA by ligation-mediated PCR (LM-PCR). This protocol, which involves linker ligation followed by size selection, is the standard ChIP-seq protocol using an Illumina Genome Analyzer. The size-selected ChIP DNA is amplified by LM-PCR and size-selected for the second time. The purified ChIP DNA is then loaded into the Genome Analyzer. The ChIP DNA can also be processed in parallel for ChIP-chip results. © 2018 Cold Spring Harbor Laboratory Press.
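
    The idea of using read density along the genome to locate binding sites can be illustrated with fixed-width binning. This is a toy sketch only: the bin size, the threshold, and the read positions are hypothetical, and real peak callers use background models and statistical tests rather than a fixed cutoff.

```python
from collections import Counter


def read_density(positions: list[int], bin_size: int) -> Counter:
    """Count reads per fixed-width bin, keyed by the bin's start coordinate."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return Counter((p // bin_size) * bin_size for p in positions)


def enriched_bins(density: Counter, min_reads: int) -> list[int]:
    """Start coordinates of bins holding at least min_reads reads."""
    return sorted(start for start, n in density.items() if n >= min_reads)


# Hypothetical read start positions on one chromosome.
reads = [105, 110, 120, 130, 150, 990, 1510, 1520, 1530, 1580]
density = read_density(reads, bin_size=200)
peaks = enriched_bins(density, min_reads=4)
print(peaks)
```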

  1. Hidden Markov Model-Based CNV Detection Algorithms for Illumina Genotyping Microarrays.

    PubMed

    Seiser, Eric L; Innocenti, Federico

    2014-01-01

    Somatic alterations in DNA copy number have been well studied in numerous malignancies, yet the role of germline DNA copy number variation in cancer is still emerging. Genotyping microarrays generate allele-specific signal intensities to determine genotype, but may also be used to infer DNA copy number using additional computational approaches. Numerous tools have been developed to analyze Illumina genotype microarray data for copy number variant (CNV) discovery, although commonly utilized algorithms freely available to the public employ approaches based upon the use of hidden Markov models (HMMs). QuantiSNP, PennCNV, and GenoCN utilize HMMs with six copy number states but vary in how transition and emission probabilities are calculated. Performance of these CNV detection algorithms has been shown to be variable between both genotyping platforms and data sets, although HMM approaches generally outperform other current methods. Low sensitivity is prevalent with HMM-based algorithms, suggesting the need for continued improvement in CNV detection methodologies.
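
    The HMM-based approach described above decodes a most likely sequence of hidden copy number states from per-probe signal. The sketch below shows Viterbi decoding in log space under loudly simplified assumptions: three toy states stand in for the six copy number states, emissions are Gaussian on a single intensity-ratio value, and the means, standard deviation, and stay probability are invented for illustration. It does not reproduce how QuantiSNP, PennCNV, or GenoCN compute transition or emission probabilities.

```python
import math

STATES = ["loss", "normal", "gain"]                 # toy stand-in for six states
MEANS = {"loss": -0.6, "normal": 0.0, "gain": 0.4}  # assumed emission means
SD = 0.2                                            # assumed emission spread
STAY = 0.98                                         # assumed self-transition prob.


def log_emission(state: str, x: float) -> float:
    """Log density of a Gaussian emission for the given state."""
    mu = MEANS[state]
    return -0.5 * ((x - mu) / SD) ** 2 - math.log(SD * math.sqrt(2 * math.pi))


def log_transition(prev: str, cur: str) -> float:
    """Log probability of moving between states (uniform over switches)."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(obs: list[float]) -> list[str]:
    """Most likely state path for the observations (Viterbi, log space)."""
    if not obs:
        return []
    score = {s: math.log(1 / len(STATES)) + log_emission(s, obs[0]) for s in STATES}
    back = []
    for x in obs[1:]:
        new_score, pointers = {}, {}
        for cur in STATES:
            best = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = score[best] + log_transition(best, cur) + log_emission(cur, x)
            pointers[cur] = best
        back.append(pointers)
        score = new_score
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):  # trace back from the last probe
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical per-probe signal with a four-probe drop in the middle.
signal = [0.02, -0.05, 0.01, -0.58, -0.63, -0.55, -0.61, 0.03, -0.02]
calls = viterbi(signal)
print(calls)
```

    The high stay probability is what keeps isolated noisy probes from being called as separate segments; lowering it makes the decoder switch states more readily.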

  2. Microbial community structure in fermentation process of Shaoxing rice wine by Illumina-based metagenomic sequencing.

    PubMed

    Xie, Guangfa; Wang, Lan; Gao, Qikang; Yu, Wenjing; Hong, Xutao; Zhao, Lingyun; Zou, Huijun

    2013-09-01

    To understand the role of the community structure of microbes in the environment in the fermentation of Shaoxing rice wine, samples collected from a wine factory were subjected to Illumina-based metagenomic sequencing. De novo assembly of the sequencing reads allowed the characterisation of more than 23 thousand microbial genes derived from 1.7 and 1.88 Gbp of sequences from two samples fermented for 5 and 30 days respectively. The microbial community structure at different fermentation times of Shaoxing rice wine was revealed, showing the different roles of the microbiota in the fermentation process of Shaoxing rice wine. The gene function of both samples was also studied in the COG database, with most genes belonging to category S (function unknown), category E (amino acid transport and metabolism) and unclassified group. The results show that both the microbial community structure and gene function composition change greatly at different time points of Shaoxing rice wine fermentation. © 2013 Society of Chemical Industry.

  3. Bacterial community compositions of coking wastewater treatment plants in steel industry revealed by Illumina high-throughput sequencing.

    PubMed

    Ma, Qiao; Qu, Yuanyuan; Shen, Wenli; Zhang, Zhaojing; Wang, Jingwei; Liu, Ziyan; Li, Duanxing; Li, Huijie; Zhou, Jiti

    2015-03-01

    In this study, Illumina high-throughput sequencing was used to reveal the community structures of nine coking wastewater treatment plants (CWWTPs) in China for the first time. The sludge systems exhibited a similar community composition at each taxonomic level. Compared to previous studies, some of the core genera in municipal wastewater treatment plants such as Zoogloea, Prosthecobacter and Gp6 were detected as minor species. Thiobacillus (20.83%), Comamonas (6.58%), Thauera (4.02%), Azoarcus (7.78%) and Rhodoplanes (1.42%) were the dominant genera shared by at least six CWWTPs. The percentages of autotrophic ammonia-oxidizing bacteria and nitrite-oxidizing bacteria were unexpectedly low, which were verified by both real-time PCR and fluorescence in situ hybridization analyses. Hierarchical clustering and canonical correspondence analysis indicated that operation mode, flow rate and temperature might be the key factors in community formation. This study provides new insights into our understanding of microbial community compositions and structures of CWWTPs. Copyright © 2014 Elsevier Ltd. All rights reserved.

  4. ISMRM Raw data format: A proposed standard for MRI raw datasets.

    PubMed

    Inati, Souheil J; Naegele, Joseph D; Zwart, Nicholas R; Roopchansingh, Vinai; Lizak, Martin J; Hansen, David C; Liu, Chia-Ying; Atkinson, David; Kellman, Peter; Kozerke, Sebastian; Xue, Hui; Campbell-Washburn, Adrienne E; Sørensen, Thomas S; Hansen, Michael S

    2017-01-01

    This work proposes the ISMRM Raw Data format as a common MR raw data format, which promotes algorithm and data sharing. A file format consisting of a flexible header and tagged frames of k-space data was designed. Application Programming Interfaces were implemented in C/C++, MATLAB, and Python. Converters for Bruker, General Electric, Philips, and Siemens proprietary file formats were implemented in C++. Raw data were collected using magnetic resonance imaging scanners from four vendors, converted to ISMRM Raw Data format, and reconstructed using software implemented in three programming languages (C++, MATLAB, Python). Images were obtained by reconstructing the raw data from all vendors. The source code, raw data, and images comprising this work are shared online, serving as an example of an image reconstruction project following a paradigm of reproducible research. The proposed raw data format solves a practical problem for the magnetic resonance imaging community. It may serve as a foundation for reproducible research and collaborations. The ISMRM Raw Data format is a completely open and community-driven format, and the scientific community is invited (including commercial vendors) to participate either as users or developers. Magn Reson Med 77:411-421, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  5. CIDR

    Science.gov Websites

    NGS Pretesting and QC Using Illumina Infinium Arrays CIDR IGES Posters - 2017 A Comparison of Methods fragmentation methods for input into library construction protocol Development of a Low Input FFPE workflow for Evaluation of Copy Number Variation (CNV) detection methods in whole exome sequencing (WES) data CIDR AGBT

  6. CIDR

    Science.gov Websites

    Genotyping General Information Genome Wide Association Custom FFPE Sample Options Methylation Linkage Enrichment Options 51 Mb 51 Mb plus 6.8 - 24Mb custom option 54 Mb Clinical Exome 71 Mb (includes UTRs) Next Generation Sequencing Platform Illumina HiSeq sequencers Options for Formalin-Fixed Paraffin-Embedded (FFPE

  7. Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children

    USDA-ARS's Scientific Manuscript database

    To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity.Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...

  8. The Genome of the Cucumber, Cucumis Sativus L

    USDA-ARS's Scientific Manuscript database

    Cucumber is an economically important crop as well as a model system for sex determination studies and plant vascular biology. Here we report the draft genome sequence of Cucumis sativus var. sativus L., assembled using a novel combination of traditional Sanger and next-generation Illumina GA sequen...

  9. Comparison of Comparative Genomic Hybridization Technologies across Microarray Platforms

    EPA Science Inventory

    In the 2007 Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) project, we analyzed HL-60 DNA with five platforms: Agilent, Affymetrix 500K, Affymetrix U133 Plus 2.0, Illumina, and RPCI 19K BAC arrays. Copy number variation (CNV) was analyzed ...

  10. Genome sequences of nine vesicular stomatitis virus isolates from South America

    USDA-ARS's Scientific Manuscript database

    We report nine full-genome sequences of vesicular stomatitis virus obtained by Illumina next-generation sequencing of RNA, isolated from either cattle epithelial suspensions or cell culture supernatants. Seven of these viral genomes belonged to the New Jersey serotype/species, clade III, while two...

  11. Optimization of conditions to sequence long cDNAs from viruses

    USDA-ARS's Scientific Manuscript database

    Fourth generation sequencing with the MinION nanopore sequencer provides the opportunity to obtain deep coverage and long reads for single molecules. This will benefit studies on RNA viruses. In the past, Sanger, Illumina, and Ion Torrent sequencing have been utilized to study RNA viruses. Both technique...

  12. Daytime soybean transcriptome fluctuations during water deficit stress

    USDA-ARS's Scientific Manuscript database

    Since drought can seriously affect plant growth and development and little is known about how the oscillations of gene expression during the drought stress-acclimation response in soybean is affected, we applied Illumina technology to sequence 36 cDNA libraries synthesized from control and drought-s...

  13. Characterization of circulating transfer RNA-Derived RNA fragments in cattle

    USDA-ARS's Scientific Manuscript database

    The objective was to characterize naturally occurring circulating transfer RNA-derived RNA Fragments (tRFs) in cattle. Serum from eight clinically normal adult dairy cows was collected, and small non-coding RNAs were extracted immediately after collection and sequenced by Illumina MiSeq. Sequences a...

  14. Integrated analysis of 454 and Illumina transcriptomic sequencing characterizes carbon flux and energy source for fatty acid synthesis in developing Lindera glauca fruits for woody biodiesel.

    PubMed

    Lin, Zixin; An, Jiyong; Wang, Jia; Niu, Jun; Ma, Chao; Wang, Libing; Yuan, Guanshen; Shi, Lingling; Liu, Lili; Zhang, Jinsong; Zhang, Zhixiang; Qi, Ji; Lin, Shanzhi

    2017-01-01

    Lindera glauca fruit with high quality and quantity of oil has emerged as a novel potential source of biodiesel in China, but the molecular regulatory mechanism of carbon flux and energy source for oil biosynthesis in developing fruits is still unknown. To better develop fruit oils of L. glauca as woody biodiesel, a combination of two different sequencing platforms (454 and Illumina) and qRT-PCR analysis was used to define a minimal reference transcriptome of developing L. glauca fruits, and to construct carbon and energy metabolic model for regulation of carbon partitioning and energy supply for FA biosynthesis and oil accumulation. We first analyzed the dynamic patterns of growth tendency, oil content, FA compositions, biodiesel properties, and the contents of ATP and pyridine nucleotide of L. glauca fruits from seven different developing stages. Comprehensive characterization of transcriptome of the developing L. glauca fruit was performed using a combination of two different next-generation sequencing platforms, of which three representative fruit samples (50, 125, and 150 DAF) and one mixed sample from seven developing stages were selected for Illumina and 454 sequencing, respectively. The unigenes separately obtained from long and short reads (201, and 259, respectively, in total) were reconciled using TGICL software, resulting in a total of 60,031 unigenes (mean length = 1061.95 bp) to describe a transcriptome for developing L. glauca fruits. Notably, 198 genes were annotated for photosynthesis, sucrose cleavage, carbon allocation, metabolite transport, acetyl-CoA formation, oil synthesis, and energy metabolism, among which some specific transporters, transcription factors, and enzymes were identified to be implicated in carbon partitioning and energy source for oil synthesis by an integrated analysis of transcriptomic sequencing and qRT-PCR. Importantly, the carbon and energy metabolic model was well established for oil biosynthesis of developing L. glauca fruits, which could help to reveal the molecular regulatory mechanism of the increased oil production in developing fruits. This study presents for the first time the application of an integrated two different sequencing analyses (Illumina and 454) and qRT-PCR detection to define a minimal reference transcriptome for developing L. glauca fruits, and to elucidate the molecular regulatory mechanism of carbon flux control and energy provision for oil synthesis. Our results will provide a valuable resource for future fundamental and applied research on the woody biodiesel plants.

  15. Genomic profiling of 766 cancer-related genes in archived esophageal normal and carcinoma tissues.

    PubMed

    Chen, Jing; Guo, Liping; Peiffer, Daniel A; Zhou, Lixin; Chan, Owen Tsan Mo; Bibikova, Marina; Wickham-Garcia, Eliza; Lu, Shih-Hsin; Zhan, Qimin; Wang-Rodriguez, Jessica; Jiang, Wei; Fan, Jian-Bing

    2008-05-15

    We employed the BeadArray™ technology to perform a genetic analysis in 33 formalin-fixed, paraffin-embedded (FFPE) human esophageal carcinomas, mostly squamous-cell-carcinoma (ESCC), and their adjacent normal tissues. A total of 1,432 single nucleotide polymorphisms (SNPs) derived from 766 cancer-related genes were genotyped with partially degraded genomic DNAs isolated from these samples. This directly targeted genomic profiling identified not only previously reported somatic gene amplifications (e.g., CCND1) and deletions (e.g., CDKN2A and CDKN2B) but also novel genomic aberrations. Among these novel targets, the most frequently deleted genomic regions were chromosome 3p (including tumor suppressor genes FANCD2 and CTNNB1) and chromosome 5 (including tumor suppressor gene APC). The most frequently amplified genomic region was chromosome 3q (containing DVL3, MLF1, ABCC5, BCL6, AGTR1 and known oncogenes TNK2, TNFSF10, FGF12). The chromosome 3p deletion and 3q amplification occurred coincidently in nearly all of the affected cases, suggesting a molecular mechanism for the generation of somatic chromosomal aberrations. We also detected significant differences in germline allele frequency between the esophageal cohort of our study and normal control samples from the International HapMap Project for 10 genes (CSF1, KIAA1804, IL2, PMS2, IRF7, FLT3, NTRK2, MAP3K9, ERBB2 and PRKAR1A), suggesting that they might play roles in esophageal cancer susceptibility and/or development. Taken together, our results demonstrated the utility of the BeadArray technology for high-throughput genetic analysis in FFPE tumor tissues and provided a detailed genetic profiling of cancer-related genes in human esophageal cancer. (c) 2008 Wiley-Liss, Inc.

  16. Effects of vitamin K3 and K5 on proliferation, cytokine production, and regulatory T cell-frequency in human peripheral-blood mononuclear cells.

    PubMed

    Hatanaka, Hiroshige; Ishizawa, Hitomi; Nakamura, Yurie; Tadokoro, Hiroko; Tanaka, Sachiko; Onda, Kenji; Sugiyama, Kentaro; Hirano, Toshihiko

    2014-03-18

    The effects of vitamin K (VK) derivatives VK3 and VK5 on human immune cells have not been extensively investigated. We examined the effects of VK3 and VK5 on proliferation, apoptosis, cytokine production, and CD4+CD25+Foxp3+ regulatory T (Treg) cell-frequency in human peripheral blood mononuclear cells (PBMCs) activated by T cell mitogen in vitro. Anti-proliferative effects of VK3 and VK5 on T-cell mitogen activated PBMCs were assessed by WST assay procedures. Apoptotic cells were determined as Annexin V positive/propidium iodide (PI) negative cells. Cytokine concentrations in the supernatant of the culture medium were measured with bead-array procedures followed by analysis with flow cytometry. The CD4+CD25+Foxp3+Treg cells in mitogen-activated PBMCs were stained with fluorescence-labeled specific antibodies followed by flow cytometry. VK3 and VK5 suppressed the mitogen-activated proliferation of PBMCs significantly at 10-100μM (p<0.05). The data also suggest that VK3 and VK5 promote apoptosis in the mitogen-activated T cells. VK3 and VK5 significantly inhibited the production of tumor necrosis factor (TNF) α, interleukin (IL)-4, -6, and -10 from the activated PBMCs at 10-100μM (p<0.05). In contrast, VK3 and VK5 significantly increased Treg cell-frequency in the activated PBMCs at concentrations more than 10μM (p<0.001). Our data suggest that VK3 and VK5 attenuate T cell mediated immunity by inhibiting the proliferative response and inducing apoptosis in activated T cells. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. Cross-activating invariant NKT cells and kupffer cells suppress cholestatic liver injury in a mouse model of biliary obstruction.

    PubMed

    Duwaerts, Caroline C; Sun, Eric P; Cheng, Chao-Wen; van Rooijen, Nico; Gregory, Stephen H

    2013-01-01

    Both Kupffer cells and invariant natural killer T (iNKT) cells suppress neutrophil-dependent liver injury in a mouse model of biliary obstruction. We hypothesize that these roles are interdependent and require iNKT cell-Kupffer cell cross-activation. Female, wild-type and iNKT cell-deficient C57Bl/6 mice were injected with magnetic beads 3 days prior to bile duct ligation (BDL) in order to facilitate subsequent Kupffer cell isolation. On day three post-BDL, the animals were euthanized and the livers dissected. Necrosis was scored; Kupffer cells were isolated and cell surface marker expression (flow cytometry), mRNA expression (qtPCR), nitric oxide (NO·) production (Griess reaction), and protein secretion (cytometric bead-array or ELISAs) were determined. To address the potential role of NO· in suppressing neutrophil accumulation, a group of WT mice received 1400W, a specific inducible nitric oxide synthase (iNOS) inhibitor, prior to BDL. To clarify the mechanisms underlying Kupffer cell-iNKT cell cross-activation, WT animals were administered anti-IFN-γ or anti-lymphocyte function-associated antigen (LFA)-1 antibody prior to BDL. Compared to their WT counterparts, Kupffer cells obtained from BDL iNKT cell-deficient mice expressed lower iNOS mRNA levels, produced less NO·, and secreted more neutrophil chemoattractants. Both iNOS inhibition and IFN-γ neutralization increased neutrophil accumulation in the livers of BDL WT mice. Anti-LFA-1 pre-treatment reduced iNKT cell accumulation in these same animals. These data indicate that the LFA-1-dependent cross-activation of iNKT cells and Kupffer cells inhibits neutrophil accumulation and cholestatic liver injury.

  18. Cross-Activating Invariant NKT Cells and Kupffer Cells Suppress Cholestatic Liver Injury in a Mouse Model of Biliary Obstruction

    PubMed Central

    Duwaerts, Caroline C.; Sun, Eric P.; Cheng, Chao-Wen; van Rooijen, Nico; Gregory, Stephen H.

    2013-01-01

    Both Kupffer cells and invariant natural killer T (iNKT) cells suppress neutrophil-dependent liver injury in a mouse model of biliary obstruction. We hypothesize that these roles are interdependent and require iNKT cell-Kupffer cell cross-activation. Female, wild-type and iNKT cell-deficient C57Bl/6 mice were injected with magnetic beads 3 days prior to bile duct ligation (BDL) in order to facilitate subsequent Kupffer cell isolation. On day three post-BDL, the animals were euthanized and the livers dissected. Necrosis was scored; Kupffer cells were isolated and cell surface marker expression (flow cytometry), mRNA expression (qtPCR), nitric oxide (NO.) production (Griess reaction), and protein secretion (cytometric bead-array or ELISAs) were determined. To address the potential role of NO. in suppressing neutrophil accumulation, a group of WT mice received 1400W, a specific inducible nitric oxide synthase (iNOS) inhibitor, prior to BDL. To clarify the mechanisms underlying Kupffer cell-iNKT cell cross-activation, WT animals were administered anti-IFN-γ or anti-lymphocyte function-associated antigen (LFA)-1 antibody prior to BDL. Compared to their WT counterparts, Kupffer cells obtained from BDL iNKT cell-deficient mice expressed lower iNOS mRNA levels, produced less NO., and secreted more neutrophil chemoattractants. Both iNOS inhibition and IFN-γ neutralization increased neutrophil accumulation in the livers of BDL WT mice. Anti-LFA-1 pre-treatment reduced iNKT cell accumulation in these same animals. These data indicate that the LFA-1-dependent cross-activation of iNKT cells and Kupffer cells inhibits neutrophil accumulation and cholestatic liver injury. PMID:24260285

  19. 31 CFR 560.407 - Transactions related to Iranian-origin goods.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... from third countries of goods containing Iranian-origin raw materials or components is not prohibited if those raw materials or components have been incorporated into manufactured products or... Iranian-origin raw materials or components are not prohibited if those raw materials or components have...

  20. Generation of a Saturated Genetic Recombination Map for Avocado (Persea americana)

    USDA-ARS's Scientific Manuscript database

    Two large mapping populations of avocado consisting of 1582 trees were genotyped with 5050 SNP markers from transcribed genes using an Illumina Infinium SNP chip. A Florida mapping population consisted of 527 progeny from 'Tonnage' x 'Simmonds' and 249 from 'Simmonds' x 'Tonnage'. A California map...

  1. Bifiguratus adelaidae, gen. et sp. nov., a new member of Mucoromycotina in endophytic and soil-dwelling habitats

    USDA-ARS's Scientific Manuscript database

    Illumina amplicon sequencing of soil in a temperate pine forest in the southeastern United States detected an abundant, N-responsive fungal genotype of unknown phylogenetic affiliation. Two isolates with ribosomal sequences consistent with that genotype were subsequently obtained in culture. Examina...

  2. A first report and complete genome sequence of alfalfa enamovirus from Sudan

    USDA-ARS's Scientific Manuscript database

    A full genome sequence of a viral pathogen, provisionally named alfalfa enamovirus 2 (AEV-2), was reconstructed from short reads obtained by Illumina RNA sequencing of alfalfa sample originating from Sudan. Ambiguous nucleotides in the resultant consensus assembly and identity of the predicted virus...

  3. The Apis mellifera filamentous virus genome

    USDA-ARS's Scientific Manuscript database

    A complete reference genome of the Apis mellifera filamentous virus (AmFV) was determined using Illumina HiSeq sequencing. The AmFV genome is a double-stranded DNA molecule of approximately 498,500 nucleotides with a GC content of 50.8%. It encompasses 251 non-overlapping open reading frames (ORFs), e...

  4. Sequence, assembly and annotation of the maize W22 genome

    USDA-ARS's Scientific Manuscript database

    Since its adoption by Brink and colleagues in the 1950s and 60s, the maize W22 inbred has been utilized extensively to understand fundamental genetic and epigenetic processes such as recombination, transposition and paramutation. To maximize the utility of W22 in gene discovery, we have Illumina sequen...

  5. The genome of the fire ant Solenopsis invicta

    USDA-ARS's Scientific Manuscript database

    Ants have evolved very complex societies and are key ecosystem members. Some of them are also major pests, as exemplified by the fire ant Solenopsis invicta. We present here the draft genome of S. invicta, assembled from 454 and Illumina reads obtained from a focal haploid male and his brothers. In ...

  6. Complete Genome Sequence of Listeria monocytogenes DFPST0073, Isolated from Imported Mexican Soft Cheese.

    PubMed

    Salazar, Joelle K; Gonsalves, Lauren J; Schill, Kristin M; Sanchez Leon, Maria; Anderson, Nathan; Keller, Susanne E

    2018-06-07

    The genome of Listeria monocytogenes strain DFPST0073, isolated from imported fresh Mexican soft cheese in 2003, was sequenced using the Illumina MiSeq platform. Reads were assembled using SPAdes, and genome annotation was performed using the NCBI Prokaryotic Genome Annotation Pipeline.

  7. The complete chloroplast genome of common walnut (Juglans regia)

    Treesearch

    Yiheng Hu; Keith E. Woeste; Meng Dang; Tao Zhou; Xiaojia Feng; Guifang Zhao; Zhanlin Liu; Zhonghu Li; Peng Zhao

    2016-01-01

    Common walnut (Juglans regia L.) is cultivated in temperate regions worldwide for its wood and nuts. The complete chloroplast genome of J. regia was sequenced using the Illumina MiSeq platform. This is the first complete chloroplast sequence for the Juglandaceae, a family that includes numerous species of economic importance....

  8. Identification of genomic regions associated with feed efficiency in Nelore cattle

    USDA-ARS's Scientific Manuscript database

    Feed efficiency is jointly determined by productivity and feed requirements, both of which are economically relevant traits in beef cattle production systems. The objective of this study was to identify genes/QTLs associated with components of feed efficiency in Nelore cattle using Illumina BovineHD...

  9. DARPA Antibody Technology Program. Standardized Test Bed for Antibody Characterization: Characterization of an MS2 ScFv Antibody Produced by Illumina

    DTIC Science & Technology

    2016-08-01

    platforms. 15. SUBJECT TERMS Antibody Antibody Technology Program (ATP) Quality Enzyme-linked immunosorbent assay (ELISA) Biosurveillance Single-chain... 2.6 Thermal Stress Test; 2.7 ELISA ... 3.5 ELISA Results; 3.6 SPR Results

  10. Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

    USDA-ARS's Scientific Manuscript database

    We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...

  11. HIGH-THROUGHPUT PHYLOGENOMICS: FROM ANCIENT DNA TO SIGNATURES OF HUMAN ANIMAL HUSBANDRY

    USDA-ARS's Scientific Manuscript database

    We utilized the Illumina BovineSNP50 BeadChip with 54,693 single nucleotide polymorphism loci developed for Bos taurus taurus to rapidly genotype 677 individuals representing 61 Pecoran (horned ruminant) species diverged by up to 29 million years. We produced a completely bifurcating tree, the first...

  12. Current Status of Genotyping and Discovery Work at USMARC

    USDA-ARS's Scientific Manuscript database

    The Illumina BovineSNP50 DNA chip has substantially changed the genetic and genomic research program at USMARC. It has enhanced our commitment to produce genetic tools that can be exported to beef cattle producers to further their selection goals in hard-to-measure traits such as feed efficiency, co...

  13. Microbial genome sequencing using optical mapping and Illumina sequencing

    USDA-ARS's Scientific Manuscript database

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  14. Development and Implementation of High-Throughput SNP Genotyping in Barley

    USDA-ARS's Scientific Manuscript database

    Approximately 22,000 SNPs were identified from barley ESTs and sequenced amplicons; 4,596 of them were tested for performance in three pilot phase Illumina GoldenGate assays. Pilot phase data from three barley doubled haploid mapping populations supported the production of an initial consensus map, ...

  15. Results from raw milk microbiological tests do not predict the shelf-life performance of commercially pasteurized fluid milk.

    PubMed

    Martin, N H; Ranieri, M L; Murphy, S C; Ralyea, R D; Wiedmann, M; Boor, K J

    2011-03-01

    Analytical tools that accurately predict the performance of raw milk following its manufacture into commercial food products are of economic interest to the dairy industry. To evaluate the ability of currently applied raw milk microbiological tests to predict the quality of commercially pasteurized fluid milk products, samples of raw milk and 2% fat pasteurized milk were obtained from 4 New York State fluid milk processors for a 1-yr period. Raw milk samples were examined using a variety of tests commonly applied to raw milk, including somatic cell count, standard plate count, psychrotrophic bacteria count, ropy milk test, coliform count, preliminary incubation count, laboratory pasteurization count, and spore pasteurization count. Differential and selective media were used to identify groups of bacteria present in raw milk. Pasteurized milk samples were held at 6°C for 21 d and evaluated for standard plate count, coliform count, and sensory quality throughout shelf-life. Bacterial isolates from select raw and pasteurized milk tests were identified using 16S ribosomal DNA sequencing. Linear regression analysis of raw milk test results versus results reflecting pasteurized milk quality consistently showed low R² values (<0.45); the majority of R² values were <0.25, indicating little relationship between the results from the raw milk tests and results from tests used to evaluate pasteurized milk quality. Our findings suggest the need for new raw milk tests that measure the specific biological barriers that limit shelf-life and quality of fluid milk products. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
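
    The R² statistic cited in the record above can be illustrated with a minimal sketch. It assumes a simple one-predictor least-squares fit; the function name `r_squared` and the sample numbers are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line fitted to (x, y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / syy


if __name__ == "__main__":
    # Made-up values for illustration only: a raw milk test result (x)
    # against a pasteurized milk quality measure (y).
    raw_test = [2.1, 3.4, 2.8, 4.0, 3.1]
    pasteurized_quality = [5.0, 4.2, 6.1, 5.5, 4.8]
    print(f"R^2 = {r_squared(raw_test, pasteurized_quality):.3f}")
```

    A value near 0 means the fitted line explains almost none of the variation in the outcome, which is how the low R² values in the abstract are read.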

  16. Parsing and Quantification of Raw Orbitrap Mass Spectrometer Data Using RawQuant.

    PubMed

    Kovalchik, Kevin A; Moggridge, Sophie; Chen, David D Y; Morin, Gregg B; Hughes, Christopher S

    2018-06-01

    Effective analysis of protein samples by mass spectrometry (MS) requires careful selection and optimization of a range of experimental parameters. As the output from the primary detection device, the "raw" MS data file can be used to gauge the success of a given sample analysis. However, the closed-source nature of the standard raw MS file can complicate effective parsing of the data contained within. To ease and increase the range of analyses possible, the RawQuant tool was developed to enable parsing of raw MS files derived from Thermo Orbitrap instruments to yield meta and scan data in an openly readable text format. RawQuant can be commanded to export user-friendly files containing MS1, MS2, and MS3 metadata as well as matrices of quantification values based on isobaric tagging approaches. In this study, the utility of RawQuant is demonstrated in several scenarios: (1) reanalysis of shotgun proteomics data for the identification of the human proteome, (2) reanalysis of experiments utilizing isobaric tagging for whole-proteome quantification, and (3) analysis of a novel bacterial proteome and synthetic peptide mixture for assessing quantification accuracy when using isobaric tags. Together, these analyses successfully demonstrate RawQuant for the efficient parsing and quantification of data from raw Thermo Orbitrap MS files acquired in a range of common proteomics experiments. In addition, the individual analyses using RawQuant highlight parametric considerations in the different experimental sets and suggest targetable areas to improve depth of coverage in identification-focused studies and quantification accuracy when using isobaric tags.

  17. 40 CFR 89.418 - Raw emission sampling calculations.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Raw emission sampling calculations. 89... Test Procedures § 89.418 Raw emission sampling calculations. (a) The final test results shall be... measured on a wet basis. This section is applicable only for measurements made on raw exhaust gas...

  18. 40 CFR 409.40 - Applicability; description of the Louisiana raw cane sugar processing subcategory.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Louisiana raw cane sugar processing subcategory. 409.40 Section 409.40 Protection of Environment... CATEGORY Louisiana Raw Cane Sugar Processing Subcategory § 409.40 Applicability; description of the Louisiana raw cane sugar processing subcategory. The provisions of this subpart are applicable to discharges...

  19. 40 CFR 89.418 - Raw emission sampling calculations.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Raw emission sampling calculations. 89... Test Procedures § 89.418 Raw emission sampling calculations. (a) The final test results shall be... measured on a wet basis. This section is applicable only for measurements made on raw exhaust gas...

  20. 40 CFR 409.70 - Applicability; description of the Hawaiian raw cane sugar processing subcategory.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Hawaiian raw cane sugar processing subcategory. 409.70 Section 409.70 Protection of Environment... CATEGORY Hawaiian Raw Cane Sugar Processing Subcategory § 409.70 Applicability; description of the Hawaiian raw cane sugar processing subcategory. The provisions of this subpart are applicable to discharges...

  1. 40 CFR 409.70 - Applicability; description of the Hawaiian raw cane sugar processing subcategory.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Hawaiian raw cane sugar processing subcategory. 409.70 Section 409.70 Protection of Environment... CATEGORY Hawaiian Raw Cane Sugar Processing Subcategory § 409.70 Applicability; description of the Hawaiian raw cane sugar processing subcategory. The provisions of this subpart are applicable to discharges...

  2. 21 CFR 1304.31 - Reports from manufacturers importing narcotic raw material.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 9 2010-04-01 2010-04-01 false Reports from manufacturers importing narcotic raw... RECORDS AND REPORTS OF REGISTRANTS Reports § 1304.31 Reports from manufacturers importing narcotic raw material. (a) Every manufacturer which imports or manufactures from narcotic raw material (opium, poppy...

  3. 21 CFR 1304.31 - Reports from manufacturers importing narcotic raw material.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 9 2011-04-01 2011-04-01 false Reports from manufacturers importing narcotic raw... RECORDS AND REPORTS OF REGISTRANTS Reports § 1304.31 Reports from manufacturers importing narcotic raw material. (a) Every manufacturer which imports or manufactures from narcotic raw material (opium, poppy...

  4. 77 FR 57115 - Certain Reduced Folate; Nutraceutical Products and L-Methylfolate Raw Ingredients Used Therein...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-17

    ... and L-Methylfolate Raw Ingredients Used Therein; Notice of Receipt of Complaint; Solicitation of... entitled Certain Reduced Folate Nutraceutical Products and L-methylfolate Raw Ingredients Used Therein, DN... importation of certain reduced folate nutraceutical products and L-methylfolate raw ingredients used therein...

  5. 75 FR 65658 - Importer of Controlled Substances; Notice of Application

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-26

    ... Raw Opium (9600) II Concentrate of Poppy Straw (9670) II The company plans to import narcotic raw... a manufacturer of several controlled substances that are manufactured from raw opium, poppy straw... narcotic raw material are not appropriate. As noted in a previous notice published in the Federal Register...

  6. 40 CFR 1065.230 - Raw exhaust flow meter.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 33 2011-07-01 2011-07-01 false Raw exhaust flow meter. 1065.230... CONTROLS ENGINE-TESTING PROCEDURES Measurement Instruments Flow-Related Measurements § 1065.230 Raw exhaust flow meter. (a) Application. You may use measured raw exhaust flow, as follows: (1) Use the actual...

  7. 40 CFR 409.40 - Applicability; description of the Louisiana raw cane sugar processing subcategory.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Louisiana raw cane sugar processing subcategory. 409.40 Section 409.40 Protection of Environment... CATEGORY Louisiana Raw Cane Sugar Processing Subcategory § 409.40 Applicability; description of the Louisiana raw cane sugar processing subcategory. The provisions of this subpart are applicable to discharges...

  8. 31 CFR 560.407 - Transactions related to Iranian-origin goods.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... United States from third countries of goods containing Iranian-origin raw materials or components is not prohibited if those raw materials or components have been incorporated into manufactured products or... Iranian-origin raw materials or components are not prohibited if those raw materials or components have...

  9. 7 CFR 58.519 - Dairy products.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... Specifications for Dairy Plants Approved for USDA Inspection and Grading Service Quality Specifications for Raw Material § 58.519 Dairy products. (a) Raw skim milk. All raw skim milk obtained from a secondary source... used, shall be prepared from raw milk or skim milk that meets the same quality requirements outlined...

  10. EMC Global Climate And Weather Modeling Branch Personnel

    Science.gov Websites

    Comparison Statistics, which include: NCEP Raw and Bias-Corrected Ensemble Domain Averaged Bias; NCEP Raw and Bias-Corrected Ensemble Domain Averaged Bias Reduction (Percents); CMC Raw and Bias-Corrected Control Forecast Domain Averaged Bias; CMC Raw and Bias-Corrected Control Forecast Domain Averaged Bias Reduction

  11. 21 CFR 161.176 - Frozen raw lightly breaded shrimp.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 2 2011-04-01 2011-04-01 false Frozen raw lightly breaded shrimp. 161.176 Section 161.176 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... Shellfish § 161.176 Frozen raw lightly breaded shrimp. Frozen raw lightly breaded shrimp complies with the...

  12. 40 CFR 90.419 - Raw emission sampling calculations-gasoline fueled engines.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Raw emission sampling calculations... KILOWATTS Gaseous Exhaust Test Procedures § 90.419 Raw emission sampling calculations—gasoline fueled... selected as the basis for mass emission calculations using the raw gas method. ER03JY95.022 Where: WHC...

  13. 40 CFR 63.1343 - Standards for kilns and in-line kiln/raw mills.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    .../raw mills. 63.1343 Section 63.1343 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY... Industry Emission Standards and Operating Limits § 63.1343 Standards for kilns and in-line kiln/raw mills. (a) General. The provisions in this section apply to each kiln, each in-line kiln/raw mill, and any...

  14. 21 CFR 161.176 - Frozen raw lightly breaded shrimp.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 2 2010-04-01 2010-04-01 false Frozen raw lightly breaded shrimp. 161.176 Section 161.176 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF HEALTH AND HUMAN SERVICES... Shellfish § 161.176 Frozen raw lightly breaded shrimp. Frozen raw lightly breaded shrimp complies with the...

  15. 40 CFR 409.80 - Applicability; description of the Puerto Rican raw cane sugar processing subcategory.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Puerto Rican raw cane sugar processing subcategory. 409.80 Section 409.80 Protection of Environment... CATEGORY Puerto Rican Raw Cane Sugar Processing Subcategory § 409.80 Applicability; description of the Puerto Rican raw cane sugar processing subcategory. The provisions of this subpart are applicable to...

  16. 40 CFR 90.419 - Raw emission sampling calculations-gasoline fueled engines.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Raw emission sampling calculations... KILOWATTS Gaseous Exhaust Test Procedures § 90.419 Raw emission sampling calculations—gasoline fueled... selected as the basis for mass emission calculations using the raw gas method. ER03JY95.022 Where: WHC...

  17. 19 CFR 151.22 - Estimated duties on raw sugar.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 19 Customs Duties 2 2010-04-01 2010-04-01 false Estimated duties on raw sugar. 151.22 Section 151.22 Customs Duties U.S. CUSTOMS AND BORDER PROTECTION, DEPARTMENT OF HOMELAND SECURITY; DEPARTMENT OF... Molasses § 151.22 Estimated duties on raw sugar. Estimated duties shall be taken on raw sugar, as defined...

  18. 76 FR 72331 - Shiga Toxin-Producing Escherichia coli in Certain Raw Beef Products

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-23

    ... Escherichia coli in Certain Raw Beef Products AGENCY: Food Safety and Inspection Service, USDA. ACTION: Public...-O157 Shiga toxin-producing Escherichia coli in raw, intact and non-intact beef products and product... implementation plans and methods for controlling non-O157 Shiga toxin-producing Escherichia coli in raw, intact...

  19. 40 CFR 91.414 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Raw gaseous exhaust sampling and... Gaseous Exhaust Test Procedures § 91.414 Raw gaseous exhaust sampling and analytical system description... the component systems. (g) The following requirements must be incorporated in each system used for raw...

  20. 40 CFR 90.414 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Raw gaseous exhaust sampling and... OR BELOW 19 KILOWATTS Gaseous Exhaust Test Procedures § 90.414 Raw gaseous exhaust sampling and... between the muffler and the sample probe. The mixing chamber is an optional component of the raw gas...

  1. 19 CFR 151.22 - Estimated duties on raw sugar.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 19 Customs Duties 2 2011-04-01 2011-04-01 false Estimated duties on raw sugar. 151.22 Section 151.22 Customs Duties U.S. CUSTOMS AND BORDER PROTECTION, DEPARTMENT OF HOMELAND SECURITY; DEPARTMENT OF... Molasses § 151.22 Estimated duties on raw sugar. Estimated duties shall be taken on raw sugar, as defined...

  2. 40 CFR 91.414 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Raw gaseous exhaust sampling and... Gaseous Exhaust Test Procedures § 91.414 Raw gaseous exhaust sampling and analytical system description... the component systems. (g) The following requirements must be incorporated in each system used for raw...

  3. 40 CFR 409.80 - Applicability; description of the Puerto Rican raw cane sugar processing subcategory.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Puerto Rican raw cane sugar processing subcategory. 409.80 Section 409.80 Protection of Environment... CATEGORY Puerto Rican Raw Cane Sugar Processing Subcategory § 409.80 Applicability; description of the Puerto Rican raw cane sugar processing subcategory. The provisions of this subpart are applicable to...

  4. 40 CFR 89.412 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 20 2010-07-01 2010-07-01 false Raw gaseous exhaust sampling and...-IGNITION ENGINES Exhaust Emission Test Procedures § 89.412 Raw gaseous exhaust sampling and analytical... must be incorporated in each system used for raw testing under this subpart. (1) [Reserved] (2) The...

  5. 75 FR 52721 - Raw Flexible Magnets From the People's Republic of China: Notice of Rescission of Countervailing...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-27

    ... DEPARTMENT OF COMMERCE International Trade Administration [C-570-923] Raw Flexible Magnets From... countervailing duty (CVD) order on raw flexible magnets from the People's Republic of China (PRC) covering the..., Washington, DC 20230, telephone: (202) 482-4793. SUPPLEMENTARY INFORMATION: Background The CVD order on raw...

  6. 27 CFR 17.165 - Receipt of raw ingredients.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2011-04-01 2011-04-01 false Receipt of raw ingredients. 17.165 Section 17.165 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU... PRODUCTS Records § 17.165 Receipt of raw ingredients. For raw ingredients destined to be used in...

  7. 40 CFR 63.1346 - Standards for new or reconstructed raw material dryers.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 11 2010-07-01 2010-07-01 true Standards for new or reconstructed raw... Industry Emission Standards and Operating Limits § 63.1346 Standards for new or reconstructed raw material dryers. (a) New or reconstructed raw material dryers located at facilities that are major sources can not...

  8. 75 FR 22741 - Raw Flexible Magnets From the People's Republic of China: Initiation of Countervailing Duty New...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-30

    ... DEPARTMENT OF COMMERCE International Trade Administration [C-570-923] Raw Flexible Magnets From... review of the countervailing duty (CVD) order on raw flexible magnets (RFM) from the People's Republic of... published on September 17, 2008. See Raw Flexible Magnets from the People's Republic of China...

  9. 40 CFR 89.412 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Raw gaseous exhaust sampling and...-IGNITION ENGINES Exhaust Emission Test Procedures § 89.412 Raw gaseous exhaust sampling and analytical... must be incorporated in each system used for raw testing under this subpart. (1) [Reserved] (2) The...

  10. 27 CFR 17.165 - Receipt of raw ingredients.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Receipt of raw ingredients. 17.165 Section 17.165 Alcohol, Tobacco Products and Firearms ALCOHOL AND TOBACCO TAX AND TRADE BUREAU... PRODUCTS Records § 17.165 Receipt of raw ingredients. For raw ingredients destined to be used in...

  11. 40 CFR 90.414 - Raw gaseous exhaust sampling and analytical system description.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 20 2011-07-01 2011-07-01 false Raw gaseous exhaust sampling and... OR BELOW 19 KILOWATTS Gaseous Exhaust Test Procedures § 90.414 Raw gaseous exhaust sampling and... between the muffler and the sample probe. The mixing chamber is an optional component of the raw gas...

  12. 75 FR 22740 - Raw Flexible Magnets From the People's Republic of China: Initiation of Antidumping Duty New...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-30

    ... DEPARTMENT OF COMMERCE International Trade Administration [A-570-922] Raw Flexible Magnets From... review of the antidumping duty order on raw flexible magnets (``magnets'') from the People's Republic of... September 17, 2008. See Antidumping Duty Order: Raw Flexible Magnets from the People's Republic of China, 73...

  13. 78 FR 77425 - Raw Flexible Magnets From the People's Republic of China: Final Results of Expedited Sunset Review

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-23

    ... DEPARTMENT OF COMMERCE International Trade Administration [C-570-923] Raw Flexible Magnets From... the countervailing duty (``CVD'') order on raw flexible magnets (``RFM'') from the People's Republic... of Expedited Sunset Review of the Countervailing Duty Order on Raw Flexible Magnets from the People's...

  14. 77 FR 31975 - Shiga Toxin-Producing Escherichia coli in Certain Raw Beef Products

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-31

    ...-2010-0023] Shiga Toxin-Producing Escherichia coli in Certain Raw Beef Products AGENCY: Food Safety and... testing raw beef manufacturing trimmings. SUMMARY: The Food Safety and Inspection Service (FSIS) is... (STEC), in addition to E. coli O157:H7, in raw beef manufacturing trimmings beginning June 4, 2012. FSIS...

  15. 40 CFR 1065.230 - Raw exhaust flow meter.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... flow meter. (a) Application. You may use measured raw exhaust flow, as follows: (1) Use the actual... the following cases, you may use a raw exhaust flow meter signal that does not give the actual value... consumed. (b) Component requirements. We recommend that you use a raw-exhaust flow meter that meets the...

  16. 19 CFR 151.22 - Estimated duties on raw sugar.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... 19 Customs Duties 2 2013-04-01 2013-04-01 false Estimated duties on raw sugar. 151.22 Section 151... THE TREASURY (CONTINUED) EXAMINATION, SAMPLING, AND TESTING OF MERCHANDISE Sugars, Sirups, and Molasses § 151.22 Estimated duties on raw sugar. Estimated duties shall be taken on raw sugar, as defined...

  17. 19 CFR 151.22 - Estimated duties on raw sugar.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 19 Customs Duties 2 2012-04-01 2012-04-01 false Estimated duties on raw sugar. 151.22 Section 151... THE TREASURY (CONTINUED) EXAMINATION, SAMPLING, AND TESTING OF MERCHANDISE Sugars, Sirups, and Molasses § 151.22 Estimated duties on raw sugar. Estimated duties shall be taken on raw sugar, as defined...

  18. 19 CFR 151.22 - Estimated duties on raw sugar.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 19 Customs Duties 2 2014-04-01 2014-04-01 false Estimated duties on raw sugar. 151.22 Section 151... THE TREASURY (CONTINUED) EXAMINATION, SAMPLING, AND TESTING OF MERCHANDISE Sugars, Sirups, and Molasses § 151.22 Estimated duties on raw sugar. Estimated duties shall be taken on raw sugar, as defined...

  19. ISMRM Raw Data Format: A Proposed Standard for MRI Raw Datasets

    PubMed Central

    Inati, Souheil J.; Naegele, Joseph D.; Zwart, Nicholas R.; Roopchansingh, Vinai; Lizak, Martin J.; Hansen, David C.; Liu, Chia-Ying; Atkinson, David; Kellman, Peter; Kozerke, Sebastian; Xue, Hui; Campbell-Washburn, Adrienne E.; Sørensen, Thomas S.; Hansen, Michael S.

    2015-01-01

    Purpose This work proposes the ISMRM Raw Data (ISMRMRD) format as a common MR raw data format, which promotes algorithm and data sharing. Methods A file format consisting of a flexible header and tagged frames of k-space data was designed. Application Programming Interfaces were implemented in C/C++, MATLAB, and Python. Converters for Bruker, General Electric, Philips, and Siemens proprietary file formats were implemented in C++. Raw data were collected using MRI scanners from four vendors, converted to ISMRMRD format, and reconstructed using software implemented in three programming languages (C++, MATLAB, Python). Results Images were obtained by reconstructing the raw data from all vendors. The source code, raw data, and images comprising this work are shared online, serving as an example of an image reconstruction project following a paradigm of reproducible research. Conclusion The proposed raw data format solves a practical problem for the MRI community. It may serve as a foundation for reproducible research and collaborations. The ISMRMRD format is a completely open and community-driven format, and the scientific community is invited (including commercial vendors) to participate either as users or developers. PMID:26822475

  20. Raw Milk Consumption: Risks and Benefits.

    PubMed

    Lucey, John A

    2015-07-01

    There continues to be considerable public debate on the possible benefits regarding the growing popularity of the consumption of raw milk. However, there are significant concerns by regulatory, or public health, organizations like the Food and Drug Administration and the Centers for Disease Control and Prevention because of risk of contracting milkborne illnesses if the raw milk is contaminated with human pathogens. This review describes why pasteurization of milk was introduced more than 100 years ago, how pasteurization helped to reduce the incidence of illnesses associated with raw milk consumption, and the prevalence of pathogens in raw milk. In some studies, up to a third of all raw milk samples contained pathogens, even when sourced from clinically healthy animals or from milk that appeared to be of good quality. This review critically evaluates some of the popularly suggested benefits of raw milk. Claims related to improved nutrition, prevention of lactose intolerance, or provision of "good" bacteria from the consumption of raw milk have no scientific basis and are myths. There are some epidemiological data that indicate that children growing up in a farming environment are associated with a decreased risk of allergy and asthma; a variety of environmental factors may be involved and there is no direct evidence that raw milk consumption is involved in any "protective" effect.

  1. 40 CFR 409.50 - Applicability; description of the Florida and Texas raw cane sugar processing subcategory.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... Florida and Texas raw cane sugar processing subcategory. 409.50 Section 409.50 Protection of Environment... CATEGORY Florida and Texas Raw Cane Sugar Processing Subcategory § 409.50 Applicability; description of the Florida and Texas raw cane sugar processing subcategory. The provisions of this subpart are applicable to...

  2. 40 CFR 409.50 - Applicability; description of the Florida and Texas raw cane sugar processing subcategory.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... Florida and Texas raw cane sugar processing subcategory. 409.50 Section 409.50 Protection of Environment... CATEGORY Florida and Texas Raw Cane Sugar Processing Subcategory § 409.50 Applicability; description of the Florida and Texas raw cane sugar processing subcategory. The provisions of this subpart are applicable to...

  3. 75 FR 53013 - Fiscal Year 2011 Tariff-rate Quota Allocations for Raw Cane Sugar, Refined and Specialty Sugar...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-30

    ... for Raw Cane Sugar, Refined and Specialty Sugar, and Sugar-containing Products; Revision AGENCY... August 17, 2010 concerning Fiscal Year 2011 tariff-rate quota allocations of raw cane sugar, refined and... announced that sugar entering the United States under the Fiscal Year 2011 raw sugar tariff-rate quota will...

  4. 40 CFR 1065.362 - Non-stoichiometric raw exhaust FID O2 interference verification.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 33 2011-07-01 2011-07-01 false Non-stoichiometric raw exhaust FID O2... Measurements § 1065.362 Non-stoichiometric raw exhaust FID O2 interference verification. (a) Scope and frequency. If you use FID analyzers for raw exhaust measurements from engines that operate in a non...

  5. 77 FR 63336 - Certain Reduced Folate Nutraceutical Products and L-Methylfolate Raw Ingredients Used Therein...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-10-16

    ... Products and L-Methylfolate Raw Ingredients Used Therein; Institution of Investigation Pursuant to United... nutraceutical products and l-methylfolate raw ingredients used therein by reason of infringement of certain...-methylfolate raw ingredients used therein that infringe one or more of claims 37, 39, 40, 47, 66, 67, 73, 76...

  6. 19 CFR 151.23 - Allowance for moisture in raw sugar.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 19 Customs Duties 2 2011-04-01 2011-04-01 false Allowance for moisture in raw sugar. 151.23... Molasses § 151.23 Allowance for moisture in raw sugar. Inasmuch as the absorption of sea water or moisture reduces the polariscopic test of sugar, there shall be no allowance on account of increased weight of raw...

  7. 19 CFR 151.23 - Allowance for moisture in raw sugar.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 19 Customs Duties 2 2010-04-01 2010-04-01 false Allowance for moisture in raw sugar. 151.23... Molasses § 151.23 Allowance for moisture in raw sugar. Inasmuch as the absorption of sea water or moisture reduces the polariscopic test of sugar, there shall be no allowance on account of increased weight of raw...

  8. 75 FR 61425 - Raw Flexible Magnets from the People's Republic of China: Rescission of New Shipper Review

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-10-05

    ... DEPARTMENT OF COMMERCE International Trade Administration [A-570-922] Raw Flexible Magnets from... the initiation of a new shipper review of the antidumping duty order on raw flexible magnets from the... forth in 19 CFR 351.214(b) and initiated an antidumping duty new shipper review. See Raw Flexible...

  9. 40 CFR 63.1347 - Standards for raw and finish mills.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 11 2010-07-01 2010-07-01 true Standards for raw and finish mills. 63... and Operating Limits § 63.1347 Standards for raw and finish mills. The owner or operator of each new or existing raw mill or finish mill at a facility which is a major source subject to the provisions...

  10. 75 FR 39612 - Allocation of Second Additional Fiscal Year (FY) 2010 In-Quota Volume for Raw Cane Sugar

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-09

    ...) 2010 In-Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative... the tariff-rate quota (TRQ) for imported raw cane sugar. DATES: Effective Date: July 9, 2010... Harmonized Tariff Schedule of the United States (HTS), the United States maintains TRQs for imports of raw...

  11. 40 CFR 1065.362 - Non-stoichiometric raw exhaust FID O2 interference verification.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 32 2010-07-01 2010-07-01 false Non-stoichiometric raw exhaust FID O2... Measurements § 1065.362 Non-stoichiometric raw exhaust FID O2 interference verification. (a) Scope and frequency. If you use FID analyzers for raw exhaust measurements from engines that operate in a non...

  12. The risk of Vibrio parahaemolyticus infections associated with consumption of raw oysters as affected by processing and distribution conditions in Taiwan

    USDA-ARS's Scientific Manuscript database

    The steadily increased consumption of raw oysters in Taiwan warrants an assessment of the risk (probability of illness) of raw oyster consumption attributed by Vibrio parahaemolyticus. The aim of this study was to estimate the risk of V. parahaemolyticus infection associated with raw oyster consumpt...

  13. Simultaneous identification and molecular characterization of viruses associated with an apple tree with mosaic symptom

    USDA-ARS's Scientific Manuscript database

    We conducted genomic sequencing to identify viruses associated with mosaic disease of an apple tree using the high-throughput sequencing (HTS) Illumina RNA-seq platform. The objective was to examine if rapid identification and characterization of viruses could be effectively achieved by RNA-seq anal...

  14. High-Quality Genome Sequence of the Highly Resistant Bacterium Staphylococcus haemolyticus, Isolated from a Neonatal Bloodstream Infection.

    PubMed

    Hosseinkhani, Farideh; Emaneini, Mohammad; van Leeuwen, Willem

    2017-07-20

    Using Illumina HiSeq and PacBio technologies, we sequenced the genome of the multidrug-resistant bacterium Staphylococcus haemolyticus, originating from a bloodstream infection in a neonate. The sequence data can be used as an accurate reference sequence. Copyright © 2017 Hosseinkhani et al.

  15. Selection and Management of DNA Markers for Use in Genomic Evaluation

    USDA-ARS's Scientific Manuscript database

    A database was constructed to store genotypes for 50,972 single-nucleotide polymorphisms (SNP) from the Illumina BovineSNP50 BeadChip for over 30,000 animals. The database allows storage of multiple samples per animal and stores all SNP genotypes for a sample in a single row. An indicator specifies ...

  16. Candidate causative mutation on BTA18 associated with calving and conformation traits in Holstein bulls

    USDA-ARS's Scientific Manuscript database

    Complementing quantitative methods with sequence data analysis is a major goal of the post-genome era of biology. In this study, we analyzed Illumina HiSeq sequence data derived from 11 US Holstein bulls in order to identify putative causal mutations associated with calving and conformation traits. ...

  17. Complete nucleotide sequence and genome organization of a novel allexivirus from alfalfa (Medicago sativa)

    USDA-ARS's Scientific Manuscript database

    A new species of the family Alphaflexiviridae provisionally named Alfalfa virus S (AVS) was diagnosed in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by Illumina NGS technology ...

  18. The American cranberry mitochondrial genome reveals the presence of selenocysteine (tRNA-Sec and SECIS) insertion machinery in land plants

    USDA-ARS's Scientific Manuscript database

    The American cranberry (Vaccinium macrocarpon Ait.) mitochondrial genome was assembled and reconstructed from whole genome 454 Roche GS-FLX and Illumina shotgun sequences. Compared with other Asterids, the reconstruction of the genome revealed an average size mitochondrion (459,678 nt) with comparat...

  19. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing

    USDA-ARS's Scientific Manuscript database

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RN...

  20. High-quality genome of the peach scab pathogen, Venturia carpophila

    USDA-ARS's Scientific Manuscript database

    Venturia carpophila causes peach scab, a disease that renders peach (Prunus persica) fruit unmarketable. We report a high-quality draft genome (36.9 Mb) of V. carpophila from an isolate collected from a peach tree in central Georgia. The genome was sequenced by MiSeq using an Illumina paired-end lib...

  1. Do you really know where this SNP goes?

    USDA-ARS's Scientific Manuscript database

    The release of build 10.2 of the swine genome was a marked improvement over previous builds and has proven extremely useful. However, as most know, there are regions of the genome that this particular build does not accurately represent. For instance, nearly 25% of the 62,162 SNP on the Illumina Por...

  2. Sequencing Technologies Panel at SFAF

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turner, Steve; Fiske, Haley; Knight, Jim

    2010-06-02

    From left to right: Steve Turner of Pacific Biosciences, Haley Fiske of Illumina, Jim Knight of Roche, Michael Rhodes of Life Technologies and Peter Vander Horn of Life Technologies' Single Molecule Sequencing group discuss new sequencing technologies and applications on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  3. Near complete genome sequence of Clostridium paradoxum strain JW-YL-7

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lancaster, Andrew; Utturkar, Sagar M.; Poole, Farris

    2016-05-05

    Clostridium paradoxum strain JW-YL-7 is a moderately thermophilic anaerobic alkaliphile isolated from the municipal sewage treatment plant in Athens, GA. We report the near-complete genome sequence of C. paradoxum strain JW-YL-7 obtained by using PacBio DNA sequencing and Pilon for sequence assembly refinement with Illumina data.

  4. Genomic imputation and evaluation using high density Holstein genotypes

    USDA-ARS's Scientific Manuscript database

    Genomic evaluations for 161,341 Holsteins were computed using 311,725 of the 777,962 markers on the Illumina high-density (HD) chip. Initial edits with 1,741 HD genotypes from 5 breeds revealed that 636,967 markers were usable but that half were redundant. Usable Holstein genotypes included 1,510 an...

  5. A new rainbow trout (Oncorhynchus mykiss) reference genome assembly

    USDA-ARS's Scientific Manuscript database

    In an effort to improve the rainbow trout reference genome assembly, we have re-sequenced the doubled-haploid Swanson line using the longest available reads from the Illumina technology. Overall we generated over 510 million 260nt paired-end shotgun reads, and 1 billion 160nt mate-pair reads from f...

  6. New Generation Sequencing Technology Panel at SFAF-Part II

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fiske, Haley; Turner, Steve; Rhodes, Michael

    2009-05-27

    From left to right: Haley Fiske of Illumina Inc., Steve Turner of Pacific Biosciences, Michael Rhodes of Applied Biosystems, Patrice Milos of Helicos Biosciences and Tim Harkins of Roche Diagnostics answer questions in a forum moderated by Bob Fulton at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  7. New Generation Sequencing Technology Panel at SFAF-Part I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fiske, Haley; Turner, Steve; Rhodes, Michael

    2009-05-27

    From left to right: Haley Fiske of Illumina Inc., Steve Turner of Pacific Biosciences, Michael Rhodes of Applied Biosystems, Patrice Milos of Helicos Biosciences and Tim Harkins of Roche Diagnostics answer questions in a forum moderated by Bob Fulton at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  8. Design of a bovine low-density SNP array optimized for imputation

    USDA-ARS's Scientific Manuscript database

    The Illumina BovineLD BeadChip was designed to support imputation to higher density genotypes in dairy and beef breeds by including single-nucleotide polymorphisms (SNPs) that had a high minor allele frequency as well as uniform spacing across the genome except at the ends of the chromosome where de...

  9. Characterization of the complete chloroplast genome of wheel wingnut (Cyclocarya paliurus), an endemic in China

    Treesearch

    Yiheng Hu; Jing Yan; Xiaojia Feng; Meng Dang; Keith E. Woeste; Peng. Zhao

    2017-01-01

    The wheel wingnut (Cyclocarya paliurus) is an endemic species distributed in eastern and central China. Cyclocarya is a woody genus in the Juglandaceae used in medicine and horticulture. The complete chloroplast genome of C. paliurus was sequenced using the Illumina Hiseq 2500 platform. The total genome...

  10. Beatrice Hill Virus Represents a Novel Species in the Genus Tibrovirus (Mononegavirales: Rhabdoviridae)

    DTIC Science & Technology

    2017-01-26

    Huang et al. published a 5,734 nt-long contig of the Beatrice Hill virus genome, which indicated that this virus most likely falls into the... Desktop sequencer. Illumina and SISPA-RACE adapter sequences were trimmed from the sequencing reads using Cutadapt-1.2.1 (14), quality filtering

  11. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  12. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    USDA-ARS's Scientific Manuscript database

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  13. Information Theoretical Analysis of a Bovine Gene Atlas Reveals Chromosomal Regions with Tissue Specific Gene Expression.

    USDA-ARS's Scientific Manuscript database

    An essential step to understanding the genomic biology of any organism is to comprehensively survey its transcriptome. We present the Bovine Gene Atlas (BGA) a compendium of over 7.2 million unique 20 base Illumina DGE tags representing 100 tissue transcriptomes collected primarily from L1 Dominette...

  14. Polymorphic SSR markers for Plasmopara obducens (Peronosporaceae), the newly emergent downy mildew pathogen of Impatiens (Balsaminaceae)

    USDA-ARS's Scientific Manuscript database

    Premise of the study: Microsatellite markers were developed for Plasmopara obducens, the causal agent of the newly emergent downy mildew disease of Impatiens walleriana. Methods and Results: A 151.2 Mb draft genome assembly was generated from P. obducens using Illumina technology and mined to identi...

  15. Deletion mutagenesis identifies a haploinsufficient role for gamma-zein in opaque-2 endosperm modification

    USDA-ARS's Scientific Manuscript database

    Quality Protein Maize (QPM) is a hard kernel variant of the high-lysine mutant, opaque-2. Using gamma irradiation, we created opaque QPM variants to identify opaque-2 modifier genes and to investigate deletion mutagenesis combined with Illumina sequencing as a maize functional genomics tool. A K0326...

  16. funtooNorm: an R package for normalization of DNA methylation data when there are multiple cell or tissue types.

    PubMed

    Oros Klein, Kathleen; Grinek, Stepan; Bernatsky, Sasha; Bouchard, Luigi; Ciampi, Antonio; Colmegna, Ines; Fortin, Jean-Philippe; Gao, Long; Hivert, Marie-France; Hudson, Marie; Kobor, Michael S; Labbe, Aurelie; MacIsaac, Julia L; Meaney, Michael J; Morin, Alexander M; O'Donnell, Kieran J; Pastinen, Tomi; Van Ijzendoorn, Marinus H; Voisin, Gregory; Greenwood, Celia M T

    2016-02-15

    DNA methylation patterns are well known to vary substantially across cell types or tissues. Hence, existing normalization methods may not be optimal if they do not take this into account. We therefore present a new R package for normalization of data from the Illumina Infinium Human Methylation450 BeadChip (Illumina 450 K) built on the concepts in the recently published funNorm method, and introducing cell-type or tissue-type flexibility. funtooNorm is relevant for data sets containing samples from two or more cell or tissue types. A visual display of cross-validated errors informs the choice of the optimal number of components in the normalization. Benefits of cell (tissue)-specific normalization are demonstrated in three data sets. Improvement can be substantial; it is strikingly better on chromosome X, where methylation patterns have unique inter-tissue variability. An R package is available at https://github.com/GreenwoodLab/funtooNorm, and has been submitted to Bioconductor at http://bioconductor.org. © The Author 2015. Published by Oxford University Press.

  17. Assembly and analysis of a male sterile rubber tree mitochondrial genome reveals DNA rearrangement events and a novel transcript

    PubMed Central

    2014-01-01

    Background The rubber tree, Hevea brasiliensis, is an important plant species that is commercially grown to produce latex rubber in many countries. The rubber tree variety BPM 24 exhibits cytoplasmic male sterility, inherited from the variety GT 1. Results We constructed the rubber tree mitochondrial genome of a cytoplasmic male sterile variety, BPM 24, using 454 sequencing, including 8 kb paired-end libraries, plus Illumina paired-end sequencing. We annotated this mitochondrial genome with the aid of Illumina RNA-seq data and performed comparative analysis. We then compared the sequence of BPM 24 to the contigs of the published rubber tree, variety RRIM 600, and identified a rearrangement that is unique to BPM 24 resulting in a novel transcript containing a portion of atp9. Conclusions The novel transcript is consistent with changes that cause cytoplasmic male sterility through a slight reduction to ATP production efficiency. The exhaustive nature of the search rules out alternative causes and supports previous findings of novel transcripts causing cytoplasmic male sterility. PMID:24512148

  18. Small RNA Deep Sequencing and the Effects of microRNA408 on Root Gravitropic Bending in Arabidopsis

    NASA Astrophysics Data System (ADS)

    Li, Huasheng; Lu, Jinying; Sun, Qiao; Chen, Yu; He, Dacheng; Liu, Min

    2015-11-01

    MicroRNA (miRNA) is a non-coding small RNA composed of 20 to 24 nucleotides that influences plant root development. This study analyzed the miRNA expression in Arabidopsis root tip cells using Illumina sequencing and real-time PCR before (sample 0) and 15 min after (sample 15) a 3-D clinostat rotational treatment was administered. After stimulation was performed, the expression levels of seven miRNA genes, including Arabidopsis miR160, miR161, miR394, miR402, miR403, miR408, and miR823, were significantly upregulated. Illumina sequencing results also revealed two novel miRNAs that have not been previously reported. The target genes of these miRNAs included pentatricopeptide repeat-containing protein and diadenosine tetraphosphate hydrolase. An overexpression vector of Arabidopsis miR408 was constructed and transferred to Arabidopsis plants. The roots of plants overexpressing miR408 exhibited a slower reorientation upon gravistimulation in comparison with those of wild-type plants. This result indicates that miR408 could play a role in the root gravitropic response.

  19. Preliminary report for analysis of genome wide mutations from four ciprofloxacin resistant B. anthracis Sterne isolates generated by Illumina, 454 sequencing and microarrays for DHS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jaing, Crystal; Vergez, Lisa; Hinckley, Aubree

    2011-06-21

    The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, Taqman PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. As the result of a different DHS project, we have selected for and isolated a large number of ciprofloxacin resistant B. anthracis Sterne isolates. These isolates vary in the concentrations of ciprofloxacin that they can tolerate, suggesting multiple mutations in the samples. In collaboration with University of Houston, Eureka Genomics and Oak Ridge National Laboratory, we analyzed the ciprofloxacin resistant B. anthracis Sterne isolates by microarray hybridization, Illumina and Roche 454 sequencing to understand the error rates and sensitivity of the different methods. The report provides an assessment of the results and a complete set of all protocols used and all data generated along with information to interpret the protocols and data sets.

  20. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    PubMed Central

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse; Hansen, Thomas Arn; Kjartansdóttir, Kristín Rós; Guldberg Frøslev, Tobias; Snogdal Boutrup, Torsten; Nielsen, Lars Peter; Willerslev, Eske; Hansen, Anders J.

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads are derived. First, we showed by simulations that we can robustly infer the level of genetic diversity from short sequence reads. Second, we find that the measures of nucleotide diversity inferred from our retroviral sequences significantly exceed the level observed from Human Immunodeficiency Virus infections, prompting us to conclude that the novel retroviruses are both of endogenous origin. Through further simulations, we rule out the possibility that the observed elevated levels of nucleotide diversity are the result of co-infection with two closely related exogenous retroviruses. PMID:26493184
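    The nucleotide diversity measure mentioned in the record above can be illustrated with a small sketch. It uses the standard per-site estimator (average number of pairwise differences divided by alignment length) and is not necessarily the authors' procedure for short reads. The function name and the toy sequences are hypothetical, and the input is assumed to be already aligned and of equal length.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy, made-up aligned fragments (illustrative only, not data from the study).
reads = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
pi = nucleotide_diversity(reads)
print(f"nucleotide diversity: {pi:.4f}")
```

    A higher value means the sampled sequences differ at more sites on average, which is the quantity the record compares against levels seen in Human Immunodeficiency Virus infections.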

  1. Illumina MiSeq sequencing reveals the key microorganisms involved in partial nitritation followed by simultaneous sludge fermentation, denitrification and anammox process.

    PubMed

    Wang, Bo; Peng, Yongzhen; Guo, Yuanyuan; Zhao, Mengyue; Wang, Shuying

    2016-05-01

    A combined process including a partial nitritation SBR (PN-SBR) followed by a simultaneous sludge fermentation, denitrification and anammox reactor (SFDA) was established to treat low C/N domestic wastewater in this study. An average nitrite accumulation rate of 97.8% and total nitrogen of 9.4 mg/L in the effluent was achieved during 140 days' operation. The underlying mechanisms were investigated by using Illumina MiSeq sequencing to analyze the microbial community structures in the PN-SBR and SFDA. Results showed that the predominant bacterial phylum was Proteobacteria in the external waste activated sludge (WAS, added to the SFDA) and SFDA while Bacteroidetes in the PN-SBR. Further study indicated that in the PN-SBR, the dominant nitrobacteria, Nitrosomonas genus, facilitated nitritation and little nitrate was generated in the PN-SBR effluent. In the SFDA, the co-existence of functional microorganisms Thauera, Candidatus Anammoximicrobium and Pseudomonas were found to contribute to simultaneous sludge fermentation, denitrification and anammox. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.

    PubMed

    Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D

    2017-01-01

    This study characterized regions of DNA which remained unassembled by either PacBio or Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as the cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.

  3. Effect of domestic cooking on the starch digestibility, predicted glycemic indices, polyphenol contents and alpha amylase inhibitory properties of beans (Phaseolis vulgaris) and breadfruit (Treculia africana).

    PubMed

    Chinedum, E; Sanni, S; Theressa, N; Ebere, A

    2018-01-01

    The effect of processing on starch digestibility, predicted glycemic indices (pGI), polyphenol contents and alpha amylase inhibitory properties of beans (Phaseolis vulgaris) and breadfruit (Treculia africana) was studied. Total starch ranged from 4.3 to 68.3g/100g, digestible starch ranged from 4.3 to 59.2 to 65.7g/100g for the raw and processed legumes. Resistant starch was not detected in most of the legumes except in fried breadfruit, and the starches in both the raw and processed breadfruit were more rapidly digested than those from raw and cooked beans. Raw and processed breadfruit had higher hydrolysis curves than raw and processed beans with the amylolysis level in raw breadfruit close to that of white bread. Raw beans had a low glycemic index (GI); boiled beans and breadfruit had intermediate glycemic indices respectively while raw and fried breadfruit had high glycemic indices. Aqueous extracts of the food samples had weak α-amylase inhibition compared to acarbose. The raw and processed legumes contained considerable amounts of dietary phenols and flavonoids. The significant correlation (r=0.626) between α-amylase inhibitory actions of the legumes versus their total phenolic contents suggests the contribution of the phenolic compounds in these legumes to their α-amylase inhibitory properties. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. 75 FR 26316 - Allocation of Additional Fiscal Year (FY) 2010 In-Quota Volume for Raw Cane Sugar

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-05-11

    ...-Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative. ACTION: Notice...) for imported raw cane sugar. DATES: Effective Date: May 11, 2010. ADDRESSES: Inquiries may be mailed... (HTS), the United States maintains TRQs for imports of raw cane and refined sugar. Section 404(d)(3) of...

  5. 21 CFR 1210.26 - Permits for raw milk or cream.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 8 2011-04-01 2011-04-01 false Permits for raw milk or cream. 1210.26 Section... FEDERAL IMPORT MILK ACT Permit Control § 1210.26 Permits for raw milk or cream. Except as provided in § 1210.27, permits to ship or transport raw milk or cream into the United States will be granted only...

  6. 77 FR 57180 - Fiscal Year 2013 Tariff-rate Quota Allocations for Raw Cane Sugar, Refined and Specialty Sugar...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-17

    ... OFFICE OF THE TRADE REPRESENTATIVE Fiscal Year 2013 Tariff-rate Quota Allocations for Raw Cane... quantity of the tariff-rate quotas for imported raw cane sugar, refined and specialty sugar, and sugar... imports of raw cane sugar and refined sugar. Pursuant to Additional U.S. Note 8 to Chapter 17 of the HTS...

  7. 76 FR 21418 - Fiscal Year 2011 Allocation of Additional Tariff-Rate Quota Volume for Raw Cane Sugar and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-15

    ...-Rate Quota Volume for Raw Cane Sugar and Reallocation of Unused Fiscal Year 2011 Tariff-Rate Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative. ACTION: Notice...) for imported raw cane sugar and of country-by-country reallocations of the FY 2011 in-quota quantity...

  8. 40 CFR 409.60 - Applicability; description of the Hilo-Hamakua Coast of the Island of Hawaii raw cane sugar...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ...-Hamakua Coast of the Island of Hawaii raw cane sugar processing subcategory. 409.60 Section 409.60... PROCESSING POINT SOURCE CATEGORY Hilo-Hamakua Coast of the Island of Hawaii Raw Cane Sugar Processing Subcategory § 409.60 Applicability; description of the Hilo-Hamakua Coast of the Island of Hawaii raw cane...

  9. 75 FR 22095 - USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2010 Raw Sugar Tariff...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-04-27

    ... USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2010 Raw Sugar Tariff-Rate... announced a reassignment of surplus sugar under domestic cane sugar allotments of 200,000 short tons raw value (STRV) to imports, and increased the fiscal year (FY) 2010 raw sugar tariff-rate quota (TRQ) by...

  10. 40 CFR 63.1344 - Operating limits for kilns and in-line kiln/raw mills.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... kiln/raw mills. 63.1344 Section 63.1344 Protection of Environment ENVIRONMENTAL PROTECTION AGENCY... Industry Emission Standards and Operating Limits § 63.1344 Operating limits for kilns and in-line kiln/raw... specified in paragraph (b) of this section. The owner or operator of an in-line kiln/raw mill subject to a D...

  11. 78 FR 77423 - Raw Flexible Magnets From the People's Republic of China and Taiwan: Final Results of the...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-12-23

    ... DEPARTMENT OF COMMERCE International Trade Administration [A-570-922, A-583-842] Raw Flexible... orders on raw flexible magnets from the People's Republic of China and Taiwan pursuant to section 751(c... antidumping duty orders on raw flexible magnets from the People's Republic of China and Taiwan. Scope of the...

  12. 40 CFR 63.1343 - What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker piles?

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., clinker coolers, raw material dryers, and open clinker piles? 63.1343 Section 63.1343 Protection of... What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker piles? (a..., clinker cooler, and raw material dryer. All dioxin D/F, HCl, and total hydrocarbon (THC) emission limits...

  13. 75 FR 38764 - USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2010 Raw Sugar Tariff...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-06

    ... USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2010 Raw Sugar Tariff-Rate... announced a reassignment of surplus sugar under domestic cane sugar allotments of 300,000 short tons raw value (STRV) to imports, and increased the fiscal year (FY) 2010 raw sugar tariff-rate quota (TRQ) by...

  14. 21 CFR 1210.26 - Permits for raw milk or cream.

    Code of Federal Regulations, 2010 CFR

    2010-04-01

    ... 21 Food and Drugs 8 2010-04-01 2010-04-01 false Permits for raw milk or cream. 1210.26 Section... FEDERAL IMPORT MILK ACT Permit Control § 1210.26 Permits for raw milk or cream. Except as provided in § 1210.27, permits to ship or transport raw milk or cream into the United States will be granted only...

  15. 76 FR 20305 - USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2011 Raw Sugar Tariff...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-12

    ... USDA Reassigns Domestic Cane Sugar Allotments and Increases the Fiscal Year 2011 Raw Sugar Tariff-Rate... announced a reassignment of surplus sugar under domestic cane sugar allotments of 325,000 short tons raw value (STRV) to imports, and increased the fiscal year (FY) 2011 raw sugar tariff-rate quota (TRQ) by...

  16. 77 FR 25012 - Fiscal Year 2012 Allocation of Additional Tariff-Rate Quota Volume for Raw Cane Sugar and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-04-26

    ...-Rate Quota Volume for Raw Cane Sugar and Reallocation of Unused Fiscal Year 2012 Tariff-Rate Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative. ACTION: Notice...) for imported raw cane sugar and of country-by-country reallocations of the FY 2012 in-quota quantity...

  17. 40 CFR 409.60 - Applicability; description of the Hilo-Hamakua Coast of the Island of Hawaii raw cane sugar...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ...-Hamakua Coast of the Island of Hawaii raw cane sugar processing subcategory. 409.60 Section 409.60... PROCESSING POINT SOURCE CATEGORY Hilo-Hamakua Coast of the Island of Hawaii Raw Cane Sugar Processing Subcategory § 409.60 Applicability; description of the Hilo-Hamakua Coast of the Island of Hawaii raw cane...

  18. 76 FR 50285 - Fiscal Year 2012 Tariff-Rate Quota Allocations for Raw Cane Sugar, Refined and Specialty Sugar...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-12

    ... for Raw Cane Sugar, Refined and Specialty Sugar and Sugar-Containing Products AGENCY: Office of the... quantity of the tariff-rate quotas for imported raw cane sugar, refined and specialty sugar and sugar...), the United States maintains tariff-rate quotas (TRQs) for imports of raw cane sugar and refined sugar...

  19. 7 CFR 205.307 - Labeling of nonretail containers used for only shipping or storage of raw or processed...

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... shipping or storage of raw or processed agricultural products labeled as “100 percent organic,” “organic,”... containers used for only shipping or storage of raw or processed agricultural products labeled as “100...) Nonretail containers used only to ship or store raw or processed agricultural product labeled as containing...

  20. 75 FR 14479 - Reallocation of Unused Fiscal Year 2010 Tariff-Rate Quota Volume for Raw Cane Sugar

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-03-25

    ...-Rate Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative. ACTION... (TRQ) for imported raw cane sugar. DATES: Effective Date: March 25, 2010. ADDRESSES: Inquiries may be... (HTS), the United States maintains TRQs for imports of raw cane and refined sugar. Section 404(d)(3) of...

  1. 78 FR 57445 - Fiscal Year 2014 WTO Tariff-Rate Quota Allocations for Raw Cane Sugar, Refined and Specialty...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-18

    ... Allocations for Raw Cane Sugar, Refined and Specialty Sugar, and Sugar-Containing Products AGENCY: Office of..., 2013, through Sept. 30, 2014) in-quota quantity of the tariff-rate quotas (TRQs) for imported raw cane...), the United States maintains TRQs for imports of raw cane sugar and refined sugar (syrups and molasses...

  2. 7 CFR 205.307 - Labeling of nonretail containers used for only shipping or storage of raw or processed...

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... shipping or storage of raw or processed agricultural products labeled as “100 percent organic,” “organic,”... containers used for only shipping or storage of raw or processed agricultural products labeled as “100...) Nonretail containers used only to ship or store raw or processed agricultural product labeled as containing...

  3. 26 CFR 1.472-1 - Last-in, first-out inventories.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... may elect to have such method apply to the raw materials only (including those included in goods in... adjustments are confined to costs of the raw material in the inventory and the cost of the raw material in... that the opening inventory had 10 units of raw material, 10 units of goods in process, and 10 units of...

  4. 40 CFR 63.1343 - What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker storage...

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ..., clinker coolers, raw material dryers, and open clinker storage piles? 63.1343 Section 63.1343 Protection... Limits § 63.1343 What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker... associated with that kiln, clinker cooler, raw material dryer, and open clinker storage pile. All D/F, HCl...

  5. 40 CFR 63.1343 - What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker storage...

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ..., clinker coolers, raw material dryers, and open clinker storage piles? 63.1343 Section 63.1343 Protection... Limits § 63.1343 What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker... associated with that kiln, clinker cooler, raw material dryer, and open clinker storage pile. All D/F, HCl...

  6. 40 CFR 63.1343 - What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker piles?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ..., clinker coolers, raw material dryers, and open clinker piles? 63.1343 Section 63.1343 Protection of... What standards apply to my kilns, clinker coolers, raw material dryers, and open clinker piles? (a..., clinker cooler, and raw material dryer. All dioxin D/F, HCl, and total hydrocarbon (THC) emission limits...

  7. 26 CFR 1.472-1 - Last-in, first-out inventories.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... may elect to have such method apply to the raw materials only (including those included in goods in... adjustments are confined to costs of the raw material in the inventory and the cost of the raw material in... that the opening inventory had 10 units of raw material, 10 units of goods in process, and 10 units of...

  8. 26 CFR 1.472-1 - Last-in, first-out inventories.

    Code of Federal Regulations, 2013 CFR

    2013-04-01

    ... may elect to have such method apply to the raw materials only (including those included in goods in... adjustments are confined to costs of the raw material in the inventory and the cost of the raw material in... that the opening inventory had 10 units of raw material, 10 units of goods in process, and 10 units of...

  9. 26 CFR 1.472-1 - Last-in, first-out inventories.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... may elect to have such method apply to the raw materials only (including those included in goods in... adjustments are confined to costs of the raw material in the inventory and the cost of the raw material in... that the opening inventory had 10 units of raw material, 10 units of goods in process, and 10 units of...

  10. [Which one is more important, raw materials or productive technology?--a case study for quality consistency control of Gegen Qinlian decoction].

    PubMed

    Zhong, Wen; Chen, Sha; Zhang, Jun; Wang, Yu-Sheng; Liu, An

    2016-03-01

    To investigate the effect of Chinese medicine raw materials and production technology on quality consistency of Chinese patent medicines with Gegen Qinlian decoction as an example, and establish a suitable method for the quality consistency control of Chinese patent medicines. The results showed that the effect of production technology on the quality consistency was generally not more than 5%, while the effect of raw materials was even more than 30%, indicating that the effect of raw materials was much greater than that of the production technology. In this study, blend technology was used to improve the quality consistency of raw materials. As a result, the difference between the product produced by raw materials and reference groups was less than 5%, thus increasing the quality consistency of finished products. The results showed that under the current circumstances, the main factor affecting the quality consistency of Chinese patent medicines was raw materials, so more attention should be paid to the quality of Chinese medicine's raw materials. Finally, a blend technology can improve the quality consistency of Chinese patent medicines. Copyright© by the Chinese Pharmaceutical Association.

  11. Production of raw cassava starch-degrading enzyme by Penicillium and its use in conversion of raw cassava flour to ethanol.

    PubMed

    Lin, Hai-Juan; Xian, Liang; Zhang, Qiu-Jiang; Luo, Xue-Mei; Xu, Qiang-Sheng; Yang, Qi; Duan, Cheng-Jie; Liu, Jun-Liang; Tang, Ji-Liang; Feng, Jia-Xun

    2011-06-01

    A newly isolated strain Penicillium sp. GXU20 produced a raw starch-degrading enzyme which showed optimum activity towards raw cassava starch at pH 4.5 and 50 °C. Maximum raw cassava starch-degrading enzyme (RCSDE) activity of 20 U/ml was achieved when GXU20 was cultivated under optimized conditions using wheat bran (3.0% w/v) and soybean meal (2.5% w/v) as carbon and nitrogen sources at pH 5.0 and 28 °C. This represented about a sixfold increment as compared with the activity obtained under basal conditions. Starch hydrolysis degree of 95% of raw cassava flour (150 g/l) was achieved after 72 h of digestion by crude RCSDE (30 U/g flour). Ethanol yield reached 53.3 g/l with fermentation efficiency of 92% after 48 h of simultaneous saccharification and fermentation of raw cassava flour at 150 g/l using the RCSDE (30 U/g flour), carried out at pH 4.0 and 40 °C. This strain and its RCSDE have potential applications in processing of raw cassava starch to ethanol.

  12. Improvement in methanol production by regulating the composition of synthetic gas mixture and raw biogas.

    PubMed

    Patel, Sanjay K S; Mardina, Primata; Kim, Dongwook; Kim, Sang-Yong; Kalia, Vipin C; Kim, In-Won; Lee, Jung-Kul

    2016-10-01

    Raw biogas can be an alternative feedstock to pure methane (CH4) for methanol production. In this investigation, we evaluated the methanol production potential of Methylosinus sporium from raw biogas originated from an anaerobic digester. Furthermore, the roles of different gases in methanol production were investigated using synthetic gas mixtures of CH4, carbon dioxide (CO2), and hydrogen (H2). Maximum methanol production was 5.13, 4.35, 6.28, 7.16, 0.38, and 0.36mM from raw biogas, CH4:CO2, CH4:H2, CH4:CO2:H2, CO2, and CO2:H2, respectively. Supplementation of H2 into raw biogas increased methanol production up to 3.5-fold. Additionally, covalent immobilization of M. sporium on chitosan resulted in higher methanol production from raw biogas. This study provides a suitable approach to improve methanol production using low cost raw biogas as a feed containing high concentrations of H2S (0.13%). To our knowledge, this is the first report on methanol production from raw biogas, using immobilized cells of methanotrophs. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. Sensory quality and appropriateness of raw and boiled Jerusalem artichoke tubers (Helianthus tuberosus L.).

    PubMed

    Bach, Vibe; Kidmose, Ulla; Thybo, Anette K; Edelenbos, Merete

    2013-03-30

    The aim of the present study was to investigate the sensory attributes, dry matter and sugar content of five varieties of Jerusalem artichoke tubers and their relation to the appropriateness of the tubers for raw and boiled preparation. Sensory evaluation of raw and boiled Jerusalem artichoke tubers was performed by a trained sensory panel and a semi-trained consumer panel of 49 participants, who also evaluated the appropriateness of the tubers for raw and boiled preparation. The appropriateness of raw Jerusalem artichoke tubers was related to Jerusalem artichoke flavour, green nut flavour, sweetness and colour intensity, whereas the appropriateness of boiled tubers was related to celeriac aroma, sweet aroma, sweetness and colour intensity. In both preparations the variety Dwarf stood out from the others by being the least appropriate tuber. A few sensory attributes can be used as predictors of the appropriateness of Jerusalem artichoke tubers for raw and boiled consumption. Knowledge on the quality of raw and boiled Jerusalem artichoke tubers can be used to inform consumers on the right choice of raw material and thereby increase the consumption of the vegetable. © 2012 Society of Chemical Industry.

  14. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology

    PubMed Central

    Pareek, Chandra Shekhar; Smoczyński, Rafał; Kadarmideen, Haja N.; Dziuba, Piotr; Błaszczyk, Paweł; Sikora, Marcin; Walendzik, Paulina; Grzybowski, Tomasz; Pierzchała, Mariusz; Horbańczuk, Jarosław; Szostak, Agnieszka; Ogluszka, Magdalena; Zwierzchowski, Lech; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Wąsowicz, Krzysztof; Gelfand, Brian; Feng, Yaping; Kumar, Dibyendu

    2016-01-01

    Examination of bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTLs regions as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. Constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4%, respectively, of the constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis of 76 QTLs/CGs with RNA-seq data identified KCNIP4, CCSER1, DPP6, MAP3K5 and GHR CGs with highest SNPs hit loci in all three breeds and developmental ages. However, CAST CG with more than 100 SNPs hits was observed only in Polish HF and Hereford breeds. These findings are important for the identification and construction of novel tissue-specific and breed-specific SNP-db datasets; by screening putative SNPs against the QTL db and candidate genes for bovine growth and reproduction traits, one can develop genomic selection strategies for these traits. PMID:27606429
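The read and SNP counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch in Python: the figures are the ones reported in the abstract, while the variable names are hypothetical and not taken from the study.

```python
# Breed-wise paired-end read totals in millions, as reported in the abstract.
reads_millions = {"Polish HF": 515.38, "Polish Red": 215.39, "Hereford": 408.04}
n_libraries = 18          # 18 polyA transcriptome libraries (one per bull)
total_snps = 13_775_885   # SNPs in the constructed breed-specific SNP-db

# Sum of the breed-wise totals and the implied average per library.
total_millions = sum(reads_millions.values())
per_library_millions = total_millions / n_libraries

# Average number of SNPs per young bull (integer part).
snps_per_bull = total_snps // n_libraries

print(f"total reads: {total_millions:.2f} million")
print(f"reads per library: {per_library_millions:.1f} million")
print(f"SNPs per bull: {snps_per_bull:,}")
```

The breed-wise totals sum to about 1,138.8 million reads, which is consistent with the stated average of roughly 63 million reads per library and with the reported 765,326 SNPs per bull.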

  15. Single Nucleotide Polymorphism Discovery in Bovine Pituitary Gland Using RNA-Seq Technology.

    PubMed

    Pareek, Chandra Shekhar; Smoczyński, Rafał; Kadarmideen, Haja N; Dziuba, Piotr; Błaszczyk, Paweł; Sikora, Marcin; Walendzik, Paulina; Grzybowski, Tomasz; Pierzchała, Mariusz; Horbańczuk, Jarosław; Szostak, Agnieszka; Ogluszka, Magdalena; Zwierzchowski, Lech; Czarnik, Urszula; Fraser, Leyland; Sobiech, Przemysław; Wąsowicz, Krzysztof; Gelfand, Brian; Feng, Yaping; Kumar, Dibyendu

    2016-01-01

    Examination of bovine pituitary gland transcriptome by strand-specific RNA-seq allows detection of putative single nucleotide polymorphisms (SNPs) within potential candidate genes (CGs) or QTLs regions as well as an understanding of the genomic variations that contribute to economic traits. Here we report a breed-specific model to successfully perform the detection of SNPs in the pituitary gland of young growing bulls representing Polish Holstein-Friesian (HF), Polish Red, and Hereford breeds at three developmental ages viz., six months, nine months, and twelve months. A total of 18 bovine pituitary gland polyA transcriptome libraries were prepared and sequenced using the Illumina NextSeq 500 platform. Sequenced FastQ databases of all 18 young bulls were submitted to NCBI-SRA database with NCBI-SRA accession numbers SRS1296732. For the investigated young bulls, a total of 1,138,823,098 raw paired-end reads with a length of 156 bases were obtained, resulting in approximately 63 million paired-end reads per library. Breed-wise, a total of 515.38, 215.39, and 408.04 million paired-end reads were obtained for Polish HF, Polish Red, and Hereford breeds, respectively. Burrows-Wheeler Aligner (BWA) read alignments showed 93.04%, 94.39%, and 83.46% of the mapped sequencing reads were properly paired to the Polish HF, Polish Red, and Hereford breeds, respectively. Constructed breed-specific SNP-db of three cattle breeds yielded 13,775,885 SNPs. On average 765,326 breed-specific SNPs per young bull were identified. Using two stringent filtering parameters, i.e., a minimum 10 SNP reads per base with an accuracy ≥ 90% and a minimum 10 SNP reads per base with an accuracy = 100%, SNP-db records were trimmed to construct a highly reliable SNP-db. This resulted in a reduction of 95.7% and 96.4%, respectively, of the constructed raw SNP-db. Finally, SNP discoveries using RNA-Seq data were validated by KASP™ SNP genotyping assay. The comprehensive QTLs/CGs analysis of 76 QTLs/CGs with RNA-seq data identified KCNIP4, CCSER1, DPP6, MAP3K5 and GHR CGs with highest SNPs hit loci in all three breeds and developmental ages. However, CAST CG with more than 100 SNPs hits was observed only in Polish HF and Hereford breeds. These findings are important for the identification and construction of novel tissue-specific and breed-specific SNP-db datasets; by screening putative SNPs against the QTL db and candidate genes for bovine growth and reproduction traits, one can develop genomic selection strategies for these traits.

  16. The impact of methylation quantitative trait loci (mQTLs) on active smoking-related DNA methylation changes.

    PubMed

    Gao, Xu; Thomsen, Hauke; Zhang, Yan; Breitling, Lutz Philipp; Brenner, Hermann

    2017-01-01

    Methylation quantitative trait loci (mQTLs) are the genetic variants that may affect the DNA methylation patterns of CpG sites. However, their roles in influencing the disturbances of smoking-related epigenetic changes have not been well established. This study was conducted to address whether mQTLs exist in the vicinity of smoking-related CpG sites (± 50 kb) and to examine their associations with smoking exposure and all-cause mortality in older adults. We obtained DNA methylation profiles in whole blood samples by Illumina Infinium Human Methylation 450 BeadChip array of two independent subsamples of the ESTHER study (discovery set, n  = 581; validation set, n  = 368) and their corresponding genotyping data using the Illumina Infinium OncoArray BeadChip. After correction for multiple testing (FDR), we successfully identified that 70 out of 151 previously reported smoking-related CpG sites were significantly associated with 192 SNPs within the 50 kb search window of each locus. The 192 mQTLs significantly influenced the active smoking-related DNA methylation changes, with percentage changes ranging from 0.01 to 18.96%, especially for the weakly/moderately smoking-related CpG sites. However, these identified mQTLs were not directly associated with active smoking exposure or all-cause mortality. Our findings clearly demonstrated that if not dealt with properly, the mQTLs might impair the power of epigenetic-based models of smoking exposure to a certain extent. In addition, such genetic variants could be the key factor to distinguish between the heritable and smoking-induced impact on epigenome disparities. These mQTLs are of special importance when DNA methylation markers measured by Illumina Infinium assay are used for any comparative population studies related to smoking-related cancers and chronic diseases.
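The ± 50 kb search window described in this abstract amounts to a range query over SNP coordinates. The sketch below shows one way such a lookup could be done in Python; the function name, the example positions, and the use of a sorted list with binary search are all assumptions made for illustration, not the method used in the study.

```python
from bisect import bisect_left, bisect_right

WINDOW = 50_000  # +/- 50 kb around each CpG site, as stated in the abstract


def snps_in_window(cpg_pos, snp_positions, window=WINDOW):
    """Return SNP positions within +/- window bp of a CpG site.

    snp_positions must be sorted ascending and lie on the same chromosome
    as the CpG site (hypothetical input layout).
    """
    lo = bisect_left(snp_positions, cpg_pos - window)
    hi = bisect_right(snp_positions, cpg_pos + window)
    return snp_positions[lo:hi]


# Hypothetical coordinates, for illustration only.
snps = [10_000, 120_000, 149_999, 200_000, 250_000, 250_001]
nearby = snps_in_window(200_000, snps)
print(nearby)
```

With these made-up coordinates, only the SNPs at 200,000 and 250,000 fall inside the inclusive window from 150,000 to 250,000.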

  17. Qualitative and quantitative assessment of Illumina's forensic STR and SNP kits on MiSeq FGx™.

    PubMed

    Sharma, Vishakha; Chow, Hoi Yan; Siegel, Donald; Wurmbach, Elisa

    2017-01-01

    Massively parallel sequencing (MPS) is a powerful tool transforming DNA analysis in multiple fields ranging from medicine, to environmental science, to evolutionary biology. In forensic applications, MPS offers the ability to significantly increase the discriminatory power of human identification as well as aid in mixture deconvolution. However, before the benefits of any new technology can be employed, its quality, consistency, sensitivity, and specificity must be rigorously evaluated in order to gain a detailed understanding of the technique including sources of error, error rates, and other restrictions/limitations. This extensive study assessed the performance of Illumina's MiSeq FGx MPS system and ForenSeq™ kit in nine experimental runs including 314 reaction samples. In-depth data analysis evaluated the consequences of different assay conditions on test results. Variables included: sample numbers per run, targets per run, DNA input per sample, and replications. Results are presented as heat maps revealing patterns for each locus. Data analysis focused on read numbers (allele coverage), drop-outs, drop-ins, and sequence analysis. The study revealed that loci with high read numbers performed better and resulted in fewer drop-outs and well balanced heterozygous alleles. Several loci were prone to drop-outs which led to falsely typed homozygotes and therefore to genotype errors. Sequence analysis of allele drop-in typically revealed a single nucleotide change (deletion, insertion, or substitution). Analyses of sequences, no template controls, and spurious alleles suggest no contamination during library preparation, pooling, and sequencing, but indicate that sequencing or PCR errors may have occurred due to DNA polymerase infidelities. Finally, we found utilizing Illumina's FGx System at recommended conditions does not guarantee 100% outcomes for all samples tested, including the positive control, and required manual editing due to low read numbers and/or allele drop-in. These findings are important for progressing towards implementation of MPS in forensic DNA testing.

  18. Microbial Populations in Naked Neck Chicken Ceca Raised on Pasture Flock Fed with Commercial Yeast Cell Wall Prebiotics via an Illumina MiSeq Platform

    PubMed Central

    Park, Si Hong; Lee, Sang In; Ricke, Steven C.

    2016-01-01

    Prebiotics are non-digestible carbohydrate dietary supplements that selectively stimulate the growth of one or more beneficial bacteria in the gastrointestinal tract of the host. These bacteria can inhibit colonization of pathogenic bacteria by producing antimicrobial substances such as short chain fatty acids (SCFAs) and competing for niches with pathogens within the gut. Pasture flock chickens are generally raised outdoors with fresh grass, sunlight and air, which represents different environmental growth conditions compared to conventionally raised chickens. The purpose of this study was to evaluate the difference in microbial populations from naked neck chicken ceca fed with commercial prebiotics derived from brewer’s yeast cell wall via an Illumina MiSeq platform. A total of 147 day-of-hatch naked neck chickens were distributed into 3 groups consisting of 1) C: control (no prebiotic), 2) T1: Biolex® MB40 with 0.2%, and 3) T2: Leiber® ExCel with 0.2%, consistently supplemented prebiotics during the experimental period. At 8 weeks, a total of 15 birds from each group were randomly selected and ceca removed for DNA extraction. The Illumina MiSeq platform based on V4 region of 16S rRNA gene was applied for microbiome analysis. Both treatments exhibited limited impact on the microbial populations at the phylum level, with no significant differences in the OTU number of Bacteroidetes among groups and an increase of Proteobacteria OTUs for the T1 (Biolex® MB40) group. In addition there was a significant increase of genus Faecalibacterium OTU, phylum Firmicutes. According to the development of next generation sequencing (NGS), microbiome analysis based on 16S rRNA gene proved to be informative on the prebiotic impact on poultry gut microbiota in pasture-raised naked neck birds. PMID:26992104

  19. Microbial Populations in Naked Neck Chicken Ceca Raised on Pasture Flock Fed with Commercial Yeast Cell Wall Prebiotics via an Illumina MiSeq Platform.

    PubMed

    Park, Si Hong; Lee, Sang In; Ricke, Steven C

    2016-01-01

    Prebiotics are non-digestible carbohydrate dietary supplements that selectively stimulate the growth of one or more beneficial bacteria in the gastrointestinal tract of the host. These bacteria can inhibit colonization of pathogenic bacteria by producing antimicrobial substances such as short chain fatty acids (SCFAs) and competing for niches with pathogens within the gut. Pasture flock chickens are generally raised outdoors with fresh grass, sunlight and air, which represents different environmental growth conditions compared to conventionally raised chickens. The purpose of this study was to evaluate the difference in microbial populations from naked neck chicken ceca fed with commercial prebiotics derived from brewer's yeast cell wall via an Illumina MiSeq platform. A total of 147 day-of-hatch naked neck chickens were distributed into 3 groups consisting of 1) C: control (no prebiotic), 2) T1: Biolex® MB40 with 0.2%, and 3) T2: Leiber® ExCel with 0.2%, consistently supplemented prebiotics during the experimental period. At 8 weeks, a total of 15 birds from each group were randomly selected and ceca removed for DNA extraction. The Illumina MiSeq platform based on V4 region of 16S rRNA gene was applied for microbiome analysis. Both treatments exhibited limited impact on the microbial populations at the phylum level, with no significant differences in the OTU number of Bacteroidetes among groups and an increase of Proteobacteria OTUs for the T1 (Biolex® MB40) group. In addition there was a significant increase of genus Faecalibacterium OTU, phylum Firmicutes. According to the development of next generation sequencing (NGS), microbiome analysis based on 16S rRNA gene proved to be informative on the prebiotic impact on poultry gut microbiota in pasture-raised naked neck birds.

  20. Development and validation of microsatellite markers for Brachiaria ruziziensis obtained by partial genome assembly of Illumina single-end reads

    PubMed Central

    2013-01-01

    Background: Brachiaria ruziziensis is one of the most important forage species planted in the tropics. The application of genomic tools to aid the selection of superior genotypes can provide support to B. ruziziensis breeding programs. However, there is a complete lack of information about the B. ruziziensis genome. Also, the availability of genomic tools, such as molecular markers, to support B. ruziziensis breeding programs is rather limited. Recently, next-generation sequencing technologies have been applied to generate sequence data for the identification of microsatellite regions and primer design. In this study, we present a first validated set of SSR markers for Brachiaria ruziziensis, selected from a de novo partial genome assembly of single-end Illumina reads. Results: A total of 85,567 perfect microsatellite loci were detected in contigs with a minimum 10X coverage. We selected a set of 500 microsatellite loci identified in contigs with minimum 100X coverage for primer design and synthesis, and tested a subset of 269 primer pairs, 198 of which were polymorphic on 11 representative B. ruziziensis accessions. Descriptive statistics for these primer pairs are presented, as well as estimates of marker transferability to other relevant brachiaria species. Finally, a set of 11 multiplex panels containing the 30 most informative markers was validated and proposed for B. ruziziensis genetic analysis. Conclusions: We show that the detection and development of microsatellite markers from genome assembled Illumina single-end DNA sequences is highly efficient. The developed markers are readily suitable for genetic analysis and marker assisted selection of Brachiaria ruziziensis. The use of this approach for microsatellite marker development is promising for species with limited genomic information, whose breeding programs would benefit from the use of genomic tools. To our knowledge, this is the first set of microsatellite markers developed for this important species. PMID:23324172
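
    The record above reports perfect microsatellite loci detected in assembled contigs but does not name the detection tool or its thresholds. As a rough illustration only, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The function name, the motif lengths (2 and 3 bp), and the minimum of 5 repeats are assumptions chosen for the example, not parameters from the study.

```python
import re


def find_perfect_ssrs(sequence, motif_lengths=(2, 3), min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    seq = sequence.upper()
    hits = []
    for k in motif_lengths:
        # A k-mer followed by at least (min_repeats - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy contig (not real data): six copies of the dinucleotide AC.
toy_contig = "TTG" + "AC" * 6 + "GGT"
print(find_perfect_ssrs(toy_contig))
```

    In practice, loci found this way would still need flanking sequence for primer design, which this sketch does not attempt.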

  1. Analysis of the global transcriptome of longan (Dimocarpus longan Lour.) embryogenic callus using Illumina paired-end sequencing

    PubMed Central

    2013-01-01

    Background: Longan is a tropical/subtropical fruit tree of great economic importance in Southeast Asia. Progress in understanding molecular mechanisms of longan embryogenesis, which is the primary influence on fruit quality and yield, is slowed by lack of transcriptomic and genomic information. Illumina second generation sequencing is suitable for generating enormous numbers of transcript sequences that can be used for functional genomic analysis of longan. Results: In this study, a longan embryogenic callus (EC) cDNA library was sequenced using an Illumina HiSeq 2000 system. A total of 64,876,258 clean reads comprising 5.84 Gb of nucleotides were assembled into 68,925 unigenes of 448-bp mean length, with unigenes ≥1000 bp accounting for 8.26% of the total. Using BLASTx, 40,634 unigenes were found to have significant similarity with accessions in Nr and Swiss-Prot databases. Of these, 38,845 unigenes were assigned to 43 GO sub-categories and 17,118 unigenes were classified into 25 COG sub-groups. In addition, 17,306 unigenes mapped to 199 KEGG pathways, with the categories of Metabolic pathways, Plant-pathogen interaction, Biosynthesis of secondary metabolites, and Genetic information processing being well represented. Analyses of unigenes ≥1000 bp revealed 328 embryogenesis-related unigenes as well as numerous unigenes expressed in EC associated with functions of reproductive growth, such as flowering, gametophytogenesis, and fertility, and vegetative growth, such as root and shoot growth. Furthermore, 23 unigenes related to embryogenesis and reproductive and vegetative growth were validated by quantitative real time PCR (qPCR) in samples from different stages of longan somatic embryogenesis (SE); their differential expression in the various embryogenic cultures indicated their possible roles in longan SE. Conclusions: The quantity and variety of expressed EC genes identified in this study are sufficient to serve as a global transcriptome dataset for longan EC and to provide more molecular resources for longan functional genomics. PMID:23957614

  2. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.

  3. Hybrid genome assembly and annotation of Paenibacillus pasadenensis strain R16 reveals insights on endophytic life style and antifungal activity

    PubMed Central

    Passera, Alessandro; Marcolungo, Luca; Brasca, Milena; Quaglino, Fabio; Cantaloni, Chiara; Delledonne, Massimo

    2018-01-01

    Bacteria of the Paenibacillus genus are becoming important in many fields of science, including agriculture, for their positive effects on the health of plants. However, there is little information available on this genus compared to other bacteria (such as Bacillus or Pseudomonas), especially when considering genomic information. Sequencing the genomes of plant-beneficial bacteria is a crucial step to identify the genetic elements underlying the adaptation to life inside a plant host and, in particular, which of these features determine the differences between a helpful microorganism and a pathogenic one. In this study, we have characterized the genome of Paenibacillus pasadenensis, strain R16, recently investigated for its antifungal activities and plant-associated features. A hybrid assembly approach was used, integrating the very precise reads obtained by Illumina technology and long fragments acquired with Oxford Nanopore Technology (ONT) sequencing. De novo genome assembly based solely on Illumina reads generated a relatively fragmented assembly of 5.72 Mbp in 99 ungapped sequences with an N50 length of 544 Kbp; hybrid assembly, integrating Illumina and ONT reads, improved the assembly quality, generating a genome of 5.75 Mbp, organized in 6 contigs with an N50 length of 3.4 Mbp. Annotation of the latter genome identified 4987 coding sequences, of which 1610 are hypothetical proteins. Enrichment analysis identified pathways of particular interest for the endophyte biology, including the chitin-utilization pathway and the incomplete siderophore pathway, which hints at siderophore parasitism. In addition, the analysis led to the identification of genes for the production of terpenes, for example farnesol, which was hypothesized to be the main antifungal molecule produced by the strain. The functional analysis of the genome confirmed several plant-associated, plant-growth promotion, and biocontrol traits of strain R16, thus adding insights into the genetic bases of these complex features, and of the Paenibacillus genus in general. PMID:29351296
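
    The record above compares assemblies by their N50 length. For readers unfamiliar with the statistic, the following is a minimal sketch of the usual definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly size. The contig lengths in the example are hypothetical and are not taken from the R16 assembly.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kbp (illustrative values only).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
print(example_n50)
```

    A less fragmented assembly concentrates more of its total length in a few long contigs, which is why the hybrid assembly reports a much larger N50 than the Illumina-only one.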

  4. Transcriptome analysis of Houttuynia cordata Thunb. by Illumina paired-end RNA sequencing and SSR marker discovery.

    PubMed

    Wei, Lin; Li, Shenghua; Liu, Shenggui; He, Anna; Wang, Dan; Wang, Jie; Tang, Yulian; Wu, Xianjin

    2014-01-01

    Houttuynia cordata Thunb. is an important traditional medical herb in China and other Asian countries, with high medicinal and economic value. However, a lack of available genomic information has become a limitation for research on this species. Thus, we carried out high-throughput transcriptomic sequencing of H. cordata to generate an enormous transcriptome sequence dataset for gene discovery and molecular marker development. Illumina paired-end sequencing technology produced over 56 million sequencing reads from H. cordata mRNA. Subsequent de novo assembly yielded 63,954 unigenes, 39,982 (62.52%) and 26,122 (40.84%) of which had significant similarity to proteins in the NCBI nonredundant protein and Swiss-Prot databases (E-value < 10^-5), respectively. Of these annotated unigenes, 30,131 and 15,363 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. In addition, 24,434 (38.21%) unigenes were mapped onto 128 pathways using the KEGG pathway database and 17,964 (44.93%) unigenes showed homology to Vitis vinifera (Vitaceae) genes in BLASTx analysis. Furthermore, 4,800 cDNA SSRs were identified as potential molecular markers. Fifty primer pairs were randomly selected to detect polymorphism among 30 samples of H. cordata; 43 (86%) produced fragments of expected size, suggesting that the unigenes were suitable for specific primer design and of high quality, and the SSR marker could be widely used in marker-assisted selection and molecular breeding of H. cordata in the future. This is the first application of Illumina paired-end sequencing technology to investigate the whole transcriptome of H. cordata and to assemble RNA-seq reads without a reference genome. These data should help researchers investigating the evolution and biological processes of this species. The SSR markers developed can be used for construction of high-resolution genetic linkage maps and for gene-based association analyses in H. cordata. This work will enable future functional genomic research and research into the distinctive active constituents of this genus.

  5. Development and Evaluation of a 9K SNP Array for Peach by Internationally Coordinated SNP Detection and Validation in Breeding Germplasm

    PubMed Central

    Scalabrin, Simone; Gilmore, Barbara; Lawley, Cynthia T.; Gasic, Ksenija; Micheletti, Diego; Rosyara, Umesh R.; Cattonaro, Federica; Vendramin, Elisa; Main, Dorrie; Aramini, Valeria; Blas, Andrea L.; Mockler, Todd C.; Bryant, Douglas W.; Wilhelm, Larry; Troggio, Michela; Sosinski, Bryon; Aranzana, Maria José; Arús, Pere; Iezzoni, Amy; Morgante, Michele; Peace, Cameron

    2012-01-01

    Although a large number of single nucleotide polymorphism (SNP) markers covering the entire genome are needed to enable molecular breeding efforts such as genome wide association studies, fine mapping, genomic selection and marker-assisted selection in peach [Prunus persica (L.) Batsch] and related Prunus species, only a limited number of genetic markers, including simple sequence repeats (SSRs), have been available to date. To address this need, an international consortium (The International Peach SNP Consortium; IPSC) has pursued a coordinated effort to perform genome-scale SNP discovery in peach using next generation sequencing platforms to develop and characterize a high-throughput Illumina Infinium® SNP genotyping array platform. We performed whole genome re-sequencing of 56 peach breeding accessions using the Illumina and Roche/454 sequencing technologies. Polymorphism detection algorithms identified a total of 1,022,354 SNPs. Validation with the Illumina GoldenGate® assay was performed on a subset of the predicted SNPs, verifying ∼75% of genic (exonic and intronic) SNPs, whereas only about a third of intergenic SNPs were verified. Conservative filtering was applied to arrive at a set of 8,144 SNPs that were included on the IPSC peach SNP array v1, distributed over all eight peach chromosomes with an average spacing of 26.7 kb between SNPs. Use of this platform to screen a total of 709 accessions of peach in two separate evaluation panels identified a total of 6,869 (84.3%) polymorphic SNPs. The almost 7,000 SNPs verified as polymorphic through extensive empirical evaluation represent an excellent source of markers for future studies in genetic relatedness, genetic mapping, and dissecting the genetic architecture of complex agricultural traits. The IPSC peach SNP array v1 is commercially available and we expect that it will be used worldwide for genetic studies in peach and related stone fruit and nut species. PMID:22536421

  6. Significant association of full-thickness rotator cuff tears and estrogen-related receptor-β (ESRRB).

    PubMed

    Teerlink, Craig C; Cannon-Albright, Lisa A; Tashjian, Robert Z

    2015-02-01

    The precise etiology of rotator cuff disease is unknown, but prior evidence suggests a role for genetic factors. Variants of estrogen-related receptor-β (ESRRB) have been previously associated with rotator cuff disease. The purpose of the present study was to confirm the association between multiple candidate genes, including ESRRB, and rotator cuff disease in an independent set of patients with rotator cuff tear. The Illumina 5M (Illumina Inc, San Diego, CA, USA) single nucleotide polymorphism (SNP) platform was used to genotype 175 patients with rotator cuff tear. Genotypes were used to select a set of 2595 genetically matched Caucasian controls available from the Illumina iControls database. Tests of association were performed with Genome-wide Efficient Mixed Model Association (GEMMA) software at 69 SNPs that fell within 20 kb of 6 candidate genes (DEFB1, DENND2C, ESRRB, FGF3, FGF10, and FGFR1). Tests of association revealed 1 significantly associated SNP occurring in ESRRB (rs17583842; P = 4.4E-4). Another SNP within ESRRB (rs7157192) had a nominal P value of 7.8E-3. FastPHASE software estimated 2 frequent haplotypes among 54 individuals who carried both risk alleles at these 2 SNPs. The first haplotype had a frequency of 13.9% (n = 15) in risk-allele carriers and only 2.2% in controls (odds ratio, 6.9; 95% confidence interval, 3.9-2.2). The second haplotype had a frequency of 12.9% in risk-allele carriers and only 2.7% in controls (odds ratio, 5.3; 95% confidence interval, 3.0-9.5). The significant association and the presence of high-risk haplotypes identified in the ESRRB gene confirm the association of variants in ESRRB and rotator cuff disease. Copyright © 2015 Journal of Shoulder and Elbow Surgery Board of Trustees. All rights reserved.

  7. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693

  8. Genome-wide association study for rotator cuff tears identifies two significant single-nucleotide polymorphisms.

    PubMed

    Tashjian, Robert Z; Granger, Erin K; Farnham, James M; Cannon-Albright, Lisa A; Teerlink, Craig C

    2016-02-01

    The precise etiology of rotator cuff disease is unknown, but prior evidence suggests a role for genetic factors. Limited data exist identifying specific genes associated with rotator cuff tearing. The purpose of this study was to identify specific genes or genetic variants associated with rotator cuff tearing by a genome-wide association study with an independent set of rotator cuff tear cases. A set of 311 full-thickness rotator cuff tear cases genotyped on the Illumina 5M single-nucleotide polymorphism (SNP) platform was used in a genome-wide association study with 2641 genetically matched white population controls available from the Illumina iControls database. Tests of association were performed with GEMMA software at 257,558 SNPs that compose the intersection of Illumina SNP platforms and that passed general quality control metrics. SNPs were considered significant if P < 1.94 × 10^-7 (Bonferroni correction: 0.05/257,558). Tests of association revealed 2 significantly associated SNPs, one occurring in SAP30BP (rs820218; P = 3.8E-9) on chromosome 17q25 and another occurring in SASH1 (rs12527089; P = 1.9E-7) on chromosome 6q24. This study represents the first attempt to identify genetic factors influencing rotator cuff tearing by a genome-wide association study using a dense/complete set of SNPs. Two SNPs were significantly associated with rotator cuff tearing, residing in SAP30BP on chromosome 17 and SASH1 on chromosome 6. Both genes are associated with the cellular process of apoptosis. Identification of potential genes or genetic variants associated with rotator cuff tearing may help in identifying individuals at risk for the development of rotator cuff tearing. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.
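
    The significance threshold quoted in the record above follows from a Bonferroni correction, which divides the family-wise error rate by the number of tests. The short sketch below reproduces that arithmetic from the figures given in the abstract (0.05 and 257,558 SNPs) and checks the two reported P values against it. The function and variable names are illustrative and are not part of GEMMA or any other tool named in the record.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Figures taken from the abstract: alpha = 0.05 over 257,558 tested SNPs.
threshold = bonferroni_threshold(0.05, 257_558)

# P values as reported in the abstract for the two associated SNPs.
reported_p = {"rs820218": 3.8e-9, "rs12527089": 1.9e-7}
significant = {snp: p < threshold for snp, p in reported_p.items()}

print(f"threshold = {threshold:.3g}")
print(significant)
```

    The second SNP falls only just below the threshold, which is worth keeping in mind when reading the abstract's statement that both associations are significant.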

  9. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  10. Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform

    PubMed Central

    Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga

    2015-01-01

    Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802

  11. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background: The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results: We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions: Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  12. 75 FR 10460 - Improving Tracing Procedures for E. coli O157:H7 Positive Raw Beef Product

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-03-08

    ... Tracing Procedures for E. coli O157:H7 Positive Raw Beef Product AGENCY: Food Safety and Inspection... Agency procedures for identifying suppliers of source material used to produce raw beef product that FSIS... that raw beef is positive for E. coli O157:H7, and whether the Agency takes the appropriate steps to...

  13. 78 FR 56646 - Determination of Total Amounts of Fiscal Year 2014 WTO Tariff-Rate Quotas for Raw Cane Sugar and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2013-09-13

    ... Secretary Determination of Total Amounts of Fiscal Year 2014 WTO Tariff- Rate Quotas for Raw Cane Sugar and... the Fiscal Year (FY) 2014 (October 1, 2013-September 30, 2014) in-quota aggregate quantity of raw cane sugar at 1,117,195 metric tons raw value (MTRV). The Secretary also announces the establishment of the...

  14. 29 CFR 779.333 - Goods sold for use as raw materials in other products.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 29 Labor 3 2011-07-01 2011-07-01 false Goods sold for use as raw materials in other products. 779... Service Establishments Sales Not Made for Resale § 779.333 Goods sold for use as raw materials in other products. Goods are sold for resale where they are sold for use as a raw material in the production of a...

  15. 75 FR 50796 - Fiscal Year 2011 Tariff-Rate Quota Allocations for Raw Cane Sugar, Refined and Specialty Sugar...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-17

    ... for Raw Cane Sugar, Refined and Specialty Sugar, and Sugar-Containing Products AGENCY: Office of the... quantity of the tariff-rate quotas for imported raw cane sugar, refined and specialty sugar, and sugar... imports of raw cane sugar and refined sugar. Pursuant to Additional U.S. Note 8 to Chapter 17 of the HTS...

  16. 29 CFR 779.333 - Goods sold for use as raw materials in other products.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 29 Labor 3 2010-07-01 2010-07-01 false Goods sold for use as raw materials in other products. 779... Service Establishments Sales Not Made for Resale § 779.333 Goods sold for use as raw materials in other products. Goods are sold for resale where they are sold for use as a raw material in the production of a...

  17. 75 FR 47258 - Determination of Total Amounts of Fiscal Year 2011 Tariff-Rate Quotas for Raw Cane Sugar and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-08-05

    ... Determination of Total Amounts of Fiscal Year 2011 Tariff-Rate Quotas for Raw Cane Sugar and Certain Sugars...) 2011 in-quota aggregate quantity of the raw, as well as, refined and specialty sugar Tariff-Rate Quotas (TRQ) as required under the U.S. World Trade Organization (WTO) commitments. The FY 2011 raw cane sugar...

  18. 76 FR 42160 - Allocation of Additional Fiscal Year (FY) 2011 In-Quota Volume for Raw Cane Sugar

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-18

    ...-Quota Volume for Raw Cane Sugar AGENCY: Office of the United States Trade Representative. ACTION: Notice...) for imported raw cane sugar. USTR is also reallocating a portion of the unused original FY 2011 TRQ... imports of raw cane and refined sugar. Section 404(d)(3) of the Uruguay Round Agreements Act (19 U.S.C...

  19. 77 FR 55451 - Determination of Total Amounts of Fiscal Year 2013 Tariff-Rate Quotas for Raw Cane Sugar and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-09-10

    ... Secretary Determination of Total Amounts of Fiscal Year 2013 Tariff-Rate Quotas for Raw Cane Sugar and...) 2013 (October 1, 2012-September 30, 2013) in-quota aggregate quantity of the raw, as well as, refined and specialty sugar Tariff-Rate Quotas (TRQ). The FY 2013 raw cane sugar TRQ is established at 1,117...

  20. 40 CFR Table N-1 to Subpart N of... - CO2 Emission Factors for Carbonate-Based Raw Materials

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ...-Based Raw Materials N Table N-1 to Subpart N of Part 98 Protection of Environment ENVIRONMENTAL... Raw Materials Carbonate-based raw material—mineral CO2 emission factor a Limestone—CaCO3 0.440 Dolomite... in units of metric tons of CO2 emitted per metric ton of carbonate-based raw material charged to the...

  1. 76 FR 69146 - Common or Usual Name for Raw Meat and Poultry Products Containing Added Solutions-Reopening of...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-08

    .... FSIS-2010-0012] RIN 0583-AD41 Common or Usual Name for Raw Meat and Poultry Products Containing Added... for 60 days the comment period for the proposed rule, ``Common or Usual Name for Raw Meat and Poultry..., FSIS published the proposed rule ``Common or Usual Name for Raw Meat and Poultry Products Containing...

  2. Effect of plant essential oils against foodborne pathogens Escherichia coli O157:H7 and Salmonella enterica in raw cookie dough

    USDA-ARS's Scientific Manuscript database

    Cookie dough can be contaminated by raw ingredients, mishandling, package contamination, etc. Considering the recent outbreak of E. coli O157:H7 with commercial raw cookie dough, the ability of E. coli O157:H7 to survive in the raw cookie dough production and processing environments, it raised conce...

  3. Influence of raw milk quality on processed dairy products: How do raw milk quality test results relate to product quality and yield?

    PubMed

    Murphy, Steven C; Martin, Nicole H; Barbano, David M; Wiedmann, Martin

    2016-12-01

    This article provides an overview of the influence of raw milk quality on the quality of processed dairy products and offers a perspective on the merits of investing in quality. Dairy farmers are frequently offered monetary premium incentives to provide high-quality milk to processors. These incentives are most often based on raw milk somatic cell and bacteria count levels well below the regulatory public health-based limits. Justification for these incentive payments can be based on improved processed product quality and manufacturing efficiencies that provide the processor with a return on their investment for high-quality raw milk. In some cases, this return on investment is difficult to measure. Raw milks with high levels of somatic cells and bacteria are associated with increased enzyme activity that can result in product defects. Use of raw milk with somatic cell counts >100,000 cells/mL has been shown to reduce cheese yields, and higher levels, generally >400,000 cells/mL, have been associated with textural and flavor defects in cheese and other products. Although most research indicates that fairly high total bacteria counts (>1,000,000 cfu/mL) in raw milk are needed to cause defects in most processed dairy products, receiving high-quality milk from the farm allows some flexibility for handling raw milk, which can increase efficiencies and reduce the risk of raw milk reaching bacterial levels of concern. Monitoring total bacterial numbers in regard to raw milk quality is imperative, but determining levels of specific types of bacteria present has gained increasing importance. For example, spores of certain spore-forming bacteria present in raw milk at very low levels (e.g., <1/mL) can survive pasteurization and grow in milk and cheese products to levels that result in defects.
With the exception of meeting product specifications often required for milk powders, testing for specific spore-forming groups is currently not used in quality incentive programs in the United States but is used in other countries (e.g., the Netherlands). Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  4. Food Safety for Your Family

    MedlinePlus

    ... Preparing and Cooking Raw Meat, Poultry, Fish, and Egg Products Wash your hands with warm water and ... and after handling raw meat, poultry, fish, or egg products. Keep raw meats and their juices away ...

  5. Leveraging “Raw Materials” as Building Blocks and Bioactive Signals in Regenerative Medicine

    PubMed Central

    Renth, Amanda N.

    2012-01-01

    Components found within the extracellular matrix (ECM) have emerged as an essential subset of biomaterials for tissue engineering scaffolds. Collagen, glycosaminoglycans, bioceramics, and ECM-based matrices are the main categories of “raw materials” used in a wide variety of tissue engineering strategies. The advantages of raw materials include their inherent ability to create a microenvironment that contains physical, chemical, and mechanical cues similar to native tissue, which prove unmatched by synthetic biomaterials alone. Moreover, these raw materials provide a head start in the regeneration of tissues by providing building blocks to be bioresorbed and incorporated into the tissue as opposed to being biodegraded into waste products and removed. This article reviews the strategies and applications of employing raw materials as components of tissue engineering constructs. Utilizing raw materials holds the potential to provide both a scaffold and a signal, perhaps even without the addition of exogenous growth factors or cytokines. Raw materials contain endogenous proteins that may also help to improve the translational success of tissue engineering solutions to progress from laboratory bench to clinical therapies. Traditionally, the tissue engineering triad has included cells, signals, and materials. Whether raw materials represent their own new paradigm or are categorized as a bridge between signals and materials, it is clear that they have emerged as a leading strategy in regenerative medicine. The common use of raw materials in commercial products as well as their growing presence in the research community speak to their potential. However, there has heretofore not been a coordinated or organized effort to classify these approaches, and as such we recommend that the use of raw materials be introduced into the collective consciousness of our field as a recognized classification of regenerative medicine strategies. PMID:22462759

  6. Evaluation of two raw diets vs a commercial cooked diet on feline growth.

    PubMed

    Hamper, Beth A; Bartges, Joseph W; Kirk, Claudia A

    2017-04-01

    Objectives The objective of this study was to determine if two raw feline diets were nutritionally adequate for kittens. Methods Twenty-four 9-week-old kittens underwent an Association of American Feed Control Officials' (AAFCO) 10 week growth feeding trial with two raw diet groups and one cooked diet group (eight kittens in each). Morphometric measurements (weight, height and length), complete blood counts, serum chemistry, whole blood taurine and fecal cultures were evaluated. Results Overall, the growth parameters were similar for all diet groups, indicating the two raw diets used in this study supported feline growth, within the limitations of an AAFCO growth feeding trial. Kittens fed the raw diets had lower albumin (P = 0.010) and higher globulin (P = 0.04) levels than the kittens fed the cooked diet. These lower albumin levels were not clinically significant, as all groups were still within normal age reference intervals. A red cell microcytosis (P = 0.001) was noted in the combination raw diet group. Increases in fecal Clostridium perfringens were noted in all groups, along with positive fecal Salmonella serovar Heidelberg and Clostridium difficile toxin in the combination raw diet group. Conclusions and relevance The majority of the parameters for feline growth were similar among all groups, indicating the two raw diets studied passed an AAFCO growth trial. In theory, it is possible to pass an AAFCO growth trial but still have nutrient deficiencies in the long term due to liver and fat storage depots. Some of the raw feeders had elevated globulin and microcytosis, likely associated with known enteropathogenic exposure. Disease risks to both pets and owners are obvious. Additional research in this area is needed to investigate the impact of raw diets on the health of domestic cats.

  7. Evaluation of raw and heated velvet beans (Mucuna pruriens) as feed ingredients for broilers.

    PubMed

    Del Carmen, J; Gernat, A G; Myhrman, R; Carew, L B

    1999-06-01

    Velvet bean plants (Mucuna pruriens) are used widely outside the U.S. as a cover crop. The beans (VB), high in protein, contain toxic substances that possibly can be destroyed by heating. Few data are available on the use of VB in poultry nutrition. We examined the effects of raw and dry-roasted VB on broiler performance in two experiments. In Experiment 1, 10, 20, and 30% raw VB were substituted into nutritionally balanced rations fed 0 to 42 d of age. Raw VB caused progressive reductions in growth; at 42 d of age, broilers fed 30% VB weighed 39% of controls. Feed intake declined significantly only with 30% VB. Feed efficiency decreased significantly with 20 and 30% VB. In Experiment 2, 10% raw VB and 10, 20, and 30% heated VB were fed 0 to 42 d. With 10% raw VB, broilers grew significantly slower but feed intake was unchanged. Inclusion of 10% heated VB allowed better growth than raw VB, and by 42 d of age, growth was not significantly different from that of controls. At 20 and 30%, heated VB promoted much better growth and efficiency than raw VB in Experiment 1, but values were significantly lower than those of controls. With 30% heated VB, broilers grew to 66% of control, a marked improvement over raw VB. Carcass yield was unaffected. Trypsin inhibitor activity but not L-3,4-dihydroxyphenylalanine (L-DOPA) in VB was destroyed by heating. We conclude that dry heating of VB partially destroys its growth-inhibiting factor(s), allowing successful use of 10% heated VB in broiler rations. Higher levels of heated VB reduced broiler performance, although results were much better than those of raw VB.

  8. E. coli

    MedlinePlus

    ... concerns about E. coli . E. coli and Raw Cookie Dough FDA Continues to Warn Against Eating Raw Dough ... Reminds consumers about the risks of eating raw cookie dough. Multistate Outbreak of E. coli O157:H7 Infections ...

  9. Scoping Future Policy Dynamics in Raw Materials Through Scenarios Testing

    NASA Astrophysics Data System (ADS)

    Correia, Vitor; Keane, Christopher; Sturm, Flavius; Schimpf, Sven; Bodo, Balazs

    2017-04-01

    The International Raw Materials Observatory (INTRAW) project is working towards a sustainable future for the European Union in access to raw materials, from an availability, economical, and environmental framework. One of the major exercises for the INTRAW project is the evaluation of potential future scenarios for 2050 to frame economic, research, and environmental policy towards a sustainable raw materials supply. The INTRAW consortium developed three possible future scenarios that encompass defined regimes of political, economic, and technological norms. The first scenario, "Unlimited Trade," reflects a world in which free trade continues to dominate the global political and economic environment, with expectations of a growing demand for raw materials from widely distributed global growth. The "National Walls" scenario reflects a world where nationalism and economic protectionism begin to dominate, leading to stagnating economic growth and uneven dynamics in raw materials supply and demand. The final scenario, "Sustainability Alliance," examines the dynamics of a global political and economic climate that is focused on environmental and economic sustainability, leading increasingly towards a circular raw materials economy. These scenarios were reviewed and tested, and simulations of their impacts were run with members of the Consortium and a panel of global experts on international raw materials issues, which led to expected end conditions for 2050. Given the current uncertainty in global politics, these scenarios are informative for identifying likely opportunities and crises. The details of these simulations and expected responses to the research demand, technology investments, and economic components of the raw materials system will be discussed.

  10. 40 CFR 428.75 - Standards of performance for new sources.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... values for 30 consecutive days shall not exceed— Metric units (kg/kkg of raw material) Oil and grease 0.26 0.093 TSS 0.50 0.25 pH (1) (1) English units (lb/1,000 lb of raw material) Oil and grease 0.26 0...— Metric units (kg/kkg of raw material) Lead 0.0017 0.0007 English units (lb/1,000 lb of raw material) Lead...

  11. 40 CFR 428.75 - Standards of performance for new sources.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... values for 30 consecutive days shall not exceed— Metric units (kg/kkg of raw material) Oil and grease 0.26 0.093 TSS 0.50 0.25 pH (1) (1) English units (lb/1,000 lb of raw material) Oil and grease 0.26 0...— Metric units (kg/kkg of raw material) Lead 0.0017 0.0007 English units (lb/1,000 lb of raw material) Lead...

  12. Occurrence of aflatoxin M(1) in raw and market milk commercialized in Greece.

    PubMed

    Roussi, V; Govaris, A; Varagouli, A; Botsoglou, N A

    2002-09-01

    From December 1999 to May 2000, 114 samples of pasteurized, ultrahigh temperature-treated (UHT) and concentrated milk were collected in supermarkets, whereas 52 raw milk samples from cow, sheep and goat were obtained from different milk producers all over Greece. Sample collection was repeated from December 2000 to May 2001 and concerned 54 samples of pasteurized milk, 23 samples of bulk-tank raw milk and 55 raw milk samples from cow, sheep and goat. The total number of samples analysed for aflatoxin M(1) (AFM(1)) contamination by immunoaffinity column extraction and liquid chromatography was 297. In the first sampling, the incidence rates of AFM(1) contamination in pasteurized, UHT, concentrated and cow, sheep and goat raw milk were 85.4, 82.3, 93.3, 73.3, 66.7 and 40%, respectively, with only one cow raw milk and two concentrated milk samples exceeding the EU limit of 50 ng l(-1). In the second sampling, the incidence rates of AFM(1) contamination in pasteurized, bulk-tank and cow, sheep and goat raw milk were 79.6, 78.3, 64.3, 73.3 and 66.7%, respectively, with only one cow and one sheep raw milk samples exceeding the limit of 50 ng l(-1). The results suggest that the current regulatory status in Greece is effective.

  13. Identification of species adulteration in traded medicinal plant raw drugs using DNA barcoding.

    PubMed

    Nithaniyal, Stalin; Vassou, Sophie Lorraine; Poovitha, Sundar; Raju, Balaji; Parani, Madasamy

    2017-02-01

    Plants are the major source of therapeutic ingredients in complementary and alternative medicine (CAM). However, species adulteration in traded medicinal plant raw drugs threatens the reliability and safety of CAM. Since morphological features of medicinal plants are often not intact in the raw drugs, DNA barcoding was employed for species identification. Adulteration in 112 traded raw drugs was tested after creating a reference DNA barcode library consisting of 1452 rbcL and matK barcodes from 521 medicinal plant species. Species resolution of this library was 74.4%, 90.2%, and 93.0% for rbcL, matK, and rbcL + matK, respectively. DNA barcoding revealed adulteration in about 20% of the raw drugs, and at least 6% of them were derived from plants with completely different medicinal or toxic properties. Raw drugs in the form of dried roots, powders, and whole plants were found to be more prone to adulteration than rhizomes, fruits, and seeds. Morphological resemblance, co-occurrence, mislabeling, confusing vernacular names, and unauthorized or fraudulent substitutions might have contributed to species adulteration in the raw drugs. Therefore, this library can be routinely used to authenticate traded raw drugs for the benefit of all stakeholders: traders, consumers, and regulatory agencies.

  14. Study on optimum length of raw material in stainless steel high-lock nuts forging

    NASA Astrophysics Data System (ADS)

    Cheng, Meiwen; Liu, Fenglei; Zhao, Qingyun; Wang, Lidong

    2018-04-01

    Taking 302 stainless steel (1Cr18Ni9) high-lock nuts as the research object, the length of the raw material was adjusted, DEFORM software was used to simulate the isothermal forging process at each station, and corresponding field tests were conducted to study the effect of raw material size on the forming performance of stainless steel high-lock nuts. The tests show that the samples for each raw material length are basically consistent with the DEFORM simulation results. When the length of the raw material is 10 mm, the external dimensions of the parts meet the design requirements.

  15. Within-host evolution of Staphylococcus aureus during asymptomatic carriage.

    PubMed

    Golubchik, Tanya; Batty, Elizabeth M; Miller, Ruth R; Farr, Helen; Young, Bernadette C; Larner-Svensson, Hanna; Fung, Rowena; Godwin, Heather; Knox, Kyle; Votintseva, Antonina; Everitt, Richard G; Street, Teresa; Cule, Madeleine; Ip, Camilla L C; Didelot, Xavier; Peto, Timothy E A; Harding, Rosalind M; Wilson, Daniel J; Crook, Derrick W; Bowden, Rory

    2013-01-01

    Staphylococcus aureus is a major cause of healthcare associated mortality, but like many important bacterial pathogens, it is a common constituent of the normal human body flora. Around a third of healthy adults are carriers. Recent evidence suggests that evolution of S. aureus during nasal carriage may be associated with progression to invasive disease. However, a more detailed understanding of within-host evolution under natural conditions is required to appreciate the evolutionary and mechanistic reasons why commensal bacteria such as S. aureus cause disease. Therefore we examined in detail the evolutionary dynamics of normal, asymptomatic carriage. Sequencing a total of 131 genomes across 13 singly colonized hosts using the Illumina platform, we investigated diversity, selection, population dynamics and transmission during the short-term evolution of S. aureus. We characterized the processes by which the raw material for evolution is generated: micro-mutation (point mutation and small insertions/deletions), macro-mutation (large insertions/deletions) and the loss or acquisition of mobile elements (plasmids and bacteriophages). Through an analysis of synonymous, non-synonymous and intergenic mutations we discovered a fitness landscape dominated by purifying selection, with rare examples of adaptive change in genes encoding surface-anchored proteins and an enterotoxin. We found evidence for dramatic, hundred-fold fluctuations in the size of the within-host population over time, which we related to the cycle of colonization and clearance. Using a newly-developed population genetics approach to detect recent transmission among hosts, we revealed evidence for recent transmission between some of our subjects, including a husband and wife both carrying populations of methicillin-resistant S. aureus (MRSA). 
This investigation begins to paint a picture of the within-host evolution of an important bacterial pathogen during its prevailing natural state, asymptomatic carriage. These results also have wider significance as a benchmark for future systematic studies of evolution during invasive S. aureus disease.

  16. De novo transcriptome sequencing reveals a considerable bias in the incidence of simple sequence repeats towards the downstream of 'Pre-miRNAs' of black pepper.

    PubMed

    Joy, Nisha; Asha, Srinivasan; Mallika, Vijayan; Soniya, Eppurathu Vasudevan

    2013-01-01

    Next generation sequencing has an advantage on transformational development of species with limited available sequence data as it helps to decode the genome and transcriptome. We carried out the de novo sequencing using Illumina HiSeq™ 2000 to generate the first leaf transcriptome of black pepper (Piper nigrum L.), an important spice variety native to South India and also grown in other tropical regions. Despite the economic and biochemical importance of pepper, a scientifically rigorous study at the molecular level is far from complete due to lack of sufficient sequence information and cytological complexity of its genome. The 55 million raw reads obtained, when assembled using Trinity program generated 2,23,386 contigs and 1,28,157 unigenes. Reports suggest that the repeat-rich genomic regions give rise to small non-coding functional RNAs. MicroRNAs (miRNAs) are the most abundant type of non-coding regulatory RNAs. In spite of the widespread research on miRNAs, little is known about the hair-pin precursors of miRNAs bearing Simple Sequence Repeats (SSRs). We used the array of transcripts generated, for the in silico prediction and detection of '43 pre-miRNA candidates bearing different types of SSR motifs'. The analysis identified 3913 different types of SSR motifs with an average of one SSR per 3.04 MB of the transcriptome. About 0.033% of the transcriptome constituted 'pre-miRNA candidates bearing SSRs'. The abundance, type and distribution of SSR motifs studied across the hair-pin miRNA precursors, showed a significant bias in the position of SSRs towards the downstream of predicted 'pre-miRNA candidates'. The catalogue of transcripts identified, together with the demonstration of reliable existence of SSRs in the miRNA precursors, permits future opportunities for understanding the genetic mechanism of black pepper and likely functions of 'tandem repeats' in miRNAs.

  17. Metagenomic investigation of the microbial diversity in a chrysotile asbestos mine pit pond, Lowell, Vermont, USA.

    PubMed

    Driscoll, Heather E; Vincent, James J; English, Erika L; Dolci, Elizabeth D

    2016-12-01

    Here we report on a metagenomics investigation of the microbial diversity in a serpentine-hosted aquatic habitat created by chrysotile asbestos mining activity at the Vermont Asbestos Group (VAG) Mine in northern Vermont, USA. The now-abandoned VAG Mine on Belvidere Mountain in the towns of Eden and Lowell includes three open-pit quarries, a flooded pit, mill buildings, roads, and > 26 million metric tons of eroding mine waste that contribute alkaline mine drainage to the surrounding watershed. Metagenomes and water chemistry originated from aquatic samples taken at three depths (0.5 m, 3.5 m, and 25 m) along the water column at three distinct, offshore sites within the mine's flooded pit (near 44°46'00.7673″, -72°31'36.2699″; UTM NAD 83 Zone 18 T 0695720 E, 4960030 N). Whole metagenome shotgun Illumina paired-end sequences were quality trimmed and analyzed based on a translated nucleotide search of NCBI-NR protein database and lowest common ancestor taxonomic assignments. Our results show strata within the pit pond water column can be distinguished by taxonomic composition and distribution, pH, temperature, conductivity, light intensity, and concentrations of dissolved oxygen. At the phylum level, metagenomes from 0.5 m and 3.5 m contained a similar distribution of taxa and were dominated by Actinobacteria (46% and 53% of reads, respectively), Proteobacteria (45% and 38%, respectively), and Bacteroidetes (7% in both). The metagenomes from 25 m showed a greater diversity of phyla and a different distribution of reads than the two upper strata: Proteobacteria (60%), Actinobacteria (18%), Planctomycetes (10%), Bacteroidetes (5%) and Cyanobacteria (2.5%), Armatimonadetes (< 1%), Verrucomicrobia (< 1%), Firmicutes (< 1%), and Nitrospirae (< 1%). Raw metagenome sequence data from each sample reside in NCBI's Short Read Archive (SRA ID: SRP056095) and are accessible through NCBI BioProject PRJNA277916.

  18. Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba).

    PubMed

    Rahi, Md Lifat; Amin, Shorash; Mather, Peter B; Hurwood, David A

    2017-01-01

    The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 75 bp paired end 227,564,643 high quality Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200-18,050 bp) with an N50 value of 1597. In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation.
The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly.

  19. Candidate genes that have facilitated freshwater adaptation by palaemonid prawns in the genus Macrobrachium: identification and expression validation in a model species (M. koombooloomba)

    PubMed Central

    Amin, Shorash; Mather, Peter B.; Hurwood, David A.

    2017-01-01

    Background The endemic Australian freshwater prawn, Macrobrachium koombooloomba, provides a model for exploring genes involved with freshwater adaptation because it is one of the relatively few Macrobrachium species that can complete its entire life cycle in freshwater. Methods The present study was conducted to identify potential candidate genes that are likely to contribute to effective freshwater adaptation by M. koombooloomba using a transcriptomics approach. De novo assembly of 75 bp paired end 227,564,643 high quality Illumina raw reads from 6 different cDNA libraries revealed 125,917 contigs of variable lengths (200–18,050 bp) with an N50 value of 1597. Results In total, 31,272 (24.83%) of the assembled contigs received significant blast hits, of which 27,686 and 22,560 contigs were mapped and functionally annotated, respectively. CEGMA (Core Eukaryotic Genes Mapping Approach) based transcriptome quality assessment revealed 96.37% completeness. We identified 43 different potential genes that are likely to be involved with freshwater adaptation in M. koombooloomba. Identified candidate genes included: 25 genes for osmoregulation, five for cell volume regulation, seven for stress tolerance, three for body fluid (haemolymph) maintenance, eight for epithelial permeability and water channel regulation, nine for egg size control and three for larval development. RSEM (RNA-Seq Expectation Maximization) based abundance estimation revealed that 6,253, 5,753 and 3,795 transcripts were expressed (at TPM value ≥10) in post larvae, juveniles and adults, respectively. Differential gene expression (DGE) analysis showed that 15 genes were expressed differentially in different individuals but these genes apparently were not involved with freshwater adaptation but rather were involved in growth, development and reproductive maturation. 
Discussion The genomic resources developed here will be useful for better understanding the molecular basis of freshwater adaptation in Macrobrachium prawns and other crustaceans more broadly. PMID:28194319

  20. Bacterial lineages putatively associated with the dissemination of antibiotic resistance genes in a full-scale urban wastewater treatment plant.

    PubMed

    Narciso-da-Rocha, Carlos; Rocha, Jaqueline; Vaz-Moreira, Ivone; Lira, Felipe; Tamames, Javier; Henriques, Isabel; Martinez, José Luis; Manaia, Célia M

    2018-06-05

    Urban wastewater treatment plants (UWTPs) are reservoirs of antibiotic resistance. Wastewater treatment changes the bacterial community and inevitably impacts the fate of antibiotic resistant bacteria and antibiotic resistance genes (ARGs). Some bacterial groups are major carriers of ARGs and hence, their elimination during wastewater treatment may contribute to increasing resistance removal efficiency. This study, conducted at a full-scale UWTP, evaluated variations in the bacterial community and ARGs loads and explored possible associations among them. With that aim, the bacterial community composition (16S rRNA gene Illumina sequencing) and ARGs abundance (real-time PCR) were characterized in samples of raw wastewater (RWW), secondary effluent (sTWW), after UV disinfection (tTWW), and after a period of 3 days storage to monitor possible bacterial regrowth (tTWW-RE). Culturable enterobacteria were also enumerated. Secondary treatment was associated with the most dramatic bacterial community variations and coincided with reductions of ~2 log-units in the ARGs abundance. In contrast, no significant changes in the bacterial community composition and ARGs abundance were observed after UV disinfection of sTWW. Nevertheless, after UV treatment, viability losses were indicated by ~2 log-unit reductions of culturable enterobacteria. The analysed ARGs (qnrS, blaCTX-M, blaOXA-A, blaTEM, blaSHV, sul1, sul2, and intI1) were strongly correlated with taxa more abundant in RWW than in the other types of water, and which are associated with humans and animals, such as members of the families Campylobacteraceae, Comamonadaceae, Aeromonadaceae, Moraxellaceae, and Bacteroidaceae. Further knowledge of the dynamics of the bacterial community during wastewater treatment and its relationship with ARGs variations may contribute information useful for wastewater treatment optimization, aiming at a more effective resistance control. Copyright © 2018 Elsevier Ltd. All rights reserved.
