Science.gov

Sample records for enriched genomic libraries

  1. Gene enrichment in plant genomic shotgun libraries.

    PubMed

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced, whereas a draft sequence of the rice genome (about 430 Mbp) is now available and the sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  2. Microsatellite markers isolated from the wild medicinal plant Centella asiatica (Apiaceae) from an enriched genomic library.

    PubMed

    Rakotondralambo, Soaharin'ny Ony Raoseta; Lussert, Alexandra; Rivallan, Ronan; Danthu, Pascal; Noyer, Jean-Louis; Baurens, Franc-Christophe

    2012-04-01

    Microsatellite markers for Centella asiatica, an important medicinal herb, were developed and characterized to promote genetic and molecular studies. A GA/GT-enriched genomic library was constructed from an accession from Madagascar. Roughly 75% of the 768 clones of the enriched library contained microsatellites. Eighty sequences containing microsatellites were obtained from 96 positive clones. Specific primers were designed for 20 loci, and 17 of them displayed polymorphism when screened across 17 C. asiatica accessions, with an average of 4.3 alleles per locus. The observed and expected heterozygosity values averaged 0.114 and 0.379, respectively. This is the first report of the construction of an enriched genomic library and the identification of microsatellite markers from C. asiatica. These 17 polymorphic microsatellite markers are a useful resource for this plant, applicable for diversity studies, pedigree analyses, and genetic mapping.
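
    The observed and expected heterozygosity values quoted above can be illustrated with a minimal Python sketch. It assumes diploid genotypes scored as allele pairs and uses the simple (uncorrected) estimator; the abstract does not say which estimator the authors used, and the genotypes below are hypothetical.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per accession.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected under Hardy-Weinberg: 1 - sum of squared allele frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Hypothetical genotypes (fragment sizes in bp), not data from the study.
example = [(150, 150), (150, 152), (152, 152), (150, 150)]
ho, he = heterozygosity(example)
```

    An observed value well below the expected one, as reported for C. asiatica, indicates fewer heterozygotes than random mating would produce.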

  3. Developing new microsatellite markers in walnut (Juglans regia L.) from Juglans nigra genomic GA enriched library

    Treesearch

    Hayat Topcu; Nergiz Coban; Keith Woeste; Mehmet Sutyemez; Salih Kafkas

    2015-01-01

    We attempted to develop new polymorphic SSR primer pairs in walnut using sequences derived from Juglans nigra L. genomic enriched library with GA repeat. The designed 94 SSR primer pairs were subjected to gradient PCR in 12 walnut cultivars to determine their optimum annealing temperatures and to determine whether they produce bands. Then, the...

  4. Constructing gene-enriched plant genomic libraries using methylation filtration technology.

    PubMed

    Rabinowicz, Pablo D

    2003-01-01

    Full genome sequencing in higher plants is a very difficult task, because their genomes are often very large and repetitive. For this reason, gene targeted partial genomic sequencing becomes a realistic option. The method reported here is a simple approach to generate gene-enriched plant genomic libraries called methylation filtration. This technique takes advantage of the fact that repetitive DNA is heavily methylated and genes are hypomethylated. Then, by simply using an Escherichia coli host strain harboring a wild-type modified cytosine restriction (McrBC) system, which cuts DNA containing methylcytosine, repetitive DNA is eliminated from these genomic libraries, while low copy DNA (i.e., genes) is recovered. To prevent cloning significant proportions of organelle DNA, a crude nuclear preparation must be performed prior to purifying genomic DNA. Adaptor-mediated cloning and DNA size fractionation are necessary for optimal results.

  5. A modified enrichment method to construct microsatellite library from plateau pika genome (Ochotona curzoniae).

    PubMed

    Geng, Jianing; Li, Kexin; Zhang, Yanming; Hu, Songnian

    2010-03-01

    A microsatellite-enriched library of plateau pika (Ochotona curzoniae) was constructed by exploiting the strong affinity between biotin and streptavidin. First, genomic DNA was fragmented by ultrasonication, which is a major improvement over traditional methods. Linker-ligated DNA fragments were hybridized with biotinylated microsatellite probes and then captured on streptavidin-coated magnetic beads. PCR amplification was performed to obtain double-stranded DNA fragments containing microsatellites. Ligation and transformation were carried out by using the pGEM-T Vector System I and Escherichia coli DH10B competent cells. Sequencing results showed that 80.2% of clones contained a microsatellite repeat motif. Several modifications make this protocol more time-efficient and technically easier than the traditional ones; in particular, the composition and relative abundance of microsatellite repeats in the plateau pika genome were faithfully represented through the optimized PCR conditions. This method has also been successfully applied to construct microsatellite-enriched genomic libraries of Chinese hamster (Cricetulus griseus) and small abalone [Haliotis diversicolor (Reeve)] with high rates of positive clones, demonstrating its feasibility and stability.

  6. Development of SSR markers derived from SSR-enriched genomic library of eggplant (Solanum melongena L.).

    PubMed

    Nunome, Tsukasa; Negoro, Satomi; Kono, Izumi; Kanamori, Hiroyuki; Miyatake, Koji; Yamaguchi, Hirotaka; Ohyama, Akio; Fukuoka, Hiroyuki

    2009-10-01

    Eggplant (Solanum melongena L.), also known as aubergine or brinjal, is an important vegetable in many countries. Few useful molecular markers have been reported for eggplant. We constructed simple sequence repeat (SSR)-enriched genomic libraries in order to develop SSR markers, and sequenced more than 14,000 clones. From these sequences, we designed 2,265 primer pairs to flank SSR motifs. We identified 1,054 SSR markers from amplification of 1,399 randomly selected primer pairs. The markers have an average polymorphic information content of 0.27 among eight lines of S. melongena. Of the 1,054 SSR markers, 214 segregated in an intraspecific mapping population. We constructed cDNA libraries from several eggplant tissues and obtained 6,144 expressed sequence tag (EST) sequences. From these sequences, we designed 209 primer pairs, 7 of which segregated in the mapping population. On the basis of the segregation data, we constructed a linkage map, and mapped the 236 segregating markers to 14 linkage groups. The linkage map spans a total length of 959.1 cM, with an average marker distance of 4.3 cM. The markers should be a useful resource for qualitative and quantitative trait mapping and for marker-assisted selection in eggplant breeding.
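
    The reported average marker distance can be checked with a few lines of Python. The abstract does not state how the average was computed; the sketch below assumes it is total map length divided by the number of marker intervals, which reproduces the reported 4.3 cM.

```python
total_length_cm = 959.1   # from the abstract
n_markers = 236           # from the abstract
n_groups = 14             # from the abstract

# A linkage group with k markers has k - 1 intervals, so the whole map has
# n_markers - n_groups intervals (assumption: every marker has its own position).
n_intervals = n_markers - n_groups
avg_spacing = total_length_cm / n_intervals

print(f"{n_intervals} intervals, {avg_spacing:.2f} cM on average")
```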

  7. Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries

    PubMed Central

    Carpenter, Meredith L.; Buenrostro, Jason D.; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E.; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M. Thomas P.; Willerslev, Eske; Greenleaf, William J.; Bustamante, Carlos D.

    2013-01-01

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062–147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217–73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples. PMID:24568772
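
    Fold enrichment here is the ratio of the on-target read fraction after capture to the fraction before capture. A minimal Python sketch follows; the function name is illustrative, and the 30% post-capture figure is a hypothetical value, not one reported for any specific library.

```python
def fold_enrichment(pre_fraction, post_fraction):
    """Ratio of the on-target read fraction after capture to that before capture."""
    if pre_fraction <= 0:
        raise ValueError("pre-capture fraction must be positive")
    return post_fraction / pre_fraction

# 1.2% is the average pre-capture fraction from the abstract;
# 30% is a hypothetical post-capture fraction chosen for illustration.
print(fold_enrichment(0.012, 0.30))
```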

  8. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries.

    PubMed

    Carpenter, Meredith L; Buenrostro, Jason D; Valdiosera, Cristina; Schroeder, Hannes; Allentoft, Morten E; Sikora, Martin; Rasmussen, Morten; Gravel, Simon; Guillén, Sonia; Nekhrizov, Georgi; Leshtakov, Krasimir; Dimitrova, Diana; Theodossiev, Nikola; Pettener, Davide; Luiselli, Donata; Sandoval, Karla; Moreno-Estrada, Andrés; Li, Yingrui; Wang, Jun; Gilbert, M Thomas P; Willerslev, Eske; Greenleaf, William J; Bustamante, Carlos D

    2013-11-07

    Most ancient specimens contain very low levels of endogenous DNA, precluding the shotgun sequencing of many interesting samples because of cost. Ancient DNA (aDNA) libraries often contain <1% endogenous DNA, with the majority of sequencing capacity taken up by environmental DNA. Here we present a capture-based method for enriching the endogenous component of aDNA sequencing libraries. By using biotinylated RNA baits transcribed from genomic DNA libraries, we are able to capture DNA fragments from across the human genome. We demonstrate this method on libraries created from four Iron Age and Bronze Age human teeth from Bulgaria, as well as bone samples from seven Peruvian mummies and a Bronze Age hair sample from Denmark. Prior to capture, shotgun sequencing of these libraries yielded an average of 1.2% of reads mapping to the human genome (including duplicates). After capture, this fraction increased substantially, with up to 59% of reads mapped to human and enrichment ranging from 6- to 159-fold. Furthermore, we maintained coverage of the majority of regions sequenced in the precapture library. Intersection with the 1000 Genomes Project reference panel yielded an average of 50,723 SNPs (range 3,062-147,243) for the postcapture libraries sequenced with 1 million reads, compared with 13,280 SNPs (range 217-73,266) for the precapture libraries, increasing resolution in population genetic analyses. Our whole-genome capture approach makes it less costly to sequence aDNA from specimens containing very low levels of endogenous DNA, enabling the analysis of larger numbers of samples.

  9. Methylation-sensitive linking libraries enhance gene-enriched sequencing of complex genomes and map DNA methylation domains

    PubMed Central

    Nelson, William; Luo, Meizhong; Ma, Jianxin; Estep, Matt; Estill, James; He, Ruifeng; Talag, Jayson; Sisneros, Nicholas; Kudrna, David; Kim, HyeRan; Ammiraju, Jetty SS; Collura, Kristi; Bharti, Arvind K; Messing, Joachim; Wing, Rod A; SanMiguel, Phillip; Bennetzen, Jeffrey L; Soderlund, Carol

    2008-01-01

    Background: Many plant genomes are resistant to whole-genome assembly due to an abundance of repetitive sequence, leading to the development of gene-rich sequencing techniques. Two such techniques are hypomethylated partial restriction (HMPR) and methylation spanning linker libraries (MSLL). These libraries differ from other gene-rich datasets in having larger insert sizes, and the MSLL clones are designed to provide reads localized to "epigenetic boundaries" where methylation begins or ends. Results: A large-scale study in maize generated 40,299 HMPR sequences and 80,723 MSLL sequences, including MSLL clones exceeding 100 kb. The paired end reads of MSLL and HMPR clones were shown to be effective in linking existing gene-rich sequences into scaffolds. In addition, it was shown that the MSLL clones can be used for anchoring these scaffolds to a BAC-based physical map. The MSLL end reads effectively identified epigenetic boundaries, as indicated by their preferential alignment to regions upstream and downstream from annotated genes. The ability to precisely map long stretches of fully methylated DNA sequence is a unique outcome of MSLL analysis, and was also shown to provide evidence for errors in gene identification. MSLL clones were observed to be significantly more repeat-rich in their interiors than in their end reads, confirming the correlation between methylation and retroelement content. Both MSLL and HMPR reads were found to be substantially gene-enriched, with the SalI MSLL libraries being the most highly enriched (31% align to an EST contig), while the HMPR clones exhibited exceptional depletion of repetitive DNA (to ~11%). These two techniques were compared with other gene-enrichment methods, and shown to be complementary. Conclusion: MSLL technology provides an unparalleled approach for mapping the epigenetic status of repetitive blocks and for identifying sequences mis-identified as genes.
Although the types and natures of epigenetic boundaries are barely

  10. Work Enrichment for Academic Libraries.

    ERIC Educational Resources Information Center

    Martell, Charles; Untawale, Mercedes

    1983-01-01

    Explores important quality of work life strategy--job redesign--and discusses job enlargement and job enrichment. A case study of academic library personnel demonstrates how introduction of automated systems at University of California, Berkeley led to restructuring and enrichment of jobs. References and list of selected resources are appended.…

  11. Construction of a micro-library enriched with genomic replication origins of carrot somatic embryos by laser microdissection.

    PubMed

    Murata, Natsuko; Masuda, Kiyoshi; Nishiyama, Ryutaro; Nomura, Koji

    2005-06-01

    In this paper, we describe an effective method for constructing a micro-library enriched with chromosomal DNA replication origins. Carrot (Daucus carota L.) somatic embryos at early globular stage were incubated for 15 min in the presence of bromodeoxyuridine (BrdU) to pulse label newly synthesized DNA strands. Nuclei were isolated from the cells, and the DNA was extracted on microscopic slides. DNA fibers spread on slides were visualized using anti-BrdU and FITC-conjugated secondary antibodies. DNA regions where BrdU was incorporated were clearly visualized under a fluorescent microscope as dots on DNA fibers. Regions of DNA fiber containing many fluorescent dots should contain replicons. DNA fibers showing many fluorescent dots, or replicons, were easily cut and collected using a laser microdissection system equipped with a pulse laser beam. DNA fragments containing many replicons could be collected at a rate of 20-30 DNA fragments per hour. Using degenerate oligonucleotide primed PCR, fragments were randomly amplified from the microdissected fragments, and subcloned to construct a micro-library. This is the first report of the application of a laser microdissection technique for constructing a micro-library enriched with replication origins of chromosomal DNA, although there have been some reports on laser microdissection of chromosomes. The simple procedure established here should open up a new application of laser optics.

  12. GLANET: genomic loci annotation and enrichment tool.

    PubMed

    Otlu, Burçak; Firtina, Can; Keles, Sündüz; Tastan, Oznur

    2017-09-15

    Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations. We present GLANET as a comprehensive annotation and enrichment analysis tool which implements a sampling-based enrichment test that accounts for GC content and/or mappability biases, jointly or separately. GLANET annotates and performs enrichment analysis on these loci with a rich library. We introduce and perform novel data-driven computational experiments for assessing the power and Type-I error of its enrichment procedure which show that GLANET has attained high statistical power and well-controlled Type-I error rate. As a key feature, users can easily extend its library with new gene sets and genomic intervals. Other key features include assessment of impact of single nucleotide variants (SNPs) on TF binding sites and regulation based pathway enrichment analysis. GLANET can be run using its GUI or on command line. GLANET's source code is available at https://github.com/burcakotlu/GLANET . Tutorials are provided at https://glanet.readthedocs.org . Contact: burcak@ceng.metu.edu.tr or oznur.tastan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online.
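
    The general idea of a sampling-based enrichment test can be sketched in Python as follows. This is not GLANET's implementation or API: the function names are hypothetical, the toy data are invented, and the sketch places random intervals uniformly on a single chromosome without the GC-content or mappability matching the abstract describes.

```python
import random

def count_overlaps(intervals, annotations):
    """Number of intervals overlapping at least one annotation (half-open coords)."""
    return sum(
        any(s < a_end and a_start < e for a_start, a_end in annotations)
        for s, e in intervals
    )

def sampling_enrichment(query, annotations, chrom_len, n_samples=1000, seed=0):
    """Empirical p-value that random interval sets overlap as often as the query."""
    rng = random.Random(seed)
    observed = count_overlaps(query, annotations)
    hits = 0
    for _ in range(n_samples):
        # Place each query interval uniformly at random, preserving its length.
        sample = []
        for s, e in query:
            length = e - s
            start = rng.randrange(0, chrom_len - length + 1)
            sample.append((start, start + length))
        if count_overlaps(sample, annotations) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)

# Hypothetical toy data on a 100 kb chromosome.
annotations = [(10_000, 12_000), (50_000, 51_000)]
query = [(10_500, 10_600), (11_000, 11_100), (50_200, 50_300), (80_000, 80_100)]
observed, p_value = sampling_enrichment(query, annotations, chrom_len=100_000)
```

    A bias-aware version would restrict each random placement to positions whose GC content or mappability resembles that of the original interval.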

  13. Enriching screening libraries with bioactive fragment space.

    PubMed

    Zhang, Na; Zhao, Hongtao

    2016-08-01

    By deconvoluting 238,073 bioactive molecules in the ChEMBL library into extended Murcko ring systems, we identified a set of 2245 ring systems present in at least 10 molecules. These ring systems belong to 2221 clusters by ECFP4 fingerprints with a minimum intracluster similarity of 0.8. Their overlap with ring systems in commercial libraries was further quantified. Our findings suggest that success of a small fragment library is driven by the convergence of effective coverage of bioactive ring systems (e.g., 10% coverage by 1000 fragments vs. 40% by 2 million HTS compounds), high enrichment of bioactive ring systems, and low molecular complexity enhancing the probability of a match with the protein targets. In agreement with previous studies, bioactive ring systems are underrepresented in screening libraries. As such, we propose a library of virtual fragments with key functionalities via fragmentation of bioactive molecules. Its utility is exemplified by a prospective application on protein kinase CK2, resulting in the discovery of a series of novel inhibitors with the most potent compound having an IC50 of 0.5 μM and a ligand efficiency of 0.41 kcal/mol per heavy atom.

  14. Development of novel simple sequence repeat markers in bitter gourd (Momordica charantia L.) through enriched genomic libraries and their utilization in analysis of genetic diversity and cross-species transferability.

    PubMed

    Saxena, Swati; Singh, Archana; Archak, Sunil; Behera, Tushar K; John, Joseph K; Meshram, Sudhir U; Gaikwad, Ambika B

    2015-01-01

    Microsatellite or simple sequence repeat (SSR) markers are the preferred markers for genetic analyses of crop plants. The availability of a limited number of such markers in bitter gourd (Momordica charantia L.) necessitates the development and characterization of more SSR markers. These were developed from genomic libraries enriched for three dinucleotide, five trinucleotide, and two tetranucleotide core repeat motifs. Employing the strategy of polymerase chain reaction-based screening, the number of clones to be sequenced was reduced by 81%, and 93.7% of the sequenced clones contained microsatellite repeats. Unique primer-pairs were designed for 160 microsatellite loci, and amplicons of expected length were obtained for 151 loci (94.4%). Evaluation of diversity in 54 bitter gourd accessions at 51 loci indicated that 20% of the loci were polymorphic with the polymorphic information content values ranging from 0.13 to 0.77. Fifteen Indian varieties were clearly distinguished, indicative of the usefulness of the developed markers. Markers at 40 loci (78.4%) were transferable to six species, viz. Momordica cymbalaria, Momordica subangulata subsp. renigera, Momordica balsamina, Momordica dioica, Momordica cochinchinensis, and Momordica sahyadrica. The microsatellite markers reported will be useful in various genetic and molecular genetic studies in bitter gourd, a cucurbit of immense nutritive, medicinal, and economic importance.
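
    Polymorphic information content (PIC) values such as those above are computed from allele frequencies at each locus. The Python sketch below uses the widely cited formula of Botstein et al. (1980); the abstract does not state which formula the authors applied, so this is an assumption.

```python
def pic(freqs):
    """Polymorphic information content for one locus from its allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # First term: sum of squared allele frequencies (expected homozygosity).
    sum_sq = sum(p * p for p in freqs)
    # Second term: correction for matings between identical heterozygotes.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross

# Two equally frequent alleles give the maximum PIC for a biallelic locus.
print(pic([0.5, 0.5]))
```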

  15. Whole-Genome Sequencing: Manual Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-01-03

    This protocol describes a manual approach for the preparation of genomic DNA libraries suitable for Illumina sequencing. Genomic DNA fragments produced by shearing by sonication are ligated to adaptors and amplified by polymerase chain reaction (PCR). The amplified DNA, separated by size and gel-purified, is suitable for use as template in whole-genome sequencing.

  16. Optimized Construction of SSR-enriched Libraries

    USDA-ARS's Scientific Manuscript database

    We have developed a more efficient method to construct simple sequence repeat (SSR) libraries. We have designed and tested several new adapters. We changed the typical blunt-end ligation of adapters for the more effective sticky-end ligation. We optimized temperature, enzymes and length of each st...

  17. Enriching Peptide Libraries for Binding Affinity and Specificity Through Computationally Directed Library Design.

    PubMed

    Foight, Glenna Wink; Chen, T Scott; Richman, Daniel; Keating, Amy E

    2017-01-01

    Peptide reagents with high affinity or specificity for their target protein interaction partner are of utility for many important applications. Optimization of peptide binding by screening large libraries is a proven and powerful approach. Libraries designed to be enriched in peptide sequences that are predicted to have desired affinity or specificity characteristics are more likely to yield success than random mutagenesis. We present a library optimization method in which the choice of amino acids to encode at each peptide position can be guided by available experimental data or structure-based predictions. We discuss how to use analysis of predicted library performance to inform rounds of library design. Finally, we include protocols for more complex library design procedures that consider the chemical diversity of the amino acids at each peptide position and optimize a library score based on a user-specified input model.

  18. Hybridization Capture Using Short PCR Products Enriches Small Genomes by Capturing Flanking Sequences (CapFlank)

    PubMed Central

    Tsangaras, Kyriakos; Wales, Nathan; Sicheritz-Pontén, Thomas; Rasmussen, Simon; Michaux, Johan; Ishida, Yasuko; Morand, Serge; Kampmann, Marie-Louise; Gilbert, M. Thomas P.; Greenwood, Alex D.

    2014-01-01

    Solution hybridization capture methods utilize biotinylated oligonucleotides as baits to enrich homologous sequences from next generation sequencing (NGS) libraries. Coupled with NGS, the method generates kilo to gigabases of high confidence consensus targeted sequence. However, in many experiments, a non-negligible fraction of the resulting sequence reads are not homologous to the bait. We demonstrate that during capture, the bait-hybridized library molecules add additional flanking library sequences iteratively, such that baits limited to targeting relatively short regions (e.g. few hundred nucleotides) can result in enrichment across entire mitochondrial and bacterial genomes. Our findings suggest that some of the off-target sequences derived in capture experiments are non-randomly enriched, and that CapFlank will facilitate targeted enrichment of large contiguous sequences with minimal prior target sequence information. PMID:25275614

  19. Construction of libraries enriched for sequence repeats and jumping clones, and hybridization selection for region-specific markers

    SciTech Connect

    Kandpal, R.P.; Kandpal, G.; Weissman, S.M.

    1994-01-04

    The authors describe a simple and rapid method for constructing small-insert genomic libraries highly enriched for dimeric, trimeric, and tetrameric nucleotide repeat motifs. The approach involves use of DNA inserts recovered by PCR amplification of a small-insert sonicated genomic phage library or by a single-primer PCR amplification of Mbo I-digested and adaptor-ligated genomic DNA. The genomic DNA inserts are heat denatured and hybridized to a biotinylated oligonucleotide. The biotinylated hybrids are retained on a Vectrex-avidin matrix and eluted specifically. The eluate is PCR amplified and cloned. More than 90% of the clones in a library enriched for (CA)n microsatellites with this approach contained inserts with CA repeats. They have also used this protocol for enrichment of (CAG)n and (AGAT)n sequence repeats and for Not I jumping clones. They have used the enriched libraries with an adaptation of the cDNA selection method to enrich for repeat motifs encoded in yeast artificial chromosomes.
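
    Once clones from an enriched library are sequenced, the fraction containing repeats is typically determined by scanning the insert sequences. The Python sketch below shows one simple way to do this with a regular expression; it concerns the downstream sequence screening only, not the wet-lab enrichment, and the function name and the minimum of five repeat units are illustrative assumptions.

```python
import re

def find_ssrs(seq, min_repeats=5):
    """Yield (start, motif, repeat_count) for perfect di- to tetranucleotide repeats."""
    # Lazy {2,4}? tries the shortest motif first, so CACACA... is reported as CA.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        # Skip homopolymer runs such as AAAA matched as the motif "AA".
        if len(set(motif)) == 1:
            continue
        yield m.start(), motif, len(m.group(0)) // len(motif)

# Hypothetical insert sequences, not data from the study.
ca_clone = "TTG" + "CA" * 8 + "GGT"
agat_clone = "cc" + "agat" * 6 + "tt"
ca_hits = list(find_ssrs(ca_clone))
agat_hits = list(find_ssrs(agat_clone))
```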

  20. Selective enrichment of damaged DNA molecules for ancient genome sequencing

    PubMed Central

    2014-01-01

    Contamination by present-day human and microbial DNA is one of the major hindrances for large-scale genomic studies using ancient biological material. We describe a new molecular method, U selection, which exploits one of the most distinctive features of ancient DNA—the presence of deoxyuracils—for selective enrichment of endogenous DNA against a complex background of contamination during DNA library preparation. By applying the method to Neanderthal DNA extracts that are heavily contaminated with present-day human DNA, we show that the fraction of useful sequence information increases ∼10-fold and that the resulting sequences are more efficiently depleted of human contamination than when using purely computational approaches. Furthermore, we show that U selection can lead to a four- to fivefold increase in the proportion of endogenous DNA sequences relative to those of microbial contaminants in some samples. U selection may thus help to lower the costs for ancient genome sequencing of nonhuman samples also. PMID:25081630

  1. Construction of Trypanosoma brucei Illumina RNA-Seq libraries enriched for transcript ends.

    PubMed

    Kolev, Nikolay G; Ullu, Elisabetta; Tschudi, Christian

    2015-01-01

    High-throughput RNA sequencing (RNA-Seq) has quickly occupied center stage in the repertoire of available tools for transcriptomics. Among many advantages, the single-nucleotide resolution of this powerful approach allows mapping on a genome-wide scale of splice junctions and polyadenylation sites, and thus, the precise definition of mature transcript boundaries. This greatly facilitated the transcriptome annotation of the human pathogen Trypanosoma brucei, a protozoan organism in which all mRNA molecules are matured by spliced leader (SL) trans-splicing from longer polycistronic precursors. The protocols described here for the generation of three types of libraries for Illumina RNA-Seq, 5'-SL enriched, 5'-triphosphate-end enriched, and 3'-poly(A) enriched, enabled the discovery of an unprecedented heterogeneity of pre-mRNA-processing sites and of a large number of novel coding and noncoding transcripts from previously unannotated genes, as well as the quantification of the cellular abundance of RNA molecules. The method for producing 5'-triphosphate-end-enriched libraries was instrumental for obtaining evidence that transcription initiation by RNA polymerase II in trypanosomes is bidirectional and biosynthesis of mRNA precursors is primed not only at the beginning of unidirectional gene clusters, but also at specific internal sites.

  2. Ancient whole genome enrichment using baits built from modern DNA.

    PubMed

    Enk, Jacob M; Devault, Alison M; Kuch, Melanie; Murgha, Yusuf E; Rouillard, Jean-Marie; Poinar, Hendrik N

    2014-05-01

    We report metrics from complete genome capture of nuclear DNA from extinct mammoths using biotinylated RNAs transcribed from an Asian elephant DNA extract. Enrichment of the nuclear genome ranged from 1.06- to 18.65-fold, to an apparent maximum threshold of ∼80% on-target. This projects an order of magnitude less costly complete genome sequencing from long-dead organisms, even when a reference genome is unavailable for bait design.

  3. Large insert environmental genomic library production.

    PubMed

    Taupp, Marcus; Lee, Sangwon; Hawley, Alyse; Yang, Jinshu; Hallam, Steven J

    2009-09-23

    The vast majority of microbes in nature currently remain inaccessible to traditional cultivation methods. Over the past decade, culture-independent environmental genomic (i.e. metagenomic) approaches have emerged, enabling researchers to bridge this cultivation gap by capturing the genetic content of indigenous microbial communities directly from the environment. To this end, genomic DNA libraries are constructed using standard albeit artful laboratory cloning techniques. Here we describe the construction of a large insert environmental genomic fosmid library with DNA derived from the vertical depth continuum of a seasonally hypoxic fjord. This protocol is directly linked to a series of connected protocols including coastal marine water sampling [1], large volume filtration of microbial biomass [2] and a DNA extraction and purification protocol [3]. At the outset, high quality genomic DNA is end-repaired with the creation of 5′-phosphorylated blunt ends. End-repaired DNA is subjected to pulsed-field gel electrophoresis (PFGE) for size selection and gel extraction is performed to recover DNA fragments between 30 and 60 thousand base pairs (Kb) in length. Size selected DNA is purified away from the PFGE gel matrix and ligated to the phosphatase-treated blunt-end fosmid CopyControl vector pCC1 (EPICENTRE http://www.epibio.com/item.asp?ID=385). Linear concatemers of pCC1 and insert DNA are subsequently headful packaged into phage particles by lambda terminase, with subsequent infection of phage-resistant E. coli cells. Successfully transduced clones are recovered on LB agar plates under antibiotic selection and archived in 384-well plate format using an automated colony picking robot (Qpix2, GENETIX). The current protocol draws from various sources including the CopyControl Fosmid Library Production Kit from EPICENTRE and the published works of multiple research groups [4-7]. Each step is presented with best practice in mind.
Whenever possible we highlight subtleties

  4. Ten polymorphic microsatellite loci identified from a small insert genomic library for Peronospora tabacina

    USDA-ARS's Scientific Manuscript database

    Ten polymorphic microsatellite loci for the oomycete obligate, biotrophic pathogen Peronospora tabacina of tobacco (Nicotiana tabacum) were identified from a small insert genomic library enriched for GT motifs. Eighty-five percent of the loci were composed of dinucleotide repeats, whereas only 4% ...

  5. A phloem-enriched cDNA library from Ricinus: insights into phloem function.

    PubMed

    Doering-Saad, C; Newbury, H J; Couldridge, C E; Bale, J S; Pritchard, J

    2006-01-01

    The aim of this study was to identify genes that are expressed in the phloem. Increased knowledge of phloem regulation will contribute to our understanding of its many roles, from transport of solutes to information about interactions with pathogens. A cDNA library constructed from phloem-enriched sap exuding from cut Ricinus communis (L.) hypocotyls was sequenced. To assess contamination from other tissues, two libraries were constructed: one using the first 15 min of exudation and the other from sap collected after 120 min of exudation had elapsed. Of 1012 clones sequenced, 158 unique transcripts were identified. The presence of marker molecules such as profilin, the low occurrence of chloroplast-related mRNAs, and the sieve element localization of constituent mRNA using in situ hybridization were consistent with a phloem origin of the sap. Functional analysis of the cDNAs revealed classifications including ribosomal function, interaction with the environment, transport, DNA/RNA binding, and protein turnover. An analysis of the closest Arabidopsis thaliana (L.) homologue for each clone indicated that genes involved in cell localization, protein synthesis, tissue localization, organ localization, organ differentiation, and cell fate were represented at twice the level occurring in the whole Arabidopsis genome. The transcripts found in this phloem-enriched library are discussed in the context of phloem function and the relationship between the companion cell and sieve element.

  6. Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

    PubMed Central

    Weitemier, Kevin; Straub, Shannon C. K.; Cronn, Richard C.; Fishbein, Mark; Schmickl, Roswitha; McDonnell, Angela; Liston, Aaron

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp) followed by Illumina sequencing of enriched libraries. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) resulted in at least partial assembly of 92.6% of exons and 99.7% of genes and an average assembly length >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were assembled using off-target reads. Phylogenomic analyses demonstrated signal conflict between genomes. • Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and flanking regions, as well as genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics. PMID:25225629

  7. Tissue enrichment analysis for C. elegans genomics.

    PubMed

    Angeles-Albores, David; N Lee, Raymond Y; Chan, Juancarlos; Sternberg, Paul W

    2016-09-13

    Over the last ten years, there has been explosive development in methods for measuring gene expression. These methods can identify thousands of genes altered between conditions, but understanding these datasets and forming hypotheses based on them remains challenging. One way to analyze these datasets is to associate ontologies (hierarchical, descriptive vocabularies with controlled relations between terms) with genes and to look for enrichment of specific terms. Although Gene Ontology (GO) is available for Caenorhabditis elegans, it does not include anatomical information. We have developed a tool for identifying enrichment of C. elegans tissues among gene sets and generated a website GUI where users can access this tool. Since a common drawback to ontology enrichment analyses is its verbosity, we developed a very simple filtering algorithm to reduce the ontology size by an order of magnitude. We adjusted these filters and validated our tool using a set of 30 gold standards from Expression Cluster data in WormBase. We show our tool can even discriminate between embryonic and larval tissues and can even identify tissues down to the single-cell level. We used our tool to identify multiple neuronal tissues that are down-regulated due to pathogen infection in C. elegans. Our Tissue Enrichment Analysis (TEA) can be found within WormBase, and can be downloaded using Python's standard pip installer. It tests a slimmed-down C. elegans tissue ontology for enrichment of specific terms and provides users with a text and graphic representation of the results.
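
    The abstract above does not say which statistic is used to score term enrichment. A common choice for this kind of test is the hypergeometric tail probability, sketched below under that assumption. The function name and the numbers in the usage lines are hypothetical and are not taken from the TEA tool or from WormBase.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw.

    population: number of genes in the background set
    annotated:  background genes carrying the term (e.g. a tissue)
    sample:     size of the gene list being tested
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only: 20,000 background genes,
# 400 annotated to a tissue, 150 genes in the list, 12 of them annotated.
p_value = hypergeom_enrichment_pvalue(20_000, 400, 150, 12)
print(f"enrichment p-value: {p_value:.3g}")
```

    In practice one such p-value is computed per ontology term, so a multiple-testing correction would normally follow.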

  8. Enzymatically Generated CRISPR Libraries for Genome Labeling and Screening

    PubMed Central

    Lane, Andrew B.; Strzelecka, Magdalena; Ettinger, Andreas; Grenfell, Andrew W.; Wittmann, Torsten; Heald, Rebecca

    2015-01-01

    Summary CRISPR-based technologies have emerged as powerful tools to alter genomes and mark chromosomal loci, but an inexpensive method for generating large numbers of RNA guides for whole genome screening and labeling is lacking. Using a method that permits library construction from any source of DNA, we generated guide libraries that label repetitive loci or a single chromosomal locus in Xenopus egg extracts and show that a complex library can target the E. coli genome at high frequency. PMID:26212133

  9. Consequences of Normalizing Transcriptomic and Genomic Libraries of Plant Genomes Using a Duplex-Specific Nuclease and Tetramethylammonium Chloride

    PubMed Central

    Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard

    2013-01-01

    Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088

  10. The Sleipnir library for computational functional genomics.

    PubMed

    Huttenhower, Curtis; Schroeder, Mark; Chikina, Maria D; Troyanskaya, Olga G

    2008-07-01

    Biological data generation has accelerated to the point where hundreds or thousands of whole-genome datasets of various types are available for many model organisms. This wealth of data can lead to valuable biological insights when analyzed in an integrated manner, but the computational challenge of managing such large data collections is substantial. In order to mine these data efficiently, it is necessary to develop methods that use storage, memory and processing resources carefully. The Sleipnir C++ library implements a variety of machine learning and data manipulation algorithms with a focus on heterogeneous data integration and efficiency for very large biological data collections. Sleipnir allows microarray processing, functional ontology mining, clustering, Bayesian learning and inference and support vector machine tasks to be performed for heterogeneous data on scales not previously practical. In addition to the library, which can easily be integrated into new computational systems, prebuilt tools are provided to perform a variety of common tasks. Many tools are multithreaded for parallelization in desktop or high-throughput computing environments, and most tasks can be performed in minutes for hundreds of datasets using a standard personal computer. Source code (C++) and documentation are available at http://function.princeton.edu/sleipnir and compiled binaries are available from the authors on request.

  11. Construction and characterization of a yeast artificial chromosome library containing seven haploid human genome equivalents

    SciTech Connect

    Albertsen, H.M.; Abderrahim, H.; Cann, H.M.; Dausset, J.; Le Paslier, D.; Cohen, D.

    1990-06-01

    Prior to constructing a library of yeast artificial chromosomes (YACs) containing very large human DNA fragments, the authors performed a series of preliminary experiments aimed at developing a suitable protocol. They found an inverse relationship between YAC insert size and transformation efficiency. Evidence of occasional rearrangement within YAC inserts was found resulting in clonally stable internal deletions or clonally unstable size variations. A protocol was developed for preparative electrophoretic enrichment of high molecular mass human DNA fragments from partial restriction digests and ligation with the YAC vector in agarose. A YAC library has been constructed from large fragments of DNA from an Epstein-Barr virus-transformed human lymphoblastoid cell line. The library presently contains 50,000 clones, 95% of which are greater than 250 kilobase pairs in size. The mean YAC size of the library, calculated from 132 randomly isolated clones, is 430 kilobase pairs. The library thus contains the equivalent of approximately seven haploid human genomes.
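
    The "seven haploid genome equivalents" figure follows from the clone count and mean insert size given in the abstract. The sketch below reproduces that arithmetic. The haploid human genome size of about 3,000 Mb is an assumption not stated in the record, and the coverage probability uses a simple Poisson approximation that assumes clones are drawn uniformly at random from the genome.

```python
import math

# Assumption (not in the abstract): haploid human genome of ~3,000 Mb.
HUMAN_HAPLOID_GENOME_KB = 3_000_000


def genome_equivalents(n_clones, mean_insert_kb, genome_kb=HUMAN_HAPLOID_GENOME_KB):
    """Total cloned DNA divided by the haploid genome size."""
    return n_clones * mean_insert_kb / genome_kb


# Values reported in the abstract: 50,000 clones, mean insert of 430 kb.
coverage = genome_equivalents(50_000, 430)

# Poisson approximation: chance that a given locus is in at least one clone.
p_covered = 1 - math.exp(-coverage)

print(f"genome equivalents: {coverage:.2f}")
print(f"P(locus represented): {p_covered:.4f}")
```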

  12. cDNA Library Enrichment of Full Length Transcripts for SMRT Long Read Sequencing

    PubMed Central

    Hartwig, Benjamin; Reinhardt, Richard; Schneeberger, Korbinian

    2016-01-01

    The utility of genome assemblies does not only rely on the quality of the assembled genome sequence, but also on the quality of the gene annotations. The Pacific Biosciences Iso-Seq technology is a powerful support for accurate eukaryotic gene model annotation as it allows for direct readout of full-length cDNA sequences without the need for noisy short read-based transcript assembly. We propose the implementation of the TeloPrime Full Length cDNA Amplification kit to the Pacific Biosciences Iso-Seq technology in order to enrich for genuine full-length transcripts in the cDNA libraries. We provide evidence that TeloPrime outperforms the commonly used SMARTer PCR cDNA Synthesis Kit in identifying transcription start and end sites in Arabidopsis thaliana. Furthermore, we show that TeloPrime-based Pacific Biosciences Iso-Seq can be successfully applied to the polyploid genome of bread wheat (Triticum aestivum) not only to efficiently annotate gene models, but also to identify novel transcription sites, gene homeologs, splicing isoforms and previously unidentified gene loci. PMID:27327613

  13. Enriching Critical Thinking and Language Learning with Educational Digital Libraries

    ERIC Educational Resources Information Center

    Lu, Hsin-lin

    2012-01-01

    As the amount of information available in online digital libraries increases exponentially, questions arise concerning the most productive way to use that information to advance learning. Applying the earlier information seeking theories advocated by Kelly (1963), Taylor (1968), and Belkin (1980) to the digital libraries experience, Carol Kuhlthau…

  15. A novel method for the multiplexed target enrichment of MinION next generation sequencing libraries using PCR-generated baits.

    PubMed

    Karamitros, Timokratis; Magiorkinis, Gkikas

    2015-12-15

    The enrichment of targeted regions within complex next generation sequencing libraries commonly uses biotinylated baits to capture the desired sequences. This method results in high read coverage over the targets and their flanking regions. Oxford Nanopore Technologies recently released a USB 3.0-interfaced sequencer, the MinION. To date no particular method for enriching MinION libraries has been standardized. Here, using biotinylated PCR-generated baits in a novel approach, we describe a simple and efficient way for multiplexed enrichment of MinION libraries, overcoming technical limitations related to the chemistry of the sequencing adapters and the length of the DNA fragments. Using Phage Lambda and Escherichia coli as models we selectively enrich for specific targets, significantly increasing the corresponding read coverage and eliminating unwanted regions. We show that by capturing genomic fragments that contain the target sequences, we recover reads extending beyond the targeted regions, which can thus be used for the determination of potentially unknown flanking sequences. By pooling enriched libraries derived from two distinct E. coli strains and analyzing them in parallel, we demonstrate the efficiency of this method in multiplexed format. Crucially, we evaluated the optimal bait size for large fragment libraries, and we describe for the first time a standardized method for target enrichment on the MinION platform.

  16. Deep subsurface life from North Pond: Enrichment, isolation, characterization and genomes of heterotrophic bacteria

    DOE PAGES

    Russell, Joseph A.; Leon-Zayas, Rosa; Wrighton, Kelly; ...

    2016-05-10

    Studies of subsurface microorganisms have yielded few environmentally relevant isolates for laboratory studies. In order to address this lack of cultivated microorganisms, we initiated several enrichments on sediment and underlying basalt samples from North Pond, a sediment basin ringed by basalt outcrops underlying an oligotrophic water column west of the Mid-Atlantic Ridge at 22° N. In contrast to anoxic enrichments, growth was observed in aerobic, heterotrophic enrichments from sediment of IODP Hole U1382B at 4 and 68 m below seafloor (mbsf). These sediment depths, respectively, correspond to the fringes of oxygen penetration from overlying seawater in the top of the sediment column and upward migration of oxygen from oxic seawater from the basalt aquifer below the sediment. Here we report the enrichment, isolation, initial characterization and genomes of three isolated aerobic heterotrophs from North Pond sediments; an Arthrobacter species from 4 mbsf, and Paracoccus and Pseudomonas species from 68 mbsf. These cultivated bacteria are represented in the amplicon 16S rRNA gene libraries created from whole sediments, albeit at low (up to 2%) relative abundance. We provide genomic evidence from our isolates demonstrating that the Arthrobacter and Pseudomonas isolates have the potential to respire nitrate and oxygen, though dissimilatory nitrate reduction could not be confirmed in laboratory cultures. Furthermore, the cultures from this study represent members of abundant phyla, as determined by amplicon sequencing of environmental DNA extracts, and allow for further studies into geochemical factors impacting life in the deep subsurface.

  17. Deep Subsurface Life from North Pond: Enrichment, Isolation, Characterization and Genomes of Heterotrophic Bacteria

    PubMed Central

    Russell, Joseph A.; León-Zayas, Rosa; Wrighton, Kelly; Biddle, Jennifer F.

    2016-01-01

    Studies of subsurface microorganisms have yielded few environmentally relevant isolates for laboratory studies. In order to address this lack of cultivated microorganisms, we initiated several enrichments on sediment and underlying basalt samples from North Pond, a sediment basin ringed by basalt outcrops underlying an oligotrophic water-column west of the Mid-Atlantic Ridge at 22°N. In contrast to anoxic enrichments, growth was observed in aerobic, heterotrophic enrichments from sediment of IODP Hole U1382B at 4 and 68 m below seafloor (mbsf). These sediment depths, respectively, correspond to the fringes of oxygen penetration from overlying seawater in the top of the sediment column and upward migration of oxygen from oxic seawater from the basalt aquifer below the sediment. Here we report the enrichment, isolation, initial characterization and genomes of three isolated aerobic heterotrophs from North Pond sediments; an Arthrobacter species from 4 mbsf, and Paracoccus and Pseudomonas species from 68 mbsf. These cultivated bacteria are represented in the amplicon 16S rRNA gene libraries created from whole sediments, albeit at low (up to 2%) relative abundance. We provide genomic evidence from our isolates demonstrating that the Arthrobacter and Pseudomonas isolates have the potential to respire nitrate and oxygen, though dissimilatory nitrate reduction could not be confirmed in laboratory cultures. The cultures from this study represent members of abundant phyla, as determined by amplicon sequencing of environmental DNA extracts, and allow for further studies into geochemical factors impacting life in the deep subsurface. PMID:27242705

  18. Adaptation of a commercial robot for genome library replication

    SciTech Connect

    Uber, D.C.; Searles, W.L.

    1994-01-01

    This report describes tools and fixtures developed at the Human Genome Center at Lawrence Berkeley Laboratory for the Hewlett-Packard ORCA™ (Optimized Robot for Chemical Analysis) to replicate large genome libraries. Photographs and engineering drawings of the various custom-designed components are included.

  19. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon.

    PubMed

    Clepet, Christian; Joobeur, Tarek; Zheng, Yi; Jublot, Delphine; Huang, Mingyun; Truniger, Veronica; Boualem, Adnane; Hernandez-Gonzalez, Maria Elena; Dolcet-Sanjuan, Ramon; Portnoy, Vitaly; Mascarell-Creus, Albert; Caño-Delgado, Ana I; Katzir, Nurit; Bendahmane, Abdelhafid; Giovannoni, James J; Aranda, Miguel A; Garcia-Mas, Jordi; Fei, Zhangjun

    2011-05-20

    Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon

  20. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    PubMed Central

    2011-01-01

    Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
    Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot

  1. Targeted Genome-Wide Enrichment of Functional Regions

    PubMed Central

    Senapathy, Periannan; Bhasi, Ashwini; Mattox, Jeffrey; Dhandapany, Perundurai S.; Sadayappan, Sakthivel

    2010-01-01

    Only a small fraction of large genomes such as that of the human contains the functional regions such as the exons, promoters, and polyA sites. A platform technique for selective enrichment of functional genomic regions will enable several next-generation sequencing applications that include the discovery of causal mutations for disease and drug response. Here, we describe a powerful platform technique, termed “functional genomic fingerprinting” (FGF), for the multiplexed genomewide isolation and analysis of targeted regions such as the exome, promoterome, or exon splice enhancers. The technique employs a fixed part of a uniquely designed Fixed-Randomized primer, while the randomized part contains all the possible sequence permutations. The Fixed-Randomized primers bind with full sequence complementarity at multiple sites where the fixed sequence (such as the splice signals) occurs within the genome, and multiplex amplify many regions bounded by the fixed sequences (e.g., exons). Notably, validation of this technique using cardiac myosin binding protein-C (MYBPC3) gene as an example strongly supports the application and efficacy of this method. Further, assisted by genomewide computational analyses of such sequences, the FGF technique may provide a unique platform for high-throughput sample production and analysis of targeted genomic regions by the next-generation sequencing techniques, with powerful applications in discovering disease and drug response genes. PMID:20585402
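
    The Fixed-Randomized primer design described above can be illustrated by enumerating every primer in such a set. This is a minimal sketch, not the authors' implementation: the fixed sequence "CAGGT", the randomized length of 3, and the placement of the randomized part before the fixed part are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
from itertools import product

BASES = "ACGT"


def fixed_randomized_primers(fixed, n_random, randomized_first=True):
    """Yield every primer made of a fixed part plus all 4**n_random
    possible sequences for the randomized part."""
    for combo in product(BASES, repeat=n_random):
        randomized = "".join(combo)
        yield randomized + fixed if randomized_first else fixed + randomized


# Hypothetical example: a short splice-signal-like fixed sequence with
# three randomized positions gives 4**3 = 64 distinct primers.
primers = list(fixed_randomized_primers("CAGGT", 3))
print(len(primers), primers[:4])
```

    The size of the set grows as 4 to the power of the randomized length, which is one practical constraint on how long the randomized part can be.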

  2. Alternative splicing enriched cDNA libraries identify breast cancer-associated transcripts

    PubMed Central

    2010-01-01

    Background Alternative splicing (AS) is a central mechanism in the generation of genomic complexity and is a major contributor to transcriptome and proteome diversity. Alterations of the splicing process can lead to deregulation of crucial cellular processes and have been associated with a large spectrum of human diseases. Cancer-associated transcripts are potential molecular markers and may contribute to the development of more accurate diagnostic and prognostic methods and also serve as therapeutic targets. Alternative splicing-enriched cDNA libraries have been used to explore the variability generated by alternative splicing. In this study, by combining the use of trapping heteroduplexes and RNA amplification, we developed a powerful approach that enables transcriptome-wide exploration of the AS repertoire for identifying AS variants associated with breast tumor cells modulated by ERBB2 (HER-2/neu) oncogene expression. Results The human breast cell line (C5.2) and a pool of 5 ERBB2 over-expressing breast tumor samples were used independently for the construction of two AS-enriched libraries. In total, 2,048 partial cDNA sequences were obtained, revealing 214 alternative splicing sequence-enriched tags (ASSETs). A subset with 79 multiple exon ASSETs was compared to public databases and reported 138 different AS events. A high success rate of RT-PCR validation (94.5%) was obtained, and 2 novel AS events were identified. The influence of ERBB2-mediated expression on AS regulation was evaluated by capillary electrophoresis and probe-ligation approaches in two mammary cell lines (Hb4a and C5.2) expressing different levels of ERBB2. The relative expression balance between AS variants from 3 genes was differentially modulated by ERBB2 in this model system. Conclusions In this study, we presented a method for exploring AS from any RNA source in a transcriptome-wide format, which can be directly easily adapted to next generation sequencers. 
    We identified AS transcripts

  3. Isolation and characterization of rice cesium transporter genes from a rice-transporter-enriched yeast expression library.

    PubMed

    Yamaki, Tomohiro; Otani, Masahiro; Ono, Kohei; Mimura, Takuro; Oda, Koshiro; Minamii, Takeshi; Matsumoto, Shingo; Matsuo, Yuzy; Kawamukai, Makoto; Akihiro, Takashi

    2017-03-28

    A considerable portion of agricultural land in central-east Japan has been contaminated by radioactive material, particularly radioactive Cs, due to the industrial accident at the Fukushima Daiichi nuclear power plant. Understanding the mechanism of absorption, translocation, and accumulation of Cs⁺ in plants will greatly assist in developing approaches to help reduce the radioactive contamination of agricultural products. At present, however, little is known regarding the Cs⁺ transporters in rice. A transporter-enriched yeast expression library was constructed and the library was screened for Cs⁺ transporter genes. The 1452 full-length cDNAs encoding transporter genes were obtained from the Rice Genome Resource Center, and 1358 clones of these transporter genes were successively subcloned into yeast expression vectors, which were then transferred into yeast. Using this library, both positive and negative selection screens can be performed, which has not previously been possible. The constructed library is an excellent tool for the isolation of novel transporter genes. This library was screened for clones that were sensitive to Cs⁺ using an SD-Gal medium containing either 30 or 70 mM CsCl, resulting in the isolation of thirteen Cs⁺-sensitive clones. ¹³⁷Cs absorption experiments were conducted and confirmed that all of the identified clones were able to absorb ¹³⁷Cs. Three potassium transporters, two ABC transporters, and one NRAMP transporter were among the thirteen identified clones.

  4. Antigen discovery using whole-genome phage display libraries.

    PubMed

    Beghetto, Elisa; Gargano, Nicola

    2013-01-01

    In the last two decades phage display technology has been used for investigating complex biological processes and isolating molecules of practical value in several applications. Bacteriophage lambda, representing a classical cloning and expression system, has also been exploited for generating display libraries of small peptides and protein domains. More recently, large cDNA and whole-genome lambda-display libraries of human pathogens have been generated for the discovery of new antigens for biomedical applications. Here, we describe the construction of a whole-genome library of a common pathogen, Streptococcus pneumoniae, and the use of this library for the molecular dissection of the human B-cell response against bacterial infection and colonization.

  5. Hyb-Seq: combining target enrichment and genome skimming for plant phylogenomics

    Treesearch

    Kevin Weitemier; Shannon C.K. Straub; Richard C. Cronn; Mark Fishbein; Roswitha Schmickl; Angela McDonnell; Aaron Liston

    2014-01-01

    • Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous data collection for low-copy nuclear genes and high-copy genomic targets for plant systematics and evolution studies. • Methods and Results: Genome and transcriptome assemblies for milkweed (Asclepias syriaca) were used to design enrichment probes for 3385...

  6. GenomeD3Plot: a library for rich, interactive visualizations of genomic data in web applications.

    PubMed

    Laird, Matthew R; Langille, Morgan G I; Brinkman, Fiona S L

    2015-10-15

    A simple static image of genomes and associated metadata is very limiting, as researchers expect rich, interactive tools similar to the web applications found in the post-Web 2.0 world. GenomeD3Plot is a lightweight visualization library written in JavaScript using the D3 library. GenomeD3Plot provides a rich API to allow the rapid visualization of complex genomic data using a convenient standards-based JSON configuration file. When integrated into existing web services, GenomeD3Plot allows researchers to interact with data, dynamically alter the view, or even resize or reposition the visualization in their browser window. In addition, GenomeD3Plot has built-in functionality to export any resulting genome visualization in PNG or SVG format for easy inclusion in manuscripts or presentations. GenomeD3Plot is being utilized in the recently released Islandviewer 3 (www.pathogenomics.sfu.ca/islandviewer/) to visualize predicted genomic islands with other genome annotation data. However, its features enable it to be more widely applicable for dynamic visualization of genomic data in general. GenomeD3Plot is licensed under the GNU-GPL v3 at https://github.com/brinkmanlab/GenomeD3Plot/. Contact: brinkman@sfu.ca. © The Author 2015. Published by Oxford University Press.

  7. Identification of immunogenic polypeptides from a Mycoplasma hyopneumoniae genome library by phage display.

    PubMed

    Kügler, Jonas; Nieswandt, Simone; Gerlach, Gerald F; Meens, Jochen; Schirrmann, Thomas; Hust, Michael

    2008-09-01

    The identification of immunogenic polypeptides of pathogens is helpful for the development of diagnostic assays and therapeutic applications like vaccines. Routinely, these proteins are identified by two-dimensional polyacrylamide gel electrophoresis and Western blot using convalescent serum, followed by mass spectrometry. This technology, however, is limited, because low or differentially expressed proteins, e.g. dependent on pathogen-host interaction, cannot be identified. In this work, we developed and improved an M13 genomic phage display-based method for the selection of immunogenic polypeptides of Mycoplasma hyopneumoniae, a pathogen causing porcine enzootic pneumonia. The fragmented genome of M. hyopneumoniae was cloned into a phage display vector, and the genomic library was packaged using the helper phage Hyperphage to enrich open reading frames (ORFs). Afterwards, the phage display library was screened by panning using convalescent serum. The analysis of individual phage clones resulted in the identification of five genes encoding immunogenic proteins, only two of which had been previously identified and described as immunogenic. This M13 genomic phage display, directly combining ORF enrichment and the presentation of the corresponding polypeptide on the phage surface, complements proteome-based methods for the identification of immunogenic polypeptides and is particularly well suited for use in mycoplasma species.

  8. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment.

    PubMed

    Kim, Jonghwan; Bhinge, Akshay A; Morgan, Xochitl C; Iyer, Vishwanath R

    2005-01-01

    Identifying the chromosomal targets of transcription factors is important for reconstructing the transcriptional regulatory networks underlying global gene expression programs. We have developed an unbiased genomic method called sequence tag analysis of genomic enrichment (STAGE) to identify the direct binding targets of transcription factors in vivo. STAGE is based on high-throughput sequencing of concatemerized tags derived from target DNA enriched by chromatin immunoprecipitation. We first used STAGE in yeast to confirm that RNA polymerase III genes are the most prominent targets of the TATA-box binding protein. We optimized the STAGE protocol and developed analysis methods to allow the identification of transcription factor targets in human cells. We used STAGE to identify several previously unknown binding targets of human transcription factor E2F4 that we independently validated by promoter-specific PCR and microarray hybridization. STAGE provides a means of identifying the chromosomal targets of DNA-associated proteins in any sequenced genome.

  9. Deep subsurface life from North Pond: Enrichment, isolation, characterization and genomes of heterotrophic bacteria

    SciTech Connect

    Russell, Joseph A.; Leon-Zayas, Rosa; Wrighton, Kelly; Biddle, Jennifer F.

    2016-05-10

    Studies of subsurface microorganisms have yielded few environmentally relevant isolates for laboratory studies. In order to address this lack of cultivated microorganisms, we initiated several enrichments on sediment and underlying basalt samples from North Pond, a sediment basin ringed by basalt outcrops underlying an oligotrophic watercolumn west of the Mid-Atlantic Ridge at 22° N. In contrast to anoxic enrichments, growth was observed in aerobic, heterotrophic enrichments from sediment of IODP Hole U1382B at 4 and 68 m below seafloor (mbsf). These sediment depths, respectively, correspond to the fringes of oxygen penetration from overlying seawater in the top of the sediment column and upward migration of oxygen from oxic seawater from the basalt aquifer below the sediment. Here we report the enrichment, isolation, initial characterization and genomes of three isolated aerobic heterotrophs from North Pond sediments; an Arthrobacter species from 4 mbsf, and Paracoccus and Pseudomonas species from 68 mbsf. These cultivated bacteria are represented in the amplicon 16S rRNA gene libraries created from whole sediments, albeit at low (up to 2%) relative abundance. We provide genomic evidence from our isolates demonstrating that the Arthrobacter and Pseudomonas isolates have the potential to respire nitrate and oxygen, though dissimilatory nitrate reduction could not be confirmed in laboratory cultures. Furthermore, the cultures from this study represent members of abundant phyla, as determined by amplicon sequencing of environmental DNA extracts, and allow for further studies into geochemical factors impacting life in the deep subsurface.

  10. Whole-Genome Enrichment Provides Deep Insights into Vibrio cholerae Metagenome from an African River.

    PubMed

    Vezzulli, L; Grande, C; Tassistro, G; Brettar, I; Höfle, M G; Pereira, R P A; Mushi, D; Pallavicini, A; Vassallo, P; Pruzzo, C

    2017-04-01

    The detection and typing of Vibrio cholerae in natural aquatic environments encounter major methodological challenges related to the fact that the bacterium is often present in environmental matrices at very low abundance in a nonculturable state. This study applied, for the first time to our knowledge, a whole-genome enrichment (WGE) and next-generation sequencing (NGS) approach for direct genotyping and metagenomic analysis of low abundant V. cholerae DNA (<50 genome units/L) from natural water collected in the Morogoro river (Tanzania). The protocol is based on the use of biotinylated RNA baits for target enrichment of V. cholerae metagenomic DNA via hybridization. An enriched V. cholerae metagenome library was generated and sequenced on an Illumina MiSeq platform. Up to 1.8 × 10^7 bp (4.5× mean read depth) were found to map against V. cholerae reference genome sequences representing an increase of about 2500 times in target DNA coverage compared to theoretical calculations of performance for shotgun metagenomics. Analysis of metagenomic data revealed the presence of several V. cholerae virulence and virulence associated genes in river water including major virulence regions (e.g. CTX prophage and Vibrio pathogenicity island-1) and genetic markers of epidemic strains (e.g. O1-antigen biosynthesis gene cluster) that were not detectable by standard culture and molecular techniques. Overall, besides providing a powerful tool for direct genotyping of V. cholerae in complex environmental matrices, this study provides a 'proof of concept' on the methodological gap that might currently preclude a more comprehensive understanding of toxigenic V. cholerae emergence from natural aquatic environments.
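
    The relationship between mapped bases and mean read depth quoted in this abstract can be checked with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the genome length of about 4.0 Mbp for V. cholerae is an assumption that is not stated in the record.

```python
def mean_read_depth(mapped_bases: float, genome_length_bp: float) -> float:
    """Mean depth = total bases mapped to the reference / reference length."""
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    return mapped_bases / genome_length_bp


# Mapped bases come from the abstract; the genome length is an assumed value.
MAPPED_BASES = 1.8e7            # "Up to 1.8 x 10^7 bp" mapped to the reference
ASSUMED_GENOME_LENGTH = 4.0e6   # assumption: roughly 4.0 Mbp for V. cholerae

depth = mean_read_depth(MAPPED_BASES, ASSUMED_GENOME_LENGTH)
print(f"mean read depth: {depth:.1f}x")
```

    Under that assumed genome length, the result agrees with the 4.5× mean read depth reported in the abstract.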

  11. Universal Human Papillomavirus Typing Assay: Whole-Genome Sequencing following Target Enrichment

    PubMed Central

    Li, Tengguo; Unger, Elizabeth R.; Batra, Dhwani; Sheth, Mili; Steinau, Martin; Jasinski, Jean; Jones, Jennifer

    2016-01-01

    ABSTRACT We designed a universal human papillomavirus (HPV) typing assay based on target enrichment and whole-genome sequencing (eWGS). The RNA bait included 23,941 probes targeting 191 HPV types and 12 probes targeting beta-globin as a control. We used the Agilent SureSelect XT2 protocol for library preparation, Illumina HiSeq 2500 for sequencing, and CLC Genomics Workbench for sequence analysis. Mapping stringency for type assignment was determined based on 8 (6 HPV-positive and 2 HPV-negative) control samples. Using the optimal mapping conditions, types were assigned to 24 blinded samples. eWGS results were 100% concordant with Linear Array (LA) genotyping results for 9 plasmid samples and fully or partially concordant for 9 of the 15 cervical-vaginal samples, with 95.83% overall type-specific concordance for LA genotyping. eWGS identified 7 HPV types not included in the LA genotyping. Since this method does not involve degenerate primers targeting HPV genomic regions, PCR bias in genotype detection is minimized. With further refinements aimed at reducing cost and increasing throughput, this first application of eWGS for universal HPV typing could be a useful method to elucidate HPV epidemiology. PMID:27974548

  12. Selection-by-function: efficient enrichment of cathepsin E inhibitors from a DNA library.

    PubMed

    Naimuddin, Mohammed; Kitamura, Koichirou; Kinoshita, Yasunori; Honda-Takahashi, Yoko; Murakami, Marina; Ito, Masato; Yamamoto, Kenji; Hanada, Kazunori; Husimi, Yuzuru; Nishigaki, Koichi

    2007-01-01

    A method for efficient enrichment of protease inhibitors out of a DNA library was developed by introducing SF-link technology. A two-step selection strategy was designed in which the initial enrichment of aptamers was based on binding function, while the second enrichment step was based on inhibitory activity against a protease, cathepsin E (CE). The latter was constructed by covalently linking a biotinylated peptide substrate to each ssDNA molecule contained in the preliminarily selected DNA library, generating 'SF-link'. Gradual enrichment of inhibitory DNAs was attained in the course of selection. One molecule, SFR-6-3, showed an IC50 of around 30 nM, a Kd of around 15 nM and high selectivity for CE. Sequence and structure analysis revealed a C-rich sequence without any guanine and possibly an i-motif structure, which must be novel among in vitro-selected aptamers. SF-link technology, which is novel as a screening technology, provided a remarkable enrichment of specific protease inhibitors and has the potential to be further developed.

  13. Raman spectroscopy detects phenotypic differences among Escherichia coli enriched for 1-butanol tolerance using a metagenomic DNA library.

    PubMed

    Freedman, Benjamin G; Zu, Theresah N K; Wallace, Robert S; Senger, Ryan S

    2016-07-01

    Advances in Raman spectroscopy are enabling more comprehensive measurement of microbial cell chemical composition. Advantages include results returned in near real-time and minimal sample preparation. In this research, Raman spectroscopy is used to analyze E. coli with engineered solvent tolerance, which is a multi-genic trait associated with complex and uncharacterized phenotypes that are of value to industrial microbiology. To generate solvent tolerant phenotypes, E. coli transformed with DNA libraries are serially enriched in the presence of 0.9% (v/v) and 1.1% (v/v) 1-butanol. DNA libraries are created using degenerate oligonucleotide primed PCR (DOP-PCR) from the genomic DNA of E. coli, Clostridium acetobutylicum ATCC 824, and the metagenome of a stream bank soil sample, which contained DNA from 72 different phyla. DOP-PCR enabled high efficiency library cloning (with no DNA shearing or end-polishing) and the inclusion of un-culturable organisms. Nine strains with improved tolerance are analyzed by Raman spectroscopy and vastly different solvent-tolerant phenotypes are characterized. Common among these are improved membrane rigidity from increasing the fraction of unsaturated fatty acids at the expense of cyclopropane fatty acids. Raman spectroscopy offers the ability to monitor cell phenotype changes in near real-time and is adaptable to high-throughput screening, making it relevant to metabolic engineering. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  14. Full-length enriched cDNA libraries and ORFeome analysis of sugarcane hybrid and ancestor genotypes.

    PubMed

    Nishiyama, Milton Yutaka; Ferreira, Savio Siqueira; Tang, Pei-Zhong; Becker, Scott; Pörtner-Taliana, Antje; Souza, Glaucia Mendes

    2014-01-01

    Sugarcane is a major crop used for food and bioenergy production. Modern cultivars are hybrids derived from crosses between Saccharum officinarum and Saccharum spontaneum. Hybrid cultivars combine favorable characteristics from ancestral species and contain a genome that is highly polyploid and aneuploid, containing 100-130 chromosomes. These complex genomes represent a huge challenge for molecular studies and for the development of biotechnological tools that can facilitate sugarcane improvement. Here, we describe full-length enriched cDNA libraries for Saccharum officinarum, Saccharum spontaneum, and one hybrid genotype (SP803280) and analyze the set of open reading frames (ORFs) in their genomes (i.e., their ORFeomes). We found 38,195 (19%) sugarcane-specific transcripts that did not match transcripts from other databases. Less than 1.6% of all transcripts were ancestor-specific (i.e., not expressed in SP803280). We also found 78,008 putative new sugarcane transcripts that were absent in the largest sugarcane expressed sequence tag database (SUCEST). Functional annotation showed a high frequency of protein kinases and stress-related proteins. We also detected natural antisense transcript expression, which mapped to 94% of all plant KEGG pathways; however, each genotype showed different pathways enriched in antisense transcripts. Our data appeared to cover 53.2% (17,563 genes) and 46.8% (937 transcription factors) of all sugarcane full-length genes and transcription factors, respectively. This work represents a significant advancement in defining the sugarcane ORFeome and will be useful for protein characterization, single nucleotide polymorphism and splicing variant identification, evolutionary and comparative studies, and sugarcane genome assembly and annotation.

  15. Full-Length Enriched cDNA Libraries and ORFeome Analysis of Sugarcane Hybrid and Ancestor Genotypes

    PubMed Central

    Becker, Scott; Pörtner-Taliana, Antje; Souza, Glaucia Mendes

    2014-01-01

    Sugarcane is a major crop used for food and bioenergy production. Modern cultivars are hybrids derived from crosses between Saccharum officinarum and Saccharum spontaneum. Hybrid cultivars combine favorable characteristics from ancestral species and contain a genome that is highly polyploid and aneuploid, containing 100–130 chromosomes. These complex genomes represent a huge challenge for molecular studies and for the development of biotechnological tools that can facilitate sugarcane improvement. Here, we describe full-length enriched cDNA libraries for Saccharum officinarum, Saccharum spontaneum, and one hybrid genotype (SP803280) and analyze the set of open reading frames (ORFs) in their genomes (i.e., their ORFeomes). We found 38,195 (19%) sugarcane-specific transcripts that did not match transcripts from other databases. Less than 1.6% of all transcripts were ancestor-specific (i.e., not expressed in SP803280). We also found 78,008 putative new sugarcane transcripts that were absent in the largest sugarcane expressed sequence tag database (SUCEST). Functional annotation showed a high frequency of protein kinases and stress-related proteins. We also detected natural antisense transcript expression, which mapped to 94% of all plant KEGG pathways; however, each genotype showed different pathways enriched in antisense transcripts. Our data appeared to cover 53.2% (17,563 genes) and 46.8% (937 transcription factors) of all sugarcane full-length genes and transcription factors, respectively. This work represents a significant advancement in defining the sugarcane ORFeome and will be useful for protein characterization, single nucleotide polymorphism and splicing variant identification, evolutionary and comparative studies, and sugarcane genome assembly and annotation. PMID:25222706

  16. Effective DNA fragmentation technique for simple sequence repeat detection with a microsatellite-enriched library and high-throughput sequencing.

    PubMed

    Tanaka, Keisuke; Ohtake, Rumi; Yoshida, Saki; Shinohara, Takashi

    2017-04-01

    Two different techniques for genomic DNA fragmentation before microsatellite-enriched library construction, restriction enzyme (NlaIII and MseI) digestion and sonication, were compared to examine their effects on simple sequence repeat (SSR) detection using high-throughput sequencing. Tens of thousands of SSR regions from 5 species of the plant family Myrtaceae were detected when the output of individual samples was >1 million paired-end reads. Comparison of the two DNA fragmentation techniques showed that restriction enzyme digestion was superior to sonication for identification of heterozygous genotypes, whereas sonication was superior for detection of various SSR flanking regions with both species-specific and common characteristics. Therefore, choosing the most suitable DNA fragmentation method depends on the type of analysis that is planned.
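
    As a rough illustration of what SSR detection in sequencing reads involves, the sketch below scans a sequence for tandem repeats with a regular expression. It is not the pipeline used in the study; the function name, the unit-length range (2 to 6 bp) and the minimum repeat count (5) are assumptions chosen only for the example.

```python
import re


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6, min_repeats: int = 5):
    """Return (start, repeat_unit, repeat_count) for each tandem repeat found.

    The lazy quantifier makes the regex prefer the shortest repeat unit,
    so (GA)8 is reported with unit "GA" rather than "GAGA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical read: a (GA)8 microsatellite between two short flanking regions.
read = "ATCGT" + "GA" * 8 + "CTTAC"
print(find_ssrs(read))
```

    Note that this simple pattern also reports long homopolymer runs as dinucleotide repeats (for example, a run of A's as unit "AA"), so a real workflow would filter those and would also need to handle sequencing errors.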

  17. Whole-Genome Sequencing: Automated, Nonindexed Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing a nonindexed Illumina DNA library and relies on the use of a CyBi-SELMA automated pipetting machine, the Covaris E210 shearing instrument, and the epMotion 5075. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Here, double-stranded DNA is fragmented when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing.

  18. Whole-Genome Sequencing: Automated, Indexed Library Preparation.

    PubMed

    Mardis, Elaine; McCombie, W Richard

    2017-03-01

    This protocol describes an automated procedure for constructing an indexed Illumina DNA library. With this method, genomic DNA fragments are produced by sonication, using high-frequency acoustic energy to shear DNA. Double-stranded DNA (dsDNA) will fragment when exposed to the energy of adaptive focused acoustic shearing (AFA). The resulting DNA fragments are ligated to adaptors, amplified by polymerase chain reaction (PCR), and subjected to size selection using magnetic beads. The product is suitable for use as template in whole-genome sequencing.

  19. Identification of differentially methylated regions using streptavidin bisulfite ligand methylation enrichment (SuBLiME), a new method to enrich for methylated DNA prior to deep bisulfite genomic sequencing

    PubMed Central

    Ross, Jason P.; Shaw, Jan M.; Molloy, Peter L.

    2013-01-01

    We have developed a method that enriches for methylated cytosines by capturing the fraction of bisulfite-treated DNA with unconverted cytosines. The method, called streptavidin bisulfite ligand methylation enrichment (SuBLiME), involves the specific labeling (using a biotin-labeled nucleotide ligand) of methylated cytosines in bisulfite-converted DNA. This step is then followed by affinity capture, using streptavidin-coupled magnetic beads. SuBLiME is highly adaptable and can be combined with deep sequencing library generation and/or genomic complexity-reduction. In this pilot study, we enriched methylated DNA from Csp6I-cut complexity-reduced genomes of colorectal cancer cell lines (HCT-116, HT-29 and SW-480) and normal blood leukocytes with the aim of discovering colorectal cancer biomarkers. Enriched libraries were sequenced with SOLiD-3 technology. In pairwise comparisons, we scored a total of 1,769 gene loci and 33 miRNA loci as differentially methylated between the cell lines and leukocytes. Of these, 516 loci were differently methylated in at least two promoter-proximal CpG sites over two discrete Csp6I fragments. Identified methylated gene loci were associated with anatomical development, differentiation and cell signaling. The data correlated with good agreement to a number of published colorectal cancer DNA methylation biomarkers and genomic data sets. SuBLiME is effective in the enrichment of methylated nucleic acid and in the detection of known and novel biomarkers. PMID:23257838

  20. 3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.

    PubMed

    Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong

    2008-09-01

    We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a polydT end and a deoxyguanosine [dG] end) combines priming full-length cDNA strand synthesis and directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is decreased while simplifying downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.

  1. Selective enrichment of environmental DNA libraries for genes encoding nonribosomal peptides and polyketides by phosphopantetheine transferase-dependent complementation of siderophore biosynthesis

    PubMed Central

    Charlop-Powers, Zachary; Banik, Jacob J.; Owen, Jeremy G.; Craig, Jeffrey W.; Brady, Sean F.

    2012-01-01

    The cloning of DNA directly from environmental samples provides a means to functionally access biosynthetic gene clusters present in the genomes of the large fraction of bacteria that remains recalcitrant to growth in the laboratory. Herein we demonstrate a method by which complementation of phosphopantetheine transferase deletion mutants can be used to restore siderophore biosynthesis and to therefore selectively enrich eDNA libraries for nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) gene sequences to unprecedented levels. The common use of NRPS/PKS-derived siderophores across bacterial taxa makes this method generalizable and should allow for the facile selective enrichment of NRPS/PKS-containing biosynthetic gene clusters from large environmental DNA libraries using a wide variety of phylogenetically diverse bacterial hosts. PMID:23072412

  2. Selective enrichment of environmental DNA libraries for genes encoding nonribosomal peptides and polyketides by phosphopantetheine transferase-dependent complementation of siderophore biosynthesis.

    PubMed

    Charlop-Powers, Zachary; Banik, Jacob J; Owen, Jeremy G; Craig, Jeffrey W; Brady, Sean F

    2013-01-18

    The cloning of DNA directly from environmental samples provides a means to functionally access biosynthetic gene clusters present in the genomes of the large fraction of bacteria that remains recalcitrant to growth in the laboratory. Herein, we demonstrate a method by which complementation of phosphopantetheine transferase deletion mutants can be used to restore siderophore biosynthesis and to therefore selectively enrich eDNA libraries for nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) gene sequences to unprecedented levels. The common use of NRPS/PKS-derived siderophores across bacterial taxa makes this method generalizable and should allow for the facile selective enrichment of NRPS/PKS-containing biosynthetic gene clusters from large environmental DNA libraries using a wide variety of phylogenetically diverse bacterial hosts.

  3. [Identification of mutations associated with coronary artery lesion susceptibility in Kawasaki disease by targeted enrichment of genomic region sequencing technique].

    PubMed

    Zhu, D Y; Song, S R; Xie, L J; Qiu, F; Yang, J; Xiao, T T; Huang, M

    2017-07-02

    Objective: To screen and identify the mutations in Kawasaki disease by targeted enrichment of genomic region sequencing technique and investigate susceptibility genes associated with coronary artery lesion. Method: This was a case-control study. A total of 114 patients diagnosed with Kawasaki disease and treated in Shanghai Children's Hospital between December 2015 and November 2016 were studied, and another 45 healthy children who were physically examined in the outpatient department were enrolled as the control group. Patients were divided into two groups based on the results of echocardiogram. Peripheral venous blood was obtained from patients and controls. Genomic DNA was extracted. SeqCap EZ Choice libraries were prepared by targeted enrichment of genomic region technology. Then the libraries were sequenced to identify susceptibility genes associated with coronary artery lesion in patients diagnosed with Kawasaki disease. Susceptibility genes were identified by Burden test, Pearson chi-square test or Fisher's exact probability test. Result: There was a statistically significant difference in the TNFRSF11B (rs2073618) G>C (p.N3K) mutation and GG/GC/CC genotype between the Kawasaki disease group and the control group (χ²=15.52, P=0.00). There were statistically significant differences in the TNFRSF13B (rs34562254) C>T (p.P251L) mutation (χ²=10.40, P=0.01) and the LEFTY1 (rs360057) T>G (p.D322A) mutation (χ²=8.505, P=0.01) between patients with coronary artery lesions and those without. Conclusion: Targeted enrichment of genomic region sequencing technology can be used for primary screening of the susceptibility genes associated with coronary artery lesions in Chinese Kawasaki patients and may provide a theoretical basis for larger sample investigation of risk prediction score standard in Kawasaki disease.

  4. RecA-assisted rapid enrichment of specific clones from model DNA libraries.

    PubMed

    Teintze, M; Arzimanoglou, I I; Lovelace, C I; Xu, Z J; Rigas, B

    1995-06-26

    An approach to library screening is being developed, in which the desired clone is "fished" out of a mixture of all the recombinants in a library with a RecA-coated probe. In the current embodiment of this method, we used as a probe the (+) strand of an M13 phage containing a fragment of the human albumin gene and a (dA)49 stretch. We screened a library of two plasmids, one containing the same albumin fragment as the probe, and one heterologous to the probe in 50-100 fold molar excess. The plasmids were linearized. Probe and library were reacted in the presence of RecA, the mixture was loaded onto an oligo(dT) column, which retained the probe-target complex by base-pairing to the dAs of the probe, the uncaptured plasmids were washed, and the probe-target complex was released from the column, religated and propagated into E. coli. Recovery of the homologous target was 15-28%, and enrichment for the homologous plasmid was 200 to 400-fold. This approach may provide a general method for expedited DNA library screening.
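
    The recovery and fold-enrichment figures reported here follow from simple ratios. The sketch below shows one common way to compute them; the function names and all plasmid counts are hypothetical and are not data from the study.

```python
def fold_enrichment(target_before: float, other_before: float,
                    target_after: float, other_after: float) -> float:
    """Ratio of target:non-target after selection over the same ratio before."""
    ratio_before = target_before / other_before
    ratio_after = target_after / other_after
    return ratio_after / ratio_before


def recovery(target_recovered: float, target_input: float) -> float:
    """Fraction of the input target molecules that survive the procedure."""
    return target_recovered / target_input


# Hypothetical example: target starts at 1:100 and ends at 2:1 after capture.
print(fold_enrichment(1, 100, 2, 1))   # about 200-fold
print(recovery(20, 100))               # 0.2, i.e. 20% recovery
```

    With these made-up inputs the result falls inside the 200 to 400-fold enrichment and 15-28% recovery ranges quoted in the abstract, which is the only reason they were chosen.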

  5. Enrichment of Targetable Mutations in the Relapsed Neuroblastoma Genome

    PubMed Central

    Ostrovnaya, Irina; Rubnitz, Kaitlyn R.; Ali, Siraj M.; Miller, Vincent A.; Mossé, Yael P.; Maris, John M.

    2016-01-01

    Neuroblastoma is characterized by a relative paucity of recurrent somatic mutations at diagnosis. However, recent studies have shown that the mutational burden increases at relapse, likely as a result of clonal evolution of mutation-carrying cells during primary treatment. To inform the development of personalized therapies, we sought to further define the frequency of potentially actionable mutations in neuroblastoma, both at diagnosis and after chemotherapy. We performed a retrospective study to determine mutation frequency, the only inclusion criterion being availability of cancer gene panel sequencing data from Foundation Medicine. We analyzed 151 neuroblastoma tumor samples: 44 obtained at diagnosis, 42 at second look surgery or biopsy for stable disease after chemotherapy, and 59 at relapse (6 were obtained at unknown time points). Nine patients had multiple tumor biopsies. ALK was the most commonly mutated gene in this cohort, and we observed a higher frequency of suspected oncogenic ALK mutations in relapsed disease than at diagnosis. Patients with relapsed disease had, on average, a greater number of mutations reported to be recurrent in cancer, and a greater number of mutations in genes that are potentially targetable with available therapeutics. We also observed an enrichment of reported recurrent RAS/MAPK pathway mutations in tumors obtained after chemotherapy. Our data support recent evidence suggesting that neuroblastomas undergo substantial mutational evolution during therapy, and that relapsed disease is more likely to be driven by a targetable oncogenic pathway, highlighting that it is critical to base treatment decisions on the molecular profile of the tumor at the time of treatment. However, it will be necessary to conduct prospective clinical trials that match sequencing results to targeted therapeutic intervention to determine if cancer genomic profiling improves patient outcomes. PMID:27997549

  6. [Research progress in developing reporter systems for the enrichment of positive cells with targeted genome modification].

    PubMed

    Bai, Yichun; Xu, Kun; Wei, Zehui; Ma, Zheng; Zhang, Zhiying

    2016-01-01

    Targeted genome editing technology plays an important role in studies of gene function, gene therapy and transgenic breeding. Moreover, the efficiency of targeted genome editing is increased dramatically with the application of recently developed artificial nucleases such as ZFNs, TALENs and CRISPR/Cas9. However, obtaining positive cells with targeted genome modification is restricted to some extent by nuclease expression plasmid transfection efficiency, nuclease expression and activity, and repair efficiency after genome editing. Thus, the enrichment and screening of positive cells with targeted genome modification remains a problem that needs to be solved. Surrogate reporter systems could be used to reflect the efficiency of nucleases indirectly and enrich genetically modified positive cells effectively, which may increase the efficiency of the enrichment and screening of positive cells with targeted genome modification. In this review, we mainly summarize the principles and applications of reporter systems based on NHEJ and SSA repair mechanisms, which may provide references for related studies in the future.

  7. Enrichment of chemical libraries docked to protein conformational ensembles and application to aldehyde dehydrogenase 2.

    PubMed

    Wang, Bo; Buchman, Cameron D; Li, Liwei; Hurley, Thomas D; Meroueh, Samy O

    2014-07-28

    Molecular recognition is a complex process that involves a large ensemble of structures of the receptor and ligand. Yet, most structure-based virtual screening is carried out on a single structure typically from X-ray crystallography. Explicit-solvent molecular dynamics (MD) simulations offer an opportunity to sample multiple conformational states of a protein. Here we evaluate our recently developed scoring method SVMSP in its ability to enrich chemical libraries docked to MD structures of seven proteins from the Directory of Useful Decoys (DUD). SVMSP is a target-specific rescoring method that combines machine learning with statistical potentials. We find that enrichment power as measured by the area under the ROC curve (ROC-AUC) is not affected by increasing the number of MD structures. Among individual MD snapshots, many exhibited enrichment that was significantly better than the crystal structure, but no correlation between enrichment and structural deviation from crystal structure was found. We followed an innovative approach by training SVMSP scoring models using MD structures (SVMSPMD). The resulting models were applied to two difficult cases (p38 and CDK2) for which enrichment was not better than random. We found remarkable increase in enrichment power, particularly for p38, where the ROC-AUC increased by 0.30 to 0.85. Finally, we explored approaches for a priori identification of MD snapshots with high enrichment power from an MD simulation in the absence of active compounds. We found that the use of randomly selected compounds docked to the target of interest using SVMSP led to notable enrichment for EGFR and Src MD snapshots. SVMSP rescoring of protein-compound MD structures was applied for the search of small-molecule inhibitors of the mitochondrial enzyme aldehyde dehydrogenase 2 (ALDH2). Rank-ordering of a commercial library of 50 000 compounds docked to MD structures of ALDH2 led to five small-molecule inhibitors. Four compounds had IC50s below 5
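
    The enrichment metric used in this abstract, the area under the ROC curve, can be computed directly from docking or rescoring scores. The sketch below uses the pairwise (Mann-Whitney) formulation; it is not the SVMSP method itself, the function name is hypothetical, the scores are invented, and it assumes that a higher score means a compound is predicted to be more likely active.

```python
def roc_auc(active_scores, decoy_scores):
    """Probability that a random active outscores a random decoy.

    Ties count as half a win. A value of 0.5 corresponds to random ranking
    and 1.0 to perfect separation of actives from decoys.
    """
    if not active_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


# Invented scores for three actives and four decoys.
actives = [0.9, 0.8, 0.4]
decoys = [0.7, 0.3, 0.2, 0.1]
print(round(roc_auc(actives, decoys), 3))
```

    The quadratic loop is fine for small illustrative sets; for libraries of tens of thousands of compounds a rank-based implementation would be the usual choice.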

  8. Enrichment of Chemical Libraries Docked to Protein Conformational Ensembles and Application to Aldehyde Dehydrogenase 2

    PubMed Central

    2015-01-01

    Molecular recognition is a complex process that involves a large ensemble of structures of the receptor and ligand. Yet, most structure-based virtual screening is carried out on a single structure typically from X-ray crystallography. Explicit-solvent molecular dynamics (MD) simulations offer an opportunity to sample multiple conformational states of a protein. Here we evaluate our recently developed scoring method SVMSP in its ability to enrich chemical libraries docked to MD structures of seven proteins from the Directory of Useful Decoys (DUD). SVMSP is a target-specific rescoring method that combines machine learning with statistical potentials. We find that enrichment power as measured by the area under the ROC curve (ROC-AUC) is not affected by increasing the number of MD structures. Among individual MD snapshots, many exhibited enrichment that was significantly better than the crystal structure, but no correlation between enrichment and structural deviation from crystal structure was found. We followed an innovative approach by training SVMSP scoring models using MD structures (SVMSPMD). The resulting models were applied to two difficult cases (p38 and CDK2) for which enrichment was not better than random. We found remarkable increase in enrichment power, particularly for p38, where the ROC-AUC increased by 0.30 to 0.85. Finally, we explored approaches for a priori identification of MD snapshots with high enrichment power from an MD simulation in the absence of active compounds. We found that the use of randomly selected compounds docked to the target of interest using SVMSP led to notable enrichment for EGFR and Src MD snapshots. SVMSP rescoring of protein–compound MD structures was applied for the search of small-molecule inhibitors of the mitochondrial enzyme aldehyde dehydrogenase 2 (ALDH2). Rank-ordering of a commercial library of 50 000 compounds docked to MD structures of ALDH2 led to five small-molecule inhibitors. Four compounds had IC50s below

  9. Chromosome region-specific libraries for human genome analysis

    SciTech Connect

    Kao, Fa-Ten.

    1991-01-01

    We have made important progress since the beginning of the current grant year. We have further developed the microdissection and PCR-assisted microcloning techniques using the linker-adaptor method. We have critically evaluated the microdissection libraries constructed by this microtechnology and proved that they are of high quality. We further demonstrated that these microdissection clones are useful in identifying corresponding YAC clones for a thousand-fold expansion of the genomic coverage and for contig construction. We are also improving the technique of cloning the dissected fragments in a test tube by the TDT method. We are applying both of these PCR cloning techniques to human chromosomes 2 and 5 to construct region-specific libraries for physical mapping purposes of LLNL and LANL. Finally, we are exploring efficient procedures to use unique sequence microclones to isolate cDNA clones from defined chromosomal regions as valuable resources for identifying expressed gene sequences in the human genome. We believe that we are making important progress under the auspices of this DOE human genome program grant and we will continue to make significant contributions in the coming year. 4 refs., 4 figs.

  10. Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  11. SSW Library: An SIMD Smith-Waterman C/C++ Library for Use in Genomic Applications

    PubMed Central

    Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik P.; Marth, Gabor T.

    2013-01-01

    Background The Smith-Waterman algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools for next-generation sequencing data. Though various fast Smith-Waterman implementations have been developed, they are either designed as monolithic protein database searching tools, which do not return detailed alignment, or are embedded into other tools. These issues make reusing these efficient Smith-Waterman implementations impractical. Results To facilitate easy integration of the fast Single-Instruction-Multiple-Data Smith-Waterman algorithm into third-party software, we wrote a C/C++ library, which extends Farrar’s Striped Smith-Waterman (SSW) to return alignment information in addition to the optimal Smith-Waterman score. In this library we developed a new method to generate the full optimal alignment results and a suboptimal score in linear space at little cost of efficiency. This improvement makes the fast Single-Instruction-Multiple-Data Smith-Waterman genuinely useful in genomic applications. SSW is available both as a C/C++ software library and as a stand-alone alignment tool at: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library. Conclusions The SSW library has been used in the primary read mapping tool MOSAIK, the split-read mapping program SCISSORS, the MEI detector TANGRAM, and the read-overlap graph generation program RZMBLR. The speeds of the mentioned software are improved significantly by replacing their ordinary Smith-Waterman or banded Smith-Waterman module with the SSW Library. PMID:24324759
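
    As a rough illustration of the score computation that the abstract refers to, the following is a minimal scalar sketch of Smith-Waterman local alignment scoring that keeps only two rows of the dynamic-programming matrix (linear space). It is not the SSW library's API and does not use SIMD or Farrar's striped layout; the function name, the linear gap penalty, and the default scoring values are assumptions chosen for the example, and the sketch returns only the score, not the alignment.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of strings a and b (linear gap penalty).

    Hypothetical helper for illustration; scoring defaults are assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)            # previous row of the DP matrix
    for ca in a:
        curr = [0]                       # first column is always 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap           # gap opened in b
            left = curr[j - 1] + gap     # gap opened in a
            score = max(0, diag, up, left)   # 0 restarts the local alignment
            curr.append(score)
            best = max(best, score)
        prev = curr                      # only two rows are ever kept
    return best


print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

    Keeping two rows is enough for the score alone; recovering the full alignment in linear space, as the abstract describes, needs additional bookkeeping that this sketch leaves out.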

  12. SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications.

    PubMed

    Zhao, Mengyao; Lee, Wan-Ping; Garrison, Erik P; Marth, Gabor T

    2013-01-01

    The Smith-Waterman algorithm, which produces the optimal pairwise alignment between two sequences, is frequently used as a key component of fast heuristic read mapping and variation detection tools for next-generation sequencing data. Though various fast Smith-Waterman implementations have been developed, they are either designed as monolithic protein database searching tools, which do not return detailed alignment, or are embedded into other tools. These issues make reusing these efficient Smith-Waterman implementations impractical. To facilitate easy integration of the fast Single-Instruction-Multiple-Data Smith-Waterman algorithm into third-party software, we wrote a C/C++ library, which extends Farrar's Striped Smith-Waterman (SSW) to return alignment information in addition to the optimal Smith-Waterman score. In this library we developed a new method to generate the full optimal alignment results and a suboptimal score in linear space at little cost of efficiency. This improvement makes the fast Single-Instruction-Multiple-Data Smith-Waterman genuinely useful in genomic applications. SSW is available both as a C/C++ software library and as a stand-alone alignment tool at: https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library. The SSW library has been used in the primary read mapping tool MOSAIK, the split-read mapping program SCISSORS, the MEI detector TANGRAM, and the read-overlap graph generation program RZMBLR. The speeds of the mentioned software are improved significantly by replacing their ordinary Smith-Waterman or banded Smith-Waterman module with the SSW Library.

  13. The infectious BAC genomic DNA expression library: a high capacity vector system for functional genomics

    PubMed Central

    Lufino, Michele M. P.; Edser, Pauline A. H.; Quail, Michael A.; Rice, Stephen; Adams, David J.; Wade-Martins, Richard

    2016-01-01

    Gene dosage plays a critical role in a range of cellular phenotypes, yet most cellular expression systems use heterologous cDNA-based vectors which express proteins well above physiological levels. In contrast, genomic DNA expression vectors generate physiologically-relevant levels of gene expression by carrying the whole genomic DNA locus of a gene including its regulatory elements. Here we describe the first genomic DNA expression library generated using the high-capacity herpes simplex virus-1 amplicon technology to deliver bacterial artificial chromosomes (BACs) into cells by viral transduction. The infectious BAC (iBAC) library contains 184,320 clones with an average insert size of 134.5 kb. We show in a Chinese hamster ovary (CHO) disease model cell line and mouse embryonic stem (ES) cells that this library can be used for genetic rescue studies in a range of contexts including the physiological restoration of Ldlr deficiency, and viral receptor expression. The iBAC library represents an important new genetic analysis tool openly available to the research community. PMID:27353647

  14. Optimizing restriction fragment fingerprinting methods for ordering large genomic libraries

    SciTech Connect

    Branscomb, E.; Slezak, T.; Pae, R.; Carrano, A.V.; Galas, D.; Waterman, M.

    1990-01-01

    The authors present a statistical analysis of the problem of ordering large genomic cloned libraries through overlap detection based on restriction fingerprinting. Such ordering projects involve a large investment of effort involving many repetitious experiments. Their primary purpose here is to provide methods of maximizing the efficiency of such efforts. To this end, they adopt a statistical approach that uses the likelihood ratio as a statistic to detect overlap. The main advantages of this approach are that (1) it allows the relatively straightforward incorporation of the observed statistical properties of the data; (2) it permits the efficiency of a particular experimental method for detecting overlap to be quantitatively defined so that alternative experimental designs may be compared and optimized; and (3) it yields a direct estimate of the probability that any two library members overlap. This estimate is a critical tool for the accurate, automatic assembly of overlapping sets of fragments into islands called contigs. These contigs must subsequently be connected by other methods to provide an ordered set of overlapping fragments covering the entire genome.
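
    The idea of using a likelihood ratio to detect overlap, and turning it into an overlap probability, can be sketched with a deliberately simple model. The abstract does not give the authors' statistical model, so everything below is an assumption made for illustration: each compared fingerprint band is treated as an independent trial that matches with probability p_overlap if the clones overlap and p_chance if they do not, and the prior overlap probability is a made-up value.

```python
import math


def log_likelihood_ratio(shared, compared, p_overlap, p_chance):
    """Binomial log-likelihood ratio: overlap vs. no overlap (toy model).

    shared   -- number of matching bands between two clones
    compared -- number of bands compared
    The binomial coefficient cancels in the ratio, so it is omitted.
    """
    def log_lik(p):
        return shared * math.log(p) + (compared - shared) * math.log(1.0 - p)

    return log_lik(p_overlap) - log_lik(p_chance)


def posterior_overlap(llr, prior):
    """Convert a log-likelihood ratio into a posterior overlap probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(llr)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: 8 of 10 bands match, 1% of clone pairs overlap a priori.
llr = log_likelihood_ratio(shared=8, compared=10, p_overlap=0.7, p_chance=0.1)
print(llr, posterior_overlap(llr, prior=0.01))
```

    Under these assumptions, pairs whose posterior exceeds a chosen threshold would be candidates for joining into the same contig.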

  15. Draft Genome Sequence of Antarctic Methanogen Enriched from Dry Valley Permafrost

    PubMed Central

    Buongiorno, Joy; Bird, Jordan T.; Krivushin, Kirill; Oshurkova, Victoria; Shcherbakova, Victoria; Rivkina, Elizaveta M.

    2016-01-01

    A genomic reconstruction belonging to the genus Methanosarcina was assembled from metagenomic data from a methane-producing enrichment of Antarctic permafrost. This is the first methanogen genome reported from permafrost of the Dry Valleys and can help shed light on future climate-affected methane dynamics. PMID:27932654

  16. Genome-wide enrichment screening reveals multiple targets and resistance genes for triclosan in Escherichia coli.

    PubMed

    Yu, Byung Jo; Kim, Jung Ae; Ju, Hyun Mok; Choi, Soo-Kyung; Hwang, Seung Jin; Park, Sungyoo; Kim, Euijoong; Pan, Jae-Gu

    2012-10-01

    Triclosan is a widely used biocide effective against different microorganisms. At bactericidal concentrations, triclosan appears to affect multiple targets, while at bacteriostatic concentrations, triclosan targets FabI. The site-specific, antibiotic-like mode of action and the widespread use of triclosan in household products have been claimed to possibly induce cross-resistance to other antibiotics. Thus, we set out to define more systematically the genes conferring resistance to triclosan. A genomic library of Escherichia coli strain W3110 was constructed and enriched in a selective medium containing a lethal concentration of triclosan. The genes enabling growth in the presence of triclosan were identified by using a DNA microarray and subsequently confirmed using ASKA clones overexpressing the 62 selected candidate genes. Among these, forty-seven genes were further confirmed to enhance the resistance to triclosan; these genes, including the FabI target, were involved in inner or outer membrane synthesis, cell-surface material synthesis, transcriptional activation, sugar phosphotransferase (PTS) systems, various transporter systems, cell division, and ATPase and reductase/dehydrogenase reactions. In particular, overexpression of pgsA, rcsA, or gapC conferred on E. coli cells a level of triclosan resistance similar to that induced by fabI overexpression. These results indicate that triclosan may have multiple targets other than the well-known FabI and that there are several undefined novel mechanisms for the development of resistance to triclosan, thus probably inducing cross-resistance to antibiotics.

  17. A library of TAL effector nucleases spanning the human genome.

    PubMed

    Kim, Yongsub; Kweon, Jiyeon; Kim, Annie; Chon, Jae Kyung; Yoo, Ji Yeon; Kim, Hye Joo; Kim, Sojung; Lee, Choongil; Jeong, Euihwan; Chung, Eugene; Kim, Doyoung; Lee, Mi Seon; Go, Eun Mi; Song, Hye Jung; Kim, Hwangbeom; Cho, Namjin; Bang, Duhee; Kim, Seokjoong; Kim, Jin-Soo

    2013-03-01

    Transcription activator-like (TAL) effector nucleases (TALENs) can be readily engineered to bind specific genomic loci, enabling the introduction of precise genetic modifications such as gene knockouts and additions. Here we present a genome-scale collection of TALENs for efficient and scalable gene targeting in human cells. We chose target sites that did not have highly similar sequences elsewhere in the genome to avoid off-target mutations and assembled TALEN plasmids for 18,740 protein-coding genes using a high-throughput Golden-Gate cloning system. A pilot test involving 124 genes showed that all TALENs were active and disrupted their target genes at high frequencies, although two of these TALENs became active only after their target sites were partially demethylated using an inhibitor of DNA methyltransferase. We used our TALEN library to generate single- and double-gene-knockout cells in which NF-κB signaling pathways were disrupted. Compared with cells treated with short interfering RNAs, these cells showed unambiguous suppression of signal transduction.

  18. Mining non-model genomic libraries for microsatellites: BAC versus EST libraries and the generation of allelic richness

    PubMed Central

    2010-01-01

    Background Simple sequence repeats (SSRs) are tandemly repeated sequence motifs common in genomic nucleotide sequence that often harbor significant variation in repeat number. Frequently used as molecular markers, SSRs are increasingly identified via in silico approaches. Two common classes of genomic resources that can be mined are bacterial artificial chromosome (BAC) libraries and expressed sequence tag (EST) libraries. Results 288 SSR loci were screened in the rapidly radiating Hawaiian swordtail cricket genus Laupala. SSRs were more densely distributed and contained longer repeat structures in BAC library-derived sequence than in EST library-derived sequence, although neither repeat density nor length was exceptionally elevated despite the relatively large genome size of Laupala. A non-random distribution favoring AT-rich SSRs was observed. Allelic diversity of SSRs was positively correlated with repeat length and was generally higher in AT-rich repeat motifs. Conclusion The first large-scale survey of Orthopteran SSR allelic diversity is presented. Selection contributes more strongly to the size and density distributions of SSR loci derived from EST library sequence than from BAC library sequence, although all SSRs likely are subject to similar physical and structural constraints, such as slippage of DNA replication machinery, that may generate increased allelic diversity in AT-rich sequence motifs. Although in silico approaches work well for SSR locus identification in both EST and BAC libraries, BAC library sequence and AT-rich repeat motifs are generally superior SSR development resources for most applications. PMID:20624300
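
    To make the in silico identification of SSRs concrete, here is a minimal regular-expression sketch that scans a DNA string for tandem repeats. It is not the pipeline used in the study; the function name and the thresholds (repeat units of 2 to 6 bases, at least four copies) are assumptions picked for the example, and mononucleotide runs are skipped.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat found in seq.

    Hypothetical helper: thresholds are illustrative defaults, not values
    taken from the abstract.
    """
    # A lazily matched unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if len(set(unit)) == 1:          # skip homopolymer runs such as AAAAAAAA
            continue
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


print(find_ssrs("GGATATATATATCC"))
```

    Repeat length (copies times unit length) and the base composition of the unit can then be tabulated per library, which is the kind of comparison between BAC-derived and EST-derived sequence that the abstract describes.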

  19. A targeted enrichment strategy for massively parallel sequencing of angiosperm plastid genomes

    PubMed Central

    Stull, Gregory W.; Moore, Michael J.; Mandala, Venkata S.; Douglas, Norman A.; Kates, Heather-Rose; Qi, Xinshuai; Brockington, Samuel F.; Soltis, Pamela S.; Soltis, Douglas E.; Gitzendanner, Matthew A.

    2013-01-01

    • Premise of the study: We explored a targeted enrichment strategy to facilitate rapid and low-cost next-generation sequencing (NGS) of numerous complete plastid genomes from across the phylogenetic breadth of angiosperms. • Methods and Results: A custom RNA probe set including the complete sequences of 22 previously sequenced eudicot plastomes was designed to facilitate hybridization-based targeted enrichment of eudicot plastid genomes. Using this probe set and an Agilent SureSelect targeted enrichment kit, we conducted an enrichment experiment including 24 angiosperms (22 eudicots, two monocots), which were subsequently sequenced on a single lane of the Illumina GAIIx with single-end, 100-bp reads. This approach yielded nearly complete to complete plastid genomes with exceptionally high coverage (mean coverage: 717×), even for the two monocots. • Conclusions: Our enrichment experiment was highly successful even though many aspects of the capture process employed were suboptimal. Hence, significant improvements to this methodology are feasible. With this general approach and probe set, it should be possible to sequence more than 300 essentially complete plastid genomes in a single Illumina GAIIx lane (achieving ∼50× mean coverage). However, given the complications of pooling numerous samples for multiplex sequencing and the limited number of barcodes (e.g., 96) available in commercial kits, we recommend 96 samples as a current practical maximum for multiplex plastome sequencing. This high-throughput approach should facilitate large-scale plastid genome sequencing at any level of phylogenetic diversity in angiosperms. PMID:25202518
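
    The estimate of more than 300 plastomes per lane can be checked with simple arithmetic, under the assumption (not stated in the abstract) that mean coverage per sample scales inversely with the number of samples pooled and that enrichment efficiency stays the same. The function name below is hypothetical.

```python
def max_samples_per_lane(observed_mean_cov, observed_samples, target_mean_cov):
    """Samples that fit in one lane at target_mean_cov, assuming coverage
    is shared evenly and scales inversely with the number of samples."""
    total_coverage = observed_mean_cov * observed_samples   # sample-fold units
    return int(total_coverage // target_mean_cov)


# 24 samples at a mean of 717x, rescaled to a 50x target.
print(max_samples_per_lane(717, 24, 50))   # 344
```

    The result of 344 is consistent with the abstract's "more than 300"; the recommended practical maximum of 96 samples reflects barcode availability and pooling complications rather than coverage.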

  20. Halogen-Enriched Fragment Libraries as Leads for Drug Rescue of Mutant p53

    PubMed Central

    2012-01-01

    The destabilizing p53 cancer mutation Y220C creates a druggable surface crevice. We developed a strategy exploiting halogen bonding for lead discovery to stabilize the mutant with small molecules. We designed halogen-enriched fragment libraries (HEFLibs) as starting points to complement classical approaches. From screening of HEFLibs and subsequent structure-guided design, we developed substituted 2-(aminomethyl)-4-ethynyl-6-iodophenols as p53-Y220C stabilizers. Crystal structures of their complexes highlight two key features: (i) a central scaffold with a robust binding mode anchored by halogen bonding of an iodine with a main-chain carbonyl and (ii) an acetylene linker, enabling the targeting of an additional subsite in the crevice. The best binders showed induction of apoptosis in a human cancer cell line with homozygous Y220C mutation. Our structural and biophysical data suggest a more widespread applicability of HEFLibs in drug discovery. PMID:22439615

  1. Cost-effective enrichment hybridization capture of chloroplast genomes at deep multiplexing levels for population genetics and phylogeography studies.

    PubMed

    Mariac, Cédric; Scarcelli, Nora; Pouzadou, Juliette; Barnaud, Adeline; Billot, Claire; Faye, Adama; Kougbeadjo, Ayite; Maillol, Vincent; Martin, Guillaume; Sabot, François; Santoni, Sylvain; Vigouroux, Yves; Couvreur, Thomas L P

    2014-11-01

    Biodiversity, phylogeography and population genetic studies will be revolutionized by access to large data sets thanks to next-generation sequencing methods. In this study, we develop an easy and cost-effective protocol for in-solution enrichment hybridization capture of complete chloroplast genomes applicable at deep-multiplexed levels. The protocol uses cheap in-house species-specific probes developed via long-range PCR of the entire chloroplast. Barcoded libraries are constructed, and in-solution enrichment of the chloroplasts is carried out using the probes. This protocol was tested and validated on six economically important West African crop species, namely African rice, pearl millet, three African yam species and fonio. For pearl millet, we also demonstrate the effectiveness of this protocol to retrieve 95% of the sequence of the whole chloroplast on 95 multiplexed individuals in a single MiSeq run at a success rate of 95%. This new protocol allows whole chloroplast genomes to be retrieved at a modest cost and will allow unprecedented resolution for closely related species in phylogeography studies using plastomes.

  2. A human genome-wide library of local phylogeny predictions for whole-genome inference problems

    PubMed Central

    Sridhar, Srinath; Schwartz, Russell

    2008-01-01

    Background Many common inference problems in computational genetics depend on inferring aspects of the evolutionary history of a data set given a set of observed modern sequences. Detailed predictions of the full phylogenies are therefore of value in improving our ability to make further inferences about population history and sources of genetic variation. Making phylogenetic predictions on the scale needed for whole-genome analysis is, however, extremely computationally demanding. Results In order to facilitate phylogeny-based predictions on a genomic scale, we develop a library of maximum parsimony phylogenies within local regions spanning all autosomal human chromosomes based on Haplotype Map variation data. We demonstrate the utility of this library for population genetic inferences by examining a tree statistic we call 'imperfection,' which measures the reuse of variant sites within a phylogeny. This statistic is significantly predictive of recombination rate, shows additional regional and population-specific conservation, and allows us to identify outlier genes likely to have experienced unusual amounts of variation in recent human history. Conclusion Recent theoretical advances in algorithms for phylogenetic tree reconstruction have made it possible to perform large-scale inferences of local maximum parsimony phylogenies from single nucleotide polymorphism (SNP) data. As results from the imperfection statistic demonstrate, phylogeny predictions encode substantial information useful for detecting genomic features and population history. This data set should serve as a platform for many kinds of inferences one may wish to make about human population history and genetic variation. PMID:18710563

  3. Construction of a llama bacterial artificial chromosome library with approximately 9-fold genome equivalent coverage.

    PubMed

    Airmet, K W; Hinckley, J D; Tree, L T; Moss, M; Blumell, S; Ulicny, K; Gustafson, A K; Weed, M; Theodosis, R; Lehnardt, M; Genho, J; Stevens, M R; Kooyman, D L

    2012-01-01

    The llama is an important agricultural livestock in much of South America. The llama is increasing in popularity in the United States as a companion animal. Little work has been done to improve llama production using modern technology. A paucity of information is available regarding the llama genome. We report the construction of a llama bacterial artificial chromosome (BAC) library of about 196,224 clones in the vector pECBAC1. Using flow cytometry and bovine, human, mouse, and chicken as controls, we determined the llama genome size to be 2.4 × 10⁹ bp. The average insert size of the library is 137.8 kb corresponding to approximately 9-fold genome coverage. Further studies are needed to characterize the library and llama genome. We anticipate that this new library will help facilitate future genomic studies in the llama.
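
    Genome-equivalent coverage of a clone library is commonly estimated as the number of clones times the mean insert size, divided by the genome size. The sketch below applies that standard formula to the figures quoted in the abstract; the function names are hypothetical. Note that the plain product of the quoted figures comes out somewhat above the reported value of approximately 9-fold, and the abstract does not say how the reported figure was derived (for example, whether clones without inserts were excluded), so the code should be read as the textbook calculation only.

```python
import math


def genome_equivalents(n_clones, mean_insert_bp, genome_size_bp):
    """Fold coverage = total cloned bases / genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


def clones_for_coverage(fold, mean_insert_bp, genome_size_bp):
    """Number of insert-containing clones needed to reach a given fold coverage."""
    return math.ceil(fold * genome_size_bp / mean_insert_bp)


# Figures quoted in the abstract: 196,224 clones, 137.8 kb inserts, 2.4e9 bp genome.
naive_fold = genome_equivalents(196_224, 137_800, 2.4e9)
print(round(naive_fold, 2))
print(clones_for_coverage(9, 137_800, 2.4e9))
```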

  4. Identification of Promoter Regions in the Human Genome by Using a Retroviral Plasmid Library-Based Functional Reporter Gene Assay

    PubMed Central

    Khambata-Ford, Shirin; Liu, Yueyi; Gleason, Christopher; Dickson, Mark; Altman, Russ B.; Batzoglou, Serafim; Myers, Richard M.

    2003-01-01

    Attempts to identify regulatory sequences in the human genome have involved experimental and computational methods such as cross-species sequence comparisons and the detection of transcription factor binding-site motifs in coexpressed genes. Although these strategies provide information on which genomic regions are likely to be involved in gene regulation, they do not give information on their functions. We have developed a functional selection for promoter regions in the human genome that uses a retroviral plasmid library-based system. This approach enriches for and detects promoter function of isolated DNA fragments in an in vitro cell culture assay. By using this method, we have discovered likely promoters of known and predicted genes, as well as many other putative promoter regions based on the presence of features such as CpG islands. Comparison of sequences of 858 plasmid clones selected by this assay with the human genome draft sequence indicates that a significantly higher percentage of sequences align to the 500-bp segment upstream of the transcription start sites of known genes than would be expected from random genomic sequences. We also observed enrichment for putative promoter regions of genes predicted in at least two annotation databases and for clones overlapping with CpG islands. Functional validation of randomly selected clones enriched by this method showed that a large fraction of these putative promoters can drive the expression of a reporter gene in transient transfection experiments. This method promises to be a useful genome-wide function-based approach that can complement existing methods to look for promoters. PMID:12805274

  5. Halogen-enriched fragment libraries as chemical probes for harnessing halogen bonding in fragment-based lead discovery.

    PubMed

    Zimmermann, Markus O; Lange, Andreas; Wilcken, Rainer; Cieslik, Markus B; Exner, Thomas E; Joerger, Andreas C; Koch, Pierre; Boeckler, Frank M

    2014-04-01

    Halogen bonding has recently experienced a renaissance, gaining increased recognition as a useful molecular interaction in the life sciences. Halogen bonds are favorable, fairly directional interactions between an electropositive region on the halogen (the σ-hole) and a number of different nucleophilic interaction partners. Some aspects of halogen bonding are not yet understood well enough to take full advantage of its potential in drug discovery. We describe and present the concept of halogen-enriched fragment libraries. These libraries consist of unique chemical probes, facilitating the identification of favorable halogen bonds by sharing the advantages of classical fragment-based screening. Besides providing insights into the nature and applicability of halogen bonding, halogen-enriched fragment libraries provide smart starting points for hit-to-lead evolution.

  6. Gap Closing/Finishing by Targeted Genomic Region Enrichment and Sequencing

    SciTech Connect

    Singh, Kanwar; Froula, Jeff; Trice, Hope; Pennacchio, Len A.; Chen, Feng

    2010-05-27

    Gap Closing/Finishing of draft genome assemblies is a labor- and cost-intensive process in which several rounds of repetitious amplification and sequencing are required. Here we demonstrate a high-throughput procedure in which custom primers flanking gaps in draft genomes are designed. Primer libraries containing up to 4,000 unique pairs in independent droplets are merged with a fragmented genomic template. From this, millions of picoliter-scale droplets are formed, each one being the functional equivalent of an individual PCR reaction. The PCR products are concatenated and sequenced on the Illumina platform, and the resulting reads are assembled and used for gap closure. Here we present an overall experimental strategy, primer design algorithm and initial results.

  7. Comparative genomics of two newly isolated Dehalococcoides strains and an enrichment using a genus microarray

    PubMed Central

    Lee, Patrick K H; Cheng, Dan; Hu, Ping; West, Kimberlee A; Dick, Gregory J; Brodie, Eoin L; Andersen, Gary L; Zinder, Stephen H; He, Jianzhong; Alvarez-Cohen, Lisa

    2011-01-01

    Comparative genomics of Dehalococcoides strains and an enrichment were performed using a microarray targeting genes from all available sequenced genomes of the Dehalococcoides genus. The microarray was designed with 4305 probe sets to target 98.6% of the open-reading frames from strains 195, CBDB1, BAV1 and VS. The microarrays were validated and applied to query the genomes of two recently isolated Dehalococcoides strains, ANAS1 and ANAS2, and their enrichment source (ANAS) to understand the genome–physiology relationships. Strains ANAS1 and ANAS2 can both couple the reduction of trichloroethene, cis-dichloroethene (DCE) and 1,1-DCE, but not tetrachloroethene and trans-DCE with growth, whereas only strain ANAS2 couples vinyl chloride reduction to growth. Comparative genomic analysis showed that the genomes of both strains are similar to each other and to strain 195, except for genes that are within the previously defined integrated elements or high-plasticity regions. Combined results of the two isolates closely matched the results obtained using genomic DNA of the ANAS enrichment. The genome similarities, together with the distinct chlorinated ethene usage of strains ANAS1, ANAS2 and 195 demonstrate that closely phylogenetically related strains can be physiologically different. This incongruence between physiology and core genome phylogeny seems to be related to the presence of distinct reductive dehalogenase-encoding genes with assigned chlorinated ethene functions (pceA, tceA in strain 195; tceA in strain ANAS1; vcrA in strain ANAS2). Overall, the microarrays are a valuable high-throughput tool for comparative genomics of unsequenced Dehalococcoides-containing samples to provide insights into their gene content and dechlorination functions. PMID:21228894

  8. Construction of a Full-Length Enriched cDNA Library and Preliminary Analysis of Expressed Sequence Tags from Bengal Tiger Panthera tigris tigris

    PubMed Central

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-01-01

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105

  9. Construction of a full-length enriched cDNA library and preliminary analysis of expressed sequence tags from Bengal Tiger Panthera tigris tigris.

    PubMed

    Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2013-05-24

    In this study, a full-length enriched cDNA library was successfully constructed from Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL respectively. The percentage of recombinants from the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N- and C-termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.

  10. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for Illumina genome sequencing

    SciTech Connect

    Angelova, Angelina; Park, Sang-Hycuk; Kyndt, John; Fitzsimmons, Kevin; Brown, Judith K

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 bp (84.57 Kb) in size, with a GC content of 30.8%. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. protothecoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  11. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections

    PubMed Central

    Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe

    2016-01-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3. PMID:27703859

  12. Targeted genomic enrichment and sequencing of CyHV-3 from carp tissues confirms low nucleotide diversity and mixed genotype infections.

    PubMed

    Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe; Avarre, Jean-Christophe

    2016-01-01

    Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus is relatively abundant in the literature, still little is known about its genomic diversity and about the molecular mechanisms that lead to such a high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched with CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated to the estimated number of viral copies contained in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2×10^7. The average sequencing depth was >200 for all samples, thus allowing the search for variants with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% of sequence identity). By enabling full genome comparisons directly from infected fish tissues, this new method will be valuable to trace outbreaks rapidly and at a reasonable cost, and in turn to understand the transmission routes of CyHV-3.

  13. Application of a simplified method of chloroplast enrichment to small amounts of tissue for chloroplast genome sequencing.

    PubMed

    Sakaguchi, Shota; Ueno, Saneyoshi; Tsumura, Yoshihiko; Setoguchi, Hiroaki; Ito, Motomi; Hattori, Chie; Nozoe, Shogo; Takahashi, Daiki; Nakamasu, Riku; Sakagami, Taishi; Lannuzel, Guillaume; Fogliani, Bruno; Wulff, Adrien S; L'Huillier, Laurent; Isagi, Yuji

    2017-05-01

    High-throughput sequencing of genomic DNA can recover complete chloroplast genome sequences, but the sequence data are usually dominated by sequences from nuclear/mitochondrial genomes. To overcome this deficiency, a simple enrichment method for chloroplast DNA from small amounts of plant tissue was tested for eight plant species including a gymnosperm and various angiosperms. Chloroplasts were enriched using a high-salt isolation buffer without any step gradient procedures, and enriched chloroplast DNA was sequenced by multiplexed high-throughput sequencing. Using this simple method, significant enrichment of chloroplast DNA-derived reads was attained, allowing deep sequencing of chloroplast genomes. As an example, the chloroplast genome of the conifer Callitris sulcata was assembled, from which polymorphic microsatellite loci were isolated successfully. This chloroplast enrichment method from small amounts of plant tissue will be particularly useful for studies that use sequencers with relatively small throughput and that cannot use large amounts of tissue (e.g., for endangered species).

  14. Enrichment of genomic DNA for polymorphism detection in a non-model highly polyploid crop plant.

    PubMed

    Bundock, Peter C; Casu, Rosanne E; Henry, Robert J

    2012-08-01

    Large polyploid genomes of non-model species remain challenging targets for DNA polymorphism discovery despite the increasing throughput and continued reductions in cost of sequencing with new technologies. For these species especially, there remains a requirement to enrich genomic DNA to discover polymorphisms in regions of interest because of large genome size and to provide the sequence depth to enable estimation of copy number. Various methods of enriching DNA have been utilised, but some recent methods enable the efficient sampling of large regions (e.g. the exome). We have utilised one of these methods, solution-based hybridization (Agilent SureSelect), to capture regions of the genome of two sugarcane genotypes (one Saccharum officinarum and one Saccharum hybrid) based mainly on gene sequences from the close relative Sorghum bicolor. The capture probes span approximately 5.8 megabases (Mb). The enrichment over whole-genome shotgun sequencing was 10-11-fold for the two genotypes tested. This level of enrichment has important consequences for detecting single nucleotide polymorphisms (SNPs) from a single lane of Illumina (Genome Analyzer) sequence reads. The detection of polymorphisms was enabled by the depth of sequence at or near probe sites and enabled the detection of 270 000-280 000 SNPs within each genotype from a single lane of sequence using stringent detection parameters. The SNPs were present in 13 000-16 000 targeted genes, which would enable mapping of a large number of these chosen genes. SNP validation from 454 sequencing and between-genotype confirmations gave an 87%-91% validation rate.
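
    The 10-11-fold figure above can be read as the ratio between the fraction of reads that land on target after capture and the fraction expected from unenriched shotgun sequencing. The sketch below assumes that definition; the abstract does not say how the value was computed. The function name, the genome size and the read counts are hypothetical placeholders, and only the 5.8 Mb probe span comes from the record.

```python
def fold_enrichment(on_target_reads: int, total_reads: int,
                    target_bp: float, genome_bp: float) -> float:
    """Observed on-target read fraction divided by the fraction expected by chance."""
    if total_reads <= 0 or target_bp <= 0 or genome_bp <= 0:
        raise ValueError("read totals and sizes must be positive")
    observed_fraction = on_target_reads / total_reads
    expected_fraction = target_bp / genome_bp  # what shotgun sequencing would give
    return observed_fraction / expected_fraction


TARGET_BP = 5.8e6   # probe span stated in the abstract
GENOME_BP = 10e9    # ASSUMPTION: placeholder genome size, not from the abstract

# ASSUMPTION: made-up read counts chosen only to illustrate the arithmetic.
example = fold_enrichment(on_target_reads=6_000, total_reads=1_000_000,
                          target_bp=TARGET_BP, genome_bp=GENOME_BP)
print(f"fold enrichment: {example:.1f}")
```

    With these placeholder inputs, 0.6% of reads on target against an expected 0.058% gives roughly a 10-fold enrichment, the same order as the reported value.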

  15. Robotic Enrichment Processing of Roche 454 Titanium Emulsion PCR at the DOE Joint Genome Institute

    SciTech Connect

    Hamilton, Matthew; Wilson, Steven; Bauer, Diane; Miller, Don; Duffy-Wei, Kecia; Hammon, Nancy; Lucas, Susan; Pollard, Martin; Cheng, Jan-Fang

    2010-05-28

    Enrichment of emulsion PCR product is the most laborious and pipette-intensive step in the 454 Titanium process, posing the biggest obstacle for production-oriented scale up. The Joint Genome Institute has developed a pair of custom-made robots based on the Microlab Star liquid handling deck manufactured by Hamilton to mediate the complexity and ergonomic demands of the 454 enrichment process. The robot includes a custom built centrifuge, magnetic deck positions, as well as heating and cooling elements. At present processing eight emulsion cup samples in a single 2.5 hour run, these robots are capable of processing up to 24 emulsion cup samples. Sample emulsions are broken using the standard 454 breaking process and transferred from a pair of 50ml conical tubes to a single 2ml tube and loaded on the robot. The robot performs the enrichment protocol and produces beads in 2ml tubes ready for counting. The robot follows the Roche 454 enrichment protocol with slight exceptions to the manner in which it resuspends beads via pipette mixing rather than vortexing and a set number of null bead removal washes. The robotic process is broken down in similar discrete steps: First Melt and Neutralization, Enrichment Primer Annealing, Enrichment Bead Incubation, Null Bead Removal, Second Melt and Neutralization and Sequencing Primer Annealing. Data indicating our improvements in enrichment efficiency and total number of bases per run will also be shown.

  16. Human genome libraries. Final progress report, February 1, 1994--August 31, 1997

    SciTech Connect

    Kao, Fa-Ten

    1998-01-01

    The goal of this program is to use a novel technology of chromosome microdissection and microcloning to construct chromosome region-specific libraries as resources for various human genome program studies. Region specific libraries have been constructed for the entire human chromosomes 2 and 18.

  17. Genomic Library Screens for Genes Involved in n-Butanol Tolerance in Escherichia coli

    PubMed Central

    Reyes, Luis H.; Almario, Maria P.; Kao, Katy C.

    2011-01-01

    Background: n-Butanol is a promising emerging biofuel, and recent metabolic engineering efforts have demonstrated the use of several microbial hosts for its production. However, most organisms have very low tolerance to n-butanol (up to 2% (v/v)), limiting the economic viability of this biofuel. The rational engineering of more robust n-butanol production hosts relies upon understanding the mechanisms involved in tolerance. However, the existing knowledge of genes involved in n-butanol tolerance is limited. The goal of this study is therefore to identify E. coli genes that are involved in n-butanol tolerance. Methodology/Principal Findings: Using a genomic library enrichment strategy, we identified approximately 270 genes that were enriched or depleted in n-butanol challenge. The effects of these candidate genes on n-butanol tolerance were experimentally determined using overexpression or deletion libraries. Among the 55 enriched genes tested, 11 were experimentally shown to confer enhanced tolerance to n-butanol when overexpressed compared to the wild-type. Among the 84 depleted genes tested, three conferred increased n-butanol resistance when deleted. The overexpressed genes that conferred the largest increase in n-butanol tolerance were related to iron transport and metabolism, entC and feoA, which increased the n-butanol tolerance by 32.8±4.0% and 49.1±3.3%, respectively. The deleted gene that resulted in the largest increase in resistance to n-butanol was astE, which enhanced n-butanol tolerance by 48.7±6.3%. Conclusions/Significance: We identified and experimentally verified 14 genes that decreased the inhibitory effect of n-butanol on E. coli. From the data, we were able to expand the current knowledge on the genes involved in n-butanol tolerance; the results suggest that an increased iron transport and metabolism and decreased acid resistance may enhance n-butanol tolerance. The genes and mechanisms identified in this study will be helpful in the

  18. Generating Exome Enriched Sequencing Libraries from Formalin-Fixed, Paraffin-Embedded Tissue DNA for Next-Generation Sequencing.

    PubMed

    Marosy, Beth A; Craig, Brian D; Hetrick, Kurt N; Witmer, P Dane; Ling, Hua; Griffith, Sean M; Myers, Benjamin; Ostrander, Elaine A; Stanford, Janet L; Brody, Lawrence C; Doheny, Kimberly F

    2017-01-11

    This unit describes a technique for generating exome-enriched sequencing libraries using DNA extracted from formalin-fixed paraffin-embedded (FFPE) samples. Utilizing commercially available kits, we present a low-input FFPE workflow starting with 50 ng of DNA. This procedure includes a repair step to address damage caused by FFPE preservation that improves sequence quality. Subsequently, libraries undergo an in-solution-targeted selection for exons, followed by sequencing using the Illumina next-generation short-read sequencing platform. © 2017 by John Wiley & Sons, Inc.

  19. Genomes of two new ammonia-oxidizing archaea enriched from deep marine sediments.

    PubMed

    Park, Soo-Je; Ghai, Rohit; Martín-Cuadrado, Ana-Belén; Rodríguez-Valera, Francisco; Chung, Won-Hyong; Kwon, KaeKyoung; Lee, Jung-Hyun; Madsen, Eugene L; Rhee, Sung-Keun

    2014-01-01

    Ammonia-oxidizing archaea (AOA) are ubiquitous and abundant and contribute significantly to the carbon and nitrogen cycles in the ocean. In this study, we assembled AOA draft genomes from two deep marine sediments from Donghae, South Korea, and Svalbard, Arctic region, by sequencing the enriched metagenomes. Three major microorganism clusters belonging to Thaumarchaeota, Epsilonproteobacteria, and Gammaproteobacteria were deduced from their 16S rRNA genes, GC contents, and oligonucleotide frequencies. Three archaeal genomes were identified, two of which were distinct and were designated Ca. "Nitrosopumilus koreensis" AR1 and "Nitrosopumilus sediminis" AR2. AR1 and AR2 exhibited average nucleotide identities of 85.2% and 79.5% to N. maritimus, respectively. The AR1 and AR2 genomes contained genes pertaining to energy metabolism and carbon fixation as conserved in other AOA, but, conversely, had fewer heme-containing proteins and more copper-containing proteins than other AOA. Most of the distinctive AR1 and AR2 genes were located in genomic islands (GIs) that were not present in other AOA genomes or in a reference water-column metagenome from the Sargasso Sea. A putative gene cluster involved in urea utilization was found in the AR2 genome, but not the AR1 genome, suggesting niche specialization in marine AOA. Co-cultured bacterial genome analysis suggested that bacterial sulfur and nitrogen metabolism could be involved in interactions with AOA. Our results provide fundamental information concerning the metabolic potential of deep marine sedimentary AOA.

  20. Designing Focused Chemical Libraries Enriched in Protein-Protein Interaction Inhibitors using Machine-Learning Methods

    PubMed Central

    Reynès, Christelle; Host, Hélène; Camproux, Anne-Claude; Laconde, Guillaume; Leroux, Florence; Mazars, Anne; Deprez, Benoit; Fahraeus, Robin; Villoutreix, Bruno O.; Sperandio, Olivier

    2010-01-01

    Protein-protein interactions (PPIs) may represent one of the next major classes of therapeutic targets. So far, only a minute fraction of the estimated 650,000 PPIs that comprise the human interactome are known with a tiny number of complexes being drugged. Such intricate biological systems cannot be cost-efficiently tackled using conventional high-throughput screening methods. Rather, time has come for designing new strategies that will maximize the chance for hit identification through a rationalization of the PPI inhibitor chemical space and the design of PPI-focused compound libraries (global or target-specific). Here, we train machine-learning-based models, mainly decision trees, using a dataset of known PPI inhibitors and of regular drugs in order to determine a global physico-chemical profile for putative PPI inhibitors. This statistical analysis unravels two important molecular descriptors for PPI inhibitors characterizing specific molecular shapes and the presence of a privileged number of aromatic bonds. The best model has been transposed into a computer program, PPI-HitProfiler, that can output from any drug-like compound collection a focused chemical library enriched in putative PPI inhibitors. Our PPI inhibitor profiler is challenged on the experimental screening results of 11 different PPIs among which the p53/MDM2 interaction screened within our own CDithem platform, that in addition to the validation of our concept led to the identification of 4 novel p53/MDM2 inhibitors. Collectively, our tool shows a robust behavior on the 11 experimental datasets by correctly profiling 70% of the experimentally identified hits while removing 52% of the inactive compounds from the initial compound collections. We strongly believe that this new tool can be used as a global PPI inhibitor profiler prior to screening assays to reduce the size of the compound collections to be experimentally screened while keeping most of the true PPI inhibitors. PPI-HitProfiler is

  1. Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera.

    PubMed

    Faircloth, Brant C; Branstetter, Michael G; White, Noor D; Brady, Seán G

    2015-05-01

    Gaining a genomic perspective on phylogeny requires the collection of data from many putatively independent loci across the genome. Among insects, an increasingly common approach to collecting this class of data involves transcriptome sequencing, because few insects have high-quality genome sequences available; assembling new genomes remains a limiting factor; the transcribed portion of the genome is a reasonable, reduced subset of the genome to target; and the data collected from transcribed portions of the genome are similar in composition to the types of data with which biologists have traditionally worked (e.g. exons). However, molecular techniques requiring RNA as a template, including transcriptome sequencing, are limited to using very high-quality source materials, which are often unavailable from a large proportion of biologically important insect samples. Recent research suggests that DNA-based target enrichment of conserved genomic elements offers another path to collecting phylogenomic data across insect taxa, provided that conserved elements are present in and can be collected from insect genomes. Here, we identify a large set (n = 1510) of ultraconserved elements (UCEs) shared among the insect order Hymenoptera. We used in silico analyses to show that these loci accurately reconstruct relationships among genome-enabled hymenoptera, and we designed a set of RNA baits (n = 2749) for enriching these loci that researchers can use with DNA templates extracted from a variety of sources. We used our UCE bait set to enrich an average of 721 UCE loci from 30 hymenopteran taxa, and we used these UCE loci to reconstruct phylogenetic relationships spanning very old (≥220 Ma) to very young (≤1 Ma) divergences among hymenopteran lineages. In contrast to a recent study addressing hymenopteran phylogeny using transcriptome data, we found ants to be sister to all remaining aculeate lineages with complete support, although this result could be explained by

  2. Rapid enrichment of leucocytes and genomic DNA from blood based on bifunctional core shell magnetic nanoparticles

    NASA Astrophysics Data System (ADS)

    Xie, Xin; Nie, Xiaorong; Yu, Bingbin; Zhang, Xu

    2007-04-01

    A series of protocols are proposed to extract genomic DNA from whole blood at different scales using carboxyl-functionalized magnetic nanoparticles as solid-phase absorbents. The enrichment of leucocytes and the adsorption of genomic DNA can be achieved with the same carboxyl-functionalized magnetic nanoparticles. The DNA bound to the bead surfaces can be used directly as PCR templates. By coupling cell separation and DNA purification, the whole operation can be accomplished in a few minutes. Our simplified protocols proved to be rapid, low cost, and biologically and chemically non-hazardous, and are therefore promising for microfabrication of a DNA-preparation chip and routine laboratory use.

  3. Screening of a minimal enriched P450 BM3 mutant library for hydroxylation of cyclic and acyclic alkanes.

    PubMed

    Weber, Evelyne; Seifert, Alexander; Antonovici, Mihaela; Geinitz, Christopher; Pleiss, Jürgen; Urlacher, Vlada B

    2011-01-21

    A minimal enriched P450 BM3 library was screened for the ability to oxidize inert cyclic and acyclic alkanes. The F87A/A328V mutant was found to effectively hydroxylate cyclooctane, cyclodecane and cyclododecane. F87V/A328F with high activity towards cyclooctane hydroxylated acyclic n-octane to 2-(R)-octanol (46% ee) with high regioselectivity (92%).

  4. Advancing Eucalyptus genomics: identification and sequencing of lignin biosynthesis genes from deep-coverage BAC libraries

    PubMed Central

    2011-01-01

    Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 Kb (EG_Bb) to 157 Kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single copy gene ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae, including genome
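
    The link between 15-17× coverage and a ≥ 99.99% chance of recovering a single-copy gene can be checked with the standard Clarke-Carbon relation, P = 1 - (1 - I/G)^N, which is approximately 1 - e^(-c) at coverage c = N·I/G. The sketch below assumes this relation; the abstract does not state which formula the authors used. The function names and the genome size in the second call are hypothetical.

```python
import math


def prob_sequence_present(coverage: float) -> float:
    """Poisson approximation: chance that at least one clone carries a given locus."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return -math.expm1(-coverage)  # 1 - exp(-coverage), computed accurately


def clones_needed(prob: float, insert_bp: float, genome_bp: float) -> int:
    """Clarke-Carbon estimate of the clones required to reach probability `prob`."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    fraction = insert_bp / genome_bp  # share of the genome held by one clone
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - fraction))


# Coverages reported in the abstract for EG_Bb (15x) and EG_Ba (17x).
for cov in (15, 17):
    print(f"{cov}x coverage -> P = {prob_sequence_present(cov):.8f}")

# ASSUMPTION: a 640 Mb haploid genome is a placeholder, not a value from the abstract.
print(clones_needed(0.9999, insert_bp=135e3, genome_bp=640e6))
```

    Under this approximation both libraries clear the 99.99% threshold comfortably, since about 9.2× coverage is already enough to reach it.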

  5. High capsid-genome correlation facilitates creation of AAV libraries for directed evolution.

    PubMed

    Nonnenmacher, Mathieu; van Bakel, Harm; Hajjar, Roger J; Weber, Thomas

    2015-04-01

    Directed evolution of adeno-associated virus (AAV) through successive rounds of phenotypic selection is a powerful method to isolate variants with improved properties from large libraries of capsid mutants. Importantly, AAV libraries used for directed evolution are based on the "natural" AAV genome organization where the capsid proteins are encoded in cis from replicating genomes. This is necessary to allow the recovery of the capsid DNA after each step of phenotypic selection. For directed evolution to be used successfully, it is essential to minimize the random mixing of capsomers and the encapsidation of nonmatching viral genomes during the production of the viral libraries. Here, we demonstrate that multiple AAV capsid variants expressed from Rep/Cap containing viral genomes result in near-homogeneous capsids that display an unexpectedly high capsid-DNA correlation. Next-generation sequencing of AAV progeny generated by bulk transfection of a semi-random peptide library showed a strong counter-selection of capsid variants encoding premature stop codons, which further supports a strong capsid-genome identity correlation. Overall, our observations demonstrate that production of "natural" AAVs results in low capsid mosaicism and high capsid-genome correlation. These unique properties allow the production of highly diverse AAV libraries in a one-step procedure with a minimal loss in phenotype-genotype correlation.

  6. High Capsid–Genome Correlation Facilitates Creation of AAV Libraries for Directed Evolution

    PubMed Central

    Nonnenmacher, Mathieu; van Bakel, Harm; Hajjar, Roger J; Weber, Thomas

    2015-01-01

    Directed evolution of adeno-associated virus (AAV) through successive rounds of phenotypic selection is a powerful method to isolate variants with improved properties from large libraries of capsid mutants. Importantly, AAV libraries used for directed evolution are based on the “natural” AAV genome organization where the capsid proteins are encoded in cis from replicating genomes. This is necessary to allow the recovery of the capsid DNA after each step of phenotypic selection. For directed evolution to be used successfully, it is essential to minimize the random mixing of capsomers and the encapsidation of nonmatching viral genomes during the production of the viral libraries. Here, we demonstrate that multiple AAV capsid variants expressed from Rep/Cap containing viral genomes result in near-homogeneous capsids that display an unexpectedly high capsid–DNA correlation. Next-generation sequencing of AAV progeny generated by bulk transfection of a semi-random peptide library showed a strong counter-selection of capsid variants encoding premature stop codons, which further supports a strong capsid–genome identity correlation. Overall, our observations demonstrate that production of “natural” AAVs results in low capsid mosaicism and high capsid–genome correlation. These unique properties allow the production of highly diverse AAV libraries in a one-step procedure with a minimal loss in phenotype–genotype correlation. PMID:25586687

  7. BiNChE: a web tool and library for chemical enrichment analysis based on the ChEBI ontology.

    PubMed

    Moreno, Pablo; Beisken, Stephan; Harsha, Bhavana; Muthukrishnan, Venkatesh; Tudose, Ilinca; Dekker, Adriano; Dornfeldt, Stefanie; Taruttis, Franziska; Grosse, Ivo; Hastings, Janna; Neumann, Steffen; Steinbeck, Christoph

    2015-02-21

    Ontology-based enrichment analysis aids in the interpretation and understanding of large-scale biological data. Ontologies are hierarchies of biologically relevant groupings. Using ontology annotations, which link ontology classes to biological entities, enrichment analysis methods assess whether there is a significant over or under representation of entities for ontology classes. While many tools exist that run enrichment analysis for protein sets annotated with the Gene Ontology, there are only a few that can be used for small molecules enrichment analysis. We describe BiNChE, an enrichment analysis tool for small molecules based on the ChEBI Ontology. BiNChE displays an interactive graph that can be exported as a high-resolution image or in network formats. The tool provides plain, weighted and fragment analysis based on either the ChEBI Role Ontology or the ChEBI Structural Ontology. BiNChE aids in the exploration of large sets of small molecules produced within Metabolomics or other Systems Biology research contexts. The open-source tool provides easy and highly interactive web access to enrichment analysis with the ChEBI ontology tool and is additionally available as a standalone library.
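
    Over-representation of an ontology class in a set of entities is commonly scored with a one-sided hypergeometric test. The sketch below shows that test using only the Python standard library; whether BiNChE uses exactly this statistic is not stated in the abstract, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: entities in the background set
    K: background entities annotated with the ontology class
    n: entities in the study set
    k: study-set entities annotated with the class
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible combinations drop out of the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# ASSUMPTION: toy counts for illustration only.
# 4 of 10 background molecules carry the class; 2 of the 3 sampled ones do.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
print(f"p = {p:.4f}")
```

    In practice one p-value is computed per ontology class and then corrected for multiple testing, because many classes are tested at once.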

  8. Genomic features of uncultured methylotrophs in activated-sludge microbiomes grown under different enrichment procedures

    PubMed Central

    Fujinawa, Kazuki; Asai, Yusuke; Miyahara, Morio; Kouzuma, Atsushi; Abe, Takashi; Watanabe, Kazuya

    2016-01-01

    Methylotrophs are organisms that are able to grow on C1 compounds as carbon and energy sources. They play important roles in the global carbon cycle and contribute largely to industrial wastewater treatment. To identify and characterize methylotrophs that are involved in methanol degradation in wastewater-treatment plants, methanol-fed activated-sludge (MAS) microbiomes were subjected to phylogenetic and metagenomic analyses, and genomic features of dominant methylotrophs in MAS were compared with those preferentially grown in laboratory enrichment cultures (LECs). These analyses consistently indicate that Hyphomicrobium plays important roles in MAS, while Methylophilus occurred predominantly in LECs. Comparative analyses of bin genomes reconstructed for the Hyphomicrobium and Methylophilus methylotrophs suggest that they have different C1-assimilation pathways. In addition, function-module analyses suggest that their cell-surface structures are different. Comparison of the MAS bin genome with genomes of closely related Hyphomicrobium isolates suggests that genes unnecessary in MAS (for instance, genes for anaerobic respiration) have been lost from the genome of the dominant methylotroph. We suggest that genomic features and coded functions in the MAS bin genome provide us with insights into how this methylotroph adapts to activated-sludge ecosystems. PMID:27221669

  9. Reconstructing rare soil microbial genomes using in situ enrichments and metagenomics

    PubMed Central

    Delmont, Tom O.; Eren, A. Murat; Maccario, Lorrie; Prestat, Emmanuel; Esen, Özcan C.; Pelletier, Eric; Le Paslier, Denis; Simonet, Pascal; Vogel, Timothy M.

    2015-01-01

    Despite extensive direct sequencing efforts and advanced analytical tools, reconstructing microbial genomes from soil using metagenomics has been challenging due to the tremendous diversity and relatively uniform distribution of genomes found in this system. Here we used enrichment techniques in an attempt to decrease the complexity of a soil microbiome prior to sequencing by submitting it to a range of physical and chemical stresses in 23 separate microcosms for 4 months. The metagenomic analysis of these microcosms at the end of the treatment yielded 540 Mb of assembly using standard de novo assembly techniques (a total of 559,555 genes and 29,176 functions), from which we could recover novel bacterial genomes, plasmids and phages. The recovered genomes belonged to Leifsonia (n = 2), Rhodanobacter (n = 5), Acidobacteria (n = 2), Sporolactobacillus (n = 2, novel nitrogen fixing taxon), Ktedonobacter (n = 1, second representative of the family Ktedonobacteraceae), Streptomyces (n = 3, novel polyketide synthase modules), and Burkholderia (n = 2, includes mega-plasmids conferring mercury resistance). Assembled genomes averaged 5.9 Mb, with relative abundances ranging from rare (<0.0001%) to relatively abundant (>0.01%) in the original soil microbiome. Furthermore, we detected them in samples collected from geographically distant locations, more often in temperate soils than in samples originating from high-latitude soils and deserts. To the best of our knowledge, this study is the first successful attempt to assemble multiple bacterial genomes directly from a soil sample. Our findings demonstrate that developing pertinent enrichment conditions can stimulate environmental genomic discoveries that would have been impossible to achieve with canonical approaches that focus solely upon post-sequencing data treatment. PMID:25983722

  10. Construction of the BAC Library of Small Abalone (Haliotis diversicolor) for Gene Screening and Genome Characterization.

    PubMed

    Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng

    2016-02-01

    The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of this species, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 kb, equivalent to approximately 7.6× coverage of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening by PCR. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled using a next-generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of the small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated in the 8 BACs, which demonstrated the feasibility of the PCR screening approach with three-dimensional pools in the small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for future marker development. The BAC library and clone pools provide valuable resources and tools for genetic breeding and conservation of H. diversicolor.
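
    The coverage figure in this abstract links three quantities: clone count, average insert size, and genome size. A minimal Python sketch of that arithmetic follows. The function name is hypothetical, and the genome size it returns is only what the abstract's own numbers imply, not a value reported by the authors.

```python
def implied_genome_size_kb(n_clones: int, mean_insert_kb: float, coverage: float) -> float:
    """Rearranges coverage = n_clones * mean_insert / genome_size for genome_size."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return n_clones * mean_insert_kb / coverage


# Figures quoted in the abstract: 92,610 clones, 120 kb inserts, ~7.6x coverage.
total_insert_kb = 92_610 * 120
genome_kb = implied_genome_size_kb(92_610, 120, 7.6)

print(f"total cloned DNA:    {total_insert_kb / 1e6:.2f} Gb")
print(f"implied genome size: {genome_kb / 1e6:.2f} Gb")
```

    Because the coverage is quoted only to two significant figures, the implied genome size should be read as a rough estimate.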

  11. Genome-wide BAC-end sequencing of Cucumis melo using two BAC libraries

    PubMed Central

    2010-01-01

    Background: Although melon (Cucumis melo L.) is an economically important fruit crop, no genome-wide sequence information is openly available at the current time. We therefore sequenced BAC-ends representing a total of 33,024 clones, half of them from a previously described melon BAC library generated with restriction endonucleases and the remainder from a new random-shear BAC library. Results: We generated a total of 47,140 high-quality BAC-end sequences (BES), 91.7% of which were paired-BES. Both libraries were assembled independently and then cross-assembled to obtain a final set of 33,372 non-redundant, high-quality sequences. These were grouped into 6,411 contigs (4.5 Mb) and 26,961 non-assembled BES (14.4 Mb), representing ~4.2% of the melon genome. The sequences were used to screen genomic databases, identifying 7,198 simple sequence repeats (corresponding to one microsatellite every 2.6 kb) and 2,484 additional repeats of which 95.9% represented transposable elements. The sequences were also used to screen expressed sequence tag (EST) databases, revealing 11,372 BES that were homologous to ESTs. This suggests that ~30% of the melon genome consists of coding DNA. We observed regions of microsynteny between melon paired-BES and six other dicotyledonous plant genomes. Conclusion: The analysis of nearly 50,000 BES from two complementary genomic libraries covered ~4.2% of the melon genome, providing insight into properties such as microsatellite and transposable element distribution, and the percentage of coding DNA. The observed synteny between melon paired-BES and six other plant genomes showed that useful comparative genomic data can be derived through large scale BAC-end sequencing by anchoring a small proportion of the melon genome to other sequenced genomes. PMID:21054843
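
    The summary statistics in this abstract can be cross-checked from the figures it quotes. The short Python sketch below recomputes the non-redundant sequence count, the microsatellite spacing, and the genome size implied by the ~4.2% figure. Variable names are illustrative, and the implied genome size is a back-calculation from the abstract rather than a number the authors state.

```python
# Figures quoted in the abstract.
n_contigs = 6_411
n_singletons = 26_961      # non-assembled BES
contig_mb = 4.5
singleton_mb = 14.4
n_ssr = 7_198              # simple sequence repeats
genome_fraction = 0.042    # ~4.2% of the melon genome

n_sequences = n_contigs + n_singletons           # non-redundant sequence set
total_mb = contig_mb + singleton_mb              # sequence surveyed
ssr_spacing_kb = total_mb * 1000 / n_ssr         # average kb per microsatellite
implied_genome_mb = total_mb / genome_fraction   # back-calculated, approximate

print(f"non-redundant sequences: {n_sequences}")
print(f"sequence surveyed:       {total_mb:.1f} Mb")
print(f"one SSR every            {ssr_spacing_kb:.1f} kb")
print(f"implied genome size:     {implied_genome_mb:.0f} Mb")
```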

  12. Generation and analysis of a large-scale expressed sequence tags from a full-length enriched cDNA library of Siberian tiger (Panthera tigris altaica).

    PubMed

    Guo, Yu; Liu, Changqing; Lu, Taofeng; Liu, Dan; Bai, Chunyu; Li, Xiangchen; Ma, Yuehui; Guan, Weijun

    2014-05-15

    In this study, a full-length enriched cDNA library was successfully constructed from the Siberian tiger, the world's most endangered species. The titers of the primary and amplified libraries were 1.28×10^6 pfu/mL and 1.59×10^10 pfu/mL, respectively. The proportion of recombinants in the unamplified library was 91.3% and the average length of exogenous inserts was 1.06 kb. A total of 279 individual ESTs with sizes ranging from 316 to 1258 bp were then analyzed. Furthermore, 204 unigenes were successfully annotated and assigned to 49 functions of the GO classification; cell (175, 85.5%), cellular process (165, 80.9%), and binding (152, 74.5%) were the dominant terms. A total of 198 unigenes were assigned to 156 KEGG pathways, and the most represented category was metabolic pathways (18, 9.1%). The proportion pattern of each COG subcategory was similar among Panthera tigris altaica, P. tigris tigris and Homo sapiens; the general function prediction only cluster (44, 15.8%) was the largest group, followed by translation, ribosomal structure and biogenesis (33, 11.8%), and replication, recombination and repair (24, 8.6%), and only 7.2% of ESTs were classified as novel genes. Moreover, the recombinant plasmid pET32a-TAT-COL6A2 was constructed, coding for the Trx-TAT-COL6A2 fusion protein with two 6× His-tags at the N and C termini. As measured by BCA assay, the concentration of soluble Trx-TAT-COL6A2 recombinant protein was 2.64 ± 0.18 mg/mL. This library will provide a useful platform for future functional genome and transcriptome research on P. tigris and other felids. Copyright © 2014 Elsevier B.V. All rights reserved.

  13. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    PubMed Central

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate its use in new gene research, normalized cDNA libraries enriched with full-length sequences are needed. Four libraries were produced from RNA pooled from multiple Agave sisalana tissues, using the SMART™ method and duplex-specific nuclease (DSN) to increase the efficiency of normalization and maximize the number of independent genes. This procedure preserved the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
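
    The non-redundancy figure quoted in this abstract is the ratio of unigenes to sequenced clones. A minimal Python sketch of that calculation is given below; the function name is hypothetical, and the definition (unigenes divided by clones) is inferred from the two counts in the abstract rather than stated by the authors.

```python
def non_redundancy_percent(n_unigenes: int, n_clones: int) -> float:
    """Share of sequenced clones that represent distinct unigenes, in percent."""
    if n_clones <= 0:
        raise ValueError("n_clones must be positive")
    return 100.0 * n_unigenes / n_clones


# Counts quoted in the abstract: 3320 unigenes from 3875 sequenced cDNA clones.
pct = non_redundancy_percent(3320, 3875)
print(f"non-redundancy: {pct:.1f}%")
print(f"redundant clones: {3875 - 3320}")
```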

  14. Construction and evaluation of normalized cDNA libraries enriched with full-length sequences for rapid discovery of new genes from Sisal (Agave sisalana Perr.) different developmental stages.

    PubMed

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-10-12

    To provide a resource of sisal-specific expressed sequence data and facilitate its use in new gene research, normalized cDNA libraries enriched with full-length sequences are needed. Four libraries were produced from RNA pooled from multiple Agave sisalana tissues, using the SMART™ method and duplex-specific nuclease (DSN) to increase the efficiency of normalization and maximize the number of independent genes. This procedure preserved the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.

  15. Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats

    DOE PAGES

    Botcheva, Krassimira; McCorkle, Sean R.

    2014-11-21

    The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). In conclusion, our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.

  16. Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats

    SciTech Connect

    Botcheva, Krassimira; McCorkle, Sean R.

    2014-11-21

    The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). In conclusion, our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.

  17. Cell context dependent p53 genome-wide binding patterns and enrichment at repeats.

    PubMed

    Botcheva, Krassimira; McCorkle, Sean R

    2014-01-01

    The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). Our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways.

  18. Cell Context Dependent p53 Genome-Wide Binding Patterns and Enrichment at Repeats

    PubMed Central

    Botcheva, Krassimira; McCorkle, Sean R.

    2014-01-01

    The p53 ability to elicit stress specific and cell type specific responses is well recognized, but how that specificity is established remains to be defined. Whether upon activation p53 binds to its genomic targets in a cell type and stress type dependent manner is still an open question. Here we show that the p53 binding to the human genome is selective and cell context-dependent. We mapped the genomic binding sites for the endogenous wild type p53 protein in the human cancer cell line HCT116 and compared them to those we previously determined in the normal cell line IMR90. We report distinct p53 genome-wide binding landscapes in two different cell lines, analyzed under the same treatment and experimental conditions, using the same ChIP-seq approach. This is evidence for cell context dependent p53 genomic binding. The observed differences affect the p53 binding sites distribution with respect to major genomic and epigenomic elements (promoter regions, CpG islands and repeats). We correlated the high-confidence p53 ChIP-seq peaks positions with the annotated human repeats (UCSC Human Genome Browser) and observed both common and cell line specific trends. In HCT116, the p53 binding was specifically enriched at LINE repeats, compared to IMR90 cells. The p53 genome-wide binding patterns in HCT116 and IMR90 likely reflect the different epigenetic landscapes in these two cell lines, resulting from cancer-associated changes (accumulated in HCT116) superimposed on tissue specific differences (HCT116 has epithelial, while IMR90 has mesenchymal origin). Our data support the model for p53 binding to the human genome in a highly selective manner, mobilizing distinct sets of genes, contributing to distinct pathways. PMID:25415302

  19. Chromosome region-specific libraries for human genome analysis

    SciTech Connect

    Kao, Fa-Ten.

    1992-08-01

    During the grant period, progress was made on four fronts: successful demonstration of regional mapping of microclones derived from microdissection libraries; successful demonstration of the feasibility of converting microclones with short inserts into yeast artificial chromosome clones with very large inserts for high resolution physical mapping of the dissected region; successful demonstration of the usefulness of region-specific microclones to isolate region-specific cDNA clones as candidate genes to facilitate the search for the crucial genes underlying genetic diseases assigned to the dissected region; and the successful construction of four region-specific microdissection libraries for human chromosome 2, including 2q35-q37, 2q33-q35, 2p23-p25 and 2p21-p23. The 2q35-q37 library has been characterized in detail. The characterization of the other three libraries is in progress. These region-specific microdissection libraries and the unique sequence microclones derived from the libraries will be valuable resources for investigators engaged in high resolution physical mapping and isolation of disease-related genes residing in these chromosomal regions.

  20. Local assemblies of paired-end reduced representation libraries sequenced with the Illumina Genome Analyzer in maize.

    PubMed

    Deschamps, Stéphane; Nannapaneni, Kishore; Zhang, Yun; Hayes, Kevin

    2012-01-01

    The use of next-generation DNA sequencing technologies has greatly facilitated reference-guided variant detection in complex plant genomes. However, complications may arise when regions adjacent to a read of interest are used for marker assay development, or when reference sequences are incomplete, as short reads alone may not be long enough to ascertain their uniqueness. Here, the possibility of generating longer sequences in discrete regions of the large and complex genome of maize is demonstrated, using a modified version of a paired-end RAD library construction strategy. Reads are generated from DNA fragments first digested with a methylation-sensitive restriction endonuclease, sheared, enriched with biotin and a selective PCR amplification step, and then sequenced at both ends. Sequences are locally assembled into contigs by subgrouping pairs based on the identity of the read anchored by the restriction site. This strategy applied to two maize inbred lines (B14 and B73) generated 183,609 and 129,018 contigs, respectively, out of which at least 76% were >200 bps in length. A subset of putative single nucleotide polymorphisms from contigs aligning to the B73 reference genome with at least one mismatch was resequenced, and 90% of those in B14 were confirmed, indicating that this method is a potent approach for variant detection and marker development in species with complex genomes or lacking extensive reference sequences.

  1. SPlinted Ligation Adapter Tagging (SPLAT), a novel library preparation method for whole genome bisulphite sequencing

    PubMed Central

    Manlig, Erika; Wahlberg, Per

    2017-01-01

    Sodium bisulphite treatment of DNA combined with next generation sequencing (NGS) is a powerful combination for the interrogation of genome-wide DNA methylation profiles. Library preparation for whole genome bisulphite sequencing (WGBS) is challenging due to side effects of the bisulphite treatment, which leads to extensive DNA damage. Recently, a new generation of methods for bisulphite sequencing library preparation has been devised. They are based on initial bisulphite treatment of the DNA, followed by adaptor tagging of single stranded DNA fragments, and enable WGBS using low quantities of input DNA. In this study, we present a novel approach for quick and cost effective WGBS library preparation that is based on splinted adaptor tagging (SPLAT) of bisulphite-converted single-stranded DNA. Moreover, we validate SPLAT against three commercially available WGBS library preparation techniques, two of which are based on bisulphite treatment prior to adaptor tagging and one of which is a conventional WGBS method. PMID:27899585

  2. Genomes of Two New Ammonia-Oxidizing Archaea Enriched from Deep Marine Sediments

    PubMed Central

    Park, Soo-Je; Ghai, Rohit; Martín-Cuadrado, Ana-Belén; Rodríguez-Valera, Francisco; Chung, Won-Hyong; Kwon, KaeKyoung; Lee, Jung-Hyun; Madsen, Eugene L.; Rhee, Sung-Keun

    2014-01-01

    Ammonia-oxidizing archaea (AOA) are ubiquitous and abundant and contribute significantly to the carbon and nitrogen cycles in the ocean. In this study, we assembled AOA draft genomes from two deep marine sediments from Donghae, South Korea, and Svalbard, Arctic region, by sequencing the enriched metagenomes. Three major microorganism clusters belonging to Thaumarchaeota, Epsilonproteobacteria, and Gammaproteobacteria were deduced from their 16S rRNA genes, GC contents, and oligonucleotide frequencies. Three archaeal genomes were identified, two of which were distinct and were designated Ca. “Nitrosopumilus koreensis” AR1 and “Nitrosopumilus sediminis” AR2. AR1 and AR2 exhibited average nucleotide identities of 85.2% and 79.5% to N. maritimus, respectively. The AR1 and AR2 genomes contained genes pertaining to energy metabolism and carbon fixation as conserved in other AOA, but, conversely, had fewer heme-containing proteins and more copper-containing proteins than other AOA. Most of the distinctive AR1 and AR2 genes were located in genomic islands (GIs) that were not present in other AOA genomes or in a reference water-column metagenome from the Sargasso Sea. A putative gene cluster involved in urea utilization was found in the AR2 genome, but not the AR1 genome, suggesting niche specialization in marine AOA. Co-cultured bacterial genome analysis suggested that bacterial sulfur and nitrogen metabolism could be involved in interactions with AOA. Our results provide fundamental information concerning the metabolic potential of deep marine sedimentary AOA. PMID:24798206

  3. Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform

    PubMed Central

    2010-01-01

    Background: Complete chloroplast genome sequences provide a valuable source of molecular markers for studies in molecular ecology and evolution of plants. To obtain complete genome sequences, recent studies have made use of the polymerase chain reaction to amplify overlapping fragments from conserved gene loci. However, this approach is time consuming and can be more difficult to implement where gene organisation differs among plants. An alternative approach is to first isolate chloroplasts and then use the capacity of high-throughput sequencing to obtain complete genome sequences. We report our findings from studies of the latter approach, which used a simple chloroplast isolation procedure, multiply-primed rolling circle amplification of chloroplast DNA, Illumina Genome Analyzer II sequencing, and de novo assembly of paired-end sequence reads. Results: A modified rapid chloroplast isolation protocol was used to obtain plant DNA that was enriched for chloroplast DNA, but nevertheless contained nuclear and mitochondrial DNA. Multiply-primed rolling circle amplification of this mixed template produced sufficient quantities of chloroplast DNA, even when the amount of starting material was small, and improved the template quality for Illumina Genome Analyzer II (hereafter Illumina GAII) sequencing. We demonstrate, using independent samples of karaka (Corynocarpus laevigatus), that there is high fidelity in the sequence obtained from this template. Although less than 20% of our sequenced reads could be mapped to the chloroplast genome, it was relatively easy to assemble complete chloroplast genome sequences from the mixture of nuclear, mitochondrial and chloroplast reads. Conclusions: We report successful whole genome sequencing of chloroplast DNA from karaka, obtained efficiently and with high fidelity. PMID:20920211

  4. Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis

    PubMed Central

    Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Michael S.; Yang, Zamin K.; Klingeman, Dawn M.; Land, Miriam L.; Allman, Steve L.; Lu, Tse-Yuan S.; Brown, Steven D.; Schadt, Christopher W.; Podar, Mircea; Doktycz, Mitchel J.

    2016-01-01

    ABSTRACT Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. Here, we describe a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria. IMPORTANCE Plant roots harbor a diverse collection of microbes that live within host tissues. To gain a comprehensive understanding of microbial adaptations to this endophytic lifestyle from strains that cannot be cultivated, it is necessary to separate bacterial cells from the predominance of plant tissue. This study provides a valuable approach for the separation and isolation of endophytic bacteria from plant root tissue. 
Isolated live bacteria provide material for microbiome sequencing, single-cell genomics, and analyses

  5. Construction and Analysis of Siberian Tiger Bacterial Artificial Chromosome Library with Approximately 6.5-Fold Genome Equivalent Coverage

    PubMed Central

    Liu, Changqing; Bai, Chunyu; Guo, Yu; Liu, Dan; Lu, Taofeng; Li, Xiangchen; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2014-01-01

    Bacterial artificial chromosome (BAC) libraries are extremely valuable for the genome-wide genetic dissection of complex organisms. The Siberian tiger, one of the most well-known wild primitive carnivores in China, is an endangered animal. In order to promote research on its genome, a high-redundancy BAC library of the Siberian tiger was constructed and characterized. The library is divided into two sub-libraries prepared from blood cells and two sub-libraries prepared from fibroblasts. This BAC library contains 153,600 individually archived clones; for PCR-based screening of the library, BACs were placed into 40 superpools of 10 × 384-deep well microplates. The average insert size of BAC clones was estimated to be 116.5 kb, representing approximately 6.46 genome equivalents of the haploid genome and affording a 98.86% statistical probability of obtaining at least one clone containing a unique DNA sequence. Screening the library with 19 microsatellite markers and an SRY sequence revealed that each of these markers was present in the library; the average number of positive clones per marker was 6.74 (range 2 to 12), consistent with 6.46-fold coverage of the tiger genome. Additionally, we identified 72 microsatellite markers that could potentially be used as genetic markers. This BAC library will serve as a valuable resource for physical mapping, comparative genomic study and large-scale genome sequencing in the tiger. PMID:24608928
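
    The pooling layout and the screening result in this abstract can be checked against each other with simple arithmetic. The Python sketch below makes one assumption that the abstract does not state: each marker is single-copy and clones sample the genome uniformly, so the expected number of positive clones per marker equals the library coverage. Variable names are illustrative.

```python
# Pooling layout quoted in the abstract: 40 superpools of 10 x 384-well plates.
superpools = 40
plates_per_superpool = 10
wells_per_plate = 384
clones_in_pools = superpools * plates_per_superpool * wells_per_plate

# Screening result versus stated coverage.
stated_coverage = 6.46      # genome equivalents
observed_mean_hits = 6.74   # mean positive clones per marker
# Assumption: single-copy markers, uniform sampling -> expected hits == coverage.
ratio = observed_mean_hits / stated_coverage

print(f"clones across all superpools: {clones_in_pools}")
print(f"observed / expected positives per marker: {ratio:.2f}")
```

    A ratio close to 1 is what the abstract means by the screening result being consistent with the stated coverage.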

  6. Construction and analysis of Siberian tiger bacterial artificial chromosome library with approximately 6.5-fold genome equivalent coverage.

    PubMed

    Liu, Changqing; Bai, Chunyu; Guo, Yu; Liu, Dan; Lu, Taofeng; Li, Xiangchen; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun

    2014-03-07

    Bacterial artificial chromosome (BAC) libraries are extremely valuable for the genome-wide genetic dissection of complex organisms. The Siberian tiger, one of the most well-known wild primitive carnivores in China, is an endangered animal. In order to promote research on its genome, a high-redundancy BAC library of the Siberian tiger was constructed and characterized. The library is divided into two sub-libraries prepared from blood cells and two sub-libraries prepared from fibroblasts. This BAC library contains 153,600 individually archived clones; for PCR-based screening of the library, BACs were placed into 40 superpools of 10 × 384-deep well microplates. The average insert size of BAC clones was estimated to be 116.5 kb, representing approximately 6.46 genome equivalents of the haploid genome and affording a 98.86% statistical probability of obtaining at least one clone containing a unique DNA sequence. Screening the library with 19 microsatellite markers and an SRY sequence revealed that each of these markers was present in the library; the average number of positive clones per marker was 6.74 (range 2 to 12), consistent with 6.46-fold coverage of the tiger genome. Additionally, we identified 72 microsatellite markers that could potentially be used as genetic markers. This BAC library will serve as a valuable resource for physical mapping, comparative genomic study and large-scale genome sequencing in the tiger.

  7. Optimizing Illumina next-generation sequencing library preparation for extremely AT-biased genomes.

    PubMed

    Oyola, Samuel O; Otto, Thomas D; Gu, Yong; Maslen, Gareth; Manske, Magnus; Campino, Susana; Turner, Daniel J; Macinnis, Bronwyn; Kwiatkowski, Dominic P; Swerdlow, Harold P; Quail, Michael A

    2012-01-03

    Massively parallel sequencing technology is revolutionizing approaches to genomic and genetic research. Since its advent, the scale and efficiency of Next-Generation Sequencing (NGS) have rapidly improved. In spite of this success, sequencing genomes or genomic regions with extremely biased base composition is still a great challenge to the currently available NGS platforms. The genomes of some important pathogenic organisms like Plasmodium falciparum (high AT content) and Mycobacterium tuberculosis (high GC content) display extremes of base composition. The standard library preparation procedures that employ PCR amplification have been shown to cause uneven read coverage particularly across AT and GC rich regions, leading to problems in genome assembly and variation analyses. Alternative library-preparation approaches that omit PCR amplification require large quantities of starting material and hence are not suitable for small amounts of DNA/RNA such as those from clinical isolates. We have developed and optimized library-preparation procedures suitable for low quantity starting material and tolerant to extremely high AT content sequences. We have used our optimized conditions in parallel with standard methods to prepare Illumina sequencing libraries from a non-clinical and a clinical isolate (containing ~53% host contamination). By analyzing and comparing the quality of sequence data generated, we show that our optimized conditions, which involve a PCR additive (TMAC), produce amplified libraries with improved coverage of extremely AT-rich regions and reduced bias toward GC neutral templates. We have developed a robust and optimized Next-Generation Sequencing library amplification method suitable for extremely AT-rich genomes. The new amplification conditions significantly reduce bias and retain the complexity of either extreme of base composition. 
This development will greatly benefit sequencing clinical samples that often require amplification due to low mass of

  8. Focused chemical libraries--design and enrichment: an example of protein-protein interaction chemical space.

    PubMed

    Zhang, Xu; Betzi, Stéphane; Morelli, Xavier; Roche, Philippe

    2014-07-01

    One of the many obstacles in the development of new drugs lies in the limited number of therapeutic targets and in the quality of screening collections of compounds. In this review, we present general strategies for building target-focused chemical libraries with a particular emphasis on protein-protein interactions (PPIs). We describe the chemical spaces spanned by nine commercially available PPI-focused libraries and compare them to our 2P2I3D academic library, dedicated to orthosteric PPI modulators. We show that although PPI-focused libraries have been designed using different strategies, they share common subspaces. PPI inhibitors are larger and more hydrophobic than standard drugs; however, an effort has been made to improve the drug-likeness of focused chemical libraries dedicated to this challenging class of targets.

  9. Enriching Genomic Resources and Marker Development from Transcript Sequences of Jatropha curcas for Microgravity Studies

    PubMed Central

    Tian, Wenlan; Paudel, Dev

    2017-01-01

    Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembly and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs was fully annotated: 59,903 (51.81%) were assigned gene ontology (GO) terms, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The set also contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further, 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822

  10. Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis

    DOE PAGES

    Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Jr., Michael S.; ...

    2016-07-15

    Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. In this paper, we present a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Finally, comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria.

  11. Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis

    SciTech Connect

    Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Jr., Michael S.; Yang, Zamin Koo; Klingeman, Dawn Marie; Land, Miriam L.; Allman, Steve L.; Lu, Tse-Yuan S.; Brown, Steven D.; Schadt, Christopher Warren; Podar, Mircea; Doktycz, Mitchel J.; Pelletier, Dale A.

    2016-07-15

    Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. In this paper, we present a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Finally, comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria.

  12. Scalable whole-genome single-cell library preparation without preamplification.

    PubMed

    Zahn, Hans; Steif, Adi; Laks, Emma; Eirew, Peter; VanInsberghe, Michael; Shah, Sohrab P; Aparicio, Samuel; Hansen, Carl L

    2017-02-01

    Single-cell genomics is critical for understanding cellular heterogeneity in cancer, but existing library preparation methods are expensive, require sample preamplification and introduce coverage bias. Here we describe direct library preparation (DLP), a robust, scalable, and high-fidelity method that uses nanoliter-volume transposition reactions for single-cell whole-genome library preparation without preamplification. We examined 782 cells from cell lines and triple-negative breast xenograft tumors. Low-depth sequencing, compared with existing methods, revealed greater coverage uniformity and more reliable detection of copy-number alterations. Using phylogenetic analysis, we found minor xenograft subpopulations that were undetectable by bulk sequencing, as well as dynamic clonal expansion and diversification between passages. Merging single-cell genomes in silico, we generated 'bulk-equivalent' genomes with high depth and uniform coverage. Thus, low-depth sequencing of DLP libraries may provide an attractive replacement for conventional bulk sequencing methods, permitting analysis of copy number at the cell level and of other genomic variants at the population level.

  13. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data.

    PubMed

    de Andrade, Roberto R S; Vaslin, Maite F S

    2014-03-07

    Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. To bypass the need for command-line and basic bioinformatics knowledge, we developed mapping software with a graphical interface for assembling viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using the NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. The program compares each sequenced read in a library with a chosen reference genome. Reads with a Hamming distance smaller than or equal to the allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it will be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/.
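
    The read-selection step described in this abstract (keep reads whose Hamming distance to the reference is within an allowed number of mismatches, then lay them out into a longer sequence) can be illustrated with a minimal sketch. This is not SearchSmallRNA's code, which is written in Java; the function names, the brute-force scan of every reference position, and the rule that later reads overwrite earlier ones are all assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def map_reads(reads, reference, max_mismatches=1):
    """Return (position, read) for each read whose best ungapped placement
    on the reference has at most max_mismatches mismatches.
    Hypothetical helper: scans every reference offset (slow but simple)."""
    hits = []
    for read in reads:
        best = None  # (distance, position) of the best placement so far
        for pos in range(len(reference) - len(read) + 1):
            d = hamming(read, reference[pos:pos + len(read)])
            if d <= max_mismatches and (best is None or d < best[0]):
                best = (d, pos)
        if best is not None:
            hits.append((best[1], read))
    return hits


def covered_consensus(reference, hits):
    """Lay accepted reads over reference coordinates; uncovered bases are 'N'.
    Assumption: where reads overlap, the read placed later simply wins."""
    consensus = ["N"] * len(reference)
    for pos, read in sorted(hits):
        for i, base in enumerate(read):
            consensus[pos + i] = base
    return "".join(consensus)


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    reads = ["ACGTA", "CGTTA", "TTAGC", "GGGGG"]
    hits = map_reads(reads, reference, max_mismatches=1)
    print(hits)
    print(covered_consensus(reference, hits))
```

    A real mapper would index the reference rather than scan it, and would call each consensus base from all overlapping reads instead of letting the last one win.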

  14. SearchSmallRNA: a graphical interface tool for the assemblage of viral genomes using small RNA libraries data

    PubMed Central

    2014-01-01

    Background Next-generation parallel sequencing (NGS) allows the identification of viral pathogens by sequencing the small RNAs of infected hosts. Thus, viral genomes may be assembled from host immune response products without prior virus enrichment, amplification or purification. However, mapping of the vast information obtained presents a bioinformatics challenge. Methods To bypass the need for command-line and basic bioinformatics knowledge, we developed mapping software with a graphical interface for assembling viral genomes from small RNA datasets obtained by NGS. SearchSmallRNA was developed in Java version 7 using the NetBeans IDE 7.1. The program also allows analysis of the viral small interfering RNA (vsRNA) profile, providing an overview of the size distribution and other features of the vsRNAs produced in infected cells. Results The program compares each sequenced read in a library with a chosen reference genome. Reads with a Hamming distance smaller than or equal to the allowed number of mismatches are selected as positives and used to assemble a long nucleotide genome sequence. To validate the software, separate analyses using NGS datasets obtained from HIV and two plant viruses were used to reconstruct whole viral genomes. Conclusions SearchSmallRNA was able to reconstruct viral genomes from NGS small RNA datasets with a high degree of reliability, so it will be a valuable tool for virus sequencing and discovery. It is accessible and free to all research communities and has the advantage of an easy-to-use graphical interface. Availability and implementation SearchSmallRNA was written in Java and is freely available at http://www.microbiologia.ufrj.br/ssrna/. PMID:24607237

  15. Democratizing Human Genome Project Information: A Model Program for Education, Information and Debate in Public Libraries.

    ERIC Educational Resources Information Center

    Pollack, Miriam

    The "Mapping the Human Genome" project demonstrated that librarians can help whomever they serve in accessing information resources in the areas of biological and health information, whether it is the scientists who are developing the information or a member of the public who is using the information. Public libraries can guide library…

  16. Gowinda: unbiased analysis of gene set enrichment for genome-wide association studies.

    PubMed

    Kofler, Robert; Schlötterer, Christian

    2012-08-01

    An analysis of gene set [e.g. Gene Ontology (GO)] enrichment assumes that all genes are sampled independently from each other with the same probability. These assumptions are violated in genome-wide association (GWA) studies since (i) longer genes typically have more single-nucleotide polymorphisms resulting in a higher probability of being sampled and (ii) overlapping genes are sampled in clusters. Herein, we introduce Gowinda, software specifically designed to test for enrichment of gene sets in GWA studies. We show that GO tests on GWA data could result in a substantial number of false-positive GO terms. Permutation tests implemented in Gowinda eliminate these biases, but maintain sufficient power to detect enrichment of GO terms. Since sufficient resolution for large datasets requires millions of permutations, we use multi-threading to keep computation times reasonable. Gowinda is implemented in Java (v1.6) and freely available at http://code.google.com/p/gowinda/. Contact: christian.schloetterer@vetmeduni.ac.at. Manual: http://code.google.com/p/gowinda/wiki/Manual. Test data and tutorial: http://code.google.com/p/gowinda/wiki/Tutorial. Validation: http://code.google.com/p/gowinda/wiki/VALIDATION.
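
    Why permuting SNPs rather than genes addresses the two biases named in this abstract can be seen in a small sketch. It is a simplified illustration, not Gowinda's implementation (which is in Java and multi-threaded); the function name, the input layout (a SNP-to-genes dictionary), and the add-one p-value estimate are assumptions. Random SNP sets of the same size as the candidate set are drawn from all SNPs, so long genes are hit more often in the null as well, and a SNP lying in overlapping genes contributes all of those genes together.

```python
import random


def enrichment_pvalue(candidate_snps, all_snps, snp_to_genes, gene_set,
                      n_perm=10000, seed=0):
    """Empirical p-value for a gene set being enriched among candidate SNPs.

    candidate_snps : list of SNP ids flagged by the association study
    all_snps       : list of every SNP id tested (the sampling universe)
    snp_to_genes   : dict mapping SNP id -> iterable of overlapping gene ids
    gene_set       : set of gene ids (for example, one GO category)
    All names here are hypothetical; this is a sketch of the general idea.
    """
    rng = random.Random(seed)

    def genes_hit(snps):
        # Each gene is counted once, however many SNPs fall inside it.
        genes = set()
        for snp in snps:
            genes.update(snp_to_genes.get(snp, ()))
        return len(genes & gene_set)

    observed = genes_hit(candidate_snps)
    k = len(candidate_snps)
    at_least = 0
    for _ in range(n_perm):
        # Null draw: k random SNPs, so gene length bias is built into the null.
        if genes_hit(rng.sample(all_snps, k)) >= observed:
            at_least += 1
    # Add-one estimate keeps the p-value strictly above zero.
    return observed, (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    all_snps = list(range(100))
    snp_to_genes = {s: ["A"] if s < 5 else ["B"] if s < 10 else [f"G{s}"]
                    for s in all_snps}
    print(enrichment_pvalue([0, 5, 50], all_snps, snp_to_genes, {"A", "B"},
                            n_perm=2000))
```

    The smallest p-value such a test can report is 1/(n_perm + 1), which is consistent with the abstract's remark that large datasets need millions of permutations.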

  17. Application of a simplified method of chloroplast enrichment to small amounts of tissue for chloroplast genome sequencing

    PubMed Central

    Sakaguchi, Shota; Ueno, Saneyoshi; Tsumura, Yoshihiko; Setoguchi, Hiroaki; Ito, Motomi; Hattori, Chie; Nozoe, Shogo; Takahashi, Daiki; Nakamasu, Riku; Sakagami, Taishi; Lannuzel, Guillaume; Fogliani, Bruno; Wulff, Adrien S.; L’Huillier, Laurent; Isagi, Yuji

    2017-01-01

    Premise of the study: High-throughput sequencing of genomic DNA can recover complete chloroplast genome sequences, but the sequence data are usually dominated by sequences from nuclear/mitochondrial genomes. To overcome this deficiency, a simple enrichment method for chloroplast DNA from small amounts of plant tissue was tested for eight plant species including a gymnosperm and various angiosperms. Methods: Chloroplasts were enriched using a high-salt isolation buffer without any step gradient procedures, and enriched chloroplast DNA was sequenced by multiplexed high-throughput sequencing. Results: Using this simple method, significant enrichment of chloroplast DNA-derived reads was attained, allowing deep sequencing of chloroplast genomes. As an example, the chloroplast genome of the conifer Callitris sulcata was assembled, from which polymorphic microsatellite loci were isolated successfully. Discussion: This chloroplast enrichment method from small amounts of plant tissue will be particularly useful for studies that use sequencers with relatively small throughput and that cannot use large amounts of tissue (e.g., for endangered species). PMID:28529832

  18. Characterization of expressed sequence tags from a full-length enriched cDNA library of Cryptomeria japonica male strobili

    PubMed Central

    Futamura, Norihiro; Totoki, Yasushi; Toyoda, Atsushi; Igasaki, Tomohiro; Nanjo, Tokihiko; Seki, Motoaki; Sakaki, Yoshiyuki; Mari, Adriano; Shinozaki, Kazuo; Shinohara, Kenji

    2008-01-01

    Background Cryptomeria japonica D. Don is one of the most commercially important conifers in Japan. However, the allergic disease caused by its pollen is a severe public health problem in Japan. Since large-scale analysis of expressed sequence tags (ESTs) in the male strobili of C. japonica should help us to clarify the overall expression of genes during the process of pollen development, we constructed a full-length enriched cDNA library that was derived from male strobili at various developmental stages. Results We obtained 36,011 expressed sequence tags (ESTs) from either one or both ends of 19,437 clones derived from the cDNA library of C. japonica male strobili at various developmental stages. The 19,437 cDNA clones corresponded to 10,463 transcripts. Approximately 80% of the transcripts resembled ESTs from Pinus and Picea, while approximately 75% had homologs in Arabidopsis. An analysis of homologies between ESTs from C. japonica male strobili and known pollen allergens in the Allergome Database revealed that products of 180 transcripts exhibited significant homology. Approximately 2% of the transcripts appeared to encode transcription factors. We identified twelve genes for MADS-box proteins among these transcription factors. The twelve MADS-box genes were classified as DEF/GLO/GGM13-, AG-, AGL6-, TM3- and TM8-like MIKCc genes and type I MADS-box genes. Conclusion Our full-length enriched cDNA library derived from C. japonica male strobili provides information on expression of genes during the development of male reproductive organs. We provided potential allergens in C. japonica. We also provided new information about transcription factors including MADS-box genes expressed in male strobili of C. japonica. Large-scale gene discovery using full-length cDNAs is a valuable tool for studies of gymnosperm species. PMID:18691438

  19. BAC libraries construction from the ancestral diploid genomes of the allotetraploid cultivated peanut

    PubMed Central

    Guimarães, Patricia M; Garsmeur, Olivier; Proite, Karina; Leal-Bertioli, Soraya CM; Seijo, Guilhermo; Chaine, Christian; Bertioli, David J; D'Hont, Angelique

    2008-01-01

    Background Cultivated peanut, Arachis hypogaea is an allotetraploid of recent origin, with an AABB genome. In common with many other polyploids, it seems that a severe genetic bottle-neck was imposed at the species origin, via hybridisation of two wild species and spontaneous chromosome duplication. Therefore, the study of the genome of peanut is hampered both by the crop's low genetic diversity and its polyploidy. In contrast to cultivated peanut, most wild Arachis species are diploid with high genetic diversity. The study of diploid Arachis genomes is therefore attractive, both to simplify the construction of genetic and physical maps, and for the isolation and characterization of wild alleles. The most probable wild ancestors of cultivated peanut are A. duranensis and A. ipaënsis with genome types AA and BB respectively. Results We constructed and characterized two large-insert libraries in Bacterial Artificial Chromosome (BAC) vector, one for each of the diploid ancestral species. The libraries (AA and BB) are respectively c. 7.4 and c. 5.3 genome equivalents with low organelle contamination and average insert sizes of 110 and 100 kb. Both libraries were used for the isolation of clones containing genetically mapped legume anchor markers (single copy genes), and resistance gene analogues. Conclusion These diploid BAC libraries are important tools for the isolation of wild alleles conferring resistances to biotic stresses, comparisons of orthologous regions of the AA and BB genomes with each other and with other legume species, and will facilitate the construction of a physical map. PMID:18230166

  20. The 19 Genomes of Drosophila: A BAC Library Resource for Genus-Wide and Genome-Scale Comparative Evolutionary Research

    PubMed Central

    Song, Xiang; Goicoechea, Jose Luis; Ammiraju, Jetty S. S.; Luo, Meizhong; He, Ruifeng; Lin, Jinke; Lee, So-Jeong; Sisneros, Nicholas; Watts, Tom; Kudrna, David A.; Golser, Wolfgang; Ashley, Elizabeth; Collura, Kristi; Braidotti, Michele; Yu, Yeisoo; Matzkin, Luciano M.; McAllister, Bryant F.; Markow, Therese Ann; Wing, Rod A.

    2011-01-01

    The genus Drosophila has been the subject of intense comparative phylogenomics characterization to provide insights into genome evolution under diverse biological and ecological contexts and to functionally annotate the Drosophila melanogaster genome, a model system for animal and insect genetics. Recent sequencing of 11 additional Drosophila species from various divergence points of the genus is a first step in this direction. However, to fully reap the benefits of this resource, the Drosophila community is faced with two critical needs: i.e., the expansion of genomic resources from a much broader range of phylogenetic diversity and the development of additional resources to aid in finishing the existing draft genomes. To address these needs, we report the first synthesis of a comprehensive set of bacterial artificial chromosome (BAC) resources for 19 Drosophila species from all three subgenera. Ten libraries were derived from the exact source used to generate 10 of the 12 draft genomes, while the rest were generated from a strategically selected set of species on the basis of salient ecological and life history features and their phylogenetic positions. The majority of the new species have at least one sequenced reference genome for immediate comparative benefit. This 19-BAC library set was rigorously characterized and shown to have large insert sizes (125–168 kb), low nonrecombinant clone content (0.3–5.3%), and deep coverage (9.1–42.9×). Further, we demonstrated the utility of this BAC resource for generating physical maps of targeted loci, refining draft sequence assemblies and identifying potential genomic rearrangements across the phylogeny. PMID:21321134

  1. The 19 genomes of Drosophila: a BAC library resource for genus-wide and genome-scale comparative evolutionary research.

    PubMed

    Song, Xiang; Goicoechea, Jose Luis; Ammiraju, Jetty S S; Luo, Meizhong; He, Ruifeng; Lin, Jinke; Lee, So-Jeong; Sisneros, Nicholas; Watts, Tom; Kudrna, David A; Golser, Wolfgang; Ashley, Elizabeth; Collura, Kristi; Braidotti, Michele; Yu, Yeisoo; Matzkin, Luciano M; McAllister, Bryant F; Markow, Therese Ann; Wing, Rod A

    2011-04-01

    The genus Drosophila has been the subject of intense comparative phylogenomics characterization to provide insights into genome evolution under diverse biological and ecological contexts and to functionally annotate the Drosophila melanogaster genome, a model system for animal and insect genetics. Recent sequencing of 11 additional Drosophila species from various divergence points of the genus is a first step in this direction. However, to fully reap the benefits of this resource, the Drosophila community is faced with two critical needs: i.e., the expansion of genomic resources from a much broader range of phylogenetic diversity and the development of additional resources to aid in finishing the existing draft genomes. To address these needs, we report the first synthesis of a comprehensive set of bacterial artificial chromosome (BAC) resources for 19 Drosophila species from all three subgenera. Ten libraries were derived from the exact source used to generate 10 of the 12 draft genomes, while the rest were generated from a strategically selected set of species on the basis of salient ecological and life history features and their phylogenetic positions. The majority of the new species have at least one sequenced reference genome for immediate comparative benefit. This 19-BAC library set was rigorously characterized and shown to have large insert sizes (125-168 kb), low nonrecombinant clone content (0.3-5.3%), and deep coverage (9.1-42.9×). Further, we demonstrated the utility of this BAC resource for generating physical maps of targeted loci, refining draft sequence assemblies and identifying potential genomic rearrangements across the phylogeny.

  2. Outlier analysis of functional genomic profiles enriches for oncology targets and enables precision medicine.

    PubMed

    Zhu, Zhou; Ihle, Nathan T; Rejto, Paul A; Zarrinkar, Patrick P

    2016-06-13

    Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, which are limited by biological complexities as well as the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency. Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may lead not only to new candidates for therapeutic intervention, but also to tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles. The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.

  3. A novel ammonia-oxidizing archaeon from wastewater treatment plant: Its enrichment, physiological and genomic characteristics

    PubMed Central

    Li, Yuyang; Ding, Kun; Wen, Xianghua; Zhang, Bing; Shen, Bo; Yang, Yunfeng

    2016-01-01

    Ammonia-oxidizing archaea (AOA) are recently found to participate in the ammonia removal processes in wastewater treatment plants (WWTPs), similar to their bacterial counterparts. However, due to lack of cultivated AOA strains from WWTPs, their functions and contributions in these systems remain unclear. Here we report a novel AOA strain SAT1 enriched from activated sludge, with its physiological and genomic characteristics investigated. The maximal 16S rRNA gene similarity between SAT1 and other reported AOA strain is 96% (with “Ca. Nitrosotenuis chungbukensis”), and it is affiliated with Wastewater Cluster B (WWC-B) based on amoA gene phylogeny, a cluster within group I.1a and specific for activated sludge. Our strain is autotrophic, mesophilic (25 °C–33 °C) and neutrophilic (pH 5.0–7.0). Its genome size is 1.62 Mb, with a large fragment inversion (accounted for 68% genomic size) inside. The strain could not utilize urea due to truncation of the urea transporter gene. The lack of the pathways to synthesize usual compatible solutes makes it intolerant to high salinity (>0.03%), but could adapt to low salinity (0.005%) environments. This adaptation, together with possibly enhanced cell-biofilm attachment ability, makes it suitable for WWTPs environment. We propose the name “Candidatus Nitrosotenuis cloacae” for the strain SAT1. PMID:27030530

  4. A novel ammonia-oxidizing archaeon from wastewater treatment plant: Its enrichment, physiological and genomic characteristics

    NASA Astrophysics Data System (ADS)

    Li, Yuyang; Ding, Kun; Wen, Xianghua; Zhang, Bing; Shen, Bo; Yang, Yunfeng

    2016-03-01

    Ammonia-oxidizing archaea (AOA) are recently found to participate in the ammonia removal processes in wastewater treatment plants (WWTPs), similar to their bacterial counterparts. However, due to lack of cultivated AOA strains from WWTPs, their functions and contributions in these systems remain unclear. Here we report a novel AOA strain SAT1 enriched from activated sludge, with its physiological and genomic characteristics investigated. The maximal 16S rRNA gene similarity between SAT1 and other reported AOA strain is 96% (with “Ca. Nitrosotenuis chungbukensis”), and it is affiliated with Wastewater Cluster B (WWC-B) based on amoA gene phylogeny, a cluster within group I.1a and specific for activated sludge. Our strain is autotrophic, mesophilic (25 °C–33 °C) and neutrophilic (pH 5.0–7.0). Its genome size is 1.62 Mb, with a large fragment inversion (accounted for 68% genomic size) inside. The strain could not utilize urea due to truncation of the urea transporter gene. The lack of the pathways to synthesize usual compatible solutes makes it intolerant to high salinity (>0.03%), but could adapt to low salinity (0.005%) environments. This adaptation, together with possibly enhanced cell-biofilm attachment ability, makes it suitable for WWTPs environment. We propose the name “Candidatus Nitrosotenuis cloacae” for the strain SAT1.

  5. Synthesis of an arrayed sgRNA library targeting the human genome

    PubMed Central

    Schmidt, Tobias; Schmid-Burgk, Jonathan L.; Hornung, Veit

    2015-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) in conjunction with CRISPR-associated proteins (Cas) can be employed to introduce double-strand breaks into mammalian genomes at user-defined loci. The endonuclease activity of the Cas complex can be targeted to a specific genomic region using a single guide RNA (sgRNA). We developed a ligation-independent cloning (LIC) assembly method for efficient and bias-free generation of large sgRNA libraries. Using this system, we performed an iterative shotgun cloning approach to generate an arrayed sgRNA library that targets one critical exon of almost every protein-coding human gene. An orthogonal mixing and deconvolution approach was used to obtain 19,506 unique sequence-validated sgRNAs (91.4% coverage). As tested in HEK 293T cells, constructs of this library have a median genome editing activity of 54.6%, and employing sgRNAs of this library to generate knockout cells was successful for 19 out of 19 genes tested. PMID:26446710

  6. Genome-wide linkage analysis for uric acid in families enriched for hypertension

    PubMed Central

    Rule, Andrew D.; Fridley, Brooke L.; Hunt, Steven C.; Asmann, Yan; Boerwinkle, Eric; Pankow, James S.; Mosley, Thomas H.; Turner, Stephen T.

    2009-01-01

    Background. Uric acid is heritable and associated with hypertension and insulin resistance. We sought to identify genomic regions influencing serum uric acid in families in which two or more siblings had hypertension. Methods. Uric acid levels and microsatellite markers were assayed in the Genetic Epidemiology Network of Arteriopathy (GENOA) cohort (1075 whites and 1333 blacks) and the Hypertension Genetic Epidemiology Network (HyperGEN) cohort (1542 whites and 1627 blacks). Genome-wide linkage analyses of uric acid and bivariate linkage analyses of uric acid with an additional surrogate of insulin resistance were completed. Pathway analysis explored gene sets enriched at loci influencing uric acid. Results. In the GENOA white cohort, loci influencing uric acid were identified on chromosome 8 at 135 cM [multipoint logarithm of odds score (MLS) = 2.4], on chromosome 9 at 113 cM (MLS = 3.7) and on chromosome 16 at 93 cM (MLS = 2.3), but did not replicate in HyperGEN. At these loci, there was evidence of pleiotropy with other surrogates of insulin resistance and genes in the fructose and mannose metabolism pathway were enriched. In the HyperGEN-black cohort, there was some evidence of a locus for uric acid on chromosome 4 at 135 cM (MLS = 2.4) that had modest replication in GENOA (MLS = 1.2). Conclusions. Several novel loci linked to uric acid were identified but none showed clear replication. Widespread diuretic use, a medication that raises uric acid levels, was an important study limitation. Bivariate linkage analyses and pathway analysis were consistent with genes regulating insulin resistance and fructose metabolism contributing to the heritability of uric acid. PMID:19258383

  7. Genome-wide enrichment analysis between endometriosis and obesity-related traits reveals novel susceptibility loci

    PubMed Central

    Rahmioglu, Nilufer; Macgregor, Stuart; Drong, Alexander W.; Hedman, Åsa K.; Harris, Holly R.; Randall, Joshua C.; Prokopenko, Inga; Nyholt, Dale R.; Morris, Andrew P.; Montgomery, Grant W.; Missmer, Stacey A.; Lindgren, Cecilia M.; Zondervan, Krina T.

    2015-01-01

    Endometriosis is a chronic inflammatory condition in women that results in pelvic pain and subfertility, and has been associated with decreased body mass index (BMI). Genetic variants contributing to the heritable component have started to emerge from genome-wide association studies (GWAS), although the majority remain unknown. Unexpectedly, we observed an intergenic locus on 7p15.2 that was genome-wide significantly associated with both endometriosis and fat distribution (waist-to-hip ratio adjusted for BMI; WHRadjBMI) in an independent meta-GWAS of European ancestry individuals. This led us to investigate the potential overlap in genetic variants underlying the aetiology of endometriosis, WHRadjBMI and BMI using GWAS data. Our analyses demonstrated significant enrichment of common variants between fat distribution and endometriosis (P = 3.7 × 10⁻³), which was stronger when we restricted the investigation to more severe (Stage B) cases (P = 4.5 × 10⁻⁴). However, no genetic enrichment was observed between endometriosis and BMI (P = 0.79). In addition to 7p15.2, we identify four more variants with statistically significant evidence of involvement in both endometriosis and WHRadjBMI (in/near KIFAP3, CAB39L, WNT4, GRB14); two of these, KIFAP3 and CAB39L, are novel associations for both traits. KIFAP3, WNT4 and 7p15.2 are associated with the WNT signalling pathway; formal pathway analysis confirmed a statistically significant (P = 6.41 × 10⁻⁴) overrepresentation of shared associations in developmental processes/WNT signalling between the two traits. Our results demonstrate an example of potential biological pleiotropy that was hitherto unknown, and represent an opportunity for functional follow-up of loci and further cross-phenotype comparisons to assess how fat distribution and endometriosis pathogenesis research fields can inform each other. PMID:25296917

  8. Core and region-enriched networks of behaviorally regulated genes and the singing genome

    PubMed Central

    Whitney, Osceola; Pfenning, Andreas R.; Howard, Jason T.; Blatti, Charles A; Liu, Fang; Ward, James M.; Wang, Rui; Audet, Jean-Nicolas; Kellis, Manolis; Mukherjee, Sayan; Sinha, Saurabh; Hartemink, Alexander J.; West, Anne E.; Jarvis, Erich D.

    2015-01-01

    Songbirds represent an important model organism for elucidating molecular mechanisms that link genes with complex behaviors, in part because they have discrete vocal learning circuits that have parallels with those that mediate human speech. We found that ~10% of the genes in the avian genome were regulated by singing, and we found a striking regional diversity of both basal and singing-induced programs in the four key song nuclei of the zebra finch, a vocal learning songbird. The region-enriched patterns were a result of distinct combinations of region-enriched transcription factors (TFs), their binding motifs, and presinging acetylation of histone 3 at lysine 27 (H3K27ac) enhancer activity in the regulatory regions of the associated genes. RNA interference manipulations validated the role of the calcium-response transcription factor (CaRF) in regulating genes preferentially expressed in specific song nuclei in response to singing. Thus, differential combinatorial binding of a small group of activity-regulated TFs and predefined epigenetic enhancer activity influences the anatomical diversity of behaviorally regulated gene networks. PMID:25504732

  9. Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries.

    PubMed Central

    Ruberti, I; Fragapane, P; Pierandrei-Amaldi, P; Beccari, E; Amaldi, F; Bozzoni, I

    1982-01-01

    Using a cDNA clone for the histone H3 we have isolated, from two genomic libraries of Xenopus laevis and Xenopus tropicalis, clones containing four different histone gene clusters. The structural organization of X. laevis histone genes has been determined by restriction mapping, Southern blot hybridization and translation of the mRNAs which hybridize to the various restriction fragments. The arrangement of the histone genes in X. tropicalis has been determined by Southern analysis using X. laevis genomic fragments, containing individual genes, as probes. Histone genes are clustered in the genome of X. laevis and X. tropicalis and, compared to invertebrates, show a higher organization heterogeneity as demonstrated by structural analysis of the four genomic clones. In fact, the order of the genes within individual clusters is not conserved. PMID:6296782

  10. SVGenes: a library for rendering genomic features in scalable vector graphic format.

    PubMed

    Etherington, Graham J; MacLean, Daniel

    2013-08-01

    Drawing genomic features in attractive and informative ways is a key task in visualization of genomics data. Scalable Vector Graphics (SVG) format is a modern and flexible open standard that provides advanced features including modular graphic design, advanced web interactivity and animation within a suitable client. SVGs do not suffer from loss of image quality on re-scaling and provide the ability to edit individual elements of a graphic on the whole object level independent of the whole image. These features make SVG a potentially useful format for the preparation of publication quality figures including genomic objects such as genes or sequencing coverage and for web applications that require rich user-interaction with the graphical elements. SVGenes is a Ruby-language library that uses SVG primitives to render typical genomic glyphs through a simple and flexible Ruby interface. The library implements a simple Page object that spaces and contains horizontal Track objects that in turn style, colour and position features within them. Tracks are the level at which visual information is supplied providing the full styling capability of the SVG standard. Genomic entities like genes, transcripts and histograms are modelled in Glyph objects that are attached to a track and take advantage of SVG primitives to render the genomic features in a track as any of a selection of defined glyphs. The feature model within SVGenes is simple but flexible and not dependent on particular existing gene feature formats meaning graphics for any existing datasets can easily be created without need for conversion. The library is provided as a Ruby Gem from https://rubygems.org/gems/bio-svgenes under the MIT license, and open source code is available at https://github.com/danmaclean/bioruby-svgenes also under the MIT License. dan.maclean@tsl.ac.uk.

  11. SVGenes: a library for rendering genomic features in scalable vector graphic format

    PubMed Central

    Etherington, Graham J.; MacLean, Daniel

    2013-01-01

    Motivation: Drawing genomic features in attractive and informative ways is a key task in visualization of genomics data. Scalable Vector Graphics (SVG) format is a modern and flexible open standard that provides advanced features including modular graphic design, advanced web interactivity and animation within a suitable client. SVGs do not suffer from loss of image quality on re-scaling and provide the ability to edit individual elements of a graphic on the whole object level independent of the whole image. These features make SVG a potentially useful format for the preparation of publication quality figures including genomic objects such as genes or sequencing coverage and for web applications that require rich user-interaction with the graphical elements. Results: SVGenes is a Ruby-language library that uses SVG primitives to render typical genomic glyphs through a simple and flexible Ruby interface. The library implements a simple Page object that spaces and contains horizontal Track objects that in turn style, colour and position features within them. Tracks are the level at which visual information is supplied providing the full styling capability of the SVG standard. Genomic entities like genes, transcripts and histograms are modelled in Glyph objects that are attached to a track and take advantage of SVG primitives to render the genomic features in a track as any of a selection of defined glyphs. The feature model within SVGenes is simple but flexible and not dependent on particular existing gene feature formats meaning graphics for any existing datasets can easily be created without need for conversion. Availability: The library is provided as a Ruby Gem from https://rubygems.org/gems/bio-svgenes under the MIT license, and open source code is available at https://github.com/danmaclean/bioruby-svgenes also under the MIT License. Contact: dan.maclean@tsl.ac.uk PMID:23749959

  12. A new age in functional genomics using CRISPR/Cas9 in arrayed library screening

    PubMed Central

    Agrotis, Alexander; Ketteler, Robin

    2015-01-01

    CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening. PMID:26442115

  13. A new age in functional genomics using CRISPR/Cas9 in arrayed library screening.

    PubMed

    Agrotis, Alexander; Ketteler, Robin

    2015-01-01

    CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.

  14. Fidelity by design: Yoctoreactor and binder trap enrichment for small-molecule DNA-encoded libraries and drug discovery.

    PubMed

    Blakskjaer, Peter; Heitner, Tara; Hansen, Nils Jakob Vest

    2015-06-01

    DNA-encoded small-molecule library (DEL) technology allows vast drug-like small molecule libraries to be efficiently synthesized in a combinatorial fashion and screened in a single tube method for binding, with an assay readout empowered by advances in next generation sequencing technology. This approach has increasingly been applied as a viable technology for the identification of small-molecule modulators to protein targets and as precursors to drugs in the past decade. Several strategies for producing and for screening DELs have been devised by both academic and industrial institutions. This review highlights some of the most significant and recent strategies along with important results. A special focus on the production of high fidelity DEL technologies with the ability to eliminate screening noise and false positives is included: using a DNA junction called the Yoctoreactor, building blocks (BBs) are spatially confined at the center of the junction facilitating both the chemical reaction between BBs and encoding of the synthetic route. A screening method, known as binder trap enrichment, permits DELs to be screened robustly in a homogeneous manner delivering clean data sets and potent hits for even the most challenging targets.

  15. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    DOE PAGES

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; ...

    2016-11-23

    We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analysis of this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin.

  16. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    PubMed Central

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; McBride, Kathryn R.; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D.

    2016-01-01

    Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium’s ability to degrade recalcitrant organics such as lignin. PMID:27881538

  17. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm.

    PubMed

    Woo, Hannah L; O'Dell, Kaela B; Utturkar, Sagar; McBride, Kathryn R; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T B K; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D; Hazen, Terry C

    2016-11-23

    Thalassospira sp. strain KO164 was isolated from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. The near-complete genome sequence presented here will facilitate analyses into this deep-ocean bacterium's ability to degrade recalcitrant organics such as lignin.

  18. Library+

    ERIC Educational Resources Information Center

    Merrill, Alex

    2011-01-01

    This article discusses possible future directions for academic libraries in the post Web/Library 2.0 world. These possible directions include areas such as data literacy, linked data sets, and opportunities for libraries in support of digital humanities. The author provides a brief sketch of the background information regarding the topics and…

  19. Effects of methylation-sensitive enzymes on the enrichment of genic SNPs and the degree of genome complexity reduction in a two-enzyme genotyping-by-sequencing (GBS) approach: a case study in oil palm (Elaeis guineensis).

    PubMed

    Pootakham, Wirulda; Sonthirod, Chutima; Naktang, Chaiwat; Jomchai, Nukoon; Sangsrakru, Duangjai; Tangphatsornruang, Sithichoke

    2016-01-01

    Advances in next generation sequencing have facilitated a large-scale single nucleotide polymorphism (SNP) discovery in many crop species. Genotyping-by-sequencing (GBS) approach couples next generation sequencing with genome complexity reduction techniques to simultaneously identify and genotype SNPs. Choice of enzymes used in GBS library preparation depends on several factors including the number of markers required, the desired level of multiplexing, and whether the enrichment of genic SNP is preferred. We evaluated various combinations of methylation-sensitive (AatII, PstI, MspI) and methylation-insensitive (SphI, MseI) enzymes for their effectiveness in genome complexity reduction and enrichment of genic SNPs. We discovered that the use of two methylation-sensitive enzymes effectively reduced genome complexity and did not require a size selection step. On the contrary, the genome coverage of libraries constructed with methylation-insensitive enzymes was quite high, and the additional size selection step may be required to increase the overall read depth. We also demonstrated the effectiveness of methylation-sensitive enzymes in enriching for SNPs located in genic regions. When two methylation-insensitive enzymes were used, only 16% of SNPs identified were located in genes and 18% in the vicinity (± 5 kb) of the genic regions, while most SNPs resided in the intergenic regions. In contrast, a remarkable degree of enrichment was observed when two methylation-sensitive enzymes were employed. Almost two thirds of the SNPs were located either inside (32-36%) or in the vicinity (28-31%) of the genic regions. These results provide useful information to help researchers choose appropriate GBS enzymes in oil palm and other crop species.

  20. Validation of SCALE 4.0 -- CSAS25 module and the 27-group ENDF/B-IV cross-section library for low-enriched uranium systems

    SciTech Connect

    Jordan, W.C.

    1993-02-01

    A version of KENO V.a and the 27-group library in SCALE-4.0 were validated for use in evaluating the nuclear criticality safety of low-enriched uranium systems. A total of 59 critical systems were analyzed. A statistical analysis of the results was performed, and subcritical acceptance criteria were established.

  2. Characterization of a BAC Library from Channel Catfish Ictalurus punctatus: Indications of High Rates of Evolution Among Teleost Genomes

    USDA-ARS's Scientific Manuscript database

    The CHORI-212 bacterial artificial chromosome (BAC) library was constructed by cloning EcoRI/EcoRI partially digested DNA into the pTARBAC2.1 vector. The library has an average insert size of 161 kb, and provides 10.6-fold coverage of the channel catfish haploid genome. Screening of 32 genes using o...

  3. Screening of an E. coli O157:H7 Bacterial Artificial Chromosome Library by Comparative Genomic Hybridization to Identify Genomic Regions Contributing to Growth in Bovine Gastrointestinal Mucus and Epithelial Cell Colonization

    PubMed Central

    Bai, Jianing; McAteer, Sean P.; Paxton, Edith; Mahajan, Arvind; Gally, David L.; Tree, Jai J.

    2011-01-01

    Enterohemorrhagic E. coli (EHEC) O157:H7 can cause serious gastrointestinal and systemic disease in humans following direct or indirect exposure to ruminant feces containing the bacterium. The main colonization site of EHEC O157:H7 in cattle is the terminal rectum where the bacteria intimately attach to the epithelium and multiply in the intestinal mucus. This study aimed to identify genomic regions of EHEC O157:H7 that contribute to colonization and multiplication at this site. A bacterial artificial chromosome (BAC) library was generated from a derivative of the sequenced E. coli O157:H7 Sakai strain. The library contains 1152 clones averaging 150 kbp. To verify the library, clones containing a complete locus of enterocyte effacement (LEE) were identified by DNA hybridization. In line with a previous report, these did not confer a type III secretion (T3S) capacity to the K-12 host strain. However, conjugation of one of the BAC clones into a strain containing a partial LEE deletion restored T3S. Three hundred eighty-four clones from the library were subjected to two different selective screens; one involved three rounds of adherence assays to bovine primary rectal epithelial cells while the other competed the clones over three rounds of growth in bovine rectal mucus. The input strain DNA was then compared with the selected strains using comparative genomic hybridization (CGH) on an E. coli microarray. The adherence assay enriched for pO157 DNA indicating the importance of this plasmid for colonization of rectal epithelial cells. The mucus assay enriched for multiple regions involved in carbohydrate utilization, including hexuronate uptake, indicating that these regions provide a competitive growth advantage in bovine mucus. This BAC-CGH approach provides a positive selection screen that complements negative selection transposon-based screens. As demonstrated, this may be of particular use for identifying genes with redundant functions such as adhesion and carbon

  4. Candidate genes for obesity-susceptibility show enriched association within a large genome-wide association study for BMI

    PubMed Central

    Vimaleswaran, Karani S.; Tachmazidou, Ioanna; Zhao, Jing Hua; Hirschhorn, Joel N.; Dudbridge, Frank; Loos, Ruth J.F.

    2012-01-01

    Before the advent of genome-wide association studies (GWASs), hundreds of candidate genes for obesity-susceptibility had been identified through a variety of approaches. We examined whether those obesity candidate genes are enriched for associations with body mass index (BMI) compared with non-candidate genes by using data from a large-scale GWAS. A thorough literature search identified 547 candidate genes for obesity-susceptibility based on evidence from animal studies, Mendelian syndromes, linkage studies, genetic association studies and expression studies. Genomic regions were defined to include the genes ±10 kb of flanking sequence around candidate and non-candidate genes. We used summary statistics publicly available from the discovery stage of the genome-wide meta-analysis for BMI performed by the genetic investigation of anthropometric traits consortium in 123 564 individuals. Hypergeometric, rank tail-strength and gene-set enrichment analysis tests were used to test for the enrichment of association in candidate compared with non-candidate genes. The hypergeometric test of enrichment was not significant at the 5% P-value quantile (P = 0.35), but was nominally significant at the 25% quantile (P = 0.015). The rank tail-strength and gene-set enrichment tests were nominally significant for the full set of genes and borderline significant for the subset without SNPs at P < 10⁻⁷. Taken together, the observed evidence for enrichment suggests that the candidate gene approach retains some value. However, the degree of enrichment is small despite the extensive number of candidate genes and the large sample size. Studies that focus on candidate genes have only slightly increased chances of detecting associations, and are likely to miss many true effects in non-candidate genes, at least for obesity-related traits. PMID:22791748
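
    The hypergeometric enrichment test named in this abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the method: genes are ranked by association P-value, the top quantile (for example 5%) is taken, and the number of candidate genes falling in that quantile is compared with chance. The function name, the total gene count and the observed overlap below are hypothetical; only the figure of 547 candidate genes comes from the abstract.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, candidate_genes: int,
                           top_genes: int, candidate_in_top: int) -> float:
    """Upper-tail hypergeometric P-value, P(X >= candidate_in_top).

    X is the number of candidate genes found among `top_genes` genes drawn
    without replacement from `total_genes`, of which `candidate_genes`
    are candidates.
    """
    if not (0 <= candidate_genes <= total_genes and 0 <= top_genes <= total_genes):
        raise ValueError("subset sizes must lie between 0 and total_genes")
    lowest = max(candidate_in_top, 0)
    highest = min(candidate_genes, top_genes)
    # Exact integer arithmetic; math.comb returns 0 for impossible draws.
    favourable = sum(
        comb(candidate_genes, k) * comb(total_genes - candidate_genes, top_genes - k)
        for k in range(lowest, highest + 1)
    )
    return favourable / comb(total_genes, top_genes)


# Hypothetical inputs: 20,000 genes in total and 35 candidates observed in
# the top 5% (1,000 genes). Only the 547 candidate genes are from the abstract.
p_example = hypergeom_enrichment_p(20_000, 547, 1_000, 35)
```

    Exact integer binomials keep the sketch dependency-free; for routine use a statistics library would normally be preferred.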

  5. A Computational Solution to Automatically Map Metabolite Libraries in the Context of Genome Scale Metabolic Networks

    PubMed Central

    Merlet, Benjamin; Paulhe, Nils; Vinson, Florence; Frainay, Clément; Chazalviel, Maxime; Poupin, Nathalie; Gloaguen, Yoann; Giacomoni, Franck; Jourdan, Fabien

    2016-01-01

    This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP) on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single or a selection of organisms of interest and is thus not limited to large facilities. PMID:26909353
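
    The identifier-based matching described in this abstract can be sketched roughly as below. This is not the MetExplore pipeline itself: the function name, the dictionary layout and the InChIKey strings are all made-up placeholders, and the sketch only assumes that both the library and the network carry an InChIKey for each metabolite, as the abstract requires.

```python
def map_library_to_network(library: dict, network: dict):
    """Match library compounds to network metabolites by exact InChIKey.

    library: compound name -> InChIKey (hypothetical layout)
    network: network metabolite id -> InChIKey (hypothetical layout)
    Returns (mapping, coverage), where mapping gives the matching network ids
    for every library compound and coverage is the fraction of library
    compounds with at least one match.
    """
    ids_by_key = {}
    for metabolite_id, key in network.items():
        if key:  # entries without an identifier cannot be matched
            ids_by_key.setdefault(key.strip().upper(), []).append(metabolite_id)

    mapping = {}
    for name, key in library.items():
        mapping[name] = ids_by_key.get(key.strip().upper(), []) if key else []

    matched = sum(1 for ids in mapping.values() if ids)
    coverage = matched / len(library) if library else 0.0
    return mapping, coverage


# Placeholder identifiers, not real InChIKeys.
example_library = {
    "compound_A": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
    "compound_B": "BBBBBBBBBBBBBB-BBBBBBBBBB-N",
    "compound_C": "",
}
example_network = {
    "M_1": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
    "M_2": "AAAAAAAAAAAAAA-AAAAAAAAAA-N",
    "M_3": "CCCCCCCCCCCCCC-CCCCCCCCCC-N",
}
example_mapping, example_coverage = map_library_to_network(example_library, example_network)
```

    One library compound may map to several network metabolites (for example, the same compound in different compartments), so the mapping keeps a list per compound.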

  6. A Computational Solution to Automatically Map Metabolite Libraries in the Context of Genome Scale Metabolic Networks.

    PubMed

    Merlet, Benjamin; Paulhe, Nils; Vinson, Florence; Frainay, Clément; Chazalviel, Maxime; Poupin, Nathalie; Gloaguen, Yoann; Giacomoni, Franck; Jourdan, Fabien

    2016-01-01

    This article describes a generic programmatic method for mapping chemical compound libraries on organism-specific metabolic networks from various databases (KEGG, BioCyc) and flat file formats (SBML and Matlab files). We show how this pipeline was successfully applied to decipher the coverage of chemical libraries set up by two metabolomics facilities MetaboHub (French National infrastructure for metabolomics and fluxomics) and Glasgow Polyomics (GP) on the metabolic networks available in the MetExplore web server. The present generic protocol is designed to formalize and reduce the volume of information transfer between the library and the network database. Matching of metabolites between libraries and metabolic networks is based on InChIs or InChIKeys and therefore requires that these identifiers are specified in both libraries and networks. In addition to providing covering statistics, this pipeline also allows the visualization of mapping results in the context of metabolic networks. In order to achieve this goal, we tackled issues on programmatic interaction between two servers, improvement of metabolite annotation in metabolic networks and automatic loading of a mapping in genome scale metabolic network analysis tool MetExplore. It is important to note that this mapping can also be performed on a single or a selection of organisms of interest and is thus not limited to large facilities.

  7. Construction and utility of 10-kb libraries for efficient clone-gap closure for rice genome sequencing.

    PubMed

    Yang, Tae-Jin; Yu, Yeisoo; Nah, Gyoungju; Atkins, Michael; Lee, Seunghee; Frisch, David A; Wing, Rod A

    2003-08-01

    Rice is an important crop and a model system for monocot genomics, and is a target for whole genome sequencing by the International Rice Genome Sequencing Project (IRGSP). The IRGSP is using a clone by clone approach to sequence rice based on minimum tiles of BAC or PAC clones. For chromosomes 10 and 3 we are using an integrated physical map based on two fingerprinted and end-sequenced BAC libraries to identify a minimum tiling path of clones. In this study we constructed and tested two rice genomic libraries with an average insert size of 10 kb (10-kb library) to support the gap closure and finishing phases of the rice genome sequencing project. The HaeIII library contains 166,752 clones covering approximately 4.6x rice genome equivalents with an average insert size of 10.5 kb. The Sau3AI library contains 138,960 clones covering 4.2x genome equivalents with an average insert size of 11.6 kb. Both libraries were gridded in duplicate onto 11 high-density filters in a 5 x 5 pattern to facilitate screening by hybridization. The libraries contain an unbiased coverage of the rice genome with less than 5% contamination by clones containing organelle DNA or no insert. An efficient method was developed, consisting of pooled overgo hybridization, the selection of 10-kb gap spanning clones using end sequences, transposon sequencing and utilization of in silico draft sequence, to close relatively small gaps between sequenced BAC clones. Using this method we were able to close a majority of the gaps (up to approximately 50 kb) identified during the finishing phase of chromosome-10 sequencing. This method represents a useful way to close clone gaps and thus to complete the entire rice genome.
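
    The coverage figures quoted in this abstract follow from simple arithmetic: genome equivalents = number of clones × mean insert size / genome size. The sketch below works that through. The abstract does not state the genome size it used, so the sketch back-calculates it from the reported numbers; the function names are hypothetical, and the last function applies the standard Clarke-Carbon estimate, which the abstract does not mention.

```python
def genome_equivalents(n_clones: int, mean_insert_kb: float, genome_size_mb: float) -> float:
    """Library coverage in genome equivalents (clones x insert / genome)."""
    return n_clones * mean_insert_kb / (genome_size_mb * 1000.0)


def implied_genome_size_mb(n_clones: int, mean_insert_kb: float, coverage: float) -> float:
    """Genome size (Mb) implied by a reported coverage; a back-calculation."""
    return n_clones * mean_insert_kb / coverage / 1000.0


def prob_locus_represented(n_clones: int, mean_insert_kb: float, genome_size_mb: float) -> float:
    """Clarke-Carbon estimate P = 1 - (1 - f)^N, with f = insert / genome.

    Assumes random, unbiased cloning; not taken from the abstract.
    """
    fraction = mean_insert_kb / (genome_size_mb * 1000.0)
    return 1.0 - (1.0 - fraction) ** n_clones


# Figures reported in the abstract for the two libraries.
haeiii_genome_mb = implied_genome_size_mb(166_752, 10.5, 4.6)
sau3ai_genome_mb = implied_genome_size_mb(138_960, 11.6, 4.2)
haeiii_hit_probability = prob_locus_represented(166_752, 10.5, haeiii_genome_mb)
```

    Both libraries imply a genome size in the low 380s of megabases, so the two reported coverages are mutually consistent. The reported coverages are approximate and the libraries contain a small share of organelle or empty clones, so these values are estimates only.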

  8. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum)

    PubMed Central

    2011-01-01

    Background: Sesame (Sesamum indicum) is one of the most important oilseed crops, with high oil content and rich nutrient value. However, genetic improvement efforts in sesame have not benefited from molecular biology technology because DNA and RNA sequence resources are poor. In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further analyzed genes related to seed storage products. Results: A normalized and full-length enriched cDNA library from 5- to 30-day-old immature seeds was constructed and randomly sequenced, generating 41,248 ESTs, which formed 4,713 contigs and 27,708 singletons, with 44.9% of uniESTs being putative full-length open reading frames. Approximately 26,091 of these uniESTs have significant matches to counterparts in the Nr database of GenBank, and 21,628 of them were assigned to one or more Gene Ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified, including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL) and WRINKLED1 (WRI1), and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in biosynthesis of the sesame lignans sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). Conclusions: This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants

  9. Analysis of expression sequence tags from a full-length-enriched cDNA library of developing sesame seeds (Sesamum indicum).

    PubMed

    Ke, Tao; Dong, Caihua; Mao, Han; Zhao, Yingzhong; Chen, Hong; Liu, Hongyan; Dong, Xuyan; Tong, Chaobo; Liu, Shengyi

    2011-12-24

    Sesame (Sesamum indicum) is one of the most important oilseed crops, with high oil content and rich nutrient value. However, genetic improvement efforts in sesame have not benefited from molecular biology technology because DNA and RNA sequence resources are poor. In this study, we carried out large-scale sequencing of expressed sequence tags (ESTs) from developing sesame seeds and further analyzed genes related to seed storage products. A normalized and full-length enriched cDNA library from 5- to 30-day-old immature seeds was constructed and randomly sequenced, generating 41,248 ESTs, which formed 4,713 contigs and 27,708 singletons, with 44.9% of uniESTs being putative full-length open reading frames. Approximately 26,091 of these uniESTs have significant matches to counterparts in the Nr database of GenBank, and 21,628 of them were assigned to one or more Gene Ontology (GO) terms. Homologous genes involved in oil biosynthesis were identified, including some conserved transcription factors regulating oil biosynthesis such as LEAFY COTYLEDON1 (LEC1), PICKLE (PKL) and WRINKLED1 (WRI1), and the majority of them were found for the first time in sesame seeds. One hundred and seventeen ESTs were identified as possibly involved in biosynthesis of the sesame lignans sesamin and sesamolin. In total, 9,347 putative functional genes from developing seeds were identified, which accounts for one third of total genes in the sesame genome. Further analysis of the uniESTs identified 1,949 non-redundant simple sequence repeats (SSRs). This study has provided an overview of genes expressed during sesame seed development. This collection of sesame full-length cDNAs covered a wide variety of genes in seeds, in particular, candidate genes involved in biosynthesis of sesame oils and lignans. These EST sequences enriched with full length will contribute to comparative genomic studies on sesame and other oilseed plants and serve as an abundant

  10. Pre-capture multiplexing improves efficiency and cost-effectiveness of targeted genomic enrichment

    PubMed Central

    2012-01-01

    Background Targeted genomic enrichment (TGE) is a widely used method for isolating and enriching specific genomic regions prior to massively parallel sequencing. To make effective use of sequencer output, barcoding and sample pooling (multiplexing) after TGE and prior to sequencing (post-capture multiplexing) has become routine. While previous reports have indicated that multiplexing prior to capture (pre-capture multiplexing) is feasible, no thorough examination of the effect of this method has been completed on a large number of samples. Here we compare standard post-capture TGE to two levels of pre-capture multiplexing: 12 or 16 samples per pool. We evaluated these methods using standard TGE metrics and determined the ability to identify several classes of genetic mutations in three sets of 96 samples, including 48 controls. Our overall goal was to maximize cost reduction and minimize experimental time while maintaining a high percentage of reads on target and a high depth of coverage at thresholds required for variant detection. Results We adapted the standard post-capture TGE method for pre-capture TGE with several protocol modifications, including redesign of blocking oligonucleotides and optimization of enzymatic and amplification steps. Pre-capture multiplexing reduced costs for TGE by at least 38% and significantly reduced hands-on time during the TGE protocol. We found that pre-capture multiplexing reduced capture efficiency by 23 or 31% for pre-capture pools of 12 and 16, respectively. However efficiency losses at this step can be compensated by reducing the number of simultaneously sequenced samples. Pre-capture multiplexing and post-capture TGE performed similarly with respect to variant detection of positive control mutations. In addition, we detected no instances of sample switching due to aberrant barcode identification. 
    Conclusions Pre-capture multiplexing improves efficiency of TGE experiments with respect to hands-on time and reagent use compared

  11. Chronic periodontitis genome-wide association studies: gene-centric and gene set enrichment analyses.

    PubMed

    Rhodin, K; Divaris, K; North, K E; Barros, S P; Moss, K; Beck, J D; Offenbacher, S

    2014-09-01

    Recent genome-wide association studies (GWAS) of chronic periodontitis (CP) offer rich data sources for the investigation of candidate genes, functional elements, and pathways. We used GWAS data of CP (n = 4,504) and periodontal pathogen colonization (n = 1,020) from a cohort of adult Americans of European descent participating in the Atherosclerosis Risk in Communities study and employed a MAGENTA approach (i.e., meta-analysis gene set enrichment of variant associations) to obtain gene-centric and gene set association results corrected for gene size, number of single-nucleotide polymorphisms, and local linkage disequilibrium characteristics based on the human genome build 18 (National Center for Biotechnology Information build 36). We used the Gene Ontology, Ingenuity, KEGG, Panther, Reactome, and Biocarta databases for gene set enrichment analyses. Six genes showed evidence of statistically significant association: 4 with severe CP (NIN, p = 1.6 × 10(-7); ABHD12B, p = 3.6 × 10(-7); WHAMM, p = 1.7 × 10(-6); AP3B2, p = 2.2 × 10(-6)) and 2 with high periodontal pathogen colonization (red complex-KCNK1, p = 3.4 × 10(-7); Porphyromonas gingivalis-DAB2IP, p = 1.0 × 10(-6)). Top-ranked genes for moderate CP were HGD (p = 1.4 × 10(-5)), ZNF675 (p = 1.5 × 10(-5)), TNFRSF10C (p = 2.0 × 10(-5)), and EMR1 (p = 2.0 × 10(-5)). Loci containing NIN, EMR1, KCNK1, and DAB2IP had shown suggestive evidence of association in the earlier single-nucleotide polymorphism-based analyses, whereas WHAMM and AP3B2 emerged as novel candidates. The top gene sets included severe CP ("endoplasmic reticulum membrane," "cytochrome P450," "microsome," and "oxidation reduction") and moderate CP ("regulation of gene expression," "zinc ion binding," "BMP signaling pathway," and "ruffle"). Gene-centric analyses offer a promising avenue for efficient interrogation of large-scale GWAS data. These results highlight genes in previously identified loci and new candidate genes and pathways

  12. Genome-wide association analysis and pathways enrichment for lactation persistency in Canadian Holstein cattle.

    PubMed

    Do, D N; Bissonnette, N; Lacasse, P; Miglior, F; Sargolzaei, M; Zhao, X; Ibeagha-Awemu, E M

    2017-03-01

    Lactation persistency (LP), defined as the rate of declining milk yield after milk peak, is an economically important trait for dairy cattle. Improving LP is considered a good alternative method for increasing overall milk production because it does not cause the negative energy balance and other health issues that cows experience during peak milk production. However, little is known about the biology of LP. A genome-wide association study (GWAS) and pathway enrichment were used to explore the genetic mechanisms underlying LP. The GWAS was performed using a univariate regression mixed linear model on LP data of 3,796 cows and 44,100 single nucleotide polymorphisms (SNP). Eight and 47 SNP were significantly and suggestively associated with LP, respectively. The 2 most important quantitative trait loci regions for LP were (1) a region from 106 to 108 Mb on Bos taurus autosome (BTA) 5, where the most significant SNP (ARS-BFGL-NGS-2399) was located and also formed a linkage disequilibrium block with 3 other SNP; and (2) a region from 29.3 to 31.3 Mb on BTA 20, which contained 3 significant SNP. Based on physical positions, MAN1C1, MAP3K5, HCN1, TSPAN9, MRPS30, TEX14, and CCL28 are potential candidate genes for LP because the significant SNP were located in their intronic regions. Enrichment analyses of a list of 536 genes in 0.5-Mb flanking regions of significant and suggestive SNP indicates that synthesis of milk components, regulation of cell apoptosis processes and insulin, and prolactin signaling pathways are important for LP. Upstream regulators relevant for LP positional candidate genes were prolactin (PRL), peroxisome proliferator-activated receptor gamma (PPARG), and Erb-B2 receptor tyrosine kinase 2 (ERBB2). Several networks related to cellular development, proliferation and death were significantly enriched for LP positional candidate genes. In conclusion, this study detected several SNP, genes, and interesting regions for fine mapping and validation of

  13. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web.

    PubMed

    Miller, Chase A; Anthony, Jon; Meyer, Michelle M; Marth, Gabor

    2013-02-01

    High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported.

  14. Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web

    PubMed Central

    Miller, Chase A.; Anthony, Jon; Meyer, Michelle M.; Marth, Gabor

    2013-01-01

    Motivation: High-throughput biological research requires simultaneous visualization as well as analysis of genomic data, e.g. read alignments, variant calls and genomic annotations. Traditionally, such integrative analysis required desktop applications operating on locally stored data. Many current terabyte-size datasets generated by large public consortia projects, however, are already only feasibly stored at specialist genome analysis centers. As even small laboratories can afford very large datasets, local storage and analysis are becoming increasingly limiting, and it is likely that most such datasets will soon be stored remotely, e.g. in the cloud. These developments will require web-based tools that enable users to access, analyze and view vast remotely stored data with a level of sophistication and interactivity that approximates desktop applications. As rapidly dropping cost enables researchers to collect data intended to answer questions in very specialized contexts, developers must also provide software libraries that empower users to implement customized data analyses and data views for their particular application. Such specialized, yet lightweight, applications would empower scientists to better answer specific biological questions than possible with general-purpose genome browsers currently available. Results: Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Scribl simplifies the development of sophisticated web-based graphical tools that approach the dynamism and interactivity of desktop applications. Availability and implementation: Software is freely available online at http://chmille4.github.com/Scribl/ and is implemented in JavaScript with all modern browsers supported. 
    Contact: gabor.marth@bc.edu Supplementary information: Supplementary data are available at Bioinformatics

  15. Genome editing using FACS enrichment of nuclease-expressing cells and indel detection by amplicon analysis.

    PubMed

    Lonowski, Lindsey A; Narimatsu, Yoshiki; Riaz, Anjum; Delay, Catherine E; Yang, Zhang; Niola, Francesco; Duda, Katarzyna; Ober, Elke A; Clausen, Henrik; Wandall, Hans H; Hansen, Steen H; Bennett, Eric P; Frödin, Morten

    2017-03-01

    This protocol describes methods for increasing and evaluating the efficiency of genome editing based on the CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats-CRISPR-associated 9) system, transcription activator-like effector nucleases (TALENs) or zinc-finger nucleases (ZFNs). First, Indel Detection by Amplicon Analysis (IDAA) determines the size and frequency of insertions and deletions elicited by nucleases in cells, tissues or embryos through analysis of fluorophore-labeled PCR amplicons covering the nuclease target site by capillary electrophoresis in a sequenator. Second, FACS enrichment of cells expressing nucleases linked to fluorescent proteins can be used to maximize knockout or knock-in editing efficiencies or to balance editing efficiency and toxic/off-target effects. The two methods can be combined to form a pipeline for cell-line editing that facilitates the testing of new nuclease reagents and the generation of edited cell pools or clonal cell lines, reducing the number of clones that need to be generated and increasing the ease with which they are screened. The pipeline shortens the time line, but it most prominently reduces the workload of cell-line editing, which may be completed within 4 weeks.
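
    The indel sizing step of IDAA described above can be illustrated with a small calculation. This is only a sketch, not the authors' software: it assumes that peak calls (amplicon length in bp and peak area) have already been exported from the capillary electrophoresis run, and the function name, the input format, the wild-type length and the example peaks are all hypothetical.

```python
from collections import defaultdict


def indel_profile(peaks, wild_type_length):
    """Summarize indel sizes and frequencies from amplicon peaks.

    peaks: iterable of (amplicon_length_bp, peak_area) tuples (assumed input format).
    Returns (profile, indel_fraction). profile maps indel size in bp
    (negative = deletion, positive = insertion, 0 = unedited length)
    to its fraction of the total peak area.
    """
    area_by_size = defaultdict(float)
    for length, area in peaks:
        area_by_size[length - wild_type_length] += area

    total = sum(area_by_size.values())
    if total == 0:
        raise ValueError("no peak area to summarize")

    profile = {size: area / total for size, area in sorted(area_by_size.items())}
    # Everything that is not at the wild-type length counts as an indel.
    indel_fraction = 1.0 - profile.get(0, 0.0)
    return profile, indel_fraction


# Made-up example: a 300 bp wild-type amplicon, a 7 bp deletion and a 1 bp insertion.
example_peaks = [(300, 500.0), (293, 300.0), (301, 200.0)]
profile, indel_fraction = indel_profile(example_peaks, wild_type_length=300)
```

    Using peak area as a proxy for allele frequency is an assumption of this sketch; a real analysis would also need to handle size-calling error and background peaks.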

  16. Genome-wide survey of Ds exonization to enrich transcriptomes and proteomes in plants.

    PubMed

    Liu, Li-Yu Daisy; Charng, Yuh-Chyang

    2012-01-01

    Insertion of transposable elements (TEs) into introns can lead to their activation as alternatively spliced cassette exons, an event called exonization which can enrich the complexity of transcriptomes and proteomes. Previously, we performed the first experimental assessment of TE exonization by inserting a Ds element into each intron of the rice epsps gene. Exonization of Ds in plants was biased toward providing splice donor sites from the beginning of the inserted Ds sequence. Additionally, Ds inserted in the reverse direction resulted in a continuous splice donor consensus region by offering 4 donor sites in the same intron. The current study involved genome-wide computational analysis of Ds exonization events in the dicot Arabidopsis thaliana and the monocot Oryza sativa (rice). Up to 71% of the exonized transcripts were putative targets for the nonsense-mediated decay (NMD) pathway. The insertion patterns of Ds and the polymorphic splice donor sites increased the transcripts and subsequent protein isoforms. Protein isoforms contain protein sequence due to unspliced intron-TE region and/or a shift of the reading frame. The number of interior protein isoforms would be twice that of C-terminal isoforms, on average. TE exonization provides a promising way for functional expansion of the plant proteome.

  17. A Census of rRNA Genes and Linked Genomic Sequences within a Soil Metagenomic Library

    PubMed Central

    Liles, Mark R.; Manske, Brian F.; Bintrim, Scott B.; Handelsman, Jo; Goodman, Robert M.

    2003-01-01

    We have analyzed the diversity of microbial genomes represented in a library of metagenomic DNA from soil. A total of 24,400 bacterial artificial chromosome (BAC) clones were screened for 16S rRNA genes. The sequences obtained from BAC clones were compared with a collection generated by direct PCR amplification and cloning of 16S rRNA genes from the same soil. The results indicated that the BAC library had substantially lower representation of bacteria among the Bacillus, α-Proteobacteria, and CFB groups; greater representation among the β- and γ-Proteobacteria, and OP10 divisions; and no rRNA genes from the domains Eukaryota and Archaea. In addition to rRNA genes recovered from the bacterial divisions Proteobacteria, Verrucomicrobia, Firmicutes, Cytophagales, and OP11, we identified many rRNA genes from the BAC library affiliated with the bacterial division Acidobacterium; all of these sequences were affiliated with subdivisions that lack cultured representatives. The complete sequence of one BAC clone derived from a member of the Acidobacterium division revealed a complete rRNA operon and 20 other open reading frames, including predicted gene products involved in cell division, cell cycling, folic acid biosynthesis, substrate metabolism, amino acid uptake, DNA repair, and transcriptional regulation. This study is the first step in using genomics to reveal the physiology of as-yet-uncultured members of the Acidobacterium division. PMID:12732537

  18. A genome-wide library of CB4856/N2 introgression lines of Caenorhabditis elegans

    PubMed Central

    Doroszuk, Agnieszka; Snoek, L. Basten; Fradin, Emilie; Riksen, Joost; Kammenga, Jan

    2009-01-01

    Recombinant inbred lines (RILs) derived from Caenorhabditis elegans wild-type N2 and CB4856 are increasingly being used for mapping genes underlying complex traits. To speed up mapping and gene discovery, introgression lines (ILs) offer a powerful tool for more efficient QTL identification. We constructed a library of 90 ILs, each carrying a single homozygous CB4856 genomic segment introgressed into the genetic background of N2. The ILs were genotyped by 123 single-nucleotide polymorphism (SNP) markers. The proportion of the CB4856 segments in most lines does not exceed 3%, and together the introgressions cover 96% of the CB4856 genome. The value of the IL library was demonstrated by identifying novel loci underlying natural variation in two ageing-related traits, i.e. lifespan and pharyngeal pumping rate. Bin mapping of lifespan resulted in six QTLs, which all have a lifespan-shortening effect on the CB4856 allele. We found five QTLs for the decrease in pumping rate, of which four colocated with QTLs found for average lifespan. This suggests pleiotropic or closely linked QTL associated with lifespan and pumping rate. Overall, the presented IL library provides a versatile resource toward easier and efficient fine mapping and functional analyses of loci and genes underlying complex traits in C. elegans. PMID:19542186

  19. Construction and Analysis of Two Genome-Scale Deletion Libraries for Bacillus subtilis.

    PubMed

    Koo, Byoung-Mo; Kritikos, George; Farelli, Jeremiah D; Todor, Horia; Tong, Kenneth; Kimsey, Harvey; Wapinski, Ilan; Galardini, Marco; Cabal, Angelo; Peters, Jason M; Hachmann, Anna-Barbara; Rudner, David Z; Allen, Karen N; Typas, Athanasios; Gross, Carol A

    2017-03-22

    A systems-level understanding of Gram-positive bacteria is important from both an environmental and health perspective and is most easily obtained when high-quality, validated genomic resources are available. To this end, we constructed two ordered, barcoded, erythromycin-resistance- and kanamycin-resistance-marked single-gene deletion libraries of the Gram-positive model organism, Bacillus subtilis. The libraries comprise 3,968 and 3,970 genes, respectively, and overlap in all but four genes. Using these libraries, we update the set of essential genes known for this organism, provide a comprehensive compendium of B. subtilis auxotrophic genes, and identify genes required for utilizing specific carbon and nitrogen sources, as well as those required for growth at low temperature. We report the identification of enzymes catalyzing several missing steps in amino acid biosynthesis. Finally, we describe a suite of high-throughput phenotyping methodologies and apply them to provide a genome-wide analysis of competence and sporulation. Altogether, we provide versatile resources for studying gene function and pathway and network architecture in Gram-positive bacteria.

  20. Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.

    PubMed

    Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan

    2017-08-17

    Compared with the commercially available single nucleotide polymorphism (SNP) chip based on the Bead Chip technology, the solution hybrid selection (SHS)-based target enrichment SNP chip is not only design-flexible, but also cost-effective for genotype sequencing. In this study, we propose to design an animal SNP chip using the SHS-based target enrichment strategy for the first time. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip with the whole-genome sequencing data of 436 cashmere goats showed that the SNP call rate was between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40X. The capture regions were shown to be 200 bp that flank target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.
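
    The two quality metrics quoted above, SNP call rate and average sequencing depth, can be sketched as follows. The definitions used here (call rate as the fraction of target SNPs with a non-missing genotype, depth as a plain mean over target SNPs) are common conventions assumed for illustration rather than details taken from the study, and the function names and toy data are hypothetical.

```python
def call_rate(genotypes):
    """Fraction of SNPs with a non-missing genotype call; None marks a missing call."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def mean_depth(depths):
    """Average read depth over target SNPs."""
    if not depths:
        raise ValueError("no depths supplied")
    return sum(depths) / len(depths)


# Made-up toy sample: 20 target SNPs, one of which has no genotype call.
sample_genotypes = ["AA", "AG", "GG", "AA", None] + ["AG"] * 15
sample_depths = [38, 42, 40, 41, 39]

rate = call_rate(sample_genotypes)   # 19 of 20 SNPs called
depth = mean_depth(sample_depths)
```

    In practice these values would be computed per sample from the genotype calls, which is how a range such as 95.3% to 99.8% across animals would arise.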

  1. Strong Enrichment of Aromatic Residues in Binding Sites from a Charge-neutralized Hyperthermostable Sso7d Scaffold Library

    PubMed Central

    Kiefer, Jonathan D.; Srinivas, Raja R.; Lobner, Elisabeth; Tisdale, Alison W.; Mehta, Naveen K.; Yang, Nicole J.; Tidor, Bruce; Wittrup, K. Dane

    2016-01-01

    The Sso7d protein from the hyperthermophilic archaeon Sulfolobus solfataricus is an attractive binding scaffold because of its small size (7 kDa), high thermal stability (Tm of 98 °C), and absence of cysteines and glycosylation sites. However, as a DNA-binding protein, Sso7d is highly positively charged, introducing a strong specificity constraint for binding epitopes and leading to nonspecific interaction with mammalian cell membranes. In the present study, we report charge-neutralized variants of Sso7d that maintain high thermal stability. Yeast-displayed libraries that were based on this reduced charge Sso7d (rcSso7d) scaffold yielded binders with low nanomolar affinities against mouse serum albumin and several epitopes on human epidermal growth factor receptor. Importantly, starting from a charge-neutralized scaffold facilitated evolutionary adaptation of binders to differentially charged epitopes on mouse serum albumin and human epidermal growth factor receptor, respectively. Interestingly, the distribution of amino acids in the small and rigid binding surface of enriched rcSso7d-based binders is very different from that generally found in more flexible antibody complementarity-determining region loops but resembles the composition of antibody-binding energetic hot spots. Particularly striking was a strong enrichment of the aromatic residues Trp, Tyr, and Phe in rcSso7d-based binders. This suggests that the rigidity and small size of this scaffold determines the unusual amino acid composition of its binding sites, mimicking the energetic core of antibody paratopes. Despite the high frequency of aromatic residues, these rcSso7d-based binders are highly expressed, thermostable, and monomeric, suggesting that the hyperstability of the starting scaffold and the rigidness of the binding surface confer a high tolerance to mutation. PMID:27582495
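
    The notion of residue enrichment used above can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the authors' analysis: it compares the pooled fraction of Trp, Tyr and Phe in a set of selected binding-site sequences with that in a reference set, and the function names and all sequences are made up.

```python
from collections import Counter

AROMATIC = set("WYF")  # one-letter codes for Trp, Tyr, Phe


def aromatic_fraction(sequences):
    """Fraction of residues that are Trp, Tyr or Phe, pooled over all sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    return sum(counts[aa] for aa in AROMATIC) / total


def enrichment(selected, reference):
    """Ratio of aromatic fractions: selected binders relative to a reference set."""
    return aromatic_fraction(selected) / aromatic_fraction(reference)


# Made-up binding-surface residues; these are not sequences from the study.
selected_sites = ["WYFWYAKG", "YWFYSWRD"]
reference_sites = ["AKGSTWRD", "NQYEDKLS"]

ratio = enrichment(selected_sites, reference_sites)
```

    A ratio above 1 indicates that aromatic residues are more frequent in the selected set; the choice of reference (for example the naive library or antibody loop compositions) is left open here.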

  2. Generation and analysis of large-scale expressed sequence tags (ESTs) from a full-length enriched cDNA library of porcine backfat tissue

    PubMed Central

    Kim, Tae-Hun; Kim, Nam-Soon; Lim, Dajeong; Lee, Kyung-Tai; Oh, Jung-Hwa; Park, Hye-Sook; Jang, Gil-Won; Kim, Hyung-Yong; Jeon, Mina; Choi, Bong-Hwan; Lee, Hae-Young; Chung, HY; Kim, Heebal

    2006-01-01

    Background Genome research in farm animals will expand our basic knowledge of the genetic control of complex traits, and the results will be applied in the livestock industry to improve meat quality and productivity, as well as to reduce the incidence of disease. A combination of quantitative trait locus mapping and microarray analysis is a useful approach to reduce the overall effort needed to identify genes associated with quantitative traits of interest. Results We constructed a full-length enriched cDNA library from porcine backfat tissue. The estimated average size of the cDNA inserts was 1.7 kb, and the cDNA fullness ratio was 70%. In total, we deposited 16,110 high-quality sequences in the dbEST division of GenBank (accession numbers: DT319652-DT335761). For all the expressed sequence tags (ESTs), approximately 10.9 Mb of porcine sequence were generated with an average length of 674 bp per EST (range: 200–952 bp). Clustering and assembly of these ESTs resulted in a total of 5,008 unique sequences with 1,776 contigs (35.46%) and 3,232 singleton ESTs (64.54%). From a total of 5,008 unique sequences, 3,154 (62.98%) were similar to other sequences, and 1,854 (37.02%) were identified as having no hit or low identity (<95%) and 60% coverage in The Institute for Genomic Research (TIGR) gene index of Sus scrofa. Gene ontology (GO) annotation of unique sequences showed that approximately 31.7, 32.3, and 30.8% were assigned molecular function, biological process, and cellular component GO terms, respectively. A total of 1,854 putative novel transcripts resulted after comparison and filtering with the TIGR SsGI; these included a large percentage of singletons (80.64%) and a small proportion of contigs (13.36%). Conclusion The sequence data generated in this study will provide valuable information for studying expression profiles using EST-based microarrays and assist in the condensation of current pig TCs into clusters representing longer stretches of cDNA sequences

  3. From the ORFeome concept to highly comprehensive, full-genome screening libraries.

    PubMed

    Rid, Raphaela; Abdel-Hadi, Omar; Maier, Richard; Wagner, Martin; Hundsberger, Harald; Hintner, Helmut; Bauer, Johann; Onder, Kamil

    2013-02-01

    Recombination-based cloning techniques have in recent times facilitated the establishment of genome-scale single-gene ORFeome repositories. Their further handling and downstream application in systematic fashion is, however, practically impeded because of logistical plus economic challenges. At this juncture, simultaneously transferring entire gene collections in compiled pool format could represent an advanced compromise between systematic ORFeome (an organism's entire set of protein-encoding open reading frames) projects and traditional random library approaches, but has not yet been considered in great detail. In our endeavor to merge the comprehensiveness of ORFeomes with a basically simple, streamlined, and easily executable single-tube design, we have here produced five different pooled screening-ready libraries for both Staphylococcus aureus and Homo sapiens. By evaluating the parallel transfer efficiencies of differentially sized genes from initial polymerase chain reaction (PCR) product amplification to entry and final destination library construction via quantitative real-time PCR, we found that the complexity of the gene population is fairly stably maintained once an entry resource has been successfully established, and that no apparent size-selection bias loss of large inserts takes place. Recombinational transfer processes are hence robust enough for straightforwardly achieving such pooled screening libraries.

  4. A genome-wide CRISPR library for high-throughput genetic screening in Drosophila cells.

    PubMed

    Bassett, Andrew R; Kong, Lesheng; Liu, Ji-Long

    2015-06-20

    The simplicity of the CRISPR/Cas9 system of genome engineering has opened up the possibility of performing genome-wide targeted mutagenesis in cell lines, enabling screening for cellular phenotypes resulting from genetic aberrations. Drosophila cells have proven to be highly effective in identifying genes involved in cellular processes through similar screens using partial knockdown by RNAi. This is in part due to the lower degree of redundancy between genes in this organism, whilst still maintaining highly conserved gene networks and orthologs of many human disease-causing genes. The ability of CRISPR to generate genetic loss-of-function mutations not only increases the magnitude of any effect over currently employed RNAi techniques, but also allows analysis over longer periods of time, which can be critical for certain phenotypes. In this study, we have designed and built a genome-wide CRISPR library covering 13,501 genes, among which 8,989 genes are targeted by three or more independent single guide RNAs (sgRNAs). Moreover, we describe strategies to monitor the population of guide RNAs by high-throughput sequencing (HTS). We hope that this library will provide an invaluable resource for the community to screen loss-of-function mutations for cellular phenotypes, and as a source of guide RNA designs for future studies.
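
    Monitoring a guide RNA population by sequencing, as mentioned above, amounts to counting how many reads carry each library guide. The following is a minimal sketch of that idea rather than the strategy described in the paper: it assumes exact matching, a single fixed guide length of 20 nt and at most one guide per read, and the function name, guide names and sequences are invented for illustration.

```python
from collections import Counter


def count_guides(reads, guides, guide_length=20):
    """Count reads that contain each library guide sequence (exact match only).

    reads: iterable of read sequences.
    guides: dict mapping guide name -> guide sequence, all of length guide_length.
    """
    by_sequence = {seq.upper(): name for name, seq in guides.items()}
    counts = Counter({name: 0 for name in guides})
    for read in reads:
        read = read.upper()
        # Slide a window over the read and look each k-mer up in the library.
        for start in range(len(read) - guide_length + 1):
            name = by_sequence.get(read[start:start + guide_length])
            if name is not None:
                counts[name] += 1
                break  # assume at most one guide per read
    return counts


# Made-up two-guide library and four reads (one read matches nothing).
guides = {
    "geneA_sg1": "ACGTACGTACGTACGTACGT",
    "geneB_sg1": "TTGACCATGGCATCGATCGA",
}
reads = [
    "GG" + guides["geneA_sg1"] + "GTTT",
    "cc" + guides["geneB_sg1"].lower() + "gttt",
    guides["geneA_sg1"],
    "G" * 24,
]
counts = count_guides(reads, guides)
detected = sum(1 for n in counts.values() if n > 0) / len(guides)
```

    Comparing such counts before and after a screen is one way to see which guides drop out or become enriched; tolerance for sequencing errors is deliberately left out of this sketch.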

  5. A Genome-Wide CRISPR Library for High-Throughput Genetic Screening in Drosophila Cells

    PubMed Central

    Bassett, Andrew R.; Kong, Lesheng; Liu, Ji-Long

    2015-01-01

    The simplicity of the CRISPR/Cas9 system of genome engineering has opened up the possibility of performing genome-wide targeted mutagenesis in cell lines, enabling screening for cellular phenotypes resulting from genetic aberrations. Drosophila cells have proven to be highly effective in identifying genes involved in cellular processes through similar screens using partial knockdown by RNAi. This is in part due to the lower degree of redundancy between genes in this organism, whilst still maintaining highly conserved gene networks and orthologs of many human disease-causing genes. The ability of CRISPR to generate genetic loss-of-function mutations not only increases the magnitude of any effect over currently employed RNAi techniques, but also allows analysis over longer periods of time, which can be critical for certain phenotypes. In this study, we have designed and built a genome-wide CRISPR library covering 13,501 genes, among which 8,989 genes are targeted by three or more independent single guide RNAs (sgRNAs). Moreover, we describe strategies to monitor the population of guide RNAs by high-throughput sequencing (HTS). We hope that this library will provide an invaluable resource for the community to screen loss-of-function mutations for cellular phenotypes, and as a source of guide RNA designs for future studies. PMID:26165496

  6. Functional Selection of Vaccine Candidate Peptides from Staphylococcus aureus Whole-Genome Expression Libraries In Vitro

    PubMed Central

    Weichhart, Thomas; Horky, Markus; Söllner, Johannes; Gangl, Susanne; Henics, Tamàs; Nagy, Eszter; Meinke, Andreas; von Gabain, Alexander; Fraser, Claire M.; Gill, Steve R.; Hafner, Martin; von Ahsen, Uwe

    2003-01-01

    An in vitro protein selection method, ribosome display, has been applied to comprehensively identify and map the immunologically relevant proteins of the human pathogen Staphylococcus aureus. A library built up from genomic fragments of the virulent S. aureus COL strain (methicillin-resistant S. aureus) allowed us to screen all possible encoded peptides for immunoreactivity. As selective agents, human sera exhibiting a high antibody titer and opsonic activity against S. aureus were used, since these antibodies indicate the in vivo expression and immunoreactivity of the corresponding proteins. Identified clones cluster in distinct regions of 75 genes, most of them classifiable as secreted or surface-localized proteins, including previously identified virulence factors. In addition, 14 putative novel short open reading frames were identified and their immunoreactivity and in vivo mRNA expression were confirmed, underscoring the annotation-independent, true genomic nature of our approach. Evidence is provided that a large fraction of the identified peptides cannot be expressed in an in vivo-based surface display system. Thus, in vitro protein selection, not biased by the context of living entities, allows screening of genomic expression libraries with a large number of different ligands simultaneously. It is a powerful approach for fingerprinting the repertoire of immune reactive proteins serving as target candidates for active and passive vaccination against pathogens. PMID:12874343

  7. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
    The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  8. Construction and analysis of Pst I DNA library for RFLP mapping of the rye genome

    SciTech Connect

    Korzun, V.N.; Kartel, N.A.; Boerner, A.

    1995-06-01

    Pst I, a methylation-sensitive restriction enzyme, was used to produce a library of rye genomic DNA rich in low-copy sequences intended as probes for genetic mapping. Dot-hybridization and Southern blot analysis showed that 43.6% of the library is represented by low-copy DNA sequences. To locate these sequences on chromosomes and determine the degree of their repetitiveness, 11 clones were hybridized with DNA of nulli-tetrasomic lines of Chinese Spring wheat, wheat-rye addition lines, and barley cleaved by Hind III, EcoR I, EcoR V, Dra I, and BamH I restriction enzymes. Each of the rye DNA clones studied hybridized with wheat and barley DNA, suggesting that low-copy Pst I clones of rye correspond to the evolutionarily conserved DNA fraction in cereals. 21 refs., 3 figs., 2 tabs.

  9. Genome sequence of Candidatus Nitrososphaera evergladensis from group I.1b enriched from Everglades soil reveals novel genomic features of the ammonia-oxidizing archaea.

    PubMed

    Zhalnina, Kateryna V; Dias, Raquel; Leonard, Michael T; Dorr de Quadros, Patricia; Camargo, Flavio A O; Drew, Jennifer C; Farmerie, William G; Daroub, Samira H; Triplett, Eric W

    2014-01-01

    The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gases. To date, eight AOA genomes are available in the public databases; seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and a versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group.

  10. Genome Sequence of Candidatus Nitrososphaera evergladensis from Group I.1b Enriched from Everglades Soil Reveals Novel Genomic Features of the Ammonia-Oxidizing Archaea

    PubMed Central

    Zhalnina, Kateryna V.; Dias, Raquel; Leonard, Michael T.; Dorr de Quadros, Patricia; Camargo, Flavio A. O.; Drew, Jennifer C.; Farmerie, William G.; Daroub, Samira H.; Triplett, Eric W.

    2014-01-01

    The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gases. To date, eight AOA genomes are available in the public databases; seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and a versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group. PMID:24999826

  11. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.)

    PubMed Central

    2013-01-01

    Background Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome
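
    The 99% figure quoted above is consistent with the standard Clarke-Carbon estimate, P = 1 - (1 - I/G)^N, for the chance that a single-copy locus falls in at least one of N clones of insert size I drawn from a genome of size G. The sketch below is only an illustration: the clone count and insert size come from the abstract, while the haploid genome size of roughly 2.4 Gb is an assumption that the abstract does not state.

```python
def clone_probability(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon estimate: chance that a single-copy locus is in at least one clone."""
    fraction = insert_bp / genome_bp          # share of the genome carried by one clone
    return 1.0 - (1.0 - fraction) ** n_clones

N_CLONES = 76_800      # from the abstract
INSERT_BP = 135_000    # from the abstract (average insert size)
GENOME_BP = 2.4e9      # ASSUMPTION: approximate haploid genome size, not given in the abstract

coverage = N_CLONES * INSERT_BP / GENOME_BP   # genome equivalents held in the library
p = clone_probability(N_CLONES, INSERT_BP, GENOME_BP)
print(f"coverage ~ {coverage:.2f}x, P(at least one clone) ~ {p:.3f}")
```

    Under that assumed genome size the library holds a little over four genome equivalents, which is what places the probability close to 99%; a different genome size would shift both numbers.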

  12. Microsatellite markers isolated from Cabomba aquatica s.l. (Cabombaceae) from an enriched genomic library

    PubMed Central

    Barbosa, Tiago D. M.; Trad, Rafaela J.; Bajay, Miklos M.; Amaral, Maria C. E.

    2015-01-01

    Premise of the study: Microsatellite primers were designed for the submersed aquatic plant Cabomba aquatica s.l. (Cabombaceae) and characterized to estimate genetic diversity parameters. Methods and Results: Using a selective hybridization method, we designed and tested 30 simple sequence repeat loci using two natural populations of C. aquatica s.l., resulting in 13 amplifiable loci. Twelve loci were polymorphic, and alleles per locus ranged from two to four across the 49 C. aquatica s.l. individuals. Observed heterozygosity, expected heterozygosity, and fixation index varied from 0.0 to 1.0, 0.0 to 0.5, and −1.0 to −0.0667, respectively, for the Manaus population and from 0.0 to 1.0, 0.0 to 0.6, and −1.0 to 0.4643 for the Viruá population. Conclusions: The developed markers will be used in further taxonomic and population studies within Cabomba. This set of microsatellite primers represents the first report on rapid molecular markers in the genus. PMID:26649271
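
    The three quantities reported above are linked: under the common definition F = 1 - Ho/He, a locus with more heterozygotes than expected gives a negative fixation index. The abstract does not say which estimator the authors used, so the short sketch below assumes this textbook form, and the input values are hypothetical.

```python
def fixation_index(h_obs: float, h_exp: float) -> float:
    """F = 1 - Ho/He; undefined when He is zero (monomorphic locus)."""
    if h_exp == 0:
        raise ValueError("fixation index is undefined when expected heterozygosity is 0")
    return 1.0 - h_obs / h_exp

# Hypothetical locus: every individual is heterozygous for two equally frequent alleles.
print(fixation_index(1.0, 0.5))   # heterozygote excess gives a negative F
```

    The guard matters for the loci reported with an expected heterozygosity of 0.0, where the ratio cannot be computed.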

  13. ParallABEL: an R library for generalized parallelization of genome-wide association studies

    PubMed Central

    2010-01-01

    Background Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Results Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity

  14. ParallABEL: an R library for generalized parallelization of genome-wide association studies.

    PubMed

    Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S

    2010-04-29

    Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduous acquiring the necessary programming skills to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files. Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped at 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was
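
    The partition, distribute and merge pattern described for the first of the four groups (one statistic per SNP) can be sketched as follows. This is not ParallABEL's code or API: ParallABEL is an R library built on Rmpi, whereas the sketch uses Python threads as a stand-in for MPI workers, and the genotype matrix and every function name are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical genotypes: rows are individuals, columns are SNPs,
# values count copies of one allele (0, 1 or 2).
GENOTYPES = [
    [0, 1, 2, 0],
    [1, 1, 2, 0],
    [0, 2, 2, 1],
]

def split_columns(n_items: int, n_parts: int) -> list[range]:
    """Partition item indices into contiguous, near-equal chunks."""
    size, extra = divmod(n_items, n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks

def allele_freqs(snp_indices: range) -> list[float]:
    """Per-SNP statistic computed by one worker: allele frequency."""
    n_alleles = 2 * len(GENOTYPES)
    return [sum(row[j] for row in GENOTYPES) / n_alleles for j in snp_indices]

def parallel_allele_freqs(n_workers: int = 2) -> list[float]:
    chunks = split_columns(len(GENOTYPES[0]), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(allele_freqs, chunks))   # results arrive in chunk order
    return [f for part in partial for f in part]         # merge step

print(parallel_allele_freqs())
```

    The other three groups would follow the same shape with a different unit of partitioning: individuals, pairs of individuals, or pairs of SNPs.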

  15. Biases in the SMART-DNA library preparation method associated with genomic poly dA/dT sequences.

    PubMed

    Vardi, Oriya; Shamir, Inbal; Javasky, Elisheva; Goren, Alon; Simon, Itamar

    2017-01-01

    Avoiding biases in next generation sequencing (NGS) library preparation is crucial for obtaining reliable sequencing data. Recently, a new library preparation method has been introduced which has eliminated the need for the ligation step. This method, termed SMART (switching mechanism at the 5' end of the RNA transcript), is based on template switching reverse transcription. To date, there has been no systematic analysis of the additional biases introduced by this method. We analysed the genomic distribution of sequenced reads prepared from genomic DNA using the SMART methodology and found a strong bias toward long (≥12bp) poly dA/dT containing genomic loci. This bias is unique to the SMART-based library preparation and does not appear when libraries are prepared with conventional ligation based methods. Although this bias is obvious only when performing paired end sequencing, it affects single end sequenced samples as well. Our analysis demonstrates that sequenced reads originating from SMART-DNA libraries are heavily skewed toward genomic poly dA/dT tracts. This bias needs to be considered when deciding to use SMART based technology for library preparation.
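
    To make the threshold concrete, loci of the kind described above (runs of at least 12 consecutive A or T bases) can be located with a simple pattern search. This is a minimal sketch and not the authors' analysis pipeline; the function name and the example sequence are hypothetical.

```python
import re

# Runs of 12 or more A, or 12 or more T (the same tract seen on the opposite strand).
POLY_AT = re.compile(r"A{12,}|T{12,}")

def find_poly_tracts(seq: str) -> list[tuple[int, int]]:
    """Return (start, end) half-open coordinates of each long poly dA/dT tract."""
    return [m.span() for m in POLY_AT.finditer(seq.upper())]

# Hypothetical sequence: a 12-base A run, an 11-base T run (too short), a 14-base t run.
example = "GC" + "A" * 12 + "CG" + "T" * 11 + "GG" + "t" * 14
print(find_poly_tracts(example))   # [(2, 14), (29, 43)]
```

    Coordinates like these could then be intersected with read positions to check whether a library shows the skew the abstract reports.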

  16. Biases in the SMART-DNA library preparation method associated with genomic poly dA/dT sequences

    PubMed Central

    Shamir, Inbal; Javasky, Elisheva; Goren, Alon; Simon, Itamar

    2017-01-01

    Avoiding biases in next generation sequencing (NGS) library preparation is crucial for obtaining reliable sequencing data. Recently, a new library preparation method has been introduced which has eliminated the need for the ligation step. This method, termed SMART (switching mechanism at the 5′ end of the RNA transcript), is based on template switching reverse transcription. To date, there has been no systematic analysis of the additional biases introduced by this method. We analysed the genomic distribution of sequenced reads prepared from genomic DNA using the SMART methodology and found a strong bias toward long (≥12bp) poly dA/dT containing genomic loci. This bias is unique to the SMART-based library preparation and does not appear when libraries are prepared with conventional ligation based methods. Although this bias is obvious only when performing paired end sequencing, it affects single end sequenced samples as well. Our analysis demonstrates that sequenced reads originating from SMART-DNA libraries are heavily skewed toward genomic poly dA/dT tracts. This bias needs to be considered when deciding to use SMART based technology for library preparation. PMID:28235101

  17. Successful enrichment and recovery of whole mitochondrial genomes from ancient human dental calculus

    PubMed Central

    Ozga, Andrew T.; Nieves‐Colón, Maria A.; Honap, Tanvi P.; Sankaranarayanan, Krithivasan; Hofman, Courtney A.; Milner, George R.; Lewis, Cecil M.; Stone, Anne C.

    2016-01-01

    ABSTRACT Objectives Archaeological dental calculus is a rich source of host‐associated biomolecules. Importantly, however, dental calculus is more accurately described as a calcified microbial biofilm than a host tissue. As such, concerns regarding destructive analysis of human remains may not apply as strongly to dental calculus, opening the possibility of obtaining human health and ancestry information from dental calculus in cases where destructive analysis of conventional skeletal remains is not permitted. Here we investigate the preservation of human mitochondrial DNA (mtDNA) in archaeological dental calculus and its potential for full mitochondrial genome (mitogenome) reconstruction in maternal lineage ancestry analysis. Materials and Methods Extracted DNA from six individuals at the 700‐year‐old Norris Farms #36 cemetery in Illinois was enriched for mtDNA using in‐solution capture techniques, followed by Illumina high‐throughput sequencing. Results Full mitogenomes (7–34×) were successfully reconstructed from dental calculus for all six individuals, including three individuals who had previously tested negative for DNA preservation in bone using conventional PCR techniques. Mitochondrial haplogroup assignments were consistent with previously published findings, and additional comparative analysis of paired dental calculus and dentine from two individuals yielded equivalent haplotype results. All dental calculus samples exhibited damage patterns consistent with ancient DNA, and mitochondrial sequences were estimated to be 92–100% endogenous. DNA polymerase choice was found to impact error rates in downstream sequence analysis, but these effects can be mitigated by greater sequencing depth. Discussion Dental calculus is a viable alternative source of human DNA that can be used to reconstruct full mitogenomes from archaeological remains. Am J Phys Anthropol 160:220–228, 2016. © 2016 The Authors American Journal of Physical Anthropology

  18. Successful enrichment and recovery of whole mitochondrial genomes from ancient human dental calculus.

    PubMed

    Ozga, Andrew T; Nieves-Colón, Maria A; Honap, Tanvi P; Sankaranarayanan, Krithivasan; Hofman, Courtney A; Milner, George R; Lewis, Cecil M; Stone, Anne C; Warinner, Christina

    2016-06-01

    Archaeological dental calculus is a rich source of host-associated biomolecules. Importantly, however, dental calculus is more accurately described as a calcified microbial biofilm than a host tissue. As such, concerns regarding destructive analysis of human remains may not apply as strongly to dental calculus, opening the possibility of obtaining human health and ancestry information from dental calculus in cases where destructive analysis of conventional skeletal remains is not permitted. Here we investigate the preservation of human mitochondrial DNA (mtDNA) in archaeological dental calculus and its potential for full mitochondrial genome (mitogenome) reconstruction in maternal lineage ancestry analysis. Extracted DNA from six individuals at the 700-year-old Norris Farms #36 cemetery in Illinois was enriched for mtDNA using in-solution capture techniques, followed by Illumina high-throughput sequencing. Full mitogenomes (7-34×) were successfully reconstructed from dental calculus for all six individuals, including three individuals who had previously tested negative for DNA preservation in bone using conventional PCR techniques. Mitochondrial haplogroup assignments were consistent with previously published findings, and additional comparative analysis of paired dental calculus and dentine from two individuals yielded equivalent haplotype results. All dental calculus samples exhibited damage patterns consistent with ancient DNA, and mitochondrial sequences were estimated to be 92-100% endogenous. DNA polymerase choice was found to impact error rates in downstream sequence analysis, but these effects can be mitigated by greater sequencing depth. Dental calculus is a viable alternative source of human DNA that can be used to reconstruct full mitogenomes from archaeological remains. Am J Phys Anthropol 160:220-228, 2016. © 2016 The Authors American Journal of Physical Anthropology Published by Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  19. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries.

    PubMed

    Kumar, Santosh; You, Frank M; Cloutier, Sylvie

    2012-12-06

    Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from flax. The genotyping-by-sequencing approach proved to be

  20. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries

    PubMed Central

    2012-01-01

    Background Flax (Linum usitatissimum L.) is a significant fibre and oilseed crop. Current flax molecular markers, including isozymes, RAPDs, AFLPs and SSRs are of limited use in the construction of high density linkage maps and for association mapping applications due to factors such as low reproducibility, intense labour requirements and/or limited numbers. We report here on the use of a reduced representation library strategy combined with next generation Illumina sequencing for rapid and large scale discovery of SNPs in eight flax genotypes. SNP discovery was performed through in silico analysis of the sequencing data against the whole genome shotgun sequence assembly of flax genotype CDC Bethune. Genotyping-by-sequencing of an F6-derived recombinant inbred line population provided validation of the SNPs. Results Reduced representation libraries of eight flax genotypes were sequenced on the Illumina sequencing platform resulting in sequence coverage ranging from 4.33 to 15.64X (genome equivalents). Depending on the relatedness of the genotypes and the number and length of the reads, between 78% and 93% of the reads mapped onto the CDC Bethune whole genome shotgun sequence assembly. A total of 55,465 SNPs were discovered with the largest number of SNPs belonging to the genotypes with the highest mapping coverage percentage. Approximately 84% of the SNPs discovered were identified in a single genotype, 13% were shared between any two genotypes and the remaining 3% in three or more. Nearly a quarter of the SNPs were found in genic regions. A total of 4,706 out of 4,863 SNPs discovered in Macbeth were validated using genotyping-by-sequencing of 96 F6 individuals from a recombinant inbred line population derived from a cross between CDC Bethune and Macbeth, corresponding to a validation rate of 96.8%. Conclusions Next generation sequencing of reduced representation libraries was successfully implemented for genome-wide SNP discovery from flax. The genotyping

  1. Fosmid library end sequencing reveals a rarely known genome structure of marine shrimp Penaeus monodon

    PubMed Central

    2011-01-01

    Background The black tiger shrimp (Penaeus monodon) is one of the most important aquaculture species in the world, representing the crustacean lineage which possesses the greatest species diversity among marine invertebrates. Yet, we barely know anything about their genomic structure. To understand the organization and evolution of the P. monodon genome, a fosmid library consisting of 288,000 colonies was constructed, equivalent to 5.3-fold coverage of the 2.17 Gb genome. Approximately 11.1 Mb of fosmid end sequences (FESs) from 20,926 non-redundant reads representing 0.45% of the P. monodon genome were obtained for repetitive and protein-coding sequence analyses. Results We found that microsatellite sequences were highly abundant in the P. monodon genome, comprising 8.3% of the total length. The density and the average length of microsatellites were evidently higher in comparison to those of other taxa. AT-rich microsatellite motifs, especially poly (AT) and poly (AAT), were the most abundant. A high abundance of microsatellite sequences was also found in the transcribed regions. Furthermore, via self-BlastN analysis we identified 103 novel repetitive element families which were categorized into four groups, i.e., 33 WSSV-like repeats, 14 retrotransposons, 5 gene-like repeats, and 51 unannotated repeats. Overall, various types of repeats comprise 51.18% of the P. monodon genome in length. Approximately 7.4% of the FESs contained protein-coding sequences, and the Inhibitor of Apoptosis Protein (IAP) gene and the Innexin 3 gene homologues appear to be present in high abundance in the P. monodon genome. Conclusions The redundancy of various repeat types in the P. monodon genome illustrates its highly repetitive nature. In particular, long and dense microsatellite sequences as well as abundant WSSV-like sequences highlight the uniqueness of genome organization of penaeid shrimp from those of other taxa. These results provide substantial improvement to our current

  2. [Extraction of DNA from environmental samples and construction of mixed genomic DNA libraries].

    PubMed

    Wang, X; Tang, Y; Wang, J; Huang, Y; Chen, R; Huang, L

    2001-04-01

    A method has been developed for extracting and purifying genomic DNA from environmental samples. In this method, an environmental sample is treated first by grinding and freezing/thawing and subsequently by SDS/proteinase K-based DNA extraction. The yields of purified DNA from three samples used in this study ranged from 2 to 16 micrograms per gram of dry sample. Mixed genomic DNA libraries for two of the environmental samples were constructed by inserting restriction fragments (3-8 kb) of the purified DNAs into plasmid pUC18 and transforming E. coli DH5α with the resultant plasmids. Approximately 10^3 to 10^4 insert-containing clones were obtained from 1 g of each sample. Clone libraries were analyzed by DNA sequencing and gene annotation. Among 20 randomly-selected clones, 14 contained an insert whose sequence had not been reported while the rest had an insert of either E. coli or vector origin. A search of sequence databases using the end sequences of each of the foreign inserts showed that each sequence was part of a gene encoding, in most cases, a predictable function. Our results are of significance to the collection, investigation and exploitation of the genes of uncultured microorganisms.

  3. Identifying microbial fitness determinants by insertion sequencing using genome-wide transposon mutant libraries.

    PubMed

    Goodman, Andrew L; Wu, Meng; Gordon, Jeffrey I

    2011-11-17

    Insertion sequencing (INSeq) is a method for determining the insertion site and relative abundance of large numbers of transposon mutants in a mixed population of isogenic mutants of a sequenced microbial species. INSeq is based on a modified mariner transposon containing MmeI sites at its ends, allowing cleavage at chromosomal sites 16-17 bp from the inserted transposon. Genomic regions adjacent to the transposons are amplified by linear PCR with a biotinylated primer. Products are bound to magnetic beads, digested with MmeI and barcoded with sample-specific linkers appended to each restriction fragment. After limited PCR amplification, fragments are sequenced using a high-throughput instrument. The sequence of each read can be used to map the location of a transposon in the genome. Read count measures the relative abundance of that mutant in the population. Solid-phase library preparation makes this protocol rapid (18 h), easy to scale up, amenable to automation and useful for a variety of samples. A protocol for characterizing libraries of transposon mutant strains clonally arrayed in a multiwell format is provided.
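
    The last analysis step described above, turning mapped reads into per-mutant relative abundance, can be illustrated with a few lines of counting. This is a sketch under stated assumptions rather than the published protocol's software: the read coordinates are hypothetical, and comparing an input pool with an output pool is one plausible use that the abstract itself does not spell out.

```python
from collections import Counter

def relative_abundance(insertion_sites: list[int]) -> dict[int, float]:
    """Read count per insertion site divided by total reads."""
    counts = Counter(insertion_sites)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}

# Hypothetical mapped reads: each entry is the genome coordinate a read maps to.
input_pool = [100, 100, 250, 250, 900, 900, 900, 900]
output_pool = [100, 100, 100, 250, 900, 900, 900, 900]

before = relative_abundance(input_pool)
after = relative_abundance(output_pool)
for site in sorted(before):
    ratio = after.get(site, 0.0) / before[site]   # below 1 means the mutant lost ground
    print(site, round(ratio, 2))
```

    Dividing by total reads keeps samples sequenced to different depths comparable, which is why the abundance is relative rather than absolute.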

  4. Alzheimer's Disease Variants with the Genome-Wide Significance are Significantly Enriched in Immune Pathways and Active in Immune Cells.

    PubMed

    Jiang, Qinghua; Jin, Shuilin; Jiang, Yongshuai; Liao, Mingzhi; Feng, Rennan; Zhang, Liangcai; Liu, Guiyou; Hao, Junwei

    2017-01-01

    The existing large-scale genome-wide association studies (GWAS) datasets provide strong support for investigating the mechanisms of Alzheimer's disease (AD) by applying multiple methods of pathway analysis. Previous studies using selected single nucleotide polymorphisms (SNPs) with several thresholds of nominal significance for pathway analysis determined that the threshold chosen for SNPs can reflect the disease model. Presumably, then, pathway analysis with a stringent threshold to define "associated" SNPs would test the hypothesis that highly associated SNPs are enriched in one or more particular pathways. Here, we selected 599 AD variants (P < 5.00E-08) to investigate the pathways in which these variants are enriched and the cell types in which these variants are active. Our results showed that AD variants are significantly enriched in pathways of the immune system. Further analysis indicated that AD variants are significantly enriched for enhancers in a number of cell types, in particular the B-lymphocyte, which is the most substantially enriched cell type. This cell type maintains its dominance among the strongest enhancers. AD SNPs also display significant enrichment for DNase in 12 cell types, among which the top 6 significant signals are from immune cell types, including 4 B cells (top 4 significant signals) and CD14+ and CD34+ cells. In summary, our results show that these AD variants with P < 5.00E-08 are significantly enriched in pathways of the immune system and active in immune cells. To a certain degree, the genetic predisposition for development of AD is rooted in the immune system, rather than in neuronal cells.
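
    As an illustration of the general approach (not the authors' actual pipeline, which the abstract does not specify), the sketch below filters variants at the genome-wide threshold P < 5.00E-08 and then scores pathway over-representation with a hypergeometric tail probability. The choice of test, the function names and the variant IDs are all assumptions made for the example.

```python
from math import comb

GENOME_WIDE_P = 5e-8  # threshold quoted in the abstract


def significant_variants(pvalues):
    """Keep variant IDs whose association P value is below the threshold."""
    return {variant for variant, p in pvalues.items() if p < GENOME_WIDE_P}


def hypergeom_enrichment_p(hits_in_pathway, hits_total, pathway_size, background_size):
    """P(X >= hits_in_pathway) when hits_total items are drawn without
    replacement from a background containing pathway_size pathway members."""
    upper = min(hits_total, pathway_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / comb(background_size, hits_total)


# Hypothetical P values for three made-up variants.
pvalues = {"rs_a": 1e-9, "rs_b": 3e-8, "rs_c": 1e-5}
selected = significant_variants(pvalues)
p_enrich = hypergeom_enrichment_p(2, 2, 2, 4)
print(sorted(selected), p_enrich)
```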

  5. The Role of Libraries in the Mutual Enrichment of the National Cultures of the Peoples of the Soviet Union

    ERIC Educational Resources Information Center

    Fonotov, G. P.

    1978-01-01

    Traces literacy and library developments throughout the USSR; the focus is on publications in the various languages of the USSR, and on the role of the libraries in the dissemination of such materials. (Author)

  6. Reprogramming cell fate with a genome-scale library of artificial transcription factors

    PubMed Central

    Eguchi, Asuka; Wleklinski, Matthew J.; Spurgat, Mackenzie C.; Heiderscheit, Evan A.; Kropornicka, Anna S.; Vu, Catherine K.; Bhimsaria, Devesh; Swanson, Scott A.; Stewart, Ron; Ramanathan, Parameswaran; Kamp, Timothy J.; Slukvin, Igor; Thomson, James A.; Dutton, James R.; Ansari, Aseem Z.

    2016-01-01

    Artificial transcription factors (ATFs) are precision-tailored molecules designed to bind DNA and regulate transcription in a preprogrammed manner. Libraries of ATFs enable the high-throughput screening of gene networks that trigger cell fate decisions or phenotypic changes. We developed a genome-scale library of ATFs that display an engineered interaction domain (ID) to enable cooperative assembly and synergistic gene expression at targeted sites. We used this ATF library to screen for key regulators of the pluripotency network and discovered three combinations of ATFs capable of inducing pluripotency without exogenous expression of Oct4 (POU domain, class 5, TF 1). Cognate site identification, global transcriptional profiling, and identification of ATF binding sites reveal that the ATFs do not directly target Oct4; instead, they target distinct nodes that converge to stimulate the endogenous pluripotency network. This forward genetic approach enables cell type conversions without a priori knowledge of potential key regulators and reveals unanticipated gene network dynamics that drive cell fate choices. PMID:27930301

  7. Genomic Libraries and a Host Strain Designed for Highly Efficient Two-Hybrid Selection in Yeast

    PubMed Central

    James, P.; Halladay, J.; Craig, E. A.

    1996-01-01

    The two-hybrid system is a powerful technique for detecting protein-protein interactions that utilizes the well-developed molecular genetics of the yeast Saccharomyces cerevisiae. However, the full potential of this technique has not been realized due to limitations imposed by the components available for use in the system. These limitations include unwieldy plasmid vectors, incomplete or poorly designed two-hybrid libraries, and host strains that result in the selection of large numbers of false positives. We have used a novel multienzyme approach to generate a set of highly representative genomic libraries from S. cerevisiae. In addition, a unique host strain was created that contains three easily assayed reporter genes, each under the control of a different inducible promoter. This host strain is extremely sensitive to weak interactions and eliminates nearly all false positives using simple plate assays. Improved vectors were also constructed that simplify the construction of the gene fusions necessary for the two-hybrid system. Our analysis indicates that the libraries and host strain provide significant improvements in both the number of interacting clones identified and the efficiency of two-hybrid selections. PMID:8978031

  8. Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

    PubMed Central

    2013-01-01

    Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that

  9. High-throughput screening of a Corynebacterium glutamicum mutant library on genomic and metabolic level.

    PubMed

    Reimer, Lorenz C; Spura, Jana; Schmidt-Hohagen, Kerstin; Schomburg, Dietmar

    2014-01-01

    Due to impressive achievements in genomic research, the number of genome sequences has risen quickly, followed by an increasing number of genes with unknown or hypothetical function. This strongly calls for development of high-throughput methods in the fields of transcriptomics, proteomics and metabolomics. Of these platforms, metabolic profiling has the strongest correlation with the phenotype. We previously published a high-throughput metabolic profiling method for C. glutamicum as well as the automatic GC/MS processing software MetaboliteDetector. Here, we added a high-throughput transposon insertion determination for our C. glutamicum mutant library. The combination of these methods allows the parallel analysis of genotype/phenotype correlations for a large number of mutants. In a pilot project we analyzed the insertion points of 722 transposon mutants and found that 36% of the affected genes have unknown functions. This underlines the need for further information gathered by high-throughput techniques. We therefore measured the metabolic profiles of 258 randomly chosen mutants. The MetaboliteDetector software processed this large amount of GC/MS data within a few hours with a low relative error of 11.5% for technical replicates. Pairwise correlation analysis of metabolites over all genotypes showed dependencies of known and unknown metabolites. For a first insight into this large data set, a screening for interesting mutants was done by a pattern search, focusing on mutants with changes in specific pathways. We show that our transposon mutant library is not biased with respect to insertion points. A comparison of the results for specific mutants with previously published metabolic results on a deletion mutant of the same gene confirmed the concept of high-throughput metabolic profiling. Altogether the described method could be applied to whole mutant libraries and thereby help to gain comprehensive information about genes with unknown, hypothetical and known

  10. Exon-Enriched Libraries Reveal Large Genic Differences Between Aedes aegypti from Senegal, West Africa, and Populations Outside Africa

    PubMed Central

    Dickson, Laura B.; Campbell, Corey L.; Juneja, Punita; Jiggins, Francis M.; Sylla, Massamba; Black, William C.

    2016-01-01

    Aedes aegypti is one of the most studied mosquito species, and the principal vector of several arboviruses pathogenic to humans. Recently, failure to oviposit, low fecundity, and poor egg-to-adult survival were observed when Ae. aegypti from Senegal (SenAae), West Africa, were crossed with Ae. aegypti (Aaa) from outside of Africa, and in SenAae intercrosses. Fluorescent in situ hybridization analyses indicated rearrangements on chromosome 1, and pericentric inversions on chromosomes 2 and 3. Herein, high throughput sequencing (HTS) of exon-enriched libraries was used to compare chromosome-wide genetic diversity among Aaa collections from rural Thailand and Mexico, a sylvatic collection from southeastern Senegal (PK10), and an urban collection from western Senegal (Kaolack). Sex-specific polymorphisms were analyzed in Thailand and PK10 to assess genetic differences between sexes. Expected heterozygosity was greatest in SenAae. FST distributions of 15,735 genes among all six pairwise comparisons of the four collections indicated that Mexican and Thailand collections are genetically similar, while FST distributions between PK10 and Kaolack were distinct. All four comparisons of SenAae with Aaa indicated extreme differentiation. FST was uniform between sexes across all chromosomes in Thailand, but was different, especially on the sex autosome 1, in PK10. These patterns correlate with the reproductive isolation noted earlier. We hypothesize that cryptic Ae. aegypti taxa may exist in West Africa, and the large genic differences between Aaa and SenAae detected in the present study have accumulated over a long period following the evolution of chromosome rearrangements in allopatric populations that subsequently caused reproductive isolation when these populations became sympatric. PMID:28007834
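
    For readers unfamiliar with FST, the following minimal sketch computes Wright's FST for a single biallelic locus from allele frequencies in two equally weighted populations, and enumerates the six pairwise comparisons among four collections. The estimator choice and the allele frequencies are assumptions for illustration only; the abstract does not say which FST estimator the authors used.

```python
from itertools import combinations


def fst_biallelic(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic locus and
    two equally weighted populations with allele frequencies p1 and p2."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # locus is monomorphic overall, so no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies at one locus in the four collections.
freqs = {"Thailand": 0.20, "Mexico": 0.25, "PK10": 0.80, "Kaolack": 0.60}
pairs = list(combinations(freqs, 2))  # four collections give six pairs
for a, b in pairs:
    print(a, b, round(fst_biallelic(freqs[a], freqs[b]), 3))
```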

  11. Exon-Enriched Libraries Reveal Large Genic Differences Between Aedes aegypti from Senegal, West Africa, and Populations Outside Africa.

    PubMed

    Dickson, Laura B; Campbell, Corey L; Juneja, Punita; Jiggins, Francis M; Sylla, Massamba; Black, William C

    2017-02-09

    Aedes aegypti is one of the most studied mosquito species, and the principal vector of several arboviruses pathogenic to humans. Recently, failure to oviposit, low fecundity, and poor egg-to-adult survival were observed when Ae. aegypti from Senegal (SenAae), West Africa, were crossed with Ae. aegypti (Aaa) from outside of Africa, and in SenAae intercrosses. Fluorescent in situ hybridization analyses indicated rearrangements on chromosome 1, and pericentric inversions on chromosomes 2 and 3. Herein, high throughput sequencing (HTS) of exon-enriched libraries was used to compare chromosome-wide genetic diversity among Aaa collections from rural Thailand and Mexico, a sylvatic collection from southeastern Senegal (PK10), and an urban collection from western Senegal (Kaolack). Sex-specific polymorphisms were analyzed in Thailand and PK10 to assess genetic differences between sexes. Expected heterozygosity was greatest in SenAae. FST distributions of 15,735 genes among all six pairwise comparisons of the four collections indicated that Mexican and Thailand collections are genetically similar, while FST distributions between PK10 and Kaolack were distinct. All four comparisons of SenAae with Aaa indicated extreme differentiation. FST was uniform between sexes across all chromosomes in Thailand, but was different, especially on the sex autosome 1, in PK10. These patterns correlate with the reproductive isolation noted earlier. We hypothesize that cryptic Ae. aegypti taxa may exist in West Africa, and the large genic differences between Aaa and SenAae detected in the present study have accumulated over a long period following the evolution of chromosome rearrangements in allopatric populations that subsequently caused reproductive isolation when these populations became sympatric.

  12. A kingdom-specific protein domain HMM library for improved annotation of fungal genomes

    PubMed Central

    Alam, Intikhab; Hubbard, Simon J; Oliver, Stephen G; Rattray, Magnus

    2007-01-01

    Background Pfam is a general-purpose database of protein domain alignments and profile Hidden Markov Models (HMMs), which is very popular for the annotation of sequence data produced by genome sequencing projects. Pfam provides models that are often very general in terms of the taxa that they cover and it has previously been suggested that such general models may lack some of the specificity or selectivity that would be provided by kingdom-specific models. Results Here we present a general approach to create domain libraries of HMMs for sub-taxa of a kingdom. Taking fungal species as an example, we construct a domain library of HMMs (called Fungal Pfam or FPfam) using sequences from 30 genomes, consisting of 24 species from the ascomycetes group and two basidiomycetes, Ustilago maydis, a fungal pathogen of maize, and the white rot fungus Phanerochaete chrysosporium. In addition, we include the microsporidian Encephalitozoon cuniculi, an obligate intracellular parasite, and two non-fungal species, the oomycetes Phytophthora sojae and Phytophthora ramorum, both plant pathogens. We evaluate the performance in terms of coverage against the original 30 genomes used in training FPfam and against five more recently sequenced fungal genomes that can be considered as an independent test set. We show that kingdom-specific models such as FPfam can find instances of both novel and well characterized domains, increase overall coverage and detect more domains per sequence with typically higher bitscores than Pfam for the same domain families. An evaluation of the effect of changing E-values on the coverage shows that the performance of FPfam is consistent over the range of E-values applied. Conclusion Kingdom-specific models are shown to provide improved coverage. However, as the models become more specific, some sequences found by Pfam may be missed by the models in FPfam and some of the families represented in the test set are not present in FPfam. Therefore, we recommend

  13. Raalin, a transcript enriched in the honey bee brain, is a remnant of genomic rearrangement in Hymenoptera.

    PubMed

    Tirosh, Y; Morpurgo, N; Cohen, M; Linial, M; Bloch, G

    2012-06-01

    We identified a predicted compact cysteine-rich sequence in the honey bee genome that we called 'Raalin'. Raalin transcripts are enriched in the brain of adult honey bee workers and drones, with only minimum expression in other tissues or in pre-adult stages. Open-reading frame (ORF) homologues of Raalin were identified in the transcriptomes of fruit flies, mosquitoes and moths. The Raalin-like gene from Drosophila melanogaster encodes for a short secreted protein that is maximally expressed in the adult brain with negligible expression in other tissues or pre-imaginal stages. Raalin-like sequences have also been found in the recently sequenced genomes of six ant species, but not in the jewel wasp Nasonia vitripennis. As in the honey bee, the Raalin-like sequences of ants do not have an ORF. A comparison of the genome region containing Raalin in the genomes of bees, ants and the wasp provides evolutionary support for an extensive genome rearrangement in this sequence. Our analyses identify a new family of ancient cysteine-rich short sequences in insects in which insertions and genome rearrangements may have disrupted this locus in the branch leading to the Hymenoptera. The regulated expression of this transcript suggests that it has a brain-specific function. © 2012 The Authors. Insect Molecular Biology © 2012 The Royal Entomological Society.

  14. A support vector machines approach for virtual screening of active compounds of single and multiple mechanisms from large libraries at an improved hit-rate and enrichment factor.

    PubMed

    Han, L Y; Ma, X H; Lin, H H; Jia, J; Zhu, F; Xue, Y; Li, Z R; Cao, Z W; Ji, Z L; Chen, Y Z

    2008-06-01

    Support vector machines (SVM) and other machine-learning (ML) methods have been explored as ligand-based virtual screening (VS) tools for facilitating lead discovery. While exhibiting good hit selection performance, in screening large compound libraries, these methods tend to produce lower hit-rates than the best performing VS tools, partly because their training-sets contain a limited spectrum of inactive compounds. We tested whether the performance of SVM can be improved by using training-sets of diverse inactive compounds. In retrospective database screening of active compounds of single mechanism (HIV protease inhibitors, DHFR inhibitors, dopamine antagonists) and multiple mechanisms (CNS active agents) from large libraries of 2.986 million compounds, the yields, hit-rates, and enrichment factors of our SVM models are 52.4-78.0%, 4.7-73.8%, and 214-10,543, respectively, compared to those of 62-95%, 0.65-35%, and 20-1200 by structure-based VS and 55-81%, 0.2-0.7%, and 110-795 by other ligand-based VS tools in screening libraries of ≥1 million compounds. The hit-rates are comparable and the enrichment factors are substantially better than the best results of other VS tools. 24.3-87.6% of the predicted hits are outside the known hit families. SVM appears to be potentially useful for facilitating lead discovery in VS of large compound libraries.
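
    The three figures of merit quoted above can be made concrete with a short sketch. It assumes the commonly used definitions (yield: fraction of known actives recovered; hit-rate: fraction of selected compounds that are known actives; enrichment factor: hit-rate relative to random selection); the abstract does not spell these out, and the function name and example numbers are hypothetical.

```python
def screening_metrics(library_size, n_actives, n_selected, n_true_hits):
    """Return (yield, hit_rate, enrichment_factor) for one virtual screen."""
    recovered = n_true_hits / n_actives   # yield: share of known actives found
    hit_rate = n_true_hits / n_selected   # share of selected compounds that are active
    # Enrichment factor = hit_rate / (n_actives / library_size), written with
    # integer products to avoid needless floating-point rounding.
    enrichment = (n_true_hits * library_size) / (n_selected * n_actives)
    return recovered, hit_rate, enrichment


# Hypothetical screen: 1,000,000 compounds, 100 known actives,
# 500 compounds selected by the model, 60 of which are known actives.
recovered, hit_rate, enrichment = screening_metrics(1_000_000, 100, 500, 60)
print(f"yield={recovered:.1%} hit-rate={hit_rate:.1%} EF={enrichment:.0f}")
```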

  15. Draft Genome Sequence of Paenibacillus sp. Strain DMB5, Acclimatized and Enriched for Catabolizing Anthropogenic Compounds

    PubMed Central

    Johnson, Jenny; Shah, Binal; Jain, Kunal; Parmar, Nidhi; Hinsu, Ankit; Patel, Namrata

    2016-01-01

    Here, we present the draft genome sequence of Paenibacillus sp. strain DMB5, isolated from polluted sediments of the Kharicut Canal, Vatva, India, having a genome size of 7.5 Mbp and 7,077 coding sequences. The genome of this dye-degrading bacterium provides valuable information on the microbe-mediated biodegradation of anthropogenic compounds. PMID:27034501

  16. Functional Screening of Metagenome and Genome Libraries for Detection of Novel Flavonoid-Modifying Enzymes

    PubMed Central

    Rabausch, U.; Juergensen, J.; Ilmberger, N.; Böhnke, S.; Fischer, S.; Schubach, B.; Schulte, M.

    2013-01-01

    The functional detection of novel enzymes other than hydrolases from metagenomes is limited since only a very few reliable screening procedures are available that allow the rapid screening of large clone libraries. For the discovery of flavonoid-modifying enzymes in genome and metagenome clone libraries, we have developed a new screening system based on high-performance thin-layer chromatography (HPTLC). This metagenome extract thin-layer chromatography analysis (META) allows the rapid detection of glycosyltransferase (GT) and also other flavonoid-modifying activities. The developed screening method is highly sensitive, and an amount of 4 ng of modified flavonoid molecules can be detected. This novel technology was validated against a control library of 1,920 fosmid clones generated from a single Bacillus cereus isolate and then used to analyze more than 38,000 clones derived from two different metagenomic preparations. Thereby we identified two novel UDP glycosyltransferase (UGT) genes. The metagenome-derived gtfC gene encoded a 52-kDa protein, and the deduced amino acid sequence was weakly similar to sequences of putative UGTs from Fibrisoma and Dyadobacter. GtfC mediated the transfer of different hexose moieties and exhibited high activities on flavones, flavonols, flavanones, and stilbenes and also accepted isoflavones and chalcones. From the control library we identified a novel macroside glycosyltransferase (MGT) with a calculated molecular mass of 46 kDa. The deduced amino acid sequence was highly similar to sequences of MGTs from Bacillus thuringiensis. Recombinant MgtB transferred the sugar residue from UDP-glucose effectively to flavones, flavonols, isoflavones, and flavanones. Moreover, MgtB exhibited high activity on larger flavonoid molecules such as tiliroside. PMID:23686272

  17. Ligation Bias in Illumina Next-Generation DNA Libraries: Implications for Sequencing Ancient Genomes

    PubMed Central

    Seguin-Orlando, Andaine; Schubert, Mikkel; Clary, Joel; Stagegaard, Julia; Alberdi, Maria T.; Prado, José Luis; Prieto, Alfredo; Willerslev, Eske; Orlando, Ludovic

    2013-01-01

    Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries. PMID:24205269
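
    A bias against templates starting with thymine would show up as a depleted T fraction at the first read position. The sketch below is one simple way to tabulate that, assuming reads are available as plain strings; the function name and the toy reads are hypothetical and this is not the authors' analysis code.

```python
from collections import Counter


def first_base_composition(reads):
    """Fraction of reads starting with each of A, C, G and T.

    Reads that are empty or start with any other symbol (such as N) are ignored.
    """
    counts = Counter(r[0].upper() for r in reads if r and r[0].upper() in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts.get(base, 0) / total for base in "ACGT"}


# Toy reads for illustration only.
reads = ["ACGT", "AGGT", "CTTA", "GATC"]
composition = first_base_composition(reads)
print(composition)
```

    Comparing this table between two library-preparation methods applied to the same extract would be one way to visualise the effect described in the abstract.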

  18. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes.

    PubMed

    Seguin-Orlando, Andaine; Schubert, Mikkel; Clary, Joel; Stagegaard, Julia; Alberdi, Maria T; Prado, José Luis; Prieto, Alfredo; Willerslev, Eske; Orlando, Ludovic

    2013-01-01

    Ancient DNA extracts consist of a mixture of endogenous molecules and contaminant DNA templates, often originating from environmental microbes. These two populations of templates exhibit different chemical characteristics, with the former showing depurination and cytosine deamination by-products, resulting from post-mortem DNA damage. Such chemical modifications can interfere with the molecular tools used for building second-generation DNA libraries, and limit our ability to fully characterize the true complexity of ancient DNA extracts. In this study, we first use fresh DNA extracts to demonstrate that library preparation based on adapter ligation at AT-overhangs is biased against DNA templates starting with thymine residues, in contrast to blunt-end adapter ligation. We observe the same bias on fresh DNA extracts sheared on Bioruptor, Covaris and nebulizers. This contradicts previous reports suggesting that this bias could originate from the methods used for shearing DNA. This also suggests that AT-overhang adapter ligation efficiency is affected in a sequence-dependent manner and results in an uneven representation of different genomic contexts. We then show how this bias could affect the base composition of ancient DNA libraries prepared following AT-overhang ligation, mainly by limiting the ability to ligate DNA templates starting with thymines and therefore deaminated cytosines. This results in particular nucleotide misincorporation damage patterns, deviating from the signature generally expected for authenticating ancient sequence data. Consequently, we show that models adequate for estimating post-mortem DNA damage levels must be robust to the molecular tools used for building ancient DNA libraries.

  19. Toward an Integrated BAC Library Resource for Genome Sequencing and Analysis

    SciTech Connect

    Simon, M. I.; Kim, U.-J.

    2002-02-26

    We developed a great deal of expertise in building large BAC libraries from a variety of DNA sources including humans, mice, corn, microorganisms, worms, and Arabidopsis. We greatly improved the technology for screening these libraries rapidly and for selecting appropriate BACs and mapping BACs to develop large overlapping contigs. We became involved in supplying BACs and BAC contigs to a variety of sequencing and mapping projects and we began to collaborate with Drs. Adams and Venter at TIGR and with Dr. Leroy Hood and his group at University of Washington to provide BACs for end sequencing and for mapping and sequencing of large fragments of chromosome 16. Together with Dr. Ian Dunham and his co-workers at the Sanger Center we completed the mapping and they completed the sequencing of the first human chromosome, chromosome 22. This was published in Nature in 1999 and our BAC contigs made a major contribution to this sequencing effort. Drs. Shizuya and Ding invented an automated highly accurate BAC mapping technique. We also developed long-term collaborations with Dr. Uli Weier at UCSF in the design of BAC probes for characterization of human tumors and specific chromosome deletions and breakpoints. Finally the contribution of our work to the human genome project has been recognized in the publication both by the international consortium and the NIH of a draft sequence of the human genome in Nature last year. Dr. Shizuya was acknowledged in the authorship of that landmark paper. Dr. Simon was also an author on the Venter/Adams Celera project sequencing the human genome that was published in Science last year.

  20. Chromosome region-specific libraries for human genome analysis. Final progress report, 1 March 1991--28 February 1994

    SciTech Connect

    Kao, F.T.

    1994-04-01

    The objectives of this grant proposal include (1) development of a chromosome microdissection and PCR-mediated microcloning technology, (2) application of this microtechnology to the construction of region-specific libraries for human genome analysis. During this grant period, the authors have successfully developed this microtechnology and have applied it to the construction of microdissection libraries for the following chromosome regions: a whole chromosome 21 (21E), 2 region-specific libraries for the long arm of chromosome 2, 2q35-q37 (2Q1) and 2q33-q35 (2Q2), and 4 region-specific libraries for the entire short arm of chromosome 2, 2p23-p25 (2P1), 2p21-p23 (2P2), 2p14-p16 (2P3) and 2p11-p13 (2P4). In addition, 20-40 unique sequence microclones have been isolated and characterized for genomic studies. These region-specific libraries and the single-copy microclones from the library have been used as valuable resources for (1) isolating microsatellite probes in linkage analysis to further refine the disease locus; (2) isolating corresponding clones with large inserts, e.g. YAC, BAC, P1, cosmid and phage, to facilitate construction of contigs for high resolution physical mapping; and (3) isolating region-specific cDNA clones for use as candidate genes. These libraries are being deposited in the American Type Culture Collection (ATCC) for general distribution.

  1. Characterization of Uncultured Genome Fragment from Soil Metagenomic Library Exposed Rare Mismatch of Internal Tetranucleotide Frequency

    PubMed Central

    Liu, Yunpeng; Yang, Dongqing; Zhang, Nan; Chen, Lin; Cui, Zhongli; Shen, Qirong; Zhang, Ruifu

    2016-01-01

    Exploring the genomic information of a specific uncultured soil bacterium is vital to understand its function in the ecosystem but is still a challenge due to the lack of culture techniques. To examine the genomes of uncultured bacteria, a metagenomic bacterial artificial chromosome library derived from a soil sample was screened for 16S rDNA-containing clones. Five clones (4C6, 5E7, 5G4, 5G12, and 5H7) containing genome fragments of uncultured soil bacteria (with low 16S rDNA similarity to isolated bacteria) were selected for sequencing. Clones 5E7 and 5G4 showed only 82 and 83% 16S rDNA identity to known sequences, respectively. Phylogenetic analysis of 16S rDNA indicated that 5E7 and 5G4 were potentially from a new class of Chloroflexi. Only one-third of the 5G4 open reading frames have significant hits against HMMER. Internal tetranucleotide frequency analysis indicated that the unknown region of 5G4 was poorly correlated with other parts of the clone, indicating that this section might have been obtained through lateral transfer. It was suggested that this region, rich in unknown genes, is under fast evolution. PMID:28066395
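
    The internal tetranucleotide frequency comparison can be sketched as follows: compute the 256-dimensional tetranucleotide frequency vector for two regions of a clone and correlate them. This is a simplified illustration using raw frequencies and a Pearson correlation; published tools often use z-scores or other normalisations, and the function names and toy sequences are assumptions, not the authors' code.

```python
from itertools import product
from math import sqrt

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 words


def tetra_freqs(seq):
    """Frequency of each tetranucleotide in seq (overlapping windows)."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


# Toy regions with deliberately different composition.
region_a = "ACGT" * 50
region_b = "AATT" * 50
r = pearson(tetra_freqs(region_a), tetra_freqs(region_b))
print(round(r, 3))
```

    A low correlation between one region and the rest of a clone is the kind of signal the abstract interprets as possible lateral transfer.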

  2. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs

    PubMed Central

    Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W.; Gallardo-Escarate, Cristian; Planas, Josep V.; Mackenzie, Simon

    2017-01-01

    This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
    We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene...

  3. Extending Immunological Profiling in the Gilthead Sea Bream, Sparus aurata, by Enriched cDNA Library Analysis, Microarray Design and Initial Studies upon the Inflammatory Response to PAMPs.

    PubMed

    Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W; Gallardo-Escarate, Cristian; Planas, Josep V; Mackenzie, Simon

    2017-02-03

    This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
    We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene...

  4. A Deep-Coverage Tomato BAC Library and Prospects Toward Development of an STC Framework for Genome Sequencing

    PubMed Central

    Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.

    2000-01-01

    Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10⁻⁶, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
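
    As a rough consistency check on the library statistics quoted above, the following Python sketch recomputes the BAC-end success rate and the haploid genome size implied by 129,024 clones, a 117.5 kb mean insert, and 15.0 genome equivalents. It assumes (the abstract does not say so) that the 1.11% chloroplast clones are excluded before coverage is computed. The function and variable names are illustrative only.

```python
# Figures quoted in the abstract above.
CLONES = 129_024
MEAN_INSERT_KB = 117.5
CHLOROPLAST_FRACTION = 0.0111
STATED_COVERAGE = 15.0          # haploid genome equivalents
ENDS_ATTEMPTED = 1490
ENDS_OBTAINED = 1205


def nuclear_insert_mb(clones, mean_insert_kb, organellar_fraction):
    """Total nuclear DNA cloned, in Mb.

    Assumption: organellar (chloroplast) clones are discounted first.
    """
    nuclear_clones = clones * (1.0 - organellar_fraction)
    return nuclear_clones * mean_insert_kb / 1000.0


def implied_genome_mb(total_insert_mb, coverage):
    """Haploid genome size implied by a stated number of genome equivalents."""
    return total_insert_mb / coverage


total_mb = nuclear_insert_mb(CLONES, MEAN_INSERT_KB, CHLOROPLAST_FRACTION)
genome_mb = implied_genome_mb(total_mb, STATED_COVERAGE)
success_pct = 100.0 * ENDS_OBTAINED / ENDS_ATTEMPTED

print(f"nuclear DNA cloned:     {total_mb:.0f} Mb")
print(f"implied haploid genome: {genome_mb:.0f} Mb")
print(f"BAC-end success rate:   {success_pct:.1f}%")
```

    Under that assumption the implied haploid genome size comes out close to 1,000 Mb, and the success rate reproduces the 80.9% given in the abstract.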

  5. Genome engineering uncovers 54 evolutionarily conserved and testis-enriched genes that are not required for male fertility in mice.

    PubMed

    Miyata, Haruhiko; Castaneda, Julio M; Fujihara, Yoshitaka; Yu, Zhifeng; Archambeault, Denise R; Isotani, Ayako; Kiyozumi, Daiji; Kriseman, Maya L; Mashiko, Daisuke; Matsumura, Takafumi; Matzuk, Ryan M; Mori, Masashi; Noda, Taichi; Oji, Asami; Okabe, Masaru; Prunskaite-Hyyrylainen, Renata; Ramirez-Solis, Ramiro; Satouh, Yuhkoh; Zhang, Qian; Ikawa, Masahito; Matzuk, Martin M

    2016-07-12

    Gene-expression analysis studies from Schultz et al. estimate that more than 2,300 genes in the mouse genome are expressed predominantly in the male germ line. As of their 2003 publication [Schultz N, Hamra FK, Garbers DL (2003) Proc Natl Acad Sci USA 100(21):12201-12206], the functions of the majority of these testis-enriched genes during spermatogenesis and fertilization were largely unknown. Since the study by Schultz et al., functional analyses of hundreds of reproductive-tract-enriched genes have been performed, but there remain many testis-enriched genes whose relevance to reproduction remains unexplored or unreported. Historically, a gene knockout is the "gold standard" to determine whether a gene's function is essential in vivo. Although knockout mice without apparent phenotypes are rarely published, these knockout mouse lines and their phenotypic information need to be shared to prevent redundant experiments. Herein, we used bioinformatic and experimental approaches to uncover mouse testis-enriched genes that are evolutionarily conserved in humans. We then used gene-disruption approaches, including Knockout Mouse Project resources (targeting vectors and mice) and CRISPR/Cas9, to mutate and quickly analyze the fertility of these mutant mice. We discovered that 54 mutant mouse lines were fertile. Thus, despite evolutionary conservation of these genes in vertebrates and in some cases in all eukaryotes, our results indicate that these genes are not individually essential for male mouse fertility. Our phenotypic data are highly relevant in this fiscally tight funding period and postgenomic age when large numbers of genomes are being analyzed for disease association, and will prevent unnecessary expenditures and duplications of effort by others.

  6. Genomic DNA Enrichment Using Sequence Capture Microarrays: a Novel Approach to Discover Sequence Nucleotide Polymorphisms (SNP) in Brassica napus L

    PubMed Central

    Clarke, Wayne E.; Parkin, Isobel A.; Gajardo, Humberto A.; Gerhardt, Daniel J.; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G.; Snowdon, Rod J.; Federico, Maria L.; Iniguez-Luy, Federico L.

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci –QTL– analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species. PMID:24312619

  7. Genomic DNA enrichment using sequence capture microarrays: a novel approach to discover sequence nucleotide polymorphisms (SNP) in Brassica napus L.

    PubMed

    Clarke, Wayne E; Parkin, Isobel A; Gajardo, Humberto A; Gerhardt, Daniel J; Higgins, Erin; Sidebottom, Christine; Sharpe, Andrew G; Snowdon, Rod J; Federico, Maria L; Iniguez-Luy, Federico L

    2013-01-01

    Targeted genomic selection methodologies, or sequence capture, allow for DNA enrichment and large-scale resequencing and characterization of natural genetic variation in species with complex genomes, such as rapeseed canola (Brassica napus L., AACC, 2n=38). The main goal of this project was to combine sequence capture with next generation sequencing (NGS) to discover single nucleotide polymorphisms (SNPs) in specific areas of the B. napus genome historically associated (via quantitative trait loci -QTL- analysis) to traits of agronomical and nutritional importance. A 2.1 million feature sequence capture platform was designed to interrogate DNA sequence variation across 47 specific genomic regions, representing 51.2 Mb of the Brassica A and C genomes, in ten diverse rapeseed genotypes. All ten genotypes were sequenced using the 454 Life Sciences chemistry and to assess the effect of increased sequence depth, two genotypes were also sequenced using Illumina HiSeq chemistry. As a result, 589,367 potentially useful SNPs were identified. Analysis of sequence coverage indicated a four-fold increased representation of target regions, with 57% of the filtered SNPs falling within these regions. Sixty percent of discovered SNPs corresponded to transitions while 40% were transversions. Interestingly, fifty eight percent of the SNPs were found in genic regions while 42% were found in intergenic regions. Further, a high percentage of genic SNPs was found in exons (65% and 64% for the A and C genomes, respectively). Two different genotyping assays were used to validate the discovered SNPs. Validation rates ranged from 61.5% to 84% of tested SNPs, underpinning the effectiveness of this SNP discovery approach. Most importantly, the discovered SNPs were associated with agronomically important regions of the B. napus genome generating a novel data resource for research and breeding this crop species.

  8. Target identification in Fusobacterium nucleatum by subtractive genomics approach and enrichment analysis of host-pathogen protein-protein interactions.

    PubMed

    Kumar, Amit; Thotakura, Pragna Lakshmi; Tiwary, Basant Kumar; Krishna, Ramadas

    2016-05-12

    Fusobacterium nucleatum, a well-studied bacterium in periodontal diseases, appendicitis, gingivitis, osteomyelitis and pregnancy complications, has recently gained attention due to its association with colorectal cancer (CRC) progression. Treatment with berberine was shown to reverse F. nucleatum-induced CRC progression in mice by balancing the growth of opportunistic pathogens in the tumor microenvironment. Intestinal microbiota imbalance and the infections caused by F. nucleatum might be regulated by therapeutic intervention. Hence, we aimed to predict drug target proteins in F. nucleatum through a subtractive genomics approach and host-pathogen protein-protein interactions (HP-PPIs). We also carried out enrichment analysis of host interacting partners to hypothesize the possible mechanisms involved in CRC progression due to F. nucleatum. In the subtractive genomics approach, the essential, virulence and resistance related proteins were retrieved from the RefSeq proteome of F. nucleatum by searching against the Database of Essential Genes (DEG), the Virulence Factor Database (VFDB) and the Antibiotic Resistance Gene-ANNOTation (ARG-ANNOT) tool, respectively. A subsequent hierarchical screening to identify non-human homologous, metabolic pathway-independent/pathway-specific and druggable proteins resulted in eight pathway-independent and 27 pathway-specific druggable targets. Co-aggregation of F. nucleatum with the host induces proinflammatory gene expression and thereby potentiates tumorigenesis. Hence, proteins from IBDsite, a database for inflammatory bowel disease (IBD) research, and those involved in colorectal adenocarcinoma as interpreted from The Cancer Genome Atlas (TCGA) were retrieved to predict drug targets based on HP-PPIs with the F. nucleatum proteome. Prediction of HP-PPIs exhibited 186 interactions contributed by 103 host and 76 bacterial proteins. Bacterial interacting partners were considered putative targets. And enrichment analysis of host interacting partners showed statistically...

  9. SiRNA sequence model: redesign algorithm based on available genome-wide libraries.

    PubMed

    Kozak, Karol

    2013-12-01

    The evolution of RNA interference (RNAi) and the development of technologies exploiting its biology have enabled scientists to rapidly examine the consequences of depleting a particular gene product in cells. Design tools have been developed based on experimental data to increase the knockdown efficiency of siRNAs. Not all siRNAs that are developed for a given target mRNA are equally effective. Currently available design algorithms take an accession, identify conserved regions within its transcript space, find accessible regions within the mRNA, design all possible siRNAs for these regions, filter them based on multi-score thresholds, and then perform off-target filtration. These different criteria are used by commercial suppliers to produce siRNA genome-wide libraries for different organisms. In this article, we analyze existing siRNA design algorithms and evaluate the weight of design parameters for libraries produced in the last decade. We proved that not all essential parameters are currently applied by siRNA vendors. Based on our evaluation results, we were able to suggest an siRNA sequence pattern. The findings in our study can be useful for commercial vendors improving the design of RNAi constructs, by addressing both the issue of potency and the issue of specificity.

  10. From raw materials to validated system: the construction of a genomic library and microarray to interpret systemic perturbations in Northern bobwhite

    PubMed Central

    Rawat, Arun; Deng, Youping; Garcia-Reyero, Natàlia; Quinn, Michael J.; Johnson, Mark S.; Indest, Karl J.; Elasri, Mohamed O.; Perkins, Edward J.

    2010-01-01

    The limited availability of genomic tools and data for nonmodel species impedes computational and systems biology approaches in nonmodel organisms. Here we describe the development, functional annotation, and utilization of genomic tools for the avian wildlife species Northern bobwhite (Colinus virginianus) to determine the molecular impacts of exposure to 2,6-dinitrotoluene (2,6-DNT), a field contaminant of military concern. Massively parallel pyrosequencing of a normalized multitissue library of Northern bobwhite cDNAs yielded 71,384 unique transcripts that were annotated with gene ontology (GO), pathway information, and protein domain analysis. Comparative genome analyses with model organisms revealed functional homologies in 8,825 unique Northern bobwhite genes that are orthologous to 48% of Gallus gallus protein-coding genes. Pathway analysis and GO enrichment of genes differentially expressed in livers of birds exposed for 60 days (d) to 10 and 60 mg/kg/d 2,6-DNT revealed several impacts validated by RT-qPCR including: prostaglandin pathway-mediated inflammation, increased expression of a heme synthesis pathway in response to anemia, and a shift in energy metabolism toward protein catabolism via inhibition of control points for glucose and lipid metabolic pathways, PCK1 and PPARGC1, respectively. This research effort provides the first comprehensive annotated gene library for Northern bobwhite. Transcript expression analysis provided insights into the metabolic perturbations underlying several observed toxicological phenotypes in a 2,6-DNT exposure case study. Furthermore, the systemic impact of dinitrotoluenes on liver function appears conserved across species as PPAR signaling is similarly affected in fathead minnow liver tissue after exposure to 2,4-DNT. PMID:20406850

  11. An arrayed human genomic library constructed in the PAC shuttle vector pJCPAC-Mam2 for genome-wide association studies and gene therapy

    PubMed Central

    Fuesler, John; Nagahama, Yasunori; Szulewski, Joseph; Mundorff, Joshua; Bireley, Stephanie; Coren, Jonathon S.

    2012-01-01

    The various iterations of the HapMap Project and many genome-wide association studies (GWAS) have identified hundreds of potential genes involved in monogenic and multifactorial traits. We constructed an arrayed 115,000-member human genomic library in the PAC shuttle vector pJCPAC-Mam2 that can be propagated in both bacterial and human cells. The library appears to represent a two-fold coverage of the human genome. Transient transfection of a p53-containing PAC clone into p53-null Saos-2 human osteosarcoma cells demonstrated that both p53 mRNA and protein were produced. Additionally, expression of the p53 protein triggered apoptosis in a subset of the Saos-2 cells. This library should serve as a valuable resource to validate potential disease genes identified by GWAS in human cell lines and in animal models. Also, individual library members could potentially be used for gene therapy trials for a variety of recessive disorders. PMID:22285925

  12. Optimization and quality control of genome-wide Hi-C library preparation.

    PubMed

    Zhang, Xiang-Yuan; He, Chao; Ye, Bing-Yu; Xie, De-Jian; Shi, Ming-Lei; Zhang, Yan; Shen, Wen-Long; Li, Ping; Zhao, Zhi-Hu

    2017-09-20

    High-throughput chromosome conformation capture (Hi-C) is one of the key assays for genome-wide chromatin interaction studies. It is a time-consuming process that involves many steps and many different kinds of reagents, consumables, and equipment. At present, its reproducibility is unsatisfactory. By optimizing the key steps of the Hi-C experiment, such as crosslinking, pretreatment of digestion, inactivation of restriction enzyme, and in situ ligation, we established a robust Hi-C procedure and prepared two biological replicates of Hi-C libraries from GM12878 cells. After preliminary quality control by Sanger sequencing, the two replicates were high-throughput sequenced. Bioinformatics analysis of the raw sequencing data revealed that the mapping ability and pair-mate rate of the raw data were around 90% and 72%, respectively. Additionally, after removal of self-circular ligations and dangling-end products, more than 96% valid pairs were obtained. Genome-wide interactome profiling shows clear topologically associating domains (TADs), which is consistent with previous reports. Further correlation analysis showed that the two biological replicates strongly correlate with each other in terms of both bin coverage and all bin pairs. All these results indicate that the optimized Hi-C procedure is robust and stable, which will be very helpful for the wide application of the Hi-C assay.

  13. Library preparation methodology can influence genomic and functional predictions in human microbiome research

    PubMed Central

    Jones, Marcus B.; Highlander, Sarah K.; Anderson, Ericka L.; Li, Weizhong; Dayrit, Mark; Klitgord, Niels; Fabani, Martin M.; Seguritan, Victor; Green, Jessica; Pride, David T.; Yooseph, Shibu; Biggs, William; Nelson, Karen E.; Venter, J. Craig

    2015-01-01

    Observations from human microbiome studies are often conflicting or inconclusive. Many factors likely contribute to these issues including small cohort sizes, sample collection, and handling and processing differences. The field of microbiome research is moving from 16S rDNA gene sequencing to a more comprehensive genomic and functional representation through whole-genome sequencing (WGS) of complete communities. Here we performed quantitative and qualitative analyses comparing WGS metagenomic data from human stool specimens using the Illumina Nextera XT and Illumina TruSeq DNA PCR-free kits, and the KAPA Biosystems Hyper Prep PCR and PCR-free systems. Significant differences in taxonomy are observed among the four different next-generation sequencing library preparations using a DNA mock community and a cell control of known concentration. We also revealed biases in error profiles, duplication rates, and loss of reads representing organisms that have a high %G+C content that can significantly impact results. As with all methods, the use of benchmarking controls has revealed critical differences among methods that impact sequencing results and later would impact study interpretation. We recommend that the community adopt PCR-free–based approaches to reduce PCR bias that affects calculations of abundance and to improve assemblies for accurate taxonomic assignment. Furthermore, the inclusion of a known-input cell spike-in control provides accurate quantitation of organisms in clinical samples. PMID:26512100

  14. Library preparation methodology can influence genomic and functional predictions in human microbiome research.

    PubMed

    Jones, Marcus B; Highlander, Sarah K; Anderson, Ericka L; Li, Weizhong; Dayrit, Mark; Klitgord, Niels; Fabani, Martin M; Seguritan, Victor; Green, Jessica; Pride, David T; Yooseph, Shibu; Biggs, William; Nelson, Karen E; Venter, J Craig

    2015-11-10

    Observations from human microbiome studies are often conflicting or inconclusive. Many factors likely contribute to these issues including small cohort sizes, sample collection, and handling and processing differences. The field of microbiome research is moving from 16S rDNA gene sequencing to a more comprehensive genomic and functional representation through whole-genome sequencing (WGS) of complete communities. Here we performed quantitative and qualitative analyses comparing WGS metagenomic data from human stool specimens using the Illumina Nextera XT and Illumina TruSeq DNA PCR-free kits, and the KAPA Biosystems Hyper Prep PCR and PCR-free systems. Significant differences in taxonomy are observed among the four different next-generation sequencing library preparations using a DNA mock community and a cell control of known concentration. We also revealed biases in error profiles, duplication rates, and loss of reads representing organisms that have a high %G+C content that can significantly impact results. As with all methods, the use of benchmarking controls has revealed critical differences among methods that impact sequencing results and later would impact study interpretation. We recommend that the community adopt PCR-free-based approaches to reduce PCR bias that affects calculations of abundance and to improve assemblies for accurate taxonomic assignment. Furthermore, the inclusion of a known-input cell spike-in control provides accurate quantitation of organisms in clinical samples.

  15. USE OF COMPETITIVE GENOMIC HYBRIDIZATION TO ENRICH FOR GENOME-SPECIFIC DIFFERENCES BETWEEN TWO CLOSELY RELATED HUMAN FECAL INDICATOR BACTERIA

    EPA Science Inventory

    Enterococci are frequently used as indicators of fecal pollution in surface waters. To accelerate the identification of Enterococcus faecalis-specific DNA sequences, we employed a comparative genomic strategy utilizing a positive selection process to compare E. faec...

  17. The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza.

    PubMed

    Ammiraju, Jetty S S; Luo, Meizhong; Goicoechea, José L; Wang, Wenming; Kudrna, Dave; Mueller, Christopher; Talag, Jayson; Kim, HyeRan; Sisneros, Nicholas B; Blackmon, Barbara; Fang, Eric; Tomkins, Jeffery B; Brar, Darshan; MacKill, David; McCouch, Susan; Kurata, Nori; Lambert, Georgina; Galbraith, David W; Arumuganathan, K; Rao, Kiran; Walling, Jason G; Gill, Navdeep; Yu, Yeisoo; SanMiguel, Phillip; Soderlund, Carol; Jackson, Scott; Wing, Rod A

    2006-01-01

    Rice (Oryza sativa L.) is the most important food crop in the world and a model system for plant biology. With the completion of a finished genome sequence we must now functionally characterize the rice genome by a variety of methods, including comparative genomic analysis between cereal species and within the genus Oryza. Oryza contains two cultivated and 22 wild species that represent 10 distinct genome types. The wild species contain an essentially untapped reservoir of agriculturally important genes that must be harnessed if we are to maintain a safe and secure food supply for the 21st century. As a first step to functionally characterize the rice genome from a comparative standpoint, we report the construction and analysis of a comprehensive set of 12 BAC libraries that represent the 10 genome types of Oryza. To estimate the number of clones required to generate 10 genome equivalent BAC libraries we determined the genome sizes of nine of the 12 species using flow cytometry. Each library represents a minimum of 10 genome equivalents, has an average insert size range between 123 and 161 kb, an average organellar content of 0.4%-4.1% and nonrecombinant content between 0% and 5%. Genome coverage was estimated mathematically and empirically by hybridization and extensive contig and BAC end sequence analysis. A preliminary analysis of BAC end sequences of clones from these libraries indicated that LTR retrotransposons are the predominant class of repeat elements in Oryza and a roughly linear relationship of these elements with genome size was observed.
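
    The clone-number estimate mentioned above (how many clones give 10 genome equivalents) follows from simple arithmetic, and mathematical coverage estimates of this kind are commonly made with the Clarke-Carbon formula P = 1 - (1 - f)^N, where f is the insert-to-genome size ratio and N the number of clones. The Python sketch below illustrates both. Whether the authors used exactly this formula is an assumption, the 430 Mb genome size is only an illustrative figure (the species in the study differ in genome size), and all names are hypothetical.

```python
import math


def clones_for_coverage(genome_mb, mean_insert_kb, coverage):
    """Clones needed so that total insert length equals `coverage` genome equivalents."""
    return math.ceil(coverage * genome_mb * 1000.0 / mean_insert_kb)


def prob_sequence_present(n_clones, genome_mb, mean_insert_kb):
    """Clarke-Carbon estimate that a given sequence occurs in at least one clone.

    P = 1 - (1 - f)**N, with f = insert size / genome size.
    """
    f = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - f) ** n_clones


GENOME_MB = 430.0        # illustrative assumption, not taken from the abstract
TARGET_COVERAGE = 10.0   # minimum genome equivalents reported per library

# 123 kb and 161 kb bracket the average insert sizes reported above.
for insert_kb in (123.0, 161.0):
    n = clones_for_coverage(GENOME_MB, insert_kb, TARGET_COVERAGE)
    p = prob_sequence_present(n, GENOME_MB, insert_kb)
    print(f"insert {insert_kb:.0f} kb: {n} clones, P(present) = {p:.5f}")
```

    Larger inserts reduce the number of clones needed for the same coverage, which is one reason average insert size is reported alongside clone counts.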

  18. Comparative genomics of grass EST libraries reveals previously uncharacterized splicing events in crop plants.

    PubMed

    Chuang, Trees-Juen; Yang, Min-Yu; Lin, Chuang-Chieh; Hsieh, Ping-Hung; Hung, Li-Yuan

    2015-02-05

    Crop plants such as rice, maize and sorghum play economically important roles as main sources of food, fuel, and animal feed. However, current genome annotations of crop plants still suffer from false-positive predictions; a more comprehensive registry of alternative splicing (AS) events is also in demand. Comparative genomics of crop plants is largely unexplored. We performed a large-scale comparative analysis (ExonFinder) of the expressed sequence tag (EST) library from nine grass plants against three crop genomes (rice, maize, and sorghum) and identified 2,879 previously-unannotated exons (i.e., novel exons) in the three crops. We validated 81% of the tested exons by RT-PCR-sequencing, supporting the effectiveness of our in silico strategy. Evolutionary analysis reveals that the novel exons, compared with their flanking annotated ones, are generally under weaker selection pressure at the protein level, but under stronger pressure at the RNA level, suggesting that most of the novel exons also represent novel alternatively spliced variants (ASVs). However, we also observed the consistency of evolutionary rates between certain novel exons and their flanking exons, which provided further evidence of their co-occurrence in the transcripts, suggesting that previously-annotated isoforms might be subject to erroneous predictions. Our validation showed that 54% of the tested genes expressed the newly-identified isoforms that contained the novel exons, rather than the previously-annotated isoforms that excluded them. Consistent results were observed across cultivated (Oryza sativa and O. glaberrima) and wild (O. rufipogon and O. nivara) rice species, asserting the necessity of our curation of the crop genome annotations. Our comparative analyses also inferred the common ancestral transcriptome of grass plants and gain- and loss-of-ASV events. We have reannotated the rice, maize, and sorghum genomes, and showed that evolutionary rates might serve as an indicator...

  19. Recovery of a soybean urease genomic clone by sequential library screening with two synthetic oligodeoxynucleotides.

    PubMed

    Krueger, R W; Holland, M A; Chisholm, D; Polacco, J C

    1987-01-01

    We report the first isolation of a low-copy-number gene from a complex higher plant (soybean) genome by direct screening with synthetic oligodeoxynucleotide (oligo) probes. A synthetic, mixed, 21-nucleotide (nt) oligo (21-1) based on a seven amino acid (aa) sequence from soybean seed urease, was used to screen genomic libraries of soybean (Glycine max [L.] Merr.) in the lambda Charon 4 vector. Twenty homologous clones were recovered from a screen of 500,000 plaques. These were counterscreened with embryo-specific cDNA (15-2 cDNA) made by priming with a second, mixed 15-nt oligo (15-2), based on a Jack bean (Canavalia ensiformis) urease peptide [Takishima et al., J. Natl. Def. Med. Coll. 5 (1980) 19-23]. Five out of 20 clones were homologous to 15-2 cDNA and proved to be identical. Nucleotide sequence analysis of representative clone E15 confirmed that it contained urease sequences. Subclones of E15 homologous to the oligo probes contain a deduced amino acid sequence which matches 108 of 130 aa residues of an amino acid run in a recently published [Mamiya et al., Proc. Jap. Acad. 61B (1985) 359-398] complete protein sequence for Jack-bean seed urease. Using clone E15 as a probe of soybean embryonic mRNA revealed a homologous 3.8-kb species that is the size of the urease messenger. This species is absent from mRNA of embryos of a soybean seed urease-null mutant. However, both urease-positive and urease-null genomes contain the 11-kb DNA fragment bearing urease sequences.

  20. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    EPA Science Inventory

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  2. Electrochemical and genomic analysis of novel electroactive isolates obtained via potentiostatic enrichment from tropical sediment

    NASA Astrophysics Data System (ADS)

    Doyle, Lucinda E.; Yung, Pui Yi; Mitra, Sumitra D.; Wuertz, Stefan; Williams, Rohan B. H.; Lauro, Federico M.; Marsili, Enrico

    2017-07-01

    Enrichment of electrochemically-active microorganisms (EAM) to date has mostly relied on microbial fuel cells fed with wastewater. This study aims to enrich novel EAM by exposing tropical sediment, not frequently reported in the literature, to sustained anodic potentials. Voltamperometric techniques and electrochemical impedance spectroscopy, performed over a wide range of potentials, characterise extracellular electron transfer (EET) over time. Applied potential is found to affect biofilm electrochemical signature. Geobacter metallireducens is heavily enriched on the electrodes, as determined by metagenomic and metatranscriptomic analysis, in the first report of the species in a lactate-fed system. Two novel isolates are grown in pure culture from the enrichment, identified by 16S rRNA gene sequencing as Aeromonas and Enterobacter, respectively. The names proposed are Aeromonas sp. CL-1 and Enterobacter sp. EA-1. Both isolates are capable of EET on carbon felt and screen-printed carbon electrodes without the addition of exogenous redox mediators. Enterobacter sp. EA-1 can also perform mediated electron transfer using the soluble redox mediator 2-hydroxy-1,4-naphthoquinone (HNQ). Both isolates are able to use acetate and lactate as electron donors. This work outlines a comprehensive methodology for characterising novel EAM from unconventional inocula.

  3. Integrating molecular QTL data into genome-wide genetic association analysis: Probabilistic assessment of enrichment and colocalization

    PubMed Central

    2017-01-01

    We propose a novel statistical framework for integrating the result from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objectives of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals. We introduce a natural Bayesian hierarchical model that treats the latent association status of molecular QTLs as SNP-level annotations for candidate SNPs of complex traits. We detail a computational procedure to seamlessly perform enrichment, fine-mapping and colocalization analyses, which is a distinct feature compared to the existing colocalization analysis procedures in the literature. The proposed approach is computationally efficient and requires only summary-level statistics. We evaluate and demonstrate the proposed computational approach through extensive simulation studies and analyses of blood lipid data and the whole blood eQTL data from the GTEx project. In addition, a useful utility from our proposed method enables the computation of expected colocalization signals using simple characteristics of the association data. Using this utility, we further illustrate the importance of enrichment analysis on the ability to discover colocalized signals and the potential limitations of currently available molecular QTL data. The software pipeline that implements the proposed computation procedures, enloc, is freely available at https://github.com/xqwen/integrative. PMID:28278150

  4. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard

    PubMed Central

    Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver

    2011-01-01

    In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes, and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records and confirm, as expected, that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864

  5. Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard.

    PubMed

    Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver

    2011-04-29

    In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the "Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)" checklist for the description of genomes, and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records and confirm, as expected, that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These "machine-readable" reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences.

  6. Genomic Contributors to Rhythm Outcome of Atrial Fibrillation Catheter Ablation – Pathway Enrichment Analysis of GWAS Data

    PubMed Central

    Ueberham, Laura; Dinov, Borislav; Sommer, Philipp; Arya, Arash; Hindricks, Gerhard; Bollmann, Andreas

    2016-01-01

    Background: Left atrial enlargement and persistent atrial fibrillation (AF) are well-known predictors for arrhythmia recurrence after AF catheter ablation (LRAF). In this study, by using pathway enrichment analysis of GWAS data, we tested the hypothesis that genetic pathways associated with these phenotypes are also associated with LRAF. Methods: Samples from 660 patients with paroxysmal (n = 370) or persistent AF (n = 290) undergoing de-novo AF catheter ablation were genotyped for ~1,000,000 SNPs. SNPs found to be significantly associated with left atrial diameter (LAD) or AF type were used for gene-based association tests in a systematic biological Knowledge-based mining system for Genome-wide Genetic studies (KGG). Associated genes were tested for pathway enrichment using WEB-based Gene SeT AnaLysis Toolkit (WebGestalt), the Gene Annotation Tool to Help Explain Relationships (GATHER) and the databases provided by Kyoto Encyclopedia of Genes and Genomes (KEGG). In a second step, the association of consistently enriched pathways and LRAF was tested. Results: By using sequential 7-day Holter ECGs, LRAF between 3 and 12 months was observed in 48% and was associated with LAD (B = 1.801, 95% CI 0.760–2.841, p = 1.0E-3) and persistent AF (OR = 2.1; 95% CI 1.567–2.931, p = 2.0E-6). WebGestalt (adj. p = 2.7E-22) and GATHER (adj. p = 5.2E-3) identified the calcium signaling pathway (hsa04020) as the only consistently enriched pathway for LAD, while the extracellular matrix (ECM)-receptor interaction pathway (hsa04512) was the only consistently enriched pathway for AF type (adj. p = 2.1E-15 in WebGestalt; adj. p = 9.3E-4 in GATHER). Both calcium signaling (adj. p = 2.2E-17 in WebGestalt; adj. p = 2.9E-2 in GATHER) and ECM-receptor interaction (adj. p = 1.2E-10 in WebGestalt; adj. p = 2.9E-2 in GATHER) were significantly associated with LRAF. Conclusions: Calcium signaling and ECM-receptor interaction pathways are associated with LAD and AF type and, in turn, with LRAF
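
    The pathway enrichment step in the record above can be illustrated with a generic over-representation test. The sketch below is not the WebGestalt or GATHER implementation; it assumes a plain one-sided hypergeometric test, and the function name and the gene counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_background: number of genes in the reference set
    n_pathway:    number of reference genes annotated to the pathway
    n_selected:   number of associated genes submitted for testing
    n_overlap:    number of submitted genes that fall in the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the probabilities of every overlap at least as large as the observed one.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only (not taken from the study).
p_value = hypergeom_enrichment_p(
    n_background=20000, n_pathway=180, n_selected=300, n_overlap=12
)
print(f"enrichment p-value: {p_value:.3g}")
```

    In practice the raw p-values from many pathways would still need a multiple-testing adjustment, which is what the "adj. p" values in the record refer to.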

  7. Genome-Wide Analyses in Bacteria Show Small-RNA Enrichment for Long and Conserved Intergenic Regions

    PubMed Central

    Tsai, Chen-Hsun; Liao, Rick; Chou, Brendan; Palumbo, Michael

    2014-01-01

    Interest in finding small RNAs (sRNAs) in bacteria has significantly increased in recent years due to their regulatory functions. Development of high-throughput methods and more sophisticated computational algorithms has allowed rapid identification of sRNA candidates in different species. However, given their various sizes (50 to 500 nucleotides [nt]) and their potential genomic locations in the 5′ and 3′ untranslated regions as well as in intergenic regions, identification and validation of true sRNAs have been challenging. In addition, the evolution of bacterial sRNAs across different species continues to be puzzling, given that they can exert similar functions with various sequences and structures. In this study, we analyzed the enrichment patterns of sRNAs in 13 well-annotated bacterial species using existing transcriptome and experimental data. All intergenic regions were analyzed by WU-BLAST to examine conservation levels relative to species within or outside their genus. In total, more than 900 validated bacterial sRNAs and 23,000 intergenic regions were analyzed. The results indicate that sRNAs are enriched in intergenic regions, which are longer and more conserved than the average intergenic regions in the corresponding bacterial genome. We also found that sRNA-coding regions have different conservation levels relative to their flanking regions. This work provides a way to analyze how noncoding RNAs are distributed in bacterial genomes and also shows conserved features of intergenic regions that encode sRNAs. These results also provide insight into the functions of regions surrounding sRNAs and into optimization of RNA search algorithms. PMID:25313390

  8. Genome-wide analyses in bacteria show small-RNA enrichment for long and conserved intergenic regions.

    PubMed

    Tsai, Chen-Hsun; Liao, Rick; Chou, Brendan; Palumbo, Michael; Contreras, Lydia M

    2015-01-01

    Interest in finding small RNAs (sRNAs) in bacteria has significantly increased in recent years due to their regulatory functions. Development of high-throughput methods and more sophisticated computational algorithms has allowed rapid identification of sRNA candidates in different species. However, given their various sizes (50 to 500 nucleotides [nt]) and their potential genomic locations in the 5' and 3' untranslated regions as well as in intergenic regions, identification and validation of true sRNAs have been challenging. In addition, the evolution of bacterial sRNAs across different species continues to be puzzling, given that they can exert similar functions with various sequences and structures. In this study, we analyzed the enrichment patterns of sRNAs in 13 well-annotated bacterial species using existing transcriptome and experimental data. All intergenic regions were analyzed by WU-BLAST to examine conservation levels relative to species within or outside their genus. In total, more than 900 validated bacterial sRNAs and 23,000 intergenic regions were analyzed. The results indicate that sRNAs are enriched in intergenic regions, which are longer and more conserved than the average intergenic regions in the corresponding bacterial genome. We also found that sRNA-coding regions have different conservation levels relative to their flanking regions. This work provides a way to analyze how noncoding RNAs are distributed in bacterial genomes and also shows conserved features of intergenic regions that encode sRNAs. These results also provide insight into the functions of regions surrounding sRNAs and into optimization of RNA search algorithms.

  9. New strategy for mapping the human genome based on a novel procedure for construction of jumping libraries.

    PubMed

    Zabarovsky, E R; Boldog, F; Erlandsson, R; Kashuba, V I; Allikmets, R L; Marcsek, Z; Kisselev, L L; Stanbridge, E; Klein, G; Sumegi, J

    1991-12-01

    A novel procedure for construction of jumping libraries is described. The essential features of this procedure are as follows: (1) two diphasmid vectors (lambda SK17 and lambda SK22) are simultaneously used in the library construction to improve representativity, (2) a partial filling-in reaction is used to eliminate cloning of artifactual jumping clones and to obviate the need for a selectable marker. The procedure has been used to construct a representative human NotI jumping library (220,000 independent recombinant clones) from the lymphoblastoid cell line CBMI-Ral-STO, which features a low level of methylation of its resident EBV genomes. A human chromosome 3-specific NotI jumping library (500,000 independent recombinant clones) from the human chromosome 3 x mouse hybrid cell line MCH 903.1 has also been constructed. Of these recombinant clones 50-80% represent jumps to the neighboring cleavable NotI site. With our previously published method for construction of linking libraries this procedure makes a new genome mapping strategy feasible. This strategy includes the determination of tagging sequences adjacent to NotI sites in random linking and jumping clones. Special features of the lambda SK17 and lambda SK22 vectors facilitate such sequencing. The STS (sequence tagged site) information obtained can be assembled by computer into a map representing the linear order of the NotI sites for a chromosome or for the entire genome. The computerized mapping data can be used to retrieve clones near a region of interest. The corresponding clones can be obtained from the panel of original clones, or necessary probes can be made from genomic DNA by PCR.
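
    The map assembly described in the record above can be sketched as a simple chaining problem. This is a minimal illustration, not the authors' software: it assumes each linking clone supplies the two sequence tags flanking one NotI site, that each jumping clone supplies one tag from each of two neighbouring sites, and that all names (function, site labels, tag labels) are hypothetical.

```python
def order_noti_sites(linking_clones, jumping_clones):
    """Return NotI sites in linear order, or [] if no chain end is found.

    linking_clones: dict mapping a site label to the two tags flanking that site
    jumping_clones: list of (tag, tag) pairs, each joining two different sites
    """
    # Index every tag by the site it belongs to.
    tag_to_site = {}
    for site, tags in linking_clones.items():
        for tag in tags:
            tag_to_site[tag] = site

    # Each jumping clone adds an adjacency between two sites.
    neighbours = {site: set() for site in linking_clones}
    for tag_x, tag_y in jumping_clones:
        site_x = tag_to_site.get(tag_x)
        site_y = tag_to_site.get(tag_y)
        if site_x is None or site_y is None or site_x == site_y:
            continue  # tag not covered by a linking clone, or not a real jump
        neighbours[site_x].add(site_y)
        neighbours[site_y].add(site_x)

    # A chain end has exactly one neighbour; walk from there.
    ends = sorted(s for s, adjacent in neighbours.items() if len(adjacent) == 1)
    if not ends:
        return []
    order = [ends[0]]
    previous, current = None, ends[0]
    while True:
        candidates = [s for s in neighbours[current] if s != previous]
        if len(candidates) != 1:
            break  # reached the far end, or an ambiguous branch
        previous, current = current, candidates[0]
        order.append(current)
    return order


linking = {"N1": ("a1", "b1"), "N2": ("a2", "b2"), "N3": ("a3", "b3")}
jumping = [("b2", "a3"), ("b1", "a2")]
print(order_noti_sites(linking, jumping))
```

    Because the record reports that only 50-80% of clones jump to the neighbouring site, real data would contain branches; this sketch simply stops at the first ambiguous site rather than trying to resolve it.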

  10. The next generation of target capture technologies - large DNA fragment enrichment and sequencing determines regional genomic variation of high complexity.

    PubMed

    Dapprich, Johannes; Ferriola, Deborah; Mackiewicz, Kate; Clark, Peter M; Rappaport, Eric; D'Arcy, Monica; Sasson, Ariella; Gai, Xiaowu; Schug, Jonathan; Kaestner, Klaus H; Monos, Dimitri

    2016-07-09

    The ability to capture and sequence large contiguous DNA fragments represents a significant advancement towards the comprehensive characterization of complex genomic regions. While emerging sequencing platforms are capable of producing several kilobases-long reads, the fragment sizes generated by current DNA target enrichment technologies remain a limiting factor, producing DNA fragments generally shorter than 1 kbp. The DNA enrichment methodology described herein, Region-Specific Extraction (RSE), produces DNA segments in excess of 20 kbp in length. Coupling this enrichment method to appropriate sequencing platforms will significantly enhance the ability to generate complete and accurate sequence characterization of any genomic region without the need for reference-based assembly. RSE is a long-range DNA target capture methodology that relies on the specific hybridization of short (20-25 base) oligonucleotide primers to selected sequence motifs within the DNA target region. These capture primers are then enzymatically extended on the 3'-end, incorporating biotinylated nucleotides into the DNA. Streptavidin-coated beads are subsequently used to pull-down the original, long DNA template molecules via the newly synthesized, biotinylated DNA that is bound to them. We demonstrate the accuracy, simplicity and utility of the RSE method by capturing and sequencing a 4 Mbp stretch of the major histocompatibility complex (MHC). Our results show an average depth of coverage of 164X for the entire MHC. This depth of coverage contributes significantly to a 99.94 % total coverage of the targeted region and to an accuracy that is over 99.99 %. RSE represents a cost-effective target enrichment method capable of producing sequencing templates in excess of 20 kbp in length. The utility of our method has been proven to generate superior coverage across the MHC as compared to other commercially available methodologies, with the added advantage of producing longer sequencing

  11. Rapid Virulence Annotation (RVA): identification of virulence factors using a bacterial genome library and multiple invertebrate hosts.

    PubMed

    Waterfield, Nicholas R; Sanchez-Contreras, Maria; Eleftherianos, Ioannis; Dowling, Andrea; Yang, Guowei; Wilkinson, Paul; Parkhill, Julian; Thomson, Nicholas; Reynolds, Stuart E; Bode, Helge B; Dorus, Steven; Ffrench-Constant, Richard H

    2008-10-14

    Current sequence databases now contain numerous whole genome sequences of pathogenic bacteria. However, many of the predicted genes lack any functional annotation. We describe an assumption-free approach, Rapid Virulence Annotation (RVA), for the high-throughput parallel screening of genomic libraries against four different taxa: insects, nematodes, amoeba, and mammalian macrophages. These hosts represent different aspects of both the vertebrate and invertebrate immune system. Here, we apply RVA to the emerging human pathogen Photorhabdus asymbiotica using "gain of toxicity" assays of recombinant Escherichia coli clones. We describe a wealth of potential virulence loci and attribute biological function to several putative genomic islands, which may then be further characterized using conventional molecular techniques. The application of RVA to other pathogen genomes promises to ascribe biological function to otherwise uncharacterized virulence genes.

  12. Genome wide analysis of Silurana (Xenopus) tropicalis development reveals dynamic expression using network enrichment analysis.

    PubMed

    Langlois, Valérie S; Martyniuk, Christopher J

    2013-01-01

    Development involves precise timing of gene expression and coordinated pathways for organogenesis and morphogenesis. Functional and sub-network enrichment analysis provides an integrated approach for identifying networks underlying development. The objectives of this study were to characterize early gene regulatory networks over Silurana tropicalis development from NF stage 2 to 46 using a custom Agilent 4×44K microarray. There were >8000 unique gene probes that were differentially expressed between Nieuwkoop-Faber (NF) stage 2 and stage 16, and >2000 gene probes differentially expressed between NF 34 and 46. Gene ontology revealed that genes involved in nucleosome assembly, cell division, pattern specification, neurotransmission, and general metabolism were increasingly regulated throughout development, consistent with active development. Sub-network enrichment analysis revealed that processes such as membrane hyperpolarisation, retinoic acid, cholesterol, and dopamine metabolic gene networks were activated/inhibited over time. This study identifies RNA transcripts that are potentially maternally inherited in an anuran species, provides evidence that the expression of genes involved in retinoic acid receptor signaling may increase prior to those involved in thyroid receptor signaling, and characterizes novel gene expression networks preceding organogenesis which increases understanding of the spatiotemporal embryonic development in frogs.

  13. Towards a Library of Standard Operating Procedures (SOPs) for (meta)genomic annotation

    SciTech Connect

    Kyrpides, Nikos; Angiuoli, Samuel V.; Cochrane, Guy; Field, Dawn; Garrity, George; Gussman, Aaron; Kodira, Chinnappa D.; Klimke, William; Kyrpides, Nikos; Madupu, Ramana; Markowitz, Victor; Tatusova, Tatiana; Thomson, Nick; White, Owen

    2008-04-01

    Genome annotations describe the features of genomes and accompany sequences in genome databases. The methodologies used to generate genome annotation are diverse and typically vary amongst groups. Descriptions of the annotation procedure are helpful in interpreting genome annotation data. Standard Operating Procedures (SOPs) for genome annotation describe the processes that generate genome annotations. Some groups are currently documenting procedures but standards are lacking for structure and content of annotation SOPs. In addition, there is no central repository to store and disseminate procedures and protocols for genome annotation. We highlight the importance of SOPs for genome annotation and endorse a central online repository of SOPs.

  14. Identification of Susceptible Loci and Enriched Pathways for Bipolar II Disorder Using Genome-Wide Association Studies

    PubMed Central

    Kao, Chung-Feng; Chen, Hui-Wen; Chen, Hsi-Chung; Yang, Jenn-Hwai; Huang, Ming-Chyi; Chiu, Yi-Hang; Lin, Shih-Ku; Lee, Ya-Chin; Liu, Chih-Min; Chuang, Li-Chung; Chen, Chien-Hsiun; Wu, Jer-Yuarn

    2016-01-01

    Background: This study aimed to identify susceptible loci and enriched pathways for bipolar disorder subtype II. Methods: We conducted a genome-wide association scan in discovery samples with 189 bipolar disorder subtype II patients and 1773 controls, and replication samples with 283 bipolar disorder subtype II patients and 500 controls in a Taiwanese Han population using Affymetrix Axiom Genome-Wide CHB1 Array. We performed single-marker and gene-based association analyses, as well as calculated polygenic risk scores for bipolar disorder subtype II. Pathway enrichment analyses were employed to reveal significant biological pathways. Results: Seven markers were found to be associated with bipolar disorder subtype II in meta-analysis combining both discovery and replication samples (P<5.0×10⁻⁶), including markers in or close to MYO16, HSP90AB3P, noncoding gene LOC100507632, and markers in chromosomes 4 and 10. A novel locus, ETF1, was associated with bipolar disorder subtype II (P<6.0×10⁻³) in gene-based association tests. Results of risk evaluation demonstrated that higher genetic risk scores were able to distinguish bipolar disorder subtype II patients from healthy controls in both discovery (P=3.9×10⁻⁴~1.0×10⁻³) and replication samples (2.8×10⁻⁴~1.7×10⁻³). Genetic variance explained by chip markers for bipolar disorder subtype II was substantial in the discovery (55.1%) and replication (60.5%) samples. Moreover, pathways related to neurodevelopmental function, signal transduction, neuronal system, and cell adhesion molecules were significantly associated with bipolar disorder subtype II. Conclusion: We reported novel susceptible loci for pure bipolar subtype II disorder that is less addressed in the literature. Future studies are needed to confirm the roles of these loci for bipolar disorder subtype II. PMID:27450446
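
    The record above does not say how its polygenic risk scores were computed. As a rough illustration only, the sketch below uses the common textbook form, a weighted sum of risk-allele counts; the function name, the allele counts, and the effect sizes are all hypothetical.

```python
def polygenic_risk_score(allele_counts, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2 per marker).

    effect_sizes are per-marker weights, for example log odds ratios
    estimated in a discovery sample.
    """
    if len(allele_counts) != len(effect_sizes):
        raise ValueError("one effect size is required per marker")
    return sum(count * weight for count, weight in zip(allele_counts, effect_sizes))


# Hypothetical genotype for one individual at three markers.
score = polygenic_risk_score([0, 1, 2], [0.5, 0.2, 0.1])
print(f"risk score: {score:.2f}")
```

    Scores computed this way for cases and controls can then be compared, which is the kind of discrimination test the record summarises.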

  15. Identification of Susceptible Loci and Enriched Pathways for Bipolar II Disorder Using Genome-Wide Association Studies.

    PubMed

    Kao, Chung-Feng; Chen, Hui-Wen; Chen, Hsi-Chung; Yang, Jenn-Hwai; Huang, Ming-Chyi; Chiu, Yi-Hang; Lin, Shih-Ku; Lee, Ya-Chin; Liu, Chih-Min; Chuang, Li-Chung; Chen, Chien-Hsiun; Wu, Jer-Yuarn; Lu, Ru-Band; Kuo, Po-Hsiu

    2016-12-01

    This study aimed to identify susceptible loci and enriched pathways for bipolar disorder subtype II. We conducted a genome-wide association scan in discovery samples with 189 bipolar disorder subtype II patients and 1773 controls, and replication samples with 283 bipolar disorder subtype II patients and 500 controls in a Taiwanese Han population using Affymetrix Axiom Genome-Wide CHB1 Array. We performed single-marker and gene-based association analyses, as well as calculated polygeneic risk scores for bipolar disorder subtype II. Pathway enrichment analyses were employed to reveal significant biological pathways. Seven markers were found to be associated with bipolar disorder subtype II in meta-analysis combining both discovery and replication samples (P<5.0×10(-6)), including markers in or close to MYO16, HSP90AB3P, noncoding gene LOC100507632, and markers in chromosomes 4 and 10. A novel locus, ETF1, was associated with bipolar disorder subtype II (P<6.0×10(-3)) in gene-based association tests. Results of risk evaluation demonstrated that higher genetic risk scores were able to distinguish bipolar disorder subtype II patients from healthy controls in both discovery (P=3.9×10(-4)~1.0×10(-3)) and replication samples (2.8×10(-4)~1.7×10(-3)). Genetic variance explained by chip markers for bipolar disorder subtype II was substantial in the discovery (55.1%) and replication (60.5%) samples. Moreover, pathways related to neurodevelopmental function, signal transduction, neuronal system, and cell adhesion molecules were significantly associated with bipolar disorder subtype II. We reported novel susceptible loci for pure bipolar subtype II disorder that is less addressed in the literature. Future studies are needed to confirm the roles of these loci for bipolar disorder subtype II. © The Author 2016. Published by Oxford University Press on behalf of CINP.

  16. A Genomic and Protein-Protein Interaction Analyses of Nonsyndromic Hearing Impairment in Cameroon Using Targeted Genomic Enrichment and Massively Parallel Sequencing.

    PubMed

    Lebeko, Kamogelo; Manyisa, Noluthando; Chimusa, Emile R; Mulder, Nicola; Dandara, Collet; Wonkam, Ambroise

    2017-02-01

    Hearing impairment (HI) is one of the leading causes of disability in the world, impacting the social, economic, and psychological well-being of the affected individual. This is particularly true in sub-Saharan Africa, which carries one of the highest burdens of this condition. Despite this, there are limited data on the most prevalent genes or mutations that cause HI among sub-Saharan Africans. Next-generation technologies, such as targeted genomic enrichment and massively parallel sequencing, offer new promise in this context. This study reports, for the first time to the best of our knowledge, on the prevalence of novel mutations identified through a platform of 116 HI genes (OtoSCOPE(®)), among 82 African probands with HI. Only variants OTOF NM_194248.2:c.766-2A>G and MYO7A NM_000260.3:c.1996C>T, p.Arg666Stop were found in 3 (3.7%) and 5 (6.1%) patients, respectively. In addition and uniquely, the analysis of protein-protein interactions (PPI), through interrogation of gene subnetworks, using a custom script and two databases (Enrichr and PANTHER), and an algorithm in the igraph package of R, identified the enrichment of sensory perception and mechanical stimulus biological processes, and the most significant molecular functions of these variants pertained to binding or structural activity. Furthermore, 10 genes (MYO7A, MYO6, KCTD3, NUMA1, MYH9, KCNQ1, UBC, DIAPH1, PSMC2, and RDX) were identified as significant hubs within the subnetworks. Results reveal that the novel variants identified among familial cases of HI in Cameroon are not common, and PPI analysis has highlighted the role of 10 genes, potentially important in understanding HI genomics among Africans.

  17. Enrichment analysis of Alu elements with different spatial chromatin proximity in the human genome.

    PubMed

    Gu, Zhuoya; Jin, Ke; Crabbe, M James C; Zhang, Yang; Liu, Xiaolin; Huang, Yanyan; Hua, Mengyi; Nan, Peng; Zhang, Zhaolei; Zhong, Yang

    2016-04-01

    Transposable elements (TEs) have not been regarded purely as "junk DNA" for some time, following continual discoveries of their multifunctional roles in eukaryotic genomes. Alu, a SINE family, is one of the most important and abundant TEs still active in the human genome; it has demonstrated indispensable regulatory functions at the sequence level, but its spatial roles are still unclear. Technologies based on 3C (chromosome conformation capture) have revealed the three-dimensional structure of chromatin and make it possible to study distal chromatin interactions in the genome. To find the role TEs play in distal regulation in the human genome, we compiled newly released Hi-C data, TE annotation, histone marker annotations, and genome-wide methylation data for correlation analysis, and found that the density of Alu elements showed a strong positive correlation with the level of chromatin interactions (hESC: r = 0.9, P < 2.2 × 10(-16); IMR90 fibroblasts: r = 0.94, P < 2.2 × 10(-16)) and also a significant positive correlation with some remote functional DNA elements such as enhancers and promoters (Enhancer: hESC: r = 0.997, P = 2.3 × 10(-4); IMR90: r = 0.934, P = 2 × 10(-2); Promoter: hESC: r = 0.995, P = 3.8 × 10(-4); IMR90: r = 0.996, P = 3.2 × 10(-4)). Further investigation involving GC content and methylation status showed that the GC content of Alu-covered sequences shared a similar pattern with that of the overall sequence, suggesting that Alu elements also function as providers of GC nucleotides and CpG sites. In all, our results suggest that Alu elements may act as an alternative parameter to evaluate Hi-C data, which is confirmed by the correlation analysis of Alu elements and histone markers. Moreover, GC-rich Alu sequences can bring high GC content and methylation flexibility to the regions with more distal chromatin contacts, regulating the transcription of tissue-specific genes.
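
    The r values quoted in the record above are correlation coefficients. Assuming they are Pearson coefficients (the record does not name the statistic), the calculation can be sketched as follows; the function name and the per-bin values in the usage lines are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, each left unnormalised because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-bin values: Alu density versus chromatin interaction level.
alu_density = [0.05, 0.10, 0.15, 0.20, 0.25]
interaction_level = [1.1, 1.9, 3.2, 3.8, 5.1]
print(f"r = {pearson_r(alu_density, interaction_level):.3f}")
```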

  18. Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome.

    PubMed

    Cui, Pin; Löber, Ulrike; Alquezar-Planas, David E; Ishida, Yasuko; Courtiol, Alexandre; Timms, Peter; Johnson, Rebecca N; Lenz, Dorina; Helgen, Kristofer M; Roca, Alfred L; Hartman, Stefanie; Greenwood, Alex D

    2016-01-01

    Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next generation sequencing (NGS) and a newly customized sequence-clustering based computational pipeline to determine the integration sites for 10 museum Queensland and New South Wales (NSW) koala samples collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small.

  19. Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci

    PubMed Central

    Takata, Atsushi; Matsumoto, Naomichi; Kato, Tadafumi

    2017-01-01

    Detailed analyses of the transcriptome have revealed complexity in the regulation of alternative splicing (AS). These AS events often undergo modulation by genetic variants. Here we analyse RNA-sequencing data of prefrontal cortex from 206 individuals in combination with their genotypes and identify cis-acting splicing quantitative trait loci (sQTLs) throughout the genome. These sQTLs are enriched among exonic and H3K4me3-marked regions. Moreover, we observe significant enrichment of sQTLs among disease-associated loci identified by GWAS, especially in schizophrenia risk loci. Closer examination of each schizophrenia-associated locus revealed four regions (each encompassing NEK4, FXR1, SNAP91 or APOPT1) where the index SNP in GWAS is in strong linkage disequilibrium with sQTL SNP(s), suggesting dysregulation of AS as the underlying mechanism of the association signal. Our study provides an informative resource of sQTL SNPs in the human brain, which can facilitate understanding of the genetic architecture of complex brain disorders such as schizophrenia. PMID:28240266

  20. Comprehensive profiling of retroviral integration sites using target enrichment methods from historical koala samples without an assembled reference genome

    PubMed Central

    Alquezar-Planas, David E.; Ishida, Yasuko; Courtiol, Alexandre; Timms, Peter; Johnson, Rebecca N.; Lenz, Dorina; Helgen, Kristofer M.; Roca, Alfred L.; Hartman, Stefanie

    2016-01-01

    Background. Retroviral integration into the host germline results in permanent viral colonization of vertebrate genomes. The koala retrovirus (KoRV) is currently invading the germline of the koala (Phascolarctos cinereus) and provides a unique opportunity for studying retroviral endogenization. Previous analyses of KoRV integration patterns in modern koalas demonstrate that they share integration sites primarily if they are related, indicating that the process is currently driven by vertical transmission rather than infection. However, due to methodological challenges, KoRV integrations have not been comprehensively characterized. Results. To overcome these challenges, we applied and compared three target enrichment techniques coupled with next-generation sequencing (NGS) and a newly customized sequence-clustering-based computational pipeline to determine the integration sites for 10 museum koala samples from Queensland and New South Wales (NSW) collected between the 1870s and late 1980s. A secondary aim of this study was to identify common integration sites across modern and historical specimens by comparing our dataset to previously published studies. Several million sequences were processed, and the KoRV integration sites in each koala were characterized. Conclusions. Although the three enrichment methods each exhibited bias in integration site retrieval, a combination of two methods, Primer Extension Capture and hybridization capture, is recommended for future studies on historical samples. Moreover, identification of integration sites shows that the proportion of integration sites shared between any two koalas is quite small. PMID:27069793

  1. Construction of a genomic DNA library with a TA vector and its application in cloning of the phytoene synthase gene from the cyanobacterium Spirulina platensis M-135

    NASA Astrophysics Data System (ADS)

    Yoshikazu, Kawata; Shin-Ichi, Yano; Hiroyuki, Kojima

    1998-03-01

    An efficient and simple method for constructing a genomic DNA library using a TA cloning vector is presented. It is based on the sonicative cleavage of genomic DNA and modification of fragment ends with Taq DNA polymerase, followed by ligation using a TA vector. This method was applied for cloning of the phytoene synthase gene crt B from Spirulina platensis. This method is useful when genomic DNA cannot be efficiently digested with restriction enzymes, a problem often encountered during the construction of a genomic DNA library of cyanobacteria.

  2. Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project.

    PubMed

    Fernández, Paula; Paniego, Norma; Lew, Sergio; Hopp, H Esteban; Heinz, Ruth A

    2003-09-30

    Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue: they avoid redundant sequencing of clones representing the same expressed genes and maximize detection of low-abundance transcripts, thus affecting the efficiency and cost-effectiveness of small-scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits. Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low-abundance transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library, with 62% unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs on public databases showed that 197 of the non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of newly reported sequences putatively related to responses to important

  3. Differential representation of sunflower ESTs in enriched organ-specific cDNA libraries in a small scale sequencing project

    PubMed Central

    Fernández, Paula; Paniego, Norma; Lew, Sergio; Hopp, H Esteban; Heinz, Ruth A

    2003-01-01

    Background. Subtractive hybridization methods are valuable tools for identifying differentially regulated genes in a given tissue: they avoid redundant sequencing of clones representing the same expressed genes and maximize detection of low-abundance transcripts, thus affecting the efficiency and cost-effectiveness of small-scale cDNA sequencing projects aimed at the specific identification of useful genes for breeding purposes. The objective of this work is to evaluate alternative strategies to high-throughput sequencing projects for the identification of novel genes differentially expressed in sunflower as a source of organ-specific genetic markers that can be functionally associated with important traits. Results. Differential organ-specific ESTs were generated from leaf, stem, root and flower bud at two developmental stages (R1 and R4). The use of different sources of RNA as tester and driver cDNA for the construction of differential libraries was evaluated as a tool for detection of rare or low-abundance transcripts. Organ-specificity ranged from 75 to 100% of non-redundant sequences in the different cDNA libraries. Sequence redundancy varied according to the target and driver cDNA used in each case. The R4 flower cDNA library was the least redundant library, with 62% unique sequences. Out of a total of 919 sequences that were edited and annotated, 318 were non-redundant sequences. Comparison against sequences in public databases showed that 60% of non-redundant sequences showed significant similarity to known sequences. The number of predicted novel genes varied among the different cDNA libraries, ranging from 56% in the R4 flower to 16% in the R1 flower bud library. Comparison with sunflower ESTs on public databases showed that 197 of the non-redundant sequences (60%) did not exhibit significant similarity to previously reported sunflower ESTs. This approach helped to successfully isolate a significant number of newly reported sequences putatively related to responses

  4. Enrichment of low-frequency functional variants revealed by whole-genome sequencing of multiple isolated European populations

    PubMed Central

    Xue, Yali; Mezzavilla, Massimo; Haber, Marc; McCarthy, Shane; Chen, Yuan; Narasimhan, Vagheesh; Gilly, Arthur; Ayub, Qasim; Colonna, Vincenza; Southam, Lorraine; Finan, Christopher; Massaia, Andrea; Chheda, Himanshu; Palta, Priit; Ritchie, Graham; Asimit, Jennifer; Dedoussis, George; Gasparini, Paolo; Palotie, Aarno; Ripatti, Samuli; Soranzo, Nicole; Toniolo, Daniela; Wilson, James F.; Durbin, Richard; Tyler-Smith, Chris; Zeggini, Eleftheria

    2017-01-01

    The genetic features of isolated populations can boost power in complex-trait association studies, and an in-depth understanding of how their genetic variation has been shaped by their demographic history can help leverage these advantageous characteristics. Here, we perform a comprehensive investigation using 3,059 newly generated low-depth whole-genome sequences from eight European isolates and two matched general populations, together with published data from the 1000 Genomes Project and UK10K. Sequencing data give deeper and richer insights into population demography and genetic characteristics than genotype-chip data, distinguishing related populations more effectively and allowing their functional variants to be studied more fully. We demonstrate relaxation of purifying selection in the isolates, leading to enrichment of rare and low-frequency functional variants, using novel statistics, DVxy and SVxy. We also develop an isolation-index (Isx) that predicts the overall level of such key genetic characteristics and can thus help guide population choice in future complex-trait association studies. PMID:28643794

  5. Physical Analysis of the Complex Rye (Secale cereale L.) Alt4 Aluminium (Aluminum) Tolerance Locus Using a Whole-Genome BAC Library of Rye cv. Blanco

    USDA-ARS's Scientific Manuscript database

    Rye is a diploid crop species with many outstanding qualities, and is also important as a source of new traits for wheat and triticale improvement. Here we describe a BAC library of rye cv. Blanco, representing a valuable resource for rye molecular genetic studies. The library provides a 6 × genome ...

  6. Efficiency of whole genome amplification of single circulating tumor cells enriched by CellSearch and sorted by FACS.

    PubMed

    Swennenhuis, Joost F; Reumers, Joke; Thys, Kim; Aerssens, Jeroen; Terstappen, Leon Wmm

    2013-01-01

    Tumor cells in the blood of patients with metastatic carcinomas are associated with poor survival. Knowledge of the cells' genetic make-up can help to guide targeted therapy. We evaluated the efficiency and quality of isolation and amplification of DNA from single circulating tumor cells (CTC). The efficiency of the procedure was determined by spiking blood with SKBR-3 cells, enrichment with the CellSearch system, followed by single cell sorting by fluorescence-activated cell sorting (FACS) and whole genome amplification. A selection of single cell DNA from fixed and unfixed SKBR-3 cells was exome sequenced and the DNA quality analyzed. Single CTC from patients with lung cancer were used to demonstrate the potential of single CTC molecular characterization. The overall efficiency of the procedure from spiked cell to amplified DNA was approximately 20%. Losses attributed to the CellSearch system were around 20%, transfer to FACS around 25%, sorting around 5% and DNA amplification around 25%. Exome sequencing revealed that the quality of the DNA was affected by the fixation of the cells, amplification, and the low starting quantity of DNA. A single fixed cell had an average coverage at 20× depth of 30% when sequencing to an average of 40× depth, whereas a single unfixed cell had 45% coverage. GenomiPhi-amplified genomic DNA had a coverage of 72% versus a coverage of 87% of genomic DNA. Twenty-one percent of the CTC from patients with lung cancer identified by the CellSearch system could be isolated individually and amplified. CTC enriched by the CellSearch system were sorted by FACS, and DNA retrieved and amplified with an overall efficiency of 20%. Analysis of the sequencing data showed that this DNA could be used for variant calling, but not for quantitative measurements such as copy number detection. Close to 55% of the exome of single SKBR-3 cells were successfully sequenced to 20× depth making it possible to call 72% of the variants. The overall coverage was

  7. Near-Complete Genome Sequence of Thalassospira sp. Strain KO164 Isolated from a Lignin-Enriched Marine Sediment Microcosm

    SciTech Connect

    Woo, Hannah L.; O’Dell, Kaela B.; Utturkar, Sagar; McBride, Kathryn R.; Huntemann, Marcel; Clum, Alicia; Pillay, Manoj; Palaniappan, Krishnaveni; Varghese, Neha; Mikhailova, Natalia; Stamatis, Dimitrios; Reddy, T. B. K.; Ngan, Chew Yee; Daum, Chris; Shapiro, Nicole; Markowitz, Victor; Ivanova, Natalia; Kyrpides, Nikos; Woyke, Tanja; Brown, Steven D.; Hazen, Terry C.

    2016-11-23

    We isolated Thalassospira sp. strain KO164 from eastern Mediterranean seawater and sediment laboratory microcosms enriched on insoluble organosolv lignin under oxic conditions. Furthermore, the near-complete genome sequence is presented here to support analysis of the deep-ocean bacterium's ability to degrade recalcitrant organics such as lignin.

  8. Genome Sequence of Halomonas sp. Strain KO116, an Ionic Liquid-Tolerant Marine Bacterium Isolated from a Lignin-Enriched Seawater Microcosm

    SciTech Connect

    O'Dell, Kaela; Woo, Hannah L.; Utturkar, Sagar M.; Klingeman, Dawn Marie; Brown, Steven D.; Hazen, Terry C.

    2015-05-07

    Halomonas sp. strain KO116 was isolated from Nile Delta Mediterranean Sea surface water enriched with insoluble organosolv lignin. It was further screened for growth on alkali lignin minimal salts medium agar. The strain tolerates the ionic liquid 1-ethyl-3-methylimidazolium acetate. Its complete genome sequence is presented in this report.

  9. Genome Sequence of Halomonas sp. Strain KO116, an Ionic Liquid-Tolerant Marine Bacterium Isolated from a Lignin-Enriched Seawater Microcosm

    DOE PAGES

    O'Dell, Kaela; Woo, Hannah L.; Utturkar, Sagar M.; ...

    2015-05-07

    Halomonas sp. strain KO116 was isolated from Nile Delta Mediterranean Sea surface water enriched with insoluble organosolv lignin. It was further screened for growth on alkali lignin minimal salts medium agar. The strain tolerates the ionic liquid 1-ethyl-3-methylimidazolium acetate. Its complete genome sequence is presented in this report.

  10. Pleiotropic drug-resistance attenuated genomic library improves elucidation of drug mechanisms.

    PubMed

    Coorey, Namal V C; Matthews, James H; Bellows, David S; Atkinson, Paul H

    2015-11-01

    Identifying Saccharomyces cerevisiae genome-wide gene deletion mutants that confer hypersensitivity to a xenobiotic aids the elucidation of its mechanism of action (MoA). However, the biological activities of many xenobiotics are masked by the pleiotropic drug resistance (PDR) network, which effluxes xenobiotics that are PDR substrates. The PDR network in S. cerevisiae is almost entirely under the control of two functionally homologous transcription factors, Pdr1p and Pdr3p. Herein we report the construction of a PDR-attenuated haploid non-essential DMA (PA-DMA), lacking PDR1 and PDR3, which permits the MoA elucidation of xenobiotics that are PDR substrates at low concentrations. The functionality of four key cellular processes commonly activated in response to xenobiotic stress (oxidative stress response, general stress response, unfolded stress response and calcium signalling pathways) was assessed in the absence of the PDR1 and PDR3 genes and found to be unaltered; therefore, these key chemogenomic signatures are not lost when using the PA-DMA. Efficacy of the PA-DMA was demonstrated using cycloheximide and latrunculin A at low nanomolar concentrations to attain chemical genetic profiles that were more specific to their known main mechanisms. We also found a two-fold increase in the number of compounds that are bioactive in the pdr1Δpdr3Δ strain compared to the wild-type strain in screening the commercially available LOPAC1280 library. The PA-DMA should be particularly applicable to mechanism determination of xenobiotics that have limited availability, such as natural products.

  11. A new genomic library of melon introgression lines in a cantaloupe genetic background for dissecting desirable agronomical traits.

    PubMed

    Perpiñá, Gorka; Esteras, Cristina; Gibon, Yves; Monforte, Antonio J; Picó, Belén

    2016-07-08

    Genomic libraries of introgression lines (ILs) consist of collections of homozygous lines with a single chromosomal introgression from a donor genotype in a common, usually elite, genetic background, representing the whole donor genome in the full collection. Currently, the only available melon IL collection was generated using Piel de sapo (var. inodorus) as the recurrent background. ILs are not available in genetic backgrounds representing other important market class cultivars, such as the cantalupensis. The recent availability of genomic tools in melon, such as SNP collections and genetic maps, facilitates the development of such mapping populations. We have developed a new genomic library of introgression lines from the Japanese cv. Ginsen Makuwa (var. makuwa) into the French Charentais-type cv. Vedrantais (var. cantalupensis) genetic background. In order to speed up the breeding program, we applied medium-throughput SNP genotyping with Sequenom MassARRAY technology in early backcross generations and High Resolution Melting in the final steps. The phenotyping of the backcross generations and of the final set of 27 ILs (averaging 1.3 introgressions/plant and covering nearly 100 % of the donor genome), in three environments, allowed the detection of stable QTLs for flowering and fruit quality traits, including some that affect fruit size in chromosomes 6 and 11, others that change fruit shape in chromosomes 7 and 11, others that change flesh color in chromosomes 2, 8 and 9, and still others that increase sucrose content and delay climacteric behavior in chromosomes 5 and 10. A new melon IL collection in the Charentais genetic background has been developed. Genomic regions that consistently affect flowering and fruit quality traits have been identified, which demonstrates the suitability of this collection for dissecting complex traits in melon. Additionally, pre-breeding lines with new, commercially interesting phenotypes have been observed, including delayed

  12. A BAC library of the SP80-3280 sugarcane variety (saccharum sp.) and its inferred microsynteny with the sorghum genome

    PubMed Central

    2012-01-01

    Background. Sugarcane breeding has significantly progressed in the last 30 years, but achieving additional yield gains has been difficult because of the constraints imposed by the complex ploidy of this crop. Sugarcane cultivars are interspecific hybrids between Saccharum officinarum and Saccharum spontaneum. S. officinarum is an octoploid with 2n = 80 chromosomes, while S. spontaneum has 2n = 40 to 128 chromosomes and ploidy varying from 5 to 16. The hybrid genome is composed of 70-80% S. officinarum and 5-20% S. spontaneum chromosomes and a small proportion of recombinants. Sequencing the genome of this complex crop may help identify useful genes, either per se or through comparative genomics using closely related grasses. The construction and sequencing of a bacterial artificial chromosome (BAC) library of an elite commercial variety of sugarcane could help assemble the sugarcane genome. Results. A BAC library designated SS_SBa was constructed with DNA isolated from the commercial sugarcane variety SP80-3280. The library contains 36,864 clones with an average insert size of 125 Kb, 88% of which have inserts larger than 90 Kb. Based on the estimated genome size of 760–930 Mb, the library provides 5–6 times coverage of the monoploid sugarcane genome. Bidirectional BAC end sequences (BESs) from a random sample of 192 BAC clones sampled genes and repetitive elements of the sugarcane genome. Forty-five per cent of the total BES nucleotides represent repetitive elements, 83% of which belong to LTR retrotransposons. Alignment of BESs corresponding to 42 BACs to the genome sequence of the 10 sorghum chromosomes revealed regions of microsynteny, with expansions and contractions of sorghum genome regions relative to the sugarcane BAC clones. In general, the sampled sorghum genome regions presented an average 29% expansion in relation to the sugarcane syntenic BACs. Conclusion. The SS_SBa BAC library represents a new resource for sugarcane genome sequencing
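
    The coverage range quoted in this record can be checked with a short calculation. The sketch below is illustrative Python and is not part of the cited record. It assumes coverage is simply clone count times mean insert size divided by genome size, treating every clone as carrying an insert of the mean size; the function name library_coverage is hypothetical.

```python
def library_coverage(n_clones: int, mean_insert_kb: float, genome_size_mb: float) -> float:
    """Return genome equivalents: total cloned DNA divided by genome size."""
    total_insert_mb = n_clones * mean_insert_kb / 1000.0  # kb -> Mb
    return total_insert_mb / genome_size_mb


# Figures quoted in the abstract: 36,864 clones, 125 kb mean insert,
# monoploid genome size estimated at 760-930 Mb.
low = library_coverage(36_864, 125, 930)   # largest genome estimate -> lowest coverage
high = library_coverage(36_864, 125, 760)  # smallest genome estimate -> highest coverage
print(f"{low:.1f}x to {high:.1f}x")
```

    Under these assumptions the result is roughly 5 to 6 genome equivalents, in line with the stated 5–6 times coverage. Empty or chimeric clones would lower the real figure.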

  13. A BAC library of the SP80-3280 sugarcane variety (saccharum sp.) and its inferred microsynteny with the sorghum genome.

    PubMed

    Figueira, Thais Rezende e Silva; Okura, Vagner; Rodrigues da Silva, Felipe; Jose da Silva, Marcio; Kudrna, Dave; Ammiraju, Jetty S S; Talag, Jayson; Wing, Rod; Arruda, Paulo

    2012-04-23

    Sugarcane breeding has significantly progressed in the last 30 years, but achieving additional yield gains has been difficult because of the constraints imposed by the complex ploidy of this crop. Sugarcane cultivars are interspecific hybrids between Saccharum officinarum and Saccharum spontaneum. S. officinarum is an octoploid with 2n = 80 chromosomes, while S. spontaneum has 2n = 40 to 128 chromosomes and ploidy varying from 5 to 16. The hybrid genome is composed of 70-80% S. officinarum and 5-20% S. spontaneum chromosomes and a small proportion of recombinants. Sequencing the genome of this complex crop may help identify useful genes, either per se or through comparative genomics using closely related grasses. The construction and sequencing of a bacterial artificial chromosome (BAC) library of an elite commercial variety of sugarcane could help assemble the sugarcane genome. A BAC library designated SS_SBa was constructed with DNA isolated from the commercial sugarcane variety SP80-3280. The library contains 36,864 clones with an average insert size of 125 Kb, 88% of which have inserts larger than 90 Kb. Based on the estimated genome size of 760-930 Mb, the library provides 5-6 times coverage of the monoploid sugarcane genome. Bidirectional BAC end sequences (BESs) from a random sample of 192 BAC clones sampled genes and repetitive elements of the sugarcane genome. Forty-five per cent of the total BES nucleotides represent repetitive elements, 83% of which belong to LTR retrotransposons. Alignment of BESs corresponding to 42 BACs to the genome sequence of the 10 sorghum chromosomes revealed regions of microsynteny, with expansions and contractions of sorghum genome regions relative to the sugarcane BAC clones. In general, the sampled sorghum genome regions presented an average 29% expansion in relation to the sugarcane syntenic BACs. The SS_SBa BAC library represents a new resource for sugarcane genome sequencing. An analysis of insert size, genome

  14. A large-insert (130 kbp) bacterial artificial chromosome library of the rice blast fungus Magnaporthe grisea: genome analysis, contig assembly, and gene cloning.

    PubMed

    Zhu, H; Choi, S; Johnston, A K; Wing, R A; Dean, R A

    1997-06-01

    Magnaporthe grisea (Hebert) Barr causes rice blast, one of the most devastating diseases of rice (Oryza sativa) worldwide. This fungus is an ideal organism for studying a number of aspects of plant-pathogen interactions, including infection-related morphogenesis, avirulence, and pathogen evolution. To facilitate M. grisea genome analysis, physical mapping, and positional cloning, we have constructed a bacterial artificial chromosome (BAC) library from the rice-infecting strain 70-15. A new method was developed for separation of partially digested large-molecular-weight DNA fragments that facilitated library construction with large inserts. The library contains 9216 clones, with an average insert size of 130 kbp (> 25 genome equivalents), stored in 384-well microtiter plates that can be double spotted robotically onto a single nylon membrane. Several unlinked single-copy DNA probes were used to screen 4608 clones in the library, and an average of 13 (minimum of 6) overlapping BAC clones was found in each case. Hybridization of total genomic DNA to the library and analysis of individual clones indicated that approximately 26% of the clones contain single-copy DNA. Approximately 35% of BAC clones contained the retrotransposon MAGGY. The library was used to identify BAC clones containing an adenylate cyclase gene (mac1). In addition, a 550-kbp contig composed of 6 BAC clones was constructed that encompassed two adjacent RFLP markers on chromosome 2. These data show that the BAC library is suitable for genome analysis of M. grisea. Copies of colony hybridization membranes are available upon request.

  15. Adventures in the Enormous: A 1.8 Million Clone BAC Library for the 21.7 Gb Genome of Loblolly Pine

    PubMed Central

    Magbanua, Zenaida V.; Ozkan, Seval; Bartlett, Benjamin D.; Chouvarine, Philippe; Saski, Christopher A.; Liston, Aaron; Cronn, Richard C.; Nelson, C. Dana; Peterson, Daniel G.

    2011-01-01

    Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we constructed a BAC library for the LP genotype 7-56. The LP BAC library consists of 1,824,768 individually-archived clones making it the largest single BAC library constructed to date, has a mean insert size of 96 kb, and affords 7.6X coverage of the 21.7 Gb LP genome. To demonstrate the efficacy of the library in gene isolation, we screened macroarrays with overgos designed from a pine EST anchored on LP chromosome 10. A positive BAC was sequenced and found to contain the expected full-length target gene, several gene-like regions, and both known and novel repeats. Macroarray analysis using the retrotransposon IFG-7 (the most abundant repeat in the sequenced BAC) as a probe indicates that IFG-7 is found in roughly 210,557 copies and constitutes about 5.8% or 1.26 Gb of LP nuclear DNA; this DNA quantity is eight times the Arabidopsis genome. In addition to its use in genome characterization and gene isolation as demonstrated herein, the BAC library should hasten whole genome sequencing of LP via next-generation sequencing strategies/technologies and facilitate improvement of trees through molecular breeding and genetic engineering. The library and associated products are distributed by the Clemson University Genomics Institute (www.genome.clemson.edu). PMID:21283709
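
    The repeat-content figures in this record follow from simple arithmetic, sketched below in illustrative Python that is not part of the cited record. The variable names are hypothetical. The implied mean length per IFG-7 copy is a derived, back-of-envelope value that the abstract does not state, and it assumes all copies are of similar length.

```python
# Figures quoted in the abstract.
GENOME_GB = 21.7        # loblolly pine genome size
IFG7_FRACTION = 0.058   # IFG-7 share of nuclear DNA (about 5.8%)
IFG7_COPIES = 210_557   # estimated IFG-7 copy number

# DNA attributable to IFG-7, in Gb.
ifg7_gb = GENOME_GB * IFG7_FRACTION

# Implied mean length per copy in bp (an inference, not a reported value).
implied_bp_per_copy = ifg7_gb * 1e9 / IFG7_COPIES

print(f"IFG-7 DNA: {ifg7_gb:.2f} Gb")
print(f"Implied mean copy length: {implied_bp_per_copy:.0f} bp")
```

    The first value reproduces the 1.26 Gb stated in the abstract; the second is only as reliable as the copy-number and fraction estimates it is derived from.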

  16. Intravenous infusion of phage-displayed antibody library in human cancer patients: enrichment and cancer-specificity of tumor-homing phage-antibodies.

    PubMed

    Shukla, Girja S; Krag, David N; Peletskaya, Elena N; Pero, Stephanie C; Sun, Yu-Jing; Carman, Chelsea L; McCahill, Laurence E; Roland, Thomas A

    2013-08-01

    Phage display is a powerful method for target discovery and selection of ligands for cancer treatment and diagnosis. Our goal was to select tumor-binding antibodies in cancer patients. Eligibility criteria included absence of preexisting anti-phage-antibodies and a Stage IV cancer status. All patients were intravenously administered 1 × 10^11 TUs/kg of an scFv library 1 to 4 h before surgical resection of their tumors. No significant adverse events related to the phage library infusion were observed. Phage were successfully recovered from all tumors. Individual clones from each patient were assessed for binding to the tumor from which clones were recovered. Multiple tumor-binding phage-antibodies were identified. Soluble scFv antibodies were produced from the phage clones showing higher tumor binding. The tumor-homing phage-antibodies and derived soluble scFvs were found to bind varying numbers (0-5) of 8 tested normal human tissues (breast, cervix, colon, kidney, liver, spleen, skin, and uterus). The clones that showed high tumor-specificity were found to bind corresponding tumors from other patients also. Clone enrichment was observed based on tumor binding and DNA sequence data. Clone sequences of multiple variable regions showed significant matches to certain cancer-related antibodies. One of the clones (07-2,355) that was found to share a 12-amino-acid-long motif with a reported IL-17A antibody was further studied for competitive binding for possible antigen target identification. We conclude that these outcomes support the safety and utility of phage display library panning in cancer patients for ligand selection and target discovery for cancer treatment and diagnosis.

  17. Availability of birth defects and genetic disease information in public libraries -- implications for the Human Genome Project

    SciTech Connect

    Sell, S.; Gettig, E.; Mulvihill, J.J.

    1994-09-01

    In order to better educate the public about birth defects and genetic diseases/testing, access to information is critical. The public library system of the United States is extensive and serves as an invaluable resource to citizens. We surveyed reference librarians at each of 87 public libraries in Allegheny and Westmoreland Counties, Pennsylvania. The study design included a questionnaire to ascertain the genetic knowledge of reference librarians and a catalog of the current resources, in print and via telecommunications, available to the public. A high compliance rate was achieved due to the incentive of providing copies of the Alliance of Genetic Support Group Directory to those who responded to the survey, along with complete sets of the forty-three March of Dimes Information Sheets currently available. Demographic data on age, gender, and educational background, as well as the occurrence of personal experiences with genetic disease, were ascertained. Reference librarians were chosen as the study group because families commonly seek further information from the public library before or after a genetic consultation. As the Human Genome Project identifies new genes for conditions, people will seek public information more frequently. The study shows that public libraries are an appropriate point of education to and for the public.

  18. Genome-Wide Association Studies Suggest Limited Immune Gene Enrichment in Schizophrenia Compared to 5 Autoimmune Diseases.

    PubMed

    Pouget, Jennie G; Gonçalves, Vanessa F; Spain, Sarah L; Finucane, Hilary K; Raychaudhuri, Soumya; Kennedy, James L; Knight, Jo

    2016-09-01

    There has been intense debate over the immunological basis of schizophrenia, and the potential utility of adjunct immunotherapies. The major histocompatibility complex is consistently the most powerful region of association in genome-wide association studies (GWASs) of schizophrenia and has been interpreted as strong genetic evidence supporting the immune hypothesis. However, global pathway analyses provide inconsistent evidence of immune involvement in schizophrenia, and it remains unclear whether genetic data support an immune etiology per se. Here we empirically test the hypothesis that variation in immune genes contributes to schizophrenia. We show that there is no enrichment of immune loci outside of the MHC region in the largest genetic study of schizophrenia conducted to date, in contrast to 5 diseases of known immune origin. Among 108 regions of the genome previously associated with schizophrenia, we identify 6 immune candidates (DPP4, HSPD1, EGR1, CLU, ESAM, NFATC3) encoding proteins with alternative, nonimmune roles in the brain. While our findings do not refute evidence that has accumulated in support of the immune hypothesis, they suggest that genetically mediated alterations in immune function may not play a major role in schizophrenia susceptibility. Instead, there may be a role for pleiotropic effects of a small number of immune genes that also regulate brain development and plasticity. Whether immune alterations drive schizophrenia progression is an important question to be addressed by future research, especially in light of the growing interest in applying immunotherapies in schizophrenia. © The Author 2016. Published by Oxford University Press on behalf of the Maryland Psychiatric Research Center.

  19. Fast Screening Procedures for Random Transposon Libraries of Cloned Herpesvirus Genomes: Mutational Analysis of Human Cytomegalovirus Envelope Glycoprotein Genes

    PubMed Central

    Hobom, Urs; Brune, Wolfram; Messerle, Martin; Hahn, Gabriele; Koszinowski, Ulrich H.

    2000-01-01

    We have cloned the human cytomegalovirus (HCMV) genome as an infectious bacterial artificial chromosome (BAC) in Escherichia coli. Here, we have subjected the HCMV BAC to random transposon (Tn) mutagenesis using a Tn1721-derived insertion sequence and have provided the conditions for excision of the BAC cassette. We report on a fast and efficient screening procedure for a Tn insertion library. Bacterial clones containing randomly mutated full-length HCMV genomes were transferred into 96-well microtiter plates. A PCR screening method based on two Tn primers and one primer specific for the desired genomic position of the Tn insertion was established. Within three consecutive rounds of PCR a Tn insertion of interest can be assigned to a specific bacterial clone. We applied this method to retrieve mutants of HCMV envelope glycoprotein genes. To determine the infectivities of the mutant HCMV genomes, the DNA of the identified BACs was transfected into permissive fibroblasts. In contrast to BACs with mutations in the genes coding for gB, gH, gL, and gM, which did not yield infectious virus, BACs with disruptions of open reading frame UL4 (gp48) or UL74 (gO) were viable, although gO-deficient viruses showed a severe growth deficit. Thus, gO (UL74), a component of the glycoprotein complex III, is dispensable for viral growth. We conclude that our approach of PCR screening for Tn insertions will greatly facilitate the functional analysis of herpesvirus genomes. PMID:10933677

  20. Use of in vitro OmniPlex libraries for high-throughput comparative genomics and molecular haplotyping

    NASA Astrophysics Data System (ADS)

    Kamberov, Emmanuel; Sleptsova, Irina; Suchyta, Stephen; Bruening, Eric D.; Ziehler, William; Seward Nagel, Julie; Langmore, John P.; Makarov, Vladimir

    2002-06-01

    OmniPlex Technology is a new approach to genome amplification and targeted analysis. Initially the entire genome is reformatted into small, amplifiable molecules called Plexisomes, which represent the entire genome as an OmniPlex Library. The whole genome can be amplified en masse using universal primers; using locus-specific primers, regions as large as 50 kb can be amplified. Amplified Plexisomes can be analyzed using conventional methods such as capillary sequencing and microarray hybridization. The advantages of using OmniPlex as the 'front-end' for conventional analytical instruments are that a) the initial copy number of the analytes can be increased to achieve a better signal-to-noise ratio, b) only a single priming site is used, and c) up to 20 times fewer biochemical reactions and oligonucleotides are necessary to amplify a large region, compared to conventional PCR. These factors make OmniPlex more flexible, faster, and less expensive than conventional technologies. OmniPlex has been applied to targeted sequencing of human, animal, plant, and microorganism genomes. In addition, OmniPlex is inherently able to haplotype large regions of human DNA to accelerate target discovery and pharmacogenomics. OmniPlex will be a key tool for delivery of improved crops and livestock, new pharmaceutical products, and personalized medicine.

  1. High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.

    PubMed

    Zhou, Yuexin; Zhu, Shiyou; Cai, Changzu; Yuan, Pengfei; Li, Chunmei; Huang, Yanyi; Wei, Wensheng

    2014-05-22

    Targeted genome editing technologies are powerful tools for studying biology and disease, and have a broad range of research applications. In contrast to the rapid development of toolkits to manipulate individual genes, large-scale screening methods based on the complete loss of gene expression are only now beginning to be developed. Here we report the development of a focused CRISPR/Cas-based (clustered regularly interspaced short palindromic repeats/CRISPR-associated) lentiviral library in human cells and a method of gene identification based on functional screening and high-throughput sequencing analysis. Using knockout library screens, we successfully identified the host genes essential for the intoxication of cells by anthrax and diphtheria toxins, which were confirmed by functional validation. The broad application of this powerful genetic screening strategy will not only facilitate the rapid identification of genes important for bacterial toxicity but will also enable the discovery of genes that participate in other biological processes.

  2. All SNPs are not created equal: genome-wide association studies reveal a consistent pattern of enrichment among functionally annotated SNPs.

    PubMed

    Schork, Andrew J; Thompson, Wesley K; Pham, Phillip; Torkamani, Ali; Roddey, J Cooper; Sullivan, Patrick F; Kelsoe, John R; O'Donovan, Michael C; Furberg, Helena; Schork, Nicholas J; Andreassen, Ole A; Dale, Anders M

    2013-04-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1-FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci.
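
    The stratification idea described above can be illustrated with a minimal sketch. It assumes that stratified FDR control amounts to running the Benjamini-Hochberg step-up procedure separately within each annotation stratum; the abstract does not spell out the exact procedure, and the linkage disequilibrium weighting is omitted. The function names, stratum labels, and p-values below are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per p-value: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def stratified_fdr(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg independently inside each stratum (assumed sFDR form)."""
    by_stratum = {}
    for i, label in enumerate(strata):
        by_stratum.setdefault(label, []).append(i)
    rejected = [False] * len(pvalues)
    for indices in by_stratum.values():
        flags = benjamini_hochberg([pvalues[i] for i in indices], alpha)
        for i, flag in zip(indices, flags):
            rejected[i] = flag
    return rejected


# Hypothetical toy data: three SNPs in an enriched stratum, five in a null-like one.
pvalues = [0.001, 0.02, 0.04, 0.03, 0.5, 0.8, 0.9, 0.7]
strata = ["coding"] * 3 + ["intergenic"] * 5

pooled = benjamini_hochberg(pvalues)
stratified = stratified_fdr(pvalues, strata)
print(sum(pooled), sum(stratified))
```

    With these made-up inputs the pooled procedure rejects one hypothesis and the stratified procedure rejects three, all in the enriched stratum, which is the qualitative effect the abstract attributes to stratification.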

  3. Novel Anti-Campylobacter Compounds Identified Using High Throughput Screening of a Pre-selected Enriched Small Molecules Library

    PubMed Central

    Kumar, Anand; Drozd, Mary; Pina-Mimbela, Ruby; Xu, Xiulan; Helmy, Yosra A.; Antwi, Janet; Fuchs, James R.; Nislow, Corey; Templeton, Jillian; Blackall, Patrick J.; Rajashekara, Gireesh

    2016-01-01

    Campylobacter is a leading cause of foodborne bacterial gastroenteritis worldwide and infections can be fatal. The emergence of antibiotic-resistant Campylobacter spp. necessitates the development of new antimicrobials. We identified novel anti-Campylobacter small molecule inhibitors using a high throughput growth inhibition assay. To expedite screening, we made use of a “bioactive” library of 4182 compounds that we have previously shown to be active against diverse microbes. Screening for growth inhibition of Campylobacter jejuni identified 781 compounds that were either bactericidal or bacteriostatic at a concentration of 200 μM. Seventy-nine of the bactericidal compounds were prioritized for secondary screening based on their physico-chemical properties. Based on the minimum inhibitory concentration against a diverse range of C. jejuni and a lack of effect on gut microbes, we selected 12 compounds. No resistance was observed to any of these 12 lead compounds when C. jejuni was cultured with lethal or sub-lethal concentrations, suggesting that C. jejuni is less likely to develop resistance to these compounds. The top 12 compounds also showed low cytotoxicity to human intestinal epithelial cells (Caco-2 cells) and no hemolytic activity against sheep red blood cells. Next, these 12 compounds were evaluated for ability to clear C. jejuni in vitro. A total of 10 compounds had an anti-C. jejuni effect in Caco-2 cells, with some effective even at 25 μM concentrations. These 12 novel compounds belong to five established antimicrobial chemical classes: piperazines, aryl amines, piperidines, sulfonamides, and pyridazinones. Exploitation of analogs of these chemical classes may provide Campylobacter-specific drugs that can be applied in both human and animal medicine. PMID:27092106

  4. Novel Anti-Campylobacter Compounds Identified Using High Throughput Screening of a Pre-selected Enriched Small Molecules Library.

    PubMed

    Kumar, Anand; Drozd, Mary; Pina-Mimbela, Ruby; Xu, Xiulan; Helmy, Yosra A; Antwi, Janet; Fuchs, James R; Nislow, Corey; Templeton, Jillian; Blackall, Patrick J; Rajashekara, Gireesh

    2016-01-01

    Campylobacter is a leading cause of foodborne bacterial gastroenteritis worldwide and infections can be fatal. The emergence of antibiotic-resistant Campylobacter spp. necessitates the development of new antimicrobials. We identified novel anti-Campylobacter small molecule inhibitors using a high throughput growth inhibition assay. To expedite screening, we made use of a "bioactive" library of 4182 compounds that we have previously shown to be active against diverse microbes. Screening for growth inhibition of Campylobacter jejuni identified 781 compounds that were either bactericidal or bacteriostatic at a concentration of 200 μM. Seventy-nine of the bactericidal compounds were prioritized for secondary screening based on their physico-chemical properties. Based on the minimum inhibitory concentration against a diverse range of C. jejuni and a lack of effect on gut microbes, we selected 12 compounds. No resistance was observed to any of these 12 lead compounds when C. jejuni was cultured with lethal or sub-lethal concentrations, suggesting that C. jejuni is less likely to develop resistance to these compounds. The top 12 compounds also showed low cytotoxicity to human intestinal epithelial cells (Caco-2 cells) and no hemolytic activity against sheep red blood cells. Next, these 12 compounds were evaluated for ability to clear C. jejuni in vitro. A total of 10 compounds had an anti-C. jejuni effect in Caco-2 cells, with some effective even at 25 μM concentrations. These 12 novel compounds belong to five established antimicrobial chemical classes: piperazines, aryl amines, piperidines, sulfonamides, and pyridazinones. Exploitation of analogs of these chemical classes may provide Campylobacter-specific drugs that can be applied in both human and animal medicine.

  5. Construction and characterization of a highly redundant Pseudomonas aeruginosa genomic library prepared from 12 clinical isolates: application to studies of gene distribution among populations

    PubMed Central

    Erdos, Geza; Sayeed, Sameera; Hu, Fen Ze; Antalis, Patricia T.; Shen, Kai; Hayes, Jay D.; Ahmed, Azad I.; Johnson, Sandra L.; Post, J. Christopher; Ehrlich, Garth D.

    2006-01-01

    Objective: To create, array, and characterize a pooled, high-coverage genomic library composed of multiple biofilm-forming clinical strains of the opportunistic pathogen Pseudomonas aeruginosa (PA). Twelve strains were obtained from patients with otorrhea, otitis media, and cystic fibrosis as a resource for investigating differences in the transcriptomes of planktonic and biofilm envirovars; determining the size of the PA supragenome and the number of virulence genes available at the population level; and testing the distributed genome hypothesis. Methods: High molecular weight genomic DNAs from twelve clinical PA strains were individually hydrodynamically sheared to produce mean fragment sizes of ~1.5 Kb. Equimolar amounts of the 12 sheared genomic DNAs were then pooled and used in the construction of a genomic library with ~250,000 clones that was arrayed and subjected to quality control analyses. Results: Restriction endonuclease and sequence analyses of 686 clones picked at random from the library demonstrated that >75% of the clones contained inserts larger than 0.5 Kb with the desired mean insert size of 1.4 Kb. Thus, this library provides better than 4.5x coverage for each of the genomes from the twelve component clinical PA isolates. Our sequencing effort (~1 million nucleotides to date) reveals that 13% of the clones present in this library are not represented in the genome of the reference P. aeruginosa strain PAO1. Conclusions: Our data suggest that reliance on a single laboratory strain, such as PAO1, as being representative of a pathogenic bacterial species will fail to identify many important genes, and that obtaining a complete picture of complex phenomena, including bacterial pathogenesis and the genetics of biofilm development, will require characterization of the P. aeruginosa population-based supra-genome. PMID:16899304

  6. Genomic-Bioinformatic Analysis of Transcripts Enriched in the Third-Stage Larva of the Parasitic Nematode Ascaris suum

    PubMed Central

    Huang, Cui-Qin; Gasser, Robin B.; Cantacessi, Cinzia; Nisbet, Alasdair J.; Zhong, Weiwei; Sternberg, Paul W.; Loukas, Alex; Mulvenna, Jason; Lin, Rui-Qing; Chen, Ning; Zhu, Xing-Quan

    2008-01-01

    Differential transcription in Ascaris suum was investigated using a genomic-bioinformatic approach. A cDNA archive enriched for molecules in the infective third-stage larva (L3) of A. suum was constructed by suppressive-subtractive hybridization (SSH), and a subset of cDNAs from 3075 clones subjected to microarray analysis using cDNA probes derived from RNA from different developmental stages of A. suum. The cDNAs (n = 498) shown by microarray analysis to be enriched in the L3 were sequenced and subjected to bioinformatic analyses using a semi-automated pipeline (ESTExplorer). Using gene ontology (GO), 235 of these molecules were assigned to ‘biological process’ (n = 68), ‘cellular component’ (n = 50), or ‘molecular function’ (n = 117). Of the 91 clusters assembled, 56 molecules (61.5%) had homologues/orthologues in the free-living nematodes Caenorhabditis elegans and C. briggsae and/or other organisms, whereas 35 (38.5%) had no significant similarity to any sequences available in current gene databases. Transcripts encoding protein kinases, protein phosphatases (and their precursors), and enolases were abundantly represented in the L3 of A. suum, as were molecules involved in cellular processes, such as ubiquitination and proteasome function, gene transcription, protein–protein interactions, and function. In silico analyses inferred the C. elegans orthologues/homologues (n = 50) to be involved in apoptosis and insulin signaling (2%), ATP synthesis (2%), carbon metabolism (6%), fatty acid biosynthesis (2%), gap junction (2%), glucose metabolism (6%), or porphyrin metabolism (2%), although 34 (68%) of them could not be mapped to a specific metabolic pathway. Small numbers of these 50 molecules were predicted to be secreted (10%), anchored (2%), and/or transmembrane (12%) proteins. Functionally, 17 (34%) of them were predicted to be associated with (non-wild-type) RNAi phenotypes in C. elegans, the majority being embryonic lethality

  7. A linear time algorithm for detecting long genomic regions enriched with a specific combination of epigenetic states

    PubMed Central

    2015-01-01

    Background: Epigenetic modifications are essential for controlling gene expression. Recent studies have shown that not only single epigenetic modifications but also combinations of multiple epigenetic modifications play vital roles in gene regulation. A striking example is the long hypomethylated regions enriched with modified H3K27me3 (called "K27HMD" regions), which are exposed to suppress the expression of key developmental genes relevant to cellular development and differentiation during embryonic stages in vertebrates. It is thus a biologically important issue to develop an effective optimization algorithm for detecting long DNA regions (e.g., >4 kbp in size) that harbor a specific combination of epigenetic modifications (e.g., K27HMD regions). However, to date, optimization algorithms for these purposes have received little attention, and available methods are still heuristic and ad hoc. Results: In this paper, we propose a linear time algorithm for calculating a set of non-overlapping regions that maximizes the sum of similarities between the vector of focal epigenetic states and the vectors of raw epigenetic states at DNA positions in the set of regions. The average elapsed time to process the epigenetic data of any of the human chromosomes was less than 2 seconds on an Intel Xeon CPU. To demonstrate the effectiveness of the algorithm, we estimated large K27HMD regions in the medaka and human genomes using our method, ChromHMM, and a heuristic method. Conclusions: We confirmed the advantages of our method over the two other methods. Our method is flexible enough to handle other types of epigenetic combinations. The program that implements the method is called "CSMinfinder" and is made available at: http://mlab.cb.k.u-tokyo.ac.jp/~ichikawa/Segmentation/ PMID:25708947
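
    The following is a minimal sketch of how a set of non-overlapping regions with maximal total score can be found in linear time. It is not the CSMinfinder algorithm itself: it assumes each position already carries a single similarity score (positive where the raw states resemble the focal state, negative otherwise) and that a region must span at least `min_len` positions. Both simplifications, and all function and variable names, are introduced here for illustration only.

```python
def best_regions(scores, min_len):
    """Return (total, regions) where regions are half-open (start, end) intervals.

    Dynamic programme over prefixes: best[i] is the optimum for scores[:i].
    A running maximum of best[j] - prefix[j] keeps the whole pass O(n).
    """
    n = len(scores)
    prefix = [0.0] * (n + 1)
    for i, s in enumerate(scores):
        prefix[i + 1] = prefix[i] + s

    best = [0.0] * (n + 1)
    start_of = [None] * (n + 1)   # start of the region ending at i, if one does
    best_key = float("-inf")      # max over j <= i - min_len of best[j] - prefix[j]
    best_j = None

    for i in range(1, n + 1):
        j = i - min_len
        if j >= 0:
            key = best[j] - prefix[j]
            if key > best_key:
                best_key, best_j = key, j
        best[i] = best[i - 1]     # option 1: position i - 1 is outside every region
        if best_j is not None and best_key + prefix[i] > best[i]:
            best[i] = best_key + prefix[i]   # option 2: a region [best_j, i) ends here
            start_of[i] = best_j

    # Trace back the chosen regions from the end of the sequence.
    regions = []
    i = n
    while i > 0:
        if start_of[i] is None:
            i -= 1
        else:
            regions.append((start_of[i], i))
            i = start_of[i]
    regions.reverse()
    return best[n], regions


# Hypothetical per-position similarity scores.
total, regions = best_regions([1, 1, -3, 2, 2, 2, -1], min_len=2)
print(total, regions)
```

    The minimum-length constraint is what makes the problem non-trivial: without it, the optimum would simply keep every positive position as its own region.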

  8. Genomics of compositae weeds: EST libraries, microarrays, and evidence of introgression

    USDA-ARS's Scientific Manuscript database

    • Premise of Study: Weeds cause considerable environmental and economic damage. However, genomic characterization of weeds has lagged behind that of model plants and crop species. Here we report on the development of genomic tools and resources for 11 weeds from the Compositae family that can serve ...

  9. NEBNext Direct: A Novel, Rapid, Hybridization-Based Approach for the Capture and Library Conversion of Genomic Regions of Interest.

    PubMed

    Emerman, Amy B; Bowman, Sarah K; Barry, Andrew; Henig, Noa; Patel, Kruti M; Gardner, Andrew F; Hendrickson, Cynthia L

    2017-07-05

    Next-generation sequencing (NGS) is a powerful tool for genomic studies, translational research, and clinical diagnostics that enables the detection of single nucleotide polymorphisms, insertions and deletions, copy number variations, and other genetic variations. Target enrichment technologies improve the efficiency of NGS by only sequencing regions of interest, which reduces sequencing costs while increasing coverage of the selected targets. Here we present NEBNext Direct®, a hybridization-based, target-enrichment approach that addresses many of the shortcomings of traditional target-enrichment methods. This approach features a simple, 7-hr workflow that uses enzymatic removal of off-target sequences to achieve a high specificity for regions of interest. Additionally, unique molecular identifiers are incorporated for the identification and filtering of PCR duplicates. The same protocol can be used across a wide range of input amounts, input types, and panel sizes, enabling NEBNext Direct to be broadly applicable across a wide variety of research and diagnostic needs. © 2017 by John Wiley & Sons, Inc.
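
    The role of unique molecular identifiers (UMIs) in filtering PCR duplicates can be sketched as follows. This is an illustration of the general idea, not the NEBNext Direct analysis pipeline: it assumes reads are already aligned, that duplicates share the same chromosome, position, and exact UMI sequence, and that the read with the highest mean quality is kept. The tuple layout and function name are hypothetical.

```python
def dedup_by_umi(reads):
    """Keep one read per (chrom, pos, umi) group, preferring the highest quality.

    reads: iterable of (chrom, pos, umi, mean_quality) tuples (hypothetical layout).
    """
    best = {}
    for read in reads:
        chrom, pos, umi, quality = read
        key = (chrom, pos, umi)
        # Reads sharing position and UMI are treated as copies of one molecule.
        if key not in best or quality > best[key][3]:
            best[key] = read
    return list(best.values())


reads = [
    ("chr1", 100, "ACGT", 30),
    ("chr1", 100, "ACGT", 35),  # PCR duplicate of the first read, better quality
    ("chr1", 100, "TTGA", 20),  # same position, different UMI: distinct molecule
    ("chr2", 50, "ACGT", 40),
]
print(dedup_by_umi(reads))
```

    Exact UMI matching is the simplest choice; a production workflow would typically also tolerate sequencing errors within the UMI, which this sketch does not attempt.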

  10. Advances in encoding of colloids for combinatorial libraries: applications in genomics, proteomics and drug discovery.

    PubMed

    Lawrie, Gwendolyn A; Battersby, Bronwyn J; Grøndahl, Lisbeth; Trau, Matt

    2003-12-01

    The creation of enormous libraries of chemicals and their subsequent screening for bioactivity has been accelerated through recent developments in encoding solid supports. The ability to accurately identify the structure of a biomolecule that has exhibited activity is invaluable and is closer to realisation in the advent of smart nanoscience. In this review the evolution of encoding solid supports as platforms for combinatorial synthesis is traced. Current approaches to encoding solid supports are reviewed and their potential for use as supports for the high-throughput screening of split and mix libraries explored. Finally, a brief consideration of the status of the application of encoded libraries is provided including creative chemical and colloidal encoding.

  11. Determining antigen specificity of a monoclonal antibody using genome-scale CRISPR-Cas9 knockout library.

    PubMed

    Zotova, Anastasia; Zotov, Ivan; Filatov, Alexander; Mazurov, Dmitriy

    2016-12-01

    An essential step in monoclonal antibody (mAb) development is the characterization and final identification of the specific target antigen and its epitope. Antibody validation is rather straightforward when immunization is carried out with peptide or purified protein, but is more difficult when whole cells or other complex antigens are used for the immunization. Determining antigen specificity of a mAb is further complicated when reactivity of an antibody is not detected in Western blotting and/or immunoprecipitation assay. In addition to protein-based methods used for antibody characterization, a number of gene-based techniques, such as cDNA expression or short-interfering RNA (siRNA) knockdown, have been applied for validation of antibodies with restricted reactivities. We previously generated and characterized the BF4 mAb, which specifically stains viral biofilms on the surface of Human T-lymphotropic Virus Type I (HTLV-1) infected T cells, but did not identify its target. In this study, using the recently developed genome-scale CRISPR-Cas9 knockout (GeCKO) library vectors, we have established the CEM T- and the Raji B cell lines with pooled libraries. After immunofluorescent staining of these cells, negative cell sorting, and guide-RNA (gRNA) sequencing, we have identified BF4 as an anti-CD82 mAb. A deep sequence analysis of the GeCKO library transferred to the cells shows that the chance to succeed in the selection of antibody-negative cells and, therefore, to identify a mAb depends on the quality of cell library preparation. We believe that the described method is applicable for identification of many other hybridomas and represents a good alternative to the current protein- and gene-based methods used for mAb validation. Copyright © 2016 Elsevier B.V. All rights reserved.

  12. Development of genomic resources for the narrow-leafed lupin (Lupinus angustifolius): construction of a bacterial artificial chromosome (BAC) library and BAC-end sequencing

    PubMed Central

    2011-01-01

    Background: Lupinus angustifolius L., also known as narrow-leafed lupin (NLL), is becoming an important grain legume crop that is valuable for sustainable farming and is becoming recognised as a potential human health food. Recent interest is being directed at NLL to improve grain production, disease and pest management and health benefits of the grain. However, studies have been hindered by a lack of extensive genomic resources for the species. Results: An NLL BAC library was constructed consisting of 111,360 clones with an average insert size of 99.7 Kbp from cv. Tanjil. The library has approximately 12× genome coverage. Both ends of 9600 randomly selected BAC clones were sequenced to generate 13985 BAC end-sequences (BESs), covering approximately 1% of the NLL genome. These BESs permitted a preliminary characterisation of the NLL genome such as organisation and composition, with the BESs having approximately 39% G:C content, 16.6% repetitive DNA and 5.4% putative gene-encoding regions. From the BESs 9966 simple sequence repeat (SSR) motifs were identified and some of these are shown to be potential markers. Conclusions: The NLL BAC library and BAC-end sequences are powerful resources for genetic and genomic research on lupin. These resources will provide a robust platform for future high-resolution mapping, map-based cloning, comparative genomics and assembly of whole-genome sequencing data for the species. PMID:22014081

  13. Novel HIV-1 Knockdown Targets Identified by an Enriched Kinases/Phosphatases shRNA Library Using a Long-Term Iterative Screen in Jurkat T-Cells

    PubMed Central

    Rato, Sylvie; Maia, Sara; Brito, Paula M.; Resende, Leonor; Pereira, Carina F.; Moita, Catarina; Freitas, Rui P.; Moniz-Pereira, José; Hacohen, Nir; Moita, Luis Ferreira; Goncalves, Joao

    2010-01-01

    HIV-1 is a complex retrovirus that uses host machinery to promote its replication. Understanding cellular proteins involved in the multistep process of HIV-1 infection may result in the discovery of more suitable and effective therapeutic targets. Kinases and phosphatases are a druggable class of proteins critically involved in regulation of signal pathways of eukaryotic cells. Here, we focused on the discovery of kinases and phosphatases that are essential for HIV-1 replication but dispensable for cell viability. We performed an iterative screen in Jurkat T-cells with a short-hairpin-RNA (shRNA) library highly enriched for human kinases and phosphatases. We identified 14 new proteins essential for HIV-1 replication that do not affect cell viability. These proteins have been described as involved in MAPK, JNK and ERK pathways, vesicular traffic and DNA repair. Moreover, we show that the proteins under study are important in an early step of HIV-1 infection before viral integration, whereas some of them affect viral transcription/translation. This study provides new insights into the complex interplay between HIV-1 and the host cell and opens new possibilities for antiviral strategies. PMID:20174665

  14. In Situ Hi-C Library Preparation for Plants to Study Their Three-Dimensional Chromatin Interactions on a Genome-Wide Scale.

    PubMed

    Liu, Chang

    2017-01-01

    The spatial organization of the genome in the nucleus is critical for many cellular processes. It has been broadly accepted that the packing of chromatin inside the nucleus is not random, but structured at several hierarchical levels. The Hi-C method combines Chromatin Conformation Capture and high-throughput sequencing, which allows interrogating genome-wide chromatin interactions. Depending on the sequencing depth, chromatin packing patterns derived from Hi-C experiments can be viewed on a chromosomal scale or at a local genic level. Here, I describe a protocol of plant in situ Hi-C library preparation, which covers procedures starting from tissue fixation to library amplification.

  15. A bacterial artificial chromosome library for the Australian saltwater crocodile (Crocodylus porosus) and its utilization in gene isolation and genome characterization

    PubMed Central

    2009-01-01

    Background: Crocodilians (Order Crocodylia) are an ancient vertebrate group of tremendous ecological, social, and evolutionary importance. They are the only extant reptilian members of Archosauria, a monophyletic group that also includes birds, dinosaurs, and pterosaurs. Consequently, crocodilian genomes represent a gateway through which the molecular evolution of avian lineages can be explored. To facilitate comparative genomics within Crocodylia and between crocodilians and other archosaurs, we have constructed a bacterial artificial chromosome (BAC) library for the Australian saltwater crocodile, Crocodylus porosus. This is the first BAC library for a crocodile and only the second BAC resource for a crocodilian. Results: The C. porosus BAC library consists of 101,760 individually archived clones stored in 384-well microtiter plates. NotI digestion of random clones indicates an average insert size of 102 kb. Based on a genome size estimate of 2778 Mb, the library affords 3.7-fold (3.7×) coverage of the C. porosus genome. To investigate the utility of the library in studying sequence distribution, probes derived from CR1a and CR1b, two crocodilian CR1-like retrotransposon subfamilies, were hybridized to C. porosus macroarrays. The results indicate that there are a minimum of 20,000 CR1a/b elements in C. porosus and that their distribution throughout the genome is decidedly non-random. To demonstrate the utility of the library in gene isolation, we probed the C. porosus macroarrays with an overgo designed from a C-mos (oocyte maturation factor) partial cDNA. A BAC containing C-mos was identified and the C-mos locus was sequenced. Nucleotide and amino acid sequence alignment of the C. porosus C-mos coding sequence with avian and reptilian C-mos orthologs reveals greater sequence similarity between C. porosus and birds (specifically chicken and zebra finch) than between C. porosus and squamates (green anole). Conclusion: We have demonstrated the utility of the
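
    The coverage figure quoted above follows directly from the clone count, mean insert size, and genome size estimate given in the abstract. The short sketch below reproduces that arithmetic. The second function applies the Clarke-Carbon formula for the probability that a given locus is present in at least one clone; that formula is not mentioned in the abstract and is added here as a standard complement, and both function names are hypothetical.

```python
def library_coverage(n_clones, mean_insert_kb, genome_mb):
    """Fold coverage = total cloned DNA / genome size (both converted to Mb)."""
    return n_clones * mean_insert_kb / 1000.0 / genome_mb


def prob_locus_represented(n_clones, mean_insert_kb, genome_mb):
    """Clarke-Carbon estimate: P = 1 - (1 - f)^N, with f = insert size / genome size."""
    f = mean_insert_kb / (genome_mb * 1000.0)
    return 1.0 - (1.0 - f) ** n_clones


# Values reported in the abstract: 101,760 clones, 102 kb inserts, 2778 Mb genome.
coverage = library_coverage(101_760, 102, 2778)
p = prob_locus_represented(101_760, 102, 2778)
print(f"coverage = {coverage:.2f}x, P(locus present) = {p:.3f}")
```

    The computed coverage rounds to the 3.7× stated by the authors; the probability estimate assumes random, unbiased cloning, which real libraries only approximate.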

  16. High-Affinity DNA Aptamer Generation Targeting von Willebrand Factor A1-Domain by Genetic Alphabet Expansion for Systematic Evolution of Ligands by Exponential Enrichment Using Two Types of Libraries Composed of Five Different Bases.

    PubMed

    Matsunaga, Ken-Ichiro; Kimoto, Michiko; Hirao, Ichiro

    2017-01-11

    The novel evolutionary engineering method ExSELEX (genetic alphabet expansion for systematic evolution of ligands by exponential enrichment) provides high-affinity DNA aptamers that specifically bind to target molecules, by introducing an artificial hydrophobic base analogue as a fifth component into DNA aptamers. Here, we present a newer version of ExSELEX, using a library with completely randomized sequences consisting of five components: four natural bases and one unnatural hydrophobic base, 7-(2-thienyl)imidazo[4,5-b]pyridine (Ds). In contrast to the limited number of Ds-containing sequence combinations in our previous library, the increased complexity of the new randomized library could improve the success rates of high-affinity aptamer generation. To this end, we developed a sequencing method for each clone in the enriched library after several rounds of selection. Using the improved library, we generated a Ds-containing DNA aptamer targeting von Willebrand factor A1-domain (vWF) with significantly higher affinity (KD = 75 pM), relative to those generated by the initial version of ExSELEX, as well as that of the known DNA aptamer consisting of only the natural bases. In addition, the Ds-containing DNA aptamer was stabilized by introducing a mini-hairpin DNA resistant to nucleases, without any loss of affinity (KD = 61 pM). This new version is expected to consistently produce high-affinity DNA aptamers.

  17. Enrichment and genome sequence of the group I.1a ammonia-oxidizing Archaeon "Ca. Nitrosotenuis uzonensis" representing a clade globally distributed in thermal habitats.

    PubMed

    Lebedeva, Elena V; Hatzenpichler, Roland; Pelletier, Eric; Schuster, Nathalie; Hauzmayer, Sandra; Bulaev, Aleksandr; Grigor'eva, Nadezhda V; Galushko, Alexander; Schmid, Markus; Palatinszky, Marton; Le Paslier, Denis; Daims, Holger; Wagner, Michael

    2013-01-01

    The discovery of ammonia-oxidizing archaea (AOA) of the phylum Thaumarchaeota and the high abundance of archaeal ammonia monooxygenase subunit A encoding gene sequences in many environments have extended our perception of nitrifying microbial communities. Moreover, AOA are the only aerobic ammonia oxidizers known to be active in geothermal environments. Molecular data indicate that in many globally distributed terrestrial high-temperature habitats a thaumarchaeotal lineage within the Nitrosopumilus cluster (also called "marine" group I.1a) thrives, but these microbes have neither been isolated from these systems nor functionally characterized in situ yet. In this study, we report on the enrichment and genomic characterization of a representative of this lineage from a thermal spring in Kamchatka. This thaumarchaeote, provisionally classified as "Candidatus Nitrosotenuis uzonensis", is a moderately thermophilic, non-halophilic, chemolithoautotrophic ammonia oxidizer. The nearly complete genome sequence (assembled into a single scaffold) of this AOA confirmed the presence of the typical thaumarchaeotal pathways for ammonia oxidation and carbon fixation, and indicated its ability to produce coenzyme F420 and to chemotactically react to its environment. Interestingly, like members of the genus Nitrosoarchaeum, "Candidatus N. uzonensis" also possesses a putative artubulin-encoding gene. Genome comparisons to related AOA with available genome sequences confirmed that the newly cultured AOA has an average nucleotide identity far below the species threshold and revealed a substantial degree of genomic plasticity with unique genomic regions in "Ca. N. uzonensis", which potentially include genetic determinants of ecological niche differentiation.

  18. Using genome-wide CRISPR library screening with library resistant DCK to find new sources of Ara-C drug resistance in AML

    PubMed Central

    Kurata, Morito; Rathe, Susan K.; Bailey, Natashay J.; Aumann, Natalie K.; Jones, Justine M.; Veldhuijzen, G. Willemijn; Moriarity, Branden S.; Largaespada, David A.

    2016-01-01

    Acute myeloid leukemia (AML) can display de novo or acquired resistance to cytosine arabinoside (Ara-C), a primary component of induction chemotherapy. To identify genes capable of independently imposing Ara-C resistance, we applied a genome-wide CRISPR library to human U937 cells and exposed them to Ara-C. Interestingly, all drug resistant clones contained guide RNAs for DCK. To avoid DCK gene modification, gRNA resistant DCK cDNA was created by the introduction of silent mutations. The CRISPR screening was repeated using the gRNA resistant DCK, and loss of SLC29A was identified as also being capable of conveying Ara-C drug resistance. To determine if loss of Dck results in increased sensitivity to other drugs, we conducted a screen of 446 FDA approved drugs using two Dck-defective BXH-2 derived murine AML cell lines and their Ara-C sensitive parental lines. Both cell lines showed an increase in sensitivity to prednisolone. Guide RNA resistant cDNA rescue was a legitimate strategy and multiple DCK or SLC29A deficient human cell clones were established with one clone becoming prednisolone sensitive. Dck-defective leukemic cells may become prednisolone sensitive indicating prednisolone may be an effective adjuvant therapy in some cases of DCK-negative AML. PMID:27808171

  19. Using genome-wide CRISPR library screening with library resistant DCK to find new sources of Ara-C drug resistance in AML.

    PubMed

    Kurata, Morito; Rathe, Susan K; Bailey, Natashay J; Aumann, Natalie K; Jones, Justine M; Veldhuijzen, G Willemijn; Moriarity, Branden S; Largaespada, David A

    2016-11-03

    Acute myeloid leukemia (AML) can display de novo or acquired resistance to cytosine arabinoside (Ara-C), a primary component of induction chemotherapy. To identify genes capable of independently imposing Ara-C resistance, we applied a genome-wide CRISPR library to human U937 cells and exposed them to Ara-C. Interestingly, all drug resistant clones contained guide RNAs for DCK. To avoid DCK gene modification, gRNA resistant DCK cDNA was created by the introduction of silent mutations. The CRISPR screening was repeated using the gRNA resistant DCK, and loss of SLC29A was identified as also being capable of conveying Ara-C drug resistance. To determine if loss of Dck results in increased sensitivity to other drugs, we conducted a screen of 446 FDA approved drugs using two Dck-defective BXH-2 derived murine AML cell lines and their Ara-C sensitive parental lines. Both cell lines showed an increase in sensitivity to prednisolone. Guide RNA resistant cDNA rescue was a legitimate strategy and multiple DCK or SLC29A deficient human cell clones were established with one clone becoming prednisolone sensitive. Dck-defective leukemic cells may become prednisolone sensitive indicating prednisolone may be an effective adjuvant therapy in some cases of DCK-negative AML.
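
    The idea of a guide-RNA-resistant cDNA made with silent mutations can be sketched in a few lines. The sequence, the target-site coordinates, and the function names below are hypothetical and are not taken from the study (the toy sequence is not DCK); a real design would also consider the PAM, codon usage, and splice signals.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence (length divisible by 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def make_guide_resistant(cds: str, site_start: int, site_end: int) -> str:
    """Swap every codon overlapping [site_start, site_end) for a synonym.

    Codons with no synonym (Met, Trp) are left unchanged.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for n, codon in enumerate(codons):
        lo, hi = 3 * n, 3 * n + 3
        if hi <= site_start or lo >= site_end:
            continue  # codon lies outside the hypothetical gRNA target site
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[codon] and c != codon]
        if synonyms:
            codons[n] = synonyms[0]
    return "".join(codons)


# Hypothetical toy coding sequence (M A E K L W) with a target site at 3..15.
cds = "ATGGCTGAAAAGCTGTGG"
resistant = make_guide_resistant(cds, 3, 15)
print(cds, "->", resistant)
print(translate(cds), "==", translate(resistant))
```

    The nucleotide sequence inside the target site changes, so the guide no longer matches perfectly, while the encoded protein stays the same.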

  20. Functional Classification, Genomic Organization, Putatively cis-Acting Regulatory Elements, and Relationship to Quantitative Trait Loci, of Sorghum Genes with Rhizome-Enriched Expression

    PubMed Central

    Jang, Cheol Seong; Kamps, Terry L.; Skinner, D. Neil; Schulze, Stefan R.; Vencill, William K.; Paterson, Andrew H.

    2006-01-01

    Rhizomes are organs of fundamental importance to plant competitiveness and invasiveness. We have identified genes expressed at substantially higher levels in rhizomes than other plant parts, and explored their functional categorization, genomic organization, regulatory motifs, and association with quantitative trait loci (QTLs) conferring rhizomatousness. The finding that genes with rhizome-enriched expression are distributed across a wide range of functional categories suggests some degree of specialization of individual members of many gene families in rhizomatous plants. A disproportionate share of genes with rhizome-enriched expression was implicated in secondary and hormone metabolism, and abiotic stimuli and development. A high frequency of unknown-function genes reflects our still limited knowledge of this plant organ. A putative oligosaccharyl transferase showed the highest degree of rhizome-specific expression, with several transcriptional or regulatory protein complex factors also showing high (but lesser) degrees of specificity. Inferred by the upstream sequences of their putative rice (Oryza sativa) homologs, sorghum (Sorghum bicolor) genes that were relatively highly expressed in rhizome tip tissues were enriched for cis-element motifs, including the pyrimidine box, TATCCA box, and CAREs box, implicating the gibberellins in regulation of many rhizome-specific genes. From cDNA clones showing rhizome-enriched expression, expressed sequence tags forming 455 contigs were plotted on the rice genome and aligned to QTL likelihood intervals for ratooning and rhizomatous traits in rice and sorghum. Highly expressed rhizome genes were somewhat enriched in QTL likelihood intervals for rhizomatousness or ratooning, with specific candidates including some of the most rhizome-specific genes. 
Some rhizomatousness and ratooning QTLs were shown to be potentially related to one another as a result of ancient duplication, suggesting long-term functional conservation of

  1. Gene-enriched draft genome of the cattle tick Rhipicephalus microplus: Assembly by the hybrid Pacific Biosciences/Illumina approach enabled analysis of the highly repetitive genome

    USDA-ARS's Scientific Manuscript database

    The genome of the cattle tick R. microplus, an ectoparasite with global distribution, is estimated to be 7.1 Gbp and consists of ~70% repetitive DNA. We report the first assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genom...

  2. Genotyping of whole genome amplified reduced representation libraries reveals a cryptic population of Culicoides brevitarsis in the Northern Territory, Australia.

    PubMed

    Onyango, Maria G; Aitken, Nicola C; Jack, Cameron; Chuah, Aaron; Oguya, James; Djikeng, Appolinaire; Kemp, Steve; Bellis, Glenn A; Nicholas, Adrian; Walker, Peter J; Duchemin, Jean-Bernard

    2016-09-30

    The advent of genotyping by Next Generation Sequencing has enabled rapid discovery of thousands of single nucleotide polymorphism (SNP) markers and high throughput genotyping of large populations at an affordable cost. Genotyping by sequencing (GBS), a reduced representation library sequencing method, allows highly multiplexed sequencing of genomic subsets. This method has limitations for small organisms with low amounts of genomic DNA, such as the bluetongue virus (BTV) vectors, Culicoides midges. This study employed the GBS method to isolate SNP markers de novo from whole genome amplified Culicoides brevitarsis genomic DNA. The individuals were collected from regions representing two different Australian patterns of BTV strain distribution: the Northern Territory (NT) and the east coast. We isolated 8145 SNPs using GBS. Phylogenetic analysis conducted using the filtered 3263 SNPs revealed the presence of a distinct C. brevitarsis sub-population in the NT and this was confirmed by analysis of mitochondrial DNA. Two loci showed a very strong signal for selection and were unique to the NT population. Bayesian analysis with STRUCTURE indicated a possible two-population cluster. The results suggest that genotyping vectors with high density markers in combination with biological and environmental data is useful. However, more extensive sampling over a wider spatial and temporal range is needed. The presence of sub-structure in populations and loci under natural selection indicates the need for further investigation of the role of vectors in shaping the two Australian systems of BTV transmission. The described workflow is transferable to genotyping of small, non-model organisms, including arthropod vectors of pathogens of economic and medical importance.

  3. Mitogenome assembly from genomic multiplex libraries: comparison of strategies and novel mitogenomes for five species of frogs.

    PubMed

    Machado, D J; Lyra, M L; Grant, T

    2016-05-01

    Next-generation sequencing continues to revolutionize biodiversity studies by generating unprecedented amounts of DNA sequence data for comparative genomic analysis. However, these data are produced as millions or billions of short reads of variable quality that cannot be directly applied in comparative analyses, creating a demand for methods to facilitate assembly. We optimized an in silico strategy to efficiently reconstruct high-quality mitochondrial genomes directly from genomic reads. We tested this strategy using sequences from five species of frogs: Hylodes meridionalis (Hylodidae), Hyloxalus yasuni (Dendrobatidae), Pristimantis fenestratus (Craugastoridae), and Melanophryniscus simplex and Rhinella sp. (Bufonidae). These are the first mitogenomes published for these species, the genera Hylodes, Hyloxalus, Pristimantis, Melanophryniscus and Rhinella, and the families Craugastoridae and Hylodidae. Sequences were generated using only half of one lane of a standard Illumina HiSeq 2000 flow cell, resulting in fewer than eight million reads. We analysed the reads of Hylodes meridionalis using three different assembly strategies: (1) reference-based (using bowtie2); (2) de novo (using abyss, soapdenovo2 and velvet); and (3) baiting and iterative mapping (using mira and mitobim). Mitogenomes were assembled exclusively with strategy 3, which we employed to assemble the remaining mitogenomes. Annotations were performed with mitos and confirmed by comparison with published amphibian mitochondria. In most cases, we recovered all 13 coding genes, 22 tRNAs, and two ribosomal subunit genes, with minor gene rearrangements. Our results show that few raw reads can be sufficient to generate high-quality scaffolds, making any Illumina machine run using genomic multiplex libraries a potential source of data for organelle assemblies as by-catch. © 2015 John Wiley & Sons Ltd.

  4. An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

    PubMed Central

    2012-01-01

    Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a source for genomics strategies to use in order to understand the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps represent essential genomic screening tools for localizing quantitative trait loci (QTL) and identifying candidate genes through comparative mapping. This information is the backbone to develop marker-assisted selection (MAS) programs in aquaculture. Expressed sequence tag (EST) resources have largely increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will turn out to be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 EST-linked, clustered in 24 linkage groups. Linkage and comparative genomics evidence suggested additional linkage group fusions toward the consolidation of turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although largely variable among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and helped assign unallocated markers to specific turbot linkage groups. Conclusions The new gene-enriched high-resolution turbot map represents a

  5. KENeV: A web-application for the automated reconstruction and visualization of the enriched metabolic and signaling super-pathways deriving from genomic experiments

    PubMed Central

    Pilalis, Eleftherios; Koutsandreas, Theodoros; Valavanis, Ioannis; Athanasiadis, Emmanouil; Spyrou, George; Chatziioannou, Aristotelis

    2015-01-01

    Gene expression analysis, using high throughput genomic technologies, has become an indispensable step for the meaningful interpretation of the underlying molecular complexity, which shapes the phenotypic manifestation of the investigated biological mechanism. The modularity of the cellular response to different experimental conditions can be comprehended through the exploitation of molecular pathway databases, which offer a controlled, curated background for statistical enrichment analysis. Existing tools enable pathway analysis, visualization, or pathway merging, but none integrates a fully automated workflow that combines all the above-mentioned modules and is intended for non-programmer users. We introduce an online web application, named KEGG Enriched Network Visualizer (KENeV), which enables a fully automated workflow starting from a list of differentially expressed genes and deriving the enriched KEGG metabolic and signaling pathways, merged into two respective, non-redundant super-networks. The final networks can be downloaded as SBML files, for further analysis, or instantly visualized through an interactive visualization module. In conclusion, KENeV (available online at http://www.grissom.gr/kenev) provides an integrative tool, suitable for users with no programming experience, for the functional interpretation, at both the metabolic and signaling level, of differentially expressed gene subsets deriving from genomic experiments. PMID:26925206
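
    Statistical enrichment of a pathway in a gene list is commonly assessed with a one-sided hypergeometric test. The sketch below illustrates that general technique only; the abstract does not state which test KENeV applies, and the function name and example counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(N: int, K: int, n: int, k: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the pathway,
    n: genes in the submitted list, k: submitted genes annotated to the pathway.
    """
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# 500 differentially expressed genes, 10 of which fall in the pathway.
p = enrichment_pvalue(N=20_000, K=100, n=500, k=10)
print(f"p = {p:.3g}")
```

    In practice the p-values obtained for all tested pathways would then be corrected for multiple testing.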

  6. Toward functional genomics in bacteria: Analysis of gene expression in Escherichia coli from a bacterial artificial chromosome library of Bacillus cereus

    PubMed Central

    Rondon, Michelle R.; Raffel, Sandra J.; Goodman, Robert M.; Handelsman, Jo

    1999-01-01

    As the study of microbes moves into the era of functional genomics, there is an increasing need for molecular tools for analysis of a wide diversity of microorganisms. Currently, biological study of many prokaryotes of agricultural, medical, and fundamental scientific interest is limited by the lack of adequate genetic tools. We report the application of the bacterial artificial chromosome (BAC) vector to prokaryotic biology as a powerful approach to address this need. We constructed a BAC library in Escherichia coli from genomic DNA of the Gram-positive bacterium Bacillus cereus. This library provides 5.75-fold coverage of the B. cereus genome, with an average insert size of 98 kb. To determine the extent of heterologous expression of B. cereus genes in the library, we screened it for expression of several B. cereus activities in the E. coli host. Clones expressing 6 of 10 activities tested were identified in the library, namely, ampicillin resistance, zwittermicin A resistance, esculin hydrolysis, hemolysis, orange pigment production, and lecithinase activity. We analyzed selected BAC clones genetically to identify rapidly specific B. cereus loci. These results suggest that BAC libraries will provide a powerful approach for studying gene expression from diverse prokaryotes. PMID:10339608
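
    A fold-coverage figure can be turned into the approximate chance that any given locus is present in the library. The sketch below uses the usual Poisson approximation and assumes randomly distributed inserts that are small relative to the genome; the abstract itself reports only the coverage, not this probability, and the function name is hypothetical.

```python
import math


def prob_locus_represented(fold_coverage: float) -> float:
    """Poisson approximation: P(locus is in at least one clone) = 1 - exp(-c)."""
    return 1.0 - math.exp(-fold_coverage)


p = prob_locus_represented(5.75)  # 5.75-fold coverage, as reported above
print(f"{p:.4f}")  # 0.9968
```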

  7. Final report. Human artificial episomal chromosome (HAEC) for building large genomic libraries

    SciTech Connect

    Jean-Michael H. Vos

    1999-12-09

    Collections of human DNA fragments are maintained for research purposes as clones in bacterial host cells. However, for unknown reasons, some regions of the human genome appear to be unclonable or unstable in bacteria. Their team has developed a system using episomes (extrachromosomal, autonomously replicating DNA) that maintains large DNA fragments in human cells. This human artificial episomal chromosomal (HAEC) system may prove useful for coverage of these especially difficult regions. In the broader biomedical community, the HAEC system also shows promise for use in functional genomics and gene therapy. Recent improvements to the HAEC system and its application to mapping, sequencing, and functionally studying human and mouse DNA are summarized. Mapping and sequencing the human genome and model organisms are only the first steps in determining the function of various genetic units critical for gene regulation, DNA replication, chromatin packaging, chromosomal stability, and chromatid segregation. Such studies will require the ability to transfer and manipulate entire functional units into mammalian cells.

  8. All SNPs Are Not Created Equal: Genome-Wide Association Studies Reveal a Consistent Pattern of Enrichment among Functionally Annotated SNPs

    PubMed Central

    Schork, Andrew J.; Thompson, Wesley K.; Pham, Phillip; Torkamani, Ali; Roddey, J. Cooper; Sullivan, Patrick F.; Kelsoe, John R.; O'Donovan, Michael C.; Furberg, Helena; Schork, Nicholas J.; Andreassen, Ole A.; Dale, Anders M.

    2013-01-01

    Recent results indicate that genome-wide association studies (GWAS) have the potential to explain much of the heritability of common complex phenotypes, but methods are lacking to reliably identify the remaining associated single nucleotide polymorphisms (SNPs). We applied stratified False Discovery Rate (sFDR) methods to leverage genic enrichment in GWAS summary statistics data to uncover new loci likely to replicate in independent samples. Specifically, we use linkage disequilibrium-weighted annotations for each SNP in combination with nominal p-values to estimate the True Discovery Rate (TDR = 1−FDR) for strata determined by different genic categories. We show a consistent pattern of enrichment of polygenic effects in specific annotation categories across diverse phenotypes, with the greatest enrichment for SNPs tagging regulatory and coding genic elements, little enrichment in introns, and negative enrichment for intergenic SNPs. Stratified enrichment directly leads to increased TDR for a given p-value, mirrored by increased replication rates in independent samples. We show this in independent Crohn's disease GWAS, where we find a hundredfold variation in replication rate across genic categories. Applying a well-established sFDR methodology we demonstrate the utility of stratification for improving power of GWAS in complex phenotypes, with increased rejection rates from 20% in height to 300% in schizophrenia with traditional FDR and sFDR both fixed at 0.05. Our analyses demonstrate an inherent stratification among GWAS SNPs with important conceptual implications that can be leveraged by statistical methods to improve the discovery of loci. PMID:23637621
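
    The effect of stratification on discovery can be illustrated with a much simpler stand-in for the study's method: running the Benjamini-Hochberg procedure separately within each annotation stratum. This is not the LD-weighted sFDR approach described above, and the p-values and stratum labels are invented for illustration.

```python
from collections import defaultdict


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns one boolean per p-value."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0  # largest rank k with p_(k) <= (k / m) * alpha
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def stratified_bh(pvalues, strata, alpha=0.05):
    """Apply Benjamini-Hochberg independently within each stratum."""
    groups = defaultdict(list)
    for i, label in enumerate(strata):
        groups[label].append(i)
    rejected = [False] * len(pvalues)
    for idxs in groups.values():
        flags = bh_reject([pvalues[i] for i in idxs], alpha)
        for i, flag in zip(idxs, flags):
            rejected[i] = flag
    return rejected


# Invented p-values: three SNPs in an enriched stratum, three in a depleted one.
pvals = [0.001, 0.004, 0.03, 0.2, 0.5, 0.9]
strata = ["coding", "coding", "coding", "intergenic", "intergenic", "intergenic"]
print(sum(bh_reject(pvals)))              # 2 rejections when pooled
print(sum(stratified_bh(pvals, strata)))  # 3 rejections when stratified
```

    With the same nominal threshold, the enriched stratum yields an extra rejection, which is the intuition behind the increased rejection rates reported in the abstract.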

  9. Construction of genome-wide physical BAC contigs using mapped cDNA as probes: Toward an integrated BAC library resource for genome sequencing and analysis. Annual report, July 1995--January 1997

    SciTech Connect

    Mitchell, S.C.; Bocskai, D.; Cao, Y.

    1997-12-31

    The goal of the Human Genome Project is to characterize and sequence the entire genomes of human and several model organisms, thus providing complete sets of information on the entire structure of transcribed, regulatory and other functional regions for these organisms. In the past years, a number of useful genetic and physical markers on human and mouse genomes have been made available along with the advent of BAC library resources for these organisms. The advances in technology and resource development made it feasible to efficiently construct genome-wide physical BAC contigs for human and other genomes. Currently, over 30,000 mapped STSs and 27,000 mapped Unigenes are available for human genome mapping. ESTs and cDNAs are excellent resources for building contig maps for two reasons. First, they exist in two alternative forms--as both sequence information for PCR primer pairs, and cDNA clones, which can be used to screen genomic libraries efficiently for large numbers of DNA probes by combining over 100 cDNA probes in each hybridization. Second, the linkage and order of genes are rather conserved among human, mouse and other model organisms. Therefore, gene markers have advantages over random anonymous STSs in building maps for comparative genomic studies.

  10. Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

    PubMed Central

    2012-01-01

    Background The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's genbank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from other mammalian genomes. Conclusions The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742

  11. Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

    PubMed

    Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

    2014-11-20

    Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria.

  12. A genomic analysis of Histomonas meleagridis through sequencing of a cDNA library.

    PubMed

    Klodnicki, M E; McDougald, L R; Beckstead, R B

    2013-04-01

    Histomonas meleagridis, a flagellated protozoan of the Order Trichomonadida, is the causative agent of blackhead disease in gallinaceous birds. Few genes have been identified in this organism; thus, little is known regarding the molecular basis for its metabolism, virulence, and antigenicity. To identify new genes, a cDNA library derived from a lab strain of H. meleagridis was sequenced and annotated. Data obtained from these experiments identified 3,425 H. meleagridis genes. Analysis of the data allowed the identification of 81 genes coding for putative hydrogenosomal proteins and was used to determine the codon usage frequency. Sequence information also identified bacteria that are cultured with H. meleagridis. Future analysis of these data should provide valuable molecular insights into H. meleagridis and provide the platform for molecular studies aimed at understanding the pathogenesis of blackhead disease.

  13. Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae).

    PubMed

    Schmickl, Roswitha; Liston, Aaron; Zeisek, Vojtěch; Oberlander, Kenneth; Weitemier, Kevin; Straub, Shannon C K; Cronn, Richard C; Dreyer, Léanne L; Suda, Jan

    2016-09-01

    Phylogenetics benefits from using a large number of putatively independent nuclear loci and their combination with other sources of information, such as the plastid and mitochondrial genomes. To facilitate the selection of orthologous low-copy nuclear (LCN) loci for phylogenetics in nonmodel organisms, we created an automated and interactive script to select hundreds of LCN loci by a comparison between transcriptome and genome skim data. We used our script to obtain LCN genes for southern African Oxalis (Oxalidaceae), a speciose plant lineage in the Greater Cape Floristic Region. This resulted in 1164 LCN genes greater than 600 bp. Using target enrichment combined with genome skimming (Hyb-Seq), we obtained on average 1141 LCN loci, nearly the whole plastid genome and the nrDNA cistron from 23 southern African Oxalis species. Despite a wide range of gene trees, the phylogeny based on the LCN genes was very robust, as retrieved through various gene and species tree reconstruction methods as well as concatenation. Cytonuclear discordance was strong. This indicates that organellar phylogenies alone are unlikely to represent the species tree and stresses the utility of Hyb-Seq in phylogenetics.

  14. Enrichment and Genome Sequence of the Group I.1a Ammonia-Oxidizing Archaeon “Ca. Nitrosotenuis uzonensis” Representing a Clade Globally Distributed in Thermal Habitats

    PubMed Central

    Pelletier, Eric; Schuster, Nathalie; Hauzmayer, Sandra; Bulaev, Aleksandr; Grigor’eva, Nadezhda V.; Galushko, Alexander; Schmid, Markus; Palatinszky, Marton; Le Paslier, Denis; Daims, Holger; Wagner, Michael

    2013-01-01

    The discovery of ammonia-oxidizing archaea (AOA) of the phylum Thaumarchaeota and the high abundance of archaeal ammonia monooxygenase subunit A encoding gene sequences in many environments have extended our perception of nitrifying microbial communities. Moreover, AOA are the only aerobic ammonia oxidizers known to be active in geothermal environments. Molecular data indicate that in many globally distributed terrestrial high-temperature habitats a thaumarchaeotal lineage within the Nitrosopumilus cluster (also called “marine” group I.1a) thrives, but these microbes have neither been isolated from these systems nor functionally characterized in situ yet. In this study, we report on the enrichment and genomic characterization of a representative of this lineage from a thermal spring in Kamchatka. This thaumarchaeote, provisionally classified as “Candidatus Nitrosotenuis uzonensis”, is a moderately thermophilic, non-halophilic, chemolithoautotrophic ammonia oxidizer. The nearly complete genome sequence (assembled into a single scaffold) of this AOA confirmed the presence of the typical thaumarchaeotal pathways for ammonia oxidation and carbon fixation, and indicated its ability to produce coenzyme F420 and to chemotactically react to its environment. Interestingly, like members of the genus Nitrosoarchaeum, “Candidatus N. uzonensis” also possesses a putative artubulin-encoding gene. Genome comparisons to related AOA with available genome sequences confirmed that the newly cultured AOA has an average nucleotide identity far below the species threshold and revealed a substantial degree of genomic plasticity with unique genomic regions in “Ca. N. uzonensis”, which potentially include genetic determinants of ecological niche differentiation. PMID:24278328

  15. Genome Clone Libraries and Data from the Integrated Molecular Analysis of Genomes and their Expression (I.M.A.G.E.) Consortium

    DOE Data Explorer

    The I.M.A.G.E. Consortium was initiated in 1993 by four academic groups on a collaborative basis after informal discussions led to a common vision of how to achieve an important goal in the study of the human genome: the integrated molecular analysis of genomes and their expression. The Consortium's primary goal is to create arrayed cDNA libraries and associated bioinformatics tools, and make them publicly available to the research community. The primary organisms of interest include intensively studied mammalian species, including human, mouse, rat and non-human primate species. The Consortium has also focused on several commonly studied model organisms; as part of this effort it has arrayed cDNAs from zebrafish and Fugu (pufferfish) as well as Xenopus laevis and X. tropicalis (frog). Utilizing high speed robotics, over nine million individual cDNA clones have been arrayed into 384-well microtiter plates, and sufficient replicas have been created to distribute copies both to sequencing centers and to a network of five distributors located worldwide. The I.M.A.G.E. Consortium represents the world's largest public cDNA collection, and works closely with the National Institutes of Health's Mammalian Gene Collection (MGC) to help it achieve its goal of creating a full-length cDNA clone for every human and mouse gene. I.M.A.G.E. is also a member of the ORFeome Collaboration, working to generate a complete set of expression-ready open reading frame clones representing each human gene. Custom informatics tools have been developed in support of these projects to better allow the research community to select clones of interest and track and collect all data deposited into public databases about those clones and their related sequences. I.M.A.G.E. clones are publicly available, free of any royalties, and may be used by anyone agreeing with the Consortium's guidelines.

  16. From Human Monocytes to Genome-Wide Binding Sites - A Protocol for Small Amounts of Blood: Monocyte Isolation/ChIP-Protocol/Library Amplification/Genome Wide Computational Data Analysis

    PubMed Central

    Weiterer, Sebastian; Uhle, Florian; Bhuju, Sabin; Jarek, Michael; Weigand, Markus A.; Bartkuhn, Marek

    2014-01-01

    Chromatin immunoprecipitation in combination with a genome-wide analysis via high-throughput sequencing is the state-of-the-art method to gain genome-wide representation of histone modification or transcription factor binding profiles. However, chromatin immunoprecipitation analysis in the context of human experimental samples is limited, especially in the case of blood cells. The typically extremely low yields of precipitated DNA are usually not compatible with library amplification for next generation sequencing. We developed a highly reproducible protocol that provides a guideline from the first step of isolating monocytes from a blood sample to analysing the distribution of histone modifications in a genome-wide manner. Conclusion: The protocol describes the whole workflow from isolating monocytes from human blood samples followed by a high-sensitivity and small-scale chromatin immunoprecipitation assay with guidance for generating libraries compatible with next generation sequencing from small amounts of immunoprecipitated DNA. PMID:24732314

  17. A HindIII BAC library construction of Mesobuthus martensii Karsch (Scorpiones:Buthidae): an important genetic resource for comparative genomics and phylogenetic analysis.

    PubMed

    Li, Songryong; Ma, Yibao; Jang, Shenghun; Wu, Yingliang; Liu, Hui; Cao, Zhijian; Li, Wenxin

    2009-12-01

    Scorpions are "living but sophisticated fossils" that have changed little in their morphology since their first appearance over 450 million years ago. To provide a genetic resource for understanding the evolution of the scorpion genome and the relationships between scorpions and other organisms, we first determined the genome size of the scorpion Mesobuthus martensii Karsch (about 600 Mbp) in the order Scorpiones and constructed a HindIII BAC library of the male scorpion M. martensii Karsch from China. The BAC library consists of a total of 46,080 clones with an average insert size of 100 kb, providing a 7.7-fold coverage of the scorpion haploid genome size of 600 Mbp as revealed in this study. High-density colony hybridization-based library screening was performed using the 18S-5.8S-28S rRNA gene, which is one of the most commonly used phylogenetic markers. Both library screening and PCR identification results revealed six positive BAC clones that overlapped and formed a contig of approximately 120 kb covering the rDNA. BAC DNA sequencing analysis determined the complete sequence of the M. martensii Karsch rDNA unit, which has a total length of 8779 bp, including 1813 bp 18S rDNA, 157 bp 5.8S rDNA, 3823 bp 28S rDNA, 530 bp ETS, 2168 bp ITS1 and 288 bp ITS2. Interestingly, some tandem repeats are present in the rRNA intergenic sequence (IGS) and ITS1/2 regions. These results demonstrated that the BAC library of the scorpion M. martensii Karsch and the complete sequence of the rDNA unit will provide important genetic resources and tools for comparative genomics and phylogenetic analysis.
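
    The coverage figure quoted in this record can be checked with the usual estimate for clone libraries: number of clones times mean insert size, divided by haploid genome size. The sketch below is illustrative only; the function name fold_coverage is an assumption made for this example and does not come from the paper.

```python
def fold_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Estimate library coverage in haploid genome equivalents."""
    return n_clones * mean_insert_bp / genome_bp


# Values quoted in the abstract: 46,080 clones, 100 kb inserts, 600 Mbp genome.
coverage = fold_coverage(46_080, 100e3, 600e6)
print(f"{coverage:.1f}-fold")  # 7.7-fold
```

    The result, 7.68, rounds to the 7.7-fold coverage reported in the abstract.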

  18. Construction of a California condor BAC library and first-generation chicken-condor comparative physical map as an endangered species conservation genomics resource.

    PubMed

    Romanov, Michael N; Koriabine, Maxim; Nefedov, Mikhail; de Jong, Pieter J; Ryder, Oliver A

    2006-12-01

    To support genomic analysis of the endangered California condor (Gymnogyps californianus), a BAC library (CHORI-262) was generated using DNA from the blood of a female. The library consists of 89,665 recombinant BAC clones providing approximately 14-fold coverage of the presumed approximately 1.48-Gb genome. Taking advantage of recent progress in chicken genomics, we developed a first-generation comparative chicken-condor physical map using an overgo hybridization approach. The overgos were derived from chicken (164 probes) and New World vulture (8 probes) sequences. Screening a 2.8x subset of the total library resulted in 236 BAC-gene assignments with 2.5 positive BAC clones per successful probe. A preliminary comparative chicken-condor BAC-based map included 93 genes. Comparison of selected condor BAC sequences with orthologous chicken sequences suggested a high degree of conserved synteny between the two avian genomes. This work will aid in identification and characterization of candidate loci for the chondrodystrophy mutation to advance genetic management of this disease.

  19. Draft Genome Sequence of Ruminoclostridium sp. Ne3, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus

    PubMed Central

    Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J.

    2015-01-01

    The draft genome sequence of Ruminoclostridium sp. Ne3 was reconstructed from the metagenome of a hydrogenogenic microbial consortium growing on xylan. The organism is likely the primary hemicellulose degrader within the consortium. PMID:25908130

  20. Draft Genome Sequence of Ruminoclostridium sp. Ne3, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus.

    PubMed

    Wang, Han; Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J

    2015-04-23

    The draft genome sequence of Ruminoclostridium sp. Ne3 was reconstructed from the metagenome of a hydrogenogenic microbial consortium growing on xylan. The organism is likely the primary hemicellulose degrader within the consortium. Copyright © 2015 Wang et al.

  1. The MICHR Genomic DNA BioLibrary: An Empirical Study of the Ethics of Biorepository Development

    PubMed Central

    Roessler, Blake J.; Steneck, Nicholas H.; Powell, Lisa

    2015-01-01

    In this article, we report on an effort to study the development and usefulness of a large, broad-use, opt-in biorepository for genomic research, focusing on three ethical issues: providing appropriate understanding, recruiting in ways that do not compromise autonomous decisions, and assessing costs vs. benefits. We conclude: 1) Understanding can be improved by separating the task of informing subjects from documenting informed consent (Common Rule) and permission to use personal health information and samples for research (HIPAA); however, regulations might have to be changed to accommodate this approach. 2) Changing recruiting methods increases efficiency but can interfere with subject autonomy. 3) Finally, we propose a framework for the objective evaluation of the utility of biorepositories and suggest that more attention needs to be paid to use and sustainability. PMID:25742665

  2. Mining New Crystal Protein Genes from Bacillus thuringiensis on the Basis of Mixed Plasmid-Enriched Genome Sequencing and a Computational Pipeline

    PubMed Central

    Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang

    2012-01-01

    We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes. PMID:22544259

  3. Genome-Wide Anaplasma phagocytophilum AnkA-DNA Interactions Are Enriched in Intergenic Regions and Gene Promoters and Correlate with Infection-Induced Differential Gene Expression

    PubMed Central

    Dumler, J. Stephen; Sinclair, Sara H.; Pappas-Brown, Valeria; Shetty, Amol C.

    2016-01-01

    Anaplasma phagocytophilum, an obligate intracellular prokaryote, infects neutrophils, and alters cardinal functions via reprogrammed transcription. Large contiguous regions of neutrophil chromosomes are differentially expressed during infection. Secreted A. phagocytophilum effector AnkA transits into the neutrophil or granulocyte nucleus to complex with DNA in heterochromatin across all chromosomes. AnkA binds to gene promoters to dampen cis-transcription and also has features of matrix attachment region (MAR)-binding proteins that regulate three-dimensional chromatin architecture and coordinate transcriptional programs encoded in topologically-associated chromatin domains. We hypothesize that identification of additional AnkA binding sites will better delineate how A. phagocytophilum infection results in reprogramming of the neutrophil genome. Using AnkA-binding ChIP-seq, we showed that AnkA binds broadly throughout all chromosomes in a reproducible pattern, especially at: (i) intergenic regions predicted to be MARs; (ii) within predicted lamina-associated domains; and (iii) at promoters ≤ 3000 bp upstream of transcriptional start sites. These findings provide genome-wide support for AnkA as a regulator of cis-gene transcription. Moreover, the dominant mark of AnkA in distal intergenic regions known to be AT-enriched, coupled with frequent enrichment in the nuclear lamina, provides strong support for its role as a MAR-binding protein and genome “re-organizer.” AnkA must be considered a prime candidate to promote neutrophil reprogramming and subsequent functional changes that belie improved microbial fitness and pathogenicity. PMID:27703927

  4. Epigenetic Patterns in Blood Associated With Lipid Traits Predict Incident Coronary Heart Disease Events and Are Enriched for Results From Genome-Wide Association Studies

    PubMed Central

    Hedman, Åsa K.; Mendelson, Michael M.; Marioni, Riccardo E.; Gustafsson, Stefan; Joehanes, Roby; Irvin, Marguerite R.; Zhi, Degui; Sandling, Johanna K.; Yao, Chen; Liu, Chunyu; Liang, Liming; Huan, Tianxiao; McRae, Allan F.; Demissie, Serkalem; Shah, Sonia; Starr, John M.; Cupples, L. Adrienne; Deloukas, Panos; Spector, Timothy D.; Sundström, Johan; Krauss, Ronald M.; Arnett, Donna K.; Deary, Ian J.; Lind, Lars; Levy, Daniel

    2017-01-01

    Background— Genome-wide association studies have identified loci influencing circulating lipid concentrations in humans; further information on novel contributing genes, pathways, and biology may be gained through studies of epigenetic modifications. Methods and Results— To identify epigenetic changes associated with lipid concentrations, we assayed genome-wide DNA methylation at cytosine–guanine dinucleotides (CpGs) in whole blood from 2306 individuals from 2 population-based cohorts, with replication of findings in 2025 additional individuals. We identified 193 CpGs associated with lipid levels in the discovery stage (P<1.08E-07) and replicated 33 (at Bonferroni-corrected P<0.05), including 25 novel CpGs not previously associated with lipids. Genes at lipid-associated CpGs were enriched in lipid and amino acid metabolism processes. A differentially methylated locus associated with triglycerides and high-density lipoprotein cholesterol (HDL-C; cg27243685; P=8.1E-26 and 9.3E-19) was associated with cis-expression of a reverse cholesterol transporter (ABCG1; P=7.2E-28) and incident cardiovascular disease events (hazard ratio per SD increment, 1.38; 95% confidence interval, 1.15–1.66; P=0.0007). We found significant cis-methylation quantitative trait loci at 64% of the 193 CpGs with an enrichment of signals from genome-wide association studies of lipid levels (PTC=0.004, PHDL-C=0.008 and Ptriglycerides=0.00003) and coronary heart disease (P=0.0007). For example, genome-wide significant variants associated with low-density lipoprotein cholesterol and coronary heart disease at APOB were cis-methylation quantitative trait loci for a low-density lipoprotein cholesterol–related differentially methylated locus. Conclusions— We report novel associations of DNA methylation with lipid levels, describe epigenetic mechanisms related to previous genome-wide association studies discoveries, and provide evidence implicating epigenetic regulation of reverse cholesterol

  5. Whole Genome Duplication and Enrichment of Metal Cation Transporters Revealed by De Novo Genome Sequencing of Extremely Halotolerant Black Yeast Hortaea werneckii

    PubMed Central

    Jackman, Shaun; Turk, Martina; Sadowski, Ivan; Nislow, Corey; Jones, Steven; Birol, Inanc; Cimerman, Nina Gunde; Plemenitaš, Ana

    2013-01-01

    Hortaea werneckii, an ascomycetous yeast from the order Capnodiales, shows an exceptional adaptability to osmotically stressful conditions. To investigate this unusual phenotype we obtained a draft genomic sequence of a H. werneckii strain isolated from hypersaline water of a solar saltern. Two of its most striking characteristics that may be associated with a halotolerant lifestyle are the large genetic redundancy and the expansion of genes encoding metal cation transporters. Although no sexual state of H. werneckii has yet been described, a mating locus with characteristics of heterothallic fungi was found. The total assembly size of the genome is 51.6 Mb, larger than most phylogenetically related fungi, coding for almost twice the usual number of predicted genes (23333). The genome appears to have experienced a relatively recent whole genome duplication, and contains two highly identical gene copies of almost every protein. This is consistent with some previous studies that reported increases in genomic DNA content triggered by exposure to salt stress. In hypersaline conditions transmembrane ion transport is of utmost importance. The analysis of predicted metal cation transporters showed that most types of transporters experienced several gene duplications at various points during their evolution. Consequently they are present in much higher numbers than expected. The resulting diversity of transporters presents interesting biotechnological opportunities for improvement of halotolerance of salt-sensitive species. The involvement of plasma P-type H+ ATPases in adaptation to different concentrations of salt was indicated by their salt dependent transcription. This was not the case with vacuolar H+ ATPases, which were transcribed constitutively. The availability of this genomic sequence is expected to promote the research of H. werneckii. Studying its extreme halotolerance will not only contribute to our understanding of life in hypersaline environments, but should also

  6. Construction and characterization of two BAC libraries representing a deep-coverage of the genome of chicory (Cichorium intybus L., Asteraceae)

    PubMed Central

    2010-01-01

    Background: The Asteraceae represents an important plant family with respect to the numbers of species present in the wild and used by man. Nonetheless, genomic resources for Asteraceae species are relatively underdeveloped, hampering within species genetic studies as well as comparative genomics studies at the family level. So far, six BAC libraries have been described for the main crops of the family, i.e. lettuce and sunflower. Here we present the characterization of BAC libraries of chicory (Cichorium intybus L.) constructed from two genotypes differing in traits related to sexual and vegetative reproduction. Resolving the molecular mechanisms underlying traits controlling the reproductive system of chicory is a key determinant for hybrid development, and more generally will provide new insights into these traits, which are poorly investigated so far at the molecular level in Asteraceae. Findings: Two bacterial artificial chromosome (BAC) libraries, CinS2S2 and CinS1S4, were constructed from HindIII-digested high molecular weight DNA of the contrasting genotypes C15 and C30.01, respectively. C15 was hermaphrodite, non-embryogenic, and S2S2 for the S-locus implicated in self-incompatibility, whereas C30.01 was male sterile, embryogenic, and S1S4. The CinS2S2 and CinS1S4 libraries contain 89,088 and 81,408 clones. Mean insert sizes of the CinS2S2 and CinS1S4 clones are 90 and 120 kb, respectively, and provide together a coverage of 12.3 haploid genome equivalents. Contamination with mitochondrial and chloroplast DNA sequences was evaluated with four mitochondrial and four chloroplast specific probes, and was estimated to be 0.024% and 1.00% for the CinS2S2 library, and 0.028% and 2.35% for the CinS1S4 library. Using two single copy genes putatively implicated in somatic embryogenesis, screening of both libraries resulted in detection of 12 and 13 positive clones for each gene, in accordance with expected numbers. Conclusions: This indicated that both BAC libraries
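
    The statement that 12 and 13 positive clones were "in accordance with expected numbers" follows from the coverage: a single-copy gene is expected to appear in about as many clones as the library has genome equivalents. The sketch below works through that arithmetic from the figures quoted in this record. The implied genome size and the Poisson miss probability are derived here for illustration and are not stated in the excerpt, and the variable names are assumptions.

```python
import math

# Figures quoted in the abstract: (clone count, mean insert size in bp).
libraries = {
    "CinS2S2": (89_088, 90e3),
    "CinS1S4": (81_408, 120e3),
}
combined_coverage = 12.3  # haploid genome equivalents, as reported

# Total cloned DNA across both libraries.
total_bp = sum(n * insert for n, insert in libraries.values())

# Haploid genome size implied by the reported coverage (derived, not quoted).
implied_genome_bp = total_bp / combined_coverage

# For a single-copy gene, the expected number of positive clones is roughly
# the coverage; under a Poisson model the chance of zero hits is exp(-coverage).
expected_hits = combined_coverage
p_missing = math.exp(-combined_coverage)

print(f"total cloned DNA: {total_bp / 1e9:.2f} Gb")
print(f"implied genome size: {implied_genome_bp / 1e9:.2f} Gb")
print(f"expected hits per single-copy gene: {expected_hits:.1f}")
print(f"P(gene absent from both libraries): {p_missing:.1e}")
```

    An expectation of about 12 hits per single-copy gene matches the 12 and 13 positive clones reported. The Poisson model assumes clones sample the genome uniformly, which real restriction-digest libraries only approximate.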

  7. Physical analysis of the complex rye (Secale cereale L.) Alt4 aluminium (aluminum) tolerance locus using a whole-genome BAC library of rye cv. Blanco.

    PubMed

    Shi, B-J; Gustafson, J P; Button, J; Miyazaki, J; Pallotta, M; Gustafson, N; Zhou, H; Langridge, P; Collins, N C

    2009-08-01

    Rye is a diploid crop species with many outstanding qualities, and is important as a source of new traits for wheat and triticale improvement. Rye is highly tolerant of aluminum (Al) toxicity, and possesses a complex structure at the Alt4 Al tolerance locus not found at the corresponding locus in wheat. Here we describe a BAC library of rye cv. Blanco, representing a valuable resource for rye molecular genetic studies, and assess the library's suitability for investigating Al tolerance genes. The library provides 6 x genome coverage of the 8.1 Gb rye genome, has an average insert size of 131 kb, and contains only ~2% of empty or organelle-derived clones. Genetic analysis attributed the Al tolerance of Blanco to the Alt4 locus on the short arm of chromosome 7R, and revealed the presence of multiple allelic variants (haplotypes) of the Alt4 locus in the BAC library. BAC clones containing ALMT1 gene clusters from several Alt4 haplotypes were identified, and will provide useful starting points for exploring the basis for the structural variability and functional specialization of ALMT1 genes at this locus.
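
    The abstract gives coverage, genome size and mean insert size but not the number of clones. As a rough illustration, the same coverage relation can be inverted to estimate how many clones such a library would need. The function name clones_needed is hypothetical, and the resulting count is a derived estimate, not a figure reported by the authors.

```python
import math


def clones_needed(coverage: float, genome_bp: float, mean_insert_bp: float) -> int:
    """Clones required to reach a target fold coverage (rounded up)."""
    return math.ceil(coverage * genome_bp / mean_insert_bp)


# Values quoted in the abstract: 6x coverage, 8.1 Gb genome, 131 kb inserts.
n = clones_needed(6, 8.1e9, 131e3)
print(f"about {n:,} clones")
```

    This gives roughly 371,000 clones, ignoring the ~2% of empty or organelle-derived clones mentioned in the abstract.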

  8. The Genome of the Generalist Plant Pathogen Fusarium avenaceum Is Enriched with Genes Involved in Redox, Signaling and Secondary Metabolism

    PubMed Central

    Lysøe, Erik; Harris, Linda J.; Walkowiak, Sean; Subramaniam, Rajagopal; Divon, Hege H.; Riiser, Even S.; Llorens, Carlos; Gabaldón, Toni; Kistler, H. Corby; Jonkers, Wilfried; Kolseth, Anna-Karin; Nielsen, Kristian F.; Thrane, Ulf; Frandsen, Rasmus J. N.

    2014-01-01

    Fusarium avenaceum is a fungus commonly isolated from soil and associated with a wide range of host plants. We present here three genome sequences of F. avenaceum, one isolated from barley in Finland and two from spring and winter wheat in Canada. The sizes of the three genomes range from 41.6 to 43.1 MB, with 13217–13445 predicted protein-coding genes. Whole-genome analysis showed that the three genomes are highly syntenic, and share >95% gene orthologs. Comparative analysis to other sequenced Fusaria shows that F. avenaceum has a very large potential for producing secondary metabolites, with between 75 and 80 key enzymes belonging to the polyketide, non-ribosomal peptide, terpene, alkaloid and indole-diterpene synthase classes. In addition to known metabolites from F. avenaceum, fuscofusarin and JM-47 were detected for the first time in this species. Many protein families are expanded in F. avenaceum, such as transcription factors, and proteins involved in redox reactions and signal transduction, suggesting evolutionary adaptation to a diverse and cosmopolitan ecology. We found that 20% of all predicted proteins were considered to be secreted, supporting a life in the extracellular space during interaction with plant hosts. PMID:25409087

  9. Quick genome sequencing of “Candidatus Liberibacter” strains by use of Enrichment-Enlargement-Next generation sequencing (EEN)

    USDA-ARS's Scientific Manuscript database

    Members of “Candidatus Liberibacter” are associated with several important plant diseases such as citrus Huanglongbing (HLB) and potato zebra chip (ZC) disease. Inability to culture and low titers in infected hosts have been major obstacles for research on these bacteria. The use of whole genome seq...

  10. Genomic study and MeSH enrichment analysis of early pregnancy rate and antral follicle numbers in Nelore heifers

    USDA-ARS's Scientific Manuscript database

    Zebu animals (Bos indicus) are known to take longer to reach puberty when compared to taurine animals (Bos taurus), limiting the supply of animals for harvest or breeding and impacting profitability. Genomic information can be a helpful tool to better understand complex traits, and improve genetic g...

  11. Draft Genome Sequence of Clostridium sp. Ne2, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus.

    PubMed

    Wang, Han; Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J

    2015-04-23

    The draft genome sequence of Clostridium sp. Ne2 was reconstructed from a metagenome of a hydrogenogenic microbial consortium. The organism is most closely related to Clostridium magnum and is a strict anaerobe that is predicted to ferment a range of simple sugars. Copyright © 2015 Wang et al.

  12. Draft Genome Sequence of Clostridium beijerinckii Ne1, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus.

    PubMed

    Wang, Han; Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J

    2015-04-23

    The draft genome of Clostridium beijerinckii strain Ne1 was reconstructed from the metagenomic sequence of a mixed-microbial consortium that produced commercially significant quantities of hydrogen from xylan as a sole feedstock. The organism possesses relatively limited hemicellulolytic capacity and likely requires the action of other organisms to completely degrade xylan. Copyright © 2015 Wang et al.

  13. Draft Genome Sequence of Clostridium beijerinckii Ne1, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus

    PubMed Central

    Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J.

    2015-01-01

    The draft genome of Clostridium beijerinckii strain Ne1 was reconstructed from the metagenomic sequence of a mixed-microbial consortium that produced commercially significant quantities of hydrogen from xylan as a sole feedstock. The organism possesses relatively limited hemicellulolytic capacity and likely requires the action of other organisms to completely degrade xylan. PMID:25908128

  14. Draft Genome Sequence of Clostridium sp. Ne2, Clostridia from an Enrichment Culture Obtained from Australian Subterranean Termite, Nasutitermes exitiosus

    PubMed Central

    Lin, Hai; Tran-Dinh, Nai; Li, Dongmei; Greenfield, Paul; Midgley, David J.

    2015-01-01

    The draft genome sequence of Clostridium sp. Ne2 was reconstructed from a metagenome of a hydrogenogenic microbial consortium. The organism is most closely related to Clostridium magnum and is a strict anaerobe that is predicted to ferment a range of simple sugars. PMID:25908129

  15. Construction of a genomic library of the food spoilage yeast Zygosaccharomyces bailii and isolation of the beta-isopropylmalate dehydrogenase gene (ZbLEU2).

    PubMed

    Rodrigues, F; Zeeman, A M; Alves, C; Sousa, M J; Steensma, H Y; Côrte-Real, M; Leão, C

    2001-04-01

    A genomic library of the yeast Zygosaccharomyces bailii ISA 1307 was constructed in pRS316, a shuttle vector for Saccharomyces cerevisiae and Escherichia coli. The library has an average insert size of 6 kb and covers the genome more than 20 times assuming a genome size similar to that of S. cerevisiae. This new tool has been successfully used, by us and others, to isolate Z. bailii genes. One example is the beta-isopropylmalate dehydrogenase gene (ZbLEU2) of Z. bailii, which was cloned by complementation of a leu2 mutation in S. cerevisiae. An open reading frame encoding a protein with a molecular mass of 38.7 kDa was found. The nucleotide sequence of ZbLEU2 and the deduced amino acid sequence showed a significant degree of identity to those of beta-isopropylmalate dehydrogenases from several other yeast species. The sequence of ZbLEU2 has been deposited in the EMBL data library under accession number AJ292544.

  16. Characterization of rubber tree microRNA in phytohormone response using large genomic DNA libraries, promoter sequence and gene expression analysis.

    PubMed

    Kanjanawattanawong, Supanath; Tangphatsornruang, Sithichoke; Triwitayakorn, Kanokporn; Ruang-areerate, Panthita; Sangsrakru, Duangjai; Poopear, Supannee; Somyong, Suthasinee; Narangajavana, Jarunya

    2014-10-01

    The para rubber tree is the most widely cultivated tree species for producing natural rubber (NR) latex. Unfortunately, rubber tree characteristics such as a long life cycle, heterozygous genetic backgrounds, and poorly understood genetic profiles are obstacles to breeding new rubber tree varieties, such as those with improved NR yields. Recent evidence has revealed the potential importance of controlling microRNA (miRNA) decay in some aspects of NR regulation. To gain a better understanding of miRNAs and their relationship with rubber tree gene regulation networks, large genomic DNA insert-containing libraries were generated to complement the incomplete draft genome sequence and applied as a powerful new tool to predict the function of genes of interest. Bacterial artificial chromosome and fosmid libraries, containing a total of 120,576 clones with an average insert size of 43.35 kb, provided approximately 2.42 haploid genome equivalents of coverage based on the estimated 2.15 Gb rubber tree genome. Based on these library sequences, the precursors of 1 member of rubber tree-specific miRNAs and 12 members of conserved miRNAs were successfully identified. A panel of miRNAs was characterized for phytohormone response by precisely identifying phytohormone-responsive motifs in their promoter sequences. Furthermore, quantitative real-time PCR on ethylene-stimulated rubber trees was performed to demonstrate that miR2118, miR159, miR164 and miR166 are responsive to ethylene, thus confirming the prediction made by genomic DNA analysis. The cis-regulatory elements identified in the promoter regions of these miRNA genes help augment our understanding of miRNA gene regulation and provide a foundation for further investigation of the regulation of rubber tree miRNAs.

  17. Genomic Survey and Biochemical Analysis of Recombinant Candidate Cyanobacteriochromes Reveals Enrichment for Near UV/Violet Sensors in the Halotolerant and Alkaliphilic Cyanobacterium Microcoleus IPPAS B353*

    PubMed Central

    Cho, Sung Mi; Jeoung, Sae Chae; Song, Ji-Young; Kupriyanova, Elena V.; Pronina, Natalia A.; Lee, Bong-Woo; Jo, Seong-Whan; Park, Beom-Seok; Choi, Sang-Bong; Song, Ji-Joon; Park, Youn-Il

    2015-01-01

    Cyanobacteriochromes (CBCRs), which are exclusive to and widespread among cyanobacteria, are photoproteins that sense the entire range of near-UV and visible light. CBCRs are related to the red/far-red phytochromes that utilize linear tetrapyrrole (bilin) chromophores. Best characterized from the unicellular cyanobacterium Synechocystis sp. PCC 6803 and the multicellular heterocyst forming filamentous cyanobacteria Nostoc punctiforme ATCC 29133 and Anabaena sp. PCC 7120, CBCRs have been poorly investigated in mat-forming, nonheterocystous cyanobacteria. In this study, we sequenced the genome of one such species, Microcoleus IPPAS B353 (Microcoleus B353), and identified two phytochromes and seven CBCRs with one or more bilin-binding cGMP-specific phosphodiesterase, adenylyl cyclase and FhlA (GAF) domains. Biochemical and spectroscopic measurements of 23 purified GAF proteins from phycocyanobilin (PCB) producing recombinant Escherichia coli indicated that 13 of these proteins formed near-UV and visible light-absorbing covalent adducts: 10 GAFs contained PCB chromophores, whereas three contained the PCB isomer, phycoviolobilin (PVB). Furthermore, the complement of Microcoleus B353 CBCRs is enriched in near-UV and violet sensors, but lacks red/green and green/red CBCRs that are widely distributed in other cyanobacteria. We hypothesize that enrichment in short wavelength-absorbing CBCRs is critical for acclimation to high-light environments where this organism is found. PMID:26405033

  18. Enriching the Catalog

    ERIC Educational Resources Information Center

    Tennant, Roy

    2004-01-01

    After decades of costly and time-consuming effort, nearly all libraries have completed the retrospective conversion of their card catalogs to electronic form. However, bibliographic systems still are really not much more than card catalogs on wheels. Enriched content that Amazon.com takes for granted--such as digitized tables of contents, cover…

  20. Generation of a Genome Scale Lentiviral Vector Library for EF1α Promoter-Driven Expression of Human ORFs and Identification of Human Genes Affecting Viral Titer

    PubMed Central

    Škalamera, Dubravka; Dahmer, Mareike; Purdon, Amy S.; Wilson, Benjamin M.; Ranall, Max V.; Blumenthal, Antje; Gabrielli, Brian; Gonda, Thomas J.

    2012-01-01

    The bottleneck in elucidating gene function through high-throughput gain-of-function genome screening is the limited availability of comprehensive libraries for gene overexpression. Lentiviral vectors are the most versatile and widely used vehicles for gene expression in mammalian cells. Lentiviral supernatant libraries for genome screening are commonly generated in the HEK293T cell line, yet very little is known about the effect of introduced sequences on the produced viral titer, which we have shown to be gene dependent. We have generated an arrayed lentiviral vector library for the expression of 17,030 human proteins by using the GATEWAY® cloning system to transfer ORFs from the Mammalian Gene Collection into an EF1alpha promoter-dependent lentiviral expression vector. This promoter was chosen instead of the more potent and widely used CMV promoter, because it is less prone to silencing and provides more stable long term expression. The arrayed lentiviral clones were used to generate viral supernatant by packaging in the HEK293T cell line. The efficiency of transfection and virus production was estimated by measuring the fluorescence of IRES driven GFP, co-expressed with the ORFs. More than 90% of cloned ORFs produced sufficient virus for downstream screening applications. We identified genes which consistently produced very high or very low viral titer. Supernatants from select clones that were either high or low virus producers were tested on a range of cell lines. Some of the low virus producers, including two previously uncharacterized proteins were cytotoxic to HEK293T cells. The library we have constructed presents a powerful resource for high-throughput gain-of-function screening of the human genome and drug-target discovery. Identification of human genes that affect lentivirus production may lead to improved technology for gene expression using lentiviral vectors. PMID:23251614

  1. Genomic library screening for viruses from the human dental plaque revealed pathogen-specific lytic phage sequences.

    PubMed

    Al-Jarbou, Ahmed Nasser

    2012-01-01

    Bacterial pathogens present an astounding arsenal of virulence factors that allow them to conquer many different niches throughout the course of infection. Particularly fascinating is the fact that some bacterial species are able to induce different diseases by expression of different combinations of virulence factors. Nevertheless, studies aiming at screening for the presence of bacteriophages in humans have been limited. Such screening procedures would eventually lead to identification of phage-encoded properties that impart increased bacterial fitness and/or virulence in a particular niche, and hence could potentially be used to reverse the course of bacterial infections. The human oral cavity represents a rich and dynamic ecosystem for several upper respiratory tract pathogens; however, little is known about virus diversity in human dental plaque, which is an important reservoir. We applied a culture-independent approach to characterize virus diversity in human dental plaque, making a library from a virus DNA fraction amplified using a multiple displacement method, and sequenced 80 clones. Of the resulting sequences, 44% showed significant identity to GenBank database entries by TBLASTX analysis. TBLAST homology comparisons showed that 66% were viral, 18% eukaryal, 10% bacterial, and 6% mobile elements. These sequences were sorted into 6 contigs and 45 single sequences, of which 4 contigs and a single sequence showed significant identity to a small region of a putative prophage in the Corynebacterium diphtheriae genome. These findings highlight the uniqueness of over half of the sequences, whilst the dominance of pathogen-specific prophage sequences implies a role for them in virulence.

  2. Preparation of high-quality next-generation sequencing libraries from picogram quantities of target DNA

    PubMed Central

    Parkinson, Nicholas J.; Maslau, Siarhei; Ferneyhough, Ben; Zhang, Gang; Gregory, Lorna; Buck, David; Ragoussis, Jiannis; Ponting, Chris P.; Fischer, Michael D.

    2012-01-01

    New sequencing technologies can address diverse biomedical questions but are limited by a minimum required DNA input of typically 1 μg. We describe how sequencing libraries can be reproducibly created from 20 pg of input DNA using a modified transpososome-mediated fragmentation technique. Resulting libraries incorporate in-line bar-coding, which facilitates sample multiplexes that can be sequenced using Illumina platforms with the manufacturer's sequencing primer. We demonstrate this technique by providing deep coverage sequence of the Escherichia coli K-12 genome that shows equivalent target coverage to a 1-μg input library prepared using standard Illumina methods. Reducing template quantity does, however, increase the proportion of duplicate reads and enriches coverage in low-GC regions. This finding was confirmed with exhaustive resequencing of a mouse library constructed from 20 pg of gDNA input (about seven haploid genomes) resulting in ∼0.4-fold statistical coverage of uniquely mapped fragments. This implies that a near-complete coverage of the mouse genome is obtainable with this approach using 20 genomes as input. Application of this new method now allows genomic studies from low mass samples and routine preparation of sequencing libraries from enrichment procedures. PMID:22090378

  3. ve-SEQ: Robust, unbiased enrichment for streamlined detection and whole-genome sequencing of HCV and other highly diverse pathogens

    PubMed Central

    Trebes, Amy; Brown, Anthony; Klenerman, Paul; Buck, David; Piazza, Paolo; Barnes, Eleanor; Bowden, Rory

    2015-01-01

    The routine availability of high-depth virus sequence data would allow the sensitive detection of resistance-associated variants that can jeopardize HIV or hepatitis C virus (HCV) treatment. We introduce ve-SEQ, a high-throughput method for sequence-specific enrichment and characterization of whole-virus genomes at up to 20% divergence from a reference sequence and 1,000-fold greater sensitivity than direct sequencing. The extreme genetic diversity of HCV led us to implement an algorithm for the efficient design of panels of oligonucleotide probes to capture any sequence among a defined set of targets without detectable bias. ve-SEQ enables efficient detection and sequencing of any HCV genome, including mixtures and intra-host variants, in a single experiment, with greater tolerance of sequence diversity than standard amplification methods and greater sensitivity than metagenomic sequencing, features that are directly applicable to other pathogens or arbitrary groups of target organisms, allowing the combination of sensitive detection with sequencing in many settings. PMID:27092241

  4. Maize genome sequencing by methylation filtration.

    PubMed

    Palmer, Lance E; Rabinowicz, Pablo D; O'Shaughnessy, Andrew L; Balija, Vivekanand S; Nascimento, Lidia U; Dike, Sujit; de la Bastide, Melissa; Martienssen, Robert A; McCombie, W Richard

    2003-12-19

    Gene enrichment strategies offer an alternative to sequencing large and repetitive genomes such as that of maize. We report the generation and analysis of nearly 100,000 undermethylated (or methylation filtration) maize sequences. Comparison with the rice genome reveals that methylation filtration results in a more comprehensive representation of maize genes than those that result from expressed sequence tags or transposon insertion sites sequences. About 7% of the repetitive DNA is unmethylated and thus selected in our libraries, but potentially active transposons and unmethylated organelle genomes can be identified. Reverse transcription polymerase chain reaction can be used to finish the maize transcriptome.

  5. Selective Enrichment and Sequencing of Whole Mitochondrial Genomes in the Presence of Nuclear Encoded Mitochondrial Pseudogenes (Numts)

    PubMed Central

    Wolff, Jonci N.; Shearman, Deborah C. A.; Brooks, Rob C.; Ballard, John W. O.

    2012-01-01

    Numts are an integral component of many eukaryote genomes, offering a snapshot of the evolutionary process that led from the incorporation of an α-proteobacterium into a larger eukaryotic cell some 1.8 billion years ago. Although numt sequences can be harnessed as molecular markers, they often remain unidentified and are mistaken for genuine mtDNA, leading to erroneous interpretation of mtDNA data sets. It is therefore indispensable that, during the process of amplifying and sequencing mitochondrial genes, preventive measures are taken to ensure the exclusion of numts and guarantee the recovery of genuine mtDNA. This applies to mtDNA analyses in general but especially to studies where mtDNAs are sequenced de novo as the launch pad for subsequent mtDNA-based research. By using a combination of dilution series and nested rolling circle amplification (RCA), we present a novel strategy to selectively amplify mtDNA and exclude the amplification of numt sequence. We have successfully applied this strategy to de novo sequence the mtDNA of the Black Field Cricket Teleogryllus commodus, a species known to contain numts. Aligning our assembled sequence to the reference genome of Teleogryllus emma (GenBank EU557269.1) led to the identification of a numt sequence in the reference sequence. This unexpected result further highlights the need for a reliable and accessible strategy to eliminate this source of error. PMID:22606342

  6. Construction of an Ostrea edulis database from genomic and expressed sequence tags (ESTs) obtained from Bonamia ostreae infected haemocytes: Development of an immune-enriched oligo-microarray.

    PubMed

    Pardo, Belén G; Álvarez-Dios, José Antonio; Cao, Asunción; Ramilo, Andrea; Gómez-Tato, Antonio; Planas, Josep V; Villalba, Antonio; Martínez, Paulino

    2016-12-01

    The flat oyster, Ostrea edulis, is one of the main farmed oysters, not only in Europe but also in the United States and Canada. Bonamiosis due to the parasite Bonamia ostreae has been associated with high mortality episodes in this species. This parasite is an intracellular protozoan that infects haemocytes, the main cells involved in oyster defence. Due to the economic and ecological importance of the flat oyster, genomic data are badly needed for genetic improvement of the species, but they are still very scarce. The objective of this study is to develop a sequence database, OedulisDB, with new genomic and transcriptomic resources, providing new data and convenient tools to improve our knowledge of the oyster's immune mechanisms. Transcriptomic and genomic sequences were obtained using 454 pyrosequencing and compiled into an O. edulis database, OedulisDB, consisting of two sets of 10,318 and 7159 unique sequences that represent the oyster's genome (WG) and de novo haemocyte transcriptome (HT), respectively. The flat oyster transcriptome was obtained from two strains (naïve and tolerant) challenged with B. ostreae, and from their corresponding non-challenged controls. Approximately 78.5% of 5619 HT unique sequences were successfully annotated by Blast search using public databases. A total of 984 sequences were identified as being related to immune response, and several key immune genes were identified for the first time in the flat oyster. Additionally, transcriptome information was used to design and validate the first oligo-microarray in the flat oyster enriched with immune sequences from haemocytes. Our transcriptomic and genomic sequencing and subsequent annotation have greatly increased the scarce resources available for this economically important species and have enabled us to develop an OedulisDB database and accompanying tools for gene expression analysis. This study represents the first attempt to characterize in depth the O. edulis haemocyte transcriptome in

  7. Local enrichment with homopolymeric (dA/dT) DNA in genomes of some lower dipterans and Drosophila melanogaster.

    PubMed

    Stocker, Ann Jacob; Gorab, Eduardo

    2003-04-01

    An investigation into the chromosomal localization of homopolymeric dA/dT was carried out with species of the genera Rhynchosciara, Chironomus, Drosophila and several other taxa. In situ hybridisation probing mitotic and polytene chromosomes with RNA homopolymers was performed, followed by immunological detection of the DNA/RNA hybrid. Use of this method allowed us to assess specific regions of some dipteran genomes, where the signal was generally, but not always, located in heterochromatic regions. Human and Drosophila chromosome regions known to contain dA/dT runs of up to 153 bp were devoid of consistent labelling. The stability of the rA/dT hybrid formed in situ was in agreement with the T(m) for long rA/dT hybrid complexes, suggesting that the method used in this work is able to identify unusually long homopolymeric dA/dT tracts.

  8. DNA Methylation and Genome Evolution in Honeybee: Gene Length, Expression, Functional Enrichment Covary with the Evolutionary Signature of DNA Methylation

    PubMed Central

    Zeng, Jia; Yi, Soojin V.

    2010-01-01

    A growing body of evidence suggests that DNA methylation is functionally divergent among different taxa. The recently discovered functional methylation system in the honeybee Apis mellifera presents an attractive invertebrate model system to study evolution and function of DNA methylation. In the honeybee, DNA methylation is mostly targeted toward transcription units (gene bodies) of a subset of genes. Here, we report an intriguing covariation of length and epigenetic status of honeybee genes. Hypermethylated and hypomethylated genes in honeybee are dramatically different in their lengths for both exons and introns. By analyzing orthologs in Drosophila melanogaster, Acyrthosiphon pisum, and Ciona intestinalis, we show genes that were short and long in the past are now preferentially situated in hyper- and hypomethylated classes respectively, in the honeybee. Moreover, we demonstrate that a subset of high-CpG genes are conspicuously longer than expected under the evolutionary relationship alone and that they are enriched in specific functional categories. We suggest that gene length evolution in the honeybee is partially driven by evolutionary forces related to regulation of gene expression, which in turn is associated with DNA methylation. However, lineage-specific patterns of gene length evolution suggest that there may exist additional forces underlying the observed interaction between DNA methylation and gene lengths in the honeybee. PMID:20924039

  9. Recurrent Rare Genomic Copy Number Variants and Bicuspid Aortic Valve Are Enriched in Early Onset Thoracic Aortic Aneurysms and Dissections.

    PubMed

    Prakash, Siddharth; Kuang, Shao-Qing; Regalado, Ellen; Guo, Dongchuan; Milewicz, Dianna

    2016-01-01

    Thoracic Aortic Aneurysms and Dissections (TAAD) are a major cause of death in the United States. The spectrum of TAAD ranges from genetic disorders, such as Marfan syndrome, to sporadic isolated disease of unknown cause. We hypothesized that genomic copy number variants (CNVs) contribute causally to early onset TAAD (ETAAD). We conducted a genome-wide SNP array analysis of ETAAD patients of European descent who were enrolled in the National Registry of Genetically Triggered Thoracic Aortic Aneurysms and Cardiovascular Conditions (GenTAC). Genotyping was performed on the Illumina Omni-Express platform, using PennCNV, Nexus and CNVPartition for CNV detection. ETAAD patients (n = 108, 100% European American, 28% female, average age 20 years, 55% with bicuspid aortic valves) were compared to 7013 dbGAP controls without a history of vascular disease using downsampled Omni 2.5 data. For comparison, 805 sporadic TAAD patients with late onset aortic disease (STAAD cohort) and 192 affected probands from families with at least two affected relatives (FTAAD cohort) from our institution were screened for additional CNVs at these loci with SNP arrays. We identified 47 recurrent CNV regions in the ETAAD, FTAAD and STAAD groups that were absent or extremely rare in controls. Nine rare CNVs that were either very large (>1 Mb) or shared by ETAAD and STAAD or FTAAD patients were also identified. Four rare CNVs involved genes that cause arterial aneurysms when mutated. The largest and most prevalent of the recurrent CNVs were at Xq28 (two duplications and two deletions) and 17q25.1 (three duplications). The percentage of individuals harboring rare CNVs was significantly greater in the ETAAD cohort (32%) than in the FTAAD (23%) or STAAD (17%) cohorts. We identified multiple loci affected by rare CNVs in one-third of ETAAD patients, confirming the genetic heterogeneity of TAAD. Alterations of candidate genes at these loci may contribute to the pathogenesis of TAAD.

  10. Recurrent Rare Genomic Copy Number Variants and Bicuspid Aortic Valve Are Enriched in Early Onset Thoracic Aortic Aneurysms and Dissections

    PubMed Central

    Prakash, Siddharth; Kuang, Shao-Qing; Regalado, Ellen; Guo, Dongchuan; Milewicz, Dianna

    2016-01-01

    Thoracic Aortic Aneurysms and Dissections (TAAD) are a major cause of death in the United States. The spectrum of TAAD ranges from genetic disorders, such as Marfan syndrome, to sporadic isolated disease of unknown cause. We hypothesized that genomic copy number variants (CNVs) contribute causally to early onset TAAD (ETAAD). We conducted a genome-wide SNP array analysis of ETAAD patients of European descent who were enrolled in the National Registry of Genetically Triggered Thoracic Aortic Aneurysms and Cardiovascular Conditions (GenTAC). Genotyping was performed on the Illumina Omni-Express platform, using PennCNV, Nexus and CNVPartition for CNV detection. ETAAD patients (n = 108, 100% European American, 28% female, average age 20 years, 55% with bicuspid aortic valves) were compared to 7013 dbGAP controls without a history of vascular disease using downsampled Omni 2.5 data. For comparison, 805 sporadic TAAD patients with late onset aortic disease (STAAD cohort) and 192 affected probands from families with at least two affected relatives (FTAAD cohort) from our institution were screened for additional CNVs at these loci with SNP arrays. We identified 47 recurrent CNV regions in the ETAAD, FTAAD and STAAD groups that were absent or extremely rare in controls. Nine rare CNVs that were either very large (>1 Mb) or shared by ETAAD and STAAD or FTAAD patients were also identified. Four rare CNVs involved genes that cause arterial aneurysms when mutated. The largest and most prevalent of the recurrent CNVs were at Xq28 (two duplications and two deletions) and 17q25.1 (three duplications). The percentage of individuals harboring rare CNVs was significantly greater in the ETAAD cohort (32%) than in the FTAAD (23%) or STAAD (17%) cohorts. We identified multiple loci affected by rare CNVs in one-third of ETAAD patients, confirming the genetic heterogeneity of TAAD. Alterations of candidate genes at these loci may contribute to the pathogenesis of TAAD. PMID:27092555

  11. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9.

    PubMed

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-07-01

    CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability and implementation: Free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  12. Cas-Database: web-based genome-wide guide RNA library design for gene knockout screens using CRISPR-Cas9

    PubMed Central

    Park, Jeongbin; Kim, Jin-Soo; Bae, Sangsu

    2016-01-01

    Motivation: CRISPR-derived RNA guided endonucleases (RGENs) have been widely used for both gene knockout and knock-in at the level of single or multiple genes. RGENs are now available for forward genetic screens at genome scale, but single guide RNA (sgRNA) selection at this scale is difficult. Results: We develop an online tool, Cas-Database, a genome-wide gRNA library design tool for Cas9 nucleases from Streptococcus pyogenes (SpCas9). With an easy-to-use web interface, Cas-Database allows users to select optimal target sequences simply by changing the filtering conditions. Furthermore, it provides a powerful way to select multiple optimal target sequences from thousands of genes at once for the creation of a genome-wide library. Cas-Database also provides a web application programming interface (web API) for advanced bioinformatics users. Availability and implementation: Free access at http://www.rgenome.net/cas-database/. Contact: sangsubae@hanyang.ac.kr or jskim01@snu.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153724

  13. Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

    PubMed Central

    Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

    2013-01-01

    Background: Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings: In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance: These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation

  14. Genomic Signatures of North American Soybean Improvement Inform Diversity Enrichment Strategies and Clarify the Impact of Hybridization.

    PubMed

    Vaughn, Justin N; Li, Zenglu

    2016-09-08

    Crop improvement represents a long-running experiment in artificial selection on a complex trait, namely yield. How such selection relates to natural populations is unclear, but the analysis of domesticated populations could offer insights into the relative role of selection, drift, and recombination in all species facing major shifts in selective regimes. Because of the extreme autogamy exhibited by soybean (Glycine max), many "immortalized" genotypes of elite varieties spanning the last century have been preserved and characterized using ∼50,000 single nucleotide polymorphic (SNP) markers. Also due to autogamy, the history of North American soybean breeding can be roughly divided into pre- and posthybridization eras, allowing for direct interrogation of the role of recombination in improvement and selection. Here, we report on genome-wide characterization of the structure and history of North American soybean populations and the signature of selection in these populations. Supporting previous work, we find that maturity defines population structure. Though the diversity of North American ancestors is comparable to available landraces, prehybridization line selections resulted in a clonal structure that dominated early breeding and explains many of the reductions in diversity found in the initial generations of soybean hybridization. The rate of allele frequency change does not deviate sharply from neutral expectation, yet some regions bear hallmarks of strong selection, suggesting a highly variable range of selection strengths biased toward weak effects. We also discuss the importance of haplotypes as units of analysis when complex traits fall under novel selection regimes.

  15. Genome-wide comparison of the transcriptomes of highly enriched normal and chronic myeloid leukemia stem and progenitor cell populations.

    PubMed

    Gerber, Jonathan M; Gucwa, Jessica L; Esopi, David; Gurel, Meltem; Haffner, Michael C; Vala, Milada; Nelson, William G; Jones, Richard J; Yegnasubramanian, Srinivasan

    2013-05-01

    The persistence of leukemia stem cells (LSCs) in chronic myeloid leukemia (CML) despite tyrosine kinase inhibition (TKI) may explain relapse after TKI withdrawal. Here we performed genome-wide transcriptome analysis of highly refined CML and normal stem and progenitor cell populations using exon microarrays to identify novel targets for the eradication of CML LSCs. We identified 97 genes that were differentially expressed in CML versus normal stem and progenitor cells. These included cell surface genes significantly upregulated in CML LSCs: DPP4 (CD26), IL2RA (CD25), PTPRD, CACNA1D, IL1RAP, SLC4A4, and KCNK5. Further analyses of the LSCs revealed dysregulation of normal cellular processes, evidenced by alternative splicing of genes in key cancer signaling pathways such as p53 signaling (e.g. PERP, CDKN1A), kinase binding (e.g. DUSP12, MARCKS), and cell proliferation (MYCN, TIMELESS); downregulation of pro-differentiation and TGF-β/BMP signaling pathways; upregulation of oxidative metabolism and DNA repair pathways; and activation of inflammatory cytokines, including CCL2, and multiple oncogenes (e.g., CCND1). These data represent an important resource for understanding the molecular changes in CML LSCs, which may be exploited to develop novel therapies to eradicate these cells and achieve cure.

  16. Genome-wide comparison of the transcriptomes of highly enriched normal and chronic myeloid leukemia stem and progenitor cell populations

    PubMed Central

    Esopi, David; Gurel, Meltem; Haffner, Michael C.; Vala, Milada; Nelson, William G.; Jones, Richard J.; Yegnasubramanian, Srinivasan

    2013-01-01

    The persistence of leukemia stem cells (LSCs) in chronic myeloid leukemia (CML) despite tyrosine kinase inhibition (TKI) may explain relapse after TKI withdrawal. Here we performed genome-wide transcriptome analysis of highly refined CML and normal stem and progenitor cell populations using exon microarrays to identify novel targets for the eradication of CML LSCs. We identified 97 genes that were differentially expressed in CML versus normal stem and progenitor cells. These included cell surface genes significantly upregulated in CML LSCs: DPP4 (CD26), IL2RA (CD25), PTPRD, CACNA1D, IL1RAP, SLC4A4, and KCNK5. Further analyses of the LSCs revealed dysregulation of normal cellular processes, evidenced by alternative splicing of genes in key cancer signaling pathways such as p53 signaling (e.g. PERP, CDKN1A), kinase binding (e.g. DUSP12, MARCKS), and cell proliferation (MYCN, TIMELESS); downregulation of pro-differentiation and TGF-β/BMP signaling pathways; upregulation of oxidative metabolism and DNA repair pathways; and activation of inflammatory cytokines, including CCL2, and multiple oncogenes (e.g., CCND1). These data represent an important resource for understanding the molecular changes in CML LSCs, which may be exploited to develop novel therapies to eradicate these cells and achieve cure. PMID:23651669

  17. Genome-Centric Analysis of Microbial Populations Enriched by Hydraulic Fracture Fluid Additives in a Coal Bed Methane Production Well.

    PubMed

    Robbins, Steven J; Evans, Paul N; Parks, Donovan H; Golding, Suzanne D; Tyson, Gene W

    2016-01-01

    Coal bed methane (CBM) is generated primarily through the microbial degradation of coal. Despite a limited understanding of the microorganisms responsible for this process, there is significant interest in developing methods to stimulate additional methane production from CBM wells. Physical techniques including hydraulic fracture stimulation are commonly applied to CBM wells; however, the effects of specific additives contained in hydraulic fracture fluids on native CBM microbial communities are poorly understood. Here, metagenomic sequencing was applied to the formation waters of one hydraulically fractured and several non-fractured CBM production wells to determine the effect of this stimulation technique on the in-situ microbial community. The hydraulically fractured well was dominated by two microbial populations belonging to the class Phycisphaerae (within phylum Planctomycetes) and candidate phylum Aminicenantes. Populations from these phyla were absent or present at extremely low abundance in non-fractured CBM wells. Detailed metabolic reconstruction of near-complete genomes from these populations showed that their high relative abundance in the hydraulically fractured CBM well could be explained by the introduction of additional carbon sources, electron acceptors, and biocides contained in the hydraulic fracture fluid.

  18. Genome-Centric Analysis of Microbial Populations Enriched by Hydraulic Fracture Fluid Additives in a Coal Bed Methane Production Well

    PubMed Central

    Robbins, Steven J.; Evans, Paul N.; Parks, Donovan H.; Golding, Suzanne D.; Tyson, Gene W.

    2016-01-01

    Coal bed methane (CBM) is generated primarily through the microbial degradation of coal. Despite a limited understanding of the microorganisms responsible for this process, there is significant interest in developing methods to stimulate additional methane production from CBM wells. Physical techniques including hydraulic fracture stimulation are commonly applied to CBM wells; however, the effects of specific additives contained in hydraulic fracture fluids on native CBM microbial communities are poorly understood. Here, metagenomic sequencing was applied to the formation waters of one hydraulically fractured and several non-fractured CBM production wells to determine the effect of this stimulation technique on the in-situ microbial community. The hydraulically fractured well was dominated by two microbial populations belonging to the class Phycisphaerae (within phylum Planctomycetes) and candidate phylum Aminicenantes. Populations from these phyla were absent or present at extremely low abundance in non-fractured CBM wells. Detailed metabolic reconstruction of near-complete genomes from these populations showed that their high relative abundance in the hydraulically fractured CBM well could be explained by the introduction of additional carbon sources, electron acceptors, and biocides contained in the hydraulic fracture fluid. PMID:27375557

  19. Genomic Signatures of North American Soybean Improvement Inform Diversity Enrichment Strategies and Clarify the Impact of Hybridization

    PubMed Central

    Vaughn, Justin N.; Li, Zenglu

    2016-01-01

    Crop improvement represents a long-running experiment in artificial selection on a complex trait, namely yield. How such selection relates to natural populations is unclear, but the analysis of domesticated populations could offer insights into the relative role of selection, drift, and recombination in all species facing major shifts in selective regimes. Because of the extreme autogamy exhibited by soybean (Glycine max), many “immortalized” genotypes of elite varieties spanning the last century have been preserved and characterized using ∼50,000 single nucleotide polymorphic (SNP) markers. Also due to autogamy, the history of North American soybean breeding can be roughly divided into pre- and posthybridization eras, allowing for direct interrogation of the role of recombination in improvement and selection. Here, we report on genome-wide characterization of the structure and history of North American soybean populations and the signature of selection in these populations. Supporting previous work, we find that maturity defines population structure. Though the diversity of North American ancestors is comparable to available landraces, prehybridization line selections resulted in a clonal structure that dominated early breeding and explains many of the reductions in diversity found in the initial generations of soybean hybridization. The rate of allele frequency change does not deviate sharply from neutral expectation, yet some regions bear hallmarks of strong selection, suggesting a highly variable range of selection strengths biased toward weak effects. We also discuss the importance of haplotypes as units of analysis when complex traits fall under novel selection regimes. PMID:27402364

  20. A conserved BDNF, glutamate- and GABA-enriched gene module related to human depression identified by coexpression meta-analysis and DNA variant genome-wide association studies.

    PubMed

    Chang, Lun-Ching; Jamain, Stephane; Lin, Chien-Wei; Rujescu, Dan; Tseng, George C; Sibille, Etienne

    2014-01-01

    Large scale gene expression (transcriptome) analysis and genome-wide association studies (GWAS) for single nucleotide polymorphisms have generated a considerable amount of gene- and disease-related information, but heterogeneity and various sources of noise have limited the discovery of disease mechanisms. As systematic dataset integration is becoming essential, we developed methods and performed meta-clustering of gene coexpression links in 11 transcriptome studies from postmortem brains of human subjects with major depressive disorder (MDD) and non-psychiatric control subjects. We next sought enrichment in the top 50 meta-analyzed coexpression modules for genes otherwise identified by GWAS for various sets of disorders. One coexpression module of 88 genes was consistently and significantly associated with GWAS for MDD, other neuropsychiatric disorders and brain functions, and for medical illnesses with elevated clinical risk of depression, but not for other diseases. In support of the superior discriminative power of this novel approach, we observed no significant enrichment for GWAS-related genes in coexpression modules extracted from single studies or in meta-modules using gene expression data from non-psychiatric control subjects. Genes in the identified module encode proteins implicated in neuronal signaling and structure, including glutamate metabotropic receptors (GRM1, GRM7), GABA receptors (GABRA2, GABRA4), and neurotrophic and development-related proteins [BDNF, reelin (RELN), Ephrin receptors (EPHA3, EPHA5)]. These results are consistent with the current understanding of molecular mechanisms of MDD and provide a set of putative interacting molecular partners, potentially reflecting components of a functional module across cells and biological pathways that are synchronously recruited in MDD, other brain disorders and MDD-related illnesses. 
Collectively, this study demonstrates the importance of integrating transcriptome data, gene coexpression modules
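
    The abstract above does not say which statistic was used to test modules for enrichment in GWAS-identified genes. As a rough illustration only, a one-sided hypergeometric test is one common way to ask whether a module overlaps a gene list more than chance would predict. The function name and every number other than the module size of 88 are hypothetical, not values from the study.

```python
from math import comb

def hypergeom_enrichment_p(universe, module, gwas, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    universe: number of genes tested overall (hypothetical here)
    module:   number of genes in the coexpression module
    gwas:     number of genes in the universe implicated by GWAS
    overlap:  number of module genes that are also GWAS genes
    """
    total = comb(universe, module)
    # Sum the probability of every overlap at least as large as observed.
    tail = sum(
        comb(gwas, k) * comb(universe - gwas, module - k)
        for k in range(overlap, min(module, gwas) + 1)
    )
    return tail / total

# Illustrative call: an 88-gene module (size from the abstract), with a
# made-up universe of 10,000 genes, 300 GWAS genes and 9 shared genes.
p_value = hypergeom_enrichment_p(universe=10_000, module=88, gwas=300, overlap=9)
print(f"enrichment p-value: {p_value:.3g}")
```

    Testing the top 50 modules this way would also call for a multiple-testing correction, which this sketch leaves out.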

  1. Genome-wide analysis of loss of heterozygosity in breast infiltrating ductal carcinoma distant normal tissue highlights arm specific enrichment and expansion across tumor stages.

    PubMed

    Ruan, Xiaoyang; Liu, Hongfang; Boardman, Lisa; Kocher, Jean-Pierre A

    2014-01-01

    Studies have shown concurrent loss of heterozygosity (LOH) in breast infiltrating ductal carcinoma (IDC) and adjacent or distant normal tissue. However, the overall extent of LOH in normal tissue and their significance to tumorigenesis remain unknown, as existing studies are largely based on selected microsatellite markers. Here we present the first autosome-wide study of LOH in IDC and distant normal tissue using informative loci deduced from SNP array-based and sequencing-based techniques. We show a consistently high LOH concurrence rate in IDC (mean = 24%) and distant normal tissue (m = 54%), suggesting for most patients (31/33) histologically normal tissue contains genomic instability that can be a potential marker of increased IDC risk. Concurrent LOH is more frequent in fragile site related genes like WWOX (9/31), NTRK2 (10/31), and FHIT (7/31) than traditional genetic markers like BRCA1 (0/23), BRCA2 (2/29) and TP53 (1/13). Analysis at arm level shows distant normal tissue has low level but non-random enrichment of LOH (topped by 8p and 16q) significantly correlated with matched IDC (Pearson r = 0.66, p = 3.5E-6) (topped by 8p, 11q, 13q, 16q, 17p, and 17q). The arm-specific LOH enrichment was independently observed in tumor samples from 548 IDC patients when stratified by tumor size based T stages. Fine LOH structure from sequencing data indicates LOH in low order tissues non-randomly overlap (∼67%) with LOH that usually has longer tract length (the length of genomic region affected by LOH) in high order tissues. The consistent observations from multiple datasets suggest progressive LOH in the development of IDC potentially through arm-specific pile up effect with discernible signature in normal tissue. Our finding also suggests that LOH detected in IDC by comparing to paired adjacent or distant normal tissue are more likely underestimated.

  2. Diversity of microbial eukaryotes in sediment at a deep-sea methane cold seep: surveys of ribosomal DNA libraries from raw sediment samples and two enrichment cultures.

    PubMed

    Takishita, Kiyotaka; Yubuki, Naoji; Kakizoe, Natsuki; Inagaki, Yuji; Maruyama, Tadashi

    2007-07-01

    Recent culture-independent surveys of eukaryotic small-subunit ribosomal DNA (SSU rDNA) from many environments have unveiled unexpectedly high diversity of microbial eukaryotes (microeukaryotes) at various taxonomic levels. However, such surveys were most probably biased by various technical difficulties, resulting in underestimation of microeukaryotic diversity. In the present study on oxygen-depleted sediment from a deep-sea methane cold seep of Sagami Bay, Japan, we surveyed the diversity of eukaryotic rDNA in raw sediment samples and in two enrichment cultures. More than half of all clones recovered from the raw sediment samples were of the basidiomycetous fungus Cryptococcus curvatus. Among other clones, phylotypes of eukaryotic parasites, such as Apicomplexa, Ichthyosporea, and Phytomyxea, were identified. On the other hand, we observed a marked difference in phylotype composition in the enrichment samples. Several phylotypes belonging to heterotrophic stramenopiles were frequently found in one enrichment culture, while a phylotype of Excavata previously detected at a deep-sea hydrothermal vent dominated the other. We successfully established a clonal culture of this excavate flagellate. Since these phylotypes were not identified in the raw sediment samples, the approach incorporating a cultivation step successfully found at least a fraction of the "hidden" microeukaryotic diversity in the environment examined.

  3. DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease

    PubMed Central

    Karakachoff, Matilde; Le Scouarnec, Solena; Le Clézio, Camille; Campion, Dominique; Schott, Jean-Jacques

    2017-01-01

    Next-generation sequencing technologies made it possible to assay the effect of rare variants on complex diseases. As an extension of the “common disease-common variant” paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations show new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested by groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods are optimizing functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. Based on simulation under various scenarios, DoEstRare was the sole to consistently show highest performance, in terms of type I error and power both when variants were clustered or not. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer’s disease data and provided complementary results to other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package. PMID:28742119

  4. DoEstRare: A statistical test to identify local enrichments in rare genomic variants associated with disease.

    PubMed

    Persyn, Elodie; Karakachoff, Matilde; Le Scouarnec, Solena; Le Clézio, Camille; Campion, Dominique; Consortium, French Exome; Schott, Jean-Jacques; Redon, Richard; Bellanger, Lise; Dina, Christian

    2017-01-01

    Next-generation sequencing technologies made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations show new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested by groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods are optimizing functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. Based on simulation under various scenarios, DoEstRare was the sole to consistently show highest performance, in terms of type I error and power both when variants were clustered or not. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided complementary results to other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
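
    The abstract describes estimating variant position densities for cases and controls with a kernel method weighted by a function of allele frequencies. The sketch below illustrates that general idea only and is not the DoEstRare statistic or the interface of its R package: it assumes a Gaussian kernel, an L1 distance between the two densities, a permutation of case/control labels over variant observations, and weights supplied by the caller. All names are hypothetical.

```python
import math
import random

def weighted_density(positions, weights, grid, bandwidth):
    """Weighted Gaussian kernel density evaluated at each grid point."""
    norm = sum(weights) * bandwidth * math.sqrt(2.0 * math.pi)
    return [
        sum(w * math.exp(-0.5 * ((g - p) / bandwidth) ** 2)
            for p, w in zip(positions, weights)) / norm
        for g in grid
    ]

def density_distance(case_pos, case_w, ctrl_pos, ctrl_w, grid, bandwidth):
    """Riemann-sum approximation of the integral of |f_case - f_control|."""
    f_case = weighted_density(case_pos, case_w, grid, bandwidth)
    f_ctrl = weighted_density(ctrl_pos, ctrl_w, grid, bandwidth)
    step = grid[1] - grid[0]  # assumes an evenly spaced grid
    return step * sum(abs(a - b) for a, b in zip(f_case, f_ctrl))

def permutation_pvalue(observations, grid, bandwidth, n_perm=200, seed=0):
    """observations: list of (position, weight, is_case) tuples.

    Labels are shuffled across observations, which is a simplification;
    a real analysis would permute the status of individuals.
    """
    rng = random.Random(seed)
    labels = [is_case for _, _, is_case in observations]

    def stat(lbls):
        case = [(p, w) for (p, w, _), l in zip(observations, lbls) if l]
        ctrl = [(p, w) for (p, w, _), l in zip(observations, lbls) if not l]
        return density_distance(
            [p for p, _ in case], [w for _, w in case],
            [p for p, _ in ctrl], [w for _, w in ctrl],
            grid, bandwidth,
        )

    observed = stat(labels)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: case variants cluster near position 50, controls are spread out.
grid = [i * 0.5 for i in range(401)]  # positions 0 to 200
toy = [(48, 1.0, True), (50, 1.0, True), (52, 1.0, True), (55, 1.0, True),
       (20, 1.0, False), (90, 1.0, False), (140, 1.0, False), (180, 1.0, False)]
print(permutation_pvalue(toy, grid, bandwidth=5.0))
```

    Taking the weights as an input keeps the sketch neutral about which function of allele frequency is used, since the abstract does not specify it.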

  5. Chromosome region-specific libraries for human genome analysis. Progress report, September 1, 1991--August 31, 1992

    SciTech Connect

    Kao, Fa-Ten

    1992-08-01

    During the grant period progress has been made in the successful demonstration of regional mapping of microclones derived from microdissection libraries; successful demonstration of the feasibility of converting microclones with short inserts into yeast artificial chromosome clones with very large inserts for high resolution physical mapping of the dissected region; successful demonstration of the usefulness of region-specific microclones to isolate region-specific cDNA clones as candidate genes to facilitate the search for the crucial genes underlying genetic diseases assigned to the dissected region; and the successful construction of four region-specific microdissection libraries for human chromosome 2, including 2q35-q37, 2q33-q35, 2p23-p25 and 2p21-p23. The 2q35-q37 library has been characterized in detail. The characterization of the other three libraries is in progress. These region-specific microdissection libraries and the unique sequence microclones derived from the libraries will be valuable resources for investigators engaged in high resolution physical mapping and isolation of disease-related genes residing in these chromosomal regions.

  6. A high-coverage artificial chromosome library for the genome-wide screening of drug-resistance genes in malaria parasites.

    PubMed

    Iwanaga, Shiroh; Kaneko, Izumi; Yuda, Masao

    2012-05-01

    The global spread of drug-resistant parasites is a serious problem for the treatment of malaria. Although identifying drug-resistance genes is crucial for the efforts against resistant parasites, an effective approach has not yet been developed. Here, we report a robust method for identifying resistance genes from parasites by using a Plasmodium artificial chromosome (PAC). Large genomic DNA fragments (10-50 kb) from the drug-resistant rodent malaria parasite Plasmodium berghei were ligated into the PAC and directly introduced into the drug-sensitive (i.e., wild-type) parasite by electroporation, resulting in a PAC library that encompassed the whole genomic sequence of the parasite. Subsequently, the transformed parasites that acquired resistance were selected by screening with the drug, and the resistance gene in the PAC was successfully identified. Furthermore, the drug-resistance gene was identified from a PAC library that was made from the pyrimethamine-resistant parasite Plasmodium chabaudi, further demonstrating the utility of our method. This method will promote the identification of resistance genes and contribute to the global fight against drug-resistant parasites.

  7. [Construction of genomic library of L. interrogans serovar lai using lambda gt11 as the vector and a study of recombinant plasmid pDL121].

    PubMed

    Liu, H; Dai, B; Jing, B; Wu, W; Li, S; Fang, Z; Zhao, H; Ye, D; Yan, R; Liu, J; Song, S; Yang, Y; Zhang, Y; Liu, F; Tu, Y; Yang, H; Huang, Z; Liang, L; Hu, L; Zhao, M

    1997-03-01

    A genomic library of L. interrogans serovar lai strain 017 has been constructed using lambda gt11 as the vector. DNA was partially digested by two blunt-end restriction enzymes, then methylated with EcoR I methylase; after EcoR I linker was added to the DNA, the linker-ended DNA was ligated to the dephosphorylated EcoR I digested lambda gt11 arms. The recombined DNA was packaged in vitro and used to transduce E. coli Y1090 for amplification. There were 2.1 × 10^6 recombinant bacteriophages, as recognized by their ability to form white plaques when plated on a Lac host in the presence of both IPTG and X-Gal. A positive clone, designated lambda DL12, was screened from the genomic library with a rabbit antiserum against L. interrogans serovar lai. The DNA from lambda DL12 was subcloned into plasmid pUC18. A recombinant (designated pDL121) was obtained. SDS-PAGE analysis indicated that a 23 kDa protein was expressed in E. coli JM 103 harboring pDL121. Western blotting analysis showed that a specific protein band with a molecular weight of 23 kDa could be recognized by the rabbit antiserum against L. interrogans serovar lai strain 017.

  8. Sequencing the maize genome.

    PubMed

    Martienssen, Robert A; Rabinowicz, Pablo D; O'Shaughnessy, Andrew; McCombie, W Richard

    2004-04-01

    Sequencing of complex genomes can be accomplished by enriching shotgun libraries for genes. In maize, gene-enrichment by copy-number normalization (high C(0)t) and methylation filtration (MF) have been used to generate up to two-fold coverage of the gene-space with less than 1 million sequencing reads. Simulations using sequenced bacterial artificial chromosome (BAC) clones predict that 5x coverage of gene-rich regions, accompanied by less than 1x coverage of subclones from BAC contigs, will generate high-quality mapped sequence that meets the needs of geneticists while accommodating unusually high levels of structural polymorphism. By sequencing several inbred strains, we propose a strategy for capturing this polymorphism to investigate hybrid vigor or heterosis.
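
    The coverage figures quoted above (two-fold, 5x, less than 1x) can be related to the fraction of a target expected to be sequenced at least once. The sketch below uses the idealized Lander-Waterman approximation, which assumes reads land uniformly at random on the target and ignores end effects and overlap requirements. Gene-enriched libraries are not uniform samples, so this is a rough guide rather than a reproduction of the simulations in the abstract. The target size and read length in the example are hypothetical.

```python
import math

def expected_fraction_covered(coverage):
    """Lander-Waterman approximation: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)

def reads_needed(target_bp, read_len_bp, coverage):
    """Number of reads giving the requested fold coverage (c = N * L / G)."""
    return math.ceil(coverage * target_bp / read_len_bp)

for c in (1, 2, 5):
    print(f"{c}x coverage: about {expected_fraction_covered(c):.1%} of the target covered")

# Hypothetical example: a 100 Mbp target sequenced with 500 bp reads at 2x.
print(reads_needed(target_bp=100_000_000, read_len_bp=500, coverage=2))
```

    Under this model the gain from 2x to 5x is mostly in closing the remaining gaps, since roughly 86% of the target is already touched at 2x.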

  9. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries

    PubMed Central

    Saski, Christopher A.; Bhattacharjee, Ranjana; Scheffler, Brian E.; Asiedu, Robert

    2015-01-01

    The falling cost of and rapid progress in next-generation sequencing techniques, coupled with high performance computational approaches, have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries, but research efforts to understand the genetics and generate genomic information for the crop have been limited. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic, showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic, indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches

  10. Genomic Resources for Water Yam (Dioscorea alata L.): Analyses of EST-Sequences, De Novo Sequencing and GBS Libraries.

    PubMed

    Saski, Christopher A; Bhattacharjee, Ranjana; Scheffler, Brian E; Asiedu, Robert

    2015-01-01

    The falling cost of and rapid progress in next-generation sequencing techniques, coupled with high performance computational approaches, have resulted in large-scale discovery of advanced genomic resources in several model and non-model plant species. Yam (Dioscorea spp.) is a major food and cash crop in many countries, but research efforts to understand the genetics and generate genomic information for the crop have been limited. The availability of a large number of genomic resources including genome-wide molecular markers will accelerate the breeding efforts and application of genomic selection in yams. In the present study, several methods including expressed sequence tags (EST)-sequencing, de novo sequencing, and genotyping-by-sequencing (GBS) profiles on two yam (Dioscorea alata L.) genotypes (TDa 95/00328 and TDa 95-310) were performed to generate genomic resources for use in its improvement programs. This includes a comprehensive set of EST-SSRs, genomic SSRs, whole genome SNPs, and reduced representation SNPs. A total of 1,152 EST-SSRs were developed from >40,000 EST-sequences generated from the two genotypes. A set of 388 EST-SSRs were validated as polymorphic, showing a polymorphism rate of 34% when tested on two diverse parents targeted for anthracnose disease. In addition, approximately 40X de novo whole genome sequence coverage was generated for each of the two genotypes, and a total of 18,584 and 15,952 genomic SSRs were identified for TDa 95/00328 and TDa 95-310, respectively. A custom-made pipeline resulted in the selection of 573 genomic SSRs common across the two genotypes, of which only eight failed, 478 being polymorphic and 62 monomorphic, indicating a polymorphic rate of 83.5%. Additionally, 288,505 high quality SNPs were also identified between these two genotypes. Genotyping by sequencing reads on these two genotypes also revealed 36,790 overlapping SNP positions that are distributed throughout the genome.
Our efforts in using different approaches

  11. Optimization of design and production strategies for novel adeno-associated viral display peptide libraries.

    PubMed

    Körbelin, J; Hunger, A; Alawi, M; Sieber, T; Binder, M; Trepel, M

    2017-08-01

    Libraries displaying random peptides on the surface of adeno-associated virus (AAV) are powerful tools for the generation of target-specific gene therapy vectors. However, for unknown reasons the success rate of AAV library screenings is variable and the influence of the production procedure has not been thoroughly evaluated. During library screenings, the capsid variants with the most favorable tropism are enriched over several selection rounds on a target of choice and identified by subsequent sequencing of the encapsidated viral genomes encoding the library capsids with targeting peptide insertions. Thus, a high capsid-genome correlation is crucial to obtain the correct information about the selected capsid variants. Producing AAV libraries by a two-step protocol with pseudotyped library transfer shuttles has been proposed as one way to ensure such a correlation. Here we show that AAV2 libraries produced by such a protocol via transfer shuttles display an unexpected additional bias in the amino-acid composition which confers increased heparin affinity and thus similarity to wildtype AAV2 tropism. This bias may fundamentally impair the intended use of AAV libraries, discouraging the use of transfer shuttles for the production of AAV libraries in the future.

  12. The highest-copy repeats are methylated in the small genome of the early divergent vascular plant Selaginella moellendorffii

    PubMed Central

    Chan, Agnes P; Melake-Berhan, Admasu; O'Brien, Kimberly; Buckley, Stephanie; Quan, Hui; Chen, Dan; Lewis, Matthew; Banks, Jo Ann; Rabinowicz, Pablo D

    2008-01-01

    Background The lycophyte Selaginella moellendorffii is a vascular plant that diverged from the fern/seed plant lineage at least 400 million years ago. Although genomic information for S. moellendorffii is starting to be produced, little is known about basic aspects of its molecular biology. In order to provide the first glimpse to the epigenetic landscape of this early divergent vascular plant, we used the methylation filtration technique. Methylation filtration genomic libraries select unmethylated DNA clones due to the presence of the methylation-dependent restriction endonuclease McrBC in the bacterial host. Results We conducted a characterization of the DNA methylation patterns of the S. moellendorffii genome by sequencing a set of S. moellendorffii shotgun genomic clones, along with a set of methylation filtered clones. Chloroplast DNA, which is typically unmethylated, was enriched in the filtered library relative to the shotgun library, showing that there is DNA methylation in the extremely small S. moellendorffii genome. The filtered library also showed enrichment in expressed and gene-like sequences, while the highest-copy repeats were largely under-represented in this library. These results show that genes and repeats are differentially methylated in the S. moellendorffii genome, as occurs in other plants studied. Conclusion Our results shed light on the genome methylation pattern in a member of a relatively unexplored plant lineage. The DNA methylation data reported here will help understanding the involvement of this epigenetic mark in fundamental biological processes, as well as the evolutionary aspects of epigenetics in land plants. PMID:18549478

  13. In vivo repackaging of recombinant cosmid molecules for analyses of Salmonella typhimurium, Streptococcus mutans, and mycobacterial genomic libraries.

    PubMed Central

    Jacobs, W R; Barrett, J F; Clark-Curtiss, J E; Curtiss, R

    1986-01-01

    Strains of Escherichia coli K-12 were constructed that permitted the amplification of in vitro-packaged recombinant cosmid-transducing particles by in vivo repackaging of recombinant cosmid molecules. Thermal induction of these thermoinducible, excision-defective lysogens containing recombinant cosmid molecules yielded high titers of packaged recombinant cosmids and low levels of PFU. These strains were used to amplify packaged recombinant cosmid libraries of Mycobacterium leprae, Mycobacterium vaccae, Salmonella typhimurium, and Streptococcus mutans DNA. Contiguous and noncontiguous libraries were compared for the successful identification of cloned genes. Construction of noncontiguous libraries allowed the dissociation of desired genes from genes that were deleterious to the survival of a cosmid recombinant and permitted selection for unlinked traits that resulted in a selected phenotype. In vivo repackaging of recombinant cosmids permitted amplification of the original in vitro-packaged collection of transducing particles, storage of cosmid libraries as phage lysates, facilitation of complementation screening, expression analysis of repackaged recombinant cosmids after UV-irradiated cells were infected, in situ enzyme or immunological screening, and facilitation of recovery of recombinant cosmid molecules containing transposon inserts. PMID:2937735

  14. Genome-scale deletion screening of human long non-coding RNAs using a paired-guide RNA CRISPR-Cas9 library.

    PubMed

    Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng

    2016-12-01

    CRISPR-Cas9 screens have been widely adopted to analyze coding-gene functions, but high-throughput screening of non-coding elements using this method is more challenging because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. We report a high-throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We validated 9 of 51 lncRNA hits using CRISPR-Cas9-mediated genomic deletion, functional rescue, CRISPR activation or inhibition and gene-expression profiling. Our high-throughput pgRNA genome deletion method will enable rapid identification of functional mammalian non-coding elements.

  15. A Prospective Virtual Screening Study: Enriching Hit Rates and Designing Focus Libraries To Find Inhibitors of PI3Kδ and PI3Kγ.

    PubMed

    Damm-Ganamet, Kelly L; Bembenek, Scott D; Venable, Jennifer W; Castro, Glenda G; Mangelschots, Lieve; Peeters, Daniëlle C G; Mcallister, Heather M; Edwards, James P; Disepio, Daniel; Mirzadegan, Taraneh

    2016-05-12

    Here, we report a high-throughput virtual screening (HTVS) study using phosphoinositide 3-kinase (both PI3Kγ and PI3Kδ). Our initial HTVS results of the Janssen corporate database identified small focused libraries with hit rates at 50% inhibition showing a 50-fold increase over those from a HTS (high-throughput screen). Further, applying constraints based on "chemically intuitive" hydrogen bonds and/or positional requirements resulted in a substantial improvement in the hit rates (versus no constraints) and reduced docking time. While we find that docking scoring functions are not capable of providing a reliable relative ranking of a set of compounds, a prioritization of groups of compounds (e.g., low, medium, and high) does emerge, which allows for the chemistry efforts to be quickly focused on the most viable candidates. Thus, this illustrates that it is not always necessary to have a high correlation between a computational score and the experimental data to impact the drug discovery process.

  16. High-throughput SNP discovery through deep resequencing of a reduced representation library to anchor and orient scaffolds in the soybean whole genome sequence

    PubMed Central

    2010-01-01

    Background The Soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy, but its marker density was only sufficient to properly orient 66% of the sequence scaffolds. The discovery and genetic mapping of more single nucleotide polymorphism (SNP) markers were needed to anchor and orient the remaining genome sequence. To that end, next generation sequencing and high-throughput genotyping were combined to obtain a much higher resolution genetic map that could be used to anchor and orient most of the remaining sequence and to help validate the integrity of the existing scaffold builds. Results A total of 7,108 to 25,047 predicted SNPs were discovered using a reduced representation library that was subsequently sequenced by the Illumina sequence-by-synthesis method on the clonal single molecule array platform. Using multiple SNP prediction methods, the validation rate of these SNPs ranged from 79% to 92.5%. A high resolution genetic map using 444 recombinant inbred lines was created with 1,790 SNP markers. Of the 1,790 mapped SNP markers, 1,240 markers had been selectively chosen to target existing unanchored or un-oriented sequence scaffolds, thereby increasing the amount of anchored sequence to 97%. Conclusion We have demonstrated how next generation sequencing was combined with high-throughput SNP detection assays to quickly discover large numbers of SNPs. Those SNPs were then used to create a high resolution genetic map that assisted in the assembly of scaffolds from the 8× whole genome shotgun sequences into pseudomolecules corresponding to chromosomes of the organism. PMID:20078886

  17. A nanobuffer reporter library for fine-scale imaging and perturbation of endocytic organelles | Office of Cancer Genomics

    Cancer.gov

    Endosomes, lysosomes and related catabolic organelles are a dynamic continuum of vacuolar structures that impact a number of cell physiological processes such as protein/lipid metabolism, nutrient sensing and cell survival. Here we develop a library of ultra-pH-sensitive fluorescent nanoparticles with chemical properties that allow fine-scale, multiplexed, spatio-temporal perturbation and quantification of catabolic organelle maturation at single organelle resolution to support quantitative investigation of these processes in living cells.

  18. Losing Libraries, Saving Libraries

    ERIC Educational Resources Information Center

    Miller, Rebecca

    2010-01-01

    This summer, as public libraries continued to get budget hit after budget hit across the country, several readers asked for a comprehensive picture of the ravages of the recession on library service. In partnership with 2010 Movers & Shakers Laura Solomon and Mandy Knapp, Ohio librarians who bought the Losing Libraries domain name,…

  19. Genomic resources for water yam (Dioscorea alata L.): analyses of EST-Sequences, De Novo sequencing and GBS libraries

    USDA-ARS's Scientific Manuscript database

    The reducing cost and rapid progress in next-generation sequencing techniques coupled with high performance computational approaches have resulted in large-scale discovery of advanced genomic resources such as SSRs, SNPs and InDels in several model and non-model plant species. Yam (Dioscorea spp.) i...

  1. Reexamining Content-Enriched Access: Its Effect on Usage and Discovery

    ERIC Educational Resources Information Center

    Tosaka, Yuji; Weng, Cathy

    2011-01-01

    Content-enriched metadata in bibliographic records is considered helpful to library users in identifying and selecting library materials for their needs. The paper presents a study, using circulation data from a medium-sized academic library, of the effect of content-enriched records on library materials usage. The study also examines OPAC search…

  3. Construction and Characterization of a Repetitive DNA Library in Parodontidae (Actinopterygii: Characiformes): A Genomic and Evolutionary Approach to the Degeneration of the W Sex Chromosome

    PubMed Central

    Oliveira, Jordana Inácio Nascimento; Nogaroto, Viviane; Almeida, Mara Cristina; Artoni, Roberto Ferreira; Cestari, Marta Margarete; Moreira-Filho, Orlando; Vicari, Marcelo Ricardo

    2014-01-01

    Abstract Repetitive DNA sequences, including tandem and dispersed repeats, comprise a large portion of eukaryotic genomes and are important for gene regulation, sex chromosome differentiation, and karyotype evolution. In Parodontidae, only the repetitive DNAs WAp and pPh2004 and rDNAs were previously studied using fluorescence in situ hybridization. This study aimed to build a library of repetitive DNA in Parodontidae. We isolated 40 clones using Cot-1; 17 of these clones exhibited similarity to repetitive DNA sequences, including satellites, minisatellites, microsatellites, and class I and class II transposable elements (TEs), from Danio rerio and other organisms. The physical mapping of the clones to chromosomes revealed the presence of a satellite DNA, a Helitron element, and degenerate short interspersed element (SINE), long interspersed element (LINE), and tc1-mariner elements on the sex chromosomes. Some clones exhibited dispersed signals; other sequences were not detected. The 5S rDNA was detected on an autosomal pair. These elements likely function in the molecular degeneration of the W chromosome in Parodontidae. Thus, the location of these elements on the chromosomes is important for understanding the function of these repetitive DNAs and for integrative studies with genome sequencing. The presented data demonstrate that an intensive invasion of TEs occurred during W sex chromosome differentiation in the Parodontidae. PMID:25122415

  4. Construction of a genomic library of the human cytomegalovirus genome and analysis of late transcription of its inverted internal repeat region

    SciTech Connect

    Silva, K.F.S.T.

    1989-01-01

    The investigations described in this dissertation were designed to determine the transcriptionally active DNA sequences of the IIR region and to identify the viral mRNA transcribed from the transcriptionally most active DNA sequences of that region during the late phase of HCMV Towne infection. Preliminary transcriptional studies, which included the hybridization of a Southern blot of the XbaI-digested entire HCMV genome to ³²P-labelled late phase infected cell A⁺ RNA, indicated that late viral transcripts homologous to XbaI Q fragment of IIR region were very highly abundant while XbaI Q fragment showed a very low transcriptional activity. To facilitate further analysis of late transcription of the IIR region, the entire DNA sequences of the IIR region were molecularly cloned as U, S, and H BamHI fragments in the pACYC-184 plasmid vector. In addition, to be used in future studies on other regions of the genome, except for the smaller y and c′ fragments, the entire 240 kb HCMV genome was cloned as BamHI fragments in the same vector. Furthermore, the U, S, and H BamHI fragments were mapped with six other restriction enzymes in order to use that mapping data in subsequent transcriptional analysis of the IIR region. Further localization of transcriptionally active DNA sequences within the IIR region was achieved by hybridization of Southern blots of restricted U, S, and H BamHI fragments with 3′ ³²P-labelled infected cell late A⁺ RNA. The 1.5 kb EcoRI subfragments of the S BamHI fragment and the adjoining 0.72 kb XhoI subfragment of the H BamHI fragment revealed the highest level of transcription, although the remainder of the S fragment was also transcribed at a substantial level. The U fragment and the remainder of the H fragment were transcribed at a very low level.

  5. Identification of three wheat globulin genes by screening a Triticum aestivum BAC genomic library with cDNA from a diabetes-associated globulin

    PubMed Central

    Loit, Evelin; Melnyk, Charles W; MacFarlane, Amanda J; Scott, Fraser W; Altosaar, Illimar

    2009-01-01

    Background Exposure to dietary wheat proteins in genetically susceptible individuals has been associated with increased risk for the development of Type 1 diabetes (T1D). Recently, a wheat protein encoded by cDNA WP5212 has been shown to be antigenic in mice, rats and humans with autoimmune T1D. To investigate the genomic origin of the identified wheat protein cDNA, a hexaploid wheat genomic library from Glenlea cultivar was screened. Results Three unique wheat globulin genes, Glo-3A, Glo-3B and Glo-3C, were identified. We describe the genomic structure of these genes and their expression pattern in wheat seeds. The Glo-3A gene shared 99% identity with the cDNA of WP5212 at the nucleotide and deduced amino acid level, indicating that we have identified the gene(s) encoding wheat protein WP5212. Southern analysis revealed the presence of multiple copies of Glo-3-like sequences in all wheat samples, including hexaploid, tetraploid and diploid species wheat seed. Aleurone and embryo tissue specificity of WP5212 gene expression, suggested by promoter region analysis, which demonstrated an absence of endosperm specific cis elements, was confirmed by immunofluorescence microscopy using anti-WP5212 antibodies. Conclusion Taken together, the results indicate that a diverse group of globulins exists in wheat, some of which could be associated with the pathogenesis of T1D in some susceptible individuals. These data expand our knowledge of specific wheat globulins and will enable further elucidation of their role in wheat biology and human health. PMID:19615078

  6. Elucidation of the Photorhabdus temperata Genome and Generation of a Transposon Mutant Library To Identify Motility Mutants Altered in Pathogenesis

    PubMed Central

    Hurst, Sheldon; Rowedder, Holli; Michaels, Brandye; Bullock, Hannah; Jackobeck, Ryan; Abebe-Akele, Feseha; Durakovic, Umjia; Gately, Jon; Janicki, Erik

    2015-01-01

    ABSTRACT The entomopathogenic nematode Heterorhabditis bacteriophora forms a specific mutualistic association with its bacterial partner Photorhabdus temperata. The microbial symbiont is required for nematode growth and development, and symbiont recognition is strain specific. The aim of this study was to sequence the genome of P. temperata and identify genes that play a role in the pathogenesis of the Photorhabdus-Heterorhabditis symbiosis. A draft genome sequence of P. temperata strain NC19 was generated. The 5.2-Mb genome was organized into 17 scaffolds and contained 4,808 coding sequences (CDS). A genetic approach was also pursued to identify mutants with altered motility. A bank of 10,000 P. temperata transposon mutants was generated and screened for altered motility patterns. Five classes of motility mutants were identified: (i) nonmotile mutants, (ii) mutants with defective or aberrant swimming motility, (iii) mutant swimmers that do not require NaCl or KCl, (iv) hyperswimmer mutants that swim at an accelerated rate, and (v) hyperswarmer mutants that are able to swarm on the surface of 1.25% agar. The transposon insertion sites for these mutants were identified and used to investigate other physiological properties, including insect pathogenesis. The motility-defective mutant P13-7 had an insertion in the RNase II gene and showed reduced virulence and production of extracellular factors. Genetic complementation of this mutant restored wild-type activity. These results demonstrate a role for RNA turnover in insect pathogenesis and other physiological functions. IMPORTANCE The relationship between Photorhabdus and the entomopathogenic nematode Heterorhabditis represents a well-known mutualistic system that has potential as a biological control agent. The elucidation of the genome of the bacterial partner and the role that RNase II plays in its life cycle has provided a greater understanding of Photorhabdus as both an insect pathogen and a nematode symbiont. PMID

  7. Resistance gene enrichment sequencing (RenSeq) enables reannotation of the NB-LRR gene family from sequenced plant genomes and rapid mapping of resistance loci in segregating populations.

    PubMed

    Jupe, Florian; Witek, Kamil; Verweij, Walter; Sliwka, Jadwiga; Pritchard, Leighton; Etherington, Graham J; Maclean, Dan; Cock, Peter J; Leggett, Richard M; Bryan, Glenn J; Cardle, Linda; Hein, Ingo; Jones, Jonathan D G

    2013-11-01

    RenSeq is a NB-LRR (nucleotide binding-site leucine-rich repeat) gene-targeted, Resistance gene enrichment and sequencing method that enables discovery and annotation of pathogen resistance gene family members in plant genome sequences. We successfully applied RenSeq to the sequenced potato Solanum tuberosum clone DM, and increased the number of identified NB-LRRs from 438 to 755. The majority of these identified R gene loci reside in poorly or previously unannotated regions of the genome. Sequence and positional details on the 12 chromosomes have been established for 704 NB-LRRs and can be accessed through a genome browser that we provide. We compared these NB-LRR genes and the corresponding oligonucleotide baits with the highest sequence similarity and demonstrated that ~80% sequence identity is sufficient for enrichment. Analysis of the sequenced tomato S. lycopersicum 'Heinz 1706' extended the NB-LRR complement to 394 loci. We further describe a methodology that applies RenSeq to rapidly identify molecular markers that co-segregate with a pathogen resistance trait of interest. In two independent segregating populations involving the wild Solanum species S. berthaultii (Rpi-ber2) and S. ruiz-ceballosii (Rpi-rzc1), we were able to apply RenSeq successfully to identify markers that co-segregate with resistance towards the late blight pathogen Phytophthora infestans. These SNP identification workflows were designed as easy-to-adapt Galaxy pipelines.

  8. Strategies for complete plastid genome sequencing.

    PubMed

    Twyford, Alex D; Ness, Rob W

    2016-10-28

    Plastid sequencing is an essential tool in the study of plant evolution. This high-copy organelle is one of the most technically accessible regions of the genome, and its sequence conservation makes it a valuable region for comparative genome evolution, phylogenetic analysis and population studies. Here, we discuss recent innovations and approaches for de novo plastid assembly that harness genomic tools. We focus on technical developments including low-cost sequence library preparation approaches for genome skimming, enrichment via hybrid baits and methylation-sensitive capture, sequence platforms with higher read outputs and longer read lengths, and automated tools for assembly. These developments allow for a much more streamlined assembly than via conventional short-range PCR. Although newer methods make complete plastid sequencing possible for any land plant or green alga, there are still challenges for producing finished plastomes particularly from herbarium material or from structurally divergent plastids such as those of parasitic plants.

  9. Aquaculture Genomics

    USDA-ARS's Scientific Manuscript database

    The genomics chapter covers the basics of genome mapping and sequencing and the current status of several relevant species. The chapter briefly describes the development and use of (cDNA, BAC, etc.) libraries for mapping and obtaining specific sequence information. Other topics include comparative ...

  10. Comparative genome analysis of Lactococcus garvieae using a suppression subtractive hybridization library: discovery of novel DNA signatures.

    PubMed

    Kim, Wonyong; Park, Hee Kuk; Thanh, Hien Dang; Lee, Bo-Young; Shin, Jong Wook; Shin, Hyoung-Shik

    2011-12-01

    Lactococcus garvieae, the pathogenic species in the genus Lactococcus, is recognized as an emerging pathogen in fish, animals, and humans. Despite the widespread distribution and emerging clinical significance of L. garvieae, little is known about the genomic content of this microorganism. Suppression subtractive hybridization was performed to identify the genomic differences between L. garvieae and Lactococcus lactis ssp. lactis, its closest phylogenetic neighbor, and the type species of the genus Lactococcus. Twenty-seven clones were specific to L. garvieae and were highly different from Lactococcus lactis in their nucleotide and protein sequences. Lactococcus garvieae primer sets were subsequently designed for two of these clones corresponding to a pyrH gene and a novel DNA signature for application in the specific detection of L. garvieae. The primer specificities were evaluated relative to three previously described 16S rRNA gene-targeted methods using 32 Lactococcus and closely related strains. Both newly designed primer sets were highly specific to L. garvieae and performed better than did the existing primers. Our findings may be useful for developing more stable and accurate tools for the discrimination of L. garvieae from other closely related species.

  11. Progress in the characterization of a human genomic YAC library selected on the basis of homology to T2AG3

    SciTech Connect

    Vocero-Akbani, A.; Sanjurjo, H.; Fair, K.

    1994-09-01

    Using a combination of physical and genetic mapping methods we have characterized more than 190 YAC clones originally isolated on the basis of hybridization to the human telomere regions by FISH (using Alu-PCR products or YAC subclones individually or pooled as probes). Thirty-seven of the YACs mapped to single telomeres while 16 mapped to more than one telomere, or to interstitial regions, including centromeres. Subclone libraries were constructed for a subset of YACs, genetic markers developed, and the loci incorporated into genetic maps for chromosomes 2, 6, 7, 8, 10, 12, 13, 14 and 20. Altogether 28 different telomeres are now defined by chromosomally mapped STSs which were derived from YACs that were FISH mapped to the termini of 1p, 2p*, 2q+, 3p, 3q, 4q, 5q, 6q*, 7p, 7q*+, 8p+, 9q, 10p*, 10q, 11q, 12p*, 13q*+, 14q*+, 16p, 16q, 17p, 17q, 18p, 18q, 20p, 21q, and 22q (* microsatellite marker, + RFLP). Development of microsatellite genetic markers for the five additional telomeres is currently in progress [7p (50 b), 10q (275 kb), 17p (100 kb), 17q (175 kb), and 18p (225 kb)]. For YACs that have been localized to telomeres by FISH and to chromosomes by STS mapping to a rodent/human somatic cell hybrid chromosome panel, five genome equivalent bacteriophage lambda subclone libraries have been constructed and screened for the presence of human DNA and (CA)n dinucleotide repeats by plaque filter hybridization. A number of CA positive clones have been sequenced, revealing simple repeats of 12 or more CAs per clone. STS development and testing for polymorphism using the CEPH pedigree resource is in progress.
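
    The record above describes finding CA dinucleotide repeats by hybridization and then sequencing the positive clones. Once a clone has been sequenced, the same "12 or more CAs" criterion can be checked computationally. The following is only a minimal sketch under stated assumptions: the function name find_ca_repeats and the example sequence are hypothetical, only uninterrupted CA runs on the given strand are reported, and nothing here reproduces the authors' actual procedure.

```python
import re


def find_ca_repeats(seq, min_units=12):
    """Return (start, end, units) for each uninterrupted run of CA dinucleotides.

    Hypothetical helper for illustration: coordinates are 0-based, end-exclusive,
    and only the strand as given is scanned (GT runs on the opposite strand are
    not reported).
    """
    pattern = re.compile(r"(?:CA){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(seq.upper()):
        units = (match.end() - match.start()) // 2
        hits.append((match.start(), match.end(), units))
    return hits


if __name__ == "__main__":
    # Made-up clone sequence: 14 CA units flanked by unrelated bases.
    clone = "TTGA" + "CA" * 14 + "GGTC"
    print(find_ca_repeats(clone))
```

    Scanning the reverse complement as well would be a natural extension if strand orientation of the clone insert is unknown.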

  12. The draft genome sequence of the ascomycete fungus Penicillium subrubescens reveals a highly enriched content of plant biomass related CAZymes compared to related fungi.

    PubMed

    Peng, Mao; Dilokpimol, Adiphol; Mäkelä, Miia R; Hildén, Kristiina; Bervoets, Sander; Riley, Robert; Grigoriev, Igor V; Hainaut, Matthieu; Henrissat, Bernard; de Vries, Ronald P; Granchi, Zoraide

    2017-03-20

    Here we report the genome sequence of the ascomycete saprobic fungus Penicillium subrubescens FBCC1632/CBS132785 isolated from a Jerusalem artichoke field in Finland. The 39.75 Mb genome containing 14,188 gene models is highly similar to that reported for other Penicillium species, but contains a significantly higher number of putative carbohydrate active enzyme (CAZyme) encoding genes.

  13. The genome of the generalist plant pathogenic fungus Fusarium avenaceum is enriched with genes involved in redox, signaling and secondary metabolism

    USDA-ARS's Scientific Manuscript database

    Fusarium avenaceum is a fungus commonly isolated from soil and with a wide range of host plants. We present here three genome sequences of F. avenaceum, one isolated from barley in Finland and two from spring and winter wheat in Canada. The physical sizes of the three genomes range from 41.6-43.2 MB...

  14. G-quadruplex (G4) motifs in the maize (Zea mays L.) genome are enriched at specific locations in thousands of genes coupled to energy status, hypoxia, low sugar, and nutrient deprivation.

    PubMed

    Andorf, Carson M; Kopylov, Mykhailo; Dobbs, Drena; Koch, Karen E; Stroupe, M Elizabeth; Lawrence, Carolyn J; Bass, Hank W

    2014-12-20

    The G-quadruplex (G4) elements comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The reversibility and structural diversity of G4s make them highly versatile genetic structures, as demonstrated by their roles in various functions including telomere metabolism, genome maintenance, immunoglobulin gene diversification, transcription, and translation. Sequence motifs capable of forming G4 DNA are typically located in telomere repeat DNA and other non-telomeric genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 non-telomeric G4 motifs in maize (Zea mays L., B73 AGPv2), 29% of which were in non-repetitive genomic regions. G4 motif hotspots exhibited non-random enrichment in genes at two locations on the antisense strand, one in the 5' UTR and the other at the 5' end of the first intron. Several genic G4 motifs were shown to adopt sequence-specific and potassium-dependent G4 DNA structures in vitro. The G4 motifs were prevalent in key regulatory genes associated with hypoxia (group VII ERFs), oxidative stress (DJ-1/GATase1), and energy status (AMPK/SnRK) pathways. They also showed statistical enrichment for genes in metabolic pathways that function in glycolysis, sugar degradation, inositol metabolism, and base excision repair. Collectively, the maize G4 motifs may represent conditional regulatory elements that can aid in energy status gene responses. Such a network of elements could provide a mechanistic basis for linking energy status signals to gene regulation in maize, a model genetic system and major world crop species for feed, food, and fuel.
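
    The abstract above reports that G4 motifs were identified computationally but does not give the search pattern. A minimal sketch of one common way to do such a scan follows. It assumes the widely used canonical motif of four runs of at least three guanines separated by loops of 1 to 7 nucleotides; that pattern, the function names, and the example sequence are assumptions for illustration, not the authors' published parameters.

```python
import re

# Assumed canonical G4 motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (not taken from the paper).
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping G4 motif hits.

    Coordinates are 0-based, end-exclusive, and always expressed on the forward
    strand, so a '-' hit marks a C-rich stretch in the input sequence.
    """
    seq = seq.upper()
    length = len(seq)
    hits = []
    for match in G4_PATTERN.finditer(seq):
        hits.append((match.start(), match.end(), "+"))
    for match in G4_PATTERN.finditer(reverse_complement(seq)):
        # Map reverse-strand coordinates back onto the forward strand.
        hits.append((length - match.end(), length - match.start(), "-"))
    return sorted(hits)


if __name__ == "__main__":
    example = "AT" + "GGGAGGGTGGGCGGG" + "TT"  # made-up sequence
    print(find_g4_motifs(example))
```

    Reporting the strand matters for this record, because the enrichment described in the abstract is specific to the antisense strand of genes.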

  15. A bacterial genome in transition - an exceptional enrichment of IS elements but lack of evidence for recent transposition in the symbiont Amoebophilus asiaticus

    PubMed Central

    2011-01-01

    Background Insertion sequence (IS) elements are important mediators of genome plasticity and are widespread among bacterial and archaeal genomes. The 1.88 Mbp genome of the obligate intracellular amoeba symbiont Amoebophilus asiaticus contains an unusually large number of transposase genes (n = 354; 23% of all genes). Results The transposase genes in the A. asiaticus genome can be assigned to 16 different IS elements termed ISCaa1 to ISCaa16, which are represented by 2 to 24 full-length copies, respectively. Despite this high IS element load, the A. asiaticus genome displays a GC skew pattern typical for most bacterial genomes, indicating that no major rearrangements have occurred recently. Additionally, the high sequence divergence of some IS elements, the high number of truncated IS element copies (n = 143), as well as the absence of direct repeats in most IS elements suggest that the IS elements of A. asiaticus are transpositionally inactive. Although we could show transcription of 13 IS elements, we did not find experimental evidence for transpositional activity, corroborating our results from sequence analyses. However, we detected contiguous transcripts between IS elements and their downstream genes at nine loci in the A. asiaticus genome, indicating that some IS elements influence the transcription of downstream genes, some of which might be important for host cell interaction. Conclusions Taken together, the IS elements in the A. asiaticus genome are currently in the process of degradation and largely represent reflections of the evolutionary past of A. asiaticus in which its genome was shaped by their activity. PMID:21943072
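
    The abstract above uses the GC skew pattern as evidence that no major rearrangements occurred recently. As a minimal sketch of how such a profile is usually computed, the code below evaluates the standard skew (G - C) / (G + C) in fixed windows and its running sum. The window size, function names, and toy sequence are illustrative assumptions, not the settings used in the study.

```python
from itertools import accumulate


def gc_skew(seq, window=1000, step=1000):
    """Return (G - C) / (G + C) for each full window along the sequence.

    Windows without any G or C are given a skew of 0.0. Trailing bases that
    do not fill a whole window are ignored.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative_gc_skew(skews):
    """Running sum of windowed skews; in many bacterial genomes its extremes
    lie near the replication origin and terminus."""
    return list(accumulate(skews))


if __name__ == "__main__":
    toy = "GGGGCCCC"  # made-up sequence, one G-rich and one C-rich window
    per_window = gc_skew(toy, window=4, step=4)
    print(per_window, cumulative_gc_skew(per_window))
```

    A genome with recent large inversions or insertions would typically show extra sign changes in the windowed skew rather than the single switch expected at origin and terminus.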

  16. Creating Library Spaces: Libraries 2040.

    ERIC Educational Resources Information Center

    Bruijnzeels, Rob

    This paper suggests that by 2004, the traditional public libraries will have ceased to exist and new, attractive future libraries will have taken their place. The Libraries 2040 project of the Netherlands is initiating seven different libraries of the future. The Brabant library is the "ultimate library of the future" for the Dutch…

  17. Methods for Selecting Phage Display Antibody Libraries.

    PubMed

    Jara-Acevedo, Ricardo; Diez, Paula; Gonzalez-Gonzalez, Maria; Degano, Rosa Maria; Ibarrola, Nieves; Gongora, Rafael; Orfao, Alberto; Fuentes, Manuel

    2016-01-01

    The selection process aims at the sequential enrichment of a phage antibody display library in clones that recognize the target of interest or antigen as the library undergoes successive rounds of selection. In this review, the selection methods most commonly used for phage display antibody libraries are comprehensively described.

  18. Project ENRICH.

    ERIC Educational Resources Information Center

    Gwaley, Elizabeth; And Others

    Project ENRICH was conceived in Beaver County, Pennsylvania, to: (1) identify preschool children with learning disabilities, and (2) to develop a program geared to the remediation of the learning disabilities within a school year, while allowing the child to be enrolled in a regular class situation for the following school year. Through…

  19. Job Enrichment

    ERIC Educational Resources Information Center

    Sanders, Rick

    1970-01-01

    Job enrichment means giving people more decision-making power, more responsibility, more grasp of the totality of the job, and a sense of their own importance in the company. This article presents evidence of the successful working of this approach (Donnelly Mirrors), and the lack of success with an opposing approach (General Motors). (NL)

  20. Motif enrichment tool.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/.
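
    The abstract above says MET identifies motifs that are statistically overrepresented near a gene set, without naming the test. One standard way to score that kind of overrepresentation is a one-sided hypergeometric test, sketched below. This is an assumption chosen for illustration and may differ from what MET computes; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, with_motif, gene_set_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population    -- total number of genes considered
    with_motif    -- genes in the population whose regulatory region has the motif
    gene_set_size -- size of the user's gene set
    overlap       -- genes in the gene set that have the motif
    """
    upper = min(with_motif, gene_set_size)
    total = comb(population, gene_set_size)
    tail = sum(
        comb(with_motif, k) * comb(population - with_motif, gene_set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Made-up counts: 20,000 genes, 500 carry the motif, and 12 of a
    # 100-gene set carry it.
    print(hypergeom_enrichment_pvalue(20000, 500, 100, 12))
```

    When many motifs are tested against the same gene set, the resulting p-values would normally need a multiple-testing correction before interpretation.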

  1. HGVA: the Human Genome Variation Archive

    PubMed Central

    Coll, Jacobo; Haimel, Matthias; Kandasamy, Swaathi; Tarraga, Joaquin; Furio-Tari, Pedro; Bari, Wasim; Bleda, Marta; Rueda, Antonio; Gräf, Stefan; Rendon, Augusto

    2017-01-01

    Abstract High-profile genomic variation projects like the 1000 Genomes project or the Exome Aggregation Consortium, are generating a wealth of human genomic variation knowledge which can be used as an essential reference for identifying disease-causing genotypes. However, accessing these data, contrasting the various studies and integrating those data in downstream analyses remains cumbersome. The Human Genome Variation Archive (HGVA) tackles these challenges and facilitates access to genomic data for key reference projects in a clean, fast and integrated fashion. HGVA provides an efficient and intuitive web-interface for easy data mining, a comprehensive RESTful API and client libraries in Python, Java and JavaScript for fast programmatic access to its knowledge base. HGVA calculates population frequencies for these projects and enriches their data with variant annotation provided by CellBase, a rich and fast annotation solution. HGVA serves as a proof-of-concept of the genome analysis developments being carried out by the University of Cambridge together with UK's 100 000 genomes project and the National Institute for Health Research BioResource Rare-Diseases, in particular, deploying open-source for Computational Biology (OpenCB) software platform for storing and analyzing massive genomic datasets. PMID:28535294

  2. Library Computing.

    ERIC Educational Resources Information Center

    Library Journal, 1985

    1985-01-01

    This special supplement to "Library Journal" and "School Library Journal" includes articles on technological dependency, promise of computers for reluctant readers, copyright and database downloading, access to neighborhood of Mister Rogers, library acquisitions, circulating personal computers, "microcomputeritis,"…

  3. Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum

    PubMed Central

    Grativol, Clícia; Regulski, Michael; Bertalan, Marcelo; McCombie, W. Richard; da Silva, Felipe Rodrigues; Neto, Adhemar Zerlotini; Vicentini, Renato; Farinelli, Laurent; Hemerly, Adriana Silva; Martienssen, Robert A.; Ferreira, Paulo Cavalcanti Gomes

    2015-01-01

    SUMMARY Many economically important crops have large and complex genomes, which hampers sequencing of their genomes by standard methods such as WGS. Large tracts of methylated repeats occur in plant genomes, interspersed with hypomethylated gene-rich regions. Gene enrichment strategies based on methylation profile offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration (MF) with McrBC digestion to enrich for euchromatic regions of the sugarcane genome. To verify the efficiency of MF and the assembly quality of sequences submitted to the gene-enrichment strategy, we have compared assemblies using MF and unfiltered (UF) libraries. MF allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5 times more scaffolds and 1.7 times more assembled Mb compared to unfiltered scaffolds. The coverage of sorghum CDS by MF scaffolds was at least 36% higher than by UF scaffolds. Using MF technology, we increased by 134X the coverage of genic regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds covering all genes in sugarcane BACs, 97.2% of sugarcane ESTs, 92.7% of sugarcane RNA-seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds encoding enzymes of the sucrose/starch pathway discovered 291 SNPs in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes were also identified in the MF scaffolds. The information achieved by the MF dataset provides a valuable tool for genomic research in the genus Saccharum and improvement of sugarcane as a biofuel crop. PMID:24773339

  4. Sugarcane genome sequencing by methylation filtration provides tools for genomic research in the genus Saccharum.

    PubMed

    Grativol, Clícia; Regulski, Michael; Bertalan, Marcelo; McCombie, W Richard; da Silva, Felipe Rodrigues; Zerlotini Neto, Adhemar; Vicentini, Renato; Farinelli, Laurent; Hemerly, Adriana Silva; Martienssen, Robert A; Ferreira, Paulo Cavalcanti Gomes

    2014-07-01

    Many economically important crops have large and complex genomes that hamper their sequencing by standard methods such as whole genome shotgun (WGS). Large tracts of methylated repeats occur in plant genomes that are interspersed by hypomethylated gene-rich regions. Gene-enrichment strategies based on methylation profiles offer an alternative to sequencing repetitive genomes. Here, we have applied methyl filtration with McrBC endonuclease digestion to enrich for euchromatic regions in the sugarcane genome. To verify the efficiency of methylation filtration and the assembly quality of sequences submitted to gene-enrichment strategy, we have compared assemblies using methyl-filtered (MF) and unfiltered (UF) libraries. The use of methyl filtration allowed a better assembly by filtering out 35% of the sugarcane genome and by producing 1.5× more scaffolds and 1.7× more assembled Mb in length compared with the unfiltered dataset. The coverage of sorghum coding sequences (CDS) by MF scaffolds was at least 36% higher than by the use of UF scaffolds. Using MF technology, we increased by 134× the coverage of gene regions of the monoploid sugarcane genome. The MF reads assembled into scaffolds that covered all genes of the sugarcane bacterial artificial chromosomes (BACs), 97.2% of sugarcane expressed sequence tags (ESTs), 92.7% of sugarcane RNA-seq reads and 98.4% of sorghum protein sequences. Analysis of MF scaffolds from encoded enzymes of the sucrose/starch pathway discovered 291 single-nucleotide polymorphisms (SNPs) in the wild sugarcane species, S. spontaneum and S. officinarum. A large number of microRNA genes was also identified in the MF scaffolds. The information achieved by the MF dataset provides a valuable tool for genomic research in the genus Saccharum and for improvement of sugarcane as a biofuel crop. © 2014 The Authors The Plant Journal © 2014 John Wiley & Sons Ltd.

  5. Pilot Sequencing of Onion Genomic DNA Reveals Fragments of Transposable Elements, Low Gene Densities, and Significant Gene Enrichment After Methyl Filtration

    USDA-ARS's Scientific Manuscript database

    Onion (Allium cepa) is a diploid (2n=2x=16) monocot with one of the largest nuclear genomes among cultivated plants, over 6 and 16 times that of maize and rice, respectively. In this study, we sequenced onion BACs to estimate gene densities and investigate the nature and distribution of repetitive ...

  6. Phylogenetic marker development for target enrichment from transcriptome and genome skim data: the pipeline and its application in southern African Oxalis (Oxalidaceae)

    Treesearch

    Roswitha Schmickl; Aaron Liston; Vojtěch Zeisek; Kenneth Oberlander; Kevin Weitemier; Shannon C. K. Straub; Richard C. Cronn; Léanne L. Dreyer; Jan. Suda

    2016-01-01

    Phylogenetics benefits from using a large number of putatively independent nuclear loci and their combination with other sources of information, such as the plastid and mitochondrial genomes. To facilitate the selection of orthologous low-copy nuclear (LCN) loci for phylogenetics in nonmodel organisms, we created an automated and interactive script to select hundreds...

  7. Screening for the interacting partners of the proteins MamK & MamJ by two-hybrid genomic DNA library of Magnetospirillum magneticum AMB-1.

    PubMed

    Pan, Weidong; Xie, Chunlan; Lv, Jing

    2012-06-01

    Magnetotactic bacteria are a group of prokaryotes capable of sensing and navigating along the earth's magnetic field. The linear alignment of magnetosomes, which acts as a compass needle for orientation, is dependent on the proteins MamJ (amb0964) & MamK (amb0965). We constructed Magnetospirillum magneticum AMB-1 two-hybrid DNA libraries by fusing the random genomic fragments of AMB-1 to the N-terminal domain of the α-subunit of RNA polymerase in vector pTRG and used them as prey. The genes mamJ & mamK were cloned in frame with the λ repressor protein (λ cI) in vector pBT and used as baits for screening the binding partners. After preliminary screening, we further confirmed the candidate interactions between selected protein pairs. The results showed that there were relatively strong interactions between MamK versus Amb3498 (flagella motor switch protein fliM), versus Amb0854 MCPs (signal domain of methyl-accepting chemotaxis protein) and versus Amb3568 (GGDEF domain-containing protein), respectively. MamJ versus Amb1722 (hypothetical protein), MamJ versus MamK, and MamK versus Amb1807 (cation transport ATPase) exhibited a low level of interaction. Although the TPR repeat protein MamA (amb0971) showed no interaction with either MamJ or MamK, the TPR repeat protein Amb0024 with more motif sequences exhibited relatively strong interaction with MamK. Among the identified proteins, all those categorized as signal transduction-related interacted only with MamK and not with MamJ, suggesting that magnetotaxis via MamK in Magnetospirillum magneticum AMB-1 might be linked to the widely accepted chemotaxis mechanism in bacteria.

  8. Cyclic AMP effectors in African trypanosomes revealed by genome-scale RNA interference library screening for resistance to the phosphodiesterase inhibitor CpdA.

    PubMed

    Gould, Matthew K; Bachmaier, Sabine; Ali, Juma A M; Alsford, Sam; Tagoe, Daniel N A; Munday, Jane C; Schnaufer, Achim C; Horn, David; Boshart, Michael; de Koning, Harry P

    2013-10-01

    One of the most promising new targets for trypanocidal drugs to emerge in recent years is the cyclic AMP (cAMP) phosphodiesterase (PDE) activity encoded by TbrPDEB1 and TbrPDEB2. These genes were genetically confirmed as essential, and a high-affinity inhibitor, CpdA, displays potent antitrypanosomal activity. To identify effectors of the elevated cAMP levels resulting from CpdA action and, consequently, potential sites for adaptations giving resistance to PDE inhibitors, resistance to the drug was induced. Selection of mutagenized trypanosomes resulted in resistance to CpdA as well as cross-resistance to membrane-permeable cAMP analogues but not to currently used trypanocidal drugs. Resistance was not due to changes in cAMP levels or in PDEB genes. A second approach, a genome-wide RNA interference (RNAi) library screen, returned four genes giving resistance to CpdA upon knockdown. Validation by independent RNAi strategies confirmed resistance to CpdA and suggested a role for the identified cAMP Response Proteins (CARPs) in cAMP action. CARP1 is unique to kinetoplastid parasites and has predicted cyclic nucleotide binding-like domains, and RNAi repression resulted in >100-fold resistance. CARP2 and CARP4 are hypothetical conserved proteins associated with the eukaryotic flagellar proteome or with flagellar function, with an orthologue of CARP4 implicated in human disease. CARP3 is a hypothetical protein, unique to Trypanosoma. CARP1 to CARP4 likely represent components of a novel cAMP signaling pathway in the parasite. As cAMP metabolism is validated as a drug target in Trypanosoma brucei, cAMP effectors highly divergent from the mammalian host, such as CARP1, lend themselves to further pharmacological development.

  9. Construction of BAC Libraries from Flow-Sorted Chromosomes.

    PubMed

    Šafář, Jan; Šimková, Hana; Doležel, Jaroslav

    2016-01-01

    Cloned DNA libraries in bacterial artificial chromosomes (BACs) are the most widely used form of large-insert DNA libraries. BAC libraries are typically represented by ordered clones derived from genomic DNA of a particular organism. In the case of large eukaryotic genomes, whole-genome libraries consist of a hundred thousand to a million clones, which makes their handling and screening a daunting task. The labor and cost of working with whole-genome libraries can be greatly reduced by constructing a library derived from a smaller part of the genome. Here we describe construction of BAC libraries from mitotic chromosomes purified by flow cytometric sorting. Chromosome-specific BAC libraries facilitate positional gene cloning, physical mapping, and sequencing in complex plant genomes.

  10. COLD-PCR amplification of bisulfite-converted DNA allows the enrichment and sequencing of rare un-methylated genomic regions.

    PubMed

    Castellanos-Rizaldos, Elena; Milbury, Coren A; Karatza, Elli; Chen, Clark C; Makrigiorgos, G Mike; Merewood, Anne

    2014-01-01

    Aberrant hypo-methylation of DNA is evident in a range of human diseases including cancer and diabetes. Development of sensitive assays capable of detecting traces of un-methylated DNA within methylated samples can be useful in several situations. Here we describe a new approach, fast-COLD-MS-PCR, which preferentially amplifies un-methylated DNA sequences. By employing an appropriate denaturation temperature during PCR of bisulfite-converted DNA, fast-COLD-MS-PCR enriches un-methylated DNA and enables differential melting analysis or bisulfite sequencing. Using methylation on the MGMT gene promoter as a model, it is shown that serial dilutions of controlled methylation samples lead to the reliable sequencing of un-methylated sequences down to 0.05% un-methylated-to-methylated DNA. Screening of clinical glioma tumor and infant blood samples demonstrated that the degree of enrichment of un-methylated over methylated DNA can be modulated by the choice of denaturation temperature, providing a convenient method for analysis of partially methylated DNA or for revealing and sequencing traces of un-methylated DNA. Fast-COLD-MS-PCR can be useful for the detection of loss of methylation/imprinting in cancer, diabetes or diet-related methylation changes.

  11. Genome distribution of replication-independent histone H1 variants shows H1.0 associated with nucleolar domains and H1X associated with RNA polymerase II-enriched regions.

    PubMed

    Mayor, Regina; Izquierdo-Bouldstridge, Andrea; Millán-Ariño, Lluís; Bustillos, Alberto; Sampaio, Cristina; Luque, Neus; Jordan, Albert

    2015-03-20

    Unlike core histones, the linker histone H1 family is more evolutionarily diverse, and many organisms have multiple H1 variants or subtypes. In mammals, the H1 family includes seven somatic H1 variants; H1.1 to H1.5 are expressed in a replication-dependent manner, whereas H1.0 and H1X are replication-independent. Using ChIP-sequencing data and cell fractionation, we have compared the genomic distribution of H1.0 and H1X in human breast cancer cells, in which we previously observed differential distribution of H1.2 compared with the other subtypes. We have found H1.0 to be enriched at nucleolus-associated DNA repeats and chromatin domains, whereas H1X is associated with coding regions, RNA polymerase II-enriched regions, and hypomethylated CpG islands. Further, H1X accumulates within constitutive or included exons and retained introns and toward the 3' end of expressed genes. Inducible H1X knockdown does not affect cell proliferation but dysregulates a subset of genes related to cell movement and transport. In H1X-depleted cells, the promoters of up-regulated genes are not occupied specifically by this variant, have a lower than average H1 content, and, unexpectedly, do not form an H1 valley upon induction. We conclude that H1 variants are not distributed evenly across the genome and may participate with some specificity in chromatin domain organization or gene regulation. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  12. Genome Distribution of Replication-independent Histone H1 Variants Shows H1.0 Associated with Nucleolar Domains and H1X Associated with RNA Polymerase II-enriched Regions

    PubMed Central

    Mayor, Regina; Izquierdo-Bouldstridge, Andrea; Millán-Ariño, Lluís; Bustillos, Alberto; Sampaio, Cristina; Luque, Neus; Jordan, Albert

    2015-01-01

    Unlike core histones, the linker histone H1 family is more evolutionarily diverse, and many organisms have multiple H1 variants or subtypes. In mammals, the H1 family includes seven somatic H1 variants; H1.1 to H1.5 are expressed in a replication-dependent manner, whereas H1.0 and H1X are replication-independent. Using ChIP-sequencing data and cell fractionation, we have compared the genomic distribution of H1.0 and H1X in human breast cancer cells, in which we previously observed differential distribution of H1.2 compared with the other subtypes. We have found H1.0 to be enriched at nucleolus-associated DNA repeats and chromatin domains, whereas H1X is associated with coding regions, RNA polymerase II-enriched regions, and hypomethylated CpG islands. Further, H1X accumulates within constitutive or included exons and retained introns and toward the 3′ end of expressed genes. Inducible H1X knockdown does not affect cell proliferation but dysregulates a subset of genes related to cell movement and transport. In H1X-depleted cells, the promoters of up-regulated genes are not occupied specifically by this variant, have a lower than average H1 content, and, unexpectedly, do not form an H1 valley upon induction. We conclude that H1 variants are not distributed evenly across the genome and may participate with some specificity in chromatin domain organization or gene regulation. PMID:25645921

  13. Library 2000.

    ERIC Educational Resources Information Center

    Drake, Miriam A.

    In fall 1984, the Georgia Institute of Technology administration and library staff began planning for Library 2000, a project aimed at creating a showcase library to demonstrate the application of the latest information technology in an academic and research environment. The purposes of Library 2000 include: increasing awareness of students,…

  14. Library Buildings.

    ERIC Educational Resources Information Center

    Manley, Will; And Others

    1989-01-01

    The innovative designs of three libraries are described: the Tempe (Arizona) Public Library, which emphasizes services for children and students; an underground library at Park College, Missouri; and a public library located in the Vancouver (Washington) Mall. The fourth article describes the work going on to restore the Los Angeles (California)…

  15. Library Computing.

    ERIC Educational Resources Information Center

    Dayall, Susan A.; And Others

    1987-01-01

    Six articles on computers in libraries discuss training librarians and staff to use new software; appropriate technology; system upgrades of the Research Libraries Group's information system; pre-IBM PC microcomputers; multiuser systems for small to medium-sized libraries; and a library user's view of the traditional card catalog. (EM)

  16. Special Libraries.

    ERIC Educational Resources Information Center

    Foskett, D. J.

    The Special Library is distinguished from other libraries as being a library serving a particular group of readers, who have an existence as a group outside of their readership of the library, and whose members direct at least some of their activities towards a common purpose. Thus, the special librarian's first and major responsibility is to know…

  17. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains.

  18. Helitrons shaping the genomic architecture of Drosophila: enrichment of DINE-TR1 in α- and β-heterochromatin, satellite DNA emergence, and piRNA expression.

    PubMed

    Dias, Guilherme B; Heringer, Pedro; Svartman, Marta; Kuhn, Gustavo C S

    2015-09-01

    Drosophila INterspersed Elements (DINEs) constitute an abundant but poorly understood group of Helitrons present in several Drosophila species. The general structure of DINEs includes two conserved blocks that may or may not contain a region with tandem repeats in between. These central tandem repeats (CTRs) are similar within species but highly divergent between species. It has been assumed that CTRs have independent origins. Herein, we identify a subset of DINEs, termed DINE-TR1, which contain homologous CTRs of approximately 150 bp. We found DINE-TR1 in the sequenced genomes of several Drosophila species and in Bactrocera tryoni (Acalyptratae, Diptera). However, interspecific high sequence identity (∼88%) is limited to the first ∼30 bp of each tandem repeat, implying that evolutionary constraints operate differently over the monomer length. DINE-TR1 is unevenly distributed across the Drosophila phylogeny. Nevertheless, sequence analysis suggests vertical transmission. We found that CTRs within DINE-TR1 have independently expanded into satellite DNA-like arrays at least twice within Drosophila. By analyzing the genome of Drosophila virilis and Drosophila americana, we show that DINE-TR1 is highly abundant in pericentromeric heterochromatin boundaries, some telomeric regions and in the Y chromosome. It is also present in the centromeric region of one autosome from D. virilis and dispersed throughout several euchromatic sites in both species. We further found that DINE-TR1 is abundant at piRNA clusters, and small DINE-TR1-derived RNA transcripts (∼25 nt) are predominantly expressed in the testes and the ovaries, suggesting active targeting by the piRNA machinery. These features suggest potential piRNA-mediated regulatory roles for DINEs at local and genome-wide scales in Drosophila.

  19. Adventures in the enormous: a 1.8 million clone BAC library for the 21.7 Gb genome of loblolly pine

    Treesearch

    Zenaida V. Magbanua; Seval Ozkan; Benjamin D. Bartlett; Philippe Chouvarine; Christopher A. Saski; Aaron Liston; Richard C. Cronn; C. Dana Nelson; Daniel G. Peterson

    2011-01-01

    Loblolly pine (LP; Pinus taeda L.) is the most economically important tree in the U.S. and a cornerstone species in southeastern forests. However, genomics research on LP and other conifers has lagged behind studies on flowering plants due, in part, to the large size of conifer genomes. As a means to accelerate conifer genome research, we...

  20. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update.

    PubMed

    Kuleshov, Maxim V; Jones, Matthew R; Rouillard, Andrew D; Fernandez, Nicolas F; Duan, Qiaonan; Wang, Zichen; Koplev, Simon; Jenkins, Sherry L; Jagodnik, Kathleen M; Lachmann, Alexander; McDermott, Michael G; Monteiro, Caroline D; Gundersen, Gregory W; Ma'ayan, Avi

    2016-07-08

    Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to one of the tools in this domain called Enrichr. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download. In total, Enrichr currently contains 180 184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr including the ability to submit fuzzy sets, upload BED files, improved application programming interface and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.
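
    The abstract above does not say how Enrichr scores a gene set, so the following is only a generic illustration of enrichment analysis: a minimal over-representation sketch that ranks gene sets by an upper-tail hypergeometric p-value. The function names, the toy library, and the universe size are all hypothetical and are not part of the Enrichr API.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    X is the number of query genes that fall in a gene set of `set_size`
    genes when `query_size` genes are drawn from `universe_size` genes.
    """
    max_overlap = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_gene_sets(query, library, universe_size):
    """Return (name, overlap, p-value) tuples sorted by ascending p-value.

    `library` maps a gene-set name to an iterable of gene symbols
    (a hypothetical stand-in for one gene set library).
    """
    query = set(query)
    results = []
    for name, genes in library.items():
        genes = set(genes)
        overlap = len(query & genes)
        p = hypergeom_enrichment_p(universe_size, len(genes), len(query), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_library = {
        "set_A": ["G1", "G2", "G3", "G4", "G5"],
        "set_B": ["G6", "G7", "G8", "G9", "G10"],
    }
    for row in rank_gene_sets(["G1", "G2", "G3"], toy_library, universe_size=10):
        print(row)
```

    In practice the p-values would also need correction for the number of gene sets tested; that step is omitted here.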

  1. Comparative analyses of stress-responsive genes in Arabidopsis thaliana: insight from genomic data mining, functional enrichment, pathway analysis and phenomics.

    PubMed

    Naika, Mahantesha; Shameer, Khader; Sowdhamini, Ramanathan

    2013-07-01

    Biotic and abiotic stresses adversely affect agriculture by reducing crop growth and productivity worldwide. To investigate the abiotic stress-responsive genes in Arabidopsis thaliana, we compiled a dataset of stress signals and differentially upregulated genes (≥2.5-fold change) from Stress-responsive transcription Factors DataBase (STIFDB) with an additional set of stress signals and genes curated from PubMed and Gene Expression Omnibus. A dataset of 3091 genes differentially upregulated due to 14 different stress signals (abscisic acid, aluminum, cold, cold-drought-salt, dehydration, drought, heat, iron, light, NaCl, osmotic stress, oxidative stress, UV-B and wounding) was curated and used for the analysis. Details about stress-responsive enriched genes and their association with stress signals can be obtained from the STIFDB2 database. The gene-stress-signal data were analyzed using an enrichment-based meta-analysis framework consisting of two different ontologies (Gene Ontology and Plant Ontology), biological pathway and functional domain annotations. We found several shared and distinct biological processes, cellular components and molecular functions associated with stress-responsive genes. Pathway analysis revealed that stress-responsive genes perturbed the pathways under the "Metabolic pathways" category. We also found several shared and stress-signal specific protein domains, suggesting functional mechanisms regulating stress-response. Phenomic characteristics of abiotic stress-responsive genes were ascertained for several stresses and found to be shared by multiple stresses in both anatomy and temporal categories of Plant Ontology. We found several constitutive stress-responsive genes that are differentially upregulated due to perturbation of different stress signals, for example a gene (AT1G68440) involved in phenylpropanoid metabolism and polyamine catabolism as responsive to seven different stress signals. We also performed structure-function prediction ...

  2. A Microbiome DNA Enrichment Method for Next-Generation Sequencing Sample Preparation.

    PubMed

    Yigit, Erbay; Feehery, George R; Langhorst, Bradley W; Stewart, Fiona J; Dimalanta, Eileen T; Pradhan, Sriharsa; Slatko, Barton; Gardner, Andrew F; McFarland, James; Sumner, Christine; Davis, Theodore B

    2016-07-01

    "Microbiome" is used to describe the communities of microorganisms and their genes in a particular environment, including communities in association with a eukaryotic host or part of a host. One challenge in microbiome analysis concerns the presence of host DNA in samples. Removal of host DNA before sequencing results in greater sequence depth of the intended microbiome target population. This unit describes a novel method of microbial DNA enrichment in which methylated host DNA such as human genomic DNA is selectively bound and separated from microbial DNA before next-generation sequencing (NGS) library construction. This microbiome enrichment technique yields a higher fraction of microbial sequencing reads and improved read quality resulting in a reduced cost of downstream data generation and analysis. © 2016 by John Wiley & Sons, Inc.

  3. Overrepresentation of glutamate signaling in Alzheimer's disease: network-based pathway enrichment using meta-analysis of genome-wide association studies.

    PubMed

    Pérez-Palma, Eduardo; Bustos, Bernabé I; Villamán, Camilo F; Alarcón, Marcelo A; Avila, Miguel E; Ugarte, Giorgia D; Reyes, Ariel E; Opazo, Carlos; De Ferrari, Giancarlo V

    2014-01-01

    Genome-wide association studies (GWAS) have successfully identified several risk loci for Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (>248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks (SNs) involved in glutamate signaling were significantly overrepresented in AD (p<2.7×10⁻¹¹, p<1.9×10⁻¹¹; GW and GATES, respectively). Notably, glutamate signaling SNs were also found to be significantly overrepresented (p<5.1×10⁻⁸) in the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, which was used as a targeted replication sample. Interestingly, components of the glutamate signaling SNs are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.
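
    The two gene-level summaries named in the abstract can be contrasted with a small sketch. The gene-wise value follows the abstract's own definition (the single strongest SNP signal in a gene). The second function is the plain Simes combination, shown here as a simplified stand-in: the extended Simes (GATES) procedure additionally accounts for correlation among SNPs, which this sketch leaves out. The function names and p-values are hypothetical.

```python
def gene_wise_p(snp_pvalues):
    """Gene-wise p-value (GW): the smallest SNP p-value within the gene."""
    return min(snp_pvalues)


def simes_p(snp_pvalues):
    """Plain Simes combined p-value: min over ranks j of m * p_(j) / j.

    Assumption: no adjustment for linkage disequilibrium between SNPs,
    so this is not the full extended Simes (GATES) procedure.
    """
    ordered = sorted(snp_pvalues)
    m = len(ordered)
    return min(m * p / rank for rank, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    # Invented SNP p-values for one hypothetical gene.
    pvals = [0.04, 0.01, 0.5]
    print("gene-wise:", gene_wise_p(pvals))
    print("Simes:    ", simes_p(pvals))
```

    Because the Simes value scales the smallest p-value by the number of SNPs tested, it is never smaller than the gene-wise minimum, which is consistent with the abstract describing the gene-based p-value as the more stringent of the two.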

  4. Identification of a unique library of complex, but ordered, arrays of repetitive elements in the human genome and implication of their potential involvement in pathobiology.

    PubMed

    Lee, Kang-Hoon; Lee, Young-Kwan; Kwon, Deug-Nam; Chiu, Sophia; Chew, Victoria; Rah, Hyungchul; Kujawski, Gregory; Melhem, Ramzi; Hsu, Karen; Chung, Cecilia; Greenhalgh, David G; Cho, Kiho

    2011-06-01

    Approximately 2% of the human genome is reported to be occupied by genes. Various forms of repetitive elements (REs), both characterized and uncharacterized, are presumed to make up the vast majority of the rest of the genomes of human and other species. In conjunction with a comprehensive annotation of genes, information regarding components of genome biology, such as gene polymorphisms, non-coding RNAs, and certain REs, is found in human genome databases. However, the genome-wide profile of unique RE arrangements formed by different groups of REs has not been fully characterized yet. In this study, the entire human genome was subjected to an unbiased RE survey to establish a whole-genome profile of REs and their arrangements. Due to the limitation in query size within the bl2seq alignment program (National Center for Biotechnology Information [NCBI]) utilized for the RE survey, the entire NCBI reference human genome was fragmented into 6206 units of 0.5M nucleotides. A number of RE arrangements with varying complexities and patterns were identified throughout the genome. Each chromosome had unique profiles of RE arrangements and density, and high levels of RE density were measured near the centromere regions. Subsequently, 175 complex RE arrangements, which were selected throughout the genome, were subjected to a comparison analysis using five different human genome sequences. Interestingly, three of the five human genome databases shared exactly the same arrangement patterns and sequences for all 175 RE arrangement regions (a total of 12,765,625 nucleotides). The findings from this study demonstrate that a substantial fraction of REs in the human genome are clustered into various forms of ordered structures. Further investigations are needed to examine whether some of these ordered RE arrangements contribute to human pathobiology as a functional genome unit. Copyright © 2011 Elsevier Inc. All rights reserved.
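
    The fragmentation step described above (cutting the reference genome into units of 0.5M nucleotides to fit an alignment tool's query limit) can be sketched as follows. The abstract does not say how chromosome ends were handled, so this sketch assumes each chromosome is cut independently and the final, shorter unit is kept. The function name and the chromosome lengths are hypothetical.

```python
def fragment_coordinates(chrom_lengths, unit_size=500_000):
    """Split each chromosome into consecutive units of at most `unit_size`.

    Returns (chromosome, start, end) tuples with 0-based, end-exclusive
    coordinates. Assumption: a trailing partial unit is retained.
    """
    units = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, unit_size):
            end = min(start + unit_size, length)
            units.append((chrom, start, end))
    return units


if __name__ == "__main__":
    # Invented chromosome lengths, for illustration only.
    toy_genome = {"chrA": 1_200_000, "chrB": 500_000}
    for unit in fragment_coordinates(toy_genome):
        print(unit)
```

    Each coordinate triple can then be used to extract one query sequence for the pairwise alignment step.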

  5. Metaproteogenomic analysis of a sulfate-reducing enrichment culture reveals genomic organization of key enzymes in the m-xylene degradation pathway and metabolic activity of proteobacteria.

    PubMed

    Bozinovski, Dragana; Taubert, Martin; Kleinsteuber, Sabine; Richnow, Hans-Hermann; von Bergen, Martin; Vogt, Carsten; Seifert, Jana

    2014-10-01

    This study aimed to ascertain the functional and phylogenetic relationships within an m-xylene degrading sulfate-reducing enrichment culture, which had been maintained for several years in the laboratory with m-xylene as the sole source of carbon and energy. Previous studies indicated that a phylotype affiliated to the Desulfobacteraceae was the main m-xylene assimilating organism. In the present study, genes and gene products were identified by a metaproteogenomic approach using LC-MS/MS analysis of the microbial community, and 2426 peptides were identified from 576 proteins. In the metagenome of the community, gene clusters encoding enzymes involved in fumarate addition to a methyl moiety of m-xylene (nms, bss), as well as gene clusters coding for enzymes involved in modified beta-oxidation to (3-methyl)benzoyl-CoA (bns), were identified in two separate contigs. Additionally, gene clusters containing homologues to bam genes encoding benzoyl-CoA reductase (Bcr) class II, catalyzing the dearomatization of (3-methyl)benzoyl-CoA, were identified. Time-resolved protein stable isotope probing (protein-SIP) experiments using ¹³C-labeled m-xylene showed that the respective gene products were highly ¹³C-labeled. The present data suggested the identification of gene products that were similar to those involved in methylnaphthalene degradation even though the consortium was not capable of growing in the presence of naphthalene, methylnaphthalene or toluene as substrates. Thus, a novel branch of enzymes was found that was probably specific for anaerobic m-xylene degradation. Copyright © 2014 Elsevier GmbH. All rights reserved.

  6. Special Libraries

    ERIC Educational Resources Information Center

    Lavendel, Giuliana

    1977-01-01

    Discusses problems involved in maintaining special scientific or engineering libraries, including budget problems, remote storage locations, rental computer retrieval systems, protecting trade secrets, and establishing a magnetic tape library. (MLH)

  7. Library Buildings

    ERIC Educational Resources Information Center

    Allen, Walter C.

    1976-01-01

    Examines a century of library architecture in relation to the changing perceptions of library functions, the development of building techniques and materials, fluctuating esthetic fashions and sometimes wildly erratic economic climates. (Author)

  8. Genome and metagenome sequencing: Using the human methyl-binding domain to partition genomic DNA derived from plant tissues

    PubMed Central

    Yigit, Erbay; Hernandez, David I.; Trujillo, Joshua T.; Dimalanta, Eileen; Bailey, C. Donovan

    2014-01-01

    • Premise of the study: Variation in the distribution of methylated CpG (methyl-CpG) in genomic DNA (gDNA) across the tree of life is biologically interesting and useful in genomic studies. We illustrate the use of human methyl-CpG-binding domain (MBD2) to fractionate angiosperm DNA into eukaryotic nuclear (methyl-CpG-rich) vs. organellar and prokaryotic (methyl-CpG-poor) elements for genomic and metagenomic sequencing projects. • Methods: MBD2 has been used to enrich prokaryotic DNA in animal systems. Using gDNA from five model angiosperm species, we apply a similar approach to identify whether MBD2 can fractionate plant gDNA into methyl-CpG-depleted vs. enriched methyl-CpG elements. For each sample, three gDNA libraries were sequenced: (1) untreated gDNA, (2) a methyl-CpG-depleted fraction, and (3) a methyl-CpG-enriched fraction. • Results: Relative to untreated gDNA, the methyl-depleted libraries showed a 3.2–11.2-fold and 3.4–11.3-fold increase in chloroplast DNA (cpDNA) and mitochondrial DNA (mtDNA), respectively. Methyl-enriched fractions showed a 1.8–31.3-fold and 1.3–29.0-fold decrease in cpDNA and mtDNA, respectively. • Discussion: The application of MBD2 enabled fractionation of plant gDNA. The effectiveness was particularly striking for monocot gDNA (Poaceae). When sufficiently effective on a sample, this approach can increase the cost efficiency of sequencing plant genomes as well as prokaryotes living in or on plant tissues. PMID:25383266
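
    The fold increases and decreases reported above compare the share of organellar reads in a fractionated library with the share in the untreated library. A minimal sketch of that arithmetic follows; the function names and read counts are hypothetical and are not taken from the study.

```python
def read_fraction(target_reads, total_reads):
    """Fraction of a library's reads assigned to a target (e.g. cpDNA)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return target_reads / total_reads


def fold_increase(treated_fraction, untreated_fraction):
    """How many times larger the treated fraction is than the untreated one."""
    return treated_fraction / untreated_fraction


def fold_decrease(treated_fraction, untreated_fraction):
    """How many times smaller the treated fraction is than the untreated one."""
    return untreated_fraction / treated_fraction


if __name__ == "__main__":
    # Invented read counts for one hypothetical sample.
    untreated = read_fraction(5_000, 100_000)   # cpDNA share, untreated gDNA
    depleted = read_fraction(20_000, 100_000)   # methyl-CpG-depleted fraction
    enriched = read_fraction(1_000, 100_000)    # methyl-CpG-enriched fraction
    print("fold increase in depleted library:", fold_increase(depleted, untreated))
    print("fold decrease in enriched library:", fold_decrease(enriched, untreated))
```

    Working with fractions rather than raw counts keeps the comparison valid when the libraries were sequenced to different depths.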

  9. Genome and metagenome sequencing: Using the human methyl-binding domain to partition genomic DNA derived from plant tissues.

    PubMed

    Yigit, Erbay; Hernandez, David I; Trujillo, Joshua T; Dimalanta, Eileen; Bailey, C Donovan

    2014-11-01

    Variation in the distribution of methylated CpG (methyl-CpG) in genomic DNA (gDNA) across the tree of life is biologically interesting and useful in genomic studies. We illustrate the use of human methyl-CpG-binding domain (MBD2) to fractionate angiosperm DNA into eukaryotic nuclear (methyl-CpG-rich) vs. organellar and prokaryotic (methyl-CpG-poor) elements for genomic and metagenomic sequencing projects. MBD2 has been used to enrich prokaryotic DNA in animal systems. Using gDNA from five model angiosperm species, we apply a similar approach to identify whether MBD2 can fractionate plant gDNA into methyl-CpG-depleted vs. enriched methyl-CpG elements. For each sample, three gDNA libraries were sequenced: (1) untreated gDNA, (2) a methyl-CpG-depleted fraction, and (3) a methyl-CpG-enriched fraction. Relative to untreated gDNA, the methyl-depleted libraries showed a 3.2-11.2-fold and 3.4-11.3-fold increase in chloroplast DNA (cpDNA) and mitochondrial DNA (mtDNA), respectively. Methyl-enriched fractions showed a 1.8-31.3-fold and 1.3-29.0-fold decrease in cpDNA and mtDNA, respectively. The application of MBD2 enabled fractionation of plant gDNA. The effectiveness was particularly striking for monocot gDNA (Poaceae). When sufficiently effective on a sample, this approach can increase the cost efficiency of sequencing plant genomes as well as prokaryotes living in or on plant tissues.

  10. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences

    PubMed Central

    Moran, Sebastian; Arribas, Carles; Esteller, Manel

    2016-01-01

    Aim: DNA methylation is the best known epigenetic mark. Cancer and other pathologies show an altered DNA methylome. However, delivering complete DNA methylation maps is compromised by the price and labor-intensive interpretation of single nucleotide methods. Material & methods: Following the success of the HumanMethylation450 BeadChip (Infinium) methylation microarray (450K), we report the technical and biological validation of the newly developed MethylationEPIC BeadChip (Infinium) microarray that covers over 850,000 CpG methylation sites (850K). The 850K microarray contains >90% of the 450K sites, but adds 333,265 CpGs located in enhancer regions identified by the ENCODE and FANTOM5 projects. Results & conclusion: The 850K array demonstrates high reproducibility at the 450K CpG sites, is consistent among technical replicates, is reliable in the matched study of fresh frozen versus formalin-fixed paraffin-embedded samples and is also useful for 5-hydroxymethylcytosine. These results highlight the value of the MethylationEPIC BeadChip as a useful tool for the analysis of the DNA methylation profile of the human genome. PMID:26673039

  11. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences.

    PubMed

    Moran, Sebastian; Arribas, Carles; Esteller, Manel

    2016-03-01

    DNA methylation is the best known epigenetic mark. Cancer and other pathologies show an altered DNA methylome. However, delivering complete DNA methylation maps is compromised by the price and labor-intensive interpretation of single nucleotide methods. Following the success of the HumanMethylation450 BeadChip (Infinium) methylation microarray (450K), we report the technical and biological validation of the newly developed MethylationEPIC BeadChip (Infinium) microarray that covers over 850,000 CpG methylation sites (850K). The 850K microarray contains >90% of the 450K sites, but adds 333,265 CpGs located in enhancer regions identified by the ENCODE and FANTOM5 projects. The 850K array demonstrates high reproducibility at the 450K CpG sites, is consistent among technical replicates, is reliable in the matched study of fresh frozen versus formalin-fixed paraffin-embedded samples and is also useful for 5-hydroxymethylcytosine. These results highlight the value of the MethylationEPIC BeadChip as a useful tool for the analysis of the DNA methylation profile of the human genome.

  12. Library Skills.

    ERIC Educational Resources Information Center

    Paul, Karin; Kuhlthau, Carol C.; Branch, Jennifer L.; Solowan, Diane Galloway; Case, Roland; Abilock, Debbie; Eisenberg, Michael B.; Koechlin, Carol; Zwaan, Sandi; Hughes, Sandra; Low, Ann; Litch, Margaret; Lowry, Cindy; Irvine, Linda; Stimson, Margaret; Schlarb, Irene; Wilson, Janet; Warriner, Emily; Parsons, Les; Luongo-Orlando, Katherine; Hamilton, Donald

    2003-01-01

    Includes 19 articles that address issues related to library skills and Canadian school libraries. Topics include information literacy; inquiry learning; critical thinking and electronic research; collaborative inquiry; information skills and the Big 6 approach to problem solving; student use of online databases; library skills; Internet accuracy;…

  14. Library Computing.

    ERIC Educational Resources Information Center

    Goodgion, Laurel; And Others

    1986-01-01

    Eight articles in special supplement to "Library Journal" and "School Library Journal" cover a computer program called "Byte into Books"; microcomputers and the small library; creating databases with students; online searching with a microcomputer; quality automation software; Meckler Publishing Company's…

  15. Library Computing

    ERIC Educational Resources Information Center

    Library Computing, 1985

    1985-01-01

    Special supplement to "Library Journal" and "School Library Journal" covers topics of interest to school, public, academic, and special libraries planning for automation: microcomputer use, readings in automation, online searching, databases of microcomputer software, public access to microcomputers, circulation, creating a…

  16. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629

  17. Informative genomic microsatellite markers for efficient genotyping applications in sugarcane.

    PubMed

    Parida, Swarup K; Kalia, Sanjay K; Kaul, Sunita; Dalal, Vivek; Hemaprabha, G; Selvi, Athiappan; Pandit, Awadhesh; Singh, Archana; Gaikwad, Kishor; Sharma, Tilak R; Srivastava, Prem Shankar; Singh, Nagendra K; Mohapatra, Trilochan

    2009-01-01

    Genomic microsatellite markers are capable of revealing a high degree of polymorphism. Sugarcane (Saccharum sp.), which has a complex polyploid genome, requires a larger number of such informative markers for various applications in genetics and breeding. With the objective of generating a large set of microsatellite markers designated as Sugarcane Enriched Genomic MicroSatellite (SEGMS), 6,318 clones from genomic libraries of two hybrid sugarcane cultivars enriched with 18 different microsatellite repeat-motifs were sequenced to generate 4.16 Mb of high-quality sequence. Microsatellites were identified in 1,261 of the 5,742 non-redundant clones, which accounted for 22% enrichment of the libraries. Retro-transposon association was observed for 23.1% of the identified microsatellites. The utility of the microsatellite-containing genomic sequences was demonstrated by high primer designing potential (90%) and PCR amplification efficiency (87.4%). A total of 1,315 markers, including 567 class I microsatellite markers, were designed and placed in the public domain for unrestricted use. The level of polymorphism detected by these markers among sugarcane species, genera, and varieties was 88.6%, while the cross-transferability rate was 93.2% within the Saccharum complex and 25% to cereals. Cloning and sequencing of size-variant amplicons revealed that variation in the number of repeat units was the main source of SEGMS fragment length polymorphism. The high level of polymorphism and wide range of genetic diversity (0.16-0.82 with an average of 0.44) assayed with the SEGMS markers suggested their usefulness in various genotyping applications in sugarcane.
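
    As an illustration of the clone-screening step described in this abstract, the following is a minimal sketch of how perfect tandem repeats might be located in a clone sequence and how the reported 22% enrichment follows from the clone counts. The function name, the unit-length range, and the minimum repeat count are assumptions chosen for the example; the abstract does not state the authors' search criteria or software.

```python
import re

def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    Thresholds are illustrative assumptions, not the study's criteria.
    """
    # A lazily matched unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        # Skip homopolymer runs that would otherwise match as e.g. "AA" units.
        if len(set(unit)) > 1:
            hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Enrichment as reported: clones with microsatellites / non-redundant clones.
enrichment_percent = round(1261 / 5742 * 100)  # 22

print(find_microsatellites("TTGAGAGAGAGAGACC"))  # [(2, 'GA', 6)]
```

    A regular expression keeps the sketch short; it only finds perfect repeats, so interrupted or compound microsatellites would need a dedicated tool.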

  18. Direct Cloning from Enrichment Cultures, a Reliable Strategy for Isolation of Complete Operons and Genes from Microbial Consortia

    PubMed Central

    Entcheva, Plamena; Liebl, Wolfgang; Johann, Andre; Hartsch, Thomas; Streit, Wolfgang R.

    2001-01-01

    Enrichment cultures of microbial consortia enable the diverse metabolic and catabolic activities of these populations to be studied on a molecular level and to be explored as potential sources for biotechnology processes. We have used a combined approach of enrichment culture and direct cloning to construct cosmid libraries with large (>30-kb) inserts from microbial consortia. Enrichment cultures were inoculated with samples from five environments, and high amounts of avidin were added to the cultures to favor growth of biotin-producing microbes. DNA was extracted from three of these enrichment cultures and used to construct cosmid libraries; each library consisted of between 6,000 and 35,000 clones, with an average insert size of 30 to 40 kb. The inserts contained a diverse population of genomic DNA fragments isolated from the consortia organisms. These three libraries were used to complement the Escherichia coli biotin auxotrophic strain ATCC 33767 Δ(bio-uvrB). Initial screens resulted in the isolation of seven different complementing cosmid clones, carrying biotin biosynthesis operons. Biotin biosynthesis capabilities and growth under defined conditions of four of these clones were studied. Biotin measured in the different culture supernatants ranged from 42 to 3,800 pg/ml/optical density unit. Sequencing the identified biotin synthesis genes revealed high similarities to bio operons from gram-negative bacteria. In addition, random sequencing identified other interesting open reading frames, as well as two operons, the histidine utilization operon (hut), and the cluster of genes involved in biosynthesis of molybdopterin cofactors in bacteria (moaABCDE). PMID:11133432

  19. Direct selection: a method for the isolation of cDNAs encoded by large genomic regions.

    PubMed Central

    Lovett, M; Kere, J; Hinton, L M

    1991-01-01

    We have developed a strategy for the rapid enrichment and identification of cDNAs encoded by large genomic regions. The basis of this "direct selection" scheme is the hybridization of an entire library of cDNAs to an immobilized genomic clone. Nonspecific hybrids are eliminated and selected cDNAs are eluted. These molecules are then amplified and are either cloned or subjected to further selection/amplification cycles. This scheme was tested using a 550-kilobase yeast artificial chromosome clone that contains the EPO gene. Using this clone and a fetal kidney cDNA library, we have achieved a 1000-fold enrichment of EPO cDNAs in one cycle of enrichment. More significantly, we have further investigated one of the "anonymous" cDNAs that was selectively enriched. We confirmed that this cDNA was encoded by the yeast artificial chromosome. Its frequency in the starting library was 1 in 1 × 10^5 cDNAs and after selection comprised 2% of the selected library. DNA sequence analysis of this cDNA and of the yeast artificial chromosome clone revealed that this gene encodes the beta 2 subunit of the human guanine nucleotide-binding regulatory proteins. Restriction mapping and hybridization data position this gene (GNB2) to within 30-70 kilobases of the EPO gene. The selective isolation and mapping of GNB2 confirms the feasibility of this direct selection strategy and suggests that it will be useful for the rapid isolation of cDNAs, including disease-related genes, across extensive portions of the human genome. PMID:1946378
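
    The two frequencies quoted for the anonymous cDNA (1 in 10^5 before selection, 2% after) imply a fold enrichment that the abstract does not state explicitly. The sketch below simply works through that arithmetic; the function name is hypothetical and the result is a derived figure, not one reported by the authors.

```python
from fractions import Fraction

def fold_enrichment(freq_before, freq_after):
    """Fold enrichment = frequency after selection / frequency before."""
    return freq_after / freq_before

# Frequencies taken from the abstract; exact fractions avoid float rounding.
before = Fraction(1, 100_000)   # 1 in 1 x 10^5 cDNAs in the starting library
after = Fraction(2, 100)        # 2% of the selected library

gnb2_fold = fold_enrichment(before, after)
print(gnb2_fold)  # 2000
```

    On these figures the anonymous cDNA was enriched roughly 2000-fold, the same order of magnitude as the 1000-fold enrichment reported for EPO.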

  20. High-Throughput SNP Discovery through Deep Resequencing of a Reduced Representation Library to Anchor and Orient Scaffolds in the Soybean Whole Genome Sequence

    USDA-ARS's Scientific Manuscript database

    The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...

  1. Genome-wide annotation of mutations in a phenotyped mutant library provides an efficient platform for discovery of causal gene mutations

    USDA-ARS's Scientific Manuscript database

    Ethyl methanesulfonate (EMS) efficiently generates high-density mutations in genomes. Conventionally, these mutations are identified by techniques that can detect single-nucleotide mismatches in heteroduplexes of individual PCR amplicons. We applied whole-genome sequencing to 256-phenotyped mutant l...

  2. The National Cryptologic Museum Library

    DTIC Science & Technology

    2010-09-01

    Intelligence Service, books were collected wherever they could be found regardless of age or language. Thus the library has many rare and hard-to-find items...enriched last spring by the acquisition of the personal collection of the late Louis Kruh, a nationally known collector and colleague of Kahn. Among the

  3. Spectral gene set enrichment (SGSE).

    PubMed

    Frost, H Robert; Li, Zhigang; Moore, Jason H

    2015-03-03

    Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes the statistical association between gene sets and principal components (PCs) using our principal component gene set enrichment (PCGSE) method. The overall statistical association between each gene set and the spectral structure of the data is then computed by combining the PC-level p-values using the weighted Z-method with weights set to the PC variance scaled by Tracy-Widom test p-values. Using simulated data, we show that the SGSE algorithm can accurately recover spectral features from noisy data. To illustrate the utility of our method on real data, we demonstrate the superior performance of the SGSE method relative to standard cluster-based techniques for testing the association between MSigDB gene sets and the variance structure of microarray gene expression data. Unsupervised gene set testing can provide important information about the biological signal held in high-dimensional genomic data sets. Because it uses the association between gene sets and sample PCs to generate a measure of unsupervised enrichment, the SGSE method is independent of cluster or network creation algorithms and, most importantly, is able to utilize the statistical significance of PC eigenvalues to ignore elements of the data most likely to represent noise.
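
    The combining step named in this abstract, the weighted Z-method, can be sketched generically as follows. This is the standard weighted Stouffer combination for one-sided p-values, not the authors' SGSE implementation: the weights are passed in as plain numbers, whereas SGSE derives them from PC variances scaled by Tracy-Widom p-values, which this sketch does not compute. The function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def weighted_z(p_values, weights):
    """Combine one-sided p-values with the weighted Z (Stouffer) method.

    Each p-value must lie strictly between 0 and 1.
    """
    nd = NormalDist()  # standard normal
    # Convert each p-value to a Z-score (small p -> large positive Z).
    z_scores = [nd.inv_cdf(1.0 - p) for p in p_values]
    # Weighted sum, normalised so the result is again standard normal
    # under the null hypothesis (assuming independent tests).
    combined = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - nd.cdf(combined)

# Hypothetical PC-level p-values for one gene set, with hypothetical weights
# that give the first component the most influence.
combined_p = weighted_z([0.01, 0.20, 0.60], [3.0, 1.5, 0.5])
```

    Because the weights enter only through their relative sizes, components with weights near zero contribute almost nothing, which is how noisy PCs can be down-weighted.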

  4. Exome Capture with Heterologous Enrichment in Pig (Sus scrofa).

    PubMed

    Guiatti, Denis; Pomari, Elena; Radovic, Slobodanka; Spadotto, Alessandro; Stefanon, Bruno

    2015-01-01

    The discovery of new protein-coding DNA variants related to carcass traits is very important for the Italian pig industry, which requires heavy pigs with a greater thickness of subcutaneous fat for Protected Designation of Origin (PDO) productions. Exome capture techniques offer the opportunity to focus on the regions of DNA potentially related to gene and protein expression. In this research, a commercial human target enrichment kit was evaluated for its performance in pig exome capture and for the identification of DNA variants suitable for comparative analysis. Two pools of 30 pigs each, crosses of Italian Duroc X Large White (DU) and Commercial hybrid X Large White (HY), were used, and NGS libraries were prepared with the SureSelectXT Target Enrichment System for Illumina Paired-End Sequencing Library (Agilent). A total of 140.2 M and 162.5 M raw reads were generated for DU and HY, respectively. Average coverage of all the exonic regions for Sus scrofa (ENSEMBL Sus_scrofa.Sscrofa10.2.73.gtf) was 89.33X for DU and 97.56X for HY, and 35% of aligned bases uniquely mapped to off-target regions. Comparison of sequencing data with the Sscrofa10.2 reference genome, after applying hard filtering criteria, revealed a total of 232,530 single nucleotide variants (SNVs), of which 20.6% mapped in exonic regions and 49.5% within intronic regions. The comparison of allele frequencies of 213 randomly selected SNVs from exome sequencing and the same SNVs analyzed with a Sequenom MassARRAY® system confirms that this "human-on-pig" approach offers new potential for the identification of DNA variants in protein-coding genes.

  5. Libraries program

    USGS Publications Warehouse

    2011-01-01

    The U.S. Congress authorized a library for the U.S. Geological Survey (USGS) in 1879. The library was formally established in 1882 with the naming of the first librarian and began with a staff of three and a collection of 1,400 books. Today, the USGS Libraries Program is one of the world's largest Earth and natural science repositories and a resource of national significance used by researchers and the public worldwide.

  6. A Rapid Spin Column-Based Method to Enrich Pathogen Transcripts from Eukaryotic Host Cells Prior to Sequencing

    PubMed Central

    Bent, Zachary W.; Poorey, Kunal; LaBauve, Annette E.; Hamblin, Rachelle; Williams, Kelly P.

    2016-01-01

    When analyzing pathogen transcriptomes during the infection of host cells, the signal-to-background (pathogen-to-host) ratio of nucleic acids (NA) in infected samples is very small. Despite the advancements in next-generation sequencing, the minute amount of pathogen NA makes standard RNA-seq library preps inadequate for effective gene-level analysis of the pathogen in cases with low bacterial loads. In order to provide a more complete picture of the pathogen transcriptome during an infection, we developed a novel pathogen enrichment technique, which can enrich for transcripts from any cultivable bacteria or virus, using common, readily available laboratory equipment and reagents. To evenly enrich for pathogen transcripts, we generate biotinylated pathogen-targeted capture probes in an enzymatic process using the entire genome of the pathogen as a template. The capture probes are hybridized to a strand-specific cDNA library generated from an RNA sample. The biotinylated probes are captured on a monomeric avidin resin in a miniature spin column, and enriched pathogen-specific cDNA is eluted following a series of washes. To test this method, we performed an in vitro time-course infection using Klebsiella pneumoniae to infect murine macrophage cells. K. pneumoniae transcript enrichment efficiency was evaluated using RNA-seq. Bacterial transcripts were enriched up to ~400-fold, and allowed the recovery of transcripts from ~2000–3600 genes not observed in untreated control samples. These additional transcripts revealed interesting aspects of K. pneumoniae biology including the expression of putative virulence factors and the expression of several genes responsible for antibiotic resistance even in the absence of drugs. PMID:28002481

  7. A Rapid Spin Column-Based Method to Enrich Pathogen Transcripts from Eukaryotic Host Cells Prior to Sequencing

    DOE PAGES

    Bent, Zachary W.; Poorey, Kunal; LaBauve, Annette E.; ...

    2016-12-21

    When analyzing pathogen transcriptomes during the infection of host cells, the signal-to-background (pathogen-to-host) ratio of nucleic acids (NA) in infected samples is very small. Despite the advancements in next-generation sequencing, the minute amount of pathogen NA makes standard RNA-seq library preps inadequate for effective gene-level analysis of the pathogen in cases with low bacterial loads. In order to provide a more complete picture of the pathogen transcriptome during an infection, we developed a novel pathogen enrichment technique, which can enrich for transcripts f