Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng
2012-01-01
To provide a resource of sisal-specific expressed sequence data and to facilitate new gene research, normalized cDNA libraries enriched with full-length sequences are needed. Four libraries were produced from RNA pooled from multiple Agave sisalana tissues, using the SMART™ method and duplex-specific nuclease (DSN) to increase the efficiency of normalization and maximize the number of independent genes. This procedure preserved the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. Unigene functions were predicted by comparing their sequences to functional domain databases and were extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes identified four putative MADS-box genes and a knotted-like homeobox (knox) gene among a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
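The non-redundancy figure quoted above can be reproduced from the two counts in the abstract. The following is a minimal sketch, assuming non-redundancy is defined as unigenes divided by sequenced clones; the function name is illustrative and not taken from the paper.

```python
def non_redundancy(unigenes: int, clones: int) -> float:
    """Percentage of sequenced clones that represent distinct unigenes.

    Assumed definition: 100 * unigenes / clones.
    """
    if clones <= 0:
        raise ValueError("clones must be positive")
    return 100.0 * unigenes / clones


# Counts reported in the abstract: 3320 unigenes from 3875 sequenced clones.
sisal_non_redundancy = non_redundancy(unigenes=3320, clones=3875)
print(f"non-redundancy: {sisal_non_redundancy:.1f}%")
```

The same helper could be applied to any of the EST projects listed here that report both a clone count and a unigene count.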
Immune-Related Transcriptome of Coptotermes formosanus Shiraki Workers: The Defense Mechanism
Hussain, Abid; Li, Yi-Feng; Cheng, Yu; Liu, Yang; Chen, Chuan-Cheng; Wen, Shuo-Yang
2013-01-01
Formosan subterranean termites, Coptotermes formosanus Shiraki, live socially in microbial-rich habitats. To understand the molecular mechanism by which termites combat pathogenic microbes, a full-length normalized cDNA library and four Suppression Subtractive Hybridization (SSH) libraries were constructed from termite workers infected with entomopathogenic fungi (Metarhizium anisopliae and Beauveria bassiana), Gram-positive Bacillus thuringiensis and Gram-negative Escherichia coli, and the libraries were analyzed. From the high-quality normalized cDNA library, 439 immune-related sequences were identified. These sequences were categorized as pattern recognition receptors (47 sequences), signal modulators (52 sequences), signal transducers (137 sequences), effectors (39 sequences) and others (164 sequences). From the SSH libraries, 27, 17, 22 and 15 immune-related genes were identified from the libraries treated with M. anisopliae, B. bassiana, B. thuringiensis and E. coli, respectively. When the normalized cDNA library was compared with the SSH libraries, 37 immune-related clusters were found in common; 56 clusters were identified in the SSH libraries, and 259 were identified in the normalized cDNA library. The immune-related gene expression pattern was further investigated using quantitative real-time PCR (qPCR). Important immune-related genes were characterized, and their potential functions were discussed based on the integrated analysis of the results. We suggest that normalized cDNA and SSH libraries enable the discovery of functional genes in the transcriptome. The results remarkably expand our knowledge about immune-inducible genes in C. formosanus Shiraki and enable the future development of novel control strategies for the management of Formosan subterranean termites. PMID:23874972
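As a quick consistency check on the category breakdown above, the five reported counts can be tallied and expressed as shares of the total. This is only an illustrative sketch; the dictionary and variable names are assumptions, while the counts come from the abstract.

```python
# Immune-related sequence counts per category, as reported in the abstract.
immune_categories = {
    "pattern recognition receptors": 47,
    "signal modulators": 52,
    "signal transducers": 137,
    "effectors": 39,
    "others": 164,
}

# The categories should add up to the 439 sequences reported in total.
total_sequences = sum(immune_categories.values())

# Share of each category, in percent, rounded to one decimal place.
category_shares = {
    name: round(100.0 * count / total_sequences, 1)
    for name, count in immune_categories.items()
}

print("total immune-related sequences:", total_sequences)
for name, share in category_shares.items():
    print(f"{name}: {share}%")
```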
2011-01-01
Background: Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results: Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions: The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition, the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean. PMID:22118559
Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu
2015-01-01
The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.
Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J
2007-06-01
As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Bricheux, G; Brugerolle, G
1997-08-01
The parasitic protozoan Trichomonas vaginalis is known to contain the ubiquitous and highly conserved protein actin. A genomic library and a cDNA library have been screened to identify and clone the actin gene(s) of T. vaginalis. The nucleotide sequence of one gene and its flanking regions have been determined. The open reading frame encodes a protein of 376 amino acids. The sequence is not interrupted by any introns and the promoter could be represented by a 10 bp motif close to a consensus motif also found upstream of most sequenced T. vaginalis genes. The five different clones isolated from the cDNA library have similar sequences and encode three actin proteins differing only by one or two amino acids. A phylogenetic analysis of 31 actin sequences by distance matrix and parsimony methods, using centractin as outgroup, gives congruent trees with Parabasala branching above Diplomonadida.
Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried
2015-01-01
A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
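The idea of assembling a high-diversity full-length library from smaller sub-libraries can be illustrated with simple counting. The sketch below assumes a purely hypothetical design in which the 26 randomized codons are split across three sub-libraries and each codon admits four amino acids; neither the split nor the number of amino acids per position is given in the abstract.

```python
from math import prod


def theoretical_diversity(choices_per_codon):
    """Number of distinct variants if each codon is varied independently."""
    return prod(choices_per_codon)


# Hypothetical design (NOT from the paper): 26 randomized codons divided
# into three sub-libraries of 9, 9 and 8 positions, 4 amino acids each.
sub_library_designs = [[4] * 9, [4] * 9, [4] * 8]

# Diversity that has to be stored and propagated for each sub-library.
sub_library_sizes = [theoretical_diversity(d) for d in sub_library_designs]

# Combinatorial assembly multiplies the sub-library diversities.
full_length_diversity = prod(sub_library_sizes)

print("sub-library sizes:", sub_library_sizes)
print("full-length diversity:", full_length_diversity)
```

Under these assumed numbers, each archived sub-library holds at most a few hundred thousand variants, while the assembled full-length library spans their product, which is the practical appeal of on-demand assembly described in the abstract.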
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).
Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E
2005-12-02
cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
Jangid, Kamlesh; Kao, Ming-Hung; Lahamge, Aishwarya; Williams, Mark A.; Rathbun, Stephen L.; Whitman, William B.
2016-01-01
K-shuff is a new algorithm for comparing the similarity of gene sequence libraries, providing measures of the structural and compositional diversity as well as the significance of the differences between these measures. Inspired by Ripley’s K-function for spatial point pattern analysis, the Intra K-function or IKF measures the structural diversity, including both the richness and overall similarity of the sequences, within a library. The Cross K-function or CKF measures the compositional diversity between gene libraries, reflecting both the number of OTUs shared as well as the overall similarity in OTUs. A Monte Carlo testing procedure then enables statistical evaluation of both the structural and compositional diversity between gene libraries. For 16S rRNA gene libraries from complex bacterial communities such as those found in seawater, salt marsh sediments, and soils, K-shuff yields reproducible estimates of structural and compositional diversity with libraries greater than 50 sequences. Similarly, for pyrosequencing libraries generated from a glacial retreat chronosequence and Illumina® libraries generated from US homes, K-shuff required >300 and 100 sequences per sample, respectively. Power analyses demonstrated that K-shuff is sensitive to small differences in Sanger or Illumina® libraries. This extra sensitivity of K-shuff enabled examination of compositional differences at much deeper taxonomic levels, such as within abundant OTUs. This is especially useful when comparing communities that are compositionally very similar but functionally different. K-shuff will therefore prove beneficial for conventional microbiome analysis as well as specific hypothesis testing. PMID:27911946
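The Monte Carlo step described above can be illustrated with a generic label-shuffling test. The sketch below is not the K-shuff statistic: it swaps in a simpler measure (mean between-library distance minus mean within-library distance on aligned, equal-length sequences), and all function names and toy sequences are assumptions made for illustration.

```python
import random
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def separation(lib_a, lib_b):
    """Mean between-library distance minus mean within-library distance."""
    between = [p_distance(a, b) for a in lib_a for b in lib_b]
    within = [
        p_distance(a, b)
        for lib in (lib_a, lib_b)
        for a, b in combinations(lib, 2)
    ]
    return sum(between) / len(between) - sum(within) / len(within)


def shuffle_test(lib_a, lib_b, n_perm=999, seed=0):
    """Monte Carlo p-value from randomly reassigning sequences to libraries."""
    rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
    observed = separation(lib_a, lib_b)
    pooled = list(lib_a) + list(lib_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = separation(pooled[: len(lib_a)], pooled[len(lib_a):])
        if shuffled >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy aligned libraries (hypothetical data, four sequences each).
library_a = ["AAAAAAAA", "AAAAAAAT", "AAAAAATA", "AAAAATAA"]
library_b = ["CCCCCCCC", "CCCCCCCG", "CCCCCCGC", "CCCCCGCC"]

observed_separation, p_value = shuffle_test(library_a, library_b)
print("observed separation:", observed_separation)
print("Monte Carlo p-value:", p_value)
```

Each library needs at least two sequences so that a within-library distance exists. With only four sequences per library the attainable p-value is coarse, which is consistent with the abstract's point that a minimum library size is needed for reproducible estimates.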
Sequence evaluation of four specific cDNA libraries for developmental genomics of sunflower.
Tamborindeguy, C; Ben, C; Liboz, T; Gentzbittel, L
2004-04-01
Four different cDNA libraries were constructed from sunflower protoplasts growing under embryogenic and non-embryogenic conditions: one standard library from each condition and two subtractive libraries in opposite sense. A total of 22,876 cDNA clones were obtained and 4800 ESTs were sequenced, giving rise to 2479 high-quality ESTs representing a unigene set of 1502 sequences. This set was compared with ESTs represented in public databases using the programs BLASTN and BLASTX, and its members were classified according to putative function using the catalog in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Some 33% of sequences failed to align with existing plant ESTs and therefore represent putative novel genes. The libraries show a low level of redundancy and, on average, 50% of the present ESTs have not been previously reported for sunflower. Several potentially interesting genes were identified, based on their homology with genes involved in animal zygotic division or plant embryogenesis. We also identified two ESTs that show significantly different levels of expression under embryogenic and non-embryogenic conditions. The libraries described here represent an original and valuable resource for the discovery of as yet unknown genes putatively involved in dicot embryogenesis and for improving our knowledge of the mechanisms involved in polarity acquisition by plant embryos.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gihring, Thomas; Green, Stefan; Schadt, Christopher Warren
2011-01-01
Technologies for massively parallel sequencing are revolutionizing microbial ecology and are vastly increasing the scale of ribosomal RNA (rRNA) gene studies. Although pyrosequencing has increased the breadth and depth of possible rRNA gene sampling, one drawback is that the number of reads obtained per sample is difficult to control. Pyrosequencing libraries typically vary widely in the number of sequences per sample, even within individual studies, and there is a need to revisit the behaviour of richness estimators and diversity indices with variable gene sequence library sizes. Multiple reports and review papers have demonstrated the bias in non-parametric richness estimators (e.g. Chao1 and ACE) and diversity indices when using clone libraries. However, we found that biased community comparisons are accumulating in the literature. Here we demonstrate the effects of sample size on Chao1, ACE, CatchAll, Shannon, Chao-Shen and Simpson's estimations specifically using pyrosequencing libraries. The need to equalize the number of reads being compared across libraries is reiterated, and investigators are directed towards available tools for making unbiased diversity comparisons.
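Equalizing read counts before comparing libraries can be sketched as follows. This is a minimal illustration, assuming the bias-corrected form of Chao1 and simple random subsampling without replacement; the OTU counts are hypothetical and the function names are not taken from any tool mentioned in the abstract.

```python
import math
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon diversity index (natural log) from per-OTU read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement; return new OTU counts."""
    reads = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(reads):
        raise ValueError("depth exceeds the number of reads in the library")
    return list(Counter(rng.sample(reads, depth)).values())


# Hypothetical per-OTU read counts for two libraries of unequal size.
deep_library = [400, 250, 120, 60, 30, 12, 6, 3, 2, 2, 1, 1, 1, 1]
shallow_library = [40, 22, 15, 7, 3, 2, 1, 1]

# Compare both libraries at the size of the smaller one.
depth = min(sum(deep_library), sum(shallow_library))
rng = random.Random(42)  # fixed seed so the subsample is repeatable
deep_equalized = rarefy(deep_library, depth, rng)

print("reads compared per library:", depth)
print("Chao1 deep (raw):", chao1(deep_library))
print("Chao1 deep (equalized):", chao1(deep_equalized))
print("Chao1 shallow:", chao1(shallow_library))
print("Shannon deep (equalized):", shannon(deep_equalized))
print("Shannon shallow:", shannon(shallow_library))
```

In practice the subsampling would usually be repeated many times and the estimates averaged, since a single random draw is itself noisy.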
Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine
2002-06-15
To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40 year old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full length cDNA clones. Approximately 2000 reads from the unamplified library yields 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced and a similar number of reads from this library (fs) yields 1455 unique groups of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflect the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.
Kerschner, Joseph E; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J Christopher; Ehrlich, Garth D
2010-04-01
We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis.
Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.
2010-01-01
Objectives We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription–polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028
The Essential Genome of Escherichia coli K-12
2018-01-01
ABSTRACT Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. PMID:29463657
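One common way to summarize transposon insertion data per gene is an insertion index, the number of distinct insertion sites divided by gene length. The sketch below assumes that definition and uses a fixed cutoff purely for illustration; the gene coordinates, insertion positions, and threshold are hypothetical, and the genome-length correction mentioned in the abstract is not modeled here.

```python
def insertion_index(insertion_sites, gene_start, gene_end):
    """Distinct insertion sites within a gene, normalized by gene length."""
    gene_length = gene_end - gene_start + 1
    unique_hits = {
        pos for pos in insertion_sites if gene_start <= pos <= gene_end
    }
    return len(unique_hits) / gene_length


# Hypothetical gene coordinates (1-based, inclusive) and insertion positions.
genes = {"geneA": (1, 100), "geneB": (101, 300), "geneC": (301, 350)}
insertion_sites = [5, 20, 20, 50, 90, 310, 311, 340]

# Hypothetical cutoff; real analyses derive it from the index distribution.
THRESHOLD = 0.01

indices = {
    name: insertion_index(insertion_sites, start, end)
    for name, (start, end) in genes.items()
}

# Genes with very few insertions per base are candidate essential genes.
candidate_essential = sorted(
    name for name, value in indices.items() if value < THRESHOLD
)

print(indices)
print("candidate essential genes:", candidate_essential)
```

A per-gene index like this cannot reveal short essential regions within an otherwise dispensable gene, which is one of the features the abstract says required manual inspection of insertion profiles.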
Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry
2007-01-01
Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146
Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K
2008-01-01
Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. 
There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutively expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis to estimate the relative abundance (R) was used to compare the relative abundance of each gene's transcripts in each cDNA library. Thirty-six and forty-seven unique EST sequences with a threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies. Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut ESTs matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species. Conclusion The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community.
The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546. PMID:18248674
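The abstract above ranks genes by a relative-abundance statistic R but does not give its formula. As an illustration only, the sketch below assumes R is the log-likelihood ratio commonly used for EST counts across libraries, R = sum over libraries of x_i * ln(x_i / (N_i * f)), where x_i is the gene's EST count in library i, N_i is the library size, and f is the pooled frequency. The function name, the natural-log choice, and the example counts are assumptions, not taken from the study.

```python
import math

def r_statistic(counts, library_sizes):
    """Log-likelihood ratio R for one gene across several EST libraries.

    counts[i]        -- ESTs of this gene observed in library i
    library_sizes[i] -- total ESTs sequenced from library i
    Assumed form: R = sum_i x_i * ln(x_i / (N_i * f)), with f the pooled
    frequency of the gene over all libraries. Libraries with x_i = 0
    contribute nothing to the sum.
    """
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have equal length")
    total = sum(counts)
    if total == 0:
        return 0.0
    f = total / sum(library_sizes)  # expected frequency under the null
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:
            r += x * math.log(x / (n * f))
    return r

# Hypothetical gene: 10 ESTs in one library, none in an equally sized one.
print(r_statistic([10, 0], [1000, 1000]))
```

Under this assumed definition, a gene expressed evenly across libraries scores R = 0, and larger values indicate stronger departure from uniform expression, which is how thresholds such as R > 4 would be read.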
EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries
Smith, Robin P; Buchser, William J; Lemmon, Marcus B; Pardinas, Jose R; Bixby, John L; Lemmon, Vance P
2008-01-01
Background Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects. PMID:18402700
Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y
2004-05-01
Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.
Tomazetto, Geizecler; Wibberg, Daniel; Schlüter, Andreas; Oliveira, Valéria M
2015-01-01
A fosmid metagenomic library was constructed with total community DNA obtained from a municipal wastewater treatment plant (MWWTP), with the aim of identifying new FeFe-hydrogenase genes encoding the enzymes most important for hydrogen metabolism. The dataset generated by pyrosequencing of a fosmid library was mined to identify environmental gene tags (EGTs) assigned to FeFe-hydrogenase. The majority of EGTs representing FeFe-hydrogenase genes were affiliated with the class Clostridia, suggesting that this group is the main hydrogen producer in the MWWTP analyzed. Based on assembled sequences, three FeFe-hydrogenase genes were predicted based on detection of the L2 motif (MPCxxKxxE) in the encoded gene product, confirming true FeFe-hydrogenase sequences. These sequences were used to design specific primers to detect fosmids encoding FeFe-hydrogenase genes predicted from the dataset. Three identified fosmids were completely sequenced. The cloned genomic fragments within these fosmids are closely related to members of the Spirochaetaceae, Bacteroidales and Firmicutes, and their FeFe-hydrogenase sequences are characterized by the structure type M3, which is common to clostridial enzymes. FeFe-hydrogenase sequences found in this study represent hitherto undetected sequences, indicating the high genetic diversity regarding these enzymes in MWWTP. Results suggest that MWWTP have to be considered as reservoirs for new FeFe-hydrogenase genes. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
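The abstract above identifies candidate FeFe-hydrogenases by the L2 motif MPCxxKxxE in the encoded protein. A minimal sketch of such a motif scan follows, assuming that each "x" stands for any single residue; the function name and the toy sequences are hypothetical, and this is not the authors' pipeline.

```python
import re

# L2 motif from the abstract, with each "x" read as any one residue (assumption).
L2_MOTIF = re.compile(r"MPC..K..E")

def find_l2_motif(protein_seq):
    """Return 0-based start positions of L2 motif matches in a protein sequence."""
    return [m.start() for m in L2_MOTIF.finditer(protein_seq.upper())]

# Toy sequences for illustration only.
print(find_l2_motif("AAAMPCAAKGGEAAA"))  # motif present
print(find_l2_motif("AAAMPCAAKGG"))      # motif truncated, no match
```

A regular expression is enough here because the motif has fixed length and fixed spacing; a profile-based search would be the usual alternative for more degenerate signatures.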
Kamatuka, Kenta; Hattori, Masahiro; Sugiyama, Tomoyasu
2016-12-01
RNA interference (RNAi) screening is extensively used in the field of reverse genetics. RNAi libraries constructed using random oligonucleotides have made this technology affordable. However, the new methodology requires exploration of the RNAi target gene information after screening because the RNAi library includes non-natural sequences that are not found in genes. Here, we developed a web-based tool to support RNAi screening. The system performs short hairpin RNA (shRNA) target prediction that is informed by comprehensive enquiry (SPICE). SPICE automates several tasks that are laborious but indispensable to evaluate the shRNAs obtained by RNAi screening. SPICE has four main functions: (i) sequence identification of shRNA in the input sequence (the sequence might be obtained by sequencing clones in the RNAi library), (ii) searching the target genes in the database, (iii) demonstrating biological information obtained from the database, and (iv) preparation of search result files that can be utilized in a local personal computer (PC). Using this system, we demonstrated that genes targeted by random oligonucleotide-derived shRNAs were not different from those targeted by organism-specific shRNA. The system facilitates RNAi screening, which requires sequence analysis after screening. The SPICE web application is available at http://www.spice.sugysun.org/.
Designing oligo libraries taking alternative splicing into account
NASA Astrophysics Data System (ADS)
Shoshan, Avi; Grebinskiy, Vladimir; Magen, Avner; Scolnicov, Ariel; Fink, Eyal; Lehavi, David; Wasserman, Alon
2001-06-01
We have designed sequences for DNA microarrays and oligo libraries, taking alternative splicing into account. Alternative splicing is a common phenomenon, occurring in more than 25% of the human genes. In many cases, different splice variants have different functions, are expressed in different tissues or may indicate different stages of disease. When designing sequences for DNA microarrays or oligo libraries, it is very important to take into account the sequence information of all the mRNA transcripts. Therefore, when a gene has more than one transcript (as a result of alternative splicing, alternative promoter sites or alternative poly-adenylation sites), it is very important to take all of them into account in the design. We have used the LEADS transcriptome prediction system to cluster and assemble the human sequences in GenBank and design optimal oligonucleotides for all the human genes with a known mRNA sequence based on the LEADS predictions.
Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu
2007-01-01
Background Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion The cassava full-length cDNA library here presented contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061
Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee
2015-09-21
Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. With extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
2010-01-01
Background Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. 
The self-BLAST function within SSHdb grouped redundant clones together and illustrated that the SSHscreen plots are a useful tool for choosing anonymous clones for sequencing, since redundant clones cluster together on the enrichment ratio plots. Conclusions We developed the SSHscreen-SSHdb software pipeline, which greatly facilitates gene discovery using suppression subtractive hybridization by improving the selection of clones for sequencing after screening the library on a small number of microarrays. Annotation of the sequence information and collaboration was further enhanced through a web-based SSHdb database, and we illustrated this through identification of drought responsive genes from cowpea, which can now be investigated in gene function studies. SSH is a popular and powerful gene discovery tool, and therefore this pipeline will have application for gene discovery in any biological system, particularly non-model organisms. SSHscreen 2.0.1 and a link to SSHdb are available from http://microarray.up.ac.za/SSHscreen. PMID:20359330
Jiang, Likun; You, Weiwei; Zhang, Xiaojun; Xu, Jian; Jiang, Yanliang; Wang, Kai; Zhao, Zixia; Chen, Baohua; Zhao, Yunfeng; Mahboob, Shahid; Al-Ghanim, Khalid A; Ke, Caihuan; Xu, Peng
2016-02-01
The small abalone (Haliotis diversicolor) is one of the most important aquaculture species in East Asia. To facilitate gene cloning and characterization, genome analysis, and genetic breeding of this species, we constructed a large-insert bacterial artificial chromosome (BAC) library, which is an important genetic tool for advanced genetics and genomics research. The small abalone BAC library includes 92,610 clones with an average insert size of 120 Kb, equivalent to approximately 7.6× of the small abalone genome. We set up three-dimensional pools and super pools of 18,432 BAC clones for target gene screening using a PCR method. To assess the approach, we screened 12 target genes in these 18,432 BAC clones and identified 16 positive BAC clones. Eight positive BAC clones were then sequenced and assembled with the next generation sequencing platform. The assembled contigs representing these 8 BAC clones spanned 928 Kb of the small abalone genome, providing the first batch of genome sequences for genome evaluation and characterization. The average GC content of the small abalone genome was estimated as 40.33%. A total of 21 protein-coding genes, including 7 target genes, were annotated in the 8 BACs, which proved the feasibility of the PCR screening approach with three-dimensional pools in the small abalone BAC library. One hundred fifty microsatellite loci were also identified from the sequences for marker development in the future. The BAC library and clone pools provided valuable resources and tools for genetic breeding and conservation of H. diversicolor.
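The three-dimensional pooling mentioned above can be pictured with a short sketch. The abstract does not describe the pool layout, so the code assumes 48 plates of 384 wells (16 rows by 24 columns), which happens to give 18,432 clones; the layout, the names, and the example coordinates are all assumptions. Each clone belongs to one plate pool, one row pool, and one column pool, and a PCR-positive clone is located at the intersection of the positive pools.

```python
from itertools import product

# Assumed layout: 48 plates x 16 rows x 24 columns = 18,432 clones.
PLATES, ROWS, COLS = 48, 16, 24

def pools_for(plate, row, col):
    """Pools that contain the clone at (plate, row, col)."""
    return {("plate", plate), ("row", row), ("col", col)}

def candidates(positive_pools):
    """Clone coordinates consistent with a set of PCR-positive pools."""
    plates = sorted(i for kind, i in positive_pools if kind == "plate")
    rows = sorted(i for kind, i in positive_pools if kind == "row")
    cols = sorted(i for kind, i in positive_pools if kind == "col")
    return list(product(plates, rows, cols))

# One hypothetical positive clone decodes unambiguously.
print(candidates(pools_for(5, 3, 17)))

# Two positives sharing no coordinate give 2 x 2 x 2 candidates to re-test.
two = pools_for(5, 3, 17) | pools_for(9, 7, 2)
print(len(candidates(two)))
```

With this assumed layout, 48 + 16 + 24 = 88 pool PCRs stand in for 18,432 individual reactions per target, at the cost of follow-up PCRs whenever more than one clone is positive.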
2010-01-01
Background Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Kim, Heon Seok; Lee, Kyungjin; Bae, Sangsu; Park, Jeongbin; Lee, Chong-Kyo; Kim, Meehyein; Kim, Eunji; Kim, Minju; Kim, Seokjoong; Kim, Chonsaeng; Kim, Jin-Soo
2017-06-23
Several groups have used genome-wide libraries of lentiviruses encoding small guide RNAs (sgRNAs) for genetic screens. In most cases, sgRNA expression cassettes are integrated into cells by using lentiviruses, and target genes are statistically estimated by the readout of sgRNA sequences after targeted sequencing. We present a new virus-free method for human gene knockout screens using a genome-wide library of CRISPR/Cas9 sgRNAs based on plasmids and target gene identification via whole-genome sequencing (WGS) confirmation of authentic mutations rather than statistical estimation through targeted amplicon sequencing. We used 30,840 pairs of individually synthesized oligonucleotides to construct the genome-scale sgRNA library, collectively targeting 10,280 human genes (i.e., three sgRNAs per gene). These plasmid libraries were co-transfected with a Cas9-expression plasmid into human cells, which were then treated with cytotoxic drugs or viruses. Only cells lacking key factors essential for cytotoxic drug metabolism or viral infection were able to survive. Genomic DNA isolated from cells that survived these challenges was subjected to WGS to directly identify CRISPR/Cas9-mediated causal mutations essential for cell survival. With this approach, we were able to identify known and novel genes essential for viral infection in human cells. We propose that genome-wide sgRNA screens based on plasmids coupled with WGS are powerful tools for forward genetics studies and drug target discovery. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stapleton, Mark; Liao, Guochun; Brokstein, Peter
2002-08-12
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection Release 1 (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5′ expressed sequence tags (EST) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to ~40 percent of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5′ ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22 hr embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70 percent of the predicted genes in Drosophila.
Senthilkumar, Palanisamy; Thirugnanasambantham, Krishnaraj; Mandal, Abul Kalam Azad
2012-12-01
Tea (Camellia sinensis (L.) O. Kuntze) is an economically important plant cultivated for its leaves. Infection of leaves by Pestalotiopsis theae causes gray blight disease and enormous loss to the tea industry. We used the suppressive subtractive hybridization (SSH) technique to unravel the differential gene expression pattern during gray blight disease development in tea. Complementary DNA from P. theae-infected and uninfected leaves of the disease-tolerant cultivar UPASI-10 was used as tester and driver populations, respectively. Subtraction efficiency was confirmed by comparing abundance of the β-actin gene. A total of 377 and 720 clones with insert size >250 bp from the forward and reverse libraries, respectively, were sequenced and analyzed. Basic Local Alignment Search Tool analysis revealed that 17 sequences in the forward SSH library have a high degree of similarity with disease and hypersensitive response related genes and 20 sequences with hypothetical proteins, while in the reverse SSH library, 23 sequences have a high degree of similarity with disease and stress response-related genes and 15 sequences with hypothetical proteins. Functional analysis indicated unknown (61 and 59 %) or hypothetical functions (23 and 18 %) for most of the differentially regulated genes in the forward and reverse SSH libraries, respectively, while others have important roles in different cellular activities. The majority of the upregulated genes are related to hypersensitive response and reactive oxygen species production. Based on these expressed sequence tag data, putative roles of differentially expressed genes were discussed in relation to disease. We also demonstrated the efficiency of SSH as a tool in enriching gray blight disease related up- and downregulated genes in tea. The present study revealed that many genes related to disease resistance were suppressed during P. theae infection and enhancing these genes by the application of inducers may impart better disease tolerance to the plants.
Rojas-Cartagena, Carmencita; Ortíz-Pineda, Pablo; Ramírez-Gómez, Francisco; Suárez-Castillo, Edna C.; Matos-Cruz, Vanessa; Rodríguez, Carlos; Ortíz-Zuazaga, Humberto; García-Arrarás, José E.
2010-01-01
Repair and regeneration are key processes for tissue maintenance, and their disruption may lead to disease states. Little is known about the molecular mechanisms that underlie the repair and regeneration of the digestive tract. The sea cucumber Holothuria glaberrima represents an excellent model to dissect and characterize the molecular events during intestinal regeneration. To study the gene expression profile, cDNA libraries were constructed from normal, 3-day, and 7-day regenerating intestines of H. glaberrima. Clones were randomly sequenced and queried against the nonredundant protein database at the National Center for Biotechnology Information. RT-PCR analyses were made of several genes to determine their expression profile during intestinal regeneration. A total of 5,173 sequences from three cDNA libraries were obtained. About 46.2, 35.6, and 26.2% of the sequences for the normal, 3-day, and 7-day cDNA libraries, respectively, shared significant similarity with known sequences in the protein database of GenBank but showed only 10% similarity among them. Analysis of the libraries in terms of functional processes, protein domains, and most common sequences suggests that a differential expression profile is taking place during the regeneration process. Further examination of the expressed sequence tag dataset revealed that 12 putative genes are differentially expressed at a significant level (R > 6). Experimental validation by RT-PCR analysis reveals that at least three genes (unknown C-4677-1, melanotransferrin, and centaurin) present a differential expression during regeneration. These findings strongly suggest that the gene expression profile varies among regeneration stages and provide evidence for the existence of differential gene expression. PMID:17579180
Sequence analysis of 16S rRNA gene clone libraries is a popular tool used to describe the composition of natural microbial communities. Commonly, clone libraries are developed by direct cloning of 16S rRNA gene PCR products. Different primers are often employed in the initial amp...
Expressed sequence tag analysis of guinea pig (Cavia porcellus) eye tissues for NEIBank
Simpanya, Mukoma F.; Wistow, Graeme; Gao, James; David, Larry L.; Giblin, Frank J.
2008-01-01
Purpose To characterize gene expression patterns in guinea pig ocular tissues and identify orthologs of human genes from NEIBank expressed sequence tags. Methods RNA was extracted from dissected eye tissues of 2.5-month-old guinea pigs to make three unamplified and unnormalized cDNA libraries in the pCMVSport-6 vector for the lens, retina, and eye minus lens and retina. Over 4,000 clones were sequenced from each library and were analyzed using GRIST for clustering and gene identification. Lens crystallin EST data were validated using two-dimensional electrophoresis (2-DE), matrix assisted laser desorption (MALDI), and electrospray ionization mass spectrometry (ESIMS). Results Combined data from the three libraries generated a total of 6,694 distinctive gene clusters, with each library having between 1,000 and 3,000 clusters. Approximately 60% of the total gene clusters were novel cDNA sequences and had significant homologies to other mammalian sequences in GenBank. Complete cDNA sequences were obtained for many guinea pig lens proteins, including αA/αAinsert-, γN-, and γS-crystallins, lengsin and GRIFIN. The ratio of αA- to αB-crystallin on 2-DE gels was 8:1 in the lens nucleus and 6.5:1 in the cortex. Analysis of ESTs, genome sequence, and proteins (by MALDI), did not reveal any evidence for the presence of γD-, γE-, and γF-crystallin in the guinea pig. Predicted masses of many guinea pig lens crystallins were confirmed by ESIMS analysis. For the retina, orthologs of human phototransduction genes were found, such as Rhodopsin, S-antigen (Sag, Arrestin), and Transducin. The guinea-pig ortholog of NRL, a key rod photoreceptor-specific transcription factor, was also represented in EST data. In the ‘rest-of-eye’ library, the most abundant transcripts included decorin and keratin 12, representative of the cornea. Conclusions Genomic analysis of guinea pig eye tissues provides sequence-verified clones for future studies.
Guinea pig orthologs of many human eye specific genes were identified. Guinea pig gene structures were similar to their human and rodent gene counterparts. Surprisingly, no orthologs of γD-, γE-, and γF-crystallin were found in EST, proteomic, or the current guinea pig genome data. PMID:19104676
A part toolbox to tune genetic expression in Bacillus subtilis
Guiziou, Sarah; Sauveplane, Vincent; Chang, Hung-Ju; Clerté, Caroline; Declerck, Nathalie; Jules, Matthieu; Bonnet, Jerome
2016-01-01
Libraries of well-characterised components regulating gene expression levels are essential to many synthetic biology applications. While widely available for the Gram-negative model bacterium Escherichia coli, such libraries are lacking for the Gram-positive model Bacillus subtilis, a key organism for basic research and biotechnological applications. Here, we engineered a genetic toolbox comprising libraries of promoters, Ribosome Binding Sites (RBS), and protein degradation tags to precisely tune gene expression in B. subtilis. We first designed a modular Expression Operating Unit (EOU) facilitating parts assembly and modifications and providing a standard genetic context for gene circuits implementation. We then selected native, constitutive promoters of B. subtilis and efficient RBS sequences from which we engineered three promoters and three RBS sequence libraries exhibiting ∼14 000-fold dynamic range in gene expression levels. We also designed a collection of SsrA proteolysis tags of variable strength. Finally, by using fluorescence fluctuation methods coupled with two-photon microscopy, we quantified the absolute concentration of GFP in a subset of strains from the library. Our complete promoters and RBS sequences library comprising over 135 constructs enables tuning of GFP concentration over five orders of magnitude, from 0.05 to 700 μM. This toolbox of regulatory components will support many research and engineering applications in B. subtilis. PMID:27402159
Steinberg, Lisa M; Regan, John M
2008-11-01
Methanogens play a critical role in the decomposition of organics under anaerobic conditions. The methanogenic consortia in saturated wetland soils are often subjected to large temperature fluctuations and acidic conditions, imposing a selective pressure for psychro- and acidotolerant community members; however, methanogenic communities in engineered digesters are frequently maintained within a narrow range of mesophilic and circumneutral conditions to retain system stability. To investigate the hypothesis that these two disparate environments have distinct methanogenic communities, the methanogens in an oligotrophic acidic fen and a mesophilic anaerobic digester treating municipal wastewater sludge were characterized by creating clone libraries for the 16S rRNA and methyl coenzyme M reductase alpha subunit (mcrA) genes. A quantitative framework was developed to assess the differences between these two communities by calculating the average sequence similarity for 16S rRNA genes and mcrA within a genus and family using sequences of isolated and characterized methanogens within the approved methanogen taxonomy. The average sequence similarities for 16S rRNA genes within a genus and family were 96.0 and 93.5%, respectively, and the average sequence similarities for mcrA within a genus and family were 88.9 and 79%, respectively. The clone libraries of the bog and digester environments showed no overlap at the species level and almost no overlap at the family level. Both libraries were dominated by clones related to uncultured methanogen groups within the Methanomicrobiales, although members of the Methanosarcinales and Methanobacteriales were also found in both libraries. Diversity indices for the 16S rRNA gene library of the bog and both mcrA libraries were similar, but these indices indicated much lower diversity in the 16S digester library than in the other three libraries.
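The quantitative framework above rests on average sequence similarity within a genus or family. The abstract does not say how similarity was computed, so the sketch below makes a simple assumption: sequences are already aligned to equal length, similarity is percent identity over columns where neither sequence has a gap, and the group value is the mean over all pairs. Function names and the toy sequences are hypothetical.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)

def mean_within_group_identity(aligned_seqs):
    """Mean pairwise percent identity over all pairs in one taxonomic group."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    scores = [percent_identity(a, b) for a, b in combinations(aligned_seqs, 2)]
    return sum(scores) / len(scores)

# Toy aligned fragments standing in for members of one genus.
print(mean_within_group_identity(["ACGT", "ACGT", "ACGA"]))
```

Thresholds such as the reported 96.0% (genus) and 93.5% (family) for 16S rRNA genes would then be compared against the identity between a clone and its nearest characterized relative.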
Defining the ABC of gene essentiality in streptococci.
Charbonneau, Amelia R L; Forman, Oliver P; Cain, Amy K; Newland, Graham; Robinson, Carl; Boursnell, Mike; Parkhill, Julian; Leigh, James A; Maskell, Duncan J; Waller, Andrew S
2017-05-31
Utilising next generation sequencing to interrogate saturated bacterial mutant libraries provides unprecedented information for the assignment of genome-wide gene essentiality. Exposure of saturated mutant libraries to specific conditions and subsequent sequencing can be exploited to uncover gene essentiality relevant to the condition. Here we present a barcoded transposon directed insertion-site sequencing (TraDIS) system to define an essential gene list for Streptococcus equi subsp. equi, the causative agent of strangles in horses, for the first time. The gene essentiality data for this group C Streptococcus was compared to that of group A and B streptococci. Six barcoded variants of pGh9:ISS1 were designed and used to generate mutant libraries containing between 33,000-66,000 unique mutants. TraDIS was performed on DNA extracted from each library and data were analysed separately and as a combined master pool. Gene essentiality determined that 19.5% of the S. equi genome was essential. Gene essentialities were compared to those of group A and group B streptococci, identifying concordances of 90.2% and 89.4%, respectively and an overall concordance of 83.7% between the three species. The use of barcoded pGh9:ISS1 to generate mutant libraries provides a highly useful tool for the assignment of gene function in S. equi and other streptococci. The shared essential gene set of group A, B and C streptococci provides further evidence of the close genetic relationships between these important pathogenic bacteria. Therefore, the ABC of gene essentiality reported here provides a solid foundation towards reporting the functional genome of streptococci.
Dokarry, Melissa; Laurendon, Caroline; O'Maille, Paul E
2012-01-01
Structure-based combinatorial protein engineering (SCOPE) is a homology-independent recombination method to create multiple crossover gene libraries by assembling defined combinations of structural elements ranging from single mutations to domains of protein structure. SCOPE was originally inspired by DNA shuffling, which mimics recombination during meiosis, where mutations from parental genes are "shuffled" to create novel combinations in the resulting progeny. DNA shuffling utilizes sequence identity between parental genes to mediate template-switching events (the annealing and extension of one parental gene fragment on another) in PCR reassembly reactions to generate crossovers and hence recombination between parental genes. In light of the conservation of protein structure and degeneracy of sequence, SCOPE was developed to enable the "shuffling" of distantly related genes with no requirement for sequence identity. The central principle involves the use of oligonucleotides to encode for crossover regions to choreograph template-switching events during PCR assembly of gene fragments to create chimeric genes. This approach was initially developed to create libraries of hybrid DNA polymerases from distantly related parents, and later developed to create a combinatorial mutant library of sesquiterpene synthases to explore the catalytic landscapes underlying the functional divergence of related enzymes. This chapter presents a simplified protocol of SCOPE that can be integrated with different mutagenesis techniques and is suitable for automation by liquid-handling robots. Two examples are presented to illustrate the application of SCOPE to create gene libraries using plant sesquiterpene synthases as the model system. In the first example, we outline how to create an active-site library as a series of complex mixtures of diverse mutants. 
In the second example, we outline how to create a focused library as an array of individual clones to distil minimal combinations of functionally important mutations. Through these examples, the principles of the technique are illustrated and the suitability of automating various aspects of the procedure for given applications are discussed. Copyright © 2012 Elsevier Inc. All rights reserved.
Lam, Kathy N; Charles, Trevor C
2015-01-01
Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ(70) promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ(70) consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ(70) consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. 
Despite widespread use of E. coli to propagate foreign DNA in metagenomic libraries, the effects of in vivo transcriptional activity on clone stability are not well understood. Further work is required to tease apart the effects of transcription from those of gene product toxicity.
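The two quantities compared in this study, GC content and the number of rpoD/σ(70) consensus-like sequences, can be sketched as follows. This is an illustration only: the abstract does not describe the authors' promoter search, so the exact-match pattern (the canonical -35 hexamer TTGACA and -10 hexamer TATAAT separated by a 16 to 18 bp spacer) and the function names are assumptions, and only the forward strand is scanned.

```python
import re

# Canonical E. coli sigma-70 hexamers; the 16-18 bp spacer window is an
# assumption made for this sketch, not a parameter taken from the abstract.
# The lookahead lets overlapping matches be counted by start position.
SIGMA70_LIKE = re.compile(r"(?=TTGACA[ACGT]{16,18}TATAAT)")


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


def count_sigma70_like(seq: str) -> int:
    """Count forward-strand start positions of exact consensus matches."""
    return sum(1 for _ in SIGMA70_LIKE.finditer(seq.upper()))
```

Applying both functions to reads from each construction step (crude extract, size-selected DNA, cosmid library) would show whether the depletion tracks GC content or promoter-like sequence counts more closely.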
Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K
2017-07-01
Objective Identification of the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, as mutation analysis in hereditary hearing impairment by classic genetic methods is hindered by the high heterogeneity of the disease. Patients Two Swiss families with autosomal-dominant hereditary hearing impairment. Intervention Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Main outcome measure Mutation detection in hearing-loss-related genes. Results The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Conclusion Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great future potential in the diagnostics of hereditary hearing impairment, even in smaller labs.
Gorodkin, Jan; Cirera, Susanna; Hedegaard, Jakob; Gilchrist, Michael J; Panitz, Frank; Jørgensen, Claus; Scheibye-Knudsen, Karsten; Arvin, Troels; Lumholdt, Steen; Sawera, Milena; Green, Trine; Nielsen, Bente J; Havgaard, Jakob H; Rosenkilde, Carina; Wang, Jun; Li, Heng; Li, Ruiqiang; Liu, Bin; Hu, Songnian; Dong, Wei; Li, Wei; Yu, Jun; Wang, Jian; Stærfeldt, Hans-Henrik; Wernersson, Rasmus; Madsen, Lone B; Thomsen, Bo; Hornshøj, Henrik; Bujie, Zhan; Wang, Xuegang; Wang, Xuefei; Bolund, Lars; Brunak, Søren; Yang, Huanming; Bendixen, Christian; Fredholm, Merete
2007-01-01
Background Knowledge of the structure of gene expression is essential for mammalian transcriptomics research. We analyzed a collection of more than one million porcine expressed sequence tags (ESTs), of which two-thirds were generated in the Sino-Danish Pig Genome Project and one-third are from public databases. The Sino-Danish ESTs were generated from one normalized and 97 non-normalized cDNA libraries representing 35 different tissues and three developmental stages. Results Using the Distiller package, the ESTs were assembled to roughly 48,000 contigs and 73,000 singletons, of which approximately 25% have a high confidence match to UniProt. Approximately 6,000 new porcine gene clusters were identified. Expression analysis based on the non-normalized libraries resulted in the following findings. The distribution of cluster sizes is scaling invariant. Brain and testes are among the tissues with the greatest number of different expressed genes, whereas tissues with more specialized function, such as developing liver, have fewer expressed genes. There are at least 65 high confidence housekeeping gene candidates and 876 cDNA library-specific gene candidates. We identified differential expression of genes between different tissues, in particular brain/spinal cord, and found patterns of correlation between genes that share expression in pairs of libraries. Finally, there was remarkable agreement in expression between specialized tissues according to Gene Ontology categories. Conclusion This EST collection, the largest to date in pig, represents an essential resource for annotation, comparative genomics, assembly of the pig genome sequence, and further porcine transcription studies. PMID:17407547
2011-01-01
Background One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak. PMID:21645357
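The link between 12 haploid genome equivalents and a greater than 99% chance of recovering a given sequence follows from the standard Clarke-Carbon relation, P = 1 - (1 - f)^N, which for small insert fraction f and N clones is well approximated by P = 1 - exp(-c), with c the coverage in genome equivalents. The sketch below is a worked illustration under that approximation; the function names are hypothetical, and the genome size is left as a caller-supplied parameter because the abstract does not state the value the authors used.

```python
import math


def genome_equivalents(n_clones: int, empty_fraction: float,
                       mean_insert_kb: float, genome_size_kb: float) -> float:
    """Coverage c = (clones with inserts * mean insert size) / genome size."""
    return n_clones * (1.0 - empty_fraction) * mean_insert_kb / genome_size_kb


def probability_of_recovery(coverage: float) -> float:
    """Poisson approximation to Clarke-Carbon: P = 1 - exp(-c)."""
    return 1.0 - math.exp(-coverage)


# Coverage reported in the abstract; the resulting probability is far above 0.99.
p = probability_of_recovery(12)
print(f"P(sequence of interest is present at 12x coverage) = {p:.6f}")
```

The same function shows why deep coverage is needed: at 3 genome equivalents the probability is only about 95%, and at 5 it is still just over 99%.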
Diversity and function in microbial mats from the Lucky Strike hydrothermal vent field.
Crépeau, Valentin; Cambon Bonavita, Marie-Anne; Lesongeur, Françoise; Randrianalivelo, Henintsoa; Sarradin, Pierre-Marie; Sarrazin, Jozée; Godfroy, Anne
2011-06-01
Diversity and function in microbial mats from the Lucky Strike hydrothermal vent field (Mid-Atlantic Ridge) were investigated using molecular approaches. DNA and RNA were extracted from mat samples overlaying hydrothermal deposits and Bathymodiolus azoricus mussel assemblages. We constructed and analyzed libraries of 16S rRNA gene sequences and sequences of functional genes involved in autotrophic carbon fixation [forms I and II RuBisCO (cbbL/M), ATP-citrate lyase B (aclB)]; methane oxidation [particulate methane monooxygenase (pmoA)] and sulfur oxidation [adenosine-5'-phosphosulfate reductase (aprA) and soxB]. To gain new insights into the relationships between mats and mussels, we also used new domain-specific 16S rRNA gene primers targeting Bathymodiolus sp. symbionts. All identified archaeal sequences were affiliated with a single group: the marine group 1 Thaumarchaeota. In contrast, analyses of bacterial sequences revealed much higher diversity, although two phyla Proteobacteria and Bacteroidetes were largely dominant. The 16S rRNA gene sequence library revealed that species affiliated to Beggiatoa Gammaproteobacteria were the dominant active population. Analyses of DNA and RNA functional gene libraries revealed a diverse and active chemolithoautotrophic population. Most of these sequences were affiliated with Gammaproteobacteria, including hydrothermal fauna symbionts, Thiotrichales and Methylococcales. PCR and reverse transcription-PCR using 16S rRNA gene primers targeted to Bathymodiolus sp. symbionts revealed sequences affiliated with both methanotrophic and thiotrophic endosymbionts. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Dominant genetics using a yeast genomic library under the control of a strong inducible promoter.
Ramer, S W; Elledge, S J; Davis, R W
1992-12-01
In Saccharomyces cerevisiae, numerous genes have been identified by selection from high-copy-number libraries based on "multicopy suppression" or other phenotypic consequences of overexpression. Although fruitful, this approach suffers from two major drawbacks. First, high copy number alone may not permit high-level expression of tightly regulated genes. Conversely, other genes expressed in proportion to dosage cannot be identified if their products are toxic at elevated levels. This work reports construction of a genomic DNA expression library for S. cerevisiae that circumvents both limitations by fusing randomly sheared genomic DNA to the strong, inducible yeast GAL1 promoter, which can be regulated by carbon source. The library obtained contains 5 × 10^7 independent recombinants, representing a breakpoint at every base in the yeast genome. This library was used to examine aberrant gene expression in S. cerevisiae. A screen for dominant activators of yeast mating response identified eight genes that activate the pathway in the absence of exogenous mating pheromone, including one previously unidentified gene. One activator was a truncated STE11 gene lacking approximately 1000 base pairs of amino-terminal coding sequence. In two different clones, the same GAL1 promoter-proximal ATG is in-frame with the coding sequence of STE11, suggesting that internal initiation of translation there results in production of a biologically active, truncated STE11 protein. Thus this library allows isolation based on dominant phenotypes of genes that might have been difficult or impossible to isolate from high-copy-number libraries.
Keshri, Jitendra; Mishra, Avinash; Jha, Bhavanath
2013-03-30
Population indices of bacteria and archaea in saline-alkaline soil were investigated and a possible microbe-environment pattern was established using gene-targeted metagenomics. Clone libraries were constructed using 16S rRNA and functional gene(s) involved in carbon fixation (cbbL), nitrogen fixation (nifH), ammonia oxidation (amoA) and sulfur metabolism (apsA). Molecular phylogeny revealed the dominance of Actinobacteria, Firmicutes and Proteobacteria along with archaeal members of Halobacteraceae. The library consisted of novel bacterial (20%) and archaeal (38%) genera showing ≤95% similarity to previously retrieved sequences. Phylogenetic analysis indicated the ability of the inhabitants to survive under stress conditions. The 16S rRNA gene libraries contained novel gene sequences and were distantly homologous with cultured bacteria. Functional gene libraries were found to be unique and most of the clones were distantly related to Proteobacteria, while clones of the nifH gene library also showed homology with Cyanobacteria and Firmicutes. Quantitative real-time PCR showed that bacterial abundance was two orders of magnitude higher than archaeal abundance. The gene(s) quantification indicated the size of the functional guilds harboring relevant key genes. The study provides insights into microbial ecology and different metabolic interactions occurring in saline-alkaline soil, possessing phylogenetically diverse groups of bacteria and archaea, which may be explored further for gene cataloging and metabolic profiling. Copyright © 2012 Elsevier GmbH. All rights reserved.
White, K Makay; Matthews, Melinda K; Hughes, Rachel C; Sommer, Andrew J; Griffitts, Joel S; Newell, Peter D; Chaston, John M
2018-03-28
A metagenome wide association (MGWA) study of bacterial host association determinants in Drosophila predicted that LPS biosynthesis genes are significantly associated with host colonization. We were unable to create site-directed mutants for each of the predicted genes in Acetobacter, so we created an arrayed transposon insertion library using Acetobacter fabarum DsW_054 isolated from Drosophila. Creation of the A. fabarum DsW_054 gene knock-out library was performed by combinatorial mapping and Illumina sequencing of random transposon insertion mutants. Transposon insertion locations for 6,418 mutants were successfully mapped, including hits within 63% of annotated genes in the A. fabarum DsW_054 genome. For 45/45 members of the library, insertion sites were verified by arbitrary PCR and Sanger sequencing. Mutants with insertions in four different LPS biosynthesis genes were selected from the library to validate the MGWA predictions. Insertion mutations in two genes biosynthetically upstream of Lipid-A formation, lpxC and lpxB, show significant differences in host association, whereas mutations in two genes encoding LPS biosynthesis functions downstream of Lipid-A biosynthesis had no effect. These results suggest an impact of bacterial cell surface molecules on the bacterial capacity for host association. Also, the transposon insertion mutant library will be a useful resource for ongoing research on the genetic basis for Acetobacter traits. Copyright © 2018 White et al.
Oesterle, Sabine; Gerngross, Daniel; Schmitt, Steven; Roberts, Tania Michelle; Panke, Sven
2017-09-26
Multiplexed gene expression optimization via modulation of gene translation efficiency through ribosome binding site (RBS) engineering is a valuable approach for optimizing artificial properties in bacteria, ranging from genetic circuits to production pathways. Established algorithms design smart RBS-libraries based on a single partially-degenerate sequence that efficiently samples the entire space of translation initiation rates. However, the sequence space that is accessible when integrating the library by CRISPR/Cas9-based genome editing is severely restricted by DNA mismatch repair (MMR) systems. MMR efficiency depends on the type and length of the mismatch and thus effectively removes potential library members from the pool. Rather than working in MMR-deficient strains, which accumulate off-target mutations, or depending on temporary MMR inactivation, which requires additional steps, we eliminate this limitation by developing a pre-selection rule of genome-library-optimized-sequences (GLOS) that enables introducing large functional diversity into MMR-proficient strains with sequences that are no longer subject to MMR-processing. We implement several GLOS-libraries in Escherichia coli and show that GLOS-libraries indeed retain diversity during genome editing and that such libraries can be used in complex genome editing operations such as concomitant deletions. We argue that this approach allows for stable and efficient fine tuning of chromosomal functions with minimal effort.
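A library built from a single partially degenerate sequence can be enumerated directly from the IUPAC nucleotide codes. The sketch below shows only that expansion step; the example sequence is hypothetical, and the GLOS pre-selection rule itself is not reproduced because the abstract does not give its details.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def library_size(degenerate: str) -> int:
    """Number of distinct sequences encoded, without enumerating them."""
    return prod(len(IUPAC[base]) for base in degenerate.upper())


def expand(degenerate: str) -> list:
    """List every concrete sequence encoded by a degenerate sequence."""
    choices = (IUPAC[base] for base in degenerate.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical degenerate RBS core: R is A or G, N is any base.
variants = expand("AGGRN")
print(len(variants), variants[:3])
```

Comparing each expanded variant with the wild-type sequence gives the mismatch type and length at every position, which is the information a mismatch-repair-aware filter would need to act on.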
Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.
Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S
2007-05-01
Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity with great potential as biocatalysts. In an effort to expand genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenic line with 46% amino-acid-sequence identity to CYP102A1 which has been extensively studied as a fatty acid hydroxylase. The heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for searching self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 are expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for bioconversion of natural products and biodegradation of organic chemicals.
The Essential Genome of Escherichia coli K-12.
Goodall, Emily C A; Robinson, Ashley; Johnston, Iain G; Jabbari, Sara; Turner, Keith A; Cunningham, Adam F; Lund, Peter A; Cole, Jeffrey A; Henderson, Ian R
2018-02-20
Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data. Copyright © 2018 Goodall et al.
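The correction for gene length mentioned above is commonly expressed as an insertion index, the number of unique insertion sites in a gene divided by the gene's length. The following minimal sketch computes that quantity; the function name and the coordinate convention (1-based, inclusive) are assumptions, and the statistical thresholding the authors applied afterwards is not shown.

```python
from bisect import bisect_left, bisect_right


def insertion_indices(insertion_sites, genes):
    """Unique insertion sites per base pair for each gene.

    insertion_sites: iterable of genome coordinates with an insertion.
    genes: dict mapping gene name to (start, end), 1-based and inclusive.
    """
    sites = sorted(set(insertion_sites))
    result = {}
    for name, (start, end) in genes.items():
        # Binary search counts sites with start <= position <= end.
        n_sites = bisect_right(sites, end) - bisect_left(sites, start)
        result[name] = n_sites / (end - start + 1)
    return result
```

Genes with an index near zero are candidate essential genes, but as the abstract stresses, a low whole-gene index can hide short essential regions or orientation-dependent effects that only inspection of the insertion profile reveals.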
Massa, Sónia I; Pearson, Gareth A; Aires, Tânia; Kube, Michael; Olsen, Jeanine L; Reinhardt, Richard; Serrão, Ester A; Arnaud-Haond, Sophie
2011-09-01
Predicted global climate change threatens the distributional ranges of species worldwide. We identified genes expressed in the intertidal seagrass Zostera noltii during recovery from a simulated low tide heat-shock exposure. Five Expressed Sequence Tag (EST) libraries were compared, corresponding to four recovery times following sub-lethal temperature stress, and a non-stressed control. We sequenced and analyzed 7009 sequence reads from 30 min, 2 h, 4 h and 24 h after the beginning of the heat-shock (AHS), and 1585 from the control library, for a total of 8594 sequence reads. Among 51 Tentative UniGenes (TUGs) exhibiting significantly different expression between libraries, 19 (37.3%) were identified as 'molecular chaperones' and were over-expressed following heat-shock, while 12 (23.5%) were 'photosynthesis TUGs' generally under-expressed in heat-shocked plants. A time course analysis of expression showed a rapid increase in expression of the molecular chaperone class, most of which were heat-shock proteins, from 2 sequence reads in the control library to almost 230 in the 30 min AHS library, followed by a slow decrease during further recovery. In contrast, 'photosynthesis TUGs' were under-expressed 30 min AHS compared with the control library, and declined progressively with recovery time in the stress libraries, with a total of 29 sequence reads 24 h AHS, compared with 125 in the control. A total of 4734 TUGs were screened for EST-Simple Sequence Repeats (EST-SSRs) and 86 microsatellites were identified. Copyright © 2011 Elsevier B.V. All rights reserved.
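Screening assembled sequences for microsatellites amounts to finding short motifs repeated in tandem. The sketch below does this with a backreference; the motif length range (2 to 6 bp) and the minimum repeat number are assumptions for illustration, since the abstract does not give the criteria used for the 86 reported microsatellites.

```python
import re


def find_ssrs(seq: str, min_repeats: int = 5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Motifs are 2-6 bp; the lazy quantifier prefers the shortest motif, so
    ACACAC... is reported as (AC)n rather than (ACAC)n. Runs of a single
    base are skipped.
    """
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # homopolymer run, not counted as a microsatellite here
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits
```

In practice, published EST-SSR screens often use different minimum repeat numbers for di-, tri- and longer motifs; a single threshold is used here only to keep the example short.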
Gong, Qian; Li, Chang-ying; Chang, Ji-wu; Zhu, Tie-hong
2012-06-01
To screen monoclonal antibodies to amylin from a constructed human phage antibody library and identify their antigenic specificity and combining activities. The heavy chain Fd fragment and light chain of human immunoglobulin genes were amplified from peripheral blood lymphocytes of healthy donors using RT-PCR, and then inserted into phagemid pComb3XSS to generate a human phage antibody library. The insertion of light chain or heavy chain Fd genes was identified by PCR after digestion with Sac I, Xba I, Xho I and Spe I. One of the positive clones was analyzed by DNA sequencing. The specific anti-amylin clones were screened from the antibody library against human amylin antigens and then the positive clones were determined by Phage-ELISA analysis. A Fab phage antibody library with 0.8×10^8 members was constructed with an efficacy of about 70%. DNA sequence analysis indicated that the V(H) gene belonged to the V(H)3 gene family and the V(λ) gene belonged to the V(λ) gene family. Using human amylin as panning antigen, specific anti-amylin Fab antibodies were enriched by screening the library three times. Phage-ELISA assay showed the positive clones had very good specificity to amylin antigen. The successful construction of a phage antibody library and the identification of anti-amylin Fab antibodies provide a basis for further study and preparation of human anti-amylin antibodies.
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
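To make the quantities in (a) and (c) concrete, the sketch below shows two simple formulas. The first is the classical Good-Turing coverage estimate, a frequentist baseline and not the authors' estimator. The second is the probability that the next read reveals a new gene under a two-parameter Poisson-Dirichlet prior; that choice of prior is an assumption made here, since the abstract does not name the model, and `sigma` and `theta` are treated as given rather than estimated.

```python
from collections import Counter


def good_turing_coverage(gene_labels) -> float:
    """Good-Turing sample coverage: 1 - (genes seen exactly once) / (reads)."""
    counts = Counter(gene_labels)
    n_reads = sum(counts.values())
    if n_reads == 0:
        raise ValueError("at least one read is required")
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1.0 - singletons / n_reads


def pd_discovery_probability(n_reads: int, n_genes: int,
                             sigma: float, theta: float) -> float:
    """P(next read is a new gene) under an assumed Poisson-Dirichlet prior.

    Uses the standard predictive rule (theta + sigma * k) / (theta + n)
    with 0 <= sigma < 1 and theta > -sigma.
    """
    return (theta + sigma * n_genes) / (theta + n_reads)
```

One minus the discovery probability plays the role of coverage in the Bayesian setting, and evaluating the rule along a growing hypothetical sample traces how the discovery rate decays with additional sequencing.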
The bacterial composition of chlorinated drinking water was analyzed using 16S rRNA gene clone libraries derived from DNA extracts of 12 samples and compared to clone libraries previously generated using RNA extracts from the same samples. Phylogenetic analysis of 761 DNA-based ...
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome that is as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Comparative Analysis of Expressed Genes from Cacao Meristems Infected by Moniliophthora perniciosa
Gesteira, Abelmon S.; Micheli, Fabienne; Carels, Nicolas; Da Silva, Aline C.; Gramacho, Karina P.; Schuster, Ivan; Macêdo, Joci N.; Pereira, Gonçalo A. G.; Cascardo, Júlio C. M.
2007-01-01
Background and Aims Witches' broom disease is caused by the hemibiotrophic basidiomycete Moniliophthora perniciosa, and is one of the most important diseases of cacao in the western hemisphere. Because very little is known about the global process of such disease development, expressed sequence tags (ESTs) were used to identify genes expressed during the Theobroma cacao–Moniliophthora perniciosa interaction. Methods Two cDNA libraries corresponding to the resistant (RT) and susceptible (SP) cacao–M. perniciosa interactions were constructed from total RNA, using the DB SMART Creator cDNA library kit (Clontech). Clones were randomly selected, sequenced from the 5′ end and analysed using bioinformatics tools including in silico analysis of the differential gene expression. Key Results A total of 6884 ESTs were generated from the RT and SP cDNA libraries. These ESTs were composed of 2585 singlets and 341 contigs for a total of 2926 non-redundant sequences. The redundancy of the libraries was low and their specificity high when compared with the few other cacao libraries already published. Sequence analysis allowed the assignment of a putative functional category for 54 % of sequences, whereas approx. 22 % of sequences corresponded to unknown function and approx. 24 % of sequences did not show any significant similarity with other proteins present in the database. Despite the similar overall distribution of the sequences in functional categories between the two libraries, qualitative differences were observed. Genes involved during the defence response to pathogen infection or in programmed cell death were identified, such as pathogenesis-related proteins, trypsin inhibitor or oxalate oxidase, and some of them showed an in silico differential expression between the resistant and the susceptible interactions. Conclusions As far as is known this is the first EST resource from the cacao–M. perniciosa interaction and it is believed that it will provide a significant contribution to the understanding of the molecular mechanisms of the resistance and susceptibility of cacao to M. perniciosa, to develop strategies to control witches' broom, and as a source of polymorphism for molecular marker development and marker-assisted selection. PMID:17557832
Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.
2012-01-01
High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.
Martínez, Asunción; Osburne, Marcia S
2013-01-01
One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes discovered through large-scale sequencing of natural microbial communities that lack similarity to genes of known function. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries. © 2013 Elsevier Inc. All rights reserved.
Merelli, Ivan; Caprera, Andrea; Stella, Alessandra; Del Corvo, Marcello; Milanesi, Luciano; Lazzari, Barbara
2009-10-15
The NCBI dbEST currently contains more than eight million human Expressed Sequence Tags (ESTs). This wide collection represents an important source of information for gene expression studies, provided it can be inspected according to biologically relevant criteria. EST data can be browsed using different dedicated web resources, which allow library-specific gene expression levels to be investigated and compared among libraries, highlighting significant differences in gene expression. Nonetheless, no tool is available to examine distributions of quantitative EST collections in Gene Ontology (GO) categories, nor to retrieve information concerning library-dependent EST involvement in metabolic pathways. In this work we present the Human EST Ontology Explorer (HEOE) http://www.itb.cnr.it/ptp/human_est_explorer, a web facility for comparison of expression levels among libraries from several healthy and diseased tissues. The HEOE provides library-dependent statistics on the distribution of sequences in the GO Directed Acyclic Graph (DAG) that can be browsed at each GO hierarchical level. The tool is based on large-scale BLAST annotation of EST sequences. Due to the huge number of input sequences, this BLAST analysis was performed with the aid of grid computing technology, which is particularly suitable for data-parallel tasks. Relying on the resulting annotation, library-specific distributions of ESTs in the GO graph were inferred. A pathway-based search interface was also implemented, for a quick evaluation of the representation of libraries in metabolic pathways. EST processing steps were integrated in a semi-automatic procedure that relies on Perl scripts and stores results in a MySQL database. A PHP-based web interface offers the possibility to simultaneously visualize, retrieve and compare data from the different libraries. Statistically significant differences in GO categories among user-selected libraries can also be computed. 
The HEOE provides an alternative and complementary way to inspect EST expression levels with respect to approaches currently offered by other resources. Furthermore, BLAST computation on the whole human EST dataset was a suitable test of grid scalability in the context of large-scale bioinformatics analysis. The HEOE currently comprises sequence analysis from 70 non-normalized libraries, representing a comprehensive overview of healthy and unhealthy tissues. As the analysis procedure can be easily applied to other libraries, the number of represented tissues is expected to increase.
Yin, Jingjing; Li, Liangjun; Chen, Xuehao
2013-01-01
Lotus root is a popular wetland vegetable that produces an edible rhizome. At the molecular level, the regulation of rhizome formation is very complex and has not been sufficiently addressed in research. In this study, to identify differentially expressed genes (DEGs) in lotus root, four libraries (L1: stolon stage; L2: initial swelling stage; L3: middle swelling stage; L4: later swelling stage) were constructed from the rhizome development stages. A high-throughput tag-sequencing technique based on the Solexa Genome Analyzer platform was used. Approximately 5.0 million tags were sequenced, and 4542104, 4474755, 4777919, and 4750348 clean tags, including 151282, 137476, 215872, and 166005 distinct tags, were obtained from the four libraries, respectively, after removal of low-quality tags. More than 43% of distinct tags were unambiguous tags mapping to the reference genes, and 40% were unambiguous tag-mapped genes. From L1, L2, L3, and L4, a total of 20471, 18785, 23448, and 21778 genes, respectively, were annotated after mapping their functions in existing databases. Gene expression profiles in the L1/L2, L2/L3, and L3/L4 comparisons differed for most of the 20 selected DEGs. Most of the DEGs in L1/L2 were relevant to fiber development and stress response, while in L2/L3 and L3/L4 the majority of the DEGs were involved in energy metabolism and storage. All up-regulated transcription factors in the four libraries and 14 important rhizome formation-related genes were also identified. In addition, the expression of 9 of the identified DEGs was examined by qRT-PCR. In summary, this study provides a comprehensive view of gene expression during rhizome formation in lotus root. PMID:23840598
Gomulski, Ludvik M; Dimopoulos, George; Xi, Zhiyong; Soares, Marcelo B; Bonaldo, Maria F; Malacrida, Anna R; Gasperi, Giuliano
2008-01-01
Background The medfly, Ceratitis capitata, is a highly invasive agricultural pest that has become a model insect for the development of biological control programs. Despite research into the behavior and classical and population genetics of this organism, the quantity of sequence data available is limited. We have utilized an expressed sequence tag (EST) approach to obtain detailed information on transcriptome signatures that relate to a variety of physiological systems in the medfly; this information emphasizes reproduction, sex determination, and chemosensory perception, since the study was based on normalized cDNA libraries from embryos and adult heads. Results A total of 21,253 high-quality ESTs were obtained from the embryo and head libraries. Clustering analyses performed separately for each library resulted in 5201 embryo and 6684 head transcripts. Considering an estimated 19% overlap in the transcriptomes of the two libraries, they represent about 9614 unique transcripts involved in a wide range of biological processes and molecular functions. Of particular interest are the sequences that share homology with Drosophila genes involved in sex determination, olfaction, and reproductive behavior. The medfly transformer2 (tra2) homolog was identified among the embryonic sequences, and its genomic organization and expression were characterized. Conclusion The sequences obtained in this study represent the first major dataset of expressed genes in a tephritid species of agricultural importance. This resource provides essential information to support the investigation of numerous questions regarding the biology of the medfly and other related species and also constitutes an invaluable tool for the annotation of complete genome sequences. 
Our study has revealed intriguing findings regarding the transcript regulation of tra2 and other sex determination genes, as well as insights into the comparative genomics of genes implicated in chemosensory reception and reproduction. PMID:18500975
Nordeste, Ricardo F; Trainer, Maria A; Charles, Trevor C
2010-01-01
Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for synthesis of bioplastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti allow for the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identification of novel genes through their function rather than sequence facilitates finding functional proteins that may otherwise have been excluded through sequence-only screening methodology. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.
Cheng, Jiujun; Nordeste, Ricardo; Trainer, Maria A; Charles, Trevor C
2017-01-01
Development of different PHAs as alternatives to petrochemically derived plastics can be facilitated by mining metagenomic libraries for diverse PHA cycle genes that might be useful for synthesis of bio-plastics. The specific phenotypes associated with mutations of the PHA synthesis pathway genes in Sinorhizobium meliloti and Pseudomonas putida allow the use of powerful selection and screening tools to identify complementing novel PHA synthesis genes. Identification of novel genes through their function rather than sequence facilitates finding functional proteins that may otherwise have been excluded through sequence-only screening methodology. We present here methods that we have developed for the isolation of clones expressing novel PHA metabolism genes from metagenomic libraries.
A Blumeria graminis f.sp. hordei BAC library--contig building and microsynteny studies.
Pedersen, Carsten; Wu, Boqian; Giese, Henriette
2002-11-01
A bacterial artificial chromosome (BAC) library of Blumeria graminis f.sp. hordei, containing 12,000 clones with an average insert size of 41 kb, was constructed. The library represents about three genome equivalents and BAC-end sequencing showed a high content of repetitive sequences, making contig-building difficult. To identify overlapping clones, several strategies were used: colony hybridisation, PCR screening, fingerprinting techniques and the use of single-copy expressed sequence tags. The latter proved to be the most efficient method for identification of overlapping clones. Two contigs, at or close to avirulence loci, were constructed. Single nucleotide polymorphism (SNP) markers were developed from BAC-end sequences to link the contigs to the genetic maps. Two other BAC contigs were used to study microsynteny between B. graminis and two other ascomycetes, Neurospora crassa and Aspergillus fumigatus. The library provides an invaluable tool for the isolation of avirulence genes from B. graminis and for the study of gene synteny between this fungus and other fungi.
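The library figures above (12,000 clones, 41 kb average insert, about three genome equivalents) can be related with the standard Clarke-Carbon approximation, P = 1 - (1 - f)^N, where f is the insert size as a fraction of the genome and N is the number of clones. The sketch below is illustrative arithmetic only: the genome size of 164 Mb is not stated in the abstract and is an assumption back-calculated from 12,000 × 41 kb / 3, and the function names are hypothetical.

```python
def genome_equivalents(n_clones, insert_kb, genome_mb):
    """Total cloned DNA divided by genome size (both converted to kb)."""
    return n_clones * insert_kb / (genome_mb * 1000.0)


def prob_locus_in_library(n_clones, insert_kb, genome_mb):
    """Clarke-Carbon approximation: chance that a given locus is in >= 1 clone."""
    f = insert_kb / (genome_mb * 1000.0)  # fraction of genome per clone
    return 1.0 - (1.0 - f) ** n_clones


# ASSUMPTION: genome size inferred from the abstract's "about three genome
# equivalents"; it is not a measured value reported in the source.
ASSUMED_GENOME_MB = 164.0

bac_coverage = genome_equivalents(12_000, 41, ASSUMED_GENOME_MB)
bac_probability = prob_locus_in_library(12_000, 41, ASSUMED_GENOME_MB)
```

Under these assumptions, a threefold library gives roughly a 95% chance that any single-copy locus is present in at least one clone. The approximation ignores cloning bias and the repetitive content noted in the abstract.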
Sequence analysis of 497 mouse brain ESTs expressed in the substantia nigra
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stewart, G.J.; Savioz, A.; Davies, R.W.
1997-01-15
The use of subtracted, region-specific cDNA libraries combined with single-pass cDNA sequencing allows the discovery of novel genes and facilitates molecular description of the tissue or region involved. We report the sequence of 497 mouse expressed sequence tags (ESTs) from two subtracted libraries enriched for cDNAs expressed in the substantia nigra, a brain region with important roles in movement control and Parkinson disease. Of these, 238 ESTs give no database matches and therefore derive from novel genes. A further 115 ESTs show sequence similarity to ESTs from other organisms, which themselves do not yield any significant database matches to genes of known function. Fifty-six ESTs show sequence similarity to previously identified genes whose mouse homologues have not been reported. The total number of ESTs reported that are new for the mouse is 407, which, together with the 90 ESTs corresponding to known mouse genes or cDNAs, contributes to the molecular description of the substantia nigra. 21 refs., 4 tabs.
Interpreting a sequenced genome: toward a cosmid transgenic library of Caenorhabditis elegans.
Janke, D L; Schein, J E; Ha, T; Franz, N W; O'Neil, N J; Vatcher, G P; Stewart, H I; Kuervers, L M; Baillie, D L; Rose, A M
1997-10-01
We have generated a library of transgenic Caenorhabditis elegans strains that carry sequenced cosmids from the genome of the nematode. Each strain carries an extrachromosomal array containing a single cosmid, sequenced by the C. elegans Genome Sequencing Consortium, and a dominant Rol-6 marker. More than 500 transgenic strains representing 250 cosmids have been constructed. Collectively, these strains contain approximately 8 Mb of sequence data, or approximately 8% of the C. elegans genome. The transgenic strains are being used to rescue mutant phenotypes, resulting in a high-resolution map alignment of the genetic, physical, and DNA sequence maps of the nematode. We have chosen the region of chromosome III deleted by sDf127 and not covered by the duplication sDp8(III;I) as a starting point for a systematic correlation of mutant phenotypes with nucleotide sequence. In this defined region, we have identified 10 new essential genes whose mutant phenotypes range from developmental arrest at early larva, to maternal effect lethal. To date, 8 of these 10 essential genes have been rescued. In this region, these rescues represent approximately 10% of the genes predicted by GENEFINDER and considerably enhance the map alignment. Furthermore, this alignment facilitates future efforts to physically position and clone other genes in the region. [Updated information about the Transgenic Library is available via the Internet at http://darwin.mbb.sfu.ca/imbb/dbaillie/cosmid.html.]
Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yaung, Stephanie J.; Deng, Luxue; Li, Ning
2015-03-11
Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.
Prospecting for viral natural enemies of the fire ant Solenopsis invicta in Argentina.
Valles, Steven M; Porter, Sanford D; Calcaterra, Luis A
2018-01-01
Metagenomics and next generation sequencing were employed to discover new virus natural enemies of the fire ant, Solenopsis invicta Buren in its native range (i.e., Formosa, Argentina) with the ultimate goal of testing and releasing new viral pathogens into U.S. S. invicta populations to provide natural, sustainable control of this ant. RNA was purified from worker ants from 182 S. invicta colonies, which was pooled into 4 groups according to location. A library was created from each group and sequenced using Illumina Miseq technology. After a series of winnowing methods to remove S. invicta genes, known S. invicta virus genes, and all other non-virus gene sequences, 61,944 unique singletons were identified with virus identity. These were assembled de novo yielding 171 contiguous sequences with significant identity to non-plant virus genes. Fifteen contiguous sequences exhibited very high expression rates and were detected in all four gene libraries. One contig (Contig_29) exhibited the highest expression level overall and across all four gene libraries. Random amplification of cDNA ends analyses expanded this contiguous sequence yielding a complete virus genome, which we have provisionally named Solenopsis invicta virus 5 (SINV-5). SINV-5 is a positive-sense, single-stranded RNA virus with genome characteristics consistent with insect-infecting viruses from the family Dicistroviridae. Moreover, the replicative genome strand of SINV-5 was detected in worker ants indicating that S. invicta serves as host for the virus. Many additional sequences were identified that are likely of viral origin. These sequences await further investigation to determine their origins and relationship with S. invicta. This study expands knowledge of the RNA virome diversity found within S. invicta populations. PMID:29466388
Serial analysis of gene expression (SAGE) in normal human trabecular meshwork.
Liu, Yutao; Munro, Drew; Layfield, David; Dellinger, Andrew; Walter, Jeffrey; Peterson, Katherine; Rickman, Catherine Bowes; Allingham, R Rand; Hauser, Michael A
2011-04-08
To identify the genes expressed in normal human trabecular meshwork tissue, a tissue critical to the pathogenesis of glaucoma. Total RNA was extracted from human trabecular meshwork (HTM) harvested from 3 different donors. Extracted RNA was used to synthesize individual SAGE (serial analysis of gene expression) libraries using the I-SAGE Long kit from Invitrogen. Libraries were analyzed using SAGE 2000 software to extract the 17 base pair sequence tags. The extracted sequence tags were mapped to the genome using SAGE Genie map. A total of 298,834 SAGE tags were identified from all HTM libraries (96,842, 88,126, and 113,866 tags, respectively). Collectively, there were 107,325 unique tags. There were 10,329 unique tags with a minimum of 2 counts from a single library. These tags were mapped to known unique Unigene clusters. Approximately 29% of the tags (orphan tags) did not map to a known Unigene cluster. Thirteen percent of the tags mapped to at least 2 Unigene clusters. Sequence tags from many glaucoma-related genes, including myocilin, optineurin, and WD repeat domain 36, were identified. This is the first time SAGE analysis has been used to characterize the gene expression profile in normal HTM. SAGE analysis provides an unbiased sampling of gene expression of the target tissue. These data will provide new and valuable information to improve understanding of the biology of human aqueous outflow.
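The tag bookkeeping reported above (total tags across libraries, unique tags overall, and unique tags with at least 2 counts in a single library) can be sketched as follows. This is a minimal illustration on toy data, not the SAGE 2000 or SAGE Genie workflow used in the study; the function name and the toy tag sequences are hypothetical.

```python
from collections import Counter


def summarize_tag_libraries(libraries, min_count=2):
    """Summarize per-library tag counts.

    libraries: list of Counter objects mapping tag sequence -> count.
    Returns (total_tags, unique_tags, tags_reaching_min_count_in_one_library).
    """
    pooled = Counter()
    for lib in libraries:
        pooled.update(lib)  # adds counts tag by tag
    total_tags = sum(pooled.values())
    unique_tags = len(pooled)
    # A tag qualifies only if a single library alone reaches min_count.
    supported = {tag for lib in libraries
                 for tag, count in lib.items() if count >= min_count}
    return total_tags, unique_tags, len(supported)


# Toy libraries (hypothetical short tags; real LongSAGE tags are 17 bp).
toy_libs = [
    Counter({"AAA": 3, "CCC": 1}),
    Counter({"CCC": 1, "GGG": 2}),
    Counter({"TTT": 1}),
]
toy_summary = summarize_tag_libraries(toy_libs)

# The per-library totals quoted in the abstract sum to the reported total.
reported_total = 96_842 + 88_126 + 113_866
```

In the toy data, tag "CCC" appears twice when libraries are pooled but only once in each library, so it does not meet the "2 counts from a single library" criterion.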
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.
Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B
2004-12-15
cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5–5 × 10^5 primary transformants, starting with 5 µg of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine residue codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal
2017-01-01
Kivircik sheep is an important local Turkish breed valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles at both prenatal and postnatal stages in Kivircik sheep. To this end, two cDNA libraries were constructed from mammary gland tissue taken from the same Kivircik sheep at prenatal and postnatal stages. A total of 3072 colonies randomly selected from the two libraries were sequenced to develop a sheep EST collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences were determined, and statistical analysis was performed, with Geneious software. A total of 422 ESTs with over 80% similarity to known sequences of other organisms in NCBI were classified by Gene Ontology (GO) category using the Panther database. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). The EST data in this study provide a new source of information for functional genome studies of sheep. PMID:28239610
Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun
2013-01-01
Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. 
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Genomics approach to the environmental community of microorganisms
NASA Astrophysics Data System (ADS)
Kawarabayasi, Y.; Maruyama, A.
2004-12-01
Microscopic observation and comparison of 16S rDNA sequences have indicated that many extremophiles survive in hydrothermal environments. However, it is generally said that over 99% of all microbes are currently uncultivable. We therefore planned to identify uncultivable microbes through direct sequencing of environmental DNA. First, shotgun plasmid libraries were constructed directly from DNA prepared from mixed microbes collected from low-temperature hydrothermal water at RM24 on the Southern East Pacific Rise (S-EPR). The sequences of a number of clones showed features similar to eukaryotic introns or to the tandem repetitive sequences identified in some human familial diseases. These results indicated that many microorganisms with eukaryotic features were dominant in the low-temperature water of the S-EPR. Second, shotgun plasmid libraries were constructed from environmental DNA prepared from Beppu hot springs. ORFs were easily identified in all clones whose entire sequence was determined. Thus, hot springs are a good resource for searching for novel genes. Finally, mixed microbes isolated from Suiyo seamount were used to construct a shotgun library. The clones in this library contained ORFs. From some clones in the hot spring and Suiyo samples, aminoacyl-tRNA synthetase genes, which are generally present in all organisms, were identified by sequence similarity. Phylogenetic analysis of the identified aminoacyl-tRNA synthetases indicated that novel and unidentified microorganisms should be present in the hot spring or at Suiyo seamount. The novel genes identified from Suiyo seamount were also expressed in E. coli. Some gene products were successfully obtained from the E. coli cells as soluble proteins. Some proteins were thermostable up to 70 °C, suggesting that the original host cell of the gene should be stable up to the same temperature. 
Our work indicates that environmental genomics, including direct cloning and sequencing of environmental DNA and expression of the genes identified, is a powerful approach for discovering novel uncultivable microbes and novel active genes.
Tartar, Aurélien; Wheeler, Marsha M; Zhou, Xuguo; Coy, Monique R; Boucias, Drion G; Scharf, Michael E
2009-01-01
Background Termite lignocellulose digestion is achieved through a collaboration of host plus prokaryotic and eukaryotic symbionts. In the present work, we took a combined host and symbiont metatranscriptomic approach for investigating the digestive contributions of host and symbiont in the lower termite Reticulitermes flavipes. Our approach consisted of parallel high-throughput sequencing from (i) a host gut cDNA library and (ii) a hindgut symbiont cDNA library. Subsequently, we undertook functional analyses of newly identified phenoloxidases with potential importance as pretreatment enzymes in industrial lignocellulose processing. Results Over 10,000 expressed sequence tags (ESTs) were sequenced from the 2 libraries that aligned into 6,555 putative transcripts, including 171 putative lignocellulase genes. Sequence analyses provided insights in two areas. First, a non-overlapping complement of host and symbiont (prokaryotic plus protist) glycohydrolase gene families known to participate in cellulose, hemicellulose, alpha carbohydrate, and chitin degradation were identified. Of these, cellulases are contributed by host plus symbiont genomes, whereas hemicellulases are contributed exclusively by symbiont genomes. Second, a diverse complement of previously unknown genes that encode proteins with homology to lignase, antioxidant, and detoxification enzymes were identified exclusively from the host library (laccase, catalase, peroxidase, superoxide dismutase, carboxylesterase, cytochrome P450). Subsequently, functional analyses of phenoloxidase activity provided results that were strongly consistent with patterns of laccase gene expression. In particular, phenoloxidase activity and laccase gene expression are mostly restricted to symbiont-free foregut plus salivary gland tissues, and phenoloxidase activity is inducible by lignin feeding. 
Conclusion To our knowledge, this is the first time that a dual host-symbiont transcriptome sequencing effort has been conducted in a single termite species. This sequence database represents an important new genomic resource for use in further studies of collaborative host-symbiont termite digestion, as well as development of coevolved host and symbiont-derived biocatalysts for use in industrial biomass-to-bioethanol applications. Additionally, this study demonstrates that: (i) phenoloxidase activities are prominent in the R. flavipes gut and are not symbiont derived, (ii) expands the known number of host and symbiont glycosyl hydrolase families in Reticulitermes, and (iii) supports previous models of lignin degradation and host-symbiont collaboration in cellulose/hemicellulose digestion in the termite gut. All sequences in this paper are available publicly with the accession numbers FL634956-FL640828 (Termite Gut library) and FL641015-FL645753 (Symbiont library). PMID:19832970
Townsley, Brad T; Covington, Michael F; Ichihashi, Yasunori; Zumstein, Kristina; Sinha, Neelima R
2015-01-01
Next Generation Sequencing (NGS) is driving rapid advancement in biological understanding and RNA-sequencing (RNA-seq) has become an indispensable tool for biology and medicine. There is a growing need for access to these technologies although preparation of NGS libraries remains a bottleneck to wider adoption. Here we report a novel method for the production of strand specific RNA-seq libraries utilizing the terminal breathing of double-stranded cDNA to capture and incorporate a sequencing adapter. Breath Adapter Directional sequencing (BrAD-seq) reduces sample handling and requires far fewer enzymatic steps than most available methods to produce high quality strand-specific RNA-seq libraries. The method we present is optimized for 3-prime Digital Gene Expression (DGE) libraries and can easily extend to full transcript coverage shotgun (SHO) type strand-specific libraries and is modularized to accommodate a diversity of RNA and DNA input materials. BrAD-seq offers a highly streamlined and inexpensive option for RNA-seq libraries.
Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander
2015-01-01
High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it overall requires lower number of primers and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305
Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
2015-11-18
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. 
The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
Genome-Wide Mutagenesis in Borrelia burgdorferi.
Lin, Tao; Gao, Lihui
2018-01-01
Signature-tagged mutagenesis (STM) is a functional genomics approach to identify bacterial virulence determinants and virulence factors by simultaneously screening multiple mutants in a single host animal, and has been utilized extensively for the study of bacterial pathogenesis, host-pathogen interactions, and spirochete and tick biology. Signature-tagged transposon mutagenesis has been developed to investigate virulence determinants and pathogenesis of Borrelia burgdorferi. Mutants in genes important in virulence are identified by negative selection in which the mutants fail to colonize or disseminate in the animal host and tick vector. The STM procedure, combined with Luminex Flex ® Map™ technology and next-generation sequencing (e.g., Tn-seq), is a powerful high-throughput tool for the determination of Borrelia burgdorferi virulence determinants. The assessment of multiple tissue sites and two DNA resources at two different time points using Luminex Flex ® Map™ technology provides a robust data set. B. burgdorferi transposon mutant screening indicates that a high proportion of genes are novel virulence determinants that are required for mouse and tick infection. In this protocol, an effective signature-tagged Himar1-based transposon suicide vector was developed and used to generate a sequence-defined library of nearly 4800 mutants in the infectious B. burgdorferi B31 clone. In STM, signature-tagged suicide vectors are constructed by inserting unique DNA sequences (tags) into the transposable elements. The signature-tagged transposon mutants are generated when transposon suicide vectors are transformed into an infectious B. burgdorferi clone, and the transposable element is transposed into the 5'-TA-3' sequence in the B. burgdorferi genome with the signature tag. The resulting transposon library consists of many sub-libraries, each containing several hundred mutants with the same tags. 
A group of mice or ticks is infected with a mixed population of mutants carrying different tags; after recovery from different tissues of the infected mice and ticks, mutants from the output and input pools are detected using high-throughput, semi-quantitative Luminex ® FLEXMAP™ or next-generation sequencing (Tn-seq) technologies. Thus far, we have created a high-density, sequence-defined transposon library of over 6600 STM mutants for the efficient genome-wide investigation of genes and gene products required for wild-type pathogenesis, host-pathogen interactions, in vitro growth, in vivo survival, physiology, morphology, chemotaxis, motility, structure, metabolism, gene regulation, plasmid maintenance and replication, etc. The insertion sites of 4480 transposon mutants have been determined. About 800 predicted protein-encoding genes in the genome were disrupted in the STM transposon library. The infectivity and some functions of 800 mutants in 500 genes have been determined. Analysis of these transposon mutants has yielded valuable information regarding the genes and gene products important in the pathogenesis and biology of B. burgdorferi and its tick vectors.
Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin
2011-10-01
Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy in order to study gene expression, including development, digestion and reproduction, of T. canis. ESTs provide a rapid way to identify genes, particularly in organisms for which we have very little molecular information. In this study, a cDNA library was constructed from a female adult of T. canis and 215 high-quality ESTs from 5'-ends of the cDNA clones representing 79 unigenes were obtained. The titer of the primary cDNA library was 1.83 × 10⁶ pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900 bp with an average length of 656 bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences containing 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The remaining 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) were closely matched to the known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR examination. This work will provide a valuable resource for further investigations of stage-, sex- and tissue-specific gene transcription or expression. Copyright © 2011. Published by Elsevier Inc.
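The percentages quoted in this abstract follow directly from the reported counts. The short Python sketch below recomputes them as a consistency check; the variable and function names are illustrative and are not taken from the paper.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts as reported in the abstract.
total_ests = 215
total_unigenes = 79
contigs, singletons = 28, 51
novel_unigenes, novel_ests = 18, 70

# Contigs plus singletons should account for every unigene.
assert contigs + singletons == total_unigenes

# Sequences with database matches are whatever is not novel.
known_unigenes = total_unigenes - novel_unigenes  # 61
known_ests = total_ests - novel_ests              # 145

print(percent(novel_unigenes, total_unigenes))  # 22.78
print(percent(novel_ests, total_ests))          # 32.56
print(percent(known_unigenes, total_unigenes))  # 77.22
print(percent(known_ests, total_ests))          # 67.44
```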
RNA-Seq for Bacterial Gene Expression.
Poulsen, Line Dahl; Vinther, Jeppe
2018-06-01
RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
Hoshino, Tatsuhiko; Inagaki, Fumio
2017-01-01
Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. To obtain the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is determined only during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA gene copies of each target phylotype can be estimated by Poisson statistics by counting the random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10³ to 5.0 × 10⁴ copies of the 16S rRNA gene. 
Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA in a single NGS run. With this new experimental scheme, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
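The abstract says that copy numbers are estimated from random-tag counts by Poisson statistics but does not give the formula. The Python sketch below shows one common way such a correction is done, under the assumption that each molecule receives one of M = 4^L equally likely tags: the expected number of distinct tags seen for N molecules is M(1 − e^(−N/M)), which can be inverted to recover N. The tag length of 8 nt and the function names are hypothetical, and this is not necessarily the exact estimator used by the authors.

```python
import math


def expected_unique_tags(copies: float, tag_length: int) -> float:
    """Expected number of distinct tags when `copies` molecules each draw
    one of 4**tag_length tags uniformly at random (Poisson approximation)."""
    tag_space = 4 ** tag_length
    return tag_space * (1.0 - math.exp(-copies / tag_space))


def estimate_copies(unique_tags: float, tag_length: int) -> float:
    """Invert the relation above: estimate the number of labeled molecules
    from the number of distinct random tags observed."""
    tag_space = 4 ** tag_length
    if not 0 <= unique_tags < tag_space:
        raise ValueError("unique_tags must be in [0, 4**tag_length)")
    return -tag_space * math.log(1.0 - unique_tags / tag_space)


# Hypothetical example: 5000 distinct tags observed with an assumed 8-nt tag.
# The estimate is slightly above 5000 because some molecules share a tag.
print(estimate_copies(5000, tag_length=8))
```

The correction matters most when the number of molecules approaches the size of the tag space; for small counts the estimate is close to the raw number of distinct tags.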
Automated design of degenerate codon libraries.
Mena, Marco A; Daugherty, Patrick S
2005-12-01
Degenerate codon libraries are frequently used in protein engineering and evolution studies but are often limited to targeting a small number of positions to adequately limit the search space. To mitigate this, codon degeneracy can be limited using heuristics or previous knowledge of the targeted positions. To automate design of libraries given a set of amino acid sequences, an algorithm (LibDesign) was developed that generates a set of possible degenerate codon libraries, their resulting size, and their score relative to a user-defined scoring function. A gene library of a specified size can then be constructed that is representative of the given amino acid distribution or that includes specific sequences or combinations thereof. LibDesign provides a new tool for automated design of high-quality protein libraries that more effectively harness existing sequence-structure information derived from multiple sequence alignment or computational protein design data.
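To make the notion of library size concrete, the following Python sketch expands a degenerate codon written in IUPAC nucleotide codes, lists the residues it encodes under the standard genetic code, and multiplies the per-position codon counts to get the DNA-level library size. It is only an illustration of the quantities involved, not the LibDesign algorithm or its scoring function, and all function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon: str) -> list[str]:
    """List every concrete codon matched by a degenerate codon such as 'NNK'."""
    choices = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*choices)]


def encoded_residues(degenerate_codon: str) -> set[str]:
    """Set of amino acids (and '*' for stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


def library_size(degenerate_codons: list[str]) -> int:
    """Number of distinct DNA sequences in a library with one degenerate
    codon per randomized position."""
    size = 1
    for codon in degenerate_codons:
        size *= len(expand(codon))
    return size


print(len(expand("NNK")))                # 32
print(sorted(encoded_residues("NDT")))   # 12 residues, no stop codon
print(library_size(["NNK"] * 3))         # 32768
```

Restricting a position from NNK to a narrower codon such as NDT shrinks the search space at the cost of excluding some residues, which is the trade-off that the abstract says is usually handled by heuristics or prior knowledge.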
Huang, Xianzhong; Yang, Lifei; Jin, Yuhuan; Lin, Jun; Liu, Fang
2017-01-01
Arabidopsis pumila is an ephemeral plant and a close relative of the model plant Arabidopsis thaliana, but it possesses higher photosynthetic efficiency, a higher propagation rate, and higher salinity tolerance than A. thaliana, thus providing a candidate plant system for gene mining for environmental adaptation and salt tolerance. However, A. pumila is an under-explored resource for understanding the genetic mechanisms underlying abiotic stress adaptation. To improve our understanding of the molecular and genetic mechanisms of salt stress adaptation, more than 19,900 clones randomly selected from a cDNA library constructed previously from leaf tissue exposed to high-salinity shock were sequenced. A total of 16,014 high-quality expressed sequence tags (ESTs) were generated, which have been deposited in the dbEST GenBank under accession numbers JZ932319 to JZ948332. Clustering and assembly of these ESTs resulted in the identification of 8,835 unique sequences, consisting of 2,469 contigs and 6,366 singletons. The blastx results revealed 8,011 unigenes with significant similarity to known genes, while only 425 unigenes remained uncharacterized. Functional classification demonstrated an abundance of unigenes involved in binding, catalytic, structural or transporter activities, and in pathways of energy, carbohydrate, amino acid, or lipid metabolism. At least seven main classes of genes were related to salt tolerance among the 8,835 unigenes. Many previously reported salt tolerance genes were also present in this library, for example VP1, H+-ATPase, NHX1, SOS2, SOS3, NAC, MYB, ERF, LEA, and P5CS1. In addition, 251 transcription factors were identified from the library, classified into 42 families. Lastly, changes in expression of the 12 most abundant unigenes, 12 transcription factor genes, and 19 stress-related genes in the first 24 h of exposure to high-salinity stress conditions were monitored by qRT-PCR. 
The large-scale EST library obtained in this study provides first-hand information on gene sequences expressed in young leaves of A. pumila exposed to salt shock. The rapid discovery of known or unknown genes related to salinity stress response in A. pumila will facilitate the understanding of complex adaptive mechanisms for ephemerals.
2009-01-01
Background Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat (Triticum aestivum L.) worldwide. In spite of its agricultural importance, the genomics and genetics of the pathogen are poorly characterized. Pst transcripts from urediniospores and germinated urediniospores have been examined previously, but little is known about genes expressed during host infection. Some genes involved in virulence in other rust fungi have been found to be specifically expressed in haustoria. Therefore, the objective of this study was to generate a cDNA library to characterize genes expressed in haustoria of Pst. Results A total of 5,126 EST sequences of high quality were generated from haustoria of Pst, from which 287 contigs and 847 singletons were derived. Approximately 10% and 26% of the 1,134 unique sequences were homologous to proteins with known functions and hypothetical proteins, respectively. The remaining 64% of the unique sequences had no significant similarities in GenBank. Fifteen genes were predicted to be proteins secreted from Pst haustoria. Analysis of ten genes, including six secreted protein genes, using quantitative RT-PCR revealed changes in transcript levels in different developmental and infection stages of the pathogen. Conclusions The haustorial cDNA library was useful in identifying genes of the stripe rust fungus expressed during the infection process. From the library, we identified 15 genes encoding putative secreted proteins and six genes induced during the infection process. These genes are candidates for further studies to determine their functions in wheat-Pst interactions. PMID:20028560
Nguyen, Kieu T H; Adamkiewicz, Marta A; Hebert, Lauren E; Zygiel, Emily M; Boyle, Holly R; Martone, Christina M; Meléndez-Ríos, Carola B; Noren, Karen A; Noren, Christopher J; Hall, Marilena Fitzsimons
2014-10-01
A target-unrelated peptide (TUP) can arise in phage display selection experiments as a result of a propagation advantage exhibited by the phage clone displaying the peptide. We previously characterized HAIYPRH, from the M13-based Ph.D.-7 phage display library, as a propagation-related TUP resulting from a G→A mutation in the Shine-Dalgarno sequence of gene II. This mutant was shown to propagate in Escherichia coli at a dramatically faster rate than phage bearing the wild-type Shine-Dalgarno sequence. We now report 27 additional fast-propagating clones displaying 24 different peptides and carrying 14 unique mutations. Most of these mutations are found either in or upstream of the gene II Shine-Dalgarno sequence, but still within the mRNA transcript of gene II. All 27 clones propagate at significantly higher rates than normal library phage, most within experimental error of wild-type M13 propagation, suggesting that mutations arise to compensate for the reduced virulence caused by the insertion of a lacZα cassette proximal to the replication origin of the phage used to construct the library. We also describe an efficient and convenient assay to diagnose propagation-related TUPs among peptide sequences selected by phage display. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Ulloa, Pilar E; Rincón, Gonzalo; Islas-Trejo, Alma; Araneda, Cristian; Iturra, Patricia; Neira, Roberto; Medrano, Juan F
2015-06-01
The objectives of this study were to measure gene expression in zebrafish and then identify SNP to be used as potential markers in a growth association study. We developed an approach where muscle samples collected from low- and high-growth fish were analyzed using RNA-Sequencing (RNA-seq), and SNP were chosen from the genes that were differentially expressed between the low and high groups. A population of 24 families was fed a plant protein-based diet from the larval to adult stages. From a total of 440 males, 5 % of the fish from both tails of the weight gain distribution were selected. Total RNA was extracted from individual muscle of 8 low-growth and 8 high-growth fish. Two pooled RNA-Seq libraries were prepared for each phenotype using 4 fish per library. Libraries were sequenced using the Illumina GAII Sequencer and analyzed using the CLCBio genomic workbench software. One hundred and twenty-four genes were differentially expressed between phenotypes (p value < 0.05 and FDR < 0.2). From these genes, 164 SNP were selected and genotyped in 240 fish samples. Marker-trait analysis revealed 5 SNP associated with growth in key genes (Nars, Lmod2b, Cuzd1, Acta1b, and Plac8l1). These genes are good candidates for further growth studies in fish and to consider for identification of potential SNPs associated with different growth rates in response to a plant protein-based diet.
Complementary DNA libraries: an overview.
Ying, Shao-Yao
2004-07-01
The generation of complete and full-length cDNA libraries for potential functional assays of specific gene sequences is essential for most molecules in biotechnology and biomedical research. The field of cDNA library generation has changed rapidly in the past 10 yr. This review presents an overview of the method available for the basic information of generating cDNA libraries, including the definition of the cDNA library, different kinds of cDNA libraries, difference between methods for cDNA library generation using conventional approaches and a novel strategy, and the quality of cDNA libraries. It is anticipated that the high-quality cDNA libraries so generated would facilitate studies involving genechips and the microarray, differential display, subtractive hybridization, gene cloning, and peptide library generation.
Novel aromatic ring-hydroxylating dioxygenase genes from coastal marine sediments of Patagonia
Lozada, Mariana; Riva Mercadal, Juan P; Guerrero, Leandro D; Di Marzio, Walter D; Ferrero, Marcela A; Dionisi, Hebe M
2008-01-01
Background Polycyclic aromatic hydrocarbons (PAHs), widespread pollutants in the marine environment, can produce adverse effects in marine organisms and can be transferred to humans through seafood. Our knowledge of PAH-degrading bacterial populations in the marine environment is still very limited, and mainly originates from studies of cultured bacteria. In this work, genes coding catabolic enzymes from PAH-biodegradation pathways were characterized in coastal sediments of Patagonia with different levels of PAH contamination. Results Genes encoding for the catalytic alpha subunit of aromatic ring-hydroxylating dioxygenases (ARHDs) were amplified from intertidal sediment samples using two different primer sets. Products were cloned and screened by restriction fragment length polymorphism analysis. Clones representing each restriction pattern were selected in each library for sequencing. A total of 500 clones were screened in 9 gene libraries, and 193 clones were sequenced. Libraries contained one to five different ARHD gene types, and this number was correlated with the number of PAHs found in the samples above the quantification limit (r = 0.834, p < 0.05). Overall, eight different ARHD gene types were detected in the sediments. In five of them, their deduced amino acid sequences formed deeply rooted branches with previously described ARHD peptide sequences, exhibiting less than 70% identity to them. They contain consensus sequences of the Rieske type [2Fe-2S] cluster binding site, suggesting that these gene fragments encode for ARHDs. On the other hand, three gene types were closely related to previously described ARHDs: archetypical nahAc-like genes, phnAc-like genes as identified in Alcaligenes faecalis AFK2, and phnA1-like genes from marine PAH-degraders from the genus Cycloclasticus. Conclusion These results show the presence of hitherto unidentified ARHD genes in this sub-Antarctic marine environment exposed to anthropogenic contamination. 
This information can be used to study the geographical distribution and ecological significance of bacterial populations carrying these genes, and to design molecular assays to monitor the progress and effectiveness of remediation technologies. PMID:18366740
Geoseq: a tool for dissecting deep-sequencing datasets.
Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi
2010-10-12
Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (ddbj). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq http://geoseq.mssm.edu provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to, a) identify differential isoform expression in mRNA-seq datasets, b) identify miRNAs (microRNAs) in libraries, and identify mature and star sequences in miRNAS and c) to identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
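The tile-and-count idea described above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the tile length, the step between tiles, the `$` separator between reads, and all function names are hypothetical, and the naive suffix-array construction is only suitable for small inputs. It is not Geoseq's actual implementation.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text: str, suffix_array: list[int], pattern: str) -> int:
    """Count exact matches of `pattern` using two binary searches over the
    suffix array (lower and upper bound of the matching block)."""
    m = len(pattern)
    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - start


def tile_coverage(reference: str, reads: list[str],
                  tile_len: int, step: int = 1) -> list[tuple[str, int]]:
    """Cut the reference into tiles and count each tile in the read library."""
    # '$' never occurs in DNA, so no tile can match across a read boundary.
    library = "$".join(reads)
    suffix_array = build_suffix_array(library)
    tiles = [reference[i:i + tile_len]
             for i in range(0, len(reference) - tile_len + 1, step)]
    return [(tile, count_occurrences(library, suffix_array, tile))
            for tile in tiles]


reads = ["ACGTACGT", "CGTACG", "TTTT"]
print(tile_coverage("ACGTAC", reads, tile_len=4))
# [('ACGT', 2), ('CGTA', 2), ('GTAC', 2)]
```

Indexing the read library once and querying it with short tiles is what makes it cheap to search a small reference against many pre-computed libraries, in contrast to mapping every read to a genome.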
Informatic selection of a neural crest-melanocyte cDNA set for microarray analysis
Loftus, S. K.; Chen, Y.; Gooden, G.; Ryan, J. F.; Birznieks, G.; Hilliard, M.; Baxevanis, A. D.; Bittner, M.; Meltzer, P.; Trent, J.; Pavan, W.
1999-01-01
With cDNA microarrays, it is now possible to compare the expression of many genes simultaneously. To maximize the likelihood of finding genes whose expression is altered under the experimental conditions, it would be advantageous to be able to select clones for tissue-appropriate cDNA sets. We have taken advantage of the extensive sequence information in the dbEST expressed sequence tag (EST) database to identify a neural crest-derived melanocyte cDNA set for microarray analysis. Analysis of characterized genes with dbEST identified one library that contained ESTs representing 21 neural crest-expressed genes (library 198). The distribution of the ESTs corresponding to these genes was biased toward being derived from library 198. This is in contrast to the EST distribution profile for a set of control genes, characterized to be more ubiquitously expressed in multiple tissues (P < 1 × 10⁻⁹). From library 198, a subset of 852 clustered ESTs were selected that have a library distribution profile similar to that of the 21 neural crest-expressed genes. Microarray analysis demonstrated the majority of the neural crest-selected 852 ESTs (Mel1 array) were differentially expressed in melanoma cell lines compared with a non-neural crest kidney epithelial cell line (P < 1 × 10⁻⁸). This was not observed with an array of 1,238 ESTs that was selected without library origin bias (P = 0.204). This study presents an approach for selecting tissue-appropriate cDNAs that can be used to examine the expression profiles of developmental processes and diseases. PMID:10430933
Yu, Zhongtang; Yu, Marie; Morrison, Mark
2006-04-01
Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no fewer than 236 unique phylotypes (based on ≥95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
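The abstract mentions rarefaction without giving the formula used. As a hedged illustration, the standard analytic rarefaction estimate (the expected number of phylotypes seen when n tags are drawn without replacement) can be computed as below; whether the authors used this exact estimator is an assumption, and the counts in the example are invented.

```python
from math import comb


def expected_phylotypes(counts, n):
    """Expected number of distinct phylotypes in a random subsample of n tags.

    counts: number of tags observed for each phylotype.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N = sum(counts).
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of tags")
    denom = comb(total, n)
    # comb(a, n) is 0 when n > a, i.e. the phylotype cannot be missed entirely.
    return sum(1 - comb(total - c, n) / denom for c in counts)


if __name__ == "__main__":
    toy_counts = [50, 30, 10, 5, 3, 1, 1]  # hypothetical tag counts per phylotype
    for n in (10, 50, 100):
        print(n, round(expected_phylotypes(toy_counts, n), 2))
```

Plotting this value against n gives the rarefaction curve; a curve that flattens well before the full sample size suggests that further sequencing would add few new phylotypes.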
Aschard, Hugues; Cattoir, Vincent; Yoder-Himes, Deborah; Lory, Stephen; Pier, Gerald B.
2013-01-01
High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200–1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian host. PMID:24039572
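How a change in Tn insert frequency translates into a fitness score can be sketched as follows. This is a generic log2 frequency-ratio calculation, not the authors' pipeline: the pseudocount, the function name and the gene labels are assumptions, while the cut-off of 10 baseline reads follows the threshold quoted in the abstract.

```python
from math import log2


def insertion_fitness(input_counts, output_counts, pseudocount=1.0, min_input_reads=10):
    """Return log2(output frequency / input frequency) per gene.

    input_counts: reads per gene in the baseline library (e.g. growth in LB).
    output_counts: reads per gene after selection (e.g. recovery from the host).
    Genes with fewer than min_input_reads baseline reads get None, because
    their loss cannot be attributed to the selection step.
    """
    total_in = sum(input_counts.values())
    total_out = sum(output_counts.values())
    if total_in == 0 or total_out == 0:
        raise ValueError("both libraries must contain reads")
    scores = {}
    for gene, n_in in input_counts.items():
        if n_in < min_input_reads:
            scores[gene] = None
            continue
        n_out = output_counts.get(gene, 0)
        # The pseudocount avoids log2(0) for genes that drop out completely.
        freq_in = (n_in + pseudocount) / total_in
        freq_out = (n_out + pseudocount) / total_out
        scores[gene] = log2(freq_out / freq_in)
    return scores


if __name__ == "__main__":
    baseline = {"geneA": 100, "geneB": 100, "geneC": 5}  # hypothetical read counts
    in_vivo = {"geneA": 100, "geneB": 0, "geneC": 5}
    print(insertion_fitness(baseline, in_vivo))
```

Strongly negative scores mark genes whose disruption reduces survival under selection, while positive scores correspond to insertions that enhance fitness, as the abstract reports for some genes.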
Yung, Pui Yi; Burke, Catherine; Lewis, Matt; Egan, Suhelen; Kjelleberg, Staffan; Thomas, Torsten
2009-01-01
Metagenomics provides access to the uncultured majority of the microbial world. The approaches employed in this field have, however, had limited success in linking functional genes to the taxonomic or phylogenetic origin of the organism they belong to. Here we present an efficient strategy to recover environmental DNA fragments that contain phylogenetic marker genes from metagenomic libraries. Our method involves the cleavage of 23S ribosomal RNA (rRNA) genes within pooled library clones by the homing endonuclease I-CeuI followed by the insertion and selection of an antibiotic resistance cassette. This approach was applied to screen a library of 6500 fosmid clones derived from the microbial community associated with the sponge Cymbastela concentrica. Several fosmid clones were recovered after the screen and detailed phylogenetic and taxonomic assignment based on the rRNA gene showed that they belong to previously unknown organisms. In addition, compositional features of these fosmid clones were used to classify and taxonomically assign a dataset of environmental shotgun sequences. Our approach represents a valuable tool for the analysis of rapidly increasing environmental DNA sequencing information. PMID:19767618
Balancing gene expression without library construction via a reusable sRNA pool.
Ghodasara, Amar; Voigt, Christopher A
2017-07-27
Balancing protein expression is critical when optimizing genetic systems. Typically, this requires library construction to vary the genetic parts controlling each gene, which can be expensive and time-consuming. Here, we develop sRNAs corresponding to 15-nt 'target' sequences that can be inserted upstream of a gene. The targeted gene can be repressed from 1.6- to 87-fold by controlling sRNA expression using promoters of different strength. A pool is built where six sRNAs are placed under the control of 16 promoters that span a ∼10³-fold range of strengths, yielding ∼10⁷ combinations. This pool can simultaneously optimize up to six genes in a system. This requires building only a single system-specific construct by placing a target sequence upstream of each gene and transforming it with the pre-built sRNA pool. The resulting library is screened and the top clone is sequenced to determine the promoter controlling each sRNA, from which the fold-repression of the genes can be inferred. The system is then rebuilt by rationally selecting parts that implement the optimal expression of each gene. We demonstrate the versatility of this approach by using the same pool to optimize a metabolic pathway (β-carotene) and genetic circuit (XNOR logic gate). © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
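The pool size quoted in this abstract follows from simple counting: each of the six sRNAs independently receives one of 16 promoters. A minimal sketch of that arithmetic is given below; the function names are illustrative and do not come from the paper.

```python
from itertools import product


def pool_size(n_srnas=6, n_promoters=16):
    """Number of distinct promoter assignments when each sRNA gets one promoter."""
    return n_promoters ** n_srnas


def enumerate_designs(n_srnas, n_promoters):
    """Lazily yield every assignment as a tuple of promoter indices, one per sRNA."""
    return product(range(n_promoters), repeat=n_srnas)


if __name__ == "__main__":
    print(pool_size())  # 16 ** 6 = 16777216, i.e. on the order of 10^7
    # Enumerating a small case makes the structure of the pool explicit.
    for design in enumerate_designs(n_srnas=2, n_promoters=3):
        print(design)
```

Because the count grows exponentially with the number of sRNAs, a pre-built pool is screened rather than enumerated clone by clone.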
Capturing diversity of marine heterotrophic protists: one cell at a time
Heywood, Jane L; Sieracki, Michael E; Bellows, Wendy; Poulton, Nicole J; Stepanauskas, Ramunas
2011-01-01
Recent applications of culture-independent, molecular methods have revealed unexpectedly high diversity in a variety of functional and phylogenetic groups of microorganisms in the ocean. However, none of the existing research tools are free from significant limitations, such as PCR and cloning biases, low phylogenetic resolution and others. Here, we employed novel, single-cell sequencing techniques to assess the composition of small (<10 μm diameter), heterotrophic protists from the Gulf of Maine. Single cells were isolated by flow cytometry, their genomes amplified, and 18S rRNA marker genes were amplified and sequenced. We compared the results to traditional environmental PCR cloning of sorted cells. The diversity of heterotrophic protists was significantly higher in the library of single amplified genomes (SAGs) than in environmental PCR clone libraries of the 18S rRNA gene, obtained from the same coastal sample. Libraries of SAGs, but not clones contained several recently discovered, uncultured groups, including picobiliphytes and novel marine stramenopiles. Clone, but not SAG, libraries contained several large clusters of identical and nearly identical sequences of Dinophyceae, Cercozoa and Stramenopiles. Similar results were obtained using two alternative primer sets, suggesting that PCR biases may not be the only explanation for the observed patterns. Instead, differences in the number of 18S rRNA gene copies among the various protist taxa probably had a significant role in determining the PCR clone composition. These results show that single-cell sequencing has the potential to more accurately assess protistan community composition than previously established methods. In addition, the creation of SAG libraries opens opportunities for the analysis of multiple genes or entire genomes of the uncultured protist groups. PMID:20962875
Brulle, Franck; Jeffroy, Fanny; Madec, Stéphanie; Nicolas, Jean-Louis; Paillard, Christine
2012-10-01
The Manila clam, Ruditapes philippinarum, is an economically important commercial shellfish; harvests are diminished in some European waters by a pathogenic bacterium, Vibrio tapetis, that causes Brown Ring disease. To identify molecular characteristics associated with susceptibility or resistance to Brown Ring disease, Suppression Subtractive Hybridization (SSH) analyses were performed to construct cDNA libraries enriched in up- or down-regulated transcripts from clam immune cells, hemocytes, after a 3-h in vitro challenge with cultured V. tapetis. Nine hundred and ninety-eight sequences were obtained from the two libraries, and an in silico analysis identified 235 unique genes. BLAST and "Gene ontology" classification analyses revealed that 60.4% of the Expressed Sequence Tags (ESTs) have high similarities with genes involved in various physiological functions, such as immunity, apoptosis and cytoskeleton organization, whereas 39.6% remain unidentified. From the 235 unique genes, we selected 22 candidates based upon physiological function and redundancy in the libraries. Then, Real-Time PCR analysis identified 3 genes related to cytoskeleton organization showing significant variation in expression attributable to V. tapetis exposure. Disruption in regulation of these genes is consistent with the etiologic agent of Brown Ring disease in Manila clams. Copyright © 2012 Elsevier Ltd. All rights reserved.
Hellberg, Rosalee S; Martin, Keely G; Keys, Ashley L; Haney, Christopher J; Shen, Yuelian; Smiley, R Derike
2013-12-01
Use of 16S rRNA partial gene sequencing within the regulatory workflow could greatly reduce the time and labor needed for confirmation and subtyping of Listeria monocytogenes. The goal of this study was to build a 16S rRNA partial gene reference library for Listeria spp. and investigate the potential for 16S rRNA molecular subtyping. A total of 86 isolates of Listeria representing L. innocua, L. seeligeri, L. welshimeri, and L. monocytogenes were obtained for use in building the custom library. Seven non-Listeria species and three additional strains of Listeria were obtained for use in exclusivity and food spiking tests. Isolates were sequenced for the partial 16S rRNA gene using the MicroSeq ID 500 Bacterial Identification Kit (Applied Biosystems). High-quality sequences were obtained for 84 of the custom library isolates and 23 unique 16S sequence types were discovered for use in molecular subtyping. All of the exclusivity strains were negative for Listeria and the three Listeria strains used in food spiking were consistently recovered and correctly identified at the species level. The spiking results also allowed for differentiation beyond the species level, as 87% of replicates for one strain and 100% of replicates for the other two strains consistently matched the same 16S type. Copyright © 2013 Elsevier Ltd. All rights reserved.
Milnthorpe, Andrew T; Soloviev, Mikhail
2011-04-15
The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been used widely to find genes which are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocated a revised approach to finding libraries associated with tissues. In doing so, libraries from dependent or irrelevant tissues do not get included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised calculation of statistical significance and sorted the library cut-off filter. Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.
2011-01-01
Background: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been used widely to find genes which are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. Results: We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (one third of libraries were incorrect on average, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reported genes from outside the selected libraries and may omit genes found within the libraries. Other errors include the incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocated a revised approach to finding libraries associated with tissues. In doing so, libraries from dependent or irrelevant tissues do not get included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised calculation of statistical significance and sorted the library cut-off filter. Conclusion: Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used. PMID:21496233
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
Amin, Shivani; Rastogi, Rajesh P; Sonani, Ravi R; Ray, Arabinda; Sharma, Rakesh; Madamwar, Datta
2018-04-15
To explore the potential genes from the industrially polluted Amlakhadi canal, located in Ankleshwar, Gujarat, India, its community genome was extracted and cloned into E. coli EPI300™-T1R using a fosmid vector (pCC2 FOS™), generating a library of 392,000 clones with an average DNA insert size of 40 kb. From this library, the clone DM1, producing a brown melanin-like pigment, was isolated and characterized. For overexpression of the pigment, clone DM1 was further sub-cloned. A sub-clone containing 10 kb of the insert was sequenced for gene identification. The amino acid sequence of the protein 4-hydroxyphenylpyruvate dioxygenase (HPPD), which is known to be involved in melanin biosynthesis, was obtained from the gene sequence. A sequence-homology-based 3D structure model of HPPD was constructed and analyzed. The physico-chemical nature of the pigment was further analyzed using ¹H and ¹³C NMR, LC-MS, FTIR and UV-visible spectroscopy. The pigment was readily soluble in DMSO with an absorption maximum around 290 nm. Based on the genetic and chemical characterization, the compound was confirmed as a melanin-like pigment. The present results indicate that the metagenomic library from an industrially polluted environment generated a microbial tool for the production of melanin-like pigment. Copyright © 2018 Elsevier B.V. All rights reserved.
High-Resolution Sequence-Function Mapping of Full-Length Proteins
Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.
2015-01-01
Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet the short read lengths of current sequencers make it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
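Comparing variant abundance before and after selection is commonly summarized as a log2 enrichment score relative to wild type. The sketch below shows that generic calculation only; it is not the normalization derived in the paper, and the pseudocount, the "WT" label and the variant names are assumptions made for the example.

```python
from math import log2


def enrichment_scores(pre_counts, post_counts, wild_type="WT", pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to the wild-type sequence.

    pre_counts / post_counts: read counts per variant before and after selection.
    A score of 0 means the variant behaved like wild type; negative scores
    indicate depletion under selection, positive scores indicate enrichment.
    """
    pre_total = sum(pre_counts.values())
    post_total = sum(post_counts.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("both populations must contain reads")

    def log_ratio(variant):
        # The pseudocount keeps variants with zero reads after selection finite.
        pre = (pre_counts.get(variant, 0) + pseudocount) / pre_total
        post = (post_counts.get(variant, 0) + pseudocount) / post_total
        return log2(post / pre)

    reference = log_ratio(wild_type)
    return {variant: log_ratio(variant) - reference for variant in pre_counts}


if __name__ == "__main__":
    before = {"WT": 1000, "A10G": 100, "L25P": 100}  # hypothetical counts
    after = {"WT": 1000, "A10G": 200, "L25P": 10}
    for variant, score in enrichment_scores(before, after).items():
        print(variant, round(score, 2))
```

Subtracting the wild-type ratio puts scores from separate tiles or selections on a shared reference point, which is one simple way to make independent experiments comparable.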
USDA-ARS's Scientific Manuscript database
Oocyte-specific genes play critical roles in oogenesis, folliculogenesis and early embryonic development. Through analysis of expressed sequence tags (ESTs) from a rainbow trout oocyte cDNA library, we identified a novel transcript which is represented by ESTs only from the oocyte library. The novel...
We examined the bacterial composition of chlorinated drinking water using 16S rRNA gene clone libraries derived from RNA and DNA extracted from twelve water samples collected in three different months (June, August, and September of 2007). Phylogenetic analysis of 1234 and 1117 ...
Liu, Jinlin; Jia, Zhijuan; Li, Sha; Li, Yan; You, Qiang; Zhang, Chunyan; Zheng, Xiaotong; Xiong, Guomei; Zhao, Jin; Qi, Chao; Yang, Jihong
2016-09-15
The chemical and biological compositions of deep-sea sediments are interesting because of their underexplored diversity when it comes to bioprospecting. Its special geographical location and climate make the Arctic Ocean a unique ocean area containing an abundance of microbial resources. A metagenomic library was constructed from deep-sea sediments of the Arctic Ocean. Part of the insertion fragments of this library were sequenced. A chitin deacetylase gene, cdaYJ, was identified and characterized. A metagenomic library with 2750 clones was obtained and ten clones were sequenced. Results revealed several interesting genes, including a chitin deacetylase coding sequence, cdaYJ. CdaYJ is homologous to some known chitin deacetylases and contains conserved chitin deacetylase active sites. The CdaYJ protein exhibits a long N-terminus and a relatively short C-terminus. Phylogenetic analysis revealed that CdaYJ showed highest homology to CDAs from Alphaproteobacteria. The cdaYJ gene was subcloned into the pET-28a vector and the recombinant CdaYJ (rCdaYJ) was expressed in Escherichia coli BL21 (DE3). rCdaYJ showed a molecular weight of 43 kDa, and exhibited deacetylation activity using p-nitroacetanilide as substrate. The optimal pH and temperature of rCdaYJ were determined to be pH 7.4 and 28°C, respectively. The construction of a metagenomic library of the Arctic deep-sea sediments provides an opportunity to look into the microbial communities and exploit valuable gene resources. A chitin deacetylase, CdaYJ, was identified from the library. It showed highest deacetylation activity under slightly alkaline and low-temperature conditions. CdaYJ might be a candidate chitin deacetylase with industrial and pharmaceutical potential. Copyright © 2016 Elsevier B.V. All rights reserved.
Zha, Wenjun; Zhou, Lei; Li, Sanhe; Liu, Kai; Yang, Guocai; Chen, Zhijun; Liu, Kai; Xu, Huashan; Li, Peide; Hussain, Saddam; You, Aiqing
2016-12-20
MicroRNAs (miRNAs) are a group of small RNAs involved in various biological processes through negative regulation of mRNAs at the post-transcriptional level. The brown planthopper (BPH), Nilaparvata lugens (Stål), is one of the most serious and destructive insect pests of rice. In the present study, two small RNA libraries of virulent N. lugens populations (Biotype I, which survives on the susceptible rice variety TN1, and Biotype Y, which survives on the moderately resistant rice variety YHY15) were constructed and sequenced using high-throughput sequencing technology in order to identify the relationship between miRNAs of N. lugens and the adaptation of BPH to rice resistance. In total 15,758,632 and 11,442,592 reads, corresponding to 3,144,026 and 2,550,049 unique sequences, were obtained in the two libraries (BPH-TN1 and BPH-YHY15 libraries), respectively. A total of 41 potential novel miRNAs were predicted in the two libraries, and 26 miRNAs showed significantly differential expression between the two libraries. All of these miRNAs were significantly up-regulated in the BPH-TN1 library. Target genes likely regulated by these differentially expressed miRNAs were predicted computationally. The functional annotation of target genes performed by Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that a majority of the differentially expressed miRNAs were involved in the "Metabolism" pathway. These results provide an understanding of the role of miRNAs in the adaptation of BPH to rice resistance, and will be useful in developing new control strategies for host defense against BPH. Copyright © 2016 Elsevier B.V. All rights reserved.
Hou, Q; Chen, K; Shan, Z
2015-01-01
To construct the cDNA library of the ascites tumor cells of ovarian cancer, which can be used to screen the related antigen for the early diagnosis of ovarian cancer and therapeutic targets of immune treatment. Ascitic tumor cells from patients with ovarian serous cystadenocarcinoma (four cases), ovarian mucinous cystadenocarcinoma (two cases), and ovarian endometrial carcinoma (two cases) were used to construct the cDNA library. To screen the ovarian cancer antigen genes, evaluate the enzyme, and analyze nucleotide sequences, serological analysis of recombinant tumor cDNA expression libraries (SEREX) and suppression subtractive hybridization (SSH) techniques were utilized. Recombinant expression-based serological mini-arrays (SMARTA) were used to detect the ovarian cancer antigens and the corresponding autoantibody reactions in serum from 105 ovarian cancer patients and 105 normal women. After two rounds of serologic screening and glycosides sequencing analysis, 59 candidate ovarian cancer antigen gene fragments were finally identified, which corresponded to 50 genes. They were then divided into six categories: (1) genes homologous to known ovarian cancer genes, such as the BARD1 gene; (2) genes homologous to genes associated with other tumors, such as the TM4SF1 gene; (3) genes expressed in a special organization, such as the ILF3 and FXR1 genes; (4) genes identical to genes encoding proteins of special function, such as the TIZ and C1D genes; (5) genes sharing a source with embryonic genes, such as the PKHD1 gene; (6) the remaining, unknown genes without a homologous sequence in the gene pool, such as the OV-189 gene.
SEREX technology combined with the SSH method is an effective research strategy for filtering tumor antigens with high specificity; the serum autoantibodies corresponding to the recombinant antigens of the TM4SF1, C1D, TIZ, BARD1, FXR1, and OV-189 genes can be regarded as biomarkers for diagnosing ovarian cancer. The combination of multiple antigen detection can improve diagnostic efficiency.
Microbial communities established on Mont Blanc summit with Saharan dust deposition
NASA Astrophysics Data System (ADS)
Chuvochina, M.; Alekhina, I.; Normand, P.; Petit, J. R.; Bulat, S.
2009-04-01
Dust originating from the Sahara desert can be uplifted during storms, transported across the Mediterranean towards the Alpine region and deposited during snowfalls. The microbes associated with dust particles can be involved in establishing microbiota in icy environments as well as affect ecosystem and human health. Our objective was to use a culture-free DNA-based approach to assess bacterial content and diversity and, furthermore, to identify ‘icy’ microbes which could be brought to the Mont Blanc (MtBl) summit with Saharan dust and become established in the snow. Saharan dust fallout on the MtBl summit from one event (MB5, June 2006) was compared with control libraries, and fallout from another event (May 2008) was collected in Grenoble (SD, 200 m a.s.l.) and at Col du Dome (MB-SD, 4250 m a.s.l.). Soil from Ksar Ghilane (SS, Saharan desert, Tunisia, March 2008) was taken for overall comparison as a possible source population. Fresh snow falling in Grenoble (85) was collected as an example of diversity in this area. To assess the microbial diversity, 16S rRNA gene libraries (v3-v5 region) were constructed for the corresponding dust-snow samples (MB5, SS, SD, 85 and MB-SD) along with clear snow samples and several controls. For both the MB5 and MB-SD samples a full-gene technique was used in an attempt to differentiate reproducing bacteria from damaged DNA. Before sequencing the clones were ribotyped. All clone libraries were distinct in community composition except for the overlap of some single phylotypes (or closely related groups). Thus, clone libraries from two different events collected in the Col du Dome area within a 2-year interval (MB5 and MB-SD) differed in community composition, except that one of the abundant phylotypes from the MB-SD library (Geodermatophilus sp.) was shared (98% sequence similarity) with a single representative from the MB5 library. These bacteria are pigmented and radiation-resistant, so they could be an indicator of desert origin for our sequences.
In the MB5 library two Deinococcus-Thermus phylotypes (46.2%) along with Alphaproteobacteria and CFB dominated, while in the MB-SD library two Actinobacteria phylotypes (29%) with another Alphaproteobacteria phylotype were abundant. This indicates that the two dust events differ fundamentally in species composition, meaning that other events can also differ in the microbes transferred from the Sahara to the MtBl summit. Of the three other gene libraries (SD, SS and 85) selected as ‘supporting’ gene pools for MB-SD, two libraries (SD and 85) were strictly different from MB-SD while the SS library showed two phylotype groups shared with MB-SD. Among them, one Alphaproteobacteria phylotype has been detected in an unrelated dust event (Polymenakou et al., 2008). It is worth noting that one abundant Alphaproteobacteria phylotype from the MB5 library was closely related (97%) to another abundant phylotype from the 85 (Grenoble snow fallout) library although the events were separated geographically and in time. Such phylotypes could be present in the atmosphere elsewhere. Amongst the microbes detected in the MtBl dust layer libraries, four separate minor phylotypes (three cyanobacteria and Deinococcus sp1) were found in MB5 and two minor phylotypes (uncultured actinobacteria, uncultured alphaproteobacteria) in MB-SD which could be living (kept safe) in ice and snow. All of them were previously discovered in cold ecosystems. It seems that, although the dominant phylotypes recovered in both dust-event gene libraries might have a high chance of becoming established in snow as populations, the minor phylotypes in the dust microbial load are more important in colonizing the snow, with the dust providing nutrients. An attempt to recover cultures of the ‘icy’ microbes identified by sequence was made with Deinococci, but it did not succeed. The full-gene and partial-gene sub-libraries of the MB5 sample showed different results with respect to the observed phylotypes.
For example, ‘cold-loving’ Cyanobacteria phylotypes were detected only in the full-gene library while one phylotype related to the abundant Deinococcus sp. was detected mostly by the partial-gene PCR approach. The full-size gene sequences thus provide extra evidence that these microbes (Cyanobacteria spp.) can indeed be living in snow. When microbes are dead (their DNA degrades in snow with time, like ancient DNA), they are more likely to be recovered as partial gene sequences. Several phylotypes related to human pathogens (Streptococcus sanguinis, Acinetobacter johnsonii and Helcococcus sp.) and plant pathogens (Sphingomonas melonis, Pseudomonas sp.) were detected in association with the studied dust events. Based on these preliminary results, another view of microbial transport from the Sahara desert to the terrestrial glacier can be drawn. The dust fallout could provide nutrients rather than the dominant microbial seeds, while minor phylotypes can be brought from elsewhere.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-06-24
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-01-01
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination. PMID:19564928
Quantitative analysis of a deeply sequenced marine microbial metatranscriptome.
Gifford, Scott M; Sharma, Shalabh; Rinta-Kanto, Johanna M; Moran, Mary Ann
2011-03-01
The potential of metatranscriptomic sequencing to provide insights into the environmental factors that regulate microbial activities depends on how fully the sequence libraries capture community expression (that is, sample-sequencing depth and coverage depth), and the sensitivity with which expression differences between communities can be detected (that is, statistical power for hypothesis testing). In this study, we use an internal standard approach to make absolute (per liter) estimates of transcript numbers, a significant advantage over proportional estimates that can be biased by expression changes in unrelated genes. Coastal waters of the southeastern United States contain 1 × 10^12 bacterioplankton mRNA molecules per liter of seawater (~200 mRNA molecules per bacterial cell). Even for the large bacterioplankton libraries obtained in this study (~500,000 possible protein-encoding sequences in each of two libraries after discarding rRNAs and small RNAs from >1 million 454 FLX pyrosequencing reads), sample-sequencing depth was only 0.00001%. Expression levels of 82 genes diagnostic for transformations in the marine nitrogen, phosphorus and sulfur cycles ranged from below detection (<1 × 10^6 transcripts per liter) for 36 genes (for example, phosphonate metabolism gene phnH, dissimilatory nitrate reductase subunit napA) to >2.7 × 10^9 transcripts per liter (ammonia transporter amt and ammonia monooxygenase subunit amoC). Half of the categories for which expression was detected, however, had too few copy numbers for robust statistical resolution, as would be required for comparative (experimental or time-series) expression studies. By representing whole community gene abundance and expression in absolute units (per volume or mass of environment), 'omics' data can be better leveraged to improve understanding of microbially mediated processes in the ocean.
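The internal-standard calculation described in this abstract can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names and every number in the usage lines are hypothetical, and the sketch assumes that a known number of standard molecules was added to the sample and that standard and native transcripts are recovered with equal efficiency.

```python
def transcripts_per_liter(gene_reads, standard_reads,
                          standard_molecules_added, liters_filtered):
    """Scale a gene's read count to absolute transcripts per liter."""
    if standard_reads <= 0 or liters_filtered <= 0:
        raise ValueError("standard_reads and liters_filtered must be positive")
    # Each sequenced read stands for this many molecules in the sample.
    molecules_per_read = standard_molecules_added / standard_reads
    return gene_reads * molecules_per_read / liters_filtered


def sampling_depth_percent(sequences_obtained, transcripts_in_sample):
    """Percentage of the transcripts in the sample that were sequenced."""
    return 100.0 * sequences_obtained / transcripts_in_sample


# Hypothetical inputs: 1e9 standard molecules added, 500 standard reads
# recovered, 25 reads for the gene of interest, 2 liters filtered.
print(transcripts_per_liter(25, 500, 1e9, 2.0))
# Hypothetical sample holding 5e12 transcripts, 500,000 sequences obtained.
print(sampling_depth_percent(500_000, 5e12))
```

Reporting counts per liter rather than as proportions of the library means that a change in one highly expressed gene does not shift the apparent level of every other gene.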
Genes expressed during the development and ripening of watermelon fruit.
Levi, A; Davis, A; Hernandez, A; Wechter, P; Thimmapuram, J; Trebitsh, T; Tadmor, Y; Katzir, N; Portnoy, V; King, S
2006-11-01
A normalized cDNA library was constructed using watermelon flesh mRNA from three distinct developmental time-points and was subtracted by hybridization with leaf cDNA. Random cDNA clones of the watermelon flesh subtraction library were sequenced from the 5' end in order to identify potentially informative genes associated with fruit setting, development, and ripening. One thousand and forty-six 5'-end sequences (expressed sequence tags; ESTs) were assembled into 832 non-redundant sequences, designated as "EST-unigenes". Of these 832 "EST-unigenes", 254 (approximately 30%) have no significant homology to sequences published so far for other plant species. Additionally, 168 "EST-unigenes" (approximately 20%) correspond to genes with unknown function, whereas 410 "EST-unigenes" (approximately 50%) correspond to genes with known function in other plant species. These "EST-unigenes" are mainly associated with metabolism, membrane transport, cytoskeleton synthesis and structure, cell wall formation and cell division, signal transduction, nucleic acid binding and transcription factors, defense and stress response, and secondary metabolism. This study provides the scientific community with novel genetic information for watermelon as well as an expanded pool of genes associated with fruit development in watermelon. These genes will be useful targets in future genetic and functional genomic studies of watermelon and its development.
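The proportions quoted in this abstract can be re-derived from the reported counts. The short check below uses only numbers stated in the abstract; the function and variable names are illustrative.

```python
def category_shares(counts):
    """Return each category's share of the total, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


# Counts as reported in the abstract.
unigene_categories = {
    "no significant homology": 254,
    "unknown function": 168,
    "known function": 410,
}
shares = category_shares(unigene_categories)

# Unigenes as a percentage of the ESTs that were assembled.
unigene_percent_of_ests = round(100.0 * 832 / 1046, 1)

print(shares)
print(unigene_percent_of_ests)
```

The three categories add up to the 832 unigenes, so the rounded shares of about 30%, 20% and 50% are internally consistent.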
Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts.
Hirozane-Kishikawa, Tomoko; Shiraki, Toshiyuki; Waki, Kazunori; Nakamura, Mari; Arakawa, Takahiro; Kawai, Jun; Fagiolini, Michela; Hensch, Takao K; Hayashizaki, Yoshihide; Carninci, Piero
2003-09-01
The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced the gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.
Zhang, Hongkai; Torkamani, Ali; Jones, Teresa M; Ruiz, Diana I; Pons, Jaume; Lerner, Richard A
2011-08-16
Use of large combinatorial antibody libraries and next-generation sequencing of nucleic acids are two of the most powerful methods in modern molecular biology. The libraries are screened using the principles of evolutionary selection, albeit in real time, to enrich for members with a particular phenotype. This selective process necessarily results in the loss of information about less-fit molecules. On the other hand, sequencing of the library, by itself, gives information that is mostly unrelated to phenotype. If the two methods could be combined, the full potential of very large molecular libraries could be realized. Here we report the implementation of a phenotype-information-phenotype cycle that integrates information and gene recovery. After selection for phage-encoded antibodies that bind to targets expressed on the surface of Escherichia coli, the information content of the selected pool is obtained by pyrosequencing. Sequences that encode specific antibodies are identified by a bioinformatic analysis and recovered by a stringent affinity method that is uniquely suited for gene isolation from a highly degenerate collection of nucleic acids. This approach can be generalized for selection of antibodies against targets that are present as minor components of complex systems.
Hussey, Richard S; Huang, Guozhong; Allen, Rex
2011-01-01
Identifying parasitism genes encoding proteins secreted from a plant-parasitic nematode's esophageal gland cells and injected through its stylet into plant tissue is the key to understanding the molecular basis of nematode parasitism of plants. Parasitism genes have been cloned by directly microaspirating the cytoplasm from the esophageal gland cells of different parasitic stages of cyst or root-knot nematodes to provide mRNA to create a gland cell-specific cDNA library by long-distance reverse-transcriptase polymerase chain reaction. cDNA clones are sequenced and deduced protein sequences with a signal peptide for secretion are identified for high-throughput in situ hybridization to confirm gland-specific expression.
Robust Sub-nanomolar Library Preparation for High Throughput Next Generation Sequencing.
Wu, Wells W; Phue, Je-Nie; Lee, Chun-Ting; Lin, Changyi; Xu, Lai; Wang, Rong; Zhang, Yaqin; Shen, Rong-Fong
2018-05-04
Current library preparation protocols for Illumina HiSeq and MiSeq DNA sequencers require ≥2 nM initial library for subsequent loading of denatured cDNA onto flow cells. Such amounts are not always attainable from samples having a relatively low DNA or RNA input; or those for which a limited number of PCR amplification cycles is preferred (less PCR bias and/or more even coverage). A well-tested sub-nanomolar library preparation protocol for Illumina sequencers has however not been reported. The aim of this study is to provide a much needed working protocol for sub-nanomolar libraries to achieve outcomes as informative as those obtained with the higher library input (≥ 2 nM) recommended by Illumina's protocols. Extensive studies were conducted to validate a robust sub-nanomolar (initial library of 100 pM) protocol using PhiX DNA (as a control), genomic DNA (Bordetella bronchiseptica and microbial mock community B for 16S rRNA gene sequencing), messenger RNA, microRNA, and other small noncoding RNA samples. The utility of our protocol was further explored for PhiX library concentrations as low as 25 pM, which generated only slightly fewer than 50% of the reads achieved under the standard Illumina protocol starting with > 2 nM. A sub-nanomolar library preparation protocol (100 pM) could generate next generation sequencing (NGS) results as robust as the standard Illumina protocol. Following the sub-nanomolar protocol, libraries with initial concentrations as low as 25 pM could also be sequenced to yield satisfactory and reproducible sequencing results.
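Whether a library reaches 2 nM or only 100 pM is usually worked out from its mass concentration and mean fragment length. The sketch below shows that conversion and a simple dilution calculation. It is not taken from the study: the function names and example values are hypothetical, and 660 g/mol per base pair is the common approximation for double-stranded DNA.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration in ng/ul to nM."""
    if mean_fragment_bp <= 0:
        raise ValueError("mean_fragment_bp must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nM, target_pM, final_ul):
    """Return (stock_ul, diluent_ul) needed to reach target_pM in final_ul."""
    target_nM = target_pM / 1000.0
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = final_ul * target_nM / stock_nM
    return stock_ul, final_ul - stock_ul


# Hypothetical library: 0.033 ng/ul with a mean fragment length of 500 bp.
print(library_molarity_nM(0.033, 500))   # about 0.1 nM, i.e. 100 pM
# Hypothetical dilution of a 0.5 nM stock to 100 pM in 20 ul.
print(dilution_volumes(0.5, 100, 20))
```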
Comparison of large-insert, small-insert and pyrosequencing libraries for metagenomic analysis.
Danhorn, Thomas; Young, Curtis R; DeLong, Edward F
2012-11-01
The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary 'next-generation' DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.
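A comparison of GC content between library types starts from a per-read or per-library GC fraction. The following is a minimal sketch of that calculation, not the authors' workflow: the function names are illustrative, and ignoring ambiguous bases such as N is an assumption made here.

```python
def gc_fraction(seq):
    """GC fraction of one sequence, counting only unambiguous bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def library_gc(reads):
    """Length-weighted GC fraction over all reads of one library."""
    gc = 0
    counted = 0
    for read in reads:
        read = read.upper()
        gc += read.count("G") + read.count("C")
        counted += sum(read.count(base) for base in "ACGT")
    return gc / counted if counted else 0.0


# Toy reads, for illustration only.
print(gc_fraction("ATGC"))
print(library_gc(["ATAT", "GGCC", "ATGCNN"]))
```

Weighting by length keeps long reads (for example fosmid ends) and short reads comparable within one library; averaging per-read fractions instead would be an equally reasonable choice.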
Bellerophon: A program to detect chimeric sequences in multiple sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip
2003-12-23
Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.
Takahara, Hiroyuki; Dolf, Andreas; Endl, Elmar; O'Connell, Richard
2009-08-01
Generation of stage-specific cDNA libraries is a powerful approach to identify pathogen genes that are differentially expressed during plant infection. Biotrophic pathogens develop specialized infection structures inside living plant cells, but sampling the transcriptome of these structures is problematic due to the low ratio of fungal to plant RNA, and the lack of efficient methods to isolate them from infected plants. Here we established a method, based on fluorescence-activated cell sorting (FACS), to purify the intracellular biotrophic hyphae of Colletotrichum higginsianum from homogenates of infected Arabidopsis leaves. Specific selection of viable hyphae using a fluorescent vital marker provided intact RNA for cDNA library construction. Pilot-scale sequencing showed that the library was enriched with plant-induced and pathogenicity-related fungal genes, including some encoding small, soluble secreted proteins that represent candidate fungal effectors. The high purity of the hyphae (94%) prevented contamination of the library by sequences derived from host cells or other fungal cell types. RT-PCR confirmed that genes identified in the FACS-purified hyphae were also expressed in planta. The method has wide applicability for isolating the infection structures of other plant pathogens, and will facilitate cell-specific transcriptome analysis via deep sequencing and microarray hybridization, as well as proteomic analyses.
Metagenomic Analysis of Viral Communities in (Hado)Pelagic Sediments
Yoshida, Mitsuhiro; Takaki, Yoshihiro; Eitoku, Masamitsu; Nunoura, Takuro; Takai, Ken
2013-01-01
In this study, we analyzed viral metagenomes (viromes) in the sedimentary habitats of three geographically and geologically distinct (hado)pelagic environments in the northwest Pacific: the Izu-Ogasawara Trench (water depth = 9,760 m) (OG), the Challenger Deep in the Mariana Trench (10,325 m) (MA), and the forearc basin off the Shimokita Peninsula (1,181 m) (SH). Virus abundance ranged from 10^6 to 10^11 viruses/cm^3 of sediments (down to 30 cm below the seafloor [cmbsf]). We recovered viral DNA assemblages (viromes) from the (hado)pelagic sediment samples and obtained a total of 37,458, 39,882, and 70,882 sequence reads by 454 GS FLX Titanium pyrosequencing from the virome libraries of the OG, MA, and SH (hado)pelagic sediments, respectively. Only 24-30% of the sequence reads from each virome library exhibited significant similarities to the sequences deposited in the public nr protein database (E-value <10^-3 in BLAST). Among the sequences identified as potential viral genes based on the BLAST search, 95-99% of the sequence reads in each library were related to genes from single-stranded DNA (ssDNA) viral families, including Microviridae, Circoviridae, and Geminiviridae. A relatively high abundance of sequences related to the genetic markers (major capsid protein [VP1] and replication protein [Rep]) of two ssDNA viral groups were also detected in these libraries, thereby revealing a high genotypic diversity of their viruses (833 genotypes for VP1 and 2,551 genotypes for Rep). A majority of the viral genes predicted from each library were classified into three ssDNA viral protein categories: Rep, VP1, and minor capsid protein. The deep-sea sedimentary viromes were distinct from the viromes obtained from the oceanic and fresh waters and marine eukaryotes, and thus, deep-sea sediments harbor novel viromes, including previously unidentified ssDNA viruses. PMID:23468952
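The share of reads with a significant database match, as reported in this abstract, can be computed from BLAST tabular output. The sketch below assumes the standard 12-column tabular layout (output format 6), in which the query ID is the first column and the E-value the eleventh. The function name and toy lines are hypothetical, and this is not the authors' script.

```python
def fraction_with_hits(blast_lines, total_reads, max_evalue=1e-3):
    """Fraction of reads with at least one hit below the E-value cutoff."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    reads_with_hit = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id, evalue = fields[0], float(fields[10])
        if evalue < max_evalue:
            reads_with_hit.add(query_id)  # count each read only once
    return len(reads_with_hit) / total_reads


# Toy tabular lines, for illustration only.
example = [
    "read1\tsubjA\t45.0\t100\t50\t2\t1\t100\t1\t100\t1e-20\t80.0",
    "read1\tsubjB\t30.0\t60\t40\t1\t1\t60\t5\t64\t0.5\t30.0",
    "read2\tsubjC\t28.0\t50\t36\t0\t1\t50\t1\t50\t2.0\t20.0",
    "read3\tsubjD\t35.0\t80\t52\t0\t1\t80\t1\t80\t5e-4\t45.0",
]
print(fraction_with_hits(example, total_reads=10))
```

Collecting query IDs in a set matters because a read can have several hits, and it should still count once toward the fraction.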
Update of the Diatom EST Database: a new tool for digital transcriptomics
Maheswari, Uma; Mock, Thomas; Armbrust, E. Virginia; Bowler, Chris
2009-01-01
The Diatom Expressed Sequence Tag (EST) Database was constructed to provide integral access to ESTs from these ecologically and evolutionarily interesting microalgae. It has now been updated with 130 000 Phaeodactylum tricornutum ESTs from 16 cDNA libraries and 77 000 Thalassiosira pseudonana ESTs from seven libraries, derived from cells grown in different nutrient and stress regimes. The updated relational database incorporates results from statistical analyses such as log-likelihood ratios and hierarchical clustering, which help to identify differentially expressed genes under different conditions, and allow similarities in gene expression in different libraries to be investigated in a functional context. The database also incorporates links to the recently sequenced genomes of P. tricornutum and T. pseudonana, enabling an easy cross-talk between the expression pattern of diatom orthologs and the genome browsers. These improvements will facilitate exploration of diatom responses to conditions of ecological relevance and will aid gene function identification of diatom-specific genes and in silico gene prediction in this largely unexplored class of eukaryotes. The updated Diatom EST Database is available at http://www.biologie.ens.fr/diatomics/EST3. PMID:19029140
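A log-likelihood ratio for comparing EST counts across libraries can be written in a few lines. The sketch below uses one common form, in which each library's observed count is compared with the count expected if the gene had the same frequency in every library. Whether the database uses exactly this form is an assumption, and the function name and example counts are hypothetical.

```python
import math


def log_likelihood_ratio(counts, library_sizes):
    """Sum over libraries of x_i * ln(x_i / (N_i * f)), f = pooled frequency."""
    if len(counts) != len(library_sizes):
        raise ValueError("counts and library_sizes must have the same length")
    total_count = sum(counts)
    if total_count == 0:
        return 0.0
    pooled_freq = total_count / sum(library_sizes)
    stat = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing to the sum
            stat += x * math.log(x / (n * pooled_freq))
    return stat


# Hypothetical gene: even expression versus expression in one library only.
print(log_likelihood_ratio([10, 10], [1000, 1000]))
print(log_likelihood_ratio([20, 0], [1000, 1000]))
```

The statistic is zero when a gene is equally frequent in every library and grows as its ESTs concentrate in particular libraries, which is what makes it useful for ranking candidate differentially expressed genes.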
Constructing and detecting a cDNA library for mites.
Hu, Li; Zhao, YaE; Cheng, Juan; Yang, YuanJun; Li, Chen; Lu, ZhaoHui
2015-10-01
RNA extraction and construction of a complementary DNA (cDNA) library for mites have been quite challenging because of difficulties in acquiring tiny living mites and breaking their hard chitin. The present study explores a better method for constructing a cDNA library for mites that will lay the foundation for transcriptome and molecular pathogenesis research. We selected Psoroptes cuniculi as the experimental subject and took the following steps to construct and verify the cDNA library. First, we combined liquid nitrogen grinding with TRIzol for total RNA extraction. Then, the switching mechanism at 5' end of the RNA transcript (SMART) technique was used to construct a full-length cDNA library. To evaluate the quality of the cDNA library, the library titer and recombination rate were calculated. The reliability of the cDNA library was assessed by sequencing and analyzing positive clones and genes amplified by specific primers. The results showed that the RNA concentration was 836 ng/μl and the absorbance ratio at 260/280 nm was 1.82. The library titer was 5.31 × 10^5 plaque-forming units (PFU)/ml and the recombination rate was 98.21%, indicating that the library was of good quality. Among the 33 expressed sequence tags (ESTs) of P. cuniculi, two clones of 1656 and 1658 bp were almost identical, with only three variable sites detected, and had an identity of 99.63% with the corresponding sequence of Psoroptes ovis, indicating that the cDNA library was reliable. Further detection with specific primers demonstrated that the 553-bp Pso c II gene sequences of P. cuniculi had an identity of 98.56% with those of P. ovis, confirming that the cDNA library was not only reliable but also feasible.
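Library titer and recombination rate, the two quality measures named in this abstract, are simple ratios. The sketch below shows the usual way they are computed. The plaque counts, dilution factor and plated volume are hypothetical, since the abstract reports only the final values.

```python
def library_titer(plaques, dilution_factor, plated_volume_ml):
    """Titer in PFU/ml from a plaque count on one dilution plate."""
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return plaques * dilution_factor / plated_volume_ml


def recombination_rate(recombinant_clones, total_clones):
    """Percentage of screened clones that carry an insert."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return 100.0 * recombinant_clones / total_clones


# Hypothetical plate: 53 plaques from 0.1 ml of a 1000-fold dilution.
print(library_titer(53, 1000, 0.1))
# Hypothetical screen: 55 recombinants among 56 clones checked.
print(round(recombination_rate(55, 56), 2))
```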
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.
Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal
2010-01-01
Olive (Olea europaea L.) is an important source of edible oil that originated in the Near East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs for discovering novel genes and investigating the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences from olive leaf and 1506 readable sequences from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs showed no homology to database sequences, 2024 ESTs had homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. A total of 635 unique EST sequences were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity to the olive sequences existing in NCBI. The generated EST data and consensus sequences were submitted to NCBI as a valuable resource for functional genome studies of olive. PMID:21197085
NASA Astrophysics Data System (ADS)
Yu, Jianzhong; Ma, Xiaolei; Pan, Kehou; Yang, Guanpin; Yu, Wengong
2010-07-01
We constructed and characterized a normalized cDNA library of Nannochloropsis oculata CS-179, and obtained 905 nonredundant sequences (NRSs) ranging from 431 to 1,756 bp in length. Among them, 496 were very similar to nonredundant sequences in GenBank (E ≤ 1.0e-05), and 349 ESTs had significant hits with the clusters of eukaryotic orthologous groups (KOG). Bases G and/or C at the third position of codons of 14 amino acid residues suggested a strong bias in the conserved domain of 362 NRSs (>60%). We also identified unigenes encoding phosphorus and nitrogen transporters, suggesting that N. oculata could efficiently transport and metabolize phosphorus and nitrogen, and recognized unigenes involved in the biosynthesis and storage of both fatty acids and polyunsaturated fatty acids (PUFAs), which will facilitate elucidation of the eicosapentaenoic acid (EPA) biosynthesis pathway of N. oculata. In comparison with the original cDNA library, the normalized library significantly increased the efficiency of random sequencing and of discovering rarely expressed genes, and decreased the frequency of abundant gene sequences.
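The codon-bias observation in this abstract concerns G or C at the third codon position, often summarized as GC3. The following is a minimal sketch of how GC3 can be computed for an in-frame coding sequence. It is not the authors' analysis: the function name and toy sequence are hypothetical, and the reading frame is assumed to start at the first base.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C (frame starts at base 1)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # drop a trailing partial codon
    third_bases = [cds[i + 2] for i in range(0, usable, 3)]
    third_bases = [b for b in third_bases if b in "ACGT"]  # skip ambiguous bases
    if not third_bases:
        return 0.0
    return sum(b in "GC" for b in third_bases) / len(third_bases)


# Toy sequence with codons ATG, GCC, AAA: third bases are G, C and A.
print(gc3_fraction("ATGGCCAAA"))
```

A value well above 0.5 across many coding sequences would be consistent with the G/C preference at third positions that the abstract describes.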
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library
Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul
2005-01-01
The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the Genbank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. 1141 sequences (55%) aligned with 1011 unique sequences had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. 
Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
2011-01-01
Background: Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results: A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. 
Conclusions: The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389
Oshiki, Mamoru; Segawa, Takahiro; Ishii, Satoshi
2018-02-02
Various microorganisms play key roles in the Nitrogen (N) cycle. Quantitative PCR (qPCR) and PCR-amplicon sequencing of the N cycle functional genes allow us to analyze the abundance and diversity of microbes responsible in the N transforming reactions in various environmental samples. However, analysis of multiple target genes can be cumbersome and expensive. PCR-independent analysis, such as metagenomics and metatranscriptomics, is useful but expensive especially when we analyze multiple samples and try to detect N cycle functional genes present at relatively low abundance. Here, we present the application of microfluidic qPCR chip technology to simultaneously quantify and prepare amplicon sequence libraries for multiple N cycle functional genes as well as taxon-specific 16S rRNA gene markers for many samples. This approach, named as N cycle evaluation (NiCE) chip, was evaluated by using DNA from pure and artificially mixed bacterial cultures and by comparing the results with those obtained by conventional qPCR and amplicon sequencing methods. Quantitative results obtained by the NiCE chip were comparable to those obtained by conventional qPCR. In addition, the NiCE chip was successfully applied to examine abundance and diversity of N cycle functional genes in wastewater samples. Although non-specific amplification was detected on the NiCE chip, this could be overcome by optimizing the primer sequences in the future. As the NiCE chip can provide high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes, this tool should advance our ability to explore N cycling in various samples. Importance. We report a novel approach, namely Nitrogen Cycle Evaluation (NiCE) chip by using microfluidic qPCR chip technology. By sequencing the amplicons recovered from the NiCE chip, we can assess diversities of the N cycle functional genes. 
The NiCE chip technology can be applied to analyze the temporal dynamics of N cycle gene transcription in wastewater treatment bioreactors. The NiCE chip can provide a high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes. While there is room for future improvement, this tool should significantly advance our ability to explore the N cycle in various environmental samples. Copyright © 2018 American Society for Microbiology.
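The abstract above compares chip-based and conventional qPCR quantification without giving the calculation. As a minimal sketch of how absolute quantification from a standard curve is commonly done (not the authors' pipeline), the Python below fits Ct against log10 copy number and inverts the fit for an unknown sample. The standard dilutions and Ct values are invented for illustration, and all function names are hypothetical.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting copy number."""
    return 10 ** ((ct - intercept) / slope)

# Hypothetical ten-fold dilution series (assumed values, not from the study).
STANDARD_COPIES = [1e3, 1e4, 1e5, 1e6]
STANDARD_CT = [30.04, 26.72, 23.40, 20.08]

slope, intercept = fit_standard_curve(STANDARD_COPIES, STANDARD_CT)
```

With the fitted `slope` and `intercept`, `copies_from_ct(ct, slope, intercept)` gives the estimated copy number for a sample Ct, and `amplification_efficiency(slope)` is a quick sanity check on the assay.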
Model-based reconstruction of synthetic promoter library in Corynebacterium glutamicum.
Zhang, Shuanghong; Liu, Dingyu; Mao, Zhitao; Mao, Yufeng; Ma, Hongwu; Chen, Tao; Zhao, Xueming; Wang, Zhiwen
2018-05-01
To develop an efficient synthetic promoter library for fine-tuned expression of target genes in Corynebacterium glutamicum. A synthetic promoter library for C. glutamicum was developed based on conserved sequences of the -10 and -35 regions. The synthetic promoter library covered a wide range of strengths, ranging from 1 to 193% of the tac promoter. 68 promoters were selected and sequenced for correlation analysis between promoter sequence and strength with a statistical model. A new promoter library was further reconstructed with improved promoter strength and coverage based on the results of correlation analysis. Tandem promoter P70 was finally constructed with increased strength by 121% over the tac promoter. The promoter library developed in this study showed a great potential for applications in metabolic engineering and synthetic biology for the optimization of metabolic networks. To the best of our knowledge, this is the first reconstruction of a synthetic promoter library based on statistical analysis of C. glutamicum.
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon
2011-01-01
Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
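The melon abstract reports 4,068 simple sequence repeats (SSRs) but does not state the search criteria. As an illustrative sketch only, the Python below finds perfect SSRs with a regular expression. The minimum repeat counts per motif length are assumed thresholds of the kind often used in EST-SSR surveys, not the values used in the study, and `find_ssrs` is a hypothetical helper.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for each perfect SSR found in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. skip "ATAT", which is already counted as "AT"
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits

# Toy sequence containing an (AT)7 and a (GAA)5 repeat.
example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "TC"
ssrs = find_ssrs(example)
```

Imperfect or compound repeats are deliberately out of scope here; dedicated SSR tools handle those cases.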
Sekar, Raju; Mills, DeEtta K.; Remily, Elizabeth R.; Voss, Joshua D.; Richardson, Laurie L.
2006-01-01
Microbial community profiles and species composition associated with two black band-diseased colonies of the coral Siderastrea siderea were studied by 16S rRNA-targeted gene cloning, sequencing, and amplicon-length heterogeneity PCR (LH-PCR). Bacterial communities associated with the surface mucopolysaccharide layer (SML) of apparently healthy tissues of the infected colonies, together with samples of the black band disease (BBD) infections, were analyzed using the same techniques for comparison. Gene sequences, ranging from 424 to 1,537 bp, were retrieved from all positive clones (n = 43 to 48) in each of the four clone libraries generated and used for comparative sequence analysis. In addition to LH-PCR community profiling, all of the clone sequences were aligned with LH-PCR primer sequences, and the theoretical lengths of the amplicons were determined. Results revealed that the community profiles were significantly different between BBD and SML samples. The SML samples were dominated by γ-proteobacteria (53 to 64%), followed by β-proteobacteria (18 to 21%) and α-proteobacteria (5 to 11%). In contrast, both BBD clone libraries were dominated by α-proteobacteria (58 to 87%), followed by verrucomicrobia (2 to 10%) and 0 to 6% each of δ-proteobacteria, bacteroidetes, firmicutes, and cyanobacteria. Alphaproteobacterial sequence types related to the bacteria associated with toxin-producing dinoflagellates were observed in BBD clone libraries but were not found in the SML libraries. Similarly, sequences affiliated with the family Desulfobacteraceae and toxin-producing cyanobacteria, both believed to be involved in BBD pathogenesis, were found only in BBD libraries. These data provide evidence for an association of numerous toxin-producing heterotrophic microorganisms with BBD of corals. PMID:16957217
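The abstract above mentions aligning clone sequences with the LH-PCR primers to determine theoretical amplicon lengths. A minimal sketch of that idea, assuming exact primer matches and a clone sequence given in the forward-strand orientation, is shown below. The sequences are toy examples rather than real 16S rRNA primers, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def theoretical_amplicon_length(template, fwd_primer, rev_primer):
    """Length in bp from the start of the forward primer site to the end
    of the reverse primer site, or None if either site is not found.
    Assumes exact matches; degenerate primers are not handled."""
    template = template.upper()
    start = template.find(fwd_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    rev_site = reverse_complement(rev_primer)
    rev_start = template.find(rev_site, start + len(fwd_primer))
    if rev_start == -1:
        return None
    return rev_start + len(rev_site) - start

# Toy clone: 6-bp forward site, 5-bp insert, 6-bp reverse site.
clone = "TTT" + "ACGTAC" + "GGGGG" + "CCATGA" + "AA"
length = theoretical_amplicon_length(clone, "ACGTAC", "TCATGG")
```

Comparing such predicted lengths with observed LH-PCR peak sizes is one way clone sequences could be matched to peaks in a community profile.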
Creating a RAW264.7 CRISPR-Cas9 Genome Wide Library
Napier, Brooke A; Monack, Denise M
2017-01-01
The bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 genome editing tools are used in mammalian cells to knock-out specific genes of interest to elucidate gene function. The CRISPR-Cas9 system requires that the mammalian cell expresses Cas9 endonuclease, guide RNA (gRNA) to lead the endonuclease to the gene of interest, and the PAM sequence that links the Cas9 to the gRNA. CRISPR-Cas9 genome wide libraries are used to screen the effect of each gene in the genome on the cellular phenotype of interest, in an unbiased high-throughput manner. In this protocol, we describe our method of creating a CRISPR-Cas9 genome wide library in a transformed murine macrophage cell-line (RAW264.7). We have employed this library to identify novel mediators in the caspase-11 cell death pathway (Napier et al., 2016); however, this library can then be used to screen the importance of specific genes in multiple murine macrophage cellular pathways. PMID:28868328
Genomic resources for Myzus persicae: EST sequencing, SNP identification, and microarray design
Ramsey, John S; Wilson, Alex CC; de Vos, Martin; Sun, Qi; Tamborindeguy, Cecilia; Winfield, Agnese; Malloch, Gaynor; Smith, Dawn M; Fenton, Brian; Gray, Stewart M; Jander, Georg
2007-01-01
Background The green peach aphid, Myzus persicae (Sulzer), is a world-wide insect pest capable of infesting more than 40 plant families, including many crop species. However, despite the significant damage inflicted by M. persicae in agricultural systems through direct feeding damage and by its ability to transmit plant viruses, limited genomic information is available for this species. Results Sequencing of 16 M. persicae cDNA libraries generated 26,669 expressed sequence tags (ESTs). Aphids for library construction were raised on Arabidopsis thaliana, Nicotiana benthamiana, Brassica oleracea, B. napus, and Physalis floridana (with and without Potato leafroll virus infection). The M. persicae cDNA libraries include ones made from sexual and asexual whole aphids, guts, heads, and salivary glands. In silico comparison of cDNA libraries identified aphid genes with tissue-specific expression patterns, and gene expression that is induced by feeding on Nicotiana benthamiana. Furthermore, 2423 genes that are novel to science and potentially aphid-specific were identified. Comparison of cDNA data from three aphid lineages identified single nucleotide polymorphisms that can be used as genetic markers and, in some cases, may represent functional differences in the protein products. In particular, non-conservative amino acid substitutions in a highly expressed gut protease may be of adaptive significance for M. persicae feeding on different host plants. The Agilent eArray platform was used to design an M. persicae oligonucleotide microarray representing over 10,000 unique genes. Conclusion New genomic resources have been developed for M. persicae, an agriculturally important insect pest. These include previously unknown sequence data, a collection of expressed genes, molecular markers, and a DNA microarray that can be used to study aphid gene expression. These resources will help elucidate the adaptations that allow M. 
persicae to develop compatible interactions with its host plants, complementing ongoing work illuminating plant molecular responses to phloem-feeding insects. PMID:18021414
Computational annotation of genes differentially expressed along olive fruit development
Galla, Giulio; Barcaccia, Gianni; Ramina, Angelo; Collani, Silvio; Alagna, Fiammetta; Baldoni, Luciana; Cultrera, Nicolò GM; Martinelli, Federico; Sebastiani, Luca; Tonutti, Pietro
2009-01-01
Background Olea europaea L. is a traditional tree crop of the Mediterranean basin with a worldwide economical high impact. Differently from other fruit tree species, little is known about the physiological and molecular basis of the olive fruit development and a few sequences of genes and gene products are available for olive in public databases. This study deals with the identification of large sets of differentially expressed genes in developing olive fruits and the subsequent computational annotation by means of different software. Results mRNA from fruits of the cv. Leccino sampled at three different stages [i.e., initial fruit set (stage 1), completed pit hardening (stage 2) and veraison (stage 3)] was used for the identification of differentially expressed genes putatively involved in main processes along fruit development. Four subtractive hybridization libraries were constructed: forward and reverse between stage 1 and 2 (libraries A and B), and 2 and 3 (libraries C and D). All sequenced clones (1,132 in total) were analyzed through BlastX against non-redundant NCBI databases and about 60% of them showed similarity to known proteins. A total of 89 out of 642 differentially expressed unique sequences was further investigated by Real-Time PCR, showing a validation of the SSH results as high as 69%. Library-specific cDNA repertories were annotated according to the three main vocabularies of the gene ontology (GO): cellular component, biological process and molecular function. BlastX analysis, GO terms mapping and annotation analysis were performed using the Blast2GO software, a research tool designed with the main purpose of enabling GO based data mining on sequence sets for which no GO annotation is yet available. Bioinformatic analysis pointed out a significantly different distribution of the annotated sequences for each GO category, when comparing the three fruit developmental stages. 
The olive fruit-specific transcriptome dataset was used to query all known KEGG (Kyoto Encyclopaedia of Genes and Genomes) metabolic pathways for characterizing and positioning retrieved EST records. The integration of the olive sequence datasets within the MapMan platform for microarray analysis allowed the identification of specific biosynthetic pathways useful for the definition of key functional categories in time course analyses for gene groups. Conclusion The bioinformatic annotation of all gene sequences was useful to shed light on metabolic pathways and transcriptional aspects related to carbohydrates, fatty acids, secondary metabolites, transcription factors and hormones as well as response to biotic and abiotic stresses throughout olive drupe development. These results represent a first step toward both functional genomics and systems biology research for understanding the gene functions and regulatory networks in olive fruit growth and ripening. PMID:19852839
Schallmey, Marcus; Ly, Anh; Wang, Chunxia; Meglei, Gabriela; Voget, Sonja; Streit, Wolfgang R; Driscoll, Brian T; Charles, Trevor C
2011-08-01
We previously reported the construction of metagenomic libraries in the IncP cosmid vector pRK7813, enabling heterologous expression of these broad-host-range libraries in multiple bacterial hosts. Expressing these libraries in Sinorhizobium meliloti, we have successfully complemented associated phenotypes of polyhydroxyalkanoate synthesis mutants. DNA sequence analysis of three clones indicates that the complementing genes are homologous to, but substantially different from, known polyhydroxyalkanoate synthase-encoding genes. Thus we have demonstrated the ability to isolate diverse genes for polyhydroxyalkanoate synthesis by functional complementation of defined mutants. Such genes might be of use in the engineering of more efficient systems for the industrial production of bioplastics. The use of functional complementation will also provide a vehicle to probe the genetics of polyhydroxyalkanoate metabolism and its relation to carbon availability in complex microbial assemblages. 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Bacterial diversity of Taxus rhizosphere: culture-independent and culture-dependent approaches.
Hao, Da Cheng; Ge, Guang Bo; Yang, Ling
2008-07-01
The regional variability of Taxus rhizosphere bacterial community composition and diversity was studied by comparative analysis of three large 16S rRNA gene clone libraries from the Taxus rhizosphere in different regions of China (subtropical and temperate regions). One hundred and forty-six clones were screened for three libraries. Phylogenetic analysis of 16S rRNA gene sequences demonstrated that the abundance of sequences affiliated with Gammaproteobacteria, Betaproteobacteria, and Actinobacteria was higher in the library from the T. xmedia rhizosphere of the temperate region compared with the subtropical Taxus mairei rhizosphere. On the other hand, Acidobacteria was more abundant in libraries from the subtropical Taxus mairei rhizosphere. Richness estimates and diversity indices of three libraries revealed major differences, indicating a higher richness in the Taxus rhizosphere bacterial communities of the subtropical region and considerable variability in the bacterial community composition within this region. By enrichment culture, a novel Actinobacteria strain DICP16 was isolated from the T. xmedia rhizosphere of the temperate region and was identified as Leifsonia shinshuensis sp. via 16S rRNA gene and gyrase B sequence analyses. DICP16 was able to remove the xylosyl group from 7-xylosyl-10-deacetylbaccatin III and 7-xylosyl-10-deacetylpaclitaxel, thereby making the xylosyltaxanes available as sources of 10-deacetylbaccatin III and the anticancer drug paclitaxel. Taken together, the present studies provide, for the first time, the knowledge of the biodiversity of microorganisms populating Taxus rhizospheres.
Begin at the beginning: A BAC-end view of the passion fruit (Passiflora) genome.
Santos, Anselmo Azevedo; Penha, Helen Alves; Bellec, Arnaud; Munhoz, Carla de Freitas; Pedrosa-Harand, Andrea; Bergès, Hélène; Vieira, Maria Lucia Carneiro
2014-09-26
The passion fruit (Passiflora edulis) is a tropical crop of economic importance both for juice production and consumption as fresh fruit. The juice is also used in concentrate blends that are consumed worldwide. However, very little is known about the genome of the species. Therefore, improving our understanding of passion fruit genomics is essential and to some degree a pre-requisite if its genetic resources are to be used more efficiently. In this study, we have constructed a large-insert BAC library and provided the first view on the structure and content of the passion fruit genome, using BAC-end sequence (BES) data as a major resource. The library consisted of 82,944 clones and its levels of organellar DNA were very low. The library represents six haploid genome equivalents, and the average insert size was 108 kb. To check its utility for gene isolation, successful macroarray screening experiments were carried out with probes complementary to eight Passiflora gene sequences available in public databases. BACs harbouring those genes were used in fluorescent in situ hybridizations and unique signals were detected for four BACs in three chromosomes (n=9). Then, we explored 10,000 BES and we identified reads likely to contain repetitive mobile elements (19.6% of all BES), simple sequence repeats and putative proteins, and to estimate the GC content (~42%) of the reads. Around 9.6% of all BES were found to have high levels of similarity to plant genes and ontological terms were assigned to more than half of the sequences analysed (940). The vast majority of the top-hits made by our sequences were to Populus trichocarpa (24.8% of the total occurrences), Theobroma cacao (21.6%), Ricinus communis (14.3%), Vitis vinifera (6.5%) and Prunus persica (3.8%). We generated the first large-insert library for a member of Passifloraceae. 
This BAC library provides a new resource for genetic and genomic studies and represents a valuable tool for future whole-genome studies. Remarkably, a number of BAC-end pair sequences could be mapped to intervals of the sequenced Arabidopsis thaliana, V. vinifera and P. trichocarpa chromosomes, and putative collinear microsyntenic regions were identified.
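The library figures in this abstract (82,944 clones, 108 kb average insert, six haploid genome equivalents) can be tied together with simple arithmetic. The sketch below derives the haploid genome size implied by those three numbers, which is an inference from the abstract rather than a measured value, and uses the standard Clarke-Carbon relation to estimate the chance that a given locus is present. It assumes every clone carries an insert and that inserts are randomly distributed.

```python
import math

CLONES = 82_944            # from the abstract
MEAN_INSERT_KB = 108       # from the abstract
GENOME_EQUIVALENTS = 6     # from the abstract

# Total cloned DNA and the haploid genome size these figures imply.
total_cloned_kb = CLONES * MEAN_INSERT_KB
implied_genome_mb = total_cloned_kb / GENOME_EQUIVALENTS / 1000

# Clarke-Carbon: P = 1 - (1 - f)^N, where f = insert size / genome size.
fraction_per_clone = MEAN_INSERT_KB / (implied_genome_mb * 1000)
p_locus_present = 1 - (1 - fraction_per_clone) ** CLONES

# For small f this is close to 1 - exp(-coverage).
p_approx = 1 - math.exp(-GENOME_EQUIVALENTS)
```

Under these assumptions the implied genome size is about 1.49 Gb, and any single-copy locus has a better than 99.7% chance of being represented in the library.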
Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu
2014-01-01
Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotations EST, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigenes sequences, 22 simple sequence repeats (SSRs) were also identified contributing to the study of A. canescens resources. PMID:24960361
Asamizu, E; Nakamura, Y; Sato, S; Tabata, S
2000-06-30
For comprehensive analysis of genes expressed in the model dicotyledonous plant, Arabidopsis thaliana, expressed sequence tags (ESTs) were accumulated. Normalized and size-selected cDNA libraries were constructed from aboveground organs, flower buds, roots, green siliques and liquid-cultured seedlings, respectively, and a total of 14,026 5'-end ESTs and 39,207 3'-end ESTs were obtained. The 3'-end ESTs could be clustered into 12,028 non-redundant groups. Similarity search of the non-redundant ESTs against the public non-redundant protein database indicated that 4816 groups show similarity to genes of known function, 1864 to hypothetical genes, and the remaining 5348 are novel sequences. Gene coverage by the non-redundant ESTs was analyzed using the annotated genomic sequences of approximately 10 Mb on chromosomes 3 and 5. A total of 923 regions were hit by at least one EST, among which only 499 regions were hit by the ESTs deposited in the public database. The result indicates that the EST source generated in this project complements the EST data in the public database and facilitates new gene discovery.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hadano, S.; Ishida, Y.; Tomiyasu, H.
1994-09-01
To complete a transcription map of the 1 Mb region in human chromosome 4p16.3 containing the Huntington disease (HD) gene, the isolation of cDNA clones is being performed throughout. Our method relies on direct screening of cDNA libraries probed with single-copy microclones from 3 YAC clones spanning 1 Mbp of the HD gene region. YAC-DNAs were isolated by preparative pulsed-field gel electrophoresis, amplified by both single unique primer (SUP)-PCR and linker ligation PCR, and 6 microclone-DNA libraries were generated. Then, 8,640 microclones from these libraries were independently amplified by PCR and arrayed onto membranes. The 800-900 microclones that did not cross-hybridize with total human and yeast genomic DNA, YAC vector DNA, and ribosomal cDNA on dot hybridization (putatively carrying single-copy sequences) were pooled to make 9 probe pools. A total of approximately 1.8 × 10^7 plaques from the human brain cDNA libraries was screened with the 9 pool-probes, and 672 positive cDNA clones were obtained. So far, 597 cDNA clones have been defined and arrayed onto a map of the 1 Mbp HD gene region by hybridization with HD region-specific cosmid contigs and YAC clones. Further characterization, including DNA sequencing and Northern blot analysis, is currently underway.
Le Chevanton, L; Leblon, G
1989-04-15
We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.
MytiBase: a knowledgebase of mussel (M. galloprovincialis) transcribed sequences
Venier, Paola; De Pittà, Cristiano; Bernante, Filippo; Varotto, Laura; De Nardi, Barbara; Bovo, Giuseppe; Roch, Philippe; Novoa, Beatriz; Figueras, Antonio; Pallavicini, Alberto; Lanfranchi, Gerolamo
2009-01-01
Background Although bivalves are among the most studied marine organisms due to their ecological role, economic importance and use in pollution biomonitoring, very little information is available on the genome sequences of mussels. This study reports the functional analysis of a large-scale Expressed Sequence Tag (EST) sequencing from different tissues of Mytilus galloprovincialis (the Mediterranean mussel) challenged with toxic pollutants, temperature and potentially pathogenic bacteria. Results We have constructed and sequenced seventeen cDNA libraries from different Mediterranean mussel tissues: gills, digestive gland, foot, anterior and posterior adductor muscle, mantle and haemocytes. A total of 24,939 clones were sequenced from these libraries, generating 18,788 high-quality ESTs which were assembled into 2,446 overlapping clusters and 4,666 singletons, resulting in a total of 7,112 non-redundant sequences. In particular, a high-quality normalized cDNA library (Nor01) was constructed, as determined by the high rate of gene discovery (65.6%). Bioinformatic screening of the non-redundant M. galloprovincialis sequences identified 159 microsatellite-containing ESTs. Clusters, consensuses, related similarities and gene ontology searches have been organized in a dedicated, searchable database. Conclusion We defined the first species-specific catalogue of M. galloprovincialis ESTs including 7,112 unique transcribed sequences. Putative microsatellite markers were identified. This annotated catalogue represents a valuable platform for expression studies, marker validation and genetic linkage analysis for investigations in the biology of Mediterranean mussels. PMID:19203376
Using the TIGR gene index databases for biological discovery.
Lee, Yuandan; Quackenbush, John
2003-11-01
The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
Bentley, L; Fehrsen, J; Jordaan, F; Huismans, H; du Plessis, D H
2000-04-01
VP2 is an outer capsid protein of African horsesickness virus (AHSV) and is recognized by serotype-discriminatory neutralizing antibodies. With the objective of locating its antigenic regions, a filamentous phage library was constructed that displayed peptides derived from the fragmentation of a cDNA copy of the gene encoding VP2. Peptides ranging in size from approximately 30 to 100 amino acids were fused with pIII, the attachment protein of the display vector, fUSE2. To ensure maximum diversity, the final library consisted of three sub-libraries. The first utilized enzymatically fragmented DNA encoding only the VP2 gene, the second included plasmid sequences, while the third included a PCR step designed to allow different peptide-encoding sequences to recombine before ligation into the vector. The resulting composite library was subjected to immunoaffinity selection with AHSV-specific polyclonal chicken IgY, polyclonal horse immunoglobulins and a monoclonal antibody (MAb) known to neutralize AHSV. Antigenic peptides were located by sequencing the DNA of phages bound by the antibodies. Most antigenic determinants capable of being mapped by this method were located in the N-terminal half of VP2. Important binding areas were mapped with high resolution by identifying the minimum overlapping areas of the selected peptides. The MAb was also used to screen a random 17-mer epitope library. Sequences that may be part of a discontinuous neutralization epitope were identified. The amino acid sequences of the antigenic regions on VP2 of serotype 3 were compared with corresponding regions on three other serotypes, revealing regions with the potential to discriminate AHSV serotypes serologically.
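The mapping step described above, identifying the minimum overlapping area of the antibody-selected peptides, amounts to intersecting the intervals that the selected fragments occupy on the VP2 sequence. A minimal sketch follows; the residue coordinates are invented for illustration and `minimal_overlap` is a hypothetical helper, not the authors' software.

```python
def minimal_overlap(fragments):
    """Return the (start, end) region shared by every fragment, using
    inclusive residue coordinates, or None if they have no common region."""
    if not fragments:
        return None
    start = max(frag_start for frag_start, _ in fragments)
    end = min(frag_end for _, frag_end in fragments)
    return (start, end) if start <= end else None

# Hypothetical VP2 residue ranges of three phage-displayed peptides
# that were all bound by the same antibody preparation.
selected = [(120, 190), (150, 230), (140, 200)]
shared_region = minimal_overlap(selected)
```

The more selected fragments that cover a site, the narrower the shared interval becomes, which is why the resolution of this kind of mapping improves with the number of sequenced binders.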
Booman, Marije; Borza, Tudor; Feng, Charles Y; Hori, Tiago S; Higgins, Brent; Culf, Adrian; Léger, Daniel; Chute, Ian C; Belkaid, Anissa; Rise, Marlies; Gamperl, A Kurt; Hubert, Sophie; Kimball, Jennifer; Ouellette, Rodney J; Johnson, Stewart C; Bowman, Sharen; Rise, Matthew L
2011-08-01
The collapse of Atlantic cod (Gadus morhua) wild populations strongly impacted the Atlantic cod fishery and led to the development of cod aquaculture. In order to improve aquaculture and broodstock quality, we need to gain knowledge of genes and pathways involved in Atlantic cod responses to pathogens and other stressors. The Atlantic Cod Genomics and Broodstock Development Project has generated over 150,000 expressed sequence tags from 42 cDNA libraries representing various tissues, developmental stages, and stimuli. We used this resource to develop an Atlantic cod oligonucleotide microarray containing 20,000 unique probes. Selection of sequences from the full range of cDNA libraries enables application of the microarray for a broad spectrum of Atlantic cod functional genomics studies. We included sequences that were highly abundant in suppression subtractive hybridization (SSH) libraries, which were enriched for transcripts responsive to pathogens or other stressors. These sequences represent genes that potentially play an important role in stress and/or immune responses, making the microarray particularly useful for studies of Atlantic cod gene expression responses to immune stimuli and other stressors. To demonstrate its value, we used the microarray to analyze the Atlantic cod spleen response to stimulation with formalin-killed, atypical Aeromonas salmonicida, resulting in a gene expression profile that indicates a strong innate immune response. These results were further validated by quantitative PCR analysis and comparison to results from previous analysis of an SSH library. This study shows that the Atlantic cod 20K oligonucleotide microarray is a valuable new tool for Atlantic cod functional genomics research.
A database of annotated tentative orthologs from crop abiotic stress transcripts.
Balaji, Jayashree; Crouch, Jonathan H; Petite, Prasad V N S; Hoisington, David A
2006-10-07
A minimal requirement to initiate a comparative genomics study on plant responses to abiotic stresses is a dataset of orthologous sequences. The availability of a large amount of sequence information, including sequences derived from stress cDNA libraries, allows for the identification of stress-related genes and orthologs associated with the stress response. Orthologous sequences serve as tools to explore genes and their relationships across species. For this purpose, ESTs from stress cDNA libraries across 16 crop species including 6 important cereal crops and 10 dicots were systematically collated and subjected to bioinformatics analysis such as clustering, grouping of tentative orthologous sets, identification of protein motifs/patterns in the predicted protein sequence, and annotation with stress conditions, tissue/library source and putative function. All data are available to the scientific community at http://intranet.icrisat.org/gt1/tog/homepage.htm. We believe that the availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics.
Abécassis, V; Pompon, D; Truan, G
2000-10-15
The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
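The shuffling statistics quoted above (proportion of chimeric genes, parental contribution) depend on assigning each position of a shuffled clone to one parent. As a rough sketch under simplifying assumptions, the Python below does this for pre-aligned, equal-length sequences, using only positions where the two parents differ and ignoring positions that match neither parent (possible new mutations). The function names and toy sequences are hypothetical and do not reproduce the macro-array method used in the study.

```python
def parental_origin(parent_a, parent_b, chimera):
    """Label each informative position of the chimera as 'A' or 'B'.
    Inputs must be pre-aligned and of equal length."""
    if not (len(parent_a) == len(parent_b) == len(chimera)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for a, b, c in zip(parent_a, parent_b, chimera):
        if a == b:
            continue          # uninformative: parents are identical here
        if c == a:
            calls.append("A")
        elif c == b:
            calls.append("B")
        # positions matching neither parent are skipped
    return calls

def summarize(calls):
    """Fraction of informative positions from parent B and the minimum
    number of parental switches along the sequence."""
    fraction_b = calls.count("B") / len(calls) if calls else 0.0
    switches = sum(1 for x, y in zip(calls, calls[1:]) if x != y)
    return fraction_b, switches

# Toy example: a mosaic with one internal segment from parent B.
calls = parental_origin("AAAAAAAAAA", "CCCCCCCCCC", "AAAACCCCAA")
fraction_b, switches = summarize(calls)
```

A clone with at least one switch would count as chimeric in this scheme; the switch count is a lower bound, since exchanges inside stretches where the parents are identical cannot be detected.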
Screening and analyzing genes associated with Amur tiger placental development.
Li, Q; Lu, T F; Liu, D; Hu, P F; Sun, B; Ma, J Z; Wang, W J; Wang, K F; Zhang, W X; Chen, J; Guan, W J; Ma, Y H; Zhang, M H
2014-09-26
The Amur tiger is a unique endangered species in the world, and thus, protection of its genetic resources is extremely important. In this study, an Amur tiger placenta cDNA library was constructed using the SMART cDNA Library Construction kit. A total of 508 colonies were sequenced, in which 205 (76%) genes were annotated and mapped to 74 KEGG pathways, including 29 metabolism, 29 genetic information processing, 4 environmental information processing, 7 cell motility, and 5 organismal system pathways. Additionally, PLAC8, PEG10 and IGF-II were identified after screening genes from the expressed sequence tags, and they were associated with placental development. These findings could lay the foundation for future functional genomic studies of the Amur tiger.
Chen, Bo-Ruei; Hale, Devin C; Ciolek, Peter J; Runge, Kurt W
2012-05-03
Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches.
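The idea of reading relative growth from barcode proportions can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy barcodes and the pseudocount are assumptions introduced here.

```python
import math
from collections import Counter


def barcode_proportions(barcode_reads):
    """Return the fraction of reads carrying each barcode."""
    counts = Counter(barcode_reads)
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


def log2_fold_changes(before, after, pseudo=1e-6):
    """Log2 change in proportion per barcode between two samples.

    `pseudo` is a hypothetical pseudocount that avoids division by zero
    when a barcode is missing from one sample.
    """
    barcodes = set(before) | set(after)
    return {
        bc: math.log2((after.get(bc, 0.0) + pseudo) / (before.get(bc, 0.0) + pseudo))
        for bc in barcodes
    }


# Toy pooled culture with two mutants, sampled before and after growth.
t0 = ["AAC"] * 50 + ["GGT"] * 50
t1 = ["AAC"] * 80 + ["GGT"] * 20
p0 = barcode_proportions(t0)
p1 = barcode_proportions(t1)
lfc = log2_fold_changes(p0, p1)
```

A positive value suggests the mutant carrying that barcode increased in relative abundance in the pool, and a negative value suggests it was depleted.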
Yao, Lin; Yang, Qian; Song, Jinzhu; Tan, Chong; Guo, Changhong; Wang, Li; Qu, Lianhai; Wang, Yun
2013-04-01
Trichoderma harzianum 88, a filamentous soil fungus, is an effective biocontrol agent against several plant pathogens. High-throughput sequencing was used here to study the mycoparasitism mechanisms of T. harzianum 88. Plate confrontation tests of T. harzianum 88 against plant pathogens were conducted, and a cDNA library was constructed from T. harzianum 88 mycelia in the presence of plant pathogen cell walls. Randomly selected transcripts from the cDNA library were compared with eukaryotic plant and fungal genomes. Of the 1,386 transcripts sequenced, the most abundant Gene Ontology (GO) classification group was "physiological process". Differential expression of 19 genes was confirmed by real-time RT-PCR at different mycoparasitism stages against plant pathogens. Gene expression analysis revealed the transcription of various genes involved in mycoparasitism of T. harzianum 88. Our study provides helpful insights into the mechanisms of T. harzianum 88-plant pathogen interactions.
Xie, Wen; Yang, Xin; Wang, Shao-Ii; Wu, Qing-jun; Yang, Ni-na; Li, Ru-mei; Jiao, Xiaoguo; Pan, Hui-peng; Liu, Bai-ming; Feng, Yun-tao; Xu, Bao-yun; Zhou, Xu-guo; Zhang, You-jun
2012-01-01
Thiamethoxam has been used as a major insecticide to control the B-biotype sweetpotato whitefly, Bemisia tabaci (Gennadius) (Hemiptera: Aleyrodidae). Due to its excessive use, a high level of resistance to thiamethoxam has developed worldwide over the past several years. To better understand the molecular mechanisms underlying this resistance in B. tabaci, gene profiles between the thiamethoxam-resistant and thiamethoxam-susceptible strains were investigated using the suppression subtractive hybridization (SSH) library approach. A total of 72 and 52 up- and down-regulated genes were obtained from the forward and reverse SSH libraries, respectively. These expressed sequence tags (ESTs) belong to several functional categories based on their gene ontology annotation. Some categories such as cell communication, response to abiotic stimulus, lipid particle, and nuclear envelope were identified only in the forward library of thiamethoxam-resistant strains. In contrast, categories such as behavior, cell proliferation, nutrient reservoir activity, sequence-specific DNA binding transcription factor activity, and signal transducer activity were identified solely in the reverse library. To study the validity of the SSH method, 16 differentially expressed genes from both forward and reverse SSH libraries were selected randomly for further analyses using quantitative real-time PCR (qRT-PCR). The qRT-PCR results were fairly consistent with the SSH results; however, only 50% of the genes showed significantly different expression profiles between the thiamethoxam-resistant and thiamethoxam-susceptible whiteflies. Among these genes, a putative NAD-dependent methanol dehydrogenase was substantially over-expressed in the thiamethoxam-resistant adults compared to their susceptible counterparts. The distributed profiles show that it was highly expressed during the egg stage, and was most abundant in the abdomen of adult females. PMID:22957505
An optimized protocol for generation and analysis of Ion Proton sequencing reads for RNA-Seq.
Yuan, Yongxian; Xu, Huaiqian; Leung, Ross Ka-Kit
2016-05-26
Previous studies compared running cost, time and other performance measures of popular sequencing platforms. However, a comprehensive assessment of library construction and analysis protocols for the Proton sequencing platform remains unexplored. Unlike Illumina sequencing platforms, Proton reads are heterogeneous in length and quality. When sequencing data from different platforms are combined, this can result in reads of varying length. Whether the performance of commonly used software for handling such data is satisfactory is unknown. Using universal human reference RNA as the starting material, RNaseIII and chemical fragmentation methods in library construction showed similar results in the number of genes and junctions discovered and in the accuracy of expression-level estimates. In contrast, sequencing quality, read length and the choice of software affected mapping rate to a much larger extent. The unspliced aligner TMAP attained the highest mapping rate (97.27% to genome, 86.46% to transcriptome), though 47.83% of mapped reads were clipped. Long reads could paradoxically reduce mapping in junctions. With a reference annotation guide, the mapping rate of TopHat2 significantly increased from 75.79% to 92.09%, especially for long (>150 bp) reads. Sailfish, a k-mer based gene expression quantifier, attained results highly consistent with those of the TaqMan array and the highest sensitivity. We provide, for the first time, reference statistics of library preparation methods, gene detection and quantification, and junction discovery for RNA-Seq on the Ion Proton platform. Chemical fragmentation performed as well as the enzyme-based method. The optimal Ion Proton sequencing options and analysis software have been evaluated.
Rise, Matthew L.; von Schalburg, Kristian R.; Brown, Gordon D.; Mawer, Melanie A.; Devlin, Robert H.; Kuipers, Nathanael; Busby, Maura; Beetz-Sargent, Marianne; Alberto, Roberto; Gibbs, A. Ross; Hunt, Peter; Shukin, Robert; Zeznik, Jeffrey A.; Nelson, Colleen; Jones, Simon R.M.; Smailus, Duane E.; Jones, Steven J.M.; Schein, Jacqueline E.; Marra, Marco A.; Butterfield, Yaron S.N.; Stott, Jeff M.; Ng, Siemon H.S.; Davidson, William S.; Koop, Ben F.
2004-01-01
We report 80,388 ESTs from 23 Atlantic salmon (Salmo salar) cDNA libraries (61,819 ESTs), 6 rainbow trout (Oncorhynchus mykiss) cDNA libraries (14,544 ESTs), 2 chinook salmon (Oncorhynchus tshawytscha) cDNA libraries (1317 ESTs), 2 sockeye salmon (Oncorhynchus nerka) cDNA libraries (1243 ESTs), and 2 lake whitefish (Coregonus clupeaformis) cDNA libraries (1465 ESTs). The majority of these are 3′ sequences, allowing discrimination between paralogs arising from a recent genome duplication in the salmonid lineage. Sequence assembly reveals 28,710 different S. salar, 8981 O. mykiss, 1085 O. tshawytscha, 520 O. nerka, and 1176 C. clupeaformis putative transcripts. We annotate the submitted portion of our EST database by molecular function. Higher- and lower-molecular-weight fractions of libraries are shown to contain distinct gene sets, and higher rates of gene discovery are associated with higher-molecular weight libraries. Pyloric caecum library group annotations indicate this organ may function in redox control and as a barrier against systemic uptake of xenobiotics. A microarray is described, containing 7356 salmonid elements representing 3557 different cDNAs. Analyses of cross-species hybridizations to this cDNA microarray indicate that this resource may be used for studies involving all salmonids. PMID:14962987
Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei
2008-03-03
The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pest of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest.
Mesarich, Carl H.; Rees-George, Jonathan; Gardner, Paul P.; Ghomi, Fatemeh Ashari; Gerth, Monica L.; Andersen, Mark T.; Rikkerink, Erik H. A.; Fineran, Peter C.
2017-01-01
Pseudomonas syringae pv. actinidiae (Psa), the causal agent of kiwifruit canker, is one of the most devastating plant diseases of recent times. We have generated two mini-Tn5-based random insertion libraries of Psa ICMP 18884. The first, a ‘phenotype of interest’ (POI) library, consists of 10,368 independent mutants gridded into 96-well plates. By replica plating onto selective media, the POI library was successfully screened for auxotrophic and motility mutants. Lipopolysaccharide (LPS) biosynthesis mutants with ‘Fuzzy-Spreader’-like morphologies were also identified through a visual screen. The second, a ‘mutant of interest’ (MOI) library, comprises around 96,000 independent mutants, also stored in 96-well plates, with approximately 200 individuals per well. The MOI library was sequenced on the Illumina MiSeq platform using Transposon-Directed Insertion site Sequencing (TraDIS) to map insertion sites onto the Psa genome. A grid-based PCR method was developed to recover individual mutants, and using this strategy, the MOI library was successfully screened for a putative LPS mutant not identified in the visual screen. The Psa chromosome and plasmid had 24,031 and 1,236 independent insertion events respectively, giving insertion frequencies of 3.65 and 16.6 per kb respectively. These data suggest that the MOI library is near saturation, with the theoretical probability of finding an insert in any one chromosomal gene estimated to be 97.5%. However, only 47% of chromosomal genes had insertions. This surprisingly low rate cannot be solely explained by the lack of insertions in essential genes, which would be expected to be around 5%. Strikingly, many accessory genes, including most of those encoding type III effectors, lacked insertions. In contrast, 94% of genes on the Psa plasmid had insertions, including for example, the type III effector HopAU1. 
These results suggest that some chromosomal sites are rendered inaccessible to transposon insertion, either by DNA-binding proteins or by the architecture of the nucleoid. PMID:28249011
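One way to see why an insertion frequency of 3.65 per kb implies near-saturation is a simple Poisson calculation. The sketch below assumes insertions land uniformly at random and uses a hypothetical 1 kb gene; the abstract does not state how its 97.5% estimate was derived, so this is only an illustration that happens to give a similar figure.

```python
import math


def prob_at_least_one_insertion(insertions_per_kb, gene_length_kb):
    """P(gene carries >= 1 insertion) under a uniform Poisson model.

    The uniformity assumption is ours, not the authors'; the observed 47%
    of chromosomal genes with insertions shows that it does not hold.
    """
    expected_hits = insertions_per_kb * gene_length_kb
    return 1.0 - math.exp(-expected_hits)


chromosome_rate = 3.65  # insertions per kb, as reported for the chromosome
p_1kb = prob_at_least_one_insertion(chromosome_rate, 1.0)  # hypothetical 1 kb gene
print(f"{p_1kb:.3f}")
```

Under these assumptions the probability for a 1 kb gene is roughly 0.97, so the gap between this expectation and the observed 47% is what points to sites being inaccessible to the transposon.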
Hayashi, Yoshinobu; Shigenobu, Shuji; Watanabe, Dai; Toga, Kouhei; Saiki, Ryota; Shimada, Keisuke; Bourguignon, Thomas; Lo, Nathan; Hojo, Masaru; Maekawa, Kiyoto; Miura, Toru
2013-01-01
In termites, division of labor among castes, categories of individuals that perform specialized tasks, increases colony-level productivity and is the key to their ecological success. Although molecular studies on caste polymorphism have been performed in termites, we are far from a comprehensive understanding of the molecular basis of this phenomenon. To facilitate future molecular studies, we aimed to construct expressed sequence tag (EST) libraries covering wide ranges of gene repertoires in three representative termite species, Hodotermopsis sjostedti, Reticulitermes speratus and Nasutitermes takasagoensis. We generated normalized cDNA libraries from whole bodies, except for guts containing microbes, of almost all castes, sexes and developmental stages and sequenced them with the 454 GS FLX titanium system. We obtained >1.2 million quality-filtered reads yielding >400 million bases for each of the three species. Isotigs, which are analogous to individual transcripts, and singletons were produced by assembling the reads and annotated using public databases. Genes related to juvenile hormone, which plays crucial roles in caste differentiation of termites, were identified from the EST libraries by BLAST search. To explore the potential for DNA methylation, which plays an important role in caste differentiation of honeybees, tBLASTn searches for DNA methyltransferases (dnmt1, dnmt2 and dnmt3) and methyl-CpG binding domain (mbd) were performed against the EST libraries. All four of these genes were found in the H. sjostedti library, while all except dnmt3 were found in R. speratus and N. takasagoensis. The ratio of the observed to the expected CpG content (CpG O/E), which is a proxy for DNA methylation level, was calculated for the coding sequences predicted from the isotigs and singletons. 
In all of the three species, the majority of coding sequences showed depletion of CpG O/E (less than 1), and the distributions of CpG O/E were bimodal, suggesting the presence of DNA methylation. PMID:24098800
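The CpG O/E ratio mentioned above can be computed in a few lines. This is a minimal sketch using one common definition (observed CpG count times sequence length, divided by the product of C and G counts); the exact normalization used in the study is not given in the abstract, and the function name is an assumption.

```python
def cpg_oe(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, since the expected
    CpG count is then zero and the ratio is undefined.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


example_ratio = cpg_oe("CCGG")  # one CpG, two C, two G, length 4
```

Values below 1 indicate fewer CpG dinucleotides than expected from base composition, which is the depletion pattern the abstract describes.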
Pratt, Lee H.; Liang, Chun; Shah, Manish; Sun, Feng; Wang, Haiming; Reid, St. Patrick; Gingle, Alan R.; Paterson, Andrew H.; Wing, Rod; Dean, Ralph; Klein, Robert; Nguyen, Henry T.; Ma, Hong-mei; Zhao, Xin; Morishige, Daryl T.; Mullet, John E.; Cordonnier-Pratt, Marie-Michèle
2005-01-01
Improved knowledge of the sorghum transcriptome will enhance basic understanding of how plants respond to stresses and serve as a source of genes of value to agriculture. Toward this goal, Sorghum bicolor L. Moench cDNA libraries were prepared from light- and dark-grown seedlings, drought-stressed plants, Colletotrichum-infected seedlings and plants, ovaries, embryos, and immature panicles. Other libraries were prepared with meristems from Sorghum propinquum (Kunth) Hitchc. that had been photoperiodically induced to flower, and with rhizomes from S. propinquum and johnsongrass (Sorghum halepense L. Pers.). A total of 117,682 expressed sequence tags (ESTs) were obtained representing both 3′ and 5′ sequences from about half that number of cDNA clones. A total of 16,801 unique transcripts, representing tentative UniScripts (TUs), were identified from 55,783 3′ ESTs. Of these TUs, 9,032 are represented by two or more ESTs. Collectively, these libraries were predicted to contain a total of approximately 31,000 TUs. Individual libraries, however, were predicted to contain no more than about 6,000 to 9,000, with the exception of light-grown seedlings, which yielded an estimate of close to 13,000. In addition, each library exhibits about the same level of complexity with respect to both the number of TUs preferentially expressed in that library and the frequency with which two or more ESTs is found in only that library. These results indicate that the sorghum genome is expressed in highly selective fashion in the individual organs and in response to the environmental conditions surveyed here. Close to 2,000 differentially expressed TUs were identified among the cDNA libraries examined, of which 775 were differentially expressed at a confidence level of 98%. From these 775 TUs, signature genes were identified defining drought, Colletotrichum infection, skotomorphogenesis (etiolation), ovary, immature panicle, and embryo. PMID:16169961
Horse cDNA clones encoding two MHC class I genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barbis, D.P.; Maher, J.K.; Stanek, J.
1994-12-31
Two full-length clones encoding MHC class I genes were isolated by screening a horse cDNA library, using a probe encoding the human HLA-A2.2Y allele. The library was made in the pcDNA1 vector (Invitrogen, San Diego, CA), using mRNA from peripheral blood lymphocytes obtained from a Thoroughbred stallion (No. 0834) homozygous for a common horse MHC haplotype (ELA-A2, -B2, -D2; Antczak et al. 1984; Donaldson et al. 1988). The clones were sequenced, using SP6 and T7 universal primers and horse-specific oligonucleotides designed to extend previously determined sequences.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...
2015-08-15
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH-contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
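The "computational shredding" step, cutting a consensus sequence into fixed-length overlapping pieces, can be sketched as below. The abstract gives only the shred lengths (2 kb and 1.5 kb); the overlap value and the function name here are hypothetical, and the tool actually used is not named in the source.

```python
def shred(sequence, shred_len=2000, overlap=1000):
    """Cut `sequence` into overlapping fragments ("shreds").

    `overlap` is an assumed parameter; the source states only that the
    shreds overlap. The final shred may be shorter than `shred_len`.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be non-negative and smaller than shred_len")
    if not sequence:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append(sequence[start:start + shred_len])
        if start + shred_len >= len(sequence):
            break  # this shred reached the end of the sequence
        start += step
    return shreds


# Toy 5 kb consensus sequence cut into 2 kb shreds overlapping by 1 kb.
consensus = "ACGT" * 1250
shreds = shred(consensus, shred_len=2000, overlap=1000)
```

Shredding a consensus in this way lets assemblies from different platforms be combined as if they were long pseudo-reads.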
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provides an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
Evaluation of microbial community in hydrothermal field by direct DNA sequencing
NASA Astrophysics Data System (ADS)
Kawarabayasi, Y.; Maruyama, A.
2002-12-01
Many extremophiles have been discovered in terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from these environments; over 99% of all microbes remain undiscovered. Based on our experience with whole microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we set out to find unique microbes and genes in hydrothermal fields through direct sequencing of environmental DNA fragments. First, shotgun plasmid libraries were constructed directly from DNA prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). Gene amplification (PCR) was not used, in order to avoid introducing mutations during the process. The nucleotide sequences of 285 clones showed that none had an identical match in public databases. Among the 27 clones whose entire sequences were determined, no ORF was identified on 14 clones, which resembled eukaryotic introns. On four clones, multiple tandem repeats of tetranucleotide units were identified. This type of sequence has been identified in some familial diseases in humans. The result indicates that living or dead materials with eukaryotic features may exist in this low-temperature field. Second, shotgun plasmid libraries were constructed from environmental DNA prepared from Beppu hot springs. Among the 143 randomly selected clones that were sequenced, no known sequence was identified. Unlike the clones in the S-EPR library, clear ORFs were identified on all nine clones whose entire sequences were determined. One clone, H4052, was found to contain the complete aspartyl-tRNA synthetase gene.
Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the differentiation of species. Thus, some novel archaeal species are expected to be in this field. The present direct cloning and sequencing technique is now opening a window to the new world in hydrothermal microbial community analysis.
Chang, Yaqing; Zhao, Wenming; Du, Zhenlin; Hao, Zhenlin
2015-01-01
Shell color is an important trait that is used in breeding the Japanese scallop Patinopecten yessoensis, the most economically important scallop species in China. We constructed four transcriptome libraries from different shell color lines of P. yessoensis: the left and right shell mantles of ordinary strains of P. yessoensis and the left shell mantles of the ‘Ivory’ and ‘Maple’ strains. These four libraries were paired-end sequenced using the Illumina HiSeq 2000 platform and contained 54,802,692 sequences, 40,798,962 sequences, 74,019,262 sequences, and 44,466,166 sequences, respectively. A total of 214,087,082 expressed sequence tags were assembled into 73,522 unigenes with an average size of 1,163 bp. When the data were compared against the public Nr and Swiss-Prot databases using BlastX, nearly 30.55% (22,458) of the unigenes were significantly matched to known unique proteins. Gene Ontology annotation and pathway mapping analysis using the Kyoto Encyclopedia of Genes and Genomes categorized unigenes according to their diverse biological functions and processes and identified candidate genes that were potentially involved in growth, pigmentation, metal transcription, and immunity. Expression profile analysis was performed on all four libraries and many differentially expressed genes were identified. In addition, 5,772 simple sequence repeats were obtained from the P. yessoensis transcriptomes, and 464,197, 395,646, and 310,649 single nucleotide polymorphisms were revealed in the ordinary strains, the ‘Ivory’ strain, and the ‘Maple’ strain, respectively. These results provide valuable information for future genomic studies on P. yessoensis and improve our understanding of the molecular mechanisms involved in the growth, immunity, shell coloring, and shell biomineralization of this species. 
These resources also may be used in a variety of applications, such as trait mapping, marker-assisted breeding, studies of population genetics and genomics, and work on functional genomics. PMID:25680107
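The read totals and annotation rate quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported figures; the dictionary keys are illustrative labels chosen here, not names used by the authors.

```python
# Read counts per library, in the order given in the abstract
# (keys are illustrative labels, not the authors' sample names).
reads_per_library = {
    "ordinary_left_mantle": 54_802_692,
    "ordinary_right_mantle": 40_798_962,
    "ivory_left_mantle": 74_019_262,
    "maple_left_mantle": 44_466_166,
}

total_reads = sum(reads_per_library.values())

# Unigenes with a significant BlastX match versus all assembled unigenes.
annotated_unigenes = 22_458
total_unigenes = 73_522
annotated_pct = 100 * annotated_unigenes / total_unigenes

print(f"total reads: {total_reads:,}")
print(f"annotated unigenes: {annotated_pct:.2f}%")
```

Both values agree with the abstract: the four libraries sum to the stated 214,087,082 sequences, and 22,458 of 73,522 unigenes is about 30.55%.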
Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.
Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T
1996-02-01
A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.
Huang, Jianke; Wang, Weiliang; Yin, Weibo; Hu, Zanmin; Li, Yuanguang
2012-01-01
Background Microalgae have been extensively investigated and exploited because of their competitive nutritive bioproducts and biofuel production ability. Chlorella are green algae that can grow well heterotrophically and photoautotrophically. Previous studies proved that shifting from heterotrophy to photoautotrophy in light-induced environments causes photooxidative damage as well as distinct physiologic features that lead to dynamic changes in Chlorella intracellular components, which have great potential in algal health food and biofuel production. However, the molecular mechanisms underlying the trophic transition remain unclear. Methodology/Principal Findings In this study, suppression subtractive hybridization strategy was employed to screen and characterize genes that are differentially expressed in response to the light-induced shift from heterotrophy to photoautotrophy. Expressed sequence tags (ESTs) were obtained from 770 and 803 randomly selected clones among the forward and reverse libraries, respectively. Sequence analysis identified 544 unique genes in the two libraries. The functional annotation of the assembled unigenes demonstrated that 164 (63.1%) from the forward library and 62 (21.8%) from the reverse showed significant similarities with the sequences in the NCBI non-redundant database. The time-course expression patterns of 38 selected differentially expressed genes further confirmed their responsiveness to a diverse trophic status. The majority of the genes enriched in the subtracted libraries were associated with energy metabolism, amino acid metabolism, protein synthesis, carbohydrate metabolism, and stress defense. Conclusions/Significance The data presented here offer the first insights into the molecular foundation underlying the diverse microalgal trophic niche. 
In addition, the results can be used as a reference for unraveling candidate genes associated with the transition of Chlorella from heterotrophy to photoautotrophy, which holds great potential for further improving its lipid and nutrient production. PMID:23209737
Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane
2006-01-01
Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allow comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057
[cDNA library construction from panicle meristem of finger millet].
Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B
2014-01-01
The protocol for producing full-length cDNA with the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of finger millet panicle (Eleusine coracana (L.) Gaertn). The titer of the cDNA library was 3.01 x 10(5) CFU/ml on average. The average cDNA insert length was about 1070 base pairs, and the efficiency of cDNA fragment insertion was 99.5%. Selected cDNA clones from the library were sequenced, and the sequences were identified by BLAST search. The library analysis and selective sequencing indicate good functionality and the full-length character of the inserted cDNA clones. The cDNA library from meristematic tissue of finger millet panicle is a valuable source for isolating and identifying key genes that regulate metabolism and meristematic development, and for mining new molecular markers for high-quality genetic investigations and molecular breeding.
Thomas, Matthew C; Selinger, L Brent; Inglis, G Douglas
2012-08-01
The temporal dynamics of planktonic protists in river water have received limited attention despite their ecological significance and recent studies linking phagotrophic protists to the persistence of human-pathogenic bacteria. Using molecular-based techniques targeting the 18S rRNA gene, we studied the seasonal diversity of planktonic protists in Southwestern Alberta rivers (Oldman River Basin) over a 1-year period. Nonmetric multidimensional scaling analysis of terminal restriction fragment length polymorphism (T-RFLP) data revealed distinct shifts in protistan community profiles that corresponded to season rather than geographical location. Community structures were examined by using clone library analysis; HaeIII restriction profiles of 18S rRNA gene amplicons were used to remove prevalent solanaceous plant clones prior to sequencing. Sanger sequencing of the V1-to-V3 region of the 18S rRNA gene libraries from spring, summer, fall, and winter supported the T-RFLP results and showed marked seasonal differences in the protistan community structure. The spring library was dominated by Chloroplastidae (29.8%), Centrohelida (28.1%), and Alveolata (25.5%), while the summer and fall libraries contained primarily fungal clones (83.0% and 88.0%, respectively). Alveolata (35.6%), Euglenozoa (24.4%), Chloroplastida (15.6%), and Fungi (15.6%) dominated the winter library. These data demonstrate that planktonic protists, including protozoa, are abundant in river water in Southwestern Alberta and that conspicuous seasonal shifts occur in the community structure. PMID:22685143
Diverse Antibiotic Resistance Genes in Dairy Cow Manure
Wichmann, Fabienne; Udikovic-Kolic, Nikolina; Andrew, Sheila; Handelsman, Jo
2014-01-01
ABSTRACT Application of manure from antibiotic-treated animals to crops facilitates the dissemination of antibiotic resistance determinants into the environment. However, our knowledge of the identity, diversity, and patterns of distribution of these antibiotic resistance determinants remains limited. We used a new combination of methods to examine the resistome of dairy cow manure, a common soil amendment. Metagenomic libraries constructed with DNA extracted from manure were screened for resistance to beta-lactams, phenicols, aminoglycosides, and tetracyclines. Functional screening of fosmid and small-insert libraries identified 80 different antibiotic resistance genes whose deduced protein sequences were on average 50 to 60% identical to sequences deposited in GenBank. The resistance genes were frequently found in clusters and originated from a taxonomically diverse set of species, suggesting that some microorganisms in manure harbor multiple resistance genes. Furthermore, amid the great genetic diversity in manure, we discovered a novel clade of chloramphenicol acetyltransferases. Our study combined functional metagenomics with third-generation PacBio sequencing to significantly extend the roster of functional antibiotic resistance genes found in animal gut bacteria, providing a particularly broad resource for understanding the origins and dispersal of antibiotic resistance genes in agriculture and clinical settings. PMID:24757214
Fine-tuning gene networks using simple sequence repeats
Egbert, Robert G.; Klavins, Eric
2012-01-01
The parameters in a complex synthetic gene network must be extensively tuned before the network functions as designed. Here, we introduce a simple and general approach to rapidly tune gene networks in Escherichia coli using hypermutable simple sequence repeats embedded in the spacer region of the ribosome binding site. By varying repeat length, we generated expression libraries that incrementally and predictably sample gene expression levels over a 1,000-fold range. We demonstrate the utility of the approach by creating a bistable switch library that programmatically samples the expression space to balance the two states of the switch, and we illustrate the need for tuning by showing that the switch’s behavior is sensitive to host context. Further, we show that mutation rates of the repeats are controllable in vivo for stability or for targeted mutagenesis—suggesting a new approach to optimizing gene networks via directed evolution. This tuning methodology should accelerate the process of engineering functionally complex gene networks. PMID:22927382
htsint: a Python library for sequencing pipelines that combines data through gene set generation.
Richards, Adam J; Herrel, Anthony; Bonneaud, Camille
2015-09-24
Sequencing technologies provide a wealth of details in terms of genes, expression, splice variants, polymorphisms, and other features. A standard for sequencing analysis pipelines is to put genomic or transcriptomic features into a context of known functional information, but the relationships between ontology terms are often ignored. For RNA-Seq, considering genes and their genetic variants at the group level enables a convenient way to both integrate annotation data and detect small coordinated changes between experimental conditions, a known caveat of gene level analyses. We introduce the high throughput data integration tool, htsint, as an extension to the commonly used gene set enrichment frameworks. The central aim of htsint is to compile annotation information from one or more taxa in order to calculate functional distances among all genes in a specified gene space. Spectral clustering is then used to partition the genes, thereby generating functional modules. The gene space can range from a targeted list of genes, like a specific pathway, all the way to an ensemble of genomes. Given a collection of gene sets and a count matrix of transcriptomic features (e.g. expression, polymorphisms), the gene sets produced by htsint can be tested for 'enrichment' or conditional differences using one of a number of commonly available packages. The database and bundled tools to generate functional modules were designed with sequencing pipelines in mind, but the toolkit nature of htsint allows it to also be used in other areas of genomics. The software is freely available as a Python library through GitHub at https://github.com/ajrichards/htsint.
MISSION LentiPlex pooled shRNA library screening in mammalian cells.
Coussens, Matthew J; Corman, Courtney; Fischer, Ashley L; Sago, Jack; Swarthout, John
2011-12-21
RNA interference (RNAi) is an intrinsic cellular mechanism for the regulation of gene expression. Harnessing the innate power of this system enables us to knock down gene expression levels in loss of gene function studies. There are two main methods for performing RNAi. The first is the use of small interfering RNAs (siRNAs) that are chemically synthesized, and the second utilizes short-hairpin RNAs (shRNAs) encoded within plasmids. The latter can be transfected into cells directly or packaged into replication incompetent lentiviral particles. The main advantages of using lentiviral shRNAs are the ease of introduction into a wide variety of cell types, their ability to stably integrate into the genome for long term gene knockdown and selection, and their efficacy in conducting high-throughput loss of function screens. To facilitate this we have created the LentiPlex pooled shRNA library. The MISSION LentiPlex Human shRNA Pooled Library is a genome-wide lentiviral pool produced using a proprietary process. The library consists of over 75,000 shRNA constructs from the TRC collection targeting 15,000+ human genes. Each library is tested for shRNA representation before product release to ensure robust library coverage. The library is provided in a ready-to-use lentiviral format at titers of at least 5 x 10(8) TU/ml via p24 assay and is pre-divided into ten subpools of approximately 8,000 shRNA constructs each. Amplification and sequencing primers are also provided for downstream target identification. Previous studies established a synergistic antitumor activity of TRAIL when combined with Paclitaxel in A549 cells, a human lung carcinoma cell line. In this study we demonstrate the application of a pooled LentiPlex shRNA library to rapidly conduct a positive selection screen for genes involved in the cytotoxicity of A549 cells when exposed to TRAIL and Paclitaxel. 
One barrier often encountered with high-throughput screens is the cost and difficulty in deconvolution; we also detail a cost-effective polyclonal approach utilizing traditional sequencing.
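Planning a pooled screen of this kind usually involves estimating how many cells to transduce so that each construct in a subpool is represented many times. The sketch below is a generic Poisson model, not the protocol described in the abstract: the function names, the 100-fold representation target, and the multiplicity of infection (MOI) of 0.3 are hypothetical values chosen for illustration. Only the subpool size of roughly 8,000 constructs comes from the text.

```python
import math


def transduction_fractions(moi):
    """Poisson fractions of cells with zero, exactly one, or several integrants."""
    uninfected = math.exp(-moi)
    single = moi * math.exp(-moi)
    return {
        "uninfected": uninfected,
        "single": single,
        "multiple": 1.0 - uninfected - single,
    }


def cells_to_plate(n_constructs, fold_representation, moi):
    """Cells needed so that transduced cells cover each construct the requested number of times."""
    transduced_fraction = 1.0 - math.exp(-moi)
    return math.ceil(n_constructs * fold_representation / transduced_fraction)


# Subpool size from the abstract; representation target and MOI are assumptions.
subpool_constructs = 8_000
assumed_fold = 100
assumed_moi = 0.3

fractions = transduction_fractions(assumed_moi)
needed = cells_to_plate(subpool_constructs, assumed_fold, assumed_moi)
print(fractions)
print(f"cells to plate per subpool: {needed:,}")
```

A low MOI keeps most transduced cells at a single integrant, which simplifies deconvolution, at the cost of plating more cells per subpool.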
Sequenced sorghum mutant library- an efficient platform for discovery of causal gene mutations
USDA-ARS's Scientific Manuscript database
Ethyl methanesulfonate (EMS) efficiently generates high-density mutations in genomes. We applied whole-genome sequencing to 256 phenotyped mutant lines of sorghum (Sorghum bicolor L. Moench) to 16x coverage. Comparisons with the reference sequence revealed >1.8 million canonical EMS-induced G/C to A...
Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M
2004-01-01
Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source of molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.
2000-01-01
Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10^-6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
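The library statistics in this abstract are linked by simple arithmetic, which the sketch below works through. It derives the haploid genome size implied by the stated 15.0-fold coverage and then applies the standard Clarke-Carbon estimate of the chance that any given locus is present in the library. Discounting chloroplast clones before computing coverage is an assumption made here, not something the abstract states, and the Clarke-Carbon formula itself assumes random, unbiased cloning.

```python
# Figures reported in the abstract.
n_clones = 129_024
mean_insert_kb = 117.5
chloroplast_fraction = 0.0111
stated_coverage = 15.0  # haploid genome equivalents

# Total cloned DNA, and the nuclear share after discounting chloroplast clones
# (the discount is an assumption of this sketch).
total_kb = n_clones * mean_insert_kb
nuclear_kb = total_kb * (1 - chloroplast_fraction)

# Haploid genome size implied by the stated coverage.
implied_genome_mb = nuclear_kb / stated_coverage / 1000

# Clarke-Carbon: P = 1 - (1 - f)^N, with f = insert size / genome size.
f = mean_insert_kb / (implied_genome_mb * 1000)
n_nuclear_clones = n_clones * (1 - chloroplast_fraction)
p_locus_present = 1 - (1 - f) ** n_nuclear_clones

# Success rate of BAC end sequencing reported in the abstract.
end_success_pct = 100 * 1205 / 1490

print(f"implied genome size: {implied_genome_mb:.0f} Mb")
print(f"P(locus present): {p_locus_present:.7f}")
print(f"BAC end success rate: {end_success_pct:.1f}%")
```

Under these assumptions the stated coverage corresponds to a haploid genome of roughly 1 Gb, and the end-sequencing success rate reproduces the 80.9% quoted in the text.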
Pan, Lang; Gao, Haitao; Xia, Wenwen; Zhang, Teng; Dong, Liyao
2016-03-01
Non-target site resistance (NTSR) to herbicides is an increasing concern for weed control. Metabolic herbicide resistance is an important mechanism for NTSR. However, little is known about metabolic resistance at the genetic level. In this study, we have identified three fenoxaprop-P-ethyl-resistant American sloughgrass (Beckmannia syzigachne Steud.) populations, in which the molecular basis for NTSR remains unclear. To reveal the mechanisms of metabolic resistance, the genes likely to be involved in herbicide metabolism (e.g. for cytochrome P450s, esterases, hydrolases, oxidases, peroxidases, glutathione S-transferases, glycosyltransferases, and transporter proteins) were isolated using transcriptome sequencing, in combination with RT-PCR (reverse transcription-PCR) and RACE (rapid amplification of cDNA ends). Consequently, we established a herbicide-metabolizing enzyme library containing at least 332 genes, and each of these genes was cloned and the sequence and the expression level compared between the fenoxaprop-P-ethyl-resistant and susceptible populations. Fifteen metabolic enzyme genes were found to be possibly involved in fenoxaprop-P-ethyl resistance. In addition, we found five metabolizing enzyme genes that have a different gene sequence in plants of susceptible versus resistant B. syzigachne populations. These genes may be major candidates for herbicide metabolic resistance. This established metabolic enzyme library represents an important step forward towards a better understanding of herbicide metabolism and metabolic resistance in this and possibly other closely related weed species. This new information may help to understand weed metabolic resistance and to develop novel strategies of weed management. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A
2009-01-01
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386
Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou
2016-11-01
It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
Effect of condensed tannins on bovine rumen protist diversity based on 18S rRNA gene sequences.
Tan, Hui Yin; Sieo, Chin Chin; Abdullah, Norhani; Liang, Juan Boo; Huang, Xiao Dan; Ho, Yin Wan
2013-01-01
Molecular diversity of protists from bovine rumen fluid incubated with condensed tannins of Leucaena leucocephala hybrid-Rendang at 20 mg/500 mg dry matter (treatment) or without condensed tannins (control) was investigated using 18S rRNA gene library. Clones from the control library were distributed within nine genera, but clones from the condensed tannin treatment clone library were related to only six genera. Diversity estimators such as abundance-based coverage estimation and Chao1 showed significant differences between the two libraries, although no differences were found based on Shannon-Weaver index and Libshuff. © 2012 The Author(s) Journal of Eukaryotic Microbiology © 2012 International Society of Protistologists.
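The Shannon index and Chao1 named in this abstract are straightforward to compute from clone counts per taxon. The sketch below uses the bias-corrected form of Chao1 and the natural-log Shannon index; the two count lists are hypothetical, chosen only to mirror a nine-genus control library and a six-genus treatment library, and are not the study's data. The abundance-based coverage estimator and Libshuff are not implemented here.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical clone counts per genus (illustration only, not data from the study).
control_counts = [30, 20, 10, 5, 3, 2, 2, 1, 1]
treatment_counts = [40, 15, 8, 4, 2, 1]

for name, counts in (("control", control_counts), ("treatment", treatment_counts)):
    print(name, f"H' = {shannon_index(counts):.3f}", f"Chao1 = {chao1(counts):.2f}")
```

Chao1 depends heavily on the rarest taxa (singletons and doubletons), whereas the Shannon index weights all taxa by relative abundance, which is one reason the two kinds of measure can disagree about whether two libraries differ.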
The Microbial Ferrous Wheel in a Neutral pH Groundwater Seep
Roden, Eric E.; McBeth, Joyce M.; Blöthe, Marco; Percak-Dennett, Elizabeth M.; Fleming, Emily J.; Holyoke, Rebecca R.; Luther, George W.; Emerson, David; Schieber, Juergen
2012-01-01
Evidence for microbial Fe redox cycling was documented in a circumneutral pH groundwater seep near Bloomington, Indiana. Geochemical and microbiological analyses were conducted at two sites, a semi-consolidated microbial mat and a floating puffball structure. In situ voltammetric microelectrode measurements revealed steep opposing gradients of O2 and Fe(II) at both sites, similar to other groundwater seep and sedimentary environments known to support microbial Fe redox cycling. The puffball structure showed an abrupt increase in dissolved Fe(II) just at its surface (∼5 cm depth), suggesting an internal Fe(II) source coupled to active Fe(III) reduction. Most probable number enumerations detected microaerophilic Fe(II)-oxidizing bacteria (FeOB) and dissimilatory Fe(III)-reducing bacteria (FeRB) at densities of 10^2 to 10^5 cells mL^-1 in samples from both sites. In vitro Fe(III) reduction experiments revealed the potential for immediate reduction (no lag period) of native Fe(III) oxides. Conventional full-length 16S rRNA gene clone libraries were compared with high throughput barcode sequencing of the V1, V4, or V6 variable regions of 16S rRNA genes in order to evaluate the extent to which new sequencing approaches could provide enhanced insight into the composition of Fe redox cycling microbial community structure. The composition of the clone libraries suggested a lithotroph-dominated microbial community centered around taxa related to known FeOB (e.g., Gallionella, Sideroxydans, Aquabacterium). Sequences related to recognized FeRB (e.g., Rhodoferax, Aeromonas, Geobacter, Desulfovibrio) were also well-represented. Overall, sequences related to known FeOB and FeRB accounted for 88 and 59% of total clone sequences in the mat and puffball libraries, respectively. Taxa identified in the barcode libraries showed partial overlap with the clone libraries, but were not always consistent across different variable regions and sequencing platforms. 
However, the barcode libraries provided confirmation of key clone library results (e.g., the predominance of Betaproteobacteria) and an expanded view of lithotrophic microbial community composition. PMID:22783228
Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen
2018-01-01
Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated, which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex, highly repetitive, heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Illumina sequencing of green stink bug nymph and adult cDNA to identify potential RNAi gene targets
USDA-ARS's Scientific Manuscript database
Whole-body transcriptomes for nymphs and adults of the green stink bug, Acrosternum hilare (Say), were sequenced on an Illumina® Genome Analyzer IIx sequencer. The insects were collected from sites in North Carolina and Virginia, USA. The cDNA library for each sample was sequenced on one lane of an...
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Generation, annotation and analysis of ESTs from Trichoderma harzianum CECT 2413
Vizcaíno, Juan Antonio; González, Francisco Javier; Suárez, M Belén; Redondo, José; Heinrich, Julian; Delgado-Jarana, Jesús; Hermosa, Rosa; Gutiérrez, Santiago; Monte, Enrique; Llobell, Antonio; Rey, Manuel
2006-01-01
Background The filamentous fungus Trichoderma harzianum is used as biological control agent of several plant-pathogenic fungi. In order to study the genome of this fungus, a functional genomics project called "TrichoEST" was developed to give insights into genes involved in biological control activities using an approach based on the generation of expressed sequence tags (ESTs). Results Eight different cDNA libraries from T. harzianum strain CECT 2413 were constructed. Different growth conditions involving mainly different nutrient conditions and/or stresses were used. We here present the analysis of the 8,710 ESTs generated. A total of 3,478 unique sequences were identified of which 81.4% had sequence similarity with GenBank entries, using the BLASTX algorithm. Using the Gene Ontology hierarchy, we performed the annotation of 51.1% of the unique sequences and compared its distribution among the gene libraries. Additionally, the InterProScan algorithm was used in order to further characterize the sequences. The identification of the putatively secreted proteins was also carried out. Later, based on the EST abundance, we examined the highly expressed genes and a hydrophobin was identified as the gene expressed at the highest level. We compared our collection of ESTs with the previous collections obtained from Trichoderma species and we also compared our sequence set with different complete eukaryotic genomes from several animals, plants and fungi. Accordingly, the presence of similar sequences in different kingdoms was also studied. Conclusion This EST collection and its annotation provide a significant resource for basic and applied research on T. harzianum, a fungus with a high biotechnological interest. PMID:16872539
Guibert, Lilian M; Loviso, Claudia L; Marcos, Magalí S; Commendatore, Marta G; Dionisi, Hebe M; Lozada, Mariana
2012-10-01
Although sediments are the natural hydrocarbon sink in the marine environment, the ecology of hydrocarbon-degrading bacteria in sediments is poorly understood, especially in cold regions. We studied the diversity of alkane-degrading bacterial populations and their response to oil exposure in sediments of a chronically polluted Subantarctic coastal environment, by analyzing alkane monooxygenase (alkB) gene libraries. Sequences from the sediment clone libraries were affiliated with genes described in Proteobacteria and Actinobacteria, with 67% amino acid identity on average to sequences from isolated microorganisms. The majority of the sequences were most closely related to uncultured microorganisms from cold marine sediments or soils from high latitude regions, highlighting the role of temperature in the structuring of this bacterial guild. The distribution of alkB sequences among samples of different sites and years, and selection after experimental oil exposure allowed us to identify ecologically relevant alkB genes in Subantarctic sediments, which could be used as biomarkers for alkane biodegradation in this environment. 16S rRNA amplicon pyrosequencing indicated the abundance of several genera for which no alkB genes have yet been described (Oleispira, Thalassospira) or that have not been previously associated with oil biodegradation (Spongiibacter [formerly Melitea], Maribius, Robiginitomaculum, Bizionia and Gillisia). These genera constitute candidates for future work involving identification of hydrocarbon biodegradation pathway genes.
Ito, Yuji
2017-01-01
As an alternative to hybridoma technology, the antibody phage library system can also be used for antibody selection. This method enables the isolation of antigen-specific binders through an in vitro selection process known as biopanning. While it has several advantages, such as an avoidance of animal immunization, the phage cloning and screening steps of biopanning are time-consuming and problematic. Here, we introduce a novel biopanning method combined with high-throughput sequencing (HTS) using a next-generation sequencer (NGS) to save time and effort in antibody selection, and to increase the diversity of acquired antibody sequences. Biopannings against a target antigen were performed using a human single chain Fv (scFv) antibody phage library. VH genes in pooled phages at each round of biopanning were analyzed by HTS on a NGS. The obtained data were trimmed, merged, and translated into amino acid sequences. The frequencies (%) of the respective VH sequences at each biopanning step were calculated, and the amplification factor (change of frequency through biopanning) was obtained to estimate the potential for antigen binding. A phylogenetic tree was drawn using the top 50 VH sequences with high amplification factors. Representative VH sequences forming the cluster were then picked up and used to reconstruct scFv genes harboring these VHs. Their derived scFv-Fc fusion proteins showed clear antigen binding activity. These results indicate that a combination of biopanning and HTS enables the rapid and comprehensive identification of specific binders from antibody phage libraries.
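The frequency and amplification-factor calculation described in this abstract can be illustrated with a minimal Python sketch. It assumes that the amplification factor is the ratio of a sequence's percentage frequency in a later round to its frequency in an earlier round; the abstract only describes it as the "change of frequency through biopanning", so the exact formula is an assumption. The function names and the toy VH labels are hypothetical.

```python
from collections import Counter

def frequencies(seqs):
    """Return the percentage frequency of each distinct sequence in one round."""
    counts = Counter(seqs)
    total = sum(counts.values())
    return {s: 100.0 * n / total for s, n in counts.items()}

def amplification_factors(round_before, round_after):
    """Ratio of later-round to earlier-round frequency (assumed definition)."""
    f_before = frequencies(round_before)
    f_after = frequencies(round_after)
    # The ratio is only defined for sequences observed in both rounds.
    return {s: f_after[s] / f_before[s] for s in f_after if s in f_before}

# Toy data: VH_A rises from 10% to 50% of reads, VH_B falls from 90% to 50%.
round1 = ["VH_A"] * 1 + ["VH_B"] * 9
round3 = ["VH_A"] * 5 + ["VH_B"] * 5
factors = amplification_factors(round1, round3)

# Rank candidates by enrichment, highest first.
ranked = sorted(factors, key=factors.get, reverse=True)
```

Ranking by this ratio rather than by raw frequency favours sequences that become enriched during selection, even if they start out rare in the library.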
Noda, Hiroaki; Kawai, Sawako; Koizumi, Yoko; Matsui, Kageaki; Zhang, Qiang; Furukawa, Shigetoyo; Shimomura, Michihiko; Mita, Kazuei
2008-01-01
Background The brown planthopper (BPH), Nilaparvata lugens (Hemiptera, Delphacidae), is a serious insect pest of rice plants. Major means of BPH control are application of agricultural chemicals and cultivation of BPH resistant rice varieties. Nevertheless, BPH strains that are resistant to agricultural chemicals have developed, and BPH strains have appeared that are virulent against the resistant rice varieties. Expressed sequence tag (EST) analysis and related applications are useful to elucidate the mechanisms of resistance and virulence and to reveal physiological aspects of this non-model insect, with its poorly understood genetic background. Results More than 37,000 high-quality ESTs, excluding sequences of mitochondrial genome, microbial genomes, and rDNA, have been produced from 18 libraries of various BPH tissues and stages. About 10,200 clusters have been made from whole EST sequences, with average EST size of 627 bp. Among the top ten most abundantly expressed genes, three are unique and show no homology in BLAST searches. The actin gene was highly expressed in BPH, especially in the thorax. Tissue-specifically expressed genes were extracted based on the expression frequency among the libraries. An EST database is available at our web site. Conclusion The EST library will provide useful information for transcriptional analyses, proteomic analyses, and gene functional analyses of BPH. Moreover, specific genes for hemimetabolous insects will be identified. The microarray fabricated based on the EST information will be useful for finding genes related to agricultural and biological problems related to this pest. PMID:18315884
Hirao, Tomonori; Fukatsu, Eitaro; Watanabe, Atsushi
2012-01-24
Pine wilt disease is caused by the pine wood nematode, Bursaphelenchus xylophilus, which threatens pine forests and forest ecosystems worldwide and causes serious economic losses. In the 40 years since the pathogen was identified, the physiological changes occurring as the disease progresses have been characterized using anatomical and biochemical methods, and resistant trees have been selected via breeding programs. However, no studies have assessed the molecular genetics, e.g. transcriptional changes, associated with infection-induced physiological changes in resistant or susceptible trees. We constructed seven subtractive suppression hybridization (SSH) cDNA libraries using time-course sampling of trees inoculated with pine wood nematode at 1, 3, or 7 days post-inoculation (dpi) in susceptible trees and at 1, 3, 7, or 14 dpi in resistant trees. A total of 3,299 sequences was obtained from these cDNA libraries, including from 138 to 315 non-redundant sequences in susceptible SSH libraries and from 351 to 435 in resistant SSH libraries. Using Gene Ontology hierarchy, those non-redundant sequences were classified into 15 subcategories of the biological process Gene Ontology category and 17 subcategories of the molecular function category. The transcriptional components revealed by the Gene Ontology classification clearly differed between resistant and susceptible libraries. Some transcripts were discriminative: expression of antimicrobial peptide and putative pathogenesis-related genes (e.g., PR-1b, 2, 3, 4, 5, 6) was much higher in susceptible trees than in resistant trees at every time point, whereas expression of PR-9, PR-10, and cell wall-related genes (e.g., for hydroxyproline-rich glycoprotein precursor and extensin) was higher in resistant trees than in susceptible trees at 7 and 14 dpi. 
Following inoculation with pine wood nematode, there were marked differences between resistant and susceptible trees in transcript diversity and the timing and level of transcripts expressed in common; in particular, expression of stress response and defense genes differed. This study provided new insight into the differences in the physiological changes between resistant and susceptible trees that have been observed in anatomical and biochemical studies.
NASA Astrophysics Data System (ADS)
Sun, S. M.; Slightom, J. L.; Hall, T. C.
1981-01-01
A plant gene coding for the major storage protein (phaseolin, G1-globulin) of the French bean was isolated from a genomic library constructed in the phage vector Charon 24A. Comparison of the nucleotide sequence of part of the gene with that of the cloned messenger RNA (cDNA) revealed the presence of three intervening sequences, all beginning with GT and ending with AG. The 5' and 3' boundaries of intervening sequences IVS-A (88 base pairs) and IVS-B (124 base pairs) are similar to those described for animal and viral genes, but the 3' boundary of IVS-C (129 base pairs) shows some differences. A sequence of 185 amino acids deduced from the cloned DNAs represents about 40% of a phaseolin polypeptide.
Analyses of Hypomethylated Oil Palm Gene Space
Jayanthi, Nagappan; Mohd-Amin, Ab Halim; Azizi, Norazah; Chan, Kuang-Lim; Maqbool, Nauman J.; Maclean, Paul; Brauning, Rudi; McCulloch, Alan; Moraga, Roger; Ong-Abdullah, Meilina; Singh, Rajinder
2014-01-01
Demand for palm oil has been increasing by an average of ∼8% over the past decade, and palm oil currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. Nevertheless, due to the increasing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies on the oil palm genome are essential to help identify genes or markers that are associated with important processes or traits, such as flowering, yield and disease resistance. To achieve this, 294,115 and 150,744 sequences from the hypomethylated or gene-rich regions of the Elaeis guineensis and E. oleifera genomes were sequenced and assembled into contigs. An additional 16,427 shot-gun sequences and 176 bacterial artificial chromosomes (BAC) were also generated to check the quality of libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq supported genes in the BAC and had a filtration power of at least 2.0. A total of 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of microsatellites and SNPs to date and would be an important resource for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242, 65 and 14 oil palm transcription factors, resistance genes and miRNAs were identified, respectively. Examples of the transcription factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource to understand the molecular mechanisms associated with important agronomic traits in oil palm. PMID:24497974
Laura, Marina; Borghi, Cristina; Bobbio, Valentina; Allavena, Andrea
2015-01-01
In order to understand plant/pathogen interaction, the transcriptomes of uninfected (1S) and infected (2I) plants were sequenced at the 3' end by the GS FLX 454 platform. De novo assembly of high-quality reads generated 27,231 contigs, leaving 37,191 singletons in the 1S and 38,393 in the 2I libraries. The ESTcalc tool suggested that 71% of the transcriptome had been captured, with 99% of the genes present being represented by at least one read. Unigene annotation showed that 50.5% of the predicted translation products shared significant homology with protein sequences in GenBank. In all, 253 differential transcript abundance genes (DTAs) were in higher abundance and 52 in lower abundance in the 2I library. 128 higher abundance DTA genes were of fungal origin and 49 were clearly plant sequences. A tBLASTn-based search of the sequences using as query the full length predicted polypeptide product of 50 R genes identified 16 R gene products. Only one R gene (PGIP) was up-regulated. The response of the plant to fungal invasion included the up-regulation of several pathogenesis related protein (PR) genes involved in JA signaling and other genes associated with defense response, and the down-regulation of cell wall associated genes, non-race-specific disease resistance1 (NDR1) and other genes like myb, presqualene diphosphate phosphatase (PSDPase) and a UDP-glycosyltransferase 74E2-like (UGT). The DTA genes identified here should provide a basis for understanding the A. coronaria/T. discolor interaction and leads for biotechnology-based disease resistance breeding. PMID:25768012
Jiang, Jinjin; Wang, Yue; Zhu, Bao; Fang, Tingting; Fang, Yujie; Wang, Youping
2015-01-27
Brassica includes many successfully cultivated crop species of polyploid origin, either by ancestral genome triplication or by hybridization between two diploid progenitors, displaying complex repetitive sequences and transposons. The U's triangle, which consists of three diploids and three amphidiploids, is optimal for the analysis of complicated genomes after polyploidization. Next-generation sequencing enables the transcriptome profiling of polyploids on a global scale. We examined the gene expression patterns of three diploids (Brassica rapa, B. nigra, and B. oleracea) and three amphidiploids (B. napus, B. juncea, and B. carinata) via digital gene expression analysis. In total, the libraries generated between 5.7 and 6.1 million raw reads, and the clean tags of each library were mapped to 18547-21995 genes of B. rapa genome. The unambiguous tag-mapped genes in the libraries were compared. Moreover, the majority of differentially expressed genes (DEGs) were explored among diploids as well as between diploids and amphidiploids. Gene ontological analysis was performed to functionally categorize these DEGs into different classes. The Kyoto Encyclopedia of Genes and Genomes analysis was performed to assign these DEGs into approximately 120 pathways, among which the metabolic pathway, biosynthesis of secondary metabolites, and peroxisomal pathway were enriched. The non-additive genes in Brassica amphidiploids were analyzed, and the results indicated that orthologous genes in polyploids are frequently expressed in a non-additive pattern. Methyltransferase genes showed differential expression pattern in Brassica species. Our results provided an understanding of the transcriptome complexity of natural Brassica species. The gene expression changes in diploids and allopolyploids may help elucidate the morphological and physiological differences among Brassica species.
Kanai, Akio; Oida, Hanako; Matsuura, Nana; Doi, Hirofumi
2003-01-01
We systematically screened a genomic DNA library to identify proteins of the hyperthermophilic archaeon Pyrococcus furiosus using an expression cloning method. One gene product, which we named FAU-1 (P. furiosus AU-binding), demonstrated the strongest binding activity of all the genomic library-derived proteins tested against an AU-rich RNA sequence. The protein was purified to near homogeneity as a 54 kDa single polypeptide, and the gene locus corresponding to this FAU-1 activity was also sequenced. The FAU-1 gene encoded a 472-amino-acid protein that was characterized by highly charged domains consisting of both acidic and basic amino acids. The N-terminal half of the gene had a degree of similarity (25%) with RNase E from Escherichia coli. Five rounds of RNA-binding-site selection and footprinting analysis showed that the FAU-1 protein binds specifically to the AU-rich sequence in a loop region of a possible RNA ligand. Moreover, we demonstrated that the FAU-1 protein acts as an oligomer, and mainly as a trimer. These results showed that the FAU-1 protein is a novel heat-stable protein with an RNA loop-binding characteristic. PMID:12614195
Llera-Herrera, Raúl; García-Gasca, Alejandra; Abreu-Goodger, Cei; Huvet, Arnaud; Ibarra, Ana M.
2013-01-01
Despite the great advances in sequencing technologies, genomic and transcriptomic information for marine non-model species with ecological, evolutionary, and economical interest is still scarce. In this work we aimed to identify genes expressed during spermatogenesis in the functional hermaphrodite scallop Nodipecten subnodosus (Mollusca: Bivalvia: Pectinidae), with the purpose of obtaining a panel of genes that would allow for the study of differentially transcribed genes between diploid and triploid scallops in the context of meiotic arrest and reproductive sterility. Because our aim was to isolate genes involved in meiosis and other testis maturation-related processes, we generated suppressive subtractive hybridization libraries of testis vs. inactive gonad. We obtained 352 and 177 ESTs by clone sequencing, and using pyrosequencing (454-Roche) we maximized the identified ESTs to 34,276 reads. A total of 1,153 genes from the testis library had a blastx hit and GO annotation, including genes specific for meiosis, spermatogenesis, sex-differentiation, and transposable elements. Some of the identified meiosis genes function in chromosome pairing (scp2, scp3), recombination and DNA repair (dmc1, rad51, ccnb1ip1/hei10), and meiotic checkpoints (rad1, hormad1, dtl/cdt2). Gene expression analyses in different gametogenic stages in both sexual regions of the gonad of meiosis genes confirmed that the expression was specific or increased towards the maturing testis. Spermatogenesis genes included known testis-specific ones (kelch-10, shippo1, adad1), with some of these known to be associated to sterility. Sex differentiation genes included one of the most conserved genes at the bottom of the sex-determination cascade (dmrt1). Transcript from transposable elements, reverse transcriptase, and transposases in this library evidenced that transposition is an active process during spermatogenesis in N. subnodosus. 
In relation to the inactive library, we identified 833 transcripts with functional annotation related to activation of the transcription and translation machinery, as well as to germline control and maintenance. PMID:24066034
A wing expressed sequence tag resource for Bicyclus anynana butterflies, an evo-devo model
Beldade, Patrícia; Rudd, Stephen; Gruber, Jonathan D; Long, Anthony D
2006-01-01
Background Butterfly wing color patterns are a key model for integrating evolutionary developmental biology and the study of adaptive morphological evolution. Yet, despite the biological, economical and educational value of butterflies they are still relatively under-represented in terms of available genomic resources. Here, we describe an Expression Sequence Tag (EST) project for Bicyclus anynana that has identified the largest available collection to date of expressed genes for any butterfly. Results By targeting cDNAs from developing wings at the stages when pattern is specified, we biased gene discovery towards genes potentially involved in pattern formation. Assembly of 9,903 ESTs from a subtracted library allowed us to identify 4,251 genes of which 2,461 were annotated based on BLAST analyses against relevant gene collections. Gene prediction software identified 2,202 peptides, of which 215 longer than 100 amino acids had no homology to any known proteins and, thus, potentially represent novel or highly diverged butterfly genes. We combined gene and Single Nucleotide Polymorphism (SNP) identification by constructing cDNA libraries from pools of outbred individuals, and by sequencing clones from the 3' end to maximize alignment depth. Alignments of multi-member contigs allowed us to identify over 14,000 putative SNPs, with 316 genes having at least one high confidence double-hit SNP. We furthermore identified 320 microsatellites in transcribed genes that can potentially be used as genetic markers. Conclusion Our project was designed to combine gene and sequence polymorphism discovery and has generated the largest gene collection available for any butterfly and many potential markers in expressed genes. These resources will be invaluable for exploring the potential of B. anynana in particular, and butterflies in general, as models in ecological, evolutionary, and developmental genetics. PMID:16737530
Telke, Amar A; Rolain, Jean-Marc
2015-12-01
Shewanella algae MARS 14 is a colistin-resistant clinical isolate retrieved from bronchoalveolar lavage of a hospitalised patient. A functional genomics strategy was employed to discover the molecular support for colistin resistance in S. algae MARS 14. A pZE21 MCS-1 plasmid-based genomic expression library was constructed in Escherichia coli TOP10. The estimated library size was 1.30 × 10^8 bp. Functional screening of colistin-resistant clones was carried out on Luria-Bertani agar containing 8 mg/L colistin. Five colistin-resistant clones were obtained after complete screening of the genomic expression library. Analysis of DNA sequencing results found a unique gene in all selected clones. Amino acid sequence analysis of this unique gene using the Integrated Microbial Genomes (IMG) and KEGG databases revealed that this gene encodes ethanolamine phosphotransferase (EptA, or so-called PmrC). Reverse transcription PCR analysis indicated that resistance to colistin in S. algae MARS 14 was associated with overexpression of EptA (27-fold increase), which plays a crucial role in the arrangement of outer membrane lipopolysaccharide. Copyright © 2015 Elsevier B.V. and the International Society of Chemotherapy. All rights reserved.
McGarvey, J A; Franco, R B; Palumbo, J D; Hnasko, R; Stanker, L; Mitloehner, F M
2013-06-01
To describe, at high resolution, the bacterial population dynamics and chemical transformations during the ensiling of alfalfa and subsequent exposure to air. Samples of alfalfa, ensiled alfalfa and silage exposed to air were collected and their bacterial population structures compared using 16S rRNA gene libraries containing approximately 1900 sequences each. Cultural and chemical analyses were also performed to complement the 16S gene sequence data. Sequence analysis revealed significant differences (P < 0·05) in the bacterial populations at each time point. The alfalfa-derived library contained mostly sequences associated with the Gammaproteobacteria (including the genera: Enterobacter, Erwinia and Pantoea); the ensiled material contained mostly sequences associated with the lactic acid bacteria (LAB) (including the genera: Lactobacillus, Pediococcus and Lactococcus). Exposure to air resulted in even greater percentages of LAB, especially among the genus Lactobacillus, and a significant drop in bacterial diversity. In-depth 16S rRNA gene sequence analysis revealed significant bacterial population structure changes during ensiling and again during exposure to air. This in-depth description of the bacterial population dynamics that occurred during ensiling and simulated feed out expands our knowledge of these processes. © 2013 The Society for Applied Microbiology No claim to US Government works.
Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A
2015-10-26
Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs. Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
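The step of scanning ESTs for SSRs can be sketched in Python with a regular-expression search for perfect tandem repeats. This is only an illustration: the abstract does not state its filtering criteria, so the minimum repeat counts below (6 for dinucleotide, 5 for trinucleotide and 4 for tetranucleotide motifs) and the function name `find_ssrs` are assumptions, not the authors' pipeline.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Find perfect tandem repeats; returns (start, motif, repeat_count) tuples."""
    if min_repeats is None:
        # Assumed thresholds: motif length -> minimum number of repeats.
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured motif followed by at least n - 1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip motifs that are themselves repeats of a shorter motif
            # (e.g. "AA", or "AGAG" which is really an "AG" repeat).
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits

# Toy sequence with a single (AG)7 repeat starting at position 4.
example_hits = find_ssrs("TTGC" + "AG" * 7 + "CATG")
```

Primer design would then use the sequence flanking each reported start position; imperfect or compound repeats are not handled by this sketch.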
Coelho, Marcia Reed Rodrigues; de Vos, Marjon; Carneiro, Newton Portilho; Marriel, Ivanildo Evódio; Paiva, Edilson; Seldin, Lucy
2008-02-01
The diversity of nitrogen-fixing bacteria was assessed in the rhizospheres of two cultivars of sorghum (IS 5322-C and IPA 1011) sown in Cerrado soil amended with two levels of nitrogen fertilizer (12 and 120 kg ha⁻¹). The nifH gene was amplified directly from DNA extracted from the rhizospheres, and the PCR products cloned and sequenced. Four clone libraries were generated from the nifH fragments and 245 sequences were obtained. Most of the clones (57%) were closely related to nifH genes of uncultured bacteria. NifH clones affiliated with Azohydromonas spp., Ideonella sp., Rhizobium etli and Bradyrhizobium sp. were found in all libraries. Sequences affiliated with Delftia tsuruhatensis were found in the rhizosphere of both cultivars sown with high levels of nitrogen, while clones affiliated with Methylocystis sp. were detected only in plants sown under low levels of nitrogen. Moreover, clones affiliated with Paenibacillus durus could be found in libraries from the cultivar IS 5322-C sown either in high or low amounts of fertilizer. This study showed that the amount of nitrogen used for fertilization is the overriding determinative factor that influenced the nitrogen-fixing community structures in sorghum rhizospheres cultivated in Cerrado soil.
Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas
2009-06-01
The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.
Zhao, Yinhe; Wang, Guoying; Zhang, Jinpeng; Yang, Junbo; Peng, Shang; Gao, Lianming; Li, Chengyun; Hu, Jinyong; Li, Dezhu; Gao, Lizhi
2006-07-01
Asarum caudigerum (Aristolochiaceae) is an important species of paleoherb in relation to understanding the origin and evolution of angiosperm flowers, due to its basal position in the angiosperms. The aim of this study was to isolate floral-related genes from A. caudigerum, and to infer evolutionary relationships among florally expression-related genes, to further illustrate the origin and diversification of flowers in angiosperms. A subtracted floral cDNA library was constructed from floral buds using suppression subtractive hybridization (SSH). The cDNA of floral buds and leaves at the seedling stage were used as a tester and a driver, respectively. To further identify the function of putative MADS-box transcription factors, phylogenetic trees were reconstructed in order to infer evolutionary relationships within the MADS-box gene family. In the forward-subtracted floral cDNA library, 1920 clones were randomly sequenced, from which 567 unique expressed sequence tags (ESTs) were obtained. Among them, 127 genes failed to show significant similarity to any published sequences in GenBank and thus are putatively novel genes. Phylogenetic analysis indicated that a total of 29 MADS-box transcription factors were members of the APETALA3(AP3) subfamily, while nine others were putative MADS-box transcription factors that formed a cluster with MADS-box genes isolated from Amborella, the basal-most angiosperm, and those from the gymnosperms. This suggests that the origin of A. caudigerum is intermediate between the angiosperms and gymnosperms.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn
2009-01-01
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS).
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
n-CoDeR concept: unique types of antibodies for diagnostic use and therapy.
Carlsson, R; Söderlind, E
2001-05-01
The n-CoDeR recombinant antibody gene libraries are built on a single master framework, into which diverse in vivo-formed complementarity determining regions (CDRs) are allowed to recombine. These CDRs are sampled from in vivo-processed and proof-read gene sequences, thus ensuring an optimal level of correctly folded and functional molecules. By the modularized assembly process, up to six CDRs can be varied at the same time, providing a possibility for the creation of a hitherto undescribed genetic and functional variation. The n-CoDeR antibody gene libraries can be used to select highly specific, human antibody fragments with specificities to virtually any antigen, including carbohydrates and human self-proteins and with affinities down into the subnanomolar range. Furthermore, combining CDRs sampled from in vivo-processed sequences into a single framework results in molecules exhibiting a lower immunogenicity compared to normal human immunoglobulins, as determined by computer analyses. The distinguishing features of the n-CoDeR libraries in the therapeutic and diagnostic areas are discussed.
Isolation of the alkane inducible cytochrome P450 (P450alk) gene from the yeast Candida tropicalis
The gene for the alkane-inducible cytochrome P450, P450alk, has been isolated from the yeast Candida tropicalis by immunoscreening a λgt11 library. The isolated gene has been identified on the basis of its inducibility and partial DNA sequence. Transcripts of this gene were i...
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
2012-01-01
Background Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. Results An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. Conclusions This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches. PMID:22554201
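The idea that relative mutant growth can be read from barcode proportions can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the use of a log2 ratio of proportions, and the pseudocount value are all hypothetical choices, and barcode reads are assumed to be already extracted as plain strings.

```python
from collections import Counter
from math import log2


def barcode_proportions(barcode_reads):
    """Map each barcode to its share of all barcode reads in one sample."""
    counts = Counter(barcode_reads)
    total = sum(counts.values())
    return {barcode: n / total for barcode, n in counts.items()}


def log2_fold_change(before, after, pseudo=1e-6):
    """Compare barcode proportions between two time points.

    Positive values suggest a mutant grew faster than the pool average,
    negative values that it was depleted. `pseudo` (an assumed value)
    avoids taking the log of zero for barcodes that drop out.
    """
    p_before = barcode_proportions(before)
    p_after = barcode_proportions(after)
    return {
        barcode: log2((p_after.get(barcode, 0.0) + pseudo) / (p_before[barcode] + pseudo))
        for barcode in p_before
    }


if __name__ == "__main__":
    # Toy pool of two mutants, sampled before and after competitive growth.
    before = ["AAA"] * 50 + ["CCC"] * 50
    after = ["AAA"] * 75 + ["CCC"] * 25
    print(log2_fold_change(before, after))
```

In practice read counts would also be normalized for sequencing depth and replicated across pools. The sketch only shows the proportion-based comparison the abstract refers to.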
Silva, Cynthia C.; Hayden, Helen; Sawbridge, Tim; Mele, Pauline; De Paula, Sérgio O.; Silva, Lívia C. F.; Vidigal, Pedro M. P.; Vicentini, Renato; Sousa, Maíra P.; Torres, Ana Paula R.; Santiago, Vânia M. J.; Oliveira, Valéria M.
2013-01-01
Two fosmid libraries, totaling 13,200 clones, were obtained from bioreactor sludge of a petroleum refinery wastewater treatment system. The library screening based on PCR and biological activity assays revealed more than 400 positive clones for phenol degradation. From these, 100 clones were randomly selected for pyrosequencing in order to evaluate the genetic potential of the microorganisms present in the wastewater treatment plant for biodegradation, focusing mainly on novel genes and pathways of phenol and aromatic compound degradation. The sequence analysis of selected clones yielded 129,635 reads at an estimated 17-fold coverage. The phylogenetic analysis showed Burkholderiales and Rhodocyclales as the most abundant orders among the selected fosmid clones. The MG-RAST analysis revealed a broad metabolic profile with important functions for wastewater treatment, including metabolism of aromatic compounds, nitrogen, sulphur and phosphorus. The predicted 2,276 proteins included phenol hydroxylases and catechol 2,3-dioxygenases, involved in the catabolism of aromatic compounds, such as phenol, byphenol, benzoate and phenylpropanoid. The sequencing of one fosmid insert of 33 kb unraveled the gene that permitted the host, Escherichia coli EPI300, to grow in the presence of aromatic compounds. Additionally, the comparison of the whole fosmid sequence against bacterial genomes deposited in GenBank showed that about 90% of the sequence showed no identity to known sequences of Proteobacteria deposited in the NCBI database. This study surveyed the functional potential of fosmid clones for aromatic compound degradation and contributed to our knowledge of the biodegradative capacity and pathways of microbial assemblages present in a refinery wastewater treatment system. PMID:23637911
Large-Scale Concatenation cDNA Sequencing
Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.
1997-01-01
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Flexible CRISPR library construction using parallel oligonucleotide retrieval
Read, Abigail; Gao, Shaojian; Batchelor, Eric
2017-01-01
Abstract CRISPR/Cas9-based gene knockout libraries have emerged as a powerful tool for functional screens. We present here a set of pre-designed human and mouse sgRNA sequences that are optimized for both high on-target potency and low off-target effect. To maximize the chance of target gene inactivation, sgRNAs were curated to target both 5′ constitutive exons and exons that encode conserved protein domains. We describe here a robust and cost-effective method to construct multiple small-sized CRISPR libraries from a single oligo pool generated by array synthesis using parallel oligonucleotide retrieval. Together, these resources provide a convenient means for individual labs to generate customized CRISPR libraries of variable size and coverage depth for functional genomics applications. PMID:28334828
Hiesel, Rudolf; Schobel, Werner; Schuster, Wolfgang; Brennicke, Axel
1987-01-01
Two loci encoding subunit III of the cytochrome oxidase (COX) in Oenothera mitochondria have been identified from a cDNA library of mitochondrial transcripts. A 657-bp sequence block upstream from the open reading frame is also present in the two copies of the COX subunit I gene and is presumably involved in homologous sequence rearrangement. The proximal points of sequence rearrangements are located 3 bp upstream from the COX I and 1139 bp upstream from the COX III initiation codons. The 5'-termini of both COX I and COX III mRNAs have been mapped in this common sequence confining the promoter region for the Oenothera mitochondrial COX I and COX III genes to the homologous sequence block. PMID:15981332
Panosyan, Hovik; Birkeland, Nils-Kåre
2014-11-01
The phylogenetic diversity of the prokaryotic community thriving in the Arzakan hot spring in Armenia was studied using molecular and culture-based methods. A sequence analysis of 16S rRNA gene clone libraries demonstrated the presence of a diversity of microorganisms belonging to the Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Epsilonproteobacteria, Firmicutes, Bacteroidetes phyla, and Cyanobacteria. Proteobacteria was the dominant group, representing 52% of the bacterial clones. Denaturing gradient gel electrophoresis profiles of the bacterial 16S rRNA gene fragments also indicated the abundance of Proteobacteria, Bacteroidetes, and Cyanobacteria populations. Most of the sequences were most closely related to uncultivated microorganisms and shared less than 96% similarity with their closest matches in GenBank, indicating that this spring harbors a unique community of novel microbial species or genera. The majority of the sequences of an archaeal 16S rRNA gene library, generated from a methanogenic enrichment, were close relatives of members of the genus Methanoculleus. Aerobic endospore-forming bacteria mainly belonging to Bacillus and Geobacillus were detected only by culture-dependent methods. Three isolates were successfully obtained having 99, 96, and 96% 16S rRNA gene sequence similarities to Arcobacter sp., Methylocaldum sp., and Methanoculleus sp., respectively. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gene expression profiles responses to aphid feeding in chrysanthemum (Chrysanthemum morifolium).
Xia, Xiaolong; Shao, Yafeng; Jiang, Jiafu; Ren, Liping; Chen, Fadi; Fang, Weimin; Guan, Zhiyong; Chen, Sumei
2014-12-02
Chrysanthemum is an important ornamental plant all over the world. It is easily attacked by the aphid Macrosiphoniella sanbourni. The molecular mechanisms of plant defense responses to aphids are only partially understood. Here, we investigate the gene expression changes in response to aphid feeding in chrysanthemum leaf by RNA-Seq technology. Three libraries were generated from pooled leaf tissues of Chrysanthemum morifolium 'nannongxunzhang' that were collected at different time points with (Y) or without (CK) aphid infestations and mock puncture treatment (Z), and sequenced using an Illumina HiSeq™ 2000 platform. A total of 7,363,292, 7,215,860 and 7,319,841 clean reads were obtained in library CK, Y and Z, respectively. The proportion of clean reads was >97.29% in each library. Approximately 76.35% of the clean reads were mapped to a reference gene database including all known chrysanthemum unigene sequences. 1,157, 527 and 340 differentially expressed genes (DEGs) were identified in the comparison of CK-VS-Y, CK-VS-Z and Z-VS-Y, respectively. These DEGs were involved in phytohormone signaling, cell wall biosynthesis, photosynthesis, the reactive oxygen species (ROS) pathway and transcription factor regulatory networks, among others. Changes in gene expression induced by aphid feeding are shown to be multifaceted. There are various forms of crosstalk between the pathways to which these genes belong, which would allow plants to fine-tune their defense responses.
Rapid Creation and Quantitative Monitoring of High Coverage shRNA Libraries
Bassik, Michael C.; Lebbink, Robert Jan; Churchman, L. Stirling; Ingolia, Nicholas T.; Patena, Weronika; LeProust, Emily M.; Schuldiner, Maya; Weissman, Jonathan S.; McManus, Michael T.
2009-01-01
Short hairpin RNA (shRNA) libraries are limited by the low efficacy of many shRNAs, giving false negatives, and off-target effects, giving false positives. Here we present a strategy for rapidly creating expanded shRNA pools (∼30 shRNAs/gene) that are analyzed by deep-sequencing (EXPAND). This approach enables identification of multiple effective target-specific shRNAs from a complex pool, allowing a rigorous statistical evaluation of whether a gene is a true hit. PMID:19448642
Phenotypic mutant library: potential for gene discovery
USDA-ARS's Scientific Manuscript database
The rapid development of high throughput and affordable Next-Generation Sequencing (NGS) techniques has renewed interest in gene discovery using forward genetics. The conventional forward genetic approach starts with isolation of mutants with a phenotype of interest, mapping the mutation within a s...
Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin
2016-01-01
ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. 
The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including streptothricins, borrelidin, two novel lipopeptides, and one unknown antibiotic from Streptomyces rochei Sal35. The transfer, expression, and screening of the library were all performed in a high-throughput way, so that this approach is scalable and adaptable to industrial automation for next-generation antibiotic discovery. PMID:27451447
USDA-ARS's Scientific Manuscript database
Puccinia striiformis f. sp. tritici (Pst) causes stripe rust, one of the most important diseases of wheat worldwide. To identify Pst genes involved in infection and sporulation, a custom oligonucleotide Genechip was made using sequences of 442 genes selected from Pst cDNA libraries. Microarray analy...
Bellerophon: a program to detect chimeric sequences in multiple sequence alignments.
Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip
2004-09-22
Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments. Bellerophon is available as an interactive web server at http://foo.maths.uq.edu.au/~huber/bellerophon.pl
RNA-Seq analysis to capture the transcriptome landscape of a single cell
Tang, Fuchou; Barbacioru, Catalin; Nordman, Ellen; Xu, Nanlan; Bashkirov, Vladimir I; Lao, Kaiqin; Surani, M. Azim
2013-01-01
We describe here a protocol for digital transcriptome analysis in a single mouse blastomere using a deep sequencing approach. An individual blastomere was first isolated and put into lysate buffer by mouth pipette. Reverse transcription was then performed directly on the whole cell lysate. After this, the free primers were removed by Exonuclease I and a poly(A) tail was added to the 3′ end of the first-strand cDNA by Terminal Deoxynucleotidyl Transferase. Then the single cell cDNAs were amplified by 20 plus 9 cycles of PCR. Then 100-200 ng of these amplified cDNAs were used to construct a sequencing library. The sequencing library can be used for deep sequencing using the SOLiD system. Compared with the cDNA microarray technique, our assay can capture up to 75% more genes expressed in early embryos. The protocol can generate deep sequencing libraries within 6 days for 16 single cell samples. PMID:20203668
Qin, Huibin; Lang, Huihua; Yang, Hongjiang
2013-09-01
Household anaerobic digesters have been installed across rural China for biogas production, but information on methanogen community structure in these small biogas units is sparsely available. By creating clone libraries for 16S rRNA and methyl coenzyme M reductase alpha subunit (mcrA) genes, we investigated the methanogenic consortia in a household biogas digester treating swine manure. Operational taxonomic units (OTUs) were defined by comparative sequence analysis, seven OTUs were identified in the 16S rRNA gene library, and ten OTUs were identified in the mcrA gene library. Both libraries were dominated by clones highly related to the type strain Methanocorpusculum labreanum Z, 64.0 % for 16S rRNA gene clones and 64.3 % for mcrA gene clones. Additionally, gas chromatography assays showed that formic acid was 84.54 % of the total volatile fatty acids and methane was 57.20 % of the biogas composition. Our results may help further isolation and characterization of methanogenic starter strains for industrial biogas production.
Using Phage Display to Create Recombinant Antibodies.
Dasch, James R; Dasch, Amy L
2017-09-01
A variety of phage display technologies have been developed since the approach was first described for antibodies. The most widely used approaches incorporate antibody sequences into the minor coat protein pIII of the nonlytic filamentous phage fd or M13. Libraries of variable gene sequences, encoding either scFv or Fab fragments, are made by incorporating sequences into phagemid vectors. The phagemid is packaged into phage particles with the assistance of a helper phage to produce the antibody display phage. This protocol describes a method for creating a phagemid library. The multiple cloning site (MCS) of the pBluescript KS(-) phagemid vector is replaced by digestion with the restriction enzyme BssHII, followed by the insertion of four overlapping oligonucleotides to create a new MCS within the vector. Next, the 3' portion of gene III (from M13mp18) is amplified and combined with an antibody sequence using overlap extension PCR. This product is inserted into the phagemid vector to create pPDS. Two helper plasmids are also created from the modified pBluescript vector: pLINK provides the linker between the heavy and light chains, and pFABC provides the CH1 domain of the heavy chain. An antibody cDNA library is constructed from the RNA of interest and ligated into pPDS. The phagemid library is electroporated into Escherichia coli cells along with the VCS-M13 helper phage. © 2017 Cold Spring Harbor Laboratory Press.
Bernstein, Steven L; Guo, Yan; Peterson, Katherine; Wistow, Graeme
2009-01-01
Background The optic nerve is a pure white matter central nervous system (CNS) tract with an isolated blood supply, and is widely used in physiological studies of white matter response to various insults. We examined the gene expression profile of human optic nerve (ON) and, through the NEIBANK online resource, provide a resource of sequence-verified cDNA clones. An un-normalized cDNA library was constructed from pooled human ON tissues and was used in expressed sequence tag (EST) analysis. Location of an abundant oligodendrocyte marker was examined by immunofluorescence. Quantitative real time polymerase chain reaction (qRT-PCR) and Western analysis were used to compare levels of expression for key calcium channel protein genes and protein product in primate and rodent ON. Results Our analyses revealed a profile similar in many respects to other white matter related tissues, but significantly different from previously available ON cDNA libraries. The previous libraries were found to include specific markers for other eye tissues, suggesting contamination. Immune/inflammatory markers were abundant in the new ON library. The oligodendrocyte marker QKI was abundant at the EST level. Immunofluorescence revealed that this protein is a useful oligodendrocyte cell-type marker in rodent and primate ONs. L-type calcium channel EST abundance was found to be particularly low. A qRT-PCR-based comparative mammalian species analysis reveals that L-type calcium channel expression levels are significantly lower in primate than in rodent ON, which may help account for the class-specific difference in responsiveness to calcium channel blocking agents. Several known eye disease genes are abundantly expressed in ON. Many genes associated with normal axonal function, mRNAs associated with axonal transport, inflammation and neuroprotection are observed.
Conclusion We conclude that the new cDNA library is a faithful representation of human ON and EST data provide an initial overview of gene expression patterns in this tissue. The data provide clues for tissue-specific and species-specific properties of human ON that will help in design of therapeutic models. PMID:19778450
Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi
2012-07-02
Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.
2012-01-01
Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers, are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974
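The SSR figures reported above come from scanning assembled sequences for short tandem repeats. As a rough illustration only, the following Python sketch shows one way such a scan could look. The motif lengths (2 to 6 bp) and minimum copy numbers are assumptions chosen for the example, not the thresholds used in the study, and `find_ssrs` is a hypothetical helper name.

```python
import re

# Assumed thresholds (not from the study): motif length -> minimum tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for simple sequence repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AAA", "ATAT").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)
```

Calling `find_ssrs` on each unigene and counting the sequences with at least one hit would give the two kinds of totals quoted in the abstract (repeats found, and unigenes containing them), under whatever thresholds are chosen.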
TIAN, PENG; LI, JIE; LIU, XIANG; LI, YUXI; CHEN, MEIHENG; MA, YUN; ZHENG, YI QING; FU, YONGGUI; ZOU, HUA
2014-01-01
Nasal polyps (NP) are highly associated with disorders of immune cells. Alternative polyadenylation (APA) produces mRNA isoforms with different lengths of 3′-untranslated region (UTR) and regulates gene expression. It has been shown that this APA-mediated regulation of 3′UTR length is an immune-associated phenomenon. The aim of this study was to investigate the genome-wide alternative tandem 3′UTR length switching events in non-eosinophilic nasal polyp tissue. Thirteen patients diagnosed as having non-eosinophilic nasal polyps were included in this study. Nasal polyp tissue and control mucosa were collected during surgery. The 3′ end library of cDNA was constructed. The recovered libraries were sequenced with second-generation sequencing technology, and the sequencing data were analyzed by an in-house bioinformatics pipeline. Tandem 3′UTR length switching between samples was detected by a test of linear trend alternative to independence. We found a significant alteration in the tandem 3′UTR length in 1,920 genes in nasal polyp samples. Functional annotation results showed that several gene ontology (GO) terms were enriched in the list of genes with switched APA sites, including regulation of transcription, macromolecule catabolic localization and mRNA processing. The results suggested that APA-mediated alternative 3′UTR regulation plays an important role in the post-transcriptional regulation of gene expression in non-eosinophilic nasal polyps. PMID:24715051
ESTree db: a Tool for Peach Functional Genomics
Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo
2005-01-01
Background The ESTree db represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. Results The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. Conclusion The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig. PMID:16351742
ESTree db: a tool for peach functional genomics.
Lazzari, Barbara; Caprera, Andrea; Vecchietti, Alberto; Stella, Alessandra; Milanesi, Luciano; Pozzi, Carlo
2005-12-01
The ESTree db (http://www.itb.cnr.it/estree/) represents a collection of Prunus persica expressed sequence tags (ESTs) and is intended as a resource for peach functional genomics. A total of 6,155 successful EST sequences were obtained from four in-house prepared cDNA libraries from Prunus persica mesocarps at different developmental stages. Another 12,475 peach EST sequences were downloaded from public databases and added to the ESTree db. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts and data were collected in a MySQL database. A PHP-based web interface was developed to query the database. The ESTree db version as of April 2005 encompasses 18,630 sequences representing eight libraries. Contig assembly was performed with CAP3. Putative single nucleotide polymorphism (SNP) detection was performed with the AutoSNP program and a search engine was implemented to retrieve results. All the sequences and all the contig consensus sequences were annotated both with blastx against the GenBank nr db and with GOblet against the viridiplantae section of the Gene Ontology db. Links to NiceZyme (Expasy) and to the KEGG metabolic pathways were provided. A local BLAST utility is available. A text search utility allows querying and browsing the database. Statistics were provided on Gene Ontology occurrences to assign sequences to Gene Ontology categories. The resulting database is a comprehensive resource of data and links related to peach EST sequences. The Sequence Report and Contig Report pages work as the web interface core structures, giving quick access to data related to each sequence/contig.
Clone DB: an integrated NCBI resource for clone-associated data
Schneider, Valerie A.; Chen, Hsiu-Chuan; Clausen, Cliff; Meric, Peter A.; Zhou, Zhigang; Bouk, Nathan; Husain, Nora; Maglott, Donna R.; Church, Deanna M.
2013-01-01
The National Center for Biotechnology Information (NCBI) Clone DB (http://www.ncbi.nlm.nih.gov/clone/) is an integrated resource providing information about and facilitating access to clones, which serve as valuable research reagents in many fields, including genome sequencing and variation analysis. Clone DB represents an expansion and replacement of the former NCBI Clone Registry and has records for genomic and cell-based libraries and clones representing more than 100 different eukaryotic taxa. Records provide details of library construction, associated sequences, map positions and information about resource distribution. Clone DB is indexed in the NCBI Entrez system and can be queried by fields that include organism, clone name, gene name and sequence identifier. Whenever possible, genomic clones are mapped to reference assemblies and their map positions provided in clone records. Clones mapping to specific genomic regions can also be searched for using the NCBI Clone Finder tool, which accepts queries based on sequence coordinates or features such as gene or transcript names. Clone DB makes reports of library, clone and placement data on its FTP site available for download. With Clone DB, users now have available to them a centralized resource that provides them with the tools they will need to make use of these important research reagents. PMID:23193260
High-throughput screening of a CRISPR/Cas9 library for functional genomics in human cells.
Zhou, Yuexin; Zhu, Shiyou; Cai, Changzu; Yuan, Pengfei; Li, Chunmei; Huang, Yanyi; Wei, Wensheng
2014-05-22
Targeted genome editing technologies are powerful tools for studying biology and disease, and have a broad range of research applications. In contrast to the rapid development of toolkits to manipulate individual genes, large-scale screening methods based on the complete loss of gene expression are only now beginning to be developed. Here we report the development of a focused CRISPR/Cas-based (clustered regularly interspaced short palindromic repeats/CRISPR-associated) lentiviral library in human cells and a method of gene identification based on functional screening and high-throughput sequencing analysis. Using knockout library screens, we successfully identified the host genes essential for the intoxication of cells by anthrax and diphtheria toxins, which were confirmed by functional validation. The broad application of this powerful genetic screening strategy will not only facilitate the rapid identification of genes important for bacterial toxicity but will also enable the discovery of genes that participate in other biological processes.
EuroPineDB: a high-coverage web database for maritime pine transcriptome
2011-01-01
Background Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised and the results and annotations have to be stored in dedicated databases. Description EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine), since it comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing from adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were optimally pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× P. pinaster genome was covered and assembled in 55 322 UniGenes. A total of 32 919 (59.5%) of P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. It can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information) and will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.
Conclusions The EuroPineDB with its integrated information can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge on the maritime pine transcriptome. PMID:21762488
Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi
2014-03-01
Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.
Diversity of Metabolically Active Bacteria in Water-Flooded High-Temperature Heavy Oil Reservoir
Nazina, Tamara N.; Shestakova, Natalya M.; Semenova, Ekaterina M.; Korshunova, Alena V.; Kostrukova, Nadezda K.; Tourova, Tatiana P.; Min, Liu; Feng, Qingxian; Poltaraus, Andrey B.
2017-01-01
The goal of this work was to study the overall genomic diversity of microorganisms of the Dagang high-temperature oilfield (PRC) and to characterize the metabolically active fraction of these populations. At this water-flooded oilfield, the microbial community of formation water from the near-bottom zone of an injection well where the most active microbial processes of oil degradation occur was investigated using molecular, cultural, radiotracer, and physicochemical techniques. The samples of microbial DNA and RNA from back-flushed water were used to obtain the clone libraries for the 16S rRNA gene and cDNA of 16S rRNA, respectively. The DNA-derived clone libraries were found to contain bacterial and archaeal 16S rRNA genes and the alkB genes encoding alkane monooxygenases similar to those encoded by alkB-geo1 and alkB-geo6 of geobacilli. The 16S rRNA genes of methanogens (Methanomethylovorans, Methanoculleus, Methanolinea, Methanothrix, and Methanocalculus) were predominant in the DNA-derived library of Archaea cloned sequences; among the bacterial sequences, the 16S rRNA genes of members of the genus Geobacillus were the most numerous. The RNA-derived library contained only bacterial cDNA of the 16S rRNA sequences belonging to metabolically active aerobic organotrophic bacteria (Tepidimonas, Pseudomonas, Acinetobacter), as well as of denitrifying (Azoarcus, Tepidiphilus, Calditerrivibrio), fermenting (Bellilinea), iron-reducing (Geobacter), and sulfate- and sulfur-reducing bacteria (Desulfomicrobium, Desulfuromonas). The presence of the microorganisms of the main functional groups revealed by molecular techniques was confirmed by the results of cultural, radioisotope, and geochemical research. Functioning of the mesophilic and thermophilic branches was shown for the microbial food chain of the near-bottom zone of the injection well, which included the microorganisms of the carbon, sulfur, iron, and nitrogen cycles. PMID:28487680
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zezza, D.J.; Stewart, S.E.; Steiner, L.A.
1992-12-15
Xenopus laevis Ig contain two distinct types of L chains, designated ρ or L1 and σ or L2. The authors have analyzed Xenopus genomic DNA by Southern blotting with cDNA probes specific for L1 V and C regions. Many fragments hybridized to the V probe, but only one or two fragments hybridized to the C probe. Corresponding C, J, and V gene segments were identified on clones isolated from a genomic library prepared from the same DNA. One clone contains a C gene segment separated from a J gene segment by an intron of 3.4 kb. The J and C gene segments are nearly identical in sequence to cDNA clones analyzed previously. The C segment is somewhat more similar and the J segment considerably more similar in sequence to the corresponding segments of mammalian κ chains than to those of mammalian λ chains. Upstream of the J segment is a typical recombination signal sequence with a spacer of 23 bp, as in Jκ. A second clone from the library contains four V gene segments, separated by 2.1 to 3.6 kb. Two of these, V1 and V3, have the expected structural and regulatory features of V genes, and are very similar in sequence to each other and to mammalian Vκ. A third gene segment, V2, resembles V1 and V3 in its coding region and nearby 5′-flanking region, but diverges in sequence 5′ to position −95 with loss of the octamer promoter element. The fourth V-like segment is similar to the others at the 3′-end, but upstream of codon 64 bears no resemblance in sequence to any Ig V region. All four V segments have typical recombination signal sequences with 12-bp spacers at their 3′-ends, as in Vκ. Taken together, the data suggest that Xenopus L1 L chain genes are members of the κ gene family. 80 refs., 9 figs.
Single molecule targeted sequencing for cancer gene mutation detection.
Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui
2016-05-19
With the rapid decline in the cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although fast sample preparation methods are available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduce an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combines targeted capture and sequencing in one step. We demonstrate that this technology can detect low-frequency mutations using an artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation, so that no biases or errors are introduced by PCR. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis.
Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.
Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong
2018-05-01
This study investigated the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out with Illumina 2000 and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. There were 12 genes related to chromium metabolism in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology to remediate environmental chromium pollution.
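The G+C content reported above is simply the share of G and C bases among all called bases. A minimal Python sketch of that calculation follows; `gc_content` is a hypothetical helper name, and ignoring ambiguous bases such as N is an assumption of this example rather than something stated in the abstract.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total
```

Applied to the concatenated contigs of an assembly, this would yield a single genome-wide percentage comparable to the figure quoted in the abstract.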
Jensen, Sigmund; Fortunato, Sofia A V; Hoffmann, Friederike; Hoem, Solveig; Rapp, Hans Tore; Øvreås, Lise; Torsvik, Vigdis L
2017-04-01
During the last decades, our knowledge about the activity of sponge-associated microorganisms and their contribution to biogeochemical cycling has gradually increased. Functional groups involved in carbon and nitrogen metabolism are well documented, whereas knowledge about microorganisms involved in the sulfur cycle is still limited. Both sulfate reduction and sulfide oxidation have been detected in the cold water sponge Geodia barretti from Korsfjord in Norway, and with specimens from this site, the present study aims to identify extant versus active sponge-associated microbiota with a focus on sulfur metabolism. Comparative analysis of small subunit ribosomal RNA (16S rRNA) gene (DNA) and transcript (complementary DNA (cDNA)) libraries revealed profound differences. The transcript library was dominated by Chloroflexi despite their low abundance in the gene library. The opposite result was found for Acidobacteria. Proteobacteria were detected in both libraries with representatives of the Alpha- and Gammaproteobacteria related to clades with presumably thiotrophic bacteria from sponges and other marine invertebrates. Sequences that clustered with sponge-associated Deltaproteobacteria were remotely related to cultivated sulfate-reducing bacteria. The microbes involved in sulfur cycling were identified by the functional gene aprA (adenosine-5'-phosphosulfate reductase) and its transcript. Of the aprA sequences (DNA and cDNA), 87 % affiliated with sulfur-oxidizing bacteria. They clustered with Alphaproteobacteria and with clades of deep-branching Gammaproteobacteria. The remaining sequences clustered with sulfate-reducing Archaea of the phylum Euryarchaeota. These results indicate an active role of yet uncharacterized Bacteria and Archaea in the sponge's sulfur cycle.
Filteau, Marie; Lagacé, Luc; LaPointe, Gisèle; Roy, Denis
2010-04-01
An arbitrary primed community PCR fingerprinting technique based on capillary electrophoresis was developed to study maple sap microbial community characteristics among 19 production sites in Québec over the tapping season. Presumptive fragment identification was made with corresponding fingerprint profiles of bacterial isolate cultures. Maple sap microbial communities were subsequently compared using a representative subset of 13 16S rRNA gene clone libraries followed by gene sequence analysis. Results from both methods indicated that all maple sap production sites and flow periods shared common microbiota members, but distinctive features also existed. Changes over the season in relative abundance of predominant populations showed evidence of a common pattern. Pseudomonas (64%) and Rahnella (8%) were the most abundantly and frequently represented genera of the 2239 sequences analyzed. Janthinobacterium, Leuconostoc, Lactococcus, Weissella, Epilithonimonas and Sphingomonas were revealed as occasional contaminants in maple sap. Maple sap microbiota showed a low level of deep diversity along with a high variation of similar 16S rRNA gene sequences within the Pseudomonas genus. Predominance of Pseudomonas is suggested as a typical feature of maple sap microbiota across geographical regions, production sites, and sap flow periods.
Algorithms for optimizing cross-overs in DNA shuffling.
He, Lu; Friedman, Alan M; Bailey-Kellogg, Chris
2012-03-21
DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library. This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. Our CODNS (cross-over optimization for DNA shuffling) approach employs polynomial-time dynamic programming algorithms to select codons for the parental amino acids, allowing for zero or a fixed number of conservative substitutions. We first present efficient algorithms to optimize the local sequence identity or the nearest-neighbor approximation of the change in free energy upon annealing, objectives that were previously optimized by computationally-expensive integer programming methods. We then present efficient algorithms for more powerful objectives that seek to localize and enhance the frequency of recombination by producing "runs" of common nucleotides either overall or according to the sequence diversity of the resulting chimeras. We demonstrate the effectiveness of CODNS in choosing codons and allocating substitutions to promote recombination between parents targeted in earlier studies: two GAR transformylases (41% amino acid sequence identity), two very distantly related DNA polymerases, Pol X and β (15%), and beta-lactamases of varying identity (26-47%). 
Our methods provide the protein engineer with a new approach to DNA shuffling that supports substantially more diverse parents, is more deterministic, and generates more predictable and more diverse chimeric libraries.
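The simplest objective mentioned above, local sequence identity, can be illustrated one aligned position at a time. The Python sketch below picks, for each pair of parental amino acids, the synonymous codons that share the most nucleotides. It is a deliberately simplified stand-in and not the CODNS dynamic programming algorithms themselves, which also allocate substitutions and optimize run-based objectives. It assumes the standard genetic code and two pre-aligned, gap-free parents of equal length; `pick_codons` and `identity` are hypothetical names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to its list of synonymous codons.
CODONS = {}
for i, (a, b, c) in enumerate(product(BASES, repeat=3)):
    CODONS.setdefault(AMINO[i], []).append(a + b + c)

def identity(c1, c2):
    """Number of positions at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(c1, c2))

def pick_codons(parent_a, parent_b):
    """Choose codons for two aligned protein sequences, maximizing per-codon identity."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned and of equal length")
    dna_a, dna_b = [], []
    for aa_a, aa_b in zip(parent_a, parent_b):
        # Exhaustively compare every synonymous codon pair for this position.
        best = max(product(CODONS[aa_a], CODONS[aa_b]),
                   key=lambda pair: identity(*pair))
        dna_a.append(best[0])
        dna_b.append(best[1])
    return "".join(dna_a), "".join(dna_b)
```

Because this objective decomposes by position, a greedy per-codon choice is enough here; objectives that reward contiguous runs of shared nucleotides couple neighbouring codons, which is where dynamic programming of the kind described in the abstract becomes necessary.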
Rabausch, U.; Juergensen, J.; Ilmberger, N.; Böhnke, S.; Fischer, S.; Schubach, B.; Schulte, M.
2013-01-01
The functional detection of novel enzymes other than hydrolases from metagenomes is limited since only a very few reliable screening procedures are available that allow the rapid screening of large clone libraries. For the discovery of flavonoid-modifying enzymes in genome and metagenome clone libraries, we have developed a new screening system based on high-performance thin-layer chromatography (HPTLC). This metagenome extract thin-layer chromatography analysis (META) allows the rapid detection of glycosyltransferase (GT) and also other flavonoid-modifying activities. The developed screening method is highly sensitive, and an amount of 4 ng of modified flavonoid molecules can be detected. This novel technology was validated against a control library of 1,920 fosmid clones generated from a single Bacillus cereus isolate and then used to analyze more than 38,000 clones derived from two different metagenomic preparations. Thereby we identified two novel UDP glycosyltransferase (UGT) genes. The metagenome-derived gtfC gene encoded a 52-kDa protein, and the deduced amino acid sequence was weakly similar to sequences of putative UGTs from Fibrisoma and Dyadobacter. GtfC mediated the transfer of different hexose moieties and exhibited high activities on flavones, flavonols, flavanones, and stilbenes and also accepted isoflavones and chalcones. From the control library we identified a novel macroside glycosyltransferase (MGT) with a calculated molecular mass of 46 kDa. The deduced amino acid sequence was highly similar to sequences of MGTs from Bacillus thuringiensis. Recombinant MgtB transferred the sugar residue from UDP-glucose effectively to flavones, flavonols, isoflavones, and flavanones. Moreover, MgtB exhibited high activity on larger flavonoid molecules such as tiliroside. PMID:23686272
Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome.
Nicholson, Matthew J; Theodorou, Michael K; Brookman, Jayne L
2005-01-01
The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. 
The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens and lampirin. This gene was present as a single copy in Orpinomyces, was expressed during vegetative growth and was also detected in genomes from another gut fungal genus, Neocallimastix.
NASA Astrophysics Data System (ADS)
Lau, Yun-Fai; Kan, Yuet Wai
1983-09-01
We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers (SV2-gpt, SV2-DHFR, and SV2-neo) in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α-globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α-globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α-globin genes were more actively expressed than the embryonic ζ-globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.
Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi
2004-02-01
To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
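The counts in the abstract above can be cross-checked with a few lines of arithmetic. This short Python sketch only re-derives figures already quoted in the text (contigs plus singletons, and the share of predicted genes hit by ESTs); the variable names are illustrative.

```python
# Figures quoted in the abstract.
contigs, singletons = 8503, 11954
genes_hit, genes_predicted = 1093, 1889

# Non-redundant sequences are the contigs plus the unclustered singletons.
non_redundant = contigs + singletons

# Share of predicted protein-encoding genes matched by at least one EST.
coverage_pct = round(100 * genes_hit / genes_predicted, 1)

print(non_redundant, coverage_pct)
```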
Viggor, Signe; Jõesaar, Merike; Vedler, Eve; Kiiker, Riinu; Pärnpuu, Liis; Heinaru, Ain
2015-12-30
Formation of specific oil-degrading bacterial communities was revealed in Baltic Sea surface water microcosms supplemented with diesel fuel, crude oil, heptane or hexadecane. The 475 sequences from constructed alkane hydroxylase alkB gene clone libraries were grouped into 30 OPFs. The two largest groups were most similar to Pedobacter sp. (245 of 475) and Limnobacter sp. (112 of 475) alkB gene sequences. Of 56 alkane-degrading bacterial strains, 41 belonged to Pseudomonas spp. and 8 to Rhodococcus spp. having redundant alkB genes. In total, 68 alkB gene sequences were identified. These genes grouped into 20 OPFs, half of them being specific only to the isolated strains. Altogether 543 diverse alkB genes were characterized in the brackish Baltic Sea water; some of them represent novel lineages with very low sequence identities to the corresponding genes of the reference strains. Copyright © 2015 Elsevier Ltd. All rights reserved.
Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong
2013-11-01
Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
Polymenakou, Paraskevi N; Bertilsson, Stefan; Tselepides, Anastasios; Stephanou, Euripides G
2005-10-01
The regional variability of sediment bacterial community composition and diversity was studied by comparative analysis of four large 16S ribosomal DNA (rDNA) clone libraries from sediments in different regions of the Eastern Mediterranean Sea (Thermaikos Gulf, Cretan Sea, and South Ionian Sea). Amplified rDNA restriction analysis of 664 clones from the libraries indicated that the rDNA richness and evenness were high: for example, a near-1:1 relationship was observed between the number of screened clones and the number of unique restriction patterns when up to 190 clones were screened for each library. Phylogenetic analysis of 207 bacterial 16S rDNA sequences from the sediment libraries demonstrated that Gamma-, Delta-, and Alphaproteobacteria, Holophaga/Acidobacteria, Planctomycetales, Actinobacteria, Bacteroidetes, and Verrucomicrobia were represented in all four libraries. A few clones also grouped with the Betaproteobacteria, Nitrospirae, Spirochaetales, Chlamydiae, Firmicutes, and candidate division OP11. The abundance of sequences affiliated with Gammaproteobacteria was higher in libraries from shallow sediments in the Thermaikos Gulf (30 m) and the Cretan Sea (100 m) compared to the deeper South Ionian station (2790 m). Most sequences in the four sediment libraries clustered with uncultured 16S rDNA phylotypes from marine habitats, and many of the closest matches were clones from hydrocarbon seeps, benzene-mineralizing consortia, sulfate reducers, sulfur oxidizers, and ammonia oxidizers. LIBSHUFF statistics of 16S rDNA gene sequences from the four libraries revealed major differences, indicating either a very high richness in the sediment bacterial communities or considerable variability in bacterial community composition among regions, or both.
Construction of BAC Libraries from Flow-Sorted Chromosomes.
Šafář, Jan; Šimková, Hana; Doležel, Jaroslav
2016-01-01
Cloned DNA libraries in bacterial artificial chromosome (BAC) are the most widely used form of large-insert DNA libraries. BAC libraries are typically represented by ordered clones derived from genomic DNA of a particular organism. In the case of large eukaryotic genomes, whole-genome libraries consist of a hundred thousand to a million clones, which make their handling and screening a daunting task. The labor and cost of working with whole-genome libraries can be greatly reduced by constructing a library derived from a smaller part of the genome. Here we describe construction of BAC libraries from mitotic chromosomes purified by flow cytometric sorting. Chromosome-specific BAC libraries facilitate positional gene cloning, physical mapping, and sequencing in complex plant genomes.
Principles and application of antibody libraries for infectious diseases.
Lim, Bee Nar; Tye, Gee Jun; Choong, Yee Siew; Ong, Eugene Boon Beng; Ismail, Asma; Lim, Theam Soon
2014-12-01
Antibodies have been used efficiently for the treatment and diagnosis of many diseases. Recombinant antibody technology allows the generation of fully human antibodies. Phage display is the gold standard for the production of human antibodies in vitro. To generate monoclonal antibodies by phage display, the generation of antibody libraries is crucial. Antibody libraries are classified according to the source where the antibody gene sequences were obtained. The most useful library for infectious diseases is the immunized library. Immunized libraries would allow better and selective enrichment of antibodies against disease antigens. The antibodies generated from these libraries can be translated for both diagnostic and therapeutic applications. This review focuses on the generation of immunized antibody libraries and the potential applications of the antibodies derived from these libraries.
Gao, Jian Ping; Wang, Dong; Cao, Ling Ya; Sun, Hai Feng
2015-01-01
Background Codonopsis pilosula (Franch.) Nannf. is one of the most widely used medicinal plants. Although chemical and pharmacological studies have shown that codonopsis polysaccharides (CPPs) are bioactive compounds and that their composition is variable, their biosynthetic pathways remain largely unknown. Next-generation sequencing is an efficient and high-throughput technique that allows the identification of candidate genes involved in secondary metabolism. Principal Findings To identify the components involved in CPP biosynthesis, a transcriptome library, prepared using root and other tissues, was assembled with the help of Illumina sequencing. A total of 9.2 Gb of clean nucleotides was obtained comprising 91,175,044 clean reads, 102,125 contigs, and 45,511 unigenes. After aligning the sequences to the public protein databases, 76.1% of the unigenes were annotated. Among these annotated unigenes, 26,189 were assigned to Gene Ontology categories, 11,415 to Clusters of Orthologous Groups, and 18,848 to Kyoto Encyclopedia of Genes and Genomes pathways. Analysis of abundance of transcripts in the library showed that genes, including those encoding metallothionein, aquaporin, and cysteine protease that are related to stress responses, were in the top list. Among genes involved in the biosynthesis of CPP, those responsible for the synthesis of UDP-L-arabinose and UDP-xylose were highly expressed. Significance To our knowledge, this is the first study to provide a public transcriptome dataset prepared from C. pilosula and an outline of the biosynthetic pathway of polysaccharides in a medicinal plant. Identified candidate genes involved in CPP biosynthesis provide understanding of the biosynthesis and regulation of CPP at the molecular level. PMID:25719364
A method for high-throughput production of sequence-verified DNA libraries and strain collections.
Smith, Justin D; Schlecht, Ulrich; Xu, Weihong; Suresh, Sundari; Horecka, Joe; Proctor, Michael J; Aiyar, Raeka S; Bennett, Richard A O; Chu, Angela; Li, Yong Fuga; Roy, Kevin; Davis, Ronald W; Steinmetz, Lars M; Hyman, Richard W; Levy, Sasha F; St Onge, Robert P
2017-02-13
The low costs of array-synthesized oligonucleotide libraries are empowering rapid advances in quantitative and synthetic biology. However, high synthesis error rates, uneven representation, and lack of access to individual oligonucleotides limit the true potential of these libraries. We have developed a cost-effective method called Recombinase Directed Indexing (REDI), which involves integration of a complex library into yeast, site-specific recombination to index library DNA, and next-generation sequencing to identify desired clones. We used REDI to generate a library of ~3,300 DNA probes that exhibited > 96% purity and remarkable uniformity (> 95% of probes within twofold of the median abundance). Additionally, we created a collection of ~9,000 individually accessible CRISPR interference yeast strains for > 99% of genes required for either fermentative or respiratory growth, demonstrating the utility of REDI for rapid and cost-effective creation of strain collections from oligonucleotide pools. Our approach is adaptable to any complex DNA library, and fundamentally changes how these libraries can be parsed, maintained, propagated, and characterized. © 2017 The Authors. Published under the terms of the CC BY 4.0 license.
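The uniformity figure quoted above (> 95% of probes within twofold of the median abundance) is a simple summary statistic. A minimal sketch follows, assuming "within twofold" means between half and twice the median; the function name `fraction_within_twofold` and the read counts are hypothetical and are not data from the study.

```python
from statistics import median


def fraction_within_twofold(abundances):
    """Fraction of values lying between median/2 and 2*median (inclusive)."""
    if not abundances:
        raise ValueError("abundances must be non-empty")
    m = median(abundances)
    within = [x for x in abundances if m / 2 <= x <= 2 * m]
    return len(within) / len(abundances)


# Hypothetical per-probe read counts, for illustration only.
counts = [100, 120, 90, 110, 95, 105, 400, 30, 101, 99]
uniformity = fraction_within_twofold(counts)
print(f"{uniformity:.0%} of probes are within twofold of the median")
```

In this toy set the median is 100.5, so the two outliers (30 and 400) fall outside the 50.25 to 201 band and the statistic is 80%.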
Bacterial Artificial Chromosome Libraries for Mouse Sequencing and Functional Analysis
Osoegawa, Kazutoyo; Tateno, Minako; Woon, Peng Yeong; Frengen, Eirik; Mammoser, Aaron G.; Catanese, Joseph J.; Hayashizaki, Yoshihide; de Jong, Pieter J.
2000-01-01
Bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) libraries providing a combined 33-fold representation of the murine genome have been constructed using two different restriction enzymes for genomic digestion. A large-insert PAC library was prepared from the 129S6/SvEvTac strain in a bacterial/mammalian shuttle vector to facilitate functional gene studies. For genome mapping and sequencing, we prepared BAC libraries from the 129S6/SvEvTac and the C57BL/6J strains. The average insert sizes for the three libraries range between 130 kb and 200 kb. Based on the numbers of clones and the observed average insert sizes, we estimate each library to have slightly in excess of 10-fold genome representation. The average number of clones found after hybridization screening with 28 probes was in the range of 9–14 clones per marker. To explore the fidelity of the genomic representation in the three libraries, we analyzed three contigs, each established after screening with a single unique marker. New markers were established from the end sequences and screened against all the contig members to determine if any of the BACs and PACs are chimeric or rearranged. Only one chimeric clone and six potential deletions have been observed after extensive analysis of 113 PAC and BAC clones. Seventy-one of the 113 clones were conclusively nonchimeric because both end markers or sequences were mapped to the other confirmed contig members. We could not exclude chimerism for the remaining 41 clones because one or both of the insert termini did not contain unique sequence to design markers. The low rate of chimerism, ∼1%, and the low level of detected rearrangements support the anticipated usefulness of the BAC libraries for genome research. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AQ797173–AQ797398.] PMID:10645956
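The coverage estimate described above (clone number and average insert size giving roughly 10-fold representation) follows from a short calculation. The sketch below is illustrative only: the clone count, insert size, and genome size are assumed round numbers, not values from the paper, and the probability formula is the standard Clarke-Carbon approximation rather than anything stated in the abstract.

```python
def genome_equivalents(n_clones, mean_insert_bp, genome_size_bp):
    """Fold coverage = total cloned bases divided by genome size."""
    return n_clones * mean_insert_bp / genome_size_bp


def prob_locus_present(n_clones, mean_insert_bp, genome_size_bp):
    """Clarke-Carbon estimate: P = 1 - (1 - f)^N, with f = insert / genome."""
    f = mean_insert_bp / genome_size_bp
    return 1.0 - (1.0 - f) ** n_clones


# Assumed illustrative values (hypothetical, not taken from the study).
N_CLONES = 200_000
MEAN_INSERT_BP = 150_000
GENOME_SIZE_BP = 3_000_000_000

coverage = genome_equivalents(N_CLONES, MEAN_INSERT_BP, GENOME_SIZE_BP)
p_hit = prob_locus_present(N_CLONES, MEAN_INSERT_BP, GENOME_SIZE_BP)
print(f"coverage = {coverage:.1f}-fold, P(locus present) = {p_hit:.5f}")
```

Under these assumptions a library with 10-fold coverage is expected to yield about 10 clones per single-copy marker, which is in line with the 9–14 clones per marker reported from hybridization screening.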
Microvariation Artifacts Introduced by PCR and Cloning of Closely Related 16S rRNA Gene Sequences†
Speksnijder, Arjen G. C. L.; Kowalchuk, George A.; De Jong, Sander; Kline, Elizabeth; Stephen, John R.; Laanbroek, Hendrikus J.
2001-01-01
A defined template mixture of seven closely related 16S-rDNA clones was used in a PCR-cloning experiment to assess and track sources of artifactual sequence variation in 16S rDNA clone libraries. At least 14% of the recovered clones contained aberrations. Artifact sources were polymerase errors, a mutational hot spot, and cloning of heteroduplexes and chimeras. These data may partially explain the high degree of microheterogeneity typical of sequence clusters detected in environmental clone libraries. PMID:11133483
Formation of Nitrogenase NifDK Tetramers in the Mitochondria of Saccharomyces cerevisiae
2017-01-01
Transferring the prokaryotic enzyme nitrogenase into a eukaryotic host with the final aim of developing N2 fixing cereal crops would revolutionize agricultural systems worldwide. Targeting it to mitochondria has potential advantages because of the organelle’s high O2 consumption and the presence of bacterial-type iron–sulfur cluster biosynthetic machinery. In this study, we constructed 96 strains of Saccharomyces cerevisiae in which transcriptional units comprising nine Azotobacter vinelandii nif genes (nifHDKUSMBEN) were integrated into the genome. Two combinatorial libraries of nif gene clusters were constructed: a library of mitochondrial leading sequences consisting of 24 clusters within four subsets of nif gene expression strength, and an expression library of 72 clusters with fixed mitochondrial leading sequences and nif expression levels assigned according to factorial design. In total, 29 promoters and 18 terminators were combined to adjust nif gene expression levels. Expression and mitochondrial targeting was confirmed at the protein level as immunoblot analysis showed that Nif proteins could be efficiently accumulated in mitochondria. NifDK tetramer formation, an essential step of nitrogenase assembly, was experimentally proven both in cell-free extracts and in purified NifDK preparations. This work represents a first step toward obtaining functional nitrogenase in the mitochondria of a eukaryotic cell. PMID:28221768
Friis, Thor Einar; Stephenson, Sally; Xiao, Yin; Whitehead, Jon
2014-01-01
The sheep (Ovis aries) is favored by many musculoskeletal tissue engineering groups as a large animal model because of its docile temperament and ease of husbandry. The size and weight of sheep are comparable to humans, which allows for the use of implants and fixation devices used in human clinical practice. The construction of a complementary DNA (cDNA) library can capture the expression of genes in both a tissue- and time-specific manner. cDNA libraries have been a consistent source of gene discovery ever since the technology became commonplace more than three decades ago. Here, we describe the construction of a cDNA library using cells derived from sheep bones based on the pBluescript cDNA kit. Thirty clones were picked at random and sequenced. This led to the identification of a novel gene, C12orf29, which our initial experiments indicate is involved in skeletal biology. We also describe a polymerase chain reaction-based cDNA clone isolation method that allows the isolation of genes of interest from a cDNA library pool. The techniques outlined here can be applied in-house by smaller tissue engineering groups to generate tools for biomolecular research for large preclinical animal studies and highlight the power of standard cDNA library protocols to uncover novel genes. PMID:24447069
Smith, Robin P; Riesenfeld, Samantha J; Holloway, Alisha K; Li, Qiang; Murphy, Karl K; Feliciano, Natalie M; Orecchia, Lorenzo; Oksenberg, Nir; Pollard, Katherine S; Ahituv, Nadav
2013-07-18
Large-scale annotation efforts have improved our ability to coarsely predict regulatory elements throughout vertebrate genomes. However, it is unclear how complex spatiotemporal patterns of gene expression driven by these elements emerge from the activity of short, transcription factor binding sequences. We describe a comprehensive promoter extension assay in which the regulatory potential of all 6 base-pair (bp) sequences was tested in the context of a minimal promoter. To enable this large-scale screen, we developed algorithms that use a reverse-complement aware decomposition of the de Bruijn graph to design a library of DNA oligomers incorporating every 6-bp sequence exactly once. Our library multiplexes all 4,096 unique 6-mers into 184 double-stranded 15-bp oligomers, which is sufficiently compact for in vivo testing. We injected each multiplexed construct into zebrafish embryos and scored GFP expression in 15 tissues at two developmental time points. Twenty-seven constructs produced consistent expression patterns, with the majority doing so in only one tissue. Functional sequences are enriched near biologically relevant genes, match motifs for developmental transcription factors, and are required for enhancer activity. By concatenating tissue-specific functional sequences, we generated completely synthetic enhancers for the notochord, epidermis, spinal cord, forebrain and otic lateral line, and show that short regulatory sequences do not always function modularly. This work introduces a unique in vivo catalog of short, functional regulatory sequences and demonstrates several important principles of regulatory element organization. Furthermore, we provide resources for designing compact, reverse-complement aware k-mer libraries.
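The "reverse-complement aware" idea mentioned above rests on the fact that, in double-stranded DNA, a k-mer and its reverse complement occupy the same site. A minimal sketch of the counting involved is given below; it does not reproduce the authors' de Bruijn graph decomposition, and the helper names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(seq, reverse_complement(seq))


def kmer_summary(k):
    """Return (all k-mers, reverse-palindromic k-mers, strand-collapsed classes)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    palindromes = [s for s in kmers if s == reverse_complement(s)]
    classes = {canonical(s) for s in kmers}
    return len(kmers), len(palindromes), len(classes)


total, palindromic, strand_classes = kmer_summary(6)
print(total, palindromic, strand_classes)  # 4096 64 2080
```

For k = 6 there are 4,096 sequences, 64 of which equal their own reverse complement, leaving 2,080 distinct sites once the two strands are collapsed. A double-stranded library therefore needs to present far fewer than 4,096 windows on one strand to cover every 6-mer.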
Hubert, Casey R J; Oldenburg, Thomas B P; Fustic, Milovan; Gray, Neil D; Larter, Stephen R; Penn, Kevin; Rowan, Arlene K; Seshadri, Rekha; Sherry, Angela; Swainsbury, Richard; Voordouw, Gerrit; Voordouw, Johanna K; Head, Ian M
2012-01-01
Summary The subsurface microbiology of an Athabasca oil sands reservoir in western Canada containing severely biodegraded oil was investigated by combining 16S rRNA gene- and polar lipid-based analyses of reservoir formation water with geochemical analyses of the crude oil and formation water. Biomass was filtered from formation water, DNA was extracted using two different methods, and 16S rRNA gene fragments were amplified with several different primer pairs prior to cloning and sequencing or community fingerprinting by denaturing gradient gel electrophoresis (DGGE). Similar results were obtained irrespective of the DNA extraction method or primers used. Archaeal libraries were dominated by Methanomicrobiales (410 of 414 total sequences formed a dominant phylotype affiliated with a Methanoregula sp.), consistent with the proposed dominant role of CO2-reducing methanogens in crude oil biodegradation. In two bacterial 16S rRNA clone libraries generated with different primer pairs, > 99% and 100% of the sequences were affiliated with Epsilonproteobacteria (n = 382 and 72 total clones respectively). This massive dominance of Epsilonproteobacteria sequences was again obtained in a third library (99% of sequences; n = 96 clones) using a third universal bacterial primer pair (inosine-341f and 1492r). Sequencing of bands from DGGE profiles and intact polar lipid analyses were in accordance with the bacterial clone library results. Epsilonproteobacterial OTUs were affiliated with Sulfuricurvum, Arcobacter and Sulfurospirillum spp. detected in other oil field habitats. The dominant organism revealed by the bacterial libraries (87% of all sequences) is a close relative of Sulfuricurvum kujiense – an organism capable of oxidizing reduced sulfur compounds in crude oil. 
Geochemical analysis of organic extracts from bitumen at different reservoir depths down to the oil water transition zone of these oil sands indicated active biodegradation of dibenzothiophenes, and stable sulfur isotope ratios for elemental sulfur and sulfate in formation waters were indicative of anaerobic oxidation of sulfur compounds. Microbial desulfurization of crude oil may be an important metabolism for Epsilonproteobacteria indigenous to oil reservoirs with elevated sulfur content and may explain their prevalence in formation waters from highly biodegraded petroleum systems. PMID:21824242
Rosconi, Federico; de Vries, Stefan P. W.; Baig, Abiyad; Fabiano, Elena
2016-01-01
ABSTRACT The interior of plants contains microorganisms (referred to as endophytes) that are distinct from those present at the root surface or in the surrounding soil. Herbaspirillum seropedicae strain SmR1, belonging to the betaproteobacteria, is an endophyte that colonizes crops, including rice, maize, sugarcane, and sorghum. Different approaches have revealed genes and pathways regulated during the interactions of H. seropedicae with its plant hosts. However, functional genomic analysis of transposon (Tn) mutants has been hampered by the lack of genetic tools. Here we successfully employed a combination of in vivo high-density mariner Tn mutagenesis and targeted Tn insertion site sequencing (Tn-seq) in H. seropedicae SmR1. The analysis of multiple gene-saturating Tn libraries revealed that 395 genes are essential for the growth of H. seropedicae SmR1 in tryptone-yeast extract medium. A comparative analysis with the Database of Essential Genes (DEG) showed that 25 genes are uniquely essential in H. seropedicae SmR1. The Tn mutagenesis protocol developed and the gene-saturating Tn libraries generated will facilitate elucidation of the genetic mechanisms of the H. seropedicae endophytic lifestyle. IMPORTANCE A focal point in the study of endophytes is the development of effective biofertilizers that could help to reduce the input of agrochemicals in croplands. Besides the ability to promote plant growth, a good biofertilizer should be successful in colonizing its host and competing against the native microbiota. By using a systematic Tn-based gene-inactivation strategy and massively parallel sequencing of Tn insertion sites (Tn-seq), it is possible to study the fitness of thousands of Tn mutants in a single experiment. We have applied the combination of these techniques to the plant-growth-promoting endophyte Herbaspirillum seropedicae SmR1. The Tn mutant libraries generated will enable studies into the genetic mechanisms of H. seropedicae-plant interactions. 
The approach that we have taken is applicable to other plant-interacting bacteria. PMID:27590816
Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.
2014-01-01
RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
Han, Yike; Wang, Xianyun; Zhao, Fengyue; Gao, Shang; Wei, Aimin; Chen, Zhengwu; Liu, Nan; Zhang, Zhenxian; Du, Shengli
2018-05-01
Cucumber (Cucumis sativus L.) pollen development involves a diverse range of gene interactions between sporophytic and gametophytic tissues. Previous studies in our laboratory showed that male sterility was controlled by a single recessive nuclear gene, and occurred in pollen mother cell meiophase. To fully explore global gene expression and identify genes related to male sterility, an RNA-seq analysis was adopted in this study. Young male flower buds (1-2 mm in length) from a genetic male sterility (GMS) mutant and homozygous fertile cucumber (WT) were collected for two sequencing libraries. A total of 545 differentially expressed genes (DEGs), including 142 up-regulated and 403 down-regulated DEGs, were detected between the two libraries (Fold Change ≥ 2, FDR < 0.01). These genes were involved in a variety of metabolic pathways, such as the ethylene-activated signaling pathway, the sporopollenin biosynthetic pathway, and the cell cycle and DNA damage repair pathways. qRT-PCR analysis was performed and showed that the correlation between RNA-Seq and qRT-PCR was 0.876. These findings contribute to a better understanding of the mechanism that leads to GMS in cucumber.
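The DEG thresholds quoted above (Fold Change ≥ 2, FDR < 0.01) amount to a two-condition filter. The sketch below assumes the fold change is applied symmetrically on a log2 scale (|log2FC| ≥ 1) and that each record carries a log2 ratio and an FDR; the gene names and values are hypothetical, and the abstract does not say which software produced the statistics.

```python
import math


def classify_degs(records, min_fold=2.0, max_fdr=0.01):
    """Split (gene, log2_fold_change, fdr) records into up- and down-regulated lists."""
    threshold = math.log2(min_fold)  # 1.0 for a twofold cutoff
    up, down = [], []
    for gene, log2fc, fdr in records:
        if fdr >= max_fdr:
            continue  # not significant at the chosen FDR
        if log2fc >= threshold:
            up.append(gene)
        elif log2fc <= -threshold:
            down.append(gene)
    return up, down


# Hypothetical records, for illustration only.
records = [
    ("geneA", 1.5, 0.001),    # up-regulated
    ("geneB", -2.0, 0.005),   # down-regulated
    ("geneC", 0.4, 0.0001),   # significant but below twofold
    ("geneD", 3.0, 0.2),      # large change but fails the FDR cutoff
    ("geneE", -1.0, 0.009),   # exactly twofold down
]
up, down = classify_degs(records)
print(len(up), "up;", len(down), "down")
```

Applied to the study's data, a filter of this kind would yield the reported split of 142 up-regulated and 403 down-regulated genes, 545 in total.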
Identification of genes differentially expressed in association with acquired cisplatin resistance
Johnsson, A; Zeelenberg, I; Min, Y; Hilinski, J; Berry, C; Howell, S B; Los, G
2000-01-01
The goal of this study was to identify genes whose mRNA levels are differentially expressed in human cells with acquired cisplatin (cDDP) resistance. Using the parental UMSCC10b head and neck carcinoma cell line and the 5.9-fold cDDP-resistant subline, UMSCC10b/Pt-S15, two suppressive subtraction hybridization (SSH) cDNA libraries were prepared. One library represented mRNAs whose levels were increased in the cDDP resistant variant (the UP library), the other one represented mRNAs whose levels were decreased in the resistant cells (the DOWN library). Arrays constructed with inserts recovered from these libraries were hybridized with SSH products to identify truly differentially expressed elements. A total of 51 cDNA fragments present in the UP library and 16 in the DOWN library met the criteria established for differential expression. The sequences of 87% of these cDNA fragments were identified in Genbank. Among the mRNAs in the UP library that were frequently isolated and that showed high levels of differential expression were cytochrome oxidase I, ribosomal protein 28S, elongation factor 1α, α-enolase, stathmin, and HSP70. The approach taken in this study permitted identification of many genes never before linked to the cDDP-resistant phenotype. © 2000 Cancer Research Campaign PMID:10993653
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Wang, Ning; Kinoshita, Shigeharu; Nomura, Naoko; Riho, Chihiro; Maeyama, Kaoru; Nagai, Kiyohito; Watabe, Shugo
2012-04-01
Recent research revealed a regional preference of biomineralization gene transcription in the pearl oyster Pinctada fucata: genes responsible for nacre secretion are transcribed mainly in the mantle pallial, whereas those regulating calcite shell formation are expressed in the mantle edge. This study made use of this characteristic to construct forward and reverse suppression subtractive hybridization (SSH) cDNA libraries. A total of 669 cDNA clones were sequenced and 360 expressed sequence tags (ESTs) greater than 100 bp were generated. Functional annotation associated 95 ESTs with specific functions, and 79 of them were identified from P. fucata for the first time. The forward SSH cDNA library contained a large number of nacre protein genes, biomineralization genes dominantly expressed in the mantle pallial, calcium-ion-binding genes, and other biomineralization-related genes important for pearl formation. Real-time PCR showed that the distribution of all the examined genes in oyster mantle tissues was consistent with the SSH design. The detection of their RNA transcripts in the pearl sac confirmed that the identified genes were involved in pearl formation. Therefore, the data from this work will support further study of pearl formation genes and shed new light on molluscan biomineralization.
Freimoser, Florian M; Screen, Steven; Bagga, Savita; Hu, Gang; St Leger, Raymond J
2003-01-01
Expressed sequence tag (EST) libraries for Metarhizium anisopliae, the causative agent of green muscardine disease, were developed from the broad host-range pathogen Metarhizium anisopliae sf. anisopliae and the specific grasshopper pathogen, M. anisopliae sf. acridum. Approximately 1,700 5' end sequences from each subspecies were generated from cDNA libraries representing fungi grown under conditions that maximize secretion of cuticle-degrading enzymes. Both subspecies had ESTs for virtually all pathogenicity-related genes cloned to date from M. anisopliae, but many novel genes encoding potential virulence factors were also tagged. Enzymes with potential targets in the insect host included proteases, chitinases, phospholipases, lipases, esterases, phosphatases and enzymes producing toxic secondary metabolites. A diverse array of proteases composed 36% of all M. anisopliae sf. anisopliae ESTs. Eighty percent of the ESTs that could be clustered into functional groups had significant matches (E < 10^-5) in other ascomycete fungi. These included genes reported to have specific roles in pathogens with plant or vertebrate hosts. Many of the remaining ESTs had their best BLAST match among animal, plant and bacterial sequences. These include genes with plant and microbial counterparts that produce potent antimicrobials. The abundance of transcripts discovered for different functional groups varied between the two subspecies of M. anisopliae in a manner consistent with ecological adaptations of the two pathogens. By hastening gene discovery this project has enhanced development of improved mycoinsecticides. In addition, the M. anisopliae ESTs represent a significant contribution to the extensive database of sequences from ascomycetes that are saprophytes or plant and vertebrate pathogens. Comparative analyses of these sequences are providing important information about the biology and evolutionary history of this clade.
Fang, Yi-Kai; Huang, Kuo-Yang; Huang, Po-Jung; Lin, Rose; Chao, Mei; Tang, Petrus
2015-12-01
Trichomonas vaginalis is the etiologic agent of trichomoniasis, the most common nonviral sexually transmitted disease in the world. This infection affects millions of individuals worldwide annually. Although direct sexual contact is the most common mode of transmission, increasing evidence indicates that T. vaginalis can survive in the external environment and can be transmitted by contaminated utensils. We found that the growth of T. vaginalis under cold conditions is greatly inhibited, but recovers after placing these stressed cells at the normal cultivation temperature of 37 °C. However, the mechanisms by which T. vaginalis regulates this adaptive process are unclear. An expressed sequence tag (EST) database generated from a complementary DNA library of T. vaginalis messenger RNAs expressed under cold-culture conditions (4 °C, TvC) was compared with a previously published normal-cultured EST library (37 °C, TvE) to assess the cold-stress responses of T. vaginalis. A total of 9780 clones were sequenced from the TvC library and were mapped to 2934 genes in the T. vaginalis genome. A total of 1254 genes were expressed in both the TvE and TvC libraries, and 1680 genes were only found in the TvC library. A functional analysis showed that cold temperature has effects on many cellular mechanisms, including increased H2O2 tolerance, activation of the ubiquitin-proteasome system, induction of iron-sulfur cluster assembly, and reduced energy metabolism and enzyme expression. The current study is the first large-scale transcriptomic analysis in cold-stressed T. vaginalis and the results enhance our understanding of this important protist. Copyright © 2014. Published by Elsevier B.V.
2010-01-01
Background Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 high-quality expressed sequence tag (EST) sequences. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons, which have been deposited in NCBI (accession IDs: GW868575 - GW873047); among these, 1,294 unique ESTs had known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and a lower level in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species. PMID:20626882
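The SSR search described in this abstract can be illustrated with a minimal Python sketch. The abstract does not say which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts (6 for di-, 5 for tri-, 4 for tetra-nucleotide motifs) are assumptions chosen for illustration only; the sketch finds perfect repeats and ignores compound or imperfect SSRs.

```python
import re

# Hypothetical thresholds (motif length -> minimum number of repeats);
# the abstract does not report the criteria actually used.
DEFAULT_MIN_REPEATS = {2: 6, 3: 5, 4: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. AGAG, AAA)."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in a DNA sequence."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # One motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # skip homopolymers and motifs built from shorter units
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


example = "TT" + "AG" * 7 + "GT" + "ACC" * 5 + "GG"
print(find_ssrs(example))
```

Primer pairs would then be designed on the flanking sequence of each hit; that step is outside the scope of this sketch.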
DOE Office of Scientific and Technical Information (OSTI.GOV)
Onda, M.; Kudo, S.; Fukuda, M.
Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.
High-Throughput Gene Mapping in Caenorhabditis elegans
Swan, Kathryn A.; Curtis, Damian E.; McKusick, Kathleen B.; Voinov, Alexander V.; Mapa, Felipa A.; Cancilla, Michael R.
2002-01-01
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 ± 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18. [The sequence data described in this paper have been submitted to the NCBI dbSNP data library under accession nos. 4388625–4389689 and GenBank dbSTS under accession nos. 973810–974874. The following individuals and institutions kindly provided reagents, samples, or unpublished information as indicated in the paper: The C. elegans Sequencing Consortium and The Caenorhabditis Genetics Center.] PMID:12097347
2013-01-01
Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. 
Our results showed that not only was the gene nucleotide sequence conserved between soybean and lupin GRRs, but the order and orientation of particular genes in syntenic blocks was homologous, as well. These findings will be valuable to the forthcoming sequencing of the lupin genome. PMID:23379841
Jones, Alicia M; Atkinson, Joshua T; Silberg, Jonathan J
2017-01-01
Rearrangements that alter the order of a protein's sequence are used in the lab to study protein folding, improve activity, and build molecular switches. One of the simplest ways to rearrange a protein sequence is through random circular permutation, where native protein termini are linked together and new termini are created elsewhere through random backbone fission. Transposase mutagenesis has emerged as a simple way to generate libraries encoding different circularly permuted variants of proteins. With this approach, a synthetic transposon (called a permuteposon) is randomly inserted throughout a circularized gene to generate vectors that express different permuted variants of a protein. In this chapter, we outline the protocol for constructing combinatorial libraries of circularly permuted proteins using transposase mutagenesis, and we describe the different permuteposons that have been developed to facilitate library construction.
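As a sequence-level illustration of circular permutation, the following Python sketch enumerates every permuted variant of a protein sequence: the native termini are joined by a linker and new termini are opened at each backbone position. The function name `circular_permutants` and the `GGS` linker are assumptions made for this example; the chapter's transposon-based protocol works on DNA, and details such as start codons and the linker actually encoded by a permuteposon are not modelled here.

```python
def circular_permutants(protein, linker="GGS"):
    """Yield (new_start_index, permuted_sequence) for each backbone fission site.

    The linker (hypothetical default "GGS") joins the original C-terminus
    to the original N-terminus.
    """
    for i in range(1, len(protein)):
        # Residues i..end become the new N-terminal segment, followed by the
        # linker and then the original residues 0..i-1.
        yield i, protein[i:] + linker + protein[:i]


for start, variant in circular_permutants("MKTA", linker="GG"):
    print(start, variant)
```

A protein of length n therefore has n - 1 possible permuted variants, which is the size of the ideal library that random insertion aims to sample.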
Rozenberg, Andrey; Leese, Florian; Weiss, Linda C; Tollrian, Ralph
2016-01-01
Tag-Seq is a high-throughput approach used for discovering SNPs and characterizing gene expression. In comparison to RNA-Seq, Tag-Seq eases data processing and allows detection of rare mRNA species using only one tag per transcript molecule. However, reduced library complexity raises the issue of PCR duplicates, which distort gene expression levels. Here we present a novel Tag-Seq protocol that uses the least biased methods for RNA library preparation combined with a novel approach for joint PCR template and sample labeling. In our protocol, input RNA is fragmented by hydrolysis, and poly(A)-bearing RNAs are selected and directly ligated to mixed DNA-RNA P5 adapters. The P5 adapters contain i5 barcodes composed of sample-specific (moderately) degenerate base regions (mDBRs), which later allow detection of PCR duplicates. The P7 adapter is attached via reverse transcription with individual i7 barcodes added during the amplification step. The resulting libraries can be sequenced on an Illumina sequencer. After sample demultiplexing and PCR duplicate removal with a free software tool we designed, the data are ready for downstream analysis. Our protocol was tested on RNA samples from predator-induced and control Daphnia microcrustaceans.
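The idea of using degenerate barcodes to detect PCR duplicates can be sketched in Python as follows. This is not the authors' software tool, whose algorithm the abstract does not describe; it is a minimal illustration that assumes each read has already been demultiplexed and assigned to a gene, and that two reads with the same gene, the same degenerate barcode (mDBR) and the same tag sequence derive from one template molecule. The function name and the tuple layout are hypothetical.

```python
from collections import Counter


def count_unique_templates(reads):
    """Count reads per gene after collapsing presumed PCR duplicates.

    reads: iterable of (gene, mdbr, tag_sequence) tuples (hypothetical layout).
    Reads sharing all three fields are treated as copies of one template.
    """
    seen = set()
    counts = Counter()
    for gene, mdbr, tag in reads:
        key = (gene, mdbr, tag)
        if key in seen:
            continue  # identical barcode and tag: counted once only
        seen.add(key)
        counts[gene] += 1
    return counts


reads = [
    ("geneA", "ACGT", "TTGACC"),
    ("geneA", "ACGT", "TTGACC"),  # presumed PCR duplicate of the first read
    ("geneA", "GGTA", "TTGACC"),  # same tag, different barcode: kept
    ("geneB", "ACGT", "CCATGA"),
]
print(count_unique_templates(reads))
```

A production tool would also need to tolerate sequencing errors in the barcode, which this exact-match sketch does not attempt.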
Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Fariss, Robert N; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine
2002-06-15
The retinal pigment epithelium (RPE) and choroid comprise a functional unit of the eye that is essential to normal retinal health and function. Here we describe expressed sequence tag (EST) analysis of human RPE/choroid as part of a project for ocular bioinformatics. A cDNA library (cs) was made from human RPE/choroid and sequenced. Data were analyzed and assembled using the program GRIST (GRouping and Identification of Sequence Tags). Complete sequencing, Northern and Western blots, RH mapping, peptide antibody synthesis and immunofluorescence (IF) have been used to examine expression patterns and genome location for selected transcripts and proteins. Ten thousand individual sequence reads yield over 6300 unique gene clusters of which almost half have no matches with named genes. One of the most abundant transcripts is from a gene (named "alpha") that maps to the BBS1 region of chromosome 11. A number of tissue preferred transcripts are common to both RPE/choroid and iris. These include oculoglycan/opticin, for which an alternative splice form is detected in RPE/choroid, and "oculospanin" (Ocsp), a novel tetraspanin that maps to chromosome 17q. Antiserum to Ocsp detects expression in RPE, iris, ciliary body, and retinal ganglion cells by IF. A newly identified gene for a zinc-finger protein (TIRC) maps to 19q13.4. Variant transcripts of several genes were also detected. Most notably, the predominant form of Bestrophin represented in cs contains a longer open reading frame as a result of splice junction skipping. The unamplified cs library gives a view of the transcriptional repertoire of the adult RPE/choroid. A large number of potentially novel genes and splice forms and candidates for genetic diseases are revealed. Clones from this collection are being included in a large, nonredundant set for cDNA microarray construction.
de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M
2017-07-06
Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes into each sample, with the possibility of adding a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. 
Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
Current and future resources for functional metagenomics.
Lam, Kathy N; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D; Charles, Trevor C
2015-01-01
Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries-physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.
Tang, Qi; Ma, Xiaojun; Mo, Changming; Wilson, Iain W; Song, Cai; Zhao, Huan; Yang, Yanfang; Fu, Wei; Qiu, Deyou
2011-07-05
Siraitia grosvenorii (Luohanguo) is an herbaceous perennial plant native to southern China and most prevalent in Guilin city. Its fruit contains a sweet, fleshy, edible pulp that is widely used in traditional Chinese medicine. The major bioactive constituents in the fruit extract are the cucurbitane-type triterpene saponins known as mogrosides. Among them, mogroside V is nearly 300 times sweeter than sucrose. However, little is known about mogrosides biosynthesis in S. grosvenorii, especially the late steps of the pathway. In this study, a cDNA library generated from equal amounts of RNA taken from S. grosvenorii fruit at 50 days after flowering (DAF) and 70 DAF was sequenced using the Illumina/Solexa platform. More than 48,755,516 high-quality reads were generated from the cDNA library. De novo assembly and gap-filling generated 43,891 unigenes with an average sequence length of 668 base pairs. A total of 26,308 (59.9%) unique sequences were annotated and 11,476 of the unique sequences were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes. cDNA sequences for all of the known enzymes involved in mogrosides backbone synthesis were identified from our library. Additionally, a total of eighty-five cytochrome P450 (CYP450) and ninety UDP-glucosyltransferase (UDPG) unigenes were identified, some of which appear to encode enzymes responsible for the conversion of the mogroside backbone into the various mogrosides. Digital gene expression profile (DGE) analysis using Solexa sequencing was performed on three important stages of fruit development, and based on their expression pattern, seven CYP450s and five UDPGs were selected as the candidates most likely to be involved in mogrosides biosynthesis. 
A combination of RNA-seq and DGE analysis based on the next generation sequencing technology was shown to be a powerful method for identifying candidate genes encoding enzymes responsible for the biosynthesis of novel secondary metabolites in a non-model plant. Seven CYP450s and five UDPGs were selected as potential candidates involved in mogrosides biosynthesis. The transcriptome data from this study provides an important resource for understanding the formation of major bioactive constituents in the fruit extract from S. grosvenorii.
A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library
Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying
2013-01-01
The DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
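The figure of 256 tetramers follows from 4 possible bases at each of 4 positions (4^4 = 256). A minimal Python sketch, assuming a target whose length is a multiple of four, enumerates the library and splits a target site into the tetramers that would be picked from it. The function name `split_into_tetramers` and the example target are hypothetical; the abstract does not state how targets of other lengths are handled.

```python
from itertools import product

BASES = "ACGT"

# Every ordered combination of 4 bases: 4**4 = 256 tetramers.
TETRAMERS = ["".join(p) for p in product(BASES, repeat=4)]


def split_into_tetramers(target):
    """Split a DNA target into consecutive 4-base blocks.

    Assumes len(target) is a multiple of 4 (a simplification for this sketch).
    """
    target = target.upper()
    if len(target) % 4 != 0:
        raise ValueError("target length must be a multiple of 4 in this sketch")
    if set(target) - set(BASES):
        raise ValueError("target must contain only A, C, G, T")
    return [target[i:i + 4] for i in range(0, len(target), 4)]


print(len(TETRAMERS))
print(split_into_tetramers("ACGTTTGACCAG"))
```

Under this scheme a 16-base target needs only four library members, which is what reduces assembly to a single digestion/ligation step.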
A simple and efficient method for assembling TALE protein based on plasmid library.
Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying
2013-01-01
The DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were assembled with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.
Tao, Bo; Shao, Bai-Hui; Qiao, Yu-Xin; Wang, Xiao-Qin; Chang, Shu-Jun; Qiu, Li-Juan
2017-08-01
Glyphosate is a widely used broad spectrum herbicide; however, this limits its use once crops are planted. If glyphosate-resistant crops are grown, glyphosate can be used for weed control in crops. While several glyphosate resistance genes are used in commercial glyphosate tolerant crops, there is interest in identifying additional genes for glyphosate tolerance. This research constructed a high-quality cDNA library from the glyphosate-resistant fungus Aspergillus oryzae RIB40 to identify genes that may confer resistance to glyphosate. Using a medium containing glyphosate (120 mM), we screened several clones from the library. Based on a nucleotide sequence analysis, we identified a gene of unknown function (GenBank accession number: XM_001826835.2) that encoded a hypothetical 344-amino acid protein. The gene was named MFS40. Its ORF was amplified to construct an expression vector, pGEX-4T-1-MFS40, to express the protein in Escherichia coli BL21. The gene conferred glyphosate tolerance to E. coli ER2799 cells. Copyright © 2017 Elsevier B.V. All rights reserved.
Blair, Matthew W; Hurtado, Natalia; Chavarro, Carolina M; Muñoz-Torres, Monica C; Giraldo, Martha C; Pedraza, Fabio; Tomkins, Jeff; Wing, Rod
2011-03-22
Sequencing of cDNA libraries for the development of expressed sequence tags (ESTs) as well as for the discovery of simple sequence repeats (SSRs) has been a common method of developing microsatellites or SSR-based markers. In this research, our objective was to further sequence and develop common bean microsatellites from leaf and root cDNA libraries derived from the Andean gene pool accession G19833 and the Mesoamerican gene pool accession DOR364, mapping parents of a commonly used reference map. The root libraries were made from high and low phosphorus treated plants. A total of 3,123 EST sequences from leaf and root cDNA libraries were screened and used for direct simple sequence repeat discovery. From these EST sequences we found 184 microsatellites; the majority containing tri-nucleotide motifs, many of which were GC rich (ACC, AGC and AGG in particular). Di-nucleotide motif microsatellites were about half as common as the tri-nucleotide motif microsatellites but most of these were AGn microsatellites with a moderate number of ATn microsatellites in root ESTs followed by few ACn and no GCn microsatellites. Out of the 184 new SSR loci, 120 new microsatellite markers were developed in the BMc (Bean Microsatellites from cDNAs) series and these were evaluated for their capacity to distinguish bean diversity in a germplasm panel of 18 genotypes. We developed a database with images of the microsatellites and their polymorphism information content (PIC), which averaged 0.310 for polymorphic markers. The present study produced information about microsatellite frequency in root and leaf tissues of two important genotypes for common bean genomics: namely G19833, the Andean genotype selected for whole genome shotgun sequencing from race Peru, and DOR364 a race Mesoamerica subgroup 2 genotype that is a small-red seeded, released variety in Central America. 
Both race Peru and Mesoamerica subgroup 2 (small red beans) have been understudied in comparison to race Nueva Granada and Mesoamerica subgroup 1 (black beans) both with regards to gene expression and as sources of markers. However, we found few differences between SSR type and frequency between the G19833 leaf and DOR364 root tissue-derived ESTs. Overall, our work adds to the analysis of microsatellite frequency evaluation for common bean and provides a new set of 120 BMc markers which combined with the 248 previously developed BMc markers brings the total in this series to 368 markers. Once we include BMd markers, which are derived from GenBank sequences, the current total of gene-based markers from our laboratory surpasses 500 markers. These markers are basic for studies of the transcriptome of common bean and can form anchor points for genetic mapping studies in the future.
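The polymorphism information content (PIC) reported for these markers can be computed from allele frequencies. The abstract does not state which formula was used, so the Python sketch below assumes the widely used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; some marker studies instead report the simpler 1 - sum(p_i^2). The function name `pic` and the example frequencies are illustrative only.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    allele_freqs: allele frequencies at the locus, summing to 1.
    Formula assumed here: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * p * p * q * q for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


# A marker with two equally frequent alleles in a panel.
print(pic([0.5, 0.5]))
# A monomorphic marker carries no information.
print(pic([1.0]))
```

In a panel of 18 genotypes, the allele frequencies would be estimated by counting how many genotypes carry each band size at the marker.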
Developing a Bacteroides System for Function-Based Screening of DNA from the Human Gut Microbiome.
Lam, Kathy N; Martens, Eric C; Charles, Trevor C
2018-01-01
Functional metagenomics is a powerful method that allows the isolation of genes whose role may not have been predicted from DNA sequence. In this approach, first, environmental DNA is cloned to generate metagenomic libraries that are maintained in Escherichia coli, and second, the cloned DNA is screened for activities of interest. Typically, functional screens are carried out using E. coli as a surrogate host, although there likely exist barriers to gene expression, such as lack of recognition of native promoters. Here, we describe efforts to develop Bacteroides thetaiotaomicron as a surrogate host for screening metagenomic DNA from the human gut. We construct a B. thetaiotaomicron-compatible fosmid cloning vector, generate a fosmid clone library using DNA from the human gut, and show successful functional complementation of a B. thetaiotaomicron glycan utilization mutant. Though we were unable to retrieve the physical fosmid after complementation, we used genome sequencing to identify the complementing genes derived from the human gut microbiome. Our results demonstrate that the use of B. thetaiotaomicron to express metagenomic DNA is promising, but they also exemplify the challenges that can be encountered in the development of new surrogate hosts for functional screening. IMPORTANCE Human gut microbiome research has been supported by advances in DNA sequencing that make it possible to obtain gigabases of sequence data from metagenomes but is limited by a lack of knowledge of gene function that leads to incomplete annotation of these data sets. There is a need for the development of methods that can provide experimental data regarding microbial gene function. Functional metagenomics is one such method, but functional screens are often carried out using hosts that may not be able to express the bulk of the environmental DNA being screened. 
We expand the range of current screening hosts and demonstrate that human gut-derived metagenomic libraries can be introduced into the gut microbe Bacteroides thetaiotaomicron to identify genes based on activity screening. Our results support the continuing development of genetically tractable systems to obtain information about gene function.
Brammer, Leighanne A; Bolduc, Benjamin; Kass, Jessica L; Felice, Kristin M; Noren, Christopher J; Hall, Marilena Fitzsimons
2008-02-01
Screening of the commercially available Ph.D.-7 phage-displayed heptapeptide library for peptides that bind immobilized Zn2+ resulted in the repeated selection of the peptide HAIYPRH, although binding assays indicated that HAIYPRH is not a zinc-binding peptide. HAIYPRH has also been selected in several other laboratories using completely different targets, and its ubiquity suggests that it is a target-unrelated peptide. We demonstrated that phage displaying HAIYPRH are enriched after serial amplification of the library without exposure to target. The amplification of phage displaying HAIYPRH was found to be dramatically faster than that of the library itself. DNA sequencing uncovered a mutation in the Shine-Dalgarno (SD) sequence for gIIp, a protein involved in phage replication, imparting to the SD sequence better complementarity to the 16S ribosomal RNA (rRNA). Introducing this mutation into phage lacking a displayed peptide resulted in accelerated propagation, whereas phage displaying HAIYPRH with a wild-type SD sequence were found to amplify normally. The SD mutation may alter gIIp expression and, consequently, the rate of propagation of phage. In the Ph.D.-7 library, the mutation is coincident with the displayed peptide HAIYPRH, accounting for the target-unrelated selection of this peptide in multiple reported panning experiments.
Transcriptome Analysis of Gene Expression during Chinese Water Chestnut Storage Organ Formation
Chen, Sainan; Wang, Yan; Yu, Meizhen; Chen, Xuehao; Li, Liangjun; Yin, Jingjing
2016-01-01
The product organ (storage organ; corm) of the Chinese water chestnut has become a very popular food in Asian countries because of its unique nutritional value. Corm formation is a complex biological process, and extensive whole genome analysis of transcripts during corm development has not been carried out. In this study, four corm libraries at different developmental stages were constructed, and gene expression was identified using a high-throughput tag sequencing technique. Approximately 4.9 million tags were sequenced, and 4,371,386, 4,372,602, 4,782,494, and 5,276,540 clean tags, including 119,676, 110,701, 100,089, and 101,239 distinct tags, respectively, were obtained after removal of low-quality tags from each library. More than 39% of the distinct tags were unambiguous and could be mapped to reference genes, while 40% were unambiguous tag-mapped genes. After mapping their functions in existing databases, a total of 11,592, 10,949, 10,585, and 7,111 genes were annotated from the B1, B2, B3, and B4 libraries, respectively. Analysis of the differentially expressed genes (DEGs) in B1/B2, B2/B3, and B3/B4 libraries showed that most of the DEGs at the B1/B2 stages were involved in carbohydrate and hormone metabolism, while the majority of DEGs were involved in energy metabolism and carbohydrate metabolism at the B2/B3 and B3/B4 stages. All of the upregulated transcription factors and 9 important genes related to product organ formation in the above four stages were also identified. The expression changes of nine of the identified DEGs were validated using a quantitative PCR approach. This study provides a comprehensive understanding of gene expression during corm formation in the Chinese water chestnut. PMID:27716802
New developments in ancient genomics.
Millar, Craig D; Huynen, Leon; Subramanian, Sankar; Mohandesan, Elmira; Lambert, David M
2008-07-01
Ancient DNA research is on the crest of a 'third wave' of progress due to the introduction of a new generation of DNA sequencing technologies. Here we review the advantages and disadvantages of the four new DNA sequencers that are becoming available to researchers. These machines now allow the recovery of orders of magnitude more DNA sequence data, albeit as short sequence reads. Hence, the potential reassembly of complete ancient genomes seems imminent, and when used to screen libraries of ancient sequences, these methods are cost effective. This new wealth of data is also likely to herald investigations into the functional properties of extinct genes and gene complexes and will improve our understanding of the biological basis of extinct phenotypes.
Czar, Michael J; Cai, Yizhi; Peccoud, Jean
2009-07-01
Chemical synthesis of custom DNA made to order calls for software streamlining the design of synthetic DNA sequences. GenoCAD (www.genocad.org) is a free web-based application to design protein expression vectors, artificial gene networks and other genetic constructs composed of multiple functional blocks called genetic parts. By capturing design strategies in grammatical models of DNA sequences, GenoCAD guides the user through the design process. By successively clicking on icons representing structural features or actual genetic parts, complex constructs composed of dozens of functional blocks can be designed in a matter of minutes. GenoCAD automatically derives the construct sequence from its comprehensive libraries of genetic parts. Upon completion of the design process, users can download the sequence for synthesis or further analysis. Users who elect to create a personal account on the system can customize their workspace by creating their own parts libraries, adding new parts to the libraries, or reusing designs to quickly generate sets of related constructs.
Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.
2013-01-01
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960
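The deconvolution idea in the abstract above can be illustrated with a toy sketch. It assumes, purely for illustration, that each BAC clone is placed in a distinct subset of pools (its "signature") and that a read is assigned to every clone whose whole signature lies within the pools where the read was observed. The function names, pool counts and clone names are hypothetical. The abstract does not describe the actual pooling design or the read-comparison method, so this is not the authors' implementation.

```python
from itertools import combinations
from math import comb


def assign_signatures(clones, n_pools, pools_per_clone):
    """Give every clone a distinct set of pools (a toy pooling design)."""
    if comb(n_pools, pools_per_clone) < len(clones):
        raise ValueError("not enough distinct pool subsets for all clones")
    subsets = combinations(range(n_pools), pools_per_clone)
    return {clone: frozenset(s) for clone, s in zip(clones, subsets)}


def deconvolve(observed_pools, signatures):
    """Return the clones whose full signature is contained in the
    set of pools in which a read (or k-mer) was observed."""
    observed = set(observed_pools)
    return [clone for clone, sig in signatures.items() if sig <= observed]


# Hypothetical example: 10 clones, 5 pools, each clone present in 3 pools.
clones = [f"BAC{i}" for i in range(10)]
signatures = assign_signatures(clones, n_pools=5, pools_per_clone=3)

# A read seen in exactly the pools of BAC4 is assigned unambiguously.
unique_hit = deconvolve(signatures["BAC4"], signatures)

# A read seen in four pools (e.g. a repeat shared by clones) is ambiguous.
ambiguous_hits = deconvolve({0, 1, 2, 3}, signatures)
```

With only 5 pools instead of 10 individual barcodes, a read confined to one clone's pools is still resolved to that clone, while reads spread over extra pools match several candidates and would need further evidence.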
Kikuchi, Taisei; Aikawa, Takuya; Kosaka, Hajime; Pritchard, Leighton; Ogura, Nobuo; Jones, John T
2007-09-01
Most Bursaphelenchus species feed on fungi that colonise dead or dying trees. However, Bursaphelenchus xylophilus is unique in that, in addition to feeding on fungi, it has the capacity to be a parasite of live pine trees. We present an analysis of over 13,000 expressed sequence tags (ESTs) from B. xylophilus and, by way of contrast, over 3000 ESTs from B. mucronatus, a closely related species that does not parasitise plants as readily. Four libraries from B. xylophilus, from a variety of life stages including fungal feeding nematodes, nematodes extracted from plants and dauer-like stage nematodes, and one library from B. mucronatus were constructed and used to generate ESTs. Contig analysis showed that the 13,327 B. xylophilus ESTs could be grouped into 2110 contigs and 4377 singletons, giving a total of 6487 identified genes. Similarly, the 3193 B. mucronatus ESTs yielded a total of 2219 identified genes from 425 contigs and 1794 singletons. A variety of proteins potentially important in the parasitic process of B. xylophilus and B. mucronatus, including plant and fungal cell wall degrading enzymes and a novel gene potentially encoding an expansin-like protein that may disrupt non-covalent bonds in the plant cell wall, were identified in the libraries. Additionally, several gene candidates potentially involved in dauer entry or maintenance were identified in the EST dataset. The EST sequences from this study will provide a solid base for future research on the biology, pathogenicity and evolutionary history of this nematode group.
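The gene totals quoted in the abstract above follow from adding contigs and singletons. A minimal sketch of that arithmetic is given below, assuming that each singleton is exactly one EST, so that the remaining ESTs are the ones assembled into contigs. The function name and the derived "mean ESTs per contig" figure are illustrative additions and are not reported in the abstract.

```python
def unigene_summary(n_ests, n_contigs, n_singletons):
    """Summarise an EST assembly from its contig and singleton counts."""
    ests_in_contigs = n_ests - n_singletons  # assumes one EST per singleton
    return {
        "unigenes": n_contigs + n_singletons,
        "ests_in_contigs": ests_in_contigs,
        "mean_ests_per_contig": ests_in_contigs / n_contigs,
    }


# Counts taken from the abstract.
xylophilus = unigene_summary(n_ests=13327, n_contigs=2110, n_singletons=4377)
mucronatus = unigene_summary(n_ests=3193, n_contigs=425, n_singletons=1794)

print(xylophilus["unigenes"], mucronatus["unigenes"])
```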
Identification of the genomic locus for the human Rieske Fe-S Protein gene on Chromosome 19q12
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pennacchio, L.A.
1994-05-06
We have identified the chromosomal location of the human Rieske Iron-Sulfur Protein (UQCRFS1) gene. Mapping by hybridization to a panel of monochromosomal hybrid cell lines indicated that the gene was either on chromosome 19 or 22. By screening a human chromosome 19 specific genomic cosmid library with an oligonucleotide probe made from the published Rieske cDNA sequence, we identified a corresponding cosmid. Portions of this cosmid were sequenced directly. The exon, exon:intron junction, and flanking sequences verified that this cosmid contains the genomic locus. Fluorescent in situ hybridization (FISH) was performed to localize this cosmid to chromosome band 19q12.
Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R
2009-01-01
Background: Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but have not been available to date. Results: We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6–8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences.
Conclusion: The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662
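Genome coverage of a clone library, as quoted in the abstract above, is conventionally estimated as (number of clones × mean insert size) / haploid genome size. The chance that a given single-copy locus is present in at least one clone is then often approximated with the Clarke-Carbon relation P = 1 − (1 − I/G)^N. The sketch below shows both calculations. The clone count and genome size are hypothetical placeholders, because the abstract does not report them, and only the insert size is chosen to fall within the stated 152–175 kb range.

```python
def genome_coverage(n_clones, mean_insert_bp, genome_bp):
    """Fold coverage of a haploid genome by a clone library."""
    return n_clones * mean_insert_bp / genome_bp


def locus_recovery_probability(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon estimate of the probability that a given
    single-copy locus is present in at least one clone."""
    fraction = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


# Hypothetical library: 20,000 clones, 160 kb inserts, 500 Mb genome.
N_CLONES = 20_000          # assumption, not from the abstract
INSERT_BP = 160_000        # within the reported 152-175 kb range
GENOME_BP = 500_000_000    # assumption, not from the abstract

coverage = genome_coverage(N_CLONES, INSERT_BP, GENOME_BP)
p_locus = locus_recovery_probability(N_CLONES, INSERT_BP, GENOME_BP)
print(f"coverage = {coverage:.1f}x, P(locus present) = {p_locus:.4f}")
```

Under these placeholder values the library lands in the 6–9 × range reported for the individual libraries, which is the regime where screening with a handful of single-copy probes is expected to return several positive clones per probe.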
Bowers, Robert M.; Clum, Alicia; Tice, Hope; ...
2015-10-24
Background: The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composition and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for large initial DNA template quantities during library preparation. Results: Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition. Conclusions: This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.
Frentiu, Francesca D; Adamski, Marcin; McGraw, Elizabeth A; Blows, Mark W; Chenoweth, Stephen F
2009-01-21
The native Australian fly Drosophila serrata belongs to the highly speciose montium subgroup of the melanogaster species group. It has recently emerged as an excellent model system with which to address a number of important questions, including the evolution of traits under sexual selection and traits involved in climatic adaptation along latitudinal gradients. Understanding the molecular genetic basis of such traits has been limited by a lack of genomic resources for this species. Here, we present the first expressed sequence tag (EST) collection for D. serrata that will enable the identification of genes underlying sexually-selected phenotypes and physiological responses to environmental change and may help resolve controversial phylogenetic relationships within the montium subgroup. A normalized cDNA library was constructed from whole fly bodies at several developmental stages, including larvae and adults. Assembly of 11,616 clones sequenced from the 3' end allowed us to identify 6,607 unique contigs, of which at least 90% encoded peptides. Partial transcripts were discovered from a variety of genes of evolutionary interest by BLASTing contigs against the 12 Drosophila genomes currently sequenced. By incorporating into the cDNA library multiple individuals from populations spanning a large portion of the geographical range of D. serrata, we were able to identify 11,057 putative single nucleotide polymorphisms (SNPs), with 278 different contigs having at least one "double hit" SNP that is highly likely to be a real polymorphism. At least 394 EST-associated microsatellite markers, representing 355 different contigs, were also found, providing an additional set of genetic markers. The assembled EST library is available online at http://www.chenowethlab.org/serrata/index.cgi. We have provided the first gene collection and largest set of polymorphic genetic markers, to date, for the fly D. serrata. 
The EST collection will provide much needed genomic resources for this model species and facilitate comparative evolutionary studies within the montium subgroup of the D. melanogaster lineage.
NASA Astrophysics Data System (ADS)
Liu, Hongzhan; Zheng, Fengrong; Sun, Xiuqin; Cai, Yimei
2012-06-01
The aquaculture of sea cucumber Apostichopus japonicus (Echinodermata, Holothuroidea) has grown rapidly during recent years and has become an important sector of the marine industry in Northern China. However, with the rapid growth of the industry and the use of non-standard culture techniques, epidemic diseases of A. japonicus now pose increasing problems to the industry. To screen the genes with stress response to bacterial infection in sea cucumber at a genome wide level, we constructed a cDNA library from A. japonicus Selenka (Aspidochirotida: Stichopodidae) after infecting them with Vibrio sp. for 48 h. Total RNA was extracted from the intestine, mesentery and coelomocyte of infected sea cucumber using Trizol and mRNA was isolated by Oligotex mRNA Kits. The ligated cDNAs were transformed into DH5α, and a library of 3.24×10⁵ clones (3.24×10⁵ cfu mL⁻¹) was obtained with the sizes of inserted fragments ranging from 0.8 to 2.5 kb. Sequencing the cDNA clones resulted in a total of 1106 ESTs that passed the quality control. BlastX and BlastN searches have identified 168 (31.5%) ESTs sharing significant homology with known sequences in NCBI protein or nucleotide databases. Among a panel of 25 putative immunity-related genes, serum lectin isoform, complement component 3, complement component 3-like genes were further studied by real-time PCR and they all increased more than 5 fold in response to Vibrio sp. challenge. Our library provides a valuable molecular tool for future study of invertebrate immunity against bacterial infection and our gene expression data indicates the importance of the immune system in the evolution and development of sea cucumber.
Exploring Nitrilase Sequence Space for Enantioselective Catalysis
Robertson, Dan E.; Chaplin, Jennifer A.; DeSantis, Grace; Podar, Mircea; Madden, Mark; Chi, Ellen; Richardson, Toby; Milan, Aileen; Miller, Mark; Weiner, David P.; Wong, Kelvin; McQuaid, Jeff; Farwell, Bob; Preston, Lori A.; Tan, Xuqiu; Snead, Marjory A.; Keller, Martin; Mathur, Eric; Kretz, Patricia L.; Burk, Mark J.; Short, Jay M.
2004-01-01
Nitrilases are important in the biosphere as participants in synthesis and degradation pathways for naturally occurring, as well as xenobiotically derived, nitriles. Because of their inherent enantioselectivity, nitrilases are also attractive as mild, selective catalysts for setting chiral centers in fine chemical synthesis. Unfortunately, <20 nitrilases have been reported in the scientific and patent literature, and because of stability or specificity shortcomings, their utility has been largely unrealized. In this study, 137 unique nitrilases, discovered from screening of >600 biotope-specific environmental DNA (eDNA) libraries, were characterized. Using culture-independent means, phylogenetically diverse genomes were captured from entire biotopes, and their genes were expressed heterologously in a common cloning host. Nitrilase genes were targeted in a selection-based expression assay of clonal populations numbering 10⁶ to 10¹⁰ members per eDNA library. A phylogenetic analysis of the novel sequences discovered revealed the presence of at least five major sequence clades within the nitrilase subfamily. Using three nitrile substrates targeted for their potential in chiral pharmaceutical synthesis, the enzymes were characterized for substrate specificity and stereospecificity. A number of important correlations were found between sequence clades and the selective properties of these nitrilases. These enzymes, discovered using a high-throughput, culture-independent method, provide a catalytic toolbox for enantiospecific synthesis of a variety of carboxylic acid derivatives, as well as an intriguing library for evolutionary and structural analyses. PMID:15066841
Boltaña, Sebastian; Castellana, Barbara; Goetz, Giles; Tort, Lluis; Teles, Mariana; Mulero, Victor; Novoa, Beatriz; Figueras, Antonio; Goetz, Frederick W; Gallardo-Escarate, Cristian; Planas, Josep V; Mackenzie, Simon
2017-02-03
This study describes the development and validation of an enriched oligonucleotide-microarray platform for Sparus aurata (SAQ) to provide a platform for transcriptomic studies in this species. A transcriptome database was constructed by assembly of gilthead sea bream sequences derived from public repositories of mRNA together with reads from a large collection of expressed sequence tags (EST) from two extensive targeted cDNA libraries characterizing mRNA transcripts regulated by both bacterial and viral challenge. The developed microarray was further validated by analysing monocyte/macrophage activation profiles after challenge with two Gram-negative bacterial pathogen-associated molecular patterns (PAMPs; lipopolysaccharide (LPS) and peptidoglycan (PGN)). Of the approximately 10,000 EST sequenced, we obtained a total of 6837 EST longer than 100 nt, with 3778 and 3059 EST obtained from the bacterial-primed and from the viral-primed cDNA libraries, respectively. Functional classification of contigs from the bacterial- and viral-primed cDNA libraries by Gene Ontology (GO) showed that the top five represented categories were equally represented in the two libraries: metabolism (approximately 24% of the total number of contigs), carrier proteins/membrane transport (approximately 15%), effectors/modulators and cell communication (approximately 11%), nucleoside, nucleotide and nucleic acid metabolism (approximately 7.5%) and intracellular transducers/signal transduction (approximately 5%). Transcriptome analyses using this enriched oligonucleotide platform identified differential shifts in the response to PGN and LPS in macrophage-like cells, highlighting responsive gene-cassettes tightly related to PAMP host recognition. As observed in other fish species, PGN is a powerful activator of the inflammatory response in S. aurata macrophage-like cells. 
We have developed and validated an oligonucleotide microarray (SAQ) that provides a platform enriched for the study of gene expression in S. aurata with an emphasis upon immunity and the immune response.
Henry, Kevin A
2018-01-01
Immunogenetic analyses of expressed antibody repertoires are becoming increasingly common experimental investigations and are critical to furthering our understanding of autoimmunity, infectious disease, and cancer. Next-generation DNA sequencing (NGS) technologies have now made it possible to interrogate antibody repertoires to unprecedented depths, typically by sequencing of cDNAs encoding immunoglobulin variable domains. In this chapter, we describe simple, fast, and reliable methods for producing and sequencing multiplex PCR amplicons derived from the variable regions (VH, VHH or VL) of rearranged immunoglobulin heavy and light chain genes using the Illumina MiSeq platform. We include complete protocols and primer sets for amplicon sequencing of VH/VHH/VL repertoires directly from human, mouse, and llama lymphocytes as well as from phage-displayed VH/VHH/VL libraries; these can easily be adapted to other types of amplicons with little modification. The resulting amplicons are diverse and representative, even using as few as 10³ input B cells, and their generation is relatively inexpensive, requiring no special equipment and only a limited set of primers. In the absence of heavy-light chain pairing, single-domain antibodies are uniquely amenable to NGS analyses. We present a number of applications of NGS technology useful in discovery of single-domain antibodies from phage display libraries, including: (i) assessment of library functionality; (ii) confirmation of desired library randomization; (iii) estimation of library diversity; and (iv) monitoring the progress of panning experiments. While the case studies presented here are of phage-displayed single-domain antibody libraries, the principles extend to other types of in vitro display libraries.
Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A
2017-01-01
RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.
Gene expression analysis of flax seed development
2011-01-01
Background: Flax, Linum usitatissimum L., is an important crop whose seed oil and stem fiber have multiple industrial applications. Flax seeds are also well-known for their nutritional attributes, viz., omega-3 fatty acids in the oil and lignans and mucilage from the seed coat. In spite of the importance of this crop, there are few molecular resources that can be utilized toward improving seed traits. Here, we describe flax embryo and seed development and generation of comprehensive genomic resources for the flax seed. Results: We describe a large-scale generation and analysis of expressed sequences in various tissues. Collectively, the 13 libraries we have used provide a broad representation of genes active in developing embryos (globular, heart, torpedo, cotyledon and mature stages), seed coats (globular and torpedo stages) and endosperm (pooled globular to torpedo stages) and genes expressed in flowers, etiolated seedlings, leaves, and stem tissue. A total of 261,272 expressed sequence tags (EST) (GenBank accessions LIBEST_026995 to LIBEST_027011) were generated. These EST libraries included transcription factor genes that are typically expressed at low levels, indicating that the depth is adequate for in silico expression analysis. Assembly of the ESTs resulted in 30,640 unigenes and 82% of these could be identified on the basis of homology to known and hypothetical genes from other plants. When compared with fully sequenced plant genomes, the flax unigenes resembled poplar and castor bean more than grape, sorghum, rice or Arabidopsis. Nearly one-fifth of these (5,152) had no homologs in sequences reported for any organism, suggesting that this category represents genes that are likely unique to flax. Digital analyses revealed gene expression dynamics for the biosynthesis of a number of important seed constituents during seed development.
Conclusions: We have developed a foundational database of expressed sequences and collection of plasmid clones that comprise even low-expressed genes such as those encoding transcription factors. This has allowed us to delineate the spatio-temporal aspects of gene expression underlying the biosynthesis of a number of important seed constituents in flax. Flax belongs to a taxonomic group of diverse plants and the large sequence database will allow for evolutionary studies as well. PMID:21529361
Single-Cell RNA Sequencing of Glioblastoma Cells.
Sen, Rajeev; Dolgalev, Igor; Bayin, N Sumru; Heguy, Adriana; Tsirigos, Aris; Placantonakis, Dimitris G
2018-01-01
Single-cell RNA sequencing (sc-RNASeq) is a recently developed technique used to evaluate the transcriptome of individual cells. As opposed to conventional RNASeq in which entire populations are sequenced in bulk, sc-RNASeq can be beneficial when trying to better understand gene expression patterns in markedly heterogeneous populations of cells or when trying to identify transcriptional signatures of rare cells that may be underrepresented when using conventional bulk RNASeq. In this method, we describe the generation and analysis of cDNA libraries from single patient-derived glioblastoma cells using the C1 Fluidigm system. The protocol details the use of the C1 integrated fluidics circuit (IFC) for capturing, imaging and lysing cells; performing reverse transcription; and generating cDNA libraries that are ready for sequencing and analysis.
NASA Astrophysics Data System (ADS)
Yakimov, Michail M.; Cono, Violetta La; Denaro, Renata
2009-05-01
The autotrophic and ammonia-oxidizing crenarchaeal assemblage at offshore site located in the deep Mediterranean (Tyrrhenian Sea, depth 3000 m) water was studied by PCR amplification of the key functional genes involved in energy (ammonia mono-oxygenase alpha subunit, amoA) and central metabolism (acetyl-CoA carboxylase alpha subunit, accA). Using two recently annotated genomes of marine crenarchaeons, an initial set of primers targeting archaeal accA-like genes was designed. Approximately 300 clones were analyzed, of which 100% of amoA library and almost 70% of accA library were unambiguously related to the corresponding genes from marine Crenarchaeota. Even though the acetyl-CoA carboxylase is phylogenetically not well conserved and the remaining clones were affiliated to various bacterial acetyl-CoA/propionyl-CoA carboxylase genes, the pool of archaeal sequences was applied for development of quantitative PCR analysis of accA-like distribution using TaqMan ® methodolgy. The archaeal accA gene fragments, together with alignable gene fragments from the Sargasso Sea and North Pacific Subtropical Gyre (ALOHA Station) metagenome databases, were analyzed by multiple sequence alignment. Two accA-like sequences, found in ALOHA Station at the depth of 4000 m, formed a deeply branched clade with 64% of all archaeal Tyrrhenian clones. No close relatives for residual 36% of clones, except of those recovered from Eastern Mediterranean, was found, suggesting the existence of a specific lineage of the crenarchaeal accA genes in deep Mediterranean water. Alignment of Mediterranean amoA sequences defined four cosmopolitan phylotypes of Crenarchaeota putative ammonia mono-oxygenase subunit A gene occurring in the water sample from the 3000 m depth. Without exception all phylotypes fell into Deep Marine Group I cluster that contain the vast majority of known sequences recovered from global deep-sea environment. 
Remarkably, three phylotypes accounted for 91% of all Mediterranean amoA clones and corresponded to sequences retrieved from shallower compartments of the world's ocean, most likely reflecting the higher temperature at depth in the Mediterranean Sea. To verify whether these phylotypes might represent Crenarchaeota important to the functioning of the Mediterranean bathypelagic ecosystem, expression of the crenarchaeal amoA gene was monitored by direct RNA retrieval and subsequent analysis of amoA-related mRNA transcripts. Surprisingly, all mRNA-derived sequences formed a tight monophyletic group, which fell into the large Shallow Marine Group I cluster with sequences retrieved from shallow (up to 200 m) waters, sediments and corals. This group was not detected in the DNA-based clone library, evidently owing to the overwhelming dominance of Deep Marine Group I. The failure to recover amoA transcripts related to Deep Marine Group I of Crenarchaeota was unanticipated and likely resulted from the physiology of these strongly adapted deep-sea organisms. Because all seawater samples were processed on board under atmospheric pressure and sunlight, decompression and/or photoinhibition likely affected their metabolic activity, leading to a strong decay of gene expression.
NASA Astrophysics Data System (ADS)
Ma, Deyou; Yang, Hongsheng; Sun, Lina; Chen, Muyan
2014-01-01
Sea cucumbers (Apostichopus japonicus) are one of the most important aquaculture species in China. Their normal body color is black, which matches their surroundings. Wild albinos are rare and hard to breed. To understand the differences between albino and normal (control) sea cucumbers at the transcriptional level, we sequenced the transcriptomes of their body-wall tissues using RNA-Seq high-throughput sequencing. Approximately 4.876 million (M) and 4.884 M 200-nucleotide-long cDNA reads were produced in the cDNA libraries derived from the body walls of albino and control samples, respectively. A total of 9 561 (46.89%) putative genes were identified from among the RNA-Seq reads in both libraries. After filtering, 837 significantly differentially regulated genes were identified in the albino library compared with the control library, and 3.6% of the differentially expressed genes (DEGs) changed by more than five-fold. The expression levels of 10 DEGs were checked by real-time PCR and the results were in full accord with the RNA-Seq expression trends, although the amplitude of the differences in expression levels was lower in all cases. A series of pathways were significantly enriched for the DEGs. These pathways were closely related to phagocytosis, the complement and coagulation cascades, apoptosis-related diseases, cytokine-cytokine receptor interaction, and cell adhesion. The differences in gene expression and enriched pathways between the albino and control sea cucumbers offer control targets for cultivating excellent albino A. japonicus strains in the future.
Resistance gene homologues in Theobroma cacao as useful genetic markers.
Kuhn, D N; Heath, M; Wisser, R J; Meerow, A; Brown, J S; Lopes, U; Schnell, R J
2003-07-01
Resistance gene homologue (RGH) sequences have been developed into useful genetic markers for marker-assisted selection (MAS) of disease resistant Theobroma cacao. A plasmid library of amplified fragments was created from seven different cultivars of cacao. Over 600 cloned recombinant amplicons were evaluated. From these, 74 unique RGHs were identified that could be placed into 11 categories based on sequence analysis. Primers specific to each category were designed. The primers specific for a single RGH category amplified fragments of equal length from the seven different cultivars used to create the library. However, these fragments exhibited single-strand conformational polymorphism (SSCP), which allowed us to map six of the RGH categories in an F(2) population of T. cacao. RGHs 1, 4 and 5 were in the same linkage group, with RGH 4 and 5 separated by less than 4 cM. As SSCP can be efficiently performed on our automated sequencer, we have developed a convenient and rapid high throughput assay for RGH alleles.
Huang, Xiao Dan; Tan, Hui Yin; Long, Ruijun; Liang, Juan Boo; Wright, André-Denis G
2012-10-19
Methane emissions by methanogens from livestock ruminants have significantly contributed to the agricultural greenhouse gas effect. It is worthwhile to compare methanogens from an "energy-saving" animal (yak) and a normal animal (cattle) in order to investigate the link between methanogen community structure and low methane production. Diversity of methanogens from the yak and cattle rumen was investigated by analysis of 16S rRNA gene sequences from rumen digesta samples from four yaks (209 clones) and four cattle (205 clones) from the Qinghai-Tibetan Plateau area (QTP). Overall, a total of 414 clones (i.e. sequences) were examined and assigned to 95 operational taxonomic units (OTUs) using MOTHUR, based upon a 98% species-level identity criterion. Forty-six OTUs were unique to the yak clone library and 34 OTUs were unique to the cattle clone library, while 15 OTUs were found in both libraries. Of the 95 OTUs, 93 putative new species were identified. Sequences belonging to the Thermoplasmatales-affiliated Lineage C (TALC) were found to dominate in both libraries, accounting for 80.9% and 62.9% of the sequences from the yak and cattle clone libraries, respectively. Sequences belonging to the Methanobacteriales represented the second largest clade in both libraries. However, Methanobrevibacter wolinii (QTPC 110) was only found in the cattle library. The number of clones from the order Methanomicrobiales was greater in the cattle than in the yak clone library. Although the Shannon index value indicated similar diversity between the two libraries, the Libshuff analysis indicated that the methanogen community structure of the yak was significantly different from that of cattle. This study revealed for the first time the molecular diversity of the methanogen community in yaks and cattle in the Qinghai-Tibetan Plateau area in China. 
From the analysis, we conclude that yaks have a unique rumen microbial ecosystem that is significantly different from that of cattle; this may also help to explain why yaks produce less methane than cattle. PMID:23078429
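The Shannon index used above to compare the two clone libraries can be illustrated with a minimal Python sketch. It computes H' = -sum(p_i * ln p_i) from per-OTU clone counts. The counts below are hypothetical placeholders, not data from the study; only the library size of 209 yak clones is taken from the abstract.

```python
import math

def shannon_index(counts):
    """Return the Shannon diversity index H' = -sum(p_i * ln p_i)."""
    total = sum(counts)
    # OTUs with zero clones contribute nothing and are skipped
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical clone counts per OTU (illustrative only); they sum to 209,
# the size of the yak library reported in the abstract.
example_counts = [120, 40, 20, 10, 8, 5, 3, 2, 1]
print(f"H' = {shannon_index(example_counts):.3f}")
```

A library dominated by a single OTU gives a value near zero, while a library whose clones are spread evenly over S OTUs gives the maximum value, ln S.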
Chi, Xiang-Qun; Wang, Long; Guo, Ruoyu; Zhao, Dexi; Li, Jia; Zhang, Yongyu; Jiao, Nianzhi
2018-06-19
The protein-coding genes (rbcL/cbbL/cbbM) for the RuBisCO large subunit, the most abundant protein on earth and the driver of biological CO2 fixation, have been considered useful marker genes for characterizing CO2-assimilating plankton. However, their community specificity has hindered comprehensive screening of genetic diversity. In this study, six different rbcL/cbbL/cbbM primers were employed to screen clone libraries to identify CO2-assimilating plankton in Jiaozhou Bay. The following community compositions were observed: the Form I A/B rbcL/cbbL clone library mainly comprised Chlorophyta and Proteobacteria, the Form ID2 and ID3 libraries consisted of Bacillariophyta, the Form II cbbM library consisted of Proteobacteria and Alveolata, and the Form I green and Form I red libraries both included Proteobacteria. At the genus level, no overlaps among these clone libraries were observed, except between ID2 and ID3. Overall, the phytoplankton in Jiaozhou Bay mainly consists of Bacillariophyta, Chlorophyta, Cryptophyta, Haptophyceae, and Alveolata. The CO2-assimilating prokaryotes mainly consist of Proteobacteria. Considering the high sequence specificities of these marker genes, we propose that the joint use of multiple primers may be utilized in unveiling the diversity of CO2-assimilating organisms. In addition, designing novel RuBisCO gene primers that generate longer amplicons and have broader phylogenetic coverage may be necessary in the future.
Current and future resources for functional metagenomics
Lam, Kathy N.; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D.; Charles, Trevor C.
2015-01-01
Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries—physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research. PMID:26579102
Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun
2010-01-01
A 10-fold BAC library for the giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of the giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein-coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms the dog as the species most closely related to the giant panda. Our results provide detailed sequence and structure information for new genes and repeats of the giant panda, which will be helpful for further studies on the giant panda.
Wang, Chun Ming; Lo, Loong Chueng; Feng, Felicia; Gong, Ping; Li, Jian; Zhu, Ze Yuan; Lin, Grace; Yue, Gen Hua
2008-03-25
Barramundi (Lates calcarifer) is an important farmed marine food fish species. Its first generation linkage map has been applied to map QTL for growth traits. To identify genes located in QTL responsible for specific traits, genomic large insert libraries are of crucial importance. We report herein a bacterial artificial chromosome (BAC) library and the mapping of BAC clones to the linkage map. This BAC library consisted of 49,152 clones with an average insert size of 98 kb, representing 6.9-fold haploid genome coverage. Screening the library with 24 microsatellites and 15 ESTs/genes demonstrated that the library had good genome coverage. In addition, 62 novel microsatellites, each isolated from one of 62 BAC clones, were mapped onto the first generation linkage map. A total of 86 BAC clones were anchored on the linkage map with at least one BAC clone on each linkage group. We have constructed the first BAC library for L. calcarifer and mapped 86 BAC clones to the first generation linkage map. This BAC library and the improved linkage map with 302 DNA markers not only supply an indispensable tool for the integration of physical and linkage maps, the fine mapping of QTL and the map-based cloning of genes located in QTL of commercial importance, but also contribute to comparative genomic studies and eventually whole genome sequencing.
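The coverage figure in this abstract follows from a simple relation: fold coverage = (number of clones × mean insert size) / haploid genome size. The abstract reports the clone count, insert size and coverage but not the genome size, so the minimal Python sketch below back-calculates the genome size these numbers imply (roughly 700 Mb). That value is derived here for illustration and is not stated in the source; the function names are likewise illustrative.

```python
def library_coverage(n_clones, mean_insert_kb, genome_size_kb):
    """Fold coverage of a clone library: total cloned DNA / genome size."""
    return n_clones * mean_insert_kb / genome_size_kb

def implied_genome_size_kb(n_clones, mean_insert_kb, coverage):
    """Genome size (kb) implied by a reported fold coverage."""
    return n_clones * mean_insert_kb / coverage

# Values reported in the abstract
N_CLONES = 49_152
INSERT_KB = 98
COVERAGE = 6.9

# Derived value (an inference, not a figure from the source)
genome_kb = implied_genome_size_kb(N_CLONES, INSERT_KB, COVERAGE)
print(f"Implied haploid genome size: about {genome_kb / 1000:.0f} Mb")
```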
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-01-01
In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from Bengal tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% were found to be related to metabolism of all kinds, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover and chaperones, 11.3% to transport, 9.9% to signal transduction/cell communication, 9.0% to structural proteins and 3.8% to the cell cycle, and only 6.6% were classified as novel genes. By EST sequencing, a full-length gene coding for ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed; it codes for the TAT-Ferritin fusion protein with 6× His-tags at both the N- and C-termini. As measured by BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provides a useful platform for functional genome and transcriptome research on Bengal tigers. PMID:23708105
2013-01-01
Background Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. 
raimondii genome, even though G. raimondii contains a D genome (D5). Conclusions The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii. PMID:23537070
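The "approximate 99% probability" quoted for the BIBAC library is the kind of figure usually obtained from the Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone and N is the number of clones. The abstract does not say which formula was used, so the following Python sketch is an assumption on that point. The genome size of about 2.5 Gb is also an assumption, inferred from the statement that nearly 10,000 BESs correspond to roughly one BES per 250 kb.

```python
import math

def prob_at_least_one_clone(n_clones, insert_kb, genome_kb):
    """Clarke-Carbon probability that a single-copy sequence is in the library."""
    f = insert_kb / genome_kb          # genome fraction covered by one clone
    return 1.0 - (1.0 - f) ** n_clones

def clones_needed(prob, insert_kb, genome_kb):
    """Number of clones required to reach a target probability."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1.0 - prob) / math.log(1.0 - f))

# Assumption: ~2.5 Gb genome (about 10,000 BESs at one per ~250 kb)
GENOME_KB = 2_500_000

# Clone number and insert size as reported in the abstract
p = prob_at_least_one_clone(76_800, 135, GENOME_KB)
print(f"P(at least one positive clone) = {p:.3f}")
print("Clones needed for P = 0.99:", clones_needed(0.99, 135, GENOME_KB))
```

Under these assumptions the probability comes out slightly below 0.99, which is consistent with the abstract's wording of an approximate value; a somewhat smaller assumed genome size would bring it closer to 0.99.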
Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon
2014-11-01
The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
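The user-adjustable filters described above (minimum and maximum exon length, a per-gene cap, a total cap, partial sampling of very large exons, and a preference for 5 prime exons) can be sketched in a few lines of Python. This is a hypothetical illustration only: it is not EXONSAMPLER's code or interface, the names `Exon`, `sample_exons` and `to_bed` are invented here, and the coordinates in the example are made up.

```python
from typing import NamedTuple

class Exon(NamedTuple):
    chrom: str
    start: int   # 0-based start, BED style
    end: int     # exclusive end, BED style
    gene: str
    rank: int    # 1 = most upstream (5 prime) exon of the gene

def sample_exons(exons, min_len=100, max_len=2000,
                 max_bp_per_gene=1000, max_total_bp=5000):
    """Pick exons under length, per-gene and total base-pair limits."""
    used_per_gene = {}
    total = 0
    chosen = []
    # Within each gene, visit 5 prime exons first
    for exon in sorted(exons, key=lambda e: (e.gene, e.rank)):
        length = exon.end - exon.start
        if length < min_len:
            continue
        # Partial sampling: keep only the first max_len bp of a large exon
        # (simplification: strand is ignored here)
        length = min(length, max_len)
        used = used_per_gene.get(exon.gene, 0)
        if used + length > max_bp_per_gene or total + length > max_total_bp:
            continue
        used_per_gene[exon.gene] = used + length
        total += length
        chosen.append(exon._replace(end=exon.start + length))
    return chosen

def to_bed(exons):
    """Format exons as tab-separated BED lines."""
    return [f"{e.chrom}\t{e.start}\t{e.end}\t{e.gene}_exon{e.rank}" for e in exons]

# Made-up coordinates for two of the gene names mentioned in the abstract
example = [
    Exon("chr1", 1000, 1300, "IFNG", 1),
    Exon("chr1", 2000, 2050, "IFNG", 2),   # too short, skipped
    Exon("chr1", 3000, 6000, "IFNG", 3),   # would exceed the per-gene cap
    Exon("chr2", 500, 900, "TLR1", 1),
]
bed_lines = to_bed(sample_exons(example))
print("\n".join(bed_lines))
```

Sorting by rank within each gene is one simple way to express the 5 prime preference; sampling 3 prime or internal exons would only require a different sort key.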
Nishiyama, Minako; Yamamoto, Shuichi; Kurosawa, Norio
2013-08-01
Ibusuki hot spring is located on the coastline of Kagoshima Bay, Japan. The hot spring water is characterized by high salinity, high temperature, and neutral pH. The hot spring is covered by the sea during high tide, which leads to severe fluctuations in several environmental variables. A combination of molecular- and culture-based techniques was used to determine the bacterial and archaeal diversity of the hot spring. A total of 48 thermophilic bacterial strains were isolated from two sites (Site 1: 55.6°C; Site 2: 83.1°C) and they were categorized into six groups based on their 16S rRNA gene sequence similarity. Two groups (including 32 isolates) demonstrated low sequence similarity with published species, suggesting that they might represent novel taxa. The 148 clones from the Site 1 bacterial library included 76 operational taxonomic units (OTUs; 97% threshold), while 132 clones from the Site 2 bacterial library included 31 OTUs. Proteobacteria, Bacteroidetes, and Firmicutes were frequently detected in both clone libraries. The clones were related to thermophilic, mesophilic and psychrophilic bacteria. Approximately half of the sequences in the bacterial clone libraries shared <92% sequence similarity with their closest sequences in a public database, suggesting that the Ibusuki hot spring may harbor a unique and novel bacterial community. By contrast, 77 clones from the Site 2 archaeal library contained only three OTUs, most of which were affiliated with Thaumarchaeota.
Džunková, Mária; D'Auria, Giuseppe; Pérez-Villarroya, David; Moya, Andrés
2012-01-01
Natural environments represent an incredible source of microbial genetic diversity. Discovery of novel biomolecules involves biotechnological methods that often require the design and implementation of biochemical assays to screen clone libraries. However, when an assay is applied to thousands of clones, one may eventually end up with very few positive clones which, in most of the cases, have to be "domesticated" for downstream characterization and application, and this makes screening both laborious and expensive. The negative clones, which are not considered by the selected assay, may also have biotechnological potential; however, unfortunately they would remain unexplored. Knowledge of the clone sequences provides important clues about potential biotechnological application of the clones in the library; however, the sequencing of clones one-by-one would be very time-consuming and expensive. In this study, we characterized the first metagenomic clone library from the feces of a healthy human volunteer, using a method based on 454 pyrosequencing coupled with a clone-by-clone Sanger end-sequencing. Instead of whole individual clone sequencing, we sequenced 358 clones in a pool. The medium-large insert (7-15 kb) cloning strategy allowed us to assemble these clones correctly, and to assign the clone ends to maintain the link between the position of a living clone in the library and the annotated contig from the 454 assembly. Finally, we found several open reading frames (ORFs) with previously described potential medical application. The proposed approach allows planning ad-hoc biochemical assays for the clones of interest, and the appropriate sub-cloning strategy for gene expression in suitable vectors/hosts.
A novel sodium bicarbonate cotransporter-like gene in an ancient duplicated region: SLC4A9 at 5q31
Lipovich, Leonard; Lynch, Eric D; Lee, Ming K; King, Mary-Claire
2001-01-01
Background: Sodium bicarbonate cotransporter (NBC) genes encode proteins that execute coupled Na+ and HCO3- transport across epithelial cell membranes. We report the discovery, characterization, and genomic context of a novel human NBC-like gene, SLC4A9, on chromosome 5q31. Results: SLC4A9 was initially discovered by genomic sequence annotation and further characterized by sequencing of long-insert cDNA library clones. The predicted protein of 990 amino acids has 12 transmembrane domains and high sequence similarity to other NBCs. The 23-exon gene has 14 known mRNA isoforms. In three regions, mRNA sequence variation is generated by the inclusion or exclusion of portions of an exon. Noncoding SLC4A9 cDNAs were recovered multiple times from different libraries. The 3' untranslated region is fragmented into six alternatively spliced exons and contains expressed Alu, LINE and MER repeats. SLC4A9 has two alternative stop codons and six polyadenylation sites. Its expression is largely restricted to the kidney. In silico approaches were used to characterize two additional novel SLC4A genes and to place SLC4A9 within the context of multiple paralogous gene clusters containing members of the epidermal growth factor (EGF), ankyrin (ANK) and fibroblast growth factor (FGF) families. Seven human EGF-SLC4A-ANK-FGF clusters were found. Conclusion: The novel sodium bicarbonate cotransporter-like gene SLC4A9 demonstrates abundant alternative mRNA processing. It belongs to a growing class of functionally diverse genes characterized by inefficient highly variable splicing. The evolutionary history of the EGF-SLC4A-ANK-FGF gene clusters involves multiple rounds of duplication, apparently followed by large insertions and deletions at paralogous loci and genome-wide gene shuffling. PMID:11305939
Anaerobic Ammonium-Oxidizing Bacteria in Cow Manure Composting.
Wang, Tingting; Cheng, Lijun; Zhang, Wenhao; Xu, Xiuhong; Meng, Qingxin; Sun, Xuewei; Liu, Huajing; Li, Hongtao; Sun, Yu
2017-07-28
Composting is widely used to transform waste into valuable agricultural organic fertilizer. Anaerobic ammonium-oxidizing (anammox) bacteria play an important role in the global nitrogen cycle, but their role in composting remains poorly understood. In the present study, the community structure, diversity, and abundance of anammox bacteria were analyzed using cloning and sequencing methods by targeting the 16S rRNA gene and the hydrazine oxidase gene (hzo) in samples isolated from compost produced from cow manure and rice straw. A total of 25 operational taxonomic units were classified based on 16S rRNA gene clone libraries, and 14 operational taxonomic units were classified based on hzo gene clone libraries. The phylogenetic tree analysis of the 16S rRNA gene and deduced HZO protein sequences from the corresponding encoding genes indicated that the majority of the obtained clones were related to the known anammox bacteria Candidatus "Brocadia," Candidatus "Kuenenia," and Candidatus "Scalindua." The abundances of anammox bacteria were determined by quantitative PCR, and between 2.13 × 10^5 and 1.15 × 10^6 16S rRNA gene copies per gram of compost were found. This study provides the first demonstration of the existence of anammox bacteria with limited diversity in cow manure composting.
Saavedra-Lira, E; Pérez-Montfort, R
1994-05-16
We isolated three overlapping clones from a DNA genomic library of Entamoeba histolytica strain HM1:IMSS, whose translated nucleotide (nt) sequence shows similarities of 51, 48 and 47% with the amino acid (aa) sequences reported for the pyruvate phosphate dikinases from Bacteroides symbiosus, maize and Flaveria trinervia, respectively. The reading frame determined codes for a protein of 886 aa.
Effects of field-grown genetically modified Zoysia grass on bacterial community structure.
Lee, Yong-Eok; Yang, Sang-Hwan; Bae, Tae-Woong; Kang, Hong-Gyu; Lim, Pyung-Ok; Lee, Hyo-Yeon
2011-04-01
Herbicide-tolerant Zoysia grass has been previously developed through Agrobacterium-mediated transformation. We investigated the effects of genetically modified (GM) Zoysia grass and the associated herbicide application on bacterial community structure by using culture-independent approaches. To assess the possible horizontal gene transfer (HGT) of transgenic DNA to soil microorganisms, total soil DNAs were amplified by PCR with two primer sets for the bar and hpt genes, which were introduced into the GM Zoysia grass by a callus-type transformation. The transgenes were not detected in the total genomic DNAs extracted from 1.5 g of each of the rhizosphere soils of GM and non-GM Zoysia grasses. The structures and diversities of the bacterial communities in rhizosphere soils of GM and non-GM Zoysia grasses were investigated by constructing 16S rDNA clone libraries. Classifier, provided in the RDP II, assigned 100 clones in the 16S rRNA gene sequence library to 11 bacterial phyla. The most abundant phyla in both clone libraries were Acidobacteria and Proteobacteria. The bacterial diversity of the GM clone library was lower than that of the non-GM library. The former contained four phyla, whereas the latter had seven phyla. Phylogenetic trees were constructed to confirm these results. Phylogenetic analyses revealed that the two clone libraries differed considerably from each other. The significance of the difference between the clone libraries was examined with LIBSHUFF statistics. LIBSHUFF analysis revealed that the two clone libraries differed significantly (P < 0.025), suggesting alterations in the composition of the microbial community associated with GM Zoysia grass.
Towards Spectral Library-free MALDI-TOF MS Bacterial Identification.
Cheng, Ding; Qiao, Liang; Horvatovich, Péter
2018-05-11
Bacterial identification is of great importance in clinical diagnosis, environmental monitoring and food safety control. Among various strategies, matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has drawn significant interest and has been used clinically. Nevertheless, current bioinformatics solutions use spectral libraries for the identification of bacterial strains. Spectral library generation requires acquisition of MALDI-TOF spectra from monoculture bacterial colonies, which is time-consuming and not possible for many species and strains. We propose a strategy for bacterial typing by MALDI-TOF using protein sequences from a public database, i.e. UniProt. Ten genes were identified as encoding the proteins most often observed by MALDI-TOF from bacteria through a 10-fold double cross-validation procedure repeated 500 times, using 403 MALDI-TOF spectra corresponding to 14 genera, 81 species and 403 strains, and the protein sequences of 1276 species in UniProt. The 10 genes were then used to annotate peaks on MALDI-TOF spectra of bacteria for bacterial identification. With this approach, bacteria can be identified at the genus level by searching against a database containing the protein sequences of 42 genera of bacteria from UniProt. Our approach identified 84.1% of the 403 spectra correctly at the genus level. Source code of the algorithm is available at https://github.com/dipcarbon/BacteriaMSLF.
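The idea of annotating spectrum peaks with sequence-derived protein masses and then ranking genera can be sketched as follows. This is a hypothetical Python illustration and not the published BacteriaMSLF algorithm: the function names, the 500 ppm tolerance, the scoring by simple match count and all mass values are assumptions made for the example. Theoretical values are assumed to be already expressed as singly protonated m/z.

```python
def match_peaks(observed_mz, theoretical_mz, tol_ppm=500.0):
    """Pair each theoretical m/z with the closest observed peak within tol_ppm."""
    matches = {}
    if not observed_mz:
        return matches
    for name, mz in theoretical_mz.items():
        closest = min(observed_mz, key=lambda peak: abs(peak - mz))
        if abs(closest - mz) / mz * 1e6 <= tol_ppm:
            matches[name] = closest
    return matches

def best_genus(observed_mz, database, tol_ppm=500.0):
    """Score each genus by its number of matched marker proteins."""
    scores = {
        genus: len(match_peaks(observed_mz, markers, tol_ppm))
        for genus, markers in database.items()
    }
    return max(scores, key=scores.get), scores

# Invented marker masses for two placeholder genera (illustrative only)
database = {
    "GenusA": {"marker1": 4365.3, "marker2": 6255.4, "marker3": 7274.5},
    "GenusB": {"marker1": 4420.0, "marker2": 6300.0, "marker3": 9060.0},
}
observed = [4365.0, 6256.0, 9060.2]   # invented peak list

genus, scores = best_genus(observed, database)
print(genus, scores)
```

Counting matches is the simplest possible score; a practical implementation would likely also weigh peak intensity and handle ties between genera.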
Isolation of expressed sequences from the region commonly deleted in Velo-cardio-facial syndrome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sirotkin, H.; Morrow, B.; DasGupta, R.
Velo-cardio-facial syndrome (VCFS) is a relatively common autosomal dominant genetic disorder characterized by cleft palate, cardiac abnormalities, learning disabilities and a characteristic facial dysmorphology. Most VCFS patients have interstitial deletions of 22q11 of 1-2 Mb. In an effort to isolate the gene(s) responsible for VCFS we have utilized a hybrid selection protocol to recover expressed sequences from three non-overlapping YACs comprising almost 1 Mb of the commonly deleted region. Total yeast genomic DNA or isolated YAC DNA was immobilized on Hybond-N filters, blocked with yeast and human ribosomal and human repetitive sequences and hybridized with a mixture of random primed short fragment cDNA libraries. Six human short fragment libraries derived from total fetus, fetal brain, adult brain, testes, thymus and spleen have been used for the selections. Short fragment cDNAs retained on the filter were passed through a second round of selection and cloned into lambda gt10. cDNAs shown to originate from the YACs and from chromosome 22 are being used to isolate full length cDNAs. Three genes known to be present on these YACs, catechol-O-methyltransferase, tuple 1 and clathrin heavy chain have been recovered. Additionally, a gene related to the murine p120 gene and a number of novel short cDNAs have been isolated. The role of these genes in VCFS is being investigated.
Yang, Fang; Lei, Yingying; Zhou, Meiling; Yao, Qili; Han, Yichao; Wu, Xiang; Zhong, Wanshun; Zhu, Chenghang; Xu, Weize; Tao, Ran; Chen, Xi; Lin, Da; Rahman, Khaista; Tyagi, Rohit; Habib, Zeshan; Xiao, Shaobo; Wang, Dang; Yu, Yang; Chen, Huanchun; Fu, Zhenfang; Cao, Gang
2018-02-16
Protein-protein interaction (PPI) networks maintain the proper function of all organisms. Simple high-throughput technologies are urgently needed to delineate the landscape of PPI networks. While recent state-of-the-art yeast two-hybrid (Y2H) systems have improved screening efficiency, they still require individual colony isolation, library preparation arrays, gene barcoding or massive sequencing. Here, we developed a recombination-based 'library vs library' Y2H system (RLL-Y2H), by which multi-library screening can be accomplished in a single pool without any individual treatment. This system is based on phiC31 integrase-mediated integration between bait and prey plasmids. The integrated fragments were digested by MmeI and subjected to deep sequencing to decode the interaction matrix. We applied this system to decipher the trans-kingdom interactome between Mycobacterium tuberculosis and host cells and further identified Rv2427c as interfering with phagosome-lysosome fusion. This concept can also be applied to other systems to screen protein-RNA and protein-DNA interactions and to delineate the signaling landscape in cells.
Fu, Minghui; Jiang, Lihua; Li, Yuanmei; Yan, Guohua; Zheng, Lijun; Jinping, Peng
2014-12-01
Eichhornia crassipes is an aquatic plant native to the Amazon River Basin. It has become a serious weed in freshwater habitats in rivers, lakes and reservoirs in both tropical and warm temperate areas worldwide. Some research has stated that it can be used for water phytoremediation, due to its strong assimilation of nitrogen and phosphorus and its accumulation of heavy metals, and that its growth and spread may play an important role in environmental ecology. In order to explore the molecular mechanism of E. crassipes responses to nitrogen deficiency, we constructed forward and reversed subtracted cDNA libraries for E. crassipes roots under nitrogen-deficient conditions using a suppressive subtractive hybridization (SSH) method. The forward subtraction included 2,100 clones, and the reversed included 2,650 clones. One thousand clones were randomly selected from each library for sequencing. About 737 (527 unigenes) clones from the forward library and 757 (483 unigenes) clones from the reversed library were informative. Sequence BlastX analysis showed that there were more transporters and adenosylhomocysteinase-like proteins in E. crassipes cultured in nitrogen-deficient medium, while plants cultured in nitrogen-replete medium had more proteins such as UBR4-like E3 ubiquitin-protein ligase and fasciclin-like arabinogalactan protein 8-like, as well as more cytoskeletal proteins, including actin and tubulin. Cluster of Orthologous Group (COG) analysis also demonstrated that in the forward library, most ESTs were involved in coenzyme transport and metabolism. In the reversed library, cytoskeletal ESTs were the most abundant. Gene Ontology (GO) analysis demonstrated that unigenes involved in binding, cellular process and electron carrier activity were the most differentially expressed unigenes between the forward and reversed libraries. All these results suggest that E. crassipes can respond to different nitrogen status by efficiently regulating and controlling the expression of some transporter genes, certain metabolic processes, specific signal transduction pathways and cytoskeletal construction.
Shin, Sung Jae; Wu, Chia-wei; Steinberg, Howard; Talaat, Adel M.
2006-01-01
Johne's disease, caused by Mycobacterium paratuberculosis infection, is a worldwide problem for the dairy industry and has a possible involvement in Crohn's disease in humans. To identify virulence determinants of this economically important pathogen, a library of 5,060 transposon mutants was constructed using Tn5367 insertion mutagenesis, followed by large-scale sequencing to identify disrupted genes. In this report, 1,150 mutants were analyzed and 970 unique insertion sites were identified. Sequence analysis of the disrupted genes indicated that the insertion of Tn5367 was more prevalent in genomic regions with G+C content (50.5 to 60.5%) lower than the average G+C content (69.3%) of the rest of the genome. Phenotypic screening of the library identified disruptions of genes involved in iron, tryptophan, or mycolic acid metabolic pathways that displayed unique growth characteristics. Bioinformatic analysis of disrupted genes identified a list of potential virulence determinants for further testing with animals. Mouse infection studies showed a significant decrease in tissue colonization by mutants with a disruption in the gcpE, pstA, kdpC, papA2, impA, umaA1, or fabG2_2 gene. Attenuation phenotypes were tissue specific (e.g., for the umaA1 mutant) as well as time specific (e.g., for the impA mutant), suggesting that those genes may be involved in different virulence mechanisms. The identified potential virulence determinants represent novel functional classes that could be necessary for mycobacterial survival during infection and could provide suitable targets for vaccine and drug development against Johne's and Crohn's diseases. PMID:16790754
Genetic Control of Plant Root Colonization by the Biocontrol agent, Pseudomonas fluorescens
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cole, Benjamin J.; Fletcher, Meghan; Waters, Jordan
Plant growth promoting rhizobacteria (PGPR) are a critical component of plant root ecosystems. PGPR promote plant growth by solubilizing inaccessible minerals, suppressing pathogenic microorganisms in the soil, and directly stimulating growth through hormone synthesis. Pseudomonas fluorescens is a well-established PGPR isolated from wheat roots that can also colonize the root system of the model plant, Arabidopsis thaliana. We have created barcoded transposon insertion mutant libraries suitable for genome-wide transposon-mediated mutagenesis followed by sequencing (TnSeq). These libraries consist of over 10^5 independent insertions, collectively providing loss-of-function mutants for nearly all genes in the P. fluorescens genome. Each insertion mutant can be unambiguously identified by a randomized 20 nucleotide sequence (barcode) engineered into the transposon sequence. We used these libraries in a gnotobiotic assay to examine the colonization ability of P. fluorescens on A. thaliana roots. Taking advantage of the ability to distinguish individual colonization events using barcode sequences, we assessed the timing and microbial concentration dependence of colonization of the rhizoplane niche. These data provide direct insight into the dynamics of plant root colonization in an in vivo system and define baseline parameters for the systematic identification of the bacterial genes and molecular pathways using TnSeq assays. Having determined parameters that facilitate potential colonization of roots by thousands of independent insertion mutants in a single assay, we are currently establishing a genome-wide functional map of genes required for root colonization in P. fluorescens. Importantly, the approach developed and optimized here for P. fluorescens-A. thaliana colonization will be applicable to a wide range of plant-microbe interactions, including biofuel feedstock plants and microbes known or hypothesized to impact on biofuel-relevant traits including biomass productivity and pathogen resistance.
Fatty acid-oxidizing consortia along a nutrient gradient in the Florida Everglades.
Chauhan, Ashvini; Ogram, Andrew
2006-04-01
The Florida Everglades is one of the largest freshwater marshes in North America and has been subject to eutrophication for decades. A gradient in P concentrations extends for several kilometers into the interior of the northern regions of the marsh, and the structure and function of soil microbial communities vary along the gradient. In this study, stable isotope probing was employed to investigate the fate of carbon from the fermentation products propionate and butyrate in soils from three sites along the nutrient gradient. For propionate microcosms, 16S rRNA gene clone libraries from eutrophic and transition sites were dominated by sequences related to previously described propionate oxidizers, such as Pelotomaculum spp. and Syntrophobacter spp. Significant representation was also observed for sequences related to Smithella propionica, which dismutates propionate to butyrate. Sequences of dominant phylotypes from oligotrophic samples did not cluster with known syntrophs but with sulfate-reducing prokaryotes (SRP) and Pelobacter spp. In butyrate microcosms, sequences clustering with Syntrophospora spp. and Syntrophomonas spp. dominated eutrophic microcosms, and sequences related to Pelospora dominated the transition microcosm. Sequences related to Pelospora spp. and SRP dominated clone libraries from oligotrophic microcosms. Sequences from diverse bacterial phyla and primary fermenters were also present in most libraries. Archaeal sequences from eutrophic microcosms included sequences characteristic of Methanomicrobiaceae, Methanospirillaceae, and Methanosaetaceae. Oligotrophic microcosms were dominated by acetotrophs, including sequences related to Methanosarcina, suggesting accumulation of acetate.
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% of hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility of adding sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Perreault, Nancy N.; Andersen, Dale T.; Pollard, Wayne H.; Greer, Charles W.; Whyte, Lyle G.
2007-01-01
The springs at Gypsum Hill and Colour Peak on Axel Heiberg Island in the Canadian Arctic originate from deep salt aquifers and are among the few known examples of cold springs in thick permafrost on Earth. The springs discharge cold anoxic brines (7.5 to 15.8% salts), with a mean oxidoreduction potential of −325 mV, and contain high concentrations of sulfate and sulfide. We surveyed the microbial diversity in the sediments of seven springs by denaturing gradient gel electrophoresis (DGGE) and analyzing clone libraries of 16S rRNA genes amplified with Bacteria and Archaea-specific primers. Dendrogram analysis of the DGGE banding patterns divided the springs into two clusters based on their geographic origin. Bacterial 16S rRNA clone sequences from the Gypsum Hill library (spring GH-4) were classified into seven phyla (Actinobacteria, Bacteroidetes, Firmicutes, Gemmatimonadetes, Proteobacteria, Spirochaetes, and Verrucomicrobia); Deltaproteobacteria and Gammaproteobacteria sequences represented half of the clone library. Sequences related to Proteobacteria (82%), Firmicutes (9%), and Bacteroidetes (6%) constituted 97% of the bacterial clone library from Colour Peak (spring CP-1). Most GH-4 archaeal clone sequences (79%) were related to the Crenarchaeota while half of the CP-1 sequences were related to orders Halobacteriales and Methanosarcinales of the Euryarchaeota. Sequences related to the sulfur-oxidizing bacterium Thiomicrospira psychrophila dominated both the GH-4 (19%) and CP-1 (45%) bacterial libraries, and 56 to 76% of the bacterial sequences were from potential sulfur-metabolizing bacteria. These results suggest that the utilization and cycling of sulfur compounds may play a major role in the energy production and maintenance of microbial communities in these unique, cold environments. PMID:17220254
2010-01-01
Background The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the genus Quercus. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks are among the most important deciduous forest tree species in Europe. Despite their ecological and economic importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity. Results We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking led us to eliminate 19,941 sequences. Finally a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using the TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. Blast search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but a more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoid biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and allowed us to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html. Conclusions This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations. PMID:21092232
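The totals and percentages quoted in the oak EST abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a reader's sanity check, not part of the authors' pipeline; the variable names are illustrative, and every number is copied from the abstract.

```python
# Counts copied from the abstract; names are illustrative only.
sanger_ests = 125_925          # ESTs retained after Sanger sequencing
pyro_reads_raw = 1_948_579     # pyrosequencing reads generated
pyro_reads_removed = 370_566   # pyrosequencing reads eliminated
pyro_reads_kept = 1_578_192    # pyrosequencing reads retained
contigs = 69_154               # tentative contigs
singletons = 153_517           # singletons
swissprot_hits = 32_810        # unigene elements with SWISS-PROT homologs

# Sequences entering clustering and assembly
total_ests = sanger_ests + pyro_reads_kept

# Non-redundant set = contigs + singletons
unigenes = contigs + singletons

# Percentages as reported (one decimal place)
pct_pyro_removed = round(100 * pyro_reads_removed / pyro_reads_raw, 1)
pct_swissprot = round(100 * swissprot_hits / unigenes, 1)

print(total_ests, unigenes, pct_pyro_removed, pct_swissprot)
```

These four derived values match the 1,704,117 sequences, 222,671 non-redundant sequences, 19.0% and 14.7% stated in the abstract.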
NASA Astrophysics Data System (ADS)
Zhou, Jun; Hou, Fujing; Li, Ye; Su, Xiurong; Li, Taiwu; Jin, Chunhua
2016-07-01
Acaudina leucoprocta is an edible sea cucumber of economic interest that is widely distributed in China. Little information is available concerning the molecular genetics of this species, although such knowledge would contribute to a better understanding of the optimal conditions for its aquaculture and its mechanisms of defense against disease. Therefore, we constructed a cDNA library and, based on bioinformatics analysis of the sequences, the functions of 75% of the cDNAs were identified, including those involved in cell structure, energy metabolism, mitochondrial function, and signal transduction pathways. Approximately 25% of genes in the library were unmatched. The gene for A. leucoprocta ferritin was also cloned. The predicted amino-acid sequence of ferritin displayed significant homology with its counterparts in other sea cucumbers but indicated that it was a new member of the ferritin family. Semiquantitative real-time RT-PCR indicated the highest levels of ferritin mRNA expression in the intestine. A polyclonal antibody against ferritin was also produced. These data provide a set of molecular tools essential for further studies of the functions of ferritin protein in A. leucoprocta.
de Bellocq, J Goüy; Leirs, H
2009-09-01
Sequences of the complete open reading frame (ORF) for rodent major histocompatibility complex (MHC) class II genes are rare. Multimammate rat (Mastomys natalensis) complementary DNA (cDNA) encoding the alpha and beta chains of the MHC class II DQ gene was cloned from a rapid amplification of cDNA ends (RACE) cDNA library. The ORFs consist of 801 and 771 bp encoding 266 and 256 amino acid residues for DQB and DQA, respectively. The genomic structure of Mana-DQ genes is globally analogous to that described for other rodents except for the insertion of a serine residue in the signal peptide of Mana-DQB, which is unique among known rodents.
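The relationship between the reported ORF lengths and residue counts can be reproduced as follows. This is a minimal sketch that assumes the stated ORF lengths include the stop codon (the abstract does not say so explicitly); the function name is hypothetical.

```python
def residues_from_orf(orf_bp: int) -> int:
    """Return the number of encoded residues for an ORF length in bp.

    Assumption: the ORF length includes the terminal stop codon,
    so one codon is subtracted from the total.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


dqb_residues = residues_from_orf(801)  # beta chain (DQB)
dqa_residues = residues_from_orf(771)  # alpha chain (DQA)
print(dqb_residues, dqa_residues)
```

Under that assumption the 801 bp and 771 bp ORFs give the 266 and 256 residues reported for DQB and DQA.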
Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean
2003-08-01
A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG, allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either side of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244 bp from either side of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8 kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.
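The gene-density figure can be rederived from the numbers in the abstract. The sketch below assumes that the rRNA and tRNA genes are counted together with the CDS, which reproduces the reported 2.8 kb; counting the 163 CDS alone would give roughly 2.9 kb. The genome size at the end is simply back-calculated from the abstract's own 11,000-gene estimate and is not a value stated in the source.

```python
# Sequenced span on the two sides of the centromeric region (bp)
sequenced_bp = 322_195 + 156_244

# Assumption: all annotated genes are counted (1 rRNA + 5 tRNA + 163 CDS)
genes = 1 + 5 + 163

# Average spacing between genes, in kb
kb_per_gene = sequenced_bp / genes / 1000

# Genome size implied by 11,000 genes at this density (Mb);
# a back-calculation for illustration, not a figure from the abstract
implied_genome_mb = 11_000 * kb_per_gene / 1000

print(f"{kb_per_gene:.2f} kb per gene, ~{implied_genome_mb:.1f} Mb implied genome")
```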
Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries.
Scanlon, Thomas C; Gray, Elizabeth C; Griswold, Karl E
2009-11-20
In addition to providing the molecular machinery for transcription and translation, recombinant microbial expression hosts maintain the critical genotype-phenotype link that is essential for high throughput screening and recovery of proteins encoded by plasmid libraries. It is known that Escherichia coli cells can be simultaneously transformed with multiple unique plasmids and thus complicate recombinant library screening experiments. As a result of their potential to yield misleading results, bacterial multiple vector transformants have been thoroughly characterized in previous model studies. In contrast to bacterial systems, there is little quantitative information available regarding multiple vector transformants in yeast. Saccharomyces cerevisiae is the most widely used eukaryotic platform for cell surface display, combinatorial protein engineering, and other recombinant library screens. In order to characterize the extent and nature of multiple vector transformants in this important host, plasmid-borne gene libraries constructed by yeast homologous recombination were analyzed by DNA sequencing. It was found that up to 90% of clones in yeast homologous recombination libraries may be multiple vector transformants, that on average these clones bear four or more unique mutant genes, and that these multiple vector cells persist as a significant proportion of library populations for greater than 24 hours during liquid outgrowth. Both vector concentration and vector to insert ratio influenced the library proportion of multiple vector transformants, but their population frequency was independent of transformation efficiency. Interestingly, the average number of plasmids borne by multiple vector transformants did not vary with their library population proportion. These results highlight the potential for multiple vector transformants to dominate yeast libraries constructed by homologous recombination. The previously unrecognized prevalence and persistence of multiply transformed yeast cells have important implications for yeast library screens. The quantitative information described herein should increase awareness of this issue, and the rapid sequencing approach developed for these studies should be widely useful for identifying multiple vector transformants and avoiding complications associated with cells that have acquired more than one unique plasmid.
Hecht, Jochen; Kuhl, Heiner; Haas, Stefan A; Bauer, Sebastian; Poustka, Albert J; Lienau, Jasmin; Schell, Hanna; Stiege, Asita C; Seitz, Volkhard; Reinhardt, Richard; Duda, Georg N; Mundlos, Stefan; Robinson, Peter N
2006-07-05
The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes. In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis. The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
Hoffberg, Sandra L; Troendle, Nicholas J; Glenn, Travis C; Mahmud, Ousman; Louha, Swarnali; Chalopin, Domitille; Bennetzen, Jeffrey L; Mauricio, Rodney
2018-04-27
The western mosquitofish, Gambusia affinis, is a freshwater poecilid fish native to the southeastern United States but with a global distribution due to widespread human introduction. Gambusia affinis has been used as a model species for a broad range of evolutionary and ecological studies. We sequenced the genome of a male G. affinis to facilitate genetic studies in diverse fields including invasion biology and comparative genetics. We generated Illumina short read data from paired-end libraries and in vitro proximity-ligation libraries. We obtained 54.9× coverage, N50 contig length of 17.6 kb, and N50 scaffold length of 6.65 Mb. Compared to two other species in the Poeciliidae family, G. affinis has slightly fewer genes that have shorter total, exon, and intron length on average. Using a set of universal single-copy orthologs in fish genomes, we found 95.5% of these genes were complete in the G. affinis assembly. The number of transposable elements in the G. affinis assembly is similar to those of closely related species. The high-quality genome sequence and annotations we report will be valuable resources for scientists to map the genetic architecture of traits of interest in this species. Copyright © 2018, G3: Genes, Genomes, Genetics.
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan
Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. To identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylum Bacteroidetes. IMPORTANCE Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.
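The barcode-based tracking described in this abstract can be illustrated with a toy tally. The sketch below is not the authors' software: it assumes a known barcode-to-vector table and a list of (barcode, insertion position) pairs recovered by sequencing, and it ranks vector designs by the number of distinct insertion sites they produced. All names and example data are hypothetical.

```python
from collections import defaultdict


def rank_vectors(barcode_to_vector, insertions):
    """Rank vector designs by their number of distinct insertion sites.

    barcode_to_vector: dict mapping barcode -> vector design name (hypothetical table)
    insertions: iterable of (barcode, genome_position) pairs from sequencing
    """
    sites = defaultdict(set)
    for barcode, position in insertions:
        vector = barcode_to_vector.get(barcode)
        if vector is not None:  # skip barcodes that are not in the pool
            sites[vector].add(position)
    # Most insertion sites first; ties broken alphabetically by vector name
    return sorted(((v, len(p)) for v, p in sites.items()),
                  key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: two barcodes belong to vecA, one to vecB
example_map = {"AAAC": "vecA", "GGTT": "vecB", "CCGA": "vecA"}
example_reads = [("AAAC", 100), ("AAAC", 100), ("CCGA", 250),
                 ("GGTT", 900), ("TTTT", 5)]
ranking = rank_vectors(example_map, example_reads)
print(ranking)
```

Counting distinct sites rather than raw reads is one possible design choice here, so that a single heavily amplified insertion does not make a vector look more efficient than it is.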
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria
Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan; ...
2018-01-16
2013-01-01
Background Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. Results In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembled gene-related information on the vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. Thirty-two genes with high differential expression among the three sub-libraries were selected as candidates connected with flower development. Conclusion RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. These data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid. PMID:23617896
Zhang, Jianxia; Wu, Kunlin; Zeng, Songjun; Teixeira da Silva, Jaime A; Zhao, Xiaolan; Tian, Chang-En; Xia, Haoqiang; Duan, Jun
2013-04-24
Cymbidium sinense belongs to the Orchidaceae, which is one of the most abundant angiosperm families. C. sinense, a high-grade traditional potted flower, is most prevalent in China and some Southeast Asian countries. The control of flowering time is a major bottleneck in the industrialized development of C. sinense. Little is known about the mechanisms responsible for floral development in this orchid. Moreover, genome references for entire transcriptome sequences do not currently exist for C. sinense. Thus, transcriptome and expression profiling data for this species are needed as an important resource to identify genes and to better understand the biological mechanisms of floral development in C. sinense. In this study, de novo transcriptome assembly and gene expression analysis using Illumina sequencing technology were performed. Transcriptome analysis assembles gene-related information related to vegetative and reproductive growth of C. sinense. Illumina sequencing generated 54,248,006 high quality reads that were assembled into 83,580 unigenes with an average sequence length of 612 base pairs, including 13,315 clusters and 70,265 singletons. A total of 41,687 (49.88%) unique sequences were annotated, 23,092 of which were assigned to specific metabolic pathways by the Kyoto Encyclopedia of Genes and Genomes (KEGG). Gene Ontology (GO) analysis of the annotated unigenes revealed that the majority of sequenced genes were associated with metabolic and cellular processes, cell and cell parts, catalytic activity and binding. Furthermore, 120 flowering-associated unigenes, 73 MADS-box unigenes and 28 CONSTANS-LIKE (COL) unigenes were identified from our collection. In addition, three digital gene expression (DGE) libraries were constructed for the vegetative phase (VP), floral differentiation phase (FDP) and reproductive phase (RP). The specific expression of many genes in the three development phases was also identified. 
32 genes among three sub-libraries with high differential expression were selected as candidates connected with flower development. RNA-seq and DGE profiling data provided comprehensive gene expression information at the transcriptional level that could facilitate our understanding of the molecular mechanisms of floral development at three development phases of C. sinense. This data could be used as an important resource for investigating the genetics of the flowering pathway and various biological mechanisms in this orchid.
Rapid Identification of Sequences for Orphan Enzymes to Power Accurate Protein Annotation
Ojha, Sunil; Watson, Douglas S.; Bomar, Martha G.; Galande, Amit K.; Shearer, Alexander G.
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the “back catalog” of enzymology – “orphan enzymes,” those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme “back catalog” is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology’s “back catalog” another powerful tool to drive accurate genome annotation. PMID:24386392
Rapid identification of sequences for orphan enzymes to power accurate protein annotation.
Ramkissoon, Kevin R; Miller, Jennifer K; Ojha, Sunil; Watson, Douglas S; Bomar, Martha G; Galande, Amit K; Shearer, Alexander G
2013-01-01
The power of genome sequencing depends on the ability to understand what those genes and their proteins products actually do. The automated methods used to assign functions to putative proteins in newly sequenced organisms are limited by the size of our library of proteins with both known function and sequence. Unfortunately this library grows slowly, lagging well behind the rapid increase in novel protein sequences produced by modern genome sequencing methods. One potential source for rapidly expanding this functional library is the "back catalog" of enzymology--"orphan enzymes," those enzymes that have been characterized and yet lack any associated sequence. There are hundreds of orphan enzymes in the Enzyme Commission (EC) database alone. In this study, we demonstrate how this orphan enzyme "back catalog" is a fertile source for rapidly advancing the state of protein annotation. Starting from three orphan enzyme samples, we applied mass-spectrometry based analysis and computational methods (including sequence similarity networks, sequence and structural alignments, and operon context analysis) to rapidly identify the specific sequence for each orphan while avoiding the most time- and labor-intensive aspects of typical sequence identifications. We then used these three new sequences to more accurately predict the catalytic function of 385 previously uncharacterized or misannotated proteins. We expect that this kind of rapid sequence identification could be efficiently applied on a larger scale to make enzymology's "back catalog" another powerful tool to drive accurate genome annotation.
Construction of a general human chromosome jumping library, with application to cystic fibrosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Collins, F.S.; Drumm, M.L.; Cole, J.L.
1987-02-27
In many genetic disorders, the responsible gene and its protein product are unknown. The technique known as reverse genetics, in which chromosomal map positions and genetically linked DNA markers are used to identify and clone such genes, is complicated by the fact that the molecular distances from the closest DNA markers to the gene itself are often too large to traverse by standard cloning techniques. To address this situation, a general human chromosome jumping library was constructed that allows the cloning of DNA sequences approximately 100 kilobases away from any starting point in genomic DNA. As an illustration of its usefulness, this library was searched for a jumping clone, starting at the met oncogene, which is a marker tightly linked to the cystic fibrosis gene that is located on human chromosome 7. Mapping of the new genomic fragment by pulsed field gel electrophoresis confirmed that it resides on chromosome 7 within 240 kilobases downstream of the met gene. The use of chromosome jumping should be applicable to any genetic locus for which a closely linked DNA marker is available.
SAGE analysis of early oogenesis in the silkworm, Bombyx mori.
Funaguma, Shunsuke; Hashimoto, Shin-ichi; Suzuki, Yutaka; Omuro, Naoko; Sugano, Sumio; Mita, Kazuei; Katsuma, Susumu; Shimada, Toru
2007-02-01
To identify genes involved in the differentiation of Bombyx cystoblast, we constructed two 3' long serial analysis of gene expression (Long SAGE) libraries from stage 1-3 or stage 2-3 egg chambers and compared their gene expression profiles. In both libraries, the most frequent tags were derived from the same novel transcript. The transcript does not have any open reading frame capable of encoding a protein with over 100 amino acids in length. RNA blot analysis revealed that this transcript is specifically and abundantly expressed in the Bombyx ovary, mainly the germ line cells in the ovarioles. These results suggest that Bombyx oogenesis may be regulated by a previously unidentified non-coding RNA. Comparison of the gene expression profiles between the stage 1-3 and stage 2-3 egg chamber libraries revealed that 272 tags were significantly more abundant in stage 1-3 egg chambers (p<0.05 and at least two-fold change) than in library 2. Among the differentially expressed transcripts were the sequences that correspond to ATP synthase subunit d (3.1-fold enriched) and ATP synthase coupling factor 6 (9.1-fold enriched), suggesting that they are involved in regulation of cell cycle of cystocytes.
Nitrous Oxide Reductase (nosZ) Gene Fragments Differ between Native and Cultivated Michigan Soils
Stres, Blaž; Mahne, Ivan; Avguštin, Gorazd; Tiedje, James M.
2004-01-01
The effect of standard agricultural management on the genetic heterogeneity of nitrous oxide reductase (nosZ) fragments from denitrifying prokaryotes in native and cultivated soil was explored. Thirty-six soil cores were composited from each of the two soil management conditions. nosZ gene fragments were amplified from triplicate samples, and PCR products were cloned and screened by restriction fragment length polymorphism (RFLP). The total nosZ RFLP profiles increased in similarity with soil sample size until triplicate 3-g samples produced visually identical RFLP profiles for each treatment. Large differences in total nosZ profiles were observed between the native and cultivated soils. The fragments representing major groups of clones encountered at least twice and four randomly selected clones with unique RFLP patterns were sequenced to verify nosZ identity. The sequence diversity of nosZ clones from the cultivated field was higher, and only eight patterns were found in clone libraries from both soils among the 182 distinct nosZ RFLP patterns identified from the two soils. A group of clones that comprised 32% of all clones dominated the gene library of native soil, whereas many minor groups were observed in the gene library of cultivated soil. The 95% confidence intervals of the Chao1 nonparametric richness estimator for nosZ RFLP data did not overlap, indicating that the levels of species richness are significantly different in the two soils, the cultivated soil having higher diversity. Phylogenetic analysis of deduced amino acid sequences grouped the majority of nosZ clones into an interleaved Michigan soil cluster whose cultured members are α-Proteobacteria. Only four nosZ sequences from cultivated soil and one from the native soil were related to sequences found in γ-Proteobacteria. Sequences from the native field formed a distinct, closely related cluster (Dmean = 0.16) containing 91.6% of the native clones. 
Clones from the cultivated field were more distantly related to each other (Dmean = 0.26), and 65% were found outside of the cluster from the native soil, further indicating a difference in the two communities. Overall, there appears to be a relationship between use and richness, diversity, and the phylogenetic position of nosZ sequences, indicating that agricultural use of soil caused a shift to a more diverse denitrifying community. PMID:14711656
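For readers unfamiliar with the Chao1 richness estimator mentioned in this abstract, the following is a minimal sketch of the classic point estimate, S_obs + F1^2 / (2 F2), where F1 and F2 are the numbers of patterns seen exactly once and exactly twice. The clone counts are invented for illustration, and the confidence-interval calculation used in the study is not reproduced here.

```python
def chao1(pattern_counts):
    """Chao1 point estimate of richness from per-pattern clone counts.

    pattern_counts: number of clones observed for each distinct
    RFLP pattern (hypothetical input; zeros are ignored).
    """
    counts = [c for c in pattern_counts if c > 0]
    s_obs = len(counts)                       # observed patterns
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used when no doubletons were observed.
    return s_obs + f1 * (f1 - 1) / 2


# Toy library: 8 observed patterns, 4 singletons, 2 doubletons.
estimate = chao1([10, 3, 2, 2, 1, 1, 1, 1])
print(estimate)  # 8 + 16 / 4 = 12.0
```

A library dominated by one large group and few singletons yields an estimate close to the observed count, whereas many rare patterns push the estimate upward, which is consistent with the contrast the abstract draws between the native and cultivated soils.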
Mapping genes to human chromosome 19
DOE Office of Scientific and Technical Information (OSTI.GOV)
Connolly, Sarah
1996-05-01
For this project, 22 Expressed Sequence Tags (ESTs) were fine mapped to regions of human chromosome 19. An EST is a short DNA sequence that occurs once in the genome and corresponds to a single expressed gene. ³²P-radiolabeled probes were made by polymerase chain reaction for each EST and hybridized to filters containing a chromosome 19-specific cosmid library. The location of the ESTs on the chromosome was determined by the location of the ordered cosmid to which the EST hybridized. Of the 22 ESTs that were sublocalized, 6 correspond to known genes, and 16 correspond to anonymous genes. These localized ESTs may serve as potential candidates for disease genes, as well as markers for future physical mapping.
Knietsch, Anja; Waschkowitz, Tanja; Bowien, Susanne; Henne, Anke; Daniel, Rolf
2003-01-01
Metagenomic DNA libraries from three different soil samples (meadow, sugar beet field, cropland) were constructed. The three unamplified libraries comprised approximately 1267000 independent clones and harbored approximately 4.05 Gbp of environmental DNA. Approximately 300000 recombinant Escherichia coli strains of each library per test substrate were screened for the production of carbonyls from short-chain (C2 to C4) polyols such as 1,2-ethanediol, 2,3-butanediol, and a mixture of glycerol and 1,2-propanediol on indicator agar. Twenty-four positive E. coli clones were obtained during the initial screen. Fifteen of them contained recombinant plasmids, designated pAK201-215, which conferred a stable carbonyl-forming phenotype on E. coli. Sequencing revealed that the inserts of pAK201-215 encoded 26 complete and 14 incomplete predicted protein-encoding genes. Most of these genes were similar to genes with unknown functions from other microorganisms or unrelated to any other known gene. Further analysis focused on the 7 plasmids (pAK204, pAK206, pAK208, and pAK210-213) recovered from the positive clones that exhibited an NAD(H)-dependent alcohol oxidoreductase activity with polyols or the corresponding carbonyls as substrates in crude extracts. Three genes (ORF6, ORF24, and ORF25) conferring this activity were identified during subcloning of the inserts of pAK204, pAK211, and pAK212. The sequences of the three deduced gene products revealed no significant similarities to known alcohol oxidoreductases, but contained putative glycine-rich regions, which are characteristic for binding of nicotinamide cofactors. Copyright 2003 S. Karger AG, Basel
Chun, Carlene K; Scheetz, Todd E; Bonaldo, Maria de Fatima; Brown, Bartley; Clemens, Anik; Crookes-Goodson, Wendy J; Crouch, Keith; DeMartini, Tad; Eyestone, Mari; Goodson, Michael S; Janssens, Bernadette; Kimbell, Jennifer L; Koropatnick, Tanya A; Kucaba, Tamara; Smith, Christina; Stewart, Jennifer J; Tong, Deyan; Troll, Joshua V; Webster, Sarahrose; Winhall-Rice, Jane; Yap, Cory; Casavant, Thomas L; McFall-Ngai, Margaret J; Soares, M Bento
2006-01-01
Background Biologists are becoming increasingly aware that the interaction of animals, including humans, with their coevolved bacterial partners is essential for health. This growing awareness has been a driving force for the development of models for the study of beneficial animal-bacterial interactions. In the squid-vibrio model, symbiotic Vibrio fischeri induce dramatic developmental changes in the light organ of host Euprymna scolopes over the first hours to days of their partnership. We report here the creation of a juvenile light-organ specific EST database. Results We generated eleven cDNA libraries from the light organ of E. scolopes at developmentally significant time points with and without colonization by V. fischeri. Single pass 3' sequencing efforts generated 42,564 expressed sequence tags (ESTs) of which 35,421 passed our quality criteria and were then clustered via the UIcluster program into 13,962 nonredundant sequences. The cDNA clones representing these nonredundant sequences were sequenced from the 5' end of the vector and 58% of these resulting sequences overlapped significantly with the associated 3' sequence to generate 8,067 contigs with an average sequence length of 1,065 bp. All sequences were annotated with BLASTX (E-value < 1e-03) and Gene Ontology (GO). Conclusion Both the number of ESTs generated from each library and GO categorizations are reflective of the activity state of the light organ during these early stages of symbiosis. Future analyses of the sequences identified in these libraries promise to provide valuable information not only about pathways involved in colonization and early development of the squid light organ, but also about pathways conserved in response to bacterial colonization across the animal kingdom. PMID:16780587
Identification of genes from the Treacher Collins candidate region
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dixon, M.; Dixon, J.; Edwards, S.
Treacher Collins syndrome (TCOF1) is an autosomal dominant disorder of craniofacial development. The TCOF1 locus has previously been mapped to chromosome 5q32-33. The candidate gene region has been defined as being between two flanking markers, ribosomal protein S14 (RPS14) and Annexin 6 (ANX6), by analyzing recombination events in affected individuals. It is estimated that the distance between these flanking markers is 500 kb by three separate analysis methods: (1) radiation hybrid mapping; (2) genetic linkage; and (3) YAC contig analysis. A cosmid contig which spans the candidate gene region for TCOF1 has been constructed by screening the Los Alamos National Laboratory flow-sorted chromosome 5 cosmid library. Cosmids were obtained by using a combination of probes generated from YAC end clones, Alu-PCR fragments from YACs, and asymmetric PCR fragments from both T7 and T3 cosmid ends. Exon amplification, the selection of genomic coding sequences based upon the presence of functional splice acceptor and donor sites, was used to identify potential exon sequences. Sequences found to be conserved between species were then used to screen cDNA libraries in order to identify candidate genes. To date, four different cDNAs have been isolated from this region and are being analyzed as potential candidate genes for TCOF1. These include the genes encoding plasma glutathione peroxidase (GPX3), heparin sulfate sulfotransferase (HSST), a gene with homology to the ETS family of proteins and one which shows no homology to any known genes. Work is also in progress to identify and characterize additional cDNAs from the candidate gene region.
Rosconi, Federico; de Vries, Stefan P W; Baig, Abiyad; Fabiano, Elena; Grant, Andrew J
2016-11-15
The interior of plants contains microorganisms (referred to as endophytes) that are distinct from those present at the root surface or in the surrounding soil. Herbaspirillum seropedicae strain SmR1, belonging to the betaproteobacteria, is an endophyte that colonizes crops, including rice, maize, sugarcane, and sorghum. Different approaches have revealed genes and pathways regulated during the interactions of H. seropedicae with its plant hosts. However, functional genomic analysis of transposon (Tn) mutants has been hampered by the lack of genetic tools. Here we successfully employed a combination of in vivo high-density mariner Tn mutagenesis and targeted Tn insertion site sequencing (Tn-seq) in H. seropedicae SmR1. The analysis of multiple gene-saturating Tn libraries revealed that 395 genes are essential for the growth of H. seropedicae SmR1 in tryptone-yeast extract medium. A comparative analysis with the Database of Essential Genes (DEG) showed that 25 genes are uniquely essential in H. seropedicae SmR1. The Tn mutagenesis protocol developed and the gene-saturating Tn libraries generated will facilitate elucidation of the genetic mechanisms of the H. seropedicae endophytic lifestyle. A focal point in the study of endophytes is the development of effective biofertilizers that could help to reduce the input of agrochemicals in croplands. Besides the ability to promote plant growth, a good biofertilizer should be successful in colonizing its host and competing against the native microbiota. By using a systematic Tn-based gene-inactivation strategy and massively parallel sequencing of Tn insertion sites (Tn-seq), it is possible to study the fitness of thousands of Tn mutants in a single experiment. We have applied the combination of these techniques to the plant-growth-promoting endophyte Herbaspirillum seropedicae SmR1. The Tn mutant libraries generated will enable studies into the genetic mechanisms of H. seropedicae-plant interactions. 
The approach that we have taken is applicable to other plant-interacting bacteria. Copyright © 2016 Rosconi et al.
A framework linkage map of perennial ryegrass based on SSR markers
G.P. Gill; P.L. Wilcox; D.J. Whittaker; R.A. Winz; P. Bickerstaff; Craig E. Echt; J. Kent; M.O. Humphreys; K.M. Elborough; R.C. Gardner
2006-01-01
A moderate-density linkage map for Lolium perenne L. has been constructed based on 376 simple sequence repeat (SSR) markers. Approximately one third (124) of the SSR markers were developed from GeneThresher libraries that preferentially select genomic DNA clones from the gene-rich unmethylated portion of the genome. The remaining SSR marker loci...
USDA-ARS's Scientific Manuscript database
Oocyte-specific genes play critical roles in oogenesis, folliculogenesis and early embryonic development. Through analysis of expressed sequence tags (ESTs) from a rainbow trout oocyte cDNA library, we identified a novel transcript which is represented by multiple ESTs derived only from the oocyte c...
Karaevskaia, E S; Demchenko, L S; Demidov, N É; Rivkina, E M; Bulat, S A; Gilichinskiĭ, D A
2014-01-01
Archaeal communities of permafrost deposits of King George Island and Bunger Hills Oasis (Antarctica) differing in the content of biogenic methane were analyzed using clone libraries of two 16S rRNA gene regions. Phylotypes belonging to methanogenic archaea were identified in all horizons.
2014-01-01
Background Coconut (Cocos nucifera L.) is one of the world’s most versatile, economically important tropical crops. Little is known about the physiological and molecular basis of coconut pulp (endosperm) development and only a few coconut genes and gene product sequences are available in public databases. This study identified genes that were differentially expressed during development of coconut pulp and functionally annotated these identified genes using bioinformatics analysis. Results Pulp from three different coconut developmental stages was collected. Four suppression subtractive hybridization (SSH) libraries were constructed (forward and reverse libraries A and B between stages 1 and 2, and C and D between stages 2 and 3), and identified sequences were computationally annotated using Blast2GO software. A total of 1272 clones were obtained for analysis from four SSH libraries with 63% showing similarity to known proteins. Pairwise comparing of stage-specific gene ontology ids from libraries B-D, A-C, B-C and A-D showed that 32 genes were continuously upregulated and seven downregulated; 28 were transiently upregulated and 23 downregulated. KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis showed that 1-acyl-sn-glycerol-3-phosphate acyltransferase (LPAAT), phospholipase D, acetyl-CoA carboxylase carboxyltransferase beta subunit, 3-hydroxyisobutyryl-CoA hydrolase-like and pyruvate dehydrogenase E1 β subunit were associated with fatty acid biosynthesis or metabolism. Triose phosphate isomerase, cellulose synthase and glucan 1,3-β-glucosidase were related to carbohydrate metabolism, and phosphoenolpyruvate carboxylase was related to both fatty acid and carbohydrate metabolism. 
Of 737 unigenes, 103 encoded enzymes were involved in fatty acid and carbohydrate biosynthesis and metabolism, and a number of transcription factors and other interesting genes with stage-specific expression were confirmed by real-time PCR, with validation of the SSH results as high as 66.6%. Based on determination of coconut endosperm fatty acids content by gas chromatography–mass spectrometry, a number of candidate genes in fatty acid anabolism were selected for further study. Conclusion Functional annotation of genes differentially expressed in coconut pulp development helped determine the molecular basis of coconut endosperm development. The SSH method identified genes related to fatty acids, carbohydrate and secondary metabolites. The results will be important for understanding gene functions and regulatory networks in coconut fruit. PMID:25084812
Liang, Yuanxue; Yuan, Yijun; Liu, Tao; Mao, Wei; Zheng, Yusheng; Li, Dongdong
2014-08-02
Coconut (Cocos nucifera L.) is one of the world's most versatile, economically important tropical crops. Little is known about the physiological and molecular basis of coconut pulp (endosperm) development and only a few coconut genes and gene product sequences are available in public databases. This study identified genes that were differentially expressed during development of coconut pulp and functionally annotated these identified genes using bioinformatics analysis. Pulp from three different coconut developmental stages was collected. Four suppression subtractive hybridization (SSH) libraries were constructed (forward and reverse libraries A and B between stages 1 and 2, and C and D between stages 2 and 3), and identified sequences were computationally annotated using Blast2GO software. A total of 1272 clones were obtained for analysis from four SSH libraries with 63% showing similarity to known proteins. Pairwise comparing of stage-specific gene ontology ids from libraries B-D, A-C, B-C and A-D showed that 32 genes were continuously upregulated and seven downregulated; 28 were transiently upregulated and 23 downregulated. KEGG (Kyoto Encyclopedia of Genes and Genomes) analysis showed that 1-acyl-sn-glycerol-3-phosphate acyltransferase (LPAAT), phospholipase D, acetyl-CoA carboxylase carboxyltransferase beta subunit, 3-hydroxyisobutyryl-CoA hydrolase-like and pyruvate dehydrogenase E1 β subunit were associated with fatty acid biosynthesis or metabolism. Triose phosphate isomerase, cellulose synthase and glucan 1,3-β-glucosidase were related to carbohydrate metabolism, and phosphoenolpyruvate carboxylase was related to both fatty acid and carbohydrate metabolism. Of 737 unigenes, 103 encoded enzymes were involved in fatty acid and carbohydrate biosynthesis and metabolism, and a number of transcription factors and other interesting genes with stage-specific expression were confirmed by real-time PCR, with validation of the SSH results as high as 66.6%. 
Based on determination of coconut endosperm fatty acids content by gas chromatography-mass spectrometry, a number of candidate genes in fatty acid anabolism were selected for further study. Functional annotation of genes differentially expressed in coconut pulp development helped determine the molecular basis of coconut endosperm development. The SSH method identified genes related to fatty acids, carbohydrate and secondary metabolites. The results will be important for understanding gene functions and regulatory networks in coconut fruit.
Xuxia, Wang; Jie, Chen; Bo, Wang; Lijun, Liu; Hui, Jiang; Diluo, Tang; Dingxiang, Peng
2012-01-01
For the purpose of screening putative anthracnose resistance-related genes of ramie (Boehmeria nivea L. Gaud), a cDNA library was constructed by suppression subtractive hybridization using anthracnose-resistant cultivar Huazhu no. 4. The cDNAs from Huazhu no. 4, which were infected with Colletotrichum gloeosporioides, were used as the tester and cDNAs from uninfected Huazhu no. 4 as the driver. Sequencing analysis and homology searching showed that these clones represented 132 single genes, which were assigned to functional categories, including 14 putative cellular functions, according to categories established for Arabidopsis. These 132 genes included 35 disease resistance and stress tolerance-related genes including putative heat-shock protein 90, metallothionein, PR-1.2 protein, catalase gene, WRKY family genes, and proteinase inhibitor-like protein. Partial disease-related genes were further analyzed by reverse transcription PCR and RNA gel blot. These expressed sequence tags are the first anthracnose resistance-related expressed sequence tags reported in ramie.
Wang, Zhong-dong; Wu, Ji-nan; Zhou, Lin; Ling, Jun-qi; Guo, Xi-min; Xiao, Ming-zhen; Zhu, Feng; Pu, Qin; Chai, Yu-bo; Zhao, Zhong-liang
2007-02-01
To study the biological properties of human dental pulp cells (HDPC) by cloning and analysis of genes differentially expressed in HDPC in comparison with human gingival fibroblasts (HGF). HDPC and HGF were cultured and identified by immunocytochemistry. An HDPC and HGF subtractive cDNA library was established by PCR-based modified subtractive hybridization; genes differentially expressed by HDPC were cloned, sequenced and compared by BLAST to find homologous sequences in GenBank. Cloning and sequencing analysis indicated that 12 differentially expressed genes were obtained, of which two were unknown genes. Among the 10 known genes, 4 were related to signal transduction, 2 were related to trans-membrane transportation (both cell membrane and nuclear membrane), and 2 were related to RNA splicing mechanisms. The biological properties of HDPC are determined by the differential expression of some genes, and the growth and differentiation of HDPC are associated with the dynamic protein synthesis and secretion activities of the cell.
2013-01-01
Background Soybean is an important crop that provides valuable proteins and oils for human use. Because soybean growth and development is extremely sensitive to water deficit, quality and crop yields are severely impacted by drought stress. In the face of limited water resources, drought-responsive genes are therefore of interest. Identification and analysis of dehydration- and rehydration-inducible differentially expressed genes (DEGs) would not only aid elucidation of molecular mechanisms of stress response, but also enable improvement of crop stress tolerance via gene transfer. Using Digital Gene Expression Tag profiling (DGE), a new technique based on Illumina sequencing, we analyzed expression profiles between two soybean genotypes to identify drought-responsive genes. Results Two soybean genotypes—drought-tolerant Jindou21 and drought-sensitive Zhongdou33—were subjected to dehydration and rehydration conditions. For analysis of DEGs under dehydration conditions, 20 cDNA libraries were generated from roots and leaves at two different time points under well-watered and dehydration conditions. We also generated eight libraries for analysis under rehydration conditions. Sequencing of the 28 libraries produced 25,000–33,000 unambiguous tags, which were mapped to reference sequences for annotation of expressed genes. Many genes exhibited significant expression differences among the libraries. DEGs in the drought-tolerant genotype were identified by comparison of DEGs among treatments and genotypes. In Jindou21, 518 and 614 genes were differentially expressed under dehydration in leaves and roots, respectively, with 24 identified both in leaves and roots. The main functional categories enriched in these DEGs were metabolic process, response to stresses, plant hormone signal transduction, protein processing, and plant-pathogen interaction pathway; the associated genes primarily encoded transcription factors, protein kinases, and other regulatory proteins. 
The seven most significantly expressed (|log2 ratio| ≥ 8) genes— Glyma15g03920, Glyma05g02470, Glyma15g15010, Glyma05g09070, Glyma06g35630, Glyma08g12590, and Glyma11g16000—are more likely to determine drought stress tolerance. The expression patterns of eight randomly-selected genes were confirmed by quantitative RT-PCR; the results of QRT-PCR analysis agreed with transcriptional profile data for 96 out of 128 (75%) data points. Conclusions Many soybean genes were differentially expressed between drought-tolerant and drought-sensitive genotypes. Based on GO functional annotation and pathway enrichment analysis, some of these genes encoded transcription factors, protein kinases, and other regulatory proteins. The seven most significant DEGs are candidates for improving soybean drought tolerance. These findings will be helpful for analysis and elucidation of molecular mechanisms of drought tolerance; they also provide a basis for cultivating new varieties of drought-tolerant soybean. PMID:24093224
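The |log2 ratio| ≥ 8 cut-off quoted in this abstract can be made concrete with a small sketch. It assumes normalized tag counts for a treated and a control library and adds a pseudocount of 1 to avoid division by zero; the pseudocount, the function names and the gene labels (`geneA` to `geneC`) are illustrative assumptions rather than details of the published analysis.

```python
import math


def log2_ratio(treated, control, pseudocount=1.0):
    """log2 fold change between two normalized tag counts."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def strongly_changed(profiles, threshold=8.0):
    """Return gene labels whose |log2 ratio| meets the threshold.

    profiles: dict mapping gene label -> (treated_count, control_count).
    """
    return sorted(
        gene
        for gene, (treated, control) in profiles.items()
        if abs(log2_ratio(treated, control)) >= threshold
    )


# Hypothetical counts: geneA is strongly induced, geneC strongly repressed.
profiles = {
    "geneA": (511, 1),   # log2(512 / 2)  =  8.0
    "geneB": (40, 10),   # log2(41 / 11) ~=  1.9
    "geneC": (0, 255),   # log2(1 / 256)  = -8.0
}
selected = strongly_changed(profiles)
print(selected)
```

A threshold of 8 on the log2 scale corresponds to a 256-fold difference, which is why only a handful of genes would be expected to pass it.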
Kasi, Devi; Catherine, Christy; Lee, Seung-Won; Lee, Kyung-Ho; Kim, Yu Jung; Ro Lee, Myeong; Ju, Jung Won; Kim, Dong-Myung
2017-05-01
The rapidly evolving cloning and sequencing technologies have enabled understanding of the genomic structure of parasite genomes, opening up new ways of combatting parasite-related diseases. To make the most of the exponentially accumulating genomic data, however, it is crucial to analyze the proteins encoded by these genomic sequences. In this study, we adopted an engineered cell-free protein synthesis system for large-scale expression screening of an expressed sequence tag (EST) library of Clonorchis sinensis to identify potential antigens that can be used for diagnosis and treatment of clonorchiasis. To allow high-throughput expression and identification of individual genes comprising the library, a cell-free synthesis reaction was designed such that both the template DNA and the expressed proteins were co-immobilized on the same microbeads, leading to microbead-based linkage of the genotype and phenotype. This reaction configuration allowed streamlined expression, recovery, and analysis of proteins. This approach enabled us to identify 21 antigenic proteins. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:832-837, 2017.
Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.
Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción
2016-02-27
In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies.
A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.
Camanocha, Anuj; Dewhirst, Floyd E.
2014-01-01
Background and objective In addition to the well-known phyla Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, Spirochaetes, Fusobacteria, Tenericutes, and Chlamydiae, the oral microbiomes of mammals contain species from the lesser-known phyla or candidate divisions, including Synergistetes, TM7, Chlorobi, Chloroflexi, GN02, SR1, and WPS-2. The objectives of this study were to create phyla-selective 16S rDNA PCR primer pairs, create selective 16S rDNA clone libraries, identify novel oral taxa, and update canine and human oral microbiome databases. Design 16S rRNA gene sequences for members of the lesser-known phyla were downloaded from GenBank and Greengenes databases and aligned with sequences in our RNA databases. Primers with potential phylum level selectivity were designed heuristically with the goal of producing nearly full-length 16S rDNA amplicons. The specificity of primer pairs was examined by making clone libraries from PCR amplicons and determining phyla identity by BLASTN analysis. Results Phylum-selective primer pairs were identified that allowed construction of clone libraries with 96–100% specificity for each of the lesser-known phyla. From these clone libraries, seven human and two canine novel oral taxa were identified and added to their respective taxonomic databases. For each phylum, genome sequences closest to human oral taxa were identified and added to the Human Oral Microbiome Database to facilitate metagenomic, transcriptomic, and proteomic studies that involve tiling sequences to the most closely related taxon. While examining ribosomal operons in lesser-known phyla from single-cell genomes and metagenomes, we identified a novel rRNA operon order (23S-5S-16S) in three SR1 genomes and the splitting of the 23S rRNA gene by an I-CeuI-like homing endonuclease in a WPS-2 genome. Conclusions This study developed useful primer pairs for making phylum-selective 16S rRNA clone libraries.
Phylum-specific libraries were shown to be useful for identifying previously unrecognized taxa in lesser-known phyla and would be useful for future environmental and host-associated studies. PMID:25317252
Mammalian cDNA Library from the NIH Mammalian Gene Collection (MGC) | Office of Cancer Genomics
The MGC provides the research community full-length clones for most of the defined (as of 2006) human and mouse genes, along with selected clones of cow and rat genes. Clones were designed to allow easy transfer of the ORF sequences into nearly any type of expression vector. MGC provides protein ‘expression-ready’ clones for each of the included human genes. MGC is part of the ORFeome Collaboration (OC).
Gao, Lihai; Lin, Weitie
2011-01-01
To study the diversity of ammonia-oxidizing bacteria (AOB) and ammonia-oxidizing archaea (AOA) in shrimp farm sediment, total microbial DNA was extracted directly from the sediment. Clone libraries of amoA genes were constructed with beta-proteobacterial AOB-specific and AOA-specific primers. The libraries were screened by PCR-restriction fragment length polymorphism (RFLP) analysis, and clones with unique RFLP patterns were sequenced. Phylogenetic analyses of the amoA gene fragments showed that all AOB sequences from shrimp farm sediment were affiliated with Nitrosomonas (61.54%) or Nitrosomonas-like (38.46%) species and grouped into the Nitrosomonas communis cluster, the Nitrosomonas sp. Nm148 cluster, and the Nitrosomonas oligotropha cluster. All AOA sequences belonged to the kingdom Crenarchaeota, except for one operational taxonomic unit (OTU) that was unclassified Archaea and fell within cluster S (soil origin). The AOB and AOA communities comprised 13 and 9 OTUs, respectively. The clone coverage of bacterial and archaeal amoA genes was 73.47% and 90.43%, respectively. The Shannon-Wiener index, Evenness index, Simpson index and Richness index of AOB were higher than those of AOA. These findings represent the first detailed examination of archaeal amoA diversity in shrimp farm sediment and demonstrate that diverse communities of Crenarchaeota capable of ammonia oxidation are present within shrimp farm sediment, where they may be actively involved in nitrification.
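The abstract names several diversity statistics without giving formulas. The Python sketch below shows one common way such indices are computed from clone counts per OTU. The formulas chosen (Shannon-Wiener with natural logarithm, Gini-Simpson, Pielou evenness, and Good's coverage based on singleton OTUs) are assumptions and may differ from the definitions the authors used, and the clone counts are hypothetical.

```python
import math

def diversity_indices(otu_counts):
    """Return common diversity statistics for a list of clone counts per OTU.

    Formulas are assumed textbook definitions, not taken from the study.
    """
    counts = [c for c in otu_counts if c > 0]
    n_clones = sum(counts)
    n_otus = len(counts)
    props = [c / n_clones for c in counts]

    shannon = -sum(p * math.log(p) for p in props)    # H' = -sum(p ln p)
    simpson = 1.0 - sum(p * p for p in props)         # Gini-Simpson, 1 - sum(p^2)
    evenness = shannon / math.log(n_otus) if n_otus > 1 else 0.0  # Pielou J'
    singletons = sum(1 for c in counts if c == 1)
    coverage = 100.0 * (1.0 - singletons / n_clones)  # Good's coverage, percent

    return {
        "richness": n_otus,
        "shannon": shannon,
        "simpson": simpson,
        "evenness": evenness,
        "coverage_percent": coverage,
    }

# Hypothetical library: 8 OTUs, 40 clones, two singleton OTUs.
example = diversity_indices([12, 9, 7, 5, 3, 2, 1, 1])
for name, value in example.items():
    print(name, round(value, 3))
```

Coverage close to 100% suggests that further sampling of the library would reveal few additional OTUs, which is how the two reported coverage values are usually read.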
2010-01-01
Background Cutaneous mycoses are common human infections among healthy and immunocompromised hosts, and the anthropophilic fungus Trichophyton rubrum is the most prevalent microorganism isolated from such clinical cases worldwide. The aim of this study was to determine the transcriptional profile of T. rubrum exposed to various stimuli in order to obtain insights into the responses of this pathogen to different environmental challenges. Therefore, we generated an expressed sequence tag (EST) collection by constructing one cDNA library and nine suppression subtractive hybridization libraries. Results The 1388 unigenes identified in this study were functionally classified based on the Munich Information Center for Protein Sequences (MIPS) categories. The identified proteins were involved in transcriptional regulation, cellular defense and stress, protein degradation, signaling, transport, and secretion, among other functions. Analysis of these unigenes revealed 575 T. rubrum sequences that had not been previously deposited in public databases. Conclusion In this study, we identified novel T. rubrum genes that will be useful for ORF prediction in genome sequencing and facilitating functional genome analysis. Annotation of these expressed genes revealed metabolic adaptations of T. rubrum to carbon sources, ambient pH shifts, and various antifungal drugs used in medical practice. Furthermore, challenging T. rubrum with cytotoxic drugs and ambient pH shifts extended our understanding of the molecular events possibly involved in the infectious process and resistance to antifungal drugs. PMID:20144196
2011-01-01
Background Lupinus angustifolius L, also known as narrow-leafed lupin (NLL), is becoming an important grain legume crop that is valuable for sustainable farming and is becoming recognised as a potential human health food. Recent interest is being directed at NLL to improve grain production, disease and pest management and health benefits of the grain. However, studies have been hindered by a lack of extensive genomic resources for the species. Results A NLL BAC library was constructed consisting of 111,360 clones with an average insert size of 99.7 Kbp from cv Tanjil. The library has approximately 12 × genome coverage. Both ends of 9600 randomly selected BAC clones were sequenced to generate 13985 BAC end-sequences (BESs), covering approximately 1% of the NLL genome. These BESs permitted a preliminary characterisation of the NLL genome such as organisation and composition, with the BESs having approximately 39% G:C content, 16.6% repetitive DNA and 5.4% putative gene-encoding regions. From the BESs 9966 simple sequence repeat (SSR) motifs were identified and some of these are shown to be potential markers. Conclusions The NLL BAC library and BAC-end sequences are powerful resources for genetic and genomic research on lupin. These resources will provide a robust platform for future high-resolution mapping, map-based cloning, comparative genomics and assembly of whole-genome sequencing data for the species. PMID:22014081
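The library statistics in this abstract are linked by simple arithmetic: coverage equals the number of clones times the mean insert size divided by the genome size. The Python sketch below back-calculates the genome size implied by the reported figures. The probability estimate uses the standard Clarke-Carbon formula, which is an added assumption; the abstract does not say it was applied.

```python
# Figures reported in the abstract.
n_clones = 111_360
mean_insert_bp = 99_700   # 99.7 Kbp
coverage = 12             # approximately 12x genome coverage

# coverage = n_clones * mean_insert_bp / genome_size, solved for genome size.
genome_size_bp = n_clones * mean_insert_bp / coverage

# Clarke-Carbon estimate (assumed here) of the chance that any given
# locus is present in at least one clone of the library.
fraction_per_clone = mean_insert_bp / genome_size_bp
p_recover = 1.0 - (1.0 - fraction_per_clone) ** n_clones

print(f"implied genome size: {genome_size_bp / 1e6:.0f} Mbp")
print(f"probability of recovering a given locus: {p_recover:.6f}")
```

Because the coverage is quoted as approximate, the implied genome size is only an estimate of the same precision.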
Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P
2008-01-01
Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry. PMID:18570660
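The extrapolation in the conclusion can be checked with a short Python sketch. The figure of 940 primer pairs is taken as given, since the abstract does not state how it was derived; the sketch only reproduces the observed polymorphism rate and the product 940 × 1.9.

```python
# Figures reported in the abstract.
tested_pairs = 33        # primer pairs tested on two cultivars
polymorphic_pairs = 10   # pairs that detected polymorphism
mean_products = 1.9      # mean polymorphic PCR products per such pair
projected_pairs = 940    # projected polymorphic SSR primer pairs (as reported)

# Share of tested primer pairs that were polymorphic.
polymorphic_rate = polymorphic_pairs / tested_pairs

# Projected number of polymorphisms across the whole library.
projected_polymorphisms = round(projected_pairs * mean_products)

print(f"polymorphic rate: {polymorphic_rate:.1%}")
print(f"projected polymorphisms: {projected_polymorphisms}")
```

The product matches the 1,786 polymorphisms quoted, and roughly 30% of the tested primer pairs were polymorphic.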
Gene Expression Differences in Infected and Noninfected Middle Ear Complementary DNA Libraries
Kerschner, Joseph E.; Horsey, Edward; Ahmed, Azad; Erbe, Christy; Khampang, Pawjai; Cioffi, Joseph; Hu, Fen Ze; Post, James Christopher; Ehrlich, Garth D.
2010-01-01
Objectives To investigate genetic differences in middle ear mucosa (MEM) with nontypeable Haemophilus influenzae (NTHi) infection. Genetic upregulation and downregulation occurs in MEM during otitis media (OM) pathogenesis. A comprehensive assessment of these genetic differences using the techniques of complementary DNA (cDNA) library creation has not been performed. Design The cDNA libraries were constructed from NTHi-infected and noninfected chinchilla MEM. Random clones were picked, sequenced bidirectionally, and submitted to the National Center for Biotechnology Information (NCBI) Expressed Sequence Tags database, where they were assigned accession numbers. These numbers were used with the basic local alignment search tool (BLAST) to align clones against the nonredundant nucleotide database at NCBI. Results Analysis with the Web-based statistical program FatiGO identified several biological processes with significant differences in numbers of represented genes. Processes involved in immune, stress, and wound responses were more prevalent in the NTHi-infected library. S100 calcium-binding protein A9 (S100A9); secretory leukoprotease inhibitor (SLPI); β2-microglobulin (B2M); ferritin, heavy-chain polypeptide 1 (FTH1); and S100 calcium-binding protein A8 (S100A8) were expressed at significantly higher levels in the NTHi-infected library. Calcium-binding proteins S100A9 and S100A8 serve as markers for inflammation and have antibacterial effects. Secretory leukoprotease inhibitor is an antibacterial protein that inhibits stimuli-induced MUC1, MUC2, and MUC5AC production. Conclusions A number of genes demonstrate changes during the pathogenesis of OM, including SLPI, which has an impact on mucin gene expression; this expression is known to be an important regulator in OM. 
The techniques described herein provide a framework for future investigations to more thoroughly understand molecular changes in the middle ear, which will likely be important in developing new therapeutic and intervention strategies. PMID:19153305
Uprobe: a genome-wide universal probe resource for comparative physical mapping in vertebrates.
Kellner, Wendy A; Sullivan, Robert T; Carlson, Brian H; Thomas, James W
2005-01-01
Interspecies comparisons are important for deciphering the functional content and evolution of genomes. The expansive array of >70 public vertebrate genomic bacterial artificial chromosome (BAC) libraries can provide a means of comparative mapping, sequencing, and functional analysis of targeted chromosomal segments that is independent and complementary to whole-genome sequencing. However, at the present time, no complementary resource exists for the efficient targeted physical mapping of the majority of these BAC libraries. Universal overgo-hybridization probes, designed from regions of sequenced genomes that are highly conserved between species, have been demonstrated to be an effective resource for the isolation of orthologous regions from multiple BAC libraries in parallel. Here we report the application of the universal probe design principle across entire genomes, and the subsequent creation of a complementary probe resource, Uprobe, for screening vertebrate BAC libraries. Uprobe currently consists of whole-genome sets of universal overgo-hybridization probes designed for screening mammalian or avian/reptilian libraries. Retrospective analysis, experimental validation of the probe design process on a panel of representative BAC libraries, and estimates of probe coverage across the genome indicate that the majority of all eutherian and avian/reptilian genes or regions of interest can be isolated using Uprobe. Future implementation of the universal probe design strategy will be used to create an expanded number of whole-genome probe sets that will encompass all vertebrate genomes.
Modeling Structure-Function Relationships in Synthetic DNA Sequences using Attribute Grammars
Cai, Yizhi; Lux, Matthew W.; Adam, Laura; Peccoud, Jean
2009-01-01
Recognizing that certain biological functions can be associated with specific DNA sequences has led various fields of biology to adopt the notion of the genetic part. This concept provides a finer level of granularity than the traditional notion of the gene. However, a method of formally relating how a set of parts relates to a function has not yet emerged. Synthetic biology both demands such a formalism and provides an ideal setting for testing hypotheses about relationships between DNA sequences and phenotypes beyond the gene-centric methods used in genetics. Attribute grammars are used in computer science to translate the text of a program source code into the computational operations it represents. By associating attributes with parts, modifying the value of these attributes using rules that describe the structure of DNA sequences, and using a multi-pass compilation process, it is possible to translate DNA sequences into molecular interaction network models. These capabilities are illustrated by simple example grammars expressing how gene expression rates are dependent upon single or multiple parts. The translation process is validated by systematically generating, translating, and simulating the phenotype of all the sequences in the design space generated by a small library of genetic parts. Attribute grammars represent a flexible framework connecting parts with models of biological function. They will be instrumental for building mathematical models of libraries of genetic constructs synthesized to characterize the function of genetic parts. This formalism is also expected to provide a solid foundation for the development of computer assisted design applications for synthetic biology. PMID:19816554
USDA-ARS's Scientific Manuscript database
About 447 million RNA-Seq sequences were generated from 40 RNA libraries covering 8 different berry developmental stages of table grape ‘Kyoho’ and its early ripening bud mutant ‘Fengzao’. These sequences were mapped to 23,178 and 22,982 genes in the flesh and peel tissues, respectively. While m...
Fan, Qing-Jie; Yan, Feng-Xia; Qiao, Guang; Zhang, Bing-Xue; Wen, Xiao-Peng
2014-01-01
Drought is one of the most severe threats to the growth, development and yield of plants. To unravel the molecular basis underlying the high tolerance of pitaya (Hylocereus undatus) to drought stress, suppression subtractive hybridization (SSH) and cDNA microarray approaches were first combined to identify potentially important or novel genes involved in the plant's responses to drought stress. The forward (drought over drought-free) and reverse (drought-free over drought) suppression subtractive cDNA libraries were constructed using in vitro shoots of cultivar 'Zihonglong' exposed to drought stress and drought-free (control) conditions. A total of 2112 clones, half from the forward and half from the reverse SSH library, were randomly picked to construct a pitaya cDNA microarray. Microarray analysis was carried out to verify the expression fluctuations of this set of clones upon drought treatment compared with the controls. A total of 309 expressed sequence tags (ESTs), 153 from the forward library and 156 from the reverse library, were obtained, and 138 unique ESTs were identified after sequencing by clustering and BLAST analyses, which included genes that had been previously reported as responsive to water stress as well as some functionally unknown genes. Thirty-six genes were mapped to 47 KEGG pathways, including carbohydrate metabolism, lipid metabolism, energy metabolism, nucleotide metabolism, and amino acid metabolism of pitaya. Expression analysis of the selected ESTs by reverse transcriptase polymerase chain reaction (RT-PCR) corroborated the results of differential screening. Moreover, time-course expression patterns of these selected ESTs further confirmed that they were closely responsive to drought treatment. Among the differentially expressed genes (DEGs), many are related to stress tolerances including drought tolerance.
Thus, drought tolerance in this pitaya genotype is a very complex physiological and biochemical process in which multiple metabolic pathways and many genes are implicated. The data gained herein provide insight into the mechanism underlying the drought stress tolerance of pitaya and may facilitate the screening of candidate genes for drought tolerance. © 2013 Elsevier B.V. All rights reserved.
Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag
2015-01-01
Microbial enzyme diversity is a key to understand many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach in large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity
Gerth, Michael; Hurst, Gregory D.D.
2017-01-01
High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593
Isolation and expression of a Bacillus cereus gene encoding benzil reductase.
Maruyama, R; Nishizawa, M; Itoi, Y; Ito, S; Inoue, M
2001-12-20
Benzil was reduced stereospecifically to (S)-benzoin by Bacillus cereus strain Tim-r01. To isolate the gene responsible for asymmetric reduction, we constructed a library consisting of Escherichia coli clones that harbored plasmids expressing Bacillus cereus genes. The library was screened using the halo formation assay, and one clone showed benzil reduction to (S)-benzoin. Thus, this clone seemed to carry a plasmid encoding a Bacillus cereus benzil reductase. The deduced amino acid sequence had marked homologies to the Bacillus subtilis yueD protein (41% identity), the yeast open reading frame YIR036C protein (31%), and the mammalian sepiapterin reductases (28% to 30%), suggesting that benzil reductase is a novel short-chain dehydrogenase/reductase. Copyright 2001 John Wiley & Sons, Inc.
Ng, C Y; Wickneswari, R; Choong, C Y
2014-08-07
Calamus palustris Griff. is an economically important dioecious rattan species in Southeast Asia. However, dioecy and onset of flowering at 3-4 years old render uncertainties in desired female:male seedling ratios to establish a productive seed orchard for this rattan species. We constructed a subtractive library for male floral tissue to understand the genetic mechanism for gender determination in C. palustris. The subtractive library produced 1536 clones with 1419 clones of high quality. Reverse Northern screening showed 313 clones with differential expression, and sequence analyses clustered them into 205 unigenes, including 32 contigs and 173 singletons. The subtractive library was further validated with reverse transcription-quantitative polymerase chain reaction analysis. Homology identification classified the unigenes into 12 putative functional proteins with 83% unigenes showing significant match to proteins in databases. Functional annotations of these unigenes revealed genes involved in male flower development, including MADS-box genes, pollen-related genes, phytohormones for flower development, and male flower organ development. Our results showed that the male floral genes may play a vital role in sex determination in C. palustris. The identified genes can be exploited to understand the molecular basis of sex determination in C. palustris.
Wen, Yangming; Lan, Kaijian; Wang, Junjie; Yu, Jingyi; Qu, Yarong; Zhao, Wei; Zhang, Fuchun; Tan, Wanlong; Cao, Hong; Zhou, Chen
2013-06-01
To construct dengue virus-specific full-length fully human antibody libraries using mammalian cell surface display technique. Total RNA was extracted from peripheral blood mononuclear cells (PBMCs) from convalescent patients with dengue fever. The reservoirs of the light chain and heavy chain variable regions (LCκ and VH) of the antibody genes were amplified by RT-PCR and inserted into the vector pDGB-HC-TM separately to construct the light chain and heavy chain libraries. The library DNAs were transfected into CHO cells and the expression of full-length fully human antibodies on the surface of CHO cells was analyzed by flow cytometry. Using 1.2 µg of the total RNA isolated from the PBMCs as the template, the LCκ and VH were amplified and the full-length fully human antibody mammalian display libraries were constructed. The kappa light chain gene library had a size of 1.45×10^4 and the heavy chain gene library had a size of 1.8×10^5. Sequence analysis showed that 8 out of the 10 light chain clones and 7 out of the 10 heavy chain clones randomly picked up from the constructed libraries contained correct open reading frames. FACS analysis demonstrated that all the 15 clones with correct open reading frames expressed full-length antibodies, which could be detected on CHO cell surfaces. After co-transfection of the heavy chain and light chain gene libraries into CHO cells, the expression of full-length antibodies on CHO cell surfaces could be detected by FACS analysis with an expressible diversity of the antibody library reaching 1.46×10^9 [(1.45×10^4×80%)×(1.8×10^5×70%)]. Using 1.2 µg of total RNA as template, the LCκ and VH full-length fully human antibody libraries against dengue virus have been successfully constructed with an expressible diversity of 10^9.
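The expressible diversity quoted in this abstract is the product of the two library sizes, each scaled by the fraction of sampled clones with a correct open reading frame. A minimal Python sketch of that arithmetic follows; the variable names are illustrative, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

# Library sizes reported in the abstract.
light_chain_library = 14_500    # 1.45 x 10^4 kappa light chain clones
heavy_chain_library = 180_000   # 1.8 x 10^5 heavy chain clones

# Fraction of randomly picked clones with a correct open reading frame.
light_orf_fraction = Fraction(8, 10)
heavy_orf_fraction = Fraction(7, 10)

# Each functional light chain can in principle pair with each functional
# heavy chain, so the expressible diversity is the product of the two.
expressible_diversity = (light_chain_library * light_orf_fraction) * (
    heavy_chain_library * heavy_orf_fraction
)

print(f"{float(expressible_diversity):.2e}")
```

The ORF fractions rest on only ten sampled clones per library, so the result is best read as an order-of-magnitude estimate, as the abstract's closing sentence does.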
Rampuria, Sakshi; Joshi, Uma; Palit, Paramita; Deokar, Amit A; Meghwal, Raju R; Mohapatra, T; Srinivasan, R; Bhatt, K V; Sharma, Ramavtar
2012-11-01
Moth bean (Vigna aconitifolia (Jacq.) Marechal) is an important grain legume crop grown in rain-fed areas of hot desert regions of Thar, India, under scorching sun rays with very little supplementation of water. An SSH cDNA library was generated from leaf tissues of V. aconitifolia var. RMO-40 exposed to an elevated temperature of 42 °C for 5 min to identify early-induced genes. A total of 488 unigenes (114 contigs and 374 singletons) were derived by cluster assembly and sequence alignment of 738 ESTs; out of 206 ESTs (28%) of unknown proteins, 160 ESTs (14%) were found to be novel to moth bean. Only 578 ESTs (78%) showed significant BLASTX similarity (<1 × 10^-6) in the NCBI non-redundant database. Gene ontology functional classification terms were retrieved for 479 (65%) sequences, and 339 sequences were annotated with 165 EC codes and mapped to 68 different KEGG pathways. Four hundred and fifty-two ESTs were further annotated with InterProScan (IPS), and no IPS was assigned to 153 ESTs. In addition, the expression level of 27 ESTs in response to heat stress was evaluated through semiquantitative RT-PCR assay. Approximately 20 different signaling genes and 16 different transcription factors have been shown to be associated with heat stress in moth bean for the first time.
Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.
Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M
1991-02-15
The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.
Identification and characterization of microRNAs in white and brown alpaca skin
2012-01-01
Background MicroRNAs (miRNAs) are small, non-coding 21–25 nt RNA molecules that play an important role in regulating gene expression. Little is known about the expression profiles and functions of miRNAs in skin and their role in pigmentation. Alpacas have more than 22 natural coat colors, more than any other fiber producing species. To better understand the role of miRNAs in control of coat color we performed a comprehensive analysis of miRNA expression profiles in skin of white versus brown alpacas. Results Two small RNA libraries from white alpaca (WA) and brown alpaca (BA) skin were sequenced with the aid of Illumina sequencing technology. 272 and 267 conserved miRNAs were obtained from the WA and BA skin libraries, respectively. Of these conserved miRNAs, 35 and 13 were more abundant in WA and BA skin, respectively. The targets of these miRNAs were predicted and grouped based on Gene Ontology and KEGG pathway analysis. Many predicted target genes for these miRNAs are involved in the melanogenesis pathway controlling pigmentation. In addition to the conserved miRNAs, we also obtained 22 potentially novel miRNAs from the WA and BA skin libraries. Conclusion This study represents the first comprehensive survey of miRNAs expressed in skin of animals of different coat colors by deep sequencing analysis. We discovered a collection of miRNAs that are differentially expressed in WA and BA skin. The results suggest important potential functions of miRNAs in coat color regulation. PMID:23067000
Parton, Angela; Bayne, Christopher J.; Barnes, David W.
2010-01-01
Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories “envelope” and “oxidoreductase activity” but the SAE transcripts did not. GO analysis of SAE transcripts identified the category “anatomical structure formation” that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. PMID:20471924
Parton, Angela; Bayne, Christopher J; Barnes, David W
2010-09-01
Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. Copyright 2010 Elsevier Inc. All rights reserved.
Xu, Chao; Dong, Wenpan; Shi, Shuo; Cheng, Tao; Li, Changhao; Liu, Yanlei; Wu, Ping; Wu, Hongkun; Gao, Peng; Zhou, Shiliang
2015-11-01
A well-covered reference library is crucial for successful identification of species by DNA barcoding. The biggest difficulty in building such a reference library is the lack of materials of organisms. Herbarium collections are potentially an enormous resource of materials. In this study, we demonstrate that it is feasible to build such reference libraries using the reconstructed (self-primed PCR amplified) DNA from the herbarium specimens. We used 179 rosaceous specimens to test the effects of DNA reconstruction, 420 randomly sampled specimens to estimate the usable percentage and another 223 specimens of true cherries (Cerasus, Rosaceae) to test the coverage of usable specimens to the species. The barcodes rbcLb (the central four-sevenths of the rbcL gene) and matK were each amplified in two halves and sequenced on Roche GS 454 FLX+. DNA from the herbarium specimens was typically shorter than 300 bp. DNA reconstruction enabled amplification of fragments of 400-500 bp without introducing any sequence errors. About one-third of specimens in the national herbarium of China (PE) were proven usable after DNA reconstruction. The specimens in PE cover all Chinese true cherry species and 91.5% of vascular species listed in Flora of China. It is therefore feasible to build well-covered reference libraries for DNA barcoding of vascular species in China. As exemplified in this study, DNA reconstruction and DNA-labelled next-generation sequencing can accelerate the construction of local reference libraries. By putting the local reference libraries together, a global library for DNA barcoding becomes closer to reality. © 2015 John Wiley & Sons Ltd.
Chromosomal arrangement of leghemoglobin genes in soybean.
Lee, J S; Brown, G G; Verma, D P
1983-01-01
A cluster of four different leghemoglobin (Lb) genes was isolated from AluI-HaeIII and EcoRI genomic libraries of soybean in a set of overlapping clones which together include 45 kilobases (kb) of contiguous DNA. These four genes, including a pseudogene, are present in the same orientation and are arranged in the order: 5'-Lba-Lbc1-Lb psi-Lbc3-3'. The intergenic regions average 2.5 kb. In addition to this main Lb locus, there are other Lb genes which do not appear to be contiguous to this locus. A sequence probably common to the 3' region of Lb loci was found flanking the Lbc3 gene. The 3' flanking region of the main Lb locus also contains a sequence that appears to be expressed more abundantly in root tissue. Another sequence which is primarily expressed in root and leaf is found 5' to two Lb loci. Overall, the main leghemoglobin locus is similar in structure to the mammalian globin gene loci. PMID:6310504
Fujimi, T J; Nakajyo, T; Nishimura, E; Ogura, E; Tsuchiya, T; Tamiya, T
2003-08-14
The genes encoding erabutoxin (short chain neurotoxin) isoforms (Ea, Eb, and Ec), LsIII (long chain neurotoxin) and a novel long chain neurotoxin pseudogene were cloned from a Laticauda semifasciata genomic library. Short and long chain neurotoxin genes were also cloned from the genome of Laticauda laticaudata, a closely related species of L. semifasciata, by PCR. A putative matrix attached region (MAR) sequence was found in the intron I of the LsIII gene. Comparative analysis of 11 structurally relevant snake toxin genes (three-finger-structure toxins) revealed the molecular evolution of these toxins. Three-finger-structure toxin genes diverged from a common ancestor through two types of evolutionary pathways (long and short types), early in the course of evolution. At a later stage of evolution in each gene, the accumulation of mutations in the exons, especially exon II, by accelerated evolution may have caused the increased diversification in their functions. It was also revealed that the putative MAR sequence found in the LsIII gene was integrated into the gene after the species-level divergence.
The carnegie protein trap library: a versatile tool for Drosophila developmental studies.
Buszczak, Michael; Paterno, Shelley; Lighthouse, Daniel; Bachman, Julia; Planck, Jamie; Owen, Stephenie; Skora, Andrew D; Nystul, Todd G; Ohlstein, Benjamin; Allen, Anna; Wilhelm, James E; Murphy, Terence D; Levis, Robert W; Matunis, Erika; Srivali, Nahathai; Hoskins, Roger A; Spradling, Allan C
2007-03-01
Metazoan physiology depends on intricate patterns of gene expression that remain poorly known. Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. By sequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced green fluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, we found that 600-900 different genes are trapped in our collection. A core set of 244 lines trapped different identifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256 additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegie collection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggest new strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions.
Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing.
Vega-Arreguín, Julio C; Ibarra-Laclette, Enrique; Jiménez-Moraila, Beatriz; Martínez, Octavio; Vielle-Calzada, Jean Philippe; Herrera-Estrella, Luis; Herrera-Estrella, Alfredo
2009-07-06
In-depth sequencing analysis has not been able to determine the overall complexity of transcriptional activity of a plant organ or tissue sample. In some cases, deep parallel sequencing of Expressed Sequence Tags (ESTs), although not yet optimized for the sequencing of cDNAs, has represented an efficient procedure for validating gene prediction and estimating overall gene coverage. This approach could be very valuable for complex plant genomes. In addition, little emphasis has been given to efforts aiming at an estimation of the overall transcriptional universe found in a multicellular organism at a specific developmental stage. To explore, in depth, the transcriptional diversity in an ancient maize landrace, we developed a protocol to optimize the sequencing of cDNAs and performed 4 consecutive GS20-454 pyrosequencing runs of a cDNA library obtained from 2 week-old Palomero Toluqueño maize plants. The protocol reported here allowed obtaining over 90% of informative sequences. These GS20-454 runs generated over 1.5 Million reads, representing the largest amount of sequences reported from a single plant cDNA library. A collection of 367,391 quality-filtered reads (30.09 Mb) from a single run was sufficient to identify transcripts corresponding to 34% of public maize ESTs databases; total sequences generated after 4 filtered runs increased this coverage to 50%. Comparisons of all 1.5 Million reads to the Maize Assembled Genomic Islands (MAGIs) provided evidence for the transcriptional activity of 11% of MAGIs. We estimate that 5.67% (86,069 sequences) do not align with public ESTs or annotated genes, potentially representing new maize transcripts. Following the assembly of 74.4% of the reads in 65,493 contigs, real-time PCR of selected genes confirmed a predicted correlation between the abundance of GS20-454 sequences and corresponding levels of gene expression. 
A protocol was developed that significantly increases the number, length and quality of cDNA reads using massive 454 parallel sequencing. We show that recurrent 454 pyrosequencing of a single cDNA sample is necessary to attain a thorough representation of the transcriptional universe present in maize, that can also be used to estimate transcript abundance of specific genes. This data suggests that the molecular and functional diversity contained in the vast native landraces remains to be explored, and that large-scale transcriptional sequencing of a presumed ancestor of the modern maize varieties represents a valuable approach to characterize the functional diversity of maize for future agricultural and evolutionary studies.
Discovering genes associated with dormancy in the monogonont rotifer Brachionus plicatilis
Denekamp, Nadav Y; Thorne, Michael AS; Clark, Melody S; Kube, Michael; Reinhardt, Richard; Lubzens, Esther
2009-01-01
Background Microscopic monogonont rotifers, including the euryhaline species Brachionus plicatilis, are typically found in water bodies where environmental factors restrict population growth to short periods lasting days or months. The survival of the population is ensured via the production of resting eggs that show a remarkable tolerance to unfavorable conditions and remain viable for decades. The aim of this study was to generate Expressed Sequence Tags (ESTs) for molecular characterisation of processes associated with the formation of resting eggs, their survival during dormancy and hatching. Results Four normalized and four subtractive libraries were constructed to provide a resource for rotifer transcriptomics associated with resting-egg formation, storage and hatching. A total of 47,926 sequences were assembled into 18,000 putative transcripts and analyzed using both Blast and GO annotation. About 28–55% (depending on the library) of the clones produced significant matches against the Swissprot and Trembl databases. Genes known to be associated with desiccation tolerance during dormancy in other organisms were identified in the EST libraries. These included genes associated with antioxidant activity, low molecular weight heat shock proteins and Late Embryonic Abundant (LEA) proteins. Real-time PCR confirmed that LEA transcripts, small heat-shock proteins and some antioxidant genes were upregulated in resting eggs, therefore suggesting that desiccation tolerance is a characteristic feature of resting eggs even though they do not necessarily fully desiccate during dormancy. The role of trehalose in resting-egg formation and survival remains unclear since there was no significant difference between resting-egg producing females and amictic females in the expression of the tps-1 gene. 
In view of the absence of vitellogenin transcripts, matches to lipoprotein lipase proteins suggest that, similar to the situation in dipterans, these proteins may serve as the yolk proteins in rotifers. Conclusion The 47,926 ESTs expand significantly the current sequence resource of B. plicatilis. It describes, for the first time, genes putatively associated with resting eggs and will serve as a database for future global expression experiments, particularly for the further identification of dormancy related genes. PMID:19284654
Discovering genes associated with dormancy in the monogonont rotifer Brachionus plicatilis.
Denekamp, Nadav Y; Thorne, Michael A S; Clark, Melody S; Kube, Michael; Reinhardt, Richard; Lubzens, Esther
2009-03-13
Microscopic monogonont rotifers, including the euryhaline species Brachionus plicatilis, are typically found in water bodies where environmental factors restrict population growth to short periods lasting days or months. The survival of the population is ensured via the production of resting eggs that show a remarkable tolerance to unfavorable conditions and remain viable for decades. The aim of this study was to generate Expressed Sequence Tags (ESTs) for molecular characterisation of processes associated with the formation of resting eggs, their survival during dormancy and hatching. Four normalized and four subtractive libraries were constructed to provide a resource for rotifer transcriptomics associated with resting-egg formation, storage and hatching. A total of 47,926 sequences were assembled into 18,000 putative transcripts and analyzed using both Blast and GO annotation. About 28-55% (depending on the library) of the clones produced significant matches against the Swissprot and Trembl databases. Genes known to be associated with desiccation tolerance during dormancy in other organisms were identified in the EST libraries. These included genes associated with antioxidant activity, low molecular weight heat shock proteins and Late Embryonic Abundant (LEA) proteins. Real-time PCR confirmed that LEA transcripts, small heat-shock proteins and some antioxidant genes were upregulated in resting eggs, therefore suggesting that desiccation tolerance is a characteristic feature of resting eggs even though they do not necessarily fully desiccate during dormancy. The role of trehalose in resting-egg formation and survival remains unclear since there was no significant difference between resting-egg producing females and amictic females in the expression of the tps-1 gene. 
In view of the absence of vitellogenin transcripts, matches to lipoprotein lipase proteins suggest that, similar to the situation in dipterans, these proteins may serve as the yolk proteins in rotifers. The 47,926 ESTs expand significantly the current sequence resource of B. plicatilis. It describes, for the first time, genes putatively associated with resting eggs and will serve as a database for future global expression experiments, particularly for the further identification of dormancy related genes.
Cornette, Richard; Kanamori, Yasushi; Watanabe, Masahiko; Nakahara, Yuichi; Gusev, Oleg; Mitsumasu, Kanako; Kadono-Okuda, Keiko; Shimomura, Michihiko; Mita, Kazuei; Kikawada, Takahiro; Okuda, Takashi
2010-01-01
Some organisms are able to survive the loss of almost all their body water content, entering a latent state known as anhydrobiosis. The sleeping chironomid (Polypedilum vanderplanki) lives in the semi-arid regions of Africa, and its larvae can survive desiccation in an anhydrobiotic form during the dry season. To unveil the molecular mechanisms of this resistance to desiccation, an anhydrobiosis-related Expressed Sequence Tag (EST) database was obtained from the sequences of three cDNA libraries constructed from P. vanderplanki larvae after 0, 12, and 36 h of desiccation. The database contained 15,056 ESTs distributed into 4,807 UniGene clusters. ESTs were classified according to gene ontology categories, and putative expression patterns were deduced for all clusters on the basis of the number of clones in each library; expression patterns were confirmed by real-time PCR for selected genes. Among up-regulated genes, antioxidants, late embryogenesis abundant (LEA) proteins, and heat shock proteins (Hsps) were identified as important groups for anhydrobiosis. Genes related to trehalose metabolism and various transporters were also strongly induced by desiccation. Those results suggest that the oxidative stress response plays a central role in successful anhydrobiosis. Similarly, protein denaturation and aggregation may be prevented by marked up-regulation of Hsps and the anhydrobiosis-specific LEA proteins. A third major feature is the predicted increase in trehalose synthesis and in the expression of various transporter proteins allowing the distribution of trehalose and other solutes to all tissues. PMID:20833722
Leal, Gildemberg Amorim; Albuquerque, Paulo S B; Figueira, Antonio
2007-05-01
SUMMARY The basidiomycete Crinipellis perniciosa is the causal agent of witches' broom disease of Theobroma cacao (cocoa). Hypertrophic growth of infected buds ('brooms') is the most dramatic symptom, but the main economic losses derive from pod infection. To identify cocoa genes differentially expressed during the early stages of infection, two cDNA libraries were constructed using the suppression subtractive hybridization (SSH) approach. Subtraction hybridization was conducted between cDNAs from infected shoot-tips of the susceptible genotype 'ICS 39' and the resistant 'CAB 214', in both directions. A total of 187 unique sequences were obtained, with 83 from the library enriched for the susceptible 'ICS 39' sequences, and 104 for the resistant 'CAB 214'. By homology search and ontology analyses, the identified sequences were mainly putatively categorized as belonging to 'signal transduction', 'response to biotic and abiotic stress', 'metabolism', 'RNA and DNA metabolism', 'protein metabolism' and 'cellular maintenance' classes. Quantitative reverse transcription amplification (RT-qPCR) of 23 transcripts identified as differentially expressed between genotypes revealed distinct kinetics of gene up-regulation at the asymptomatic stage of the disease. Expression induction in the susceptible 'ICS 39' in response to C. perniciosa was delayed and limited, while in 'CAB 214' there was a quicker and more intense reaction, with two peaks of gene induction at 48 and 120 h after inoculation, corresponding to morphological and biochemical changes previously described during colonization. Similar differences in gene induction were validated for another resistant genotype ('CAB 208') in an independent experiment. Validation of these genes corroborated similar hypothetical mechanisms of resistance described in other pathosystems.
Peters, Linda M.; Belyantseva, Inna A.; Lagziel, Ayala; Battey, James F.; Friedman, Thomas B.; Morell, Robert J.
2007-01-01
Specialization in cell function and morphology is influenced by the differential expression of mRNAs, many of which are expressed at low abundance and restricted to certain cell types. Detecting such transcripts in cDNA libraries may require sequencing millions of clones. Massively parallel signature sequencing (MPSS) is well-suited for identifying transcripts that are expressed in discrete cell types and in low abundance. We have made MPSS libraries from microdissections of three inner ear tissues. By comparing these MPSS libraries to those of 87 other tissues included in the Mouse Reference Transcriptome (MRT) online resource, we have identified genes that are highly enriched in, or specific to, the inner ear. We show by RT-PCR and in situ hybridization that signatures unique to the inner ear libraries identify transcripts with highly specific cell-type localizations. These transcripts serve to illustrate the utility of a resource that is available to the research community. Utilization of these resources will increase the number of known transcription units and expand our knowledge of the tissue-specific regulation of the transcriptome. PMID:17049805
Nakamura, S; Asakawa, S; Ohmido, N; Fukui, K; Shimizu, N; Kawasaki, S
1997-05-01
We constructed a rice Bacterial Artificial Chromosome (BAC) library from green leaf protoplasts of the cultivar Shimokita harboring the rice blast resistance gene Pi-ta. The average insert size of 155 kb and the library size of seven genome equivalents make it one of the most comprehensive BAC libraries available, and larger than many plant YAC libraries. The library clones were plated on seven high density membranes of microplate size, enabling efficient colony identification in colony hybridization experiments. Seven percent of clones carried chloroplast DNA. By probing with markers close to the blast resistance genes Pi-ta2 (closely linked to Pi-ta) and Pi-b, respectively located in the centromeric region of chromosome 12 and near the telomeric end of chromosome 2, on average 2.2 ± 1.3 and 8.0 ± 2.6 BAC clones/marker were isolated. Differences in chromosomal structures may contribute to this wide variation in yield. A contig of about 800 kb, consisting of 19 clones, was constructed in the Pi-ta2 region. This region had a high frequency of repetitive sequences. To circumvent this difficulty, we devised a "two-step walking" method. The contig spanned a 300 kb region between markers located at 0 cM and 0.3 cM from Pi-ta. The ratio of physical to genetic distances (> 1,000 kb/cM) was more than three times larger than the average of rice (300 kb/cM). The low recombination rate and high frequency of repetitive sequences may also be related to the near centromeric character of this region. Fluorescent in situ hybridization (FISH) with a BAC clone from the Pi-b region yielded very clear signals on the long arm of chromosome 2, while a clone from the Pi-ta2 region showed various cross-hybridizing signals near the centromeric regions of all chromosomes.
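The physical-to-genetic distance ratio reported in this abstract follows directly from the marker positions it gives. The sketch below reproduces that arithmetic; the function name is hypothetical, and the inputs are the rounded values from the abstract (a 300 kb span between markers at 0 cM and 0.3 cM, and a rice average of 300 kb/cM).

```python
def kb_per_centimorgan(physical_kb, genetic_cm):
    """Ratio of physical distance (kb) to genetic distance (cM).

    Hypothetical helper for illustration; raises if the genetic
    distance is not positive, since the ratio is then undefined.
    """
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kb / genetic_cm


# Values as stated in the abstract.
region_ratio = kb_per_centimorgan(physical_kb=300, genetic_cm=0.3)
RICE_AVERAGE_KB_PER_CM = 300

fold_over_average = region_ratio / RICE_AVERAGE_KB_PER_CM
print(f"Pi-ta2 region: about {region_ratio:.0f} kb/cM")
print(f"Fold over rice average: about {fold_over_average:.1f}x")
```

With these rounded inputs the region comes out at roughly 1,000 kb/cM, a little over three times the rice average, which matches the abstract's description of a low-recombination, near-centromeric region.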
2002-05-01
homozygous for the pcna and p21 mutant genes will be accomplished with the help of Gene Targeting and Transgenic Facility at the Roswell Park Cancer Institute...screening of BAC library was performed with the help of the DNA Microarray Facility at the Roswell Park Cancer Institute. Sequence of mouse
Two novel gull-specific qPCR assays were developed using 16S rRNA gene sequences from gull fecal clone libraries: a SYBR-green-based assay targeting Streptococcus spp. (i.e., gull3) and a TaqMan qPCR assay targeting Catellicoccus marimammalium (i.e., gull4). The main objectives ...
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Song, Jinlong; Shi, Yanhua; Li, Kang; Zhao, Bin; Yan, Yanchun
2013-01-01
A novel pyrethroid-degrading esterase gene pytY was isolated from the genomic library of Ochrobactrum anthropi YZ-1. It possesses an open reading frame (ORF) of 897 bp. Blast search showed that its deduced amino acid sequence shares moderate identities (30% to 46%) with most homologous esterases. Phylogenetic analysis revealed that PytY is a member of the esterase VI family. pytY showed very low sequence similarity compared with reported pyrethroid-degrading genes. PytY was expressed, purified, and characterized. Enzyme assay revealed that PytY is a broad-spectrum degrading enzyme that can degrade various pyrethroids. pytY is thus a new pyrethroid-degrading gene that enriches the available genetic resources. The kinetic constants Km and Vmax were 2.34 mmol·L−1 and 56.33 nmol·min−1, respectively, with lambda-cyhalothrin as substrate. PytY displayed good degrading ability and stability over a broad range of temperature and pH. The optimal temperature and pH were 35°C and 7.5, respectively. No cofactors were required for enzyme activity. The results highlighted the potential use of PytY in the elimination of pyrethroid residuals from contaminated environments. PMID:24155944
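To make the reported kinetic constants concrete, the sketch below evaluates the standard Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]), with the Km and Vmax quoted in the abstract. It assumes that PytY follows simple Michaelis-Menten kinetics, which reporting Km and Vmax implies but the abstract does not state outright. The function name and the example substrate concentrations are hypothetical.

```python
# Constants as reported in the abstract (lambda-cyhalothrin as substrate).
KM_MMOL_PER_L = 2.34
VMAX_NMOL_PER_MIN = 56.33


def michaelis_menten_rate(substrate_mmol_per_l,
                          km=KM_MMOL_PER_L,
                          vmax=VMAX_NMOL_PER_MIN):
    """Initial reaction rate (nmol/min) under the Michaelis-Menten model.

    Assumes simple single-substrate kinetics; hypothetical helper
    written for illustration, not taken from the paper.
    """
    if substrate_mmol_per_l < 0:
        raise ValueError("substrate concentration cannot be negative")
    return vmax * substrate_mmol_per_l / (km + substrate_mmol_per_l)


# Illustrative substrate concentrations (mmol/L), chosen arbitrarily.
for concentration in (0.5, KM_MMOL_PER_L, 10.0):
    rate = michaelis_menten_rate(concentration)
    print(f"[S] = {concentration:5.2f} mmol/L -> v = {rate:6.2f} nmol/min")
```

By construction, the rate at [S] = Km is half of Vmax, which gives a quick sanity check on the function.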
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Kim, Minseok; Morrison, Mark; Yu, Zhongtang
2011-09-01
Phylogenetic analysis was conducted to examine ruminal bacteria in two ruminal fractions (adherent fraction vs. liquid fraction) collected from cattle fed two different diets: forage alone vs. forage plus concentrate. One hundred forty-four 16S rRNA gene (rrs) sequences were obtained from clone libraries constructed from the four samples. These rrs sequences were assigned to 116 different operational taxonomic units (OTUs) defined at 0.03 phylogenetic distance. Most of these OTUs could not be assigned to any known genus. The phylum Firmicutes was represented by approximately 70% of all the sequences. By comparison with the OTUs already documented in the rumen, 52 new OTUs were identified. UniFrac, SONS, and denaturing gradient gel electrophoresis analyses revealed differences in diversity between the two fractions and between the two diets. This study showed that rrs sequences recovered from small clone libraries can still help identify novel species-level OTUs.
Matsunaga, Hiroko; Goto, Mari; Arikawa, Koji; Shirai, Masataka; Tsunoda, Hiroyuki; Huang, Huan; Kambara, Hideki
2015-02-15
Analyses of gene expressions in single cells are important for understanding detailed biological phenomena. Here, a highly sensitive and accurate method by sequencing (called "bead-seq") to obtain a whole gene expression profile for a single cell is proposed. A key feature of the method is to use a complementary DNA (cDNA) library on magnetic beads, which enables adding washing steps to remove residual reagents in a sample preparation process. By adding the washing steps, the next steps can be carried out under the optimal conditions without losing cDNAs. Error sources were carefully evaluated to conclude that the first several steps were the key steps. It is demonstrated that bead-seq is superior to the conventional methods for single-cell gene expression analyses in terms of reproducibility, quantitative accuracy, and biases caused during sample preparation and sequencing processes. Copyright © 2014 Elsevier Inc. All rights reserved.
Hamaguchi-Hamada, Kayoko; Kurumata-Shigeto, Mami; Minobe, Sumiko; Fukuoka, Nozomi; Sato, Manami; Matsufuji, Miyuki; Koizumi, Osamu; Hamada, Shun
2016-01-01
The head region of Hydra, the hypostome, is a key body part for developmental control and the nervous system. We herein examined genes specifically expressed in the head region of Hydra oligactis using suppression subtractive hybridization (SSH) cloning. A total of 1414 subtracted clones were sequenced and found to be derived from at least 540 different genes by BLASTN analyses. Approximately 25% of the subtracted clones had sequences encoding thrombospondin type-1 repeat (TSR) domains, and were derived from 17 genes. We identified 11 TSR domain-containing genes among the top 36 genes that were the most frequently detected in our SSH library. Whole-mount in situ hybridization analyses confirmed that at least 13 out of 17 TSR domain-containing genes were expressed in the hypostome of Hydra oligactis. The prominent expression of TSR domain-containing genes suggests that these genes play significant roles in the hypostome of Hydra oligactis.
Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong
2016-11-21
Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining a more in-depth knowledge of the sweetpotato genome is therefore necessary. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82 × coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with a cumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome was known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% non-LTR retrotransposons and 1.42% Class II DNA transposons; 18.31% of the genome was identified as sweetpotato-unique repetitive DNA, and 10.00% of the genome was predicted to be coding regions. In total, 3,846 simple sequence repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSR primers were designed and tested for length polymorphism using 20 sweetpotato accessions; 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than to those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed. 
The high quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum. These resources as a robust platform will be used in high-resolution mapping, gene cloning, assembly of genome sequences, comparative genomics and evolution for sweetpotato.
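The library statistics quoted in the sweetpotato abstract can be cross-checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes the genome size can be back-calculated from the reported coverage range, and it uses the Clarke-Carbon estimate, P = 1 - (1 - insert/genome)^N, which the abstract does not name as the method used.

```python
# Figures quoted in the abstract.
n_clones = 240_384                            # clones in the BAC library
insert_bp = 101_000                           # average insert size (101 kb)
coverage_low, coverage_high = 7.93, 10.82     # reported genome coverage range

total_bp = n_clones * insert_bp               # total cloned DNA in the library

# Assumption: genome size is inferred from the reported coverage,
# not stated in the abstract.
genome_small = total_bp / coverage_high
genome_large = total_bp / coverage_low


def p_recover(genome_bp: float) -> float:
    """Clarke-Carbon estimate of recovering any single-copy sequence."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


p_low = p_recover(genome_large)               # worst case (largest genome)
p_high = p_recover(genome_small)              # best case (smallest genome)

# Simple ratios built from other figures in the abstract.
mean_bes_len = 7_595_261 / 11_542             # mean BAC-end sequence length, bp
polymorphic_pct = 100 * 173 / 288             # share of polymorphic SSR primers

print(f"P(recover) between {p_low:.4f} and {p_high:.4f}")
print(f"mean BES length = {mean_bes_len:.0f} bp")
print(f"polymorphic SSR primers = {polymorphic_pct:.2f}%")
```

Under these assumptions, even the low end of the coverage range gives a recovery probability above 99%, in line with the figure reported in the abstract.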
Li, Ruixue; Chen, Dandan; Wang, Taichu; Wan, Yizhen; Li, Rongfang; Fang, Rongjun; Wang, Yuting; Hu, Fei; Zhou, Hong; Li, Long; Zhao, Weiguo
2017-01-01
MicroRNAs (miRNAs) play important regulatory roles by targeting mRNAs for cleavage or translational repression. Identification of miRNA targets is essential to better understanding the roles of miRNAs. miRNA targets have not been well characterized in mulberry (Morus alba). To anatomize miRNA guided gene regulation under drought stress, transcriptome-wide high throughput degradome sequencing was used in this study to directly detect drought stress responsive miRNA targets in mulberry. A drought library (DL) and a contrast library (CL) were constructed to capture the cleaved mRNAs for sequencing. In CL, 409 target genes of 30 conserved miRNA families and 990 target genes of 199 novel miRNAs were identified. In DL, 373 target genes of 30 conserved miRNA families and 950 target genes of 195 novel miRNAs were identified. Of the conserved miRNA families in DL, mno-miR156, mno-miR172, and mno-miR396 had the highest number of targets with 54, 52 and 41 transcripts, respectively, indicating that these three miRNA families and their target genes might play important functions in response to drought stress in mulberry. Additionally, we found that many of the target genes were transcription factors. By analyzing the miRNA-target molecular network, we found that the DL independent networks consisted of 838 miRNA-mRNA pairs (63.34%). The expression patterns of 11 target genes and 12 correspondent miRNAs were detected using qRT-PCR. Six miRNA targets were further verified by RNA ligase-mediated 5' rapid amplification of cDNA ends (RLM-5' RACE). Gene Ontology (GO) annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis revealed that these target transcripts were implicated in a broad range of biological processes and various metabolic pathways. This is the first study to comprehensively characterize target genes and their associated miRNAs in response to drought stress by degradome sequencing in mulberry. 
This study provides a framework for understanding the molecular mechanisms of drought resistance in mulberry.
Entcheva, P; Liebl, W; Johann, A; Hartsch, T; Streit, W R
2001-01-01
Enrichment cultures of microbial consortia enable the diverse metabolic and catabolic activities of these populations to be studied on a molecular level and to be explored as potential sources for biotechnology processes. We have used a combined approach of enrichment culture and direct cloning to construct cosmid libraries with large (>30-kb) inserts from microbial consortia. Enrichment cultures were inoculated with samples from five environments, and high amounts of avidin were added to the cultures to favor growth of biotin-producing microbes. DNA was extracted from three of these enrichment cultures and used to construct cosmid libraries; each library consisted of between 6,000 and 35,000 clones, with an average insert size of 30 to 40 kb. The inserts contained a diverse population of genomic DNA fragments isolated from the consortia organisms. These three libraries were used to complement the Escherichia coli biotin auxotrophic strain ATCC 33767 Delta(bio-uvrB). Initial screens resulted in the isolation of seven different complementing cosmid clones, carrying biotin biosynthesis operons. Biotin biosynthesis capabilities and growth under defined conditions of four of these clones were studied. Biotin measured in the different culture supernatants ranged from 42 to 3,800 pg/ml/optical density unit. Sequencing the identified biotin synthesis genes revealed high similarities to bio operons from gram-negative bacteria. In addition, random sequencing identified other interesting open reading frames, as well as two operons, the histidine utilization operon (hut), and the cluster of genes involved in biosynthesis of molybdopterin cofactors in bacteria (moaABCDE).
2012-01-01
Background Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. 
With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared to 17 (18%) of the 96 SSRs tested. Conclusion This work reports on the use of a wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large quantity of sequence was generated from the A genome of hexaploid wheat for comparative genome analysis with the homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
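The read-length, density and specificity figures in the wheat 3AS abstract follow from simple ratios of the reported counts. The following is a minimal sketch that recomputes them; variable names are illustrative and only numbers quoted in the abstract are used.

```python
# Counts quoted in the abstract.
total_bp = 11_014_359        # high-quality BAC-end sequence, bp
bac_ends = 17_591            # BAC-end reads
n_ssr = 1_057                # simple sequence repeats found
n_junctions = 7_928          # transposable element junctions found

mean_read_bp = total_bp / bac_ends               # average read length, bp
ssr_spacing_kb = total_bp / n_ssr / 1000         # one SSR per this many kb
junction_spacing_kb = total_bp / n_junctions / 1000

# Share of tested primer pairs that were 3A-specific (96 tested per marker type).
isbp_specific_pct = 100 * 28 / 96
ssr_specific_pct = 100 * 17 / 96

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"one SSR per {ssr_spacing_kb:.1f} kb")
print(f"one TE junction per {junction_spacing_kb:.2f} kb")
print(f"3A-specific: ISBP {isbp_specific_pct:.0f}%, SSR {ssr_specific_pct:.0f}%")
```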
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burger, Brian T.; Imam, Saheed; Scarborough, Matthew J.
Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species.
de Groot, Reinoud; Lüthi, Joel; Lindsay, Helen; Holtackers, René; Pelkmans, Lucas
2018-01-23
High-content imaging using automated microscopy and computer vision allows multivariate profiling of single-cell phenotypes. Here, we present methods for the application of the CRISPR-Cas9 system in large-scale, image-based, gene perturbation experiments. We show that CRISPR-Cas9-mediated gene perturbation can be achieved in human tissue culture cells in a timeframe that is compatible with image-based phenotyping. We developed a pipeline to construct a large-scale arrayed library of 2,281 sequence-verified CRISPR-Cas9 targeting plasmids and profiled this library for genes affecting cellular morphology and the subcellular localization of components of the nuclear pore complex (NPC). We conceived a machine-learning method that harnesses genetic heterogeneity to score gene perturbations and identify phenotypically perturbed cells for in-depth characterization of gene perturbation effects. This approach enables genome-scale image-based multivariate gene perturbation profiling using CRISPR-Cas9. © 2018 The Authors. Published under the terms of the CC BY 4.0 license.
Bäumlein, H; Wobus, U; Pustell, J; Kafatos, F C
1986-01-01
The field bean, Vicia faba L. var. minor, possesses two sub-families of 11 S legumin genes named A and B. We isolated from a genomic library a B-type gene (LeB4) and determined its primary DNA sequence. Gene LeB4 codes for a 484 amino acid residue prepropolypeptide, encompassing a signal peptide of 22 amino acid residues, an acidic, very hydrophilic alpha-chain of 281 residues and a basic, somewhat hydrophobic beta-chain of 181 residues. The latter two coding regions are immediately contiguous, but each is interrupted by a short intron. Type A legumin genes from soybean and pea are known to have introns in the same two positions, in addition to an extra intron (within the alpha-coding sequence). Sequence comparisons of legumin genes from these three plants revealed a highly conserved sequence element of at least 28 bp, centered at approximately 100 bp upstream of each cap site. The element is absent from the equivalent position of all non-legumin and other plant and fungal genes examined. We tentatively name this element "legumin box" and suggest that it may have a function in the regulation of legumin gene expression. PMID:3960730
López-López, Olalla; Knapik, Kamila; Cerdán, Maria-Esperanza; González-Siso, María-Isabel
2015-01-01
A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH = 8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9722 contigs ranging in size from 500 to 56,677 bp and spanning ~18 Mbp. A total of 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification, which revealed that bacteria were predominant, while archaea were less abundant. The six most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae, and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant, with "Candidatus Caldiarchaeum subterraneum" as the dominant species. Functional classification revealed the genes associated with one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes of potential biotechnological interest, such as xylanases, galactosidases, proteases, and lipases, was also revealed in the metagenomic library. Functional screening of this library was subsequently carried out to look for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a mesophilic yeast host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusually wide substrate specificity. When the enzyme was purified from the mesophilic host it showed a half-life of 1 h and 43 min at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity over a broad pH range, from 6.5 to 8.
Deschamps, Philippe; Zivanovic, Yvan; Moreira, David; Rodriguez-Valera, Francisco; López-García, Purificación
2014-06-12
Horizontal gene transfer (HGT) is an important force in evolution, which may lead, among other things, to the adaptation to new environments by the import of new metabolic functions. Recent studies based on phylogenetic analyses of a few genome fragments containing archaeal 16S rRNA genes and fosmid-end sequences from deep-sea metagenomic libraries have suggested that marine planktonic archaea could be affected by high HGT frequency. Likewise, a composite genome of an uncultured marine euryarchaeote showed high levels of gene sequence similarity to bacterial genes. In this work, we ask whether HGT is frequent and widespread in genomes of these marine archaea, and whether HGT is an ancient and/or recurrent phenomenon. To answer these questions, we sequenced 997 fosmid archaeal clones from metagenomic libraries of deep-Mediterranean waters (1,000 and 3,000 m depth) and built comprehensive pangenomes for planktonic Thaumarchaeota (Group I archaea) and Euryarchaeota belonging to the uncultured Groups II and III Euryarchaeota (GII/III-Euryarchaeota). Comparison with available reference genomes of Thaumarchaeota and a composite marine surface euryarchaeote genome allowed us to define sets of core, lineage-specific core, and shell gene ortholog clusters for the two archaeal lineages. Molecular phylogenetic analyses of all gene clusters showed that 23.9% of marine Thaumarchaeota genes and 29.7% of GII/III-Euryarchaeota genes had been horizontally acquired from bacteria. HGT is not only extensive and directional but also ongoing, with high HGT levels in lineage-specific core (ancient transfers) and shell (recent transfers) genes. Many of the acquired genes are related to metabolism and membrane biogenesis, suggesting an adaptive value for life in cold, oligotrophic oceans. We hypothesize that the acquisition of an important amount of foreign genes by the ancestors of these archaeal groups significantly contributed to their divergence and ecological success. 
© The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Horlbeck, Max A; Gilbert, Luke A; Villalta, Jacqueline E; Adamson, Britt; Pak, Ryan A; Chen, Yuwen; Fields, Alexander P; Park, Chong Yon; Corn, Jacob E; Kampmann, Martin; Weissman, Jonathan S
2016-01-01
We recently found that nucleosomes directly block access of CRISPR/Cas9 to DNA (Horlbeck et al., 2016). Here, we build on this observation with a comprehensive algorithm that incorporates chromatin, position, and sequence features to accurately predict highly effective single guide RNAs (sgRNAs) for targeting nuclease-dead Cas9-mediated transcriptional repression (CRISPRi) and activation (CRISPRa). We use this algorithm to design next-generation genome-scale CRISPRi and CRISPRa libraries targeting human and mouse genomes. A CRISPRi screen for essential genes in K562 cells demonstrates that the large majority of sgRNAs are highly active. We also find CRISPRi does not exhibit any detectable non-specific toxicity recently observed with CRISPR nuclease approaches. Precision-recall analysis shows that we detect over 90% of essential genes with minimal false positives using a compact 5 sgRNA/gene library. Our results establish CRISPRi and CRISPRa as premier tools for loss- or gain-of-function studies and provide a general strategy for identifying Cas9 target sites. DOI: http://dx.doi.org/10.7554/eLife.19760.001 PMID:27661255
Tsurushita, N; Fu, H; Warren, C
1996-06-12
New phage display vectors for in vivo recombination of immunoglobulin (Ig) heavy (VH) and light (VL) chain variable genes, to make single-chain Fv fragments (scFv), were constructed. The VH and VL genes of monoclonal antibody (mAb) EP-5C7, which binds to both human E- and P-selectin, were cloned into a pUC19-derived plasmid vector, pCW93, and a pACYC184-derived phagemid vector, pCW99, respectively. Upon induction of Cre recombinase (phage P1 recombinase), the VH and VL genes were efficiently recombined into the same plasmid via the two loxP sites (phage P1 recombination sites), one located downstream from a VH gene in pCW93 and another upstream from a VL gene in pCW99. In the resulting phagemid, the loxP sequence also encodes a polypeptide linker connecting the VH and VL domains to form a scFv of EP-5C7. Whether expressed on the phage surface or as a soluble form, the EP-5C7 scFv showed specific binding to human E- and P-selectin. This phagemid vector system provides a way to recombine VH and VL gene libraries efficiently in vivo to make extremely large Ig combinatorial libraries.
[Isolation and function of genes regulating aphB expression in Vibrio cholerae].
Chen, Haili; Zhu, Zhaoqin; Zhong, Zengtao; Zhu, Jun; Kan, Biao
2012-02-04
We identified genes that regulate the expression of aphB, the gene encoding a key virulence regulator in Vibrio cholerae O1 El Tor C6706(-). We constructed a transposon library in a V. cholerae C6706 strain containing P(aphB)-luxCDABE and P(aphB)-lacZ transcriptional reporter plasmids. Using a chemiluminescence imager system, we rapidly detected aphB promoter expression levels at a large scale. We then identified the transposon insertion sites by arbitrary PCR and sequencing analysis. From approximately 40,000 transposon insertion mutants, we obtained two candidate mutants, T1 and T2, which displayed reduced aphB expression. Sequencing analysis showed that the transposon (Tn) was inserted in the vc1585 reading frame in the T1 mutant and at the end of the coding sequence of vc1602 in the T2 mutant. By using a genetic screen, we identified two potential genes that may be involved in regulating the expression of the key virulence regulator AphB. This study lays the groundwork for further investigation to fully understand V. cholerae virulence gene regulatory cascades.
Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou
2013-08-01
Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects that damage wooden structures. Current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the genes differentially expressed in response to chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites exposed for 24 h to a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron), using the RNA ligase-mediated rapid amplification of cDNA ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.
Baillet, Adrienne; Mandon-Pépin, Béatrice; Cabau, Cédric; Poumerol, Elodie; Pailhoux, Eric; Cotinot, Corinne
2008-09-23
The key steps in germ cell survival during ovarian development are the entry into meiosis of oogonia and the formation of primordial follicles, which then determine the reproductive lifespan of the ovary. In sheep, these steps occur during fetal life, between 55 and 80 days of gestation, respectively. The aim of this study was to identify differentially expressed ovarian genes during prophase I meiosis and early folliculogenesis in sheep. In order to elucidate the molecular events associated with early ovarian differentiation, we generated two ovary stage-specific subtracted cDNA libraries using SSH. Large-scale sequencing of these SSH libraries identified 6,080 ESTs representing 2,535 contigs. Clustering and assembly of these ESTs resulted in a total of 2,101 unique sequences (clusters), comprising 1,305 singletons (62.11%) and 796 contigs (37.9%). BLASTX evaluation indicated that 99% of the ESTs were homologous to various known genes/proteins in a broad range of organisms, especially ovine, bovine and human species. The remaining 1%, which did not exhibit any homology to known gene sequences, was considered novel. Detailed study of the expression patterns of some of these genes using RT-PCR revealed promising new candidates for ovary differentiation genes in sheep. We showed that the SSH approach was relevant to determining new mammalian genes which might be involved in oogenesis and early follicle development, and enabled the discovery of new potential oocyte and granulosa cell markers for future studies. These genes may have significant implications regarding our understanding of ovarian function in molecular terms, and for the development of innovative strategies to both promote and control fertility.
Wolff, G; Kück, U
1990-04-01
The gene for the mitochondrial small subunit rRNA (SSUrRNA) from the heterotrophic alga Prototheca wickerhamii has been isolated from a gene library of extranuclear DNA. Sequence and structural analyses allow the determination of a secondary structure model for this rRNA. In addition, several sequence motifs are present which are typically found in SSUrRNAs of various mitochondrial origins. Unexpectedly, the Prototheca RNA sequence has more features in common with mitochondrial SSUrRNAs from plants than with that from the green alga Chlamydomonas reinhardtii. The phylogenetic relationship between mitochondria from plants and algae is discussed.
Chen, H T; Alexander, C B; Mage, R G
1995-06-15
Normal rabbits preferentially rearrange the 3'-most VH gene, VH1, to encode Igs with VHa allotypes, which constitute the majority of rabbit serum Igs. A gene conversion-like mechanism is employed to diversify the primary Ab repertoire. In mutant Alicia rabbits that derived from a rabbit with VHa2 allotype, the VH1 gene was deleted. Our previous studies showed that the first functional gene (VH4) or VH4-like genes were rearranged in 2- to 8-wk-old homozygous Alicia. The VH1a2-like sequences that were found in splenic mRNA from 6-wk and older Alicia rabbits still had some residues that were typical of VH4. The appearances of sequences resembling that of VH1a2 may have been caused by gene conversions that altered the sequences of the rearranged VH or there may have been rearrangement of upstream VH1a2-like genes later in development. To investigate this further, we constructed a cosmid library and isolated a VH1a2-like gene, VH12-1-6, with a sequence almost identical to VH1a2. This gene had a deleted base in the heptamer of its recombination signal sequence. However, even if this defect diminished or eliminated its ability to rearrange, the a2-like gene could have acted as a donor for gene-conversion-like alteration of rearranged VH genes. Sequence comparisons suggested that this gene or a gene like it could have acted as a donor for gene conversion in mutant Alicia and in normal rabbits.
Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena
2006-01-01
Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344
Pardo, Belén G; Fernández, Carlos; Millán, Adrián; Bouza, Carmen; Vázquez-López, Araceli; Vera, Manuel; Alvarez-Dios, José A; Calaza, Manuel; Gómez-Tato, Antonio; Vázquez, María; Cabaleiro, Santiago; Magariños, Beatriz; Lemos, Manuel L; Leiro, José M; Martínez, Paulino
2008-01-01
Background The turbot (Scophthalmus maximus; Scophthalmidae; Pleuronectiformes) is a flatfish species of great relevance for marine aquaculture in Europe. In contrast to other cultured flatfish, very few genomic resources are available in this species. Aeromonas salmonicida and Philasterides dicentrarchi are two pathogens that affect turbot culture causing serious economic losses to the turbot industry. Little is known about the molecular mechanisms for disease resistance and host-pathogen interactions in this species. In this work, thousands of ESTs for functional genomic studies and potential markers linked to ESTs for mapping (microsatellites and single nucleotide polymorphisms (SNPs)) are provided. This information enabled us to obtain a preliminary view of regulated genes in response to these pathogens and it constitutes the basis for subsequent and more accurate microarray analysis. Results A total of 12584 cDNAs partially sequenced from three different cDNA libraries of turbot (Scophthalmus maximus) infected with Aeromonas salmonicida, Philasterides dicentrarchi and from healthy fish were analyzed. Three immune-relevant tissues (liver, spleen and head kidney) were sampled at several time points in the infection process for library construction. The sequences were processed into 9256 high-quality sequences, which constituted the source for the turbot EST database. Clustering and assembly of these sequences, revealed 3482 different putative transcripts, 1073 contigs and 2409 singletons. BLAST searches with public databases detected significant similarity (e-value ≤ 1e-5) in 1766 (50.7%) sequences and 816 of them (23.4%) could be functionally annotated. Two hundred three of these genes (24.9%), encoding for defence/immune-related proteins, were mostly identified for the first time in turbot. Some ESTs showed significant differences in the number of transcripts when comparing the three libraries, suggesting regulation in response to these pathogens. 
A total of 191 microsatellites, with 104 having sufficient flanking sequences for primer design, and 1158 putative SNPs were identified from these EST resources in turbot. Conclusion A collection of 9256 high-quality ESTs was generated representing 3482 unique turbot sequences. A large proportion of defence/immune-related genes were identified, many of them regulated in response to specific pathogens. Putative microsatellites and SNPs were identified. These genome resources constitute the basis to develop a microarray for functional genomics studies and marker validation for genetic linkage and QTL analysis in turbot. PMID:18817567
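The summary statistics in this abstract can be re-derived from the counts it quotes. The following is a minimal sketch that only checks the arithmetic; the variable names and the `percent` helper are illustrative and do not come from the authors' pipeline.

```python
# Counts quoted in the turbot EST abstract above.
contigs = 1073
singletons = 2409
unigenes = 3482          # "different putative transcripts"
blast_hits = 1766        # sequences with e-value <= 1e-5
annotated = 816          # functionally annotated sequences
immune_related = 203     # defence/immune-related genes


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Contigs plus singletons should account for every putative transcript.
assert contigs + singletons == unigenes

summary = {
    "blast_hits_pct_of_unigenes": percent(blast_hits, unigenes),
    "annotated_pct_of_unigenes": percent(annotated, unigenes),
    "immune_pct_of_annotated": percent(immune_related, annotated),
}
print(summary)
```

Note that the 24.9% figure is relative to the 816 annotated sequences, whereas the other two percentages are relative to all 3482 putative transcripts.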
Li, Xinguo; Wu, Harry X; Dillon, Shannon K; Southerton, Simon G
2009-01-01
Background Wood is a major renewable natural resource for the timber, fibre and bioenergy industry. Pinus radiata D. Don is the most important commercial plantation tree species in Australia and several other countries; however, genomic resources for this species are very limited in public databases. Our primary objective was to sequence a large number of expressed sequence tags (ESTs) from genes involved in wood formation in radiata pine. Results Six developing xylem cDNA libraries were constructed from earlywood and latewood tissues sampled at juvenile (7 yrs), transition (11 yrs) and mature (30 yrs) ages, respectively. These xylem tissues represent six typical development stages in a rotation period of radiata pine. A total of 6,389 high quality ESTs were collected from 5,952 cDNA clones. Assembly of 5,952 ESTs from 5' end sequences generated 3,304 unigenes including 952 contigs and 2,352 singletons. About 97.0% of the 5,952 ESTs and 96.1% of the unigenes have matches in the UniProt and TIGR databases. Of the 3,174 unigenes with matches, 42.9% were not assigned GO (Gene Ontology) terms and their functions are unknown or unclassified. More than half (52.1%) of the 5,952 ESTs have matches in the Pfam database and represent 772 known protein families. About 18.0% of the 5,952 ESTs matched cell wall related genes in the MAIZEWALL database, representing all 18 categories, 91 of all 174 families and possibly 557 genes. Fifteen cell wall-related genes are ranked in the 30 most abundant genes, including CesA, tubulin, AGP, SAMS, actin, laccase, CCoAMT, MetE, phytocyanin, pectate lyase, cellulase, SuSy, expansin, chitinase and UDP-glucose dehydrogenase. Based on the PlantTFDB database 41 of the 64 transcription factor families in the poplar genome were identified as being involved in radiata pine wood formation. Comparative analysis of GO term abundance revealed a distinct transcriptome in juvenile earlywood formation compared to other stages of wood development. 
Conclusion The first large scale genomic resource in radiata pine was generated from six developing xylem cDNA libraries. Cell wall-related genes and transcription factors were identified. Juvenile earlywood has a distinct transcriptome, which is likely to contribute to the undesirable properties of juvenile wood in radiata pine. The publicly available resource of radiata pine will also be valuable for gene function studies and comparative genomics in forest trees. PMID:19159482
Rudi, Knut; Zimonja, Monika; Kvenshagen, Bente; Rugtveit, Jarle; Midtvedt, Tore; Eggesbø, Merete
2007-01-01
We present a novel approach for comparing 16S rRNA gene clone libraries that is independent of both DNA sequence alignment and definition of bacterial phylogroups. These steps are the major bottlenecks in current microbial comparative analyses. We used direct comparisons of taxon density distributions in an absolute evolutionary coordinate space. The coordinate space was generated by using alignment-independent bilinear multivariate modeling. Statistical analyses for clone library comparisons were based on multivariate analysis of variance, partial least-squares regression, and permutations. Clone libraries from both adult and infant gastrointestinal tract microbial communities were used as biological models. We reanalyzed a library consisting of 11,831 clones covering complete colons from three healthy adults in addition to a smaller 390-clone library from infant feces. We show that it is possible to extract detailed information about microbial community structures using our alignment-independent method. Our density distribution analysis is also very efficient with respect to computer operation time, meeting the future requirements of large-scale screenings to understand the diversity and dynamics of microbial communities. PMID:17337554
2009-01-01
Background Chickpea (Cicer arietinum L.), an important grain legume crop worldwide, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports the generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. 
Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion The generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666
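The abstract reports an average polymorphism information content (PIC) for the validated SSR markers but does not say which formula was used. A minimal sketch follows, assuming the widely used Botstein-style definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2; the function name and the example allele counts are hypothetical and are not data from the study.

```python
from itertools import combinations


def pic(allele_counts):
    """Polymorphism information content for one marker.

    allele_counts: observed counts (or frequencies) of each allele.
    Assumes the Botstein-style formula; the abstract does not specify one.
    """
    total = sum(allele_counts)
    freqs = [c / total for c in allele_counts]
    homozygosity = sum(p * p for p in freqs)
    # Correction term for uninformative matings between identical heterozygotes.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical marker scored on 24 genotypes with three alleles.
example_counts = [12, 8, 4]
print(round(pic(example_counts), 3))
```

Under this definition a marker with two equally frequent alleles has a PIC of 0.375, and a monomorphic marker has a PIC of 0.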
ESTuber db: an online database for Tuber borchii EST sequences.
Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo
2007-03-08
The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.
Wang, Zhao-Xin; Li, Shu-Ming; Heide, Lutz
2000-01-01
The biosynthetic gene cluster of the aminocoumarin antibiotic coumermycin A1 was cloned by screening of a cosmid library of Streptomyces rishiriensis DSM 40489 with heterologous probes from a dTDP-glucose 4,6-dehydratase gene, involved in deoxysugar biosynthesis, and from the aminocoumarin resistance gyrase gene gyrBr. Sequence analysis of a 30.8-kb region upstream of gyrBr revealed the presence of 28 complete open reading frames (ORFs). Fifteen of the identified ORFs showed, on average, 84% identity to corresponding ORFs in the biosynthetic gene cluster of novobiocin, another aminocoumarin antibiotic. Possible functions of 17 ORFs in the biosynthesis of coumermycin A1 could be assigned by comparison with sequences in GenBank. Experimental proof for the function of the identified gene cluster was provided by an insertional gene inactivation experiment, which resulted in an abolishment of coumermycin A1 production. PMID:11036020
2012-01-01
Background MicroRNAs (miRNAs) are a class of endogenous, small, non-coding RNAs that regulate gene expression by mediating gene silencing at transcriptional and post-transcriptional levels in higher plants. However, the diversity of miRNAs and their roles in floral development in Japanese apricot (Prunus mume Sieb. et Zucc) remain largely unexplored. Imperfect flowers with pistil abortion seriously decrease production yields. To understand the role of miRNAs in pistil development, pistil development-related miRNAs were identified by Solexa sequencing in Japanese apricot. Results Solexa sequencing was used to identify and quantitatively profile small RNAs from perfect and imperfect flower buds of Japanese apricot. A total of 22,561,972 and 24,952,690 reads were sequenced from two small RNA libraries constructed from perfect and imperfect flower buds, respectively. Sixty-one known miRNAs, belonging to 24 families, were identified. Comparative profiling revealed that seven known miRNAs exhibited significant differential expression between perfect and imperfect flower buds. A total of 61 potentially novel miRNAs/new members of known miRNA families were also identified by the presence of mature miRNAs and corresponding miRNA*s in the sRNA libraries. Comparative analysis showed that six potentially novel miRNAs were differentially expressed between perfect and imperfect flower buds. Target predictions of the 13 differentially expressed miRNAs resulted in 212 target genes. Gene ontology (GO) annotation revealed that high-ranking miRNA target genes are those implicated in the developmental process, the regulation of transcription and response to stress. Conclusions This study represents the first comparative identification of miRNAomes between perfect and imperfect Japanese apricot flowers. Seven known miRNAs and six potentially novel miRNAs associated with pistil development were identified, using high-throughput sequencing of small RNAs. 
The findings, both computationally and experimentally, provide valuable information for further functional characterisation of miRNAs associated with pistil development in plants. PMID:22863067
Stress-Driven Selection of Novel Phenotypes
NASA Technical Reports Server (NTRS)
Fox, George E.; Stepaov, Victor G.; Liu, Yamei
2011-01-01
A process has been developed that can confer novel properties, such as metal resistance, to a host bacterium. This same process can also be used to produce RNAs and peptides that have novel properties, such as the ability to bind particular compounds. It is inherent in the method that the peptide or RNA will behave as expected in the target organism. Plasmid-borne mini-gene libraries coding for either a population of combinatorial peptides or stable, artificial RNAs carrying random inserts are produced. These libraries, which have no bias towards any biological function, are used to transform the organism of interest and to serve as an initial source of genetic variation for stress-driven evolution. The transformed bacteria are propagated under selective pressure in order to obtain variants with the desired properties. The process is highly distinct from in vitro methods because the variants are selected in the context of the cell while it is experiencing stress. Hence, the selected peptide or RNA will, by definition, work as expected in the target cell as the cell adapts to its presence during the selection process. Once the novel gene, which produces the sought phenotype, is obtained, it can be transferred to the main genome to increase the genetic stability in the organism. Alternatively, the cell line can be used to produce novel RNAs or peptides with selectable properties in large quantity for separate purposes. The system allows for easy, large-scale purification of the RNAs or peptide products. The process has been reduced to practice by imposing sub-inhibitory concentrations of NiCl2 on cells of the bacterium Escherichia coli that were transformed separately with the peptide library and RNA library. The evolved resistant clones were isolated, and sequences of the selected mini-gene variants were established. 
Clones resistant to NiCl2 were found to carry identical plasmid variants with a functional mini-gene that specifically conferred significant nickel tolerance on the host cells. Sequencing of the selected mini-gene revealed a propensity of the encoded peptide to bind transient metal ions. Expression of the mini-gene markedly improved growth parameters of the evolved clones at sub-inhibitory concentrations of NiCl2 while being slightly detrimental in the absence of stress. Similar results have been obtained with the RNA libraries. Overall, the results demonstrate a very natural outcome of the selection experiments in which the mini-genes were expected to be either successfully integrated into bacterial genetic networks, or rejected depending upon their effect on host fitness. This described approach can be useful as a laboratory model to study the dynamics of bacterial adaptive evolution on the molecular level. It can also provide a strategy for screening expressed DNA libraries in search of novel genes with desirable properties.
Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi
2017-09-04
Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
Construction of naïve camelids VHH repertoire in phage display-based library.
Sabir, Jamal S M; Atef, Ahmed; El-Domyati, Fotouh M; Edris, Sherif; Hajrah, Nahid; Alzohairy, Ahmed M; Bahieldin, Ahmed
2014-04-01
Camelids have unique antibodies, namely HCAbs (VHH) or commercially named Nanobodies® (Nb) that are composed only of a heavy-chain homodimer. As libraries based on immunized camelids are time-consuming, costly and likely redundant for certain antigens, we describe the construction of a naïve camelid VHHs library from blood serum of non-immunized camelids with affinity in the subnanomolar range and suitable for standard immune applications. This approach is rapid and recovers VHH repertoire with the advantages of being more diverse, non-specific and devoid of subpopulations of specific antibodies, which allows the identification of binders for any potential antigen (or pathogen). RNAs from a number of camelids from Saudi Arabia were isolated and cDNAs of the diverse vhh gene were amplified; the resulting amplicons were cloned in the phage display pSEX81 vector. The size of the library was found to be within the required range (10^7) suitable for subsequent applications in disease diagnosis and treatment. Two hundred clones were randomly selected and the inserted gene library was either estimated for redundancy or sequenced and aligned to the reference camelid vhh gene (acc. No. ADE99145). Results indicated complete non-specificity of this small library in which no single event of redundancy was detected. These results indicate the efficacy of following this approach in order to yield a large and diverse enough gene library to secure the presence of the required version encoding the required antibodies for any target antigen. This work is a first step towards the construction of phage display-based biosensors useful in disease (e.g., TB or tuberculosis) diagnosis and treatment. Copyright © 2014 Académie des sciences. Published by Elsevier SAS. All rights reserved.
Gambling on a shortcut to genome sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, L.
1991-06-21
Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create an expressed sequence tag or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.
ASR5 is involved in the regulation of miRNA expression in rice.
Neto, Lauro Bücker; Arenhart, Rafael Augusto; de Oliveira, Luiz Felipe Valter; de Lima, Júlio Cesar; Bodanese-Zanettini, Maria Helena; Margis, Rogerio; Margis-Pinheiro, Márcia
2015-11-01
The work describes an ASR knockdown transcriptomic analysis by deep sequencing of rice root seedlings and the transactivation of ASR cis-acting elements in the upstream region of a MIR gene. MicroRNAs are key regulators of gene expression that guide post-transcriptional control of plant development and responses to environmental stresses. ASR (ABA, Stress and Ripening) proteins are plant-specific transcription factors with key roles in different biological processes. In rice, ASR proteins have been suggested to participate in the regulation of stress response genes. This work describes the transcriptomic analysis by deep sequencing two libraries, comparing miRNA abundance from the roots of transgenic ASR5 knockdown rice seedlings with that of the roots of wild-type non-transformed rice seedlings. Members of 59 miRNA families were detected, and 276 mature miRNAs were identified. Our analysis detected 112 miRNAs that were differentially expressed between the two libraries. A predicted inverse correlation between miR167abc and its target gene (LOC_Os07g29820) was confirmed using RT-qPCR. Protoplast transactivation assays showed that ASR5 is able to recognize binding sites upstream of the MIR167a gene and drive its expression in vivo. Together, our data establish a comparative study of miRNAome profiles and is the first study to suggest the involvement of ASR proteins in miRNA gene regulation.
Ali, Zulfiqar; Zhang, Da Yong; Xu, Zhao Long; Xu, Ling; Yi, Jin Xin; He, Xiao Lan; Huang, Yi Hong; Liu, Xiao Qing; Khan, Asif Ali; Trethowan, Richard M.; Ma, Hong Xiang
2012-01-01
Soil salinity has very adverse effects on growth and yield of crop plants. Several salt tolerant wild accessions and cultivars are reported in soybean. Functional genomes of salt tolerant Glycine soja and a salt sensitive genotype of Glycine max were investigated to understand the mechanism of salt tolerance in soybean. For this purpose, four libraries were constructed for Tag sequencing on the Illumina platform. We identified around 490 salt responsive genes, which included a number of transcription factors, signaling proteins, translation factors and structural genes like transporters, multidrug resistance proteins, antiporters, chaperones, aquaporins etc. The gene expression levels and ratio of up/down-regulated genes were greater in tolerant plants. Translation related genes remained stable or showed slightly higher expression in tolerant plants under salinity stress. Further analyses of sequenced data and the annotations for gene ontology and pathways indicated that soybean adapts to salt stress through ABA biosynthesis and regulation of translation and signal transduction of structural genes. Manipulation of these pathways may mitigate the effect of salt stress, thus enhancing salt tolerance. PMID:23209559
Qi, Xiao-Hua; Xu, Xue-Wen; Lin, Xiao-Jian; Zhang, Wen-Jie; Chen, Xue-Hao
2012-03-01
High-throughput tag-sequencing (Tag-seq) analysis based on the Solexa Genome Analyzer platform was applied to analyze the gene expression profiles of cucumber plants at 5 time points over a 24 h period of waterlogging treatment. Approximately 5.8 million total clean sequence tags per library were obtained with 143013 distinct clean tag sequences. Approximately 23.69%-29.61% of the distinct clean tags were mapped unambiguously to the unigene database, and 53.78%-60.66% of the distinct clean tags were mapped to the cucumber genome database. Analysis of the differentially expressed genes revealed that most of the genes were down-regulated in the waterlogging stages, and the differentially expressed genes were mainly linked to carbon metabolism, photosynthesis, reactive oxygen species generation/scavenging, and hormone synthesis/signaling. Finally, quantitative real-time polymerase chain reaction using nine genes independently verified the tag-mapped results. The present study reveals the comprehensive mechanisms of waterlogging-responsive transcription in cucumber. Copyright © 2011 Elsevier Inc. All rights reserved.
Babu, Peram Ravindra; Rao, Khareedu Venkateswara; Reddy, Vudem Dashavantha
2013-01-15
Flax CYPome analysis resulted in the identification of 334 putative cytochrome P450 (CYP450) genes in the cultivated flax genome. Classification of flax CYP450 genes based on the sequence similarity with Arabidopsis orthologs and CYP450 nomenclature, revealed 10 clans representing 44 families and 98 subfamilies. CYP80, CYP83, CYP92, CYP702, CYP705, CYP708, CYP728, CYP729, CYP733 and CYP736 families are absent in the flax genome. The subfamily members exhibited conserved sequences, length of exons and phasing of introns. Similarity search of the genomic resources of wild flax species Linum bienne with CYP450 coding sequences of the cultivated flax, revealed the presence of 127 CYP450 gene orthologs, indicating amplification of novel CYP450 genes in the cultivated flax. Seven families CYP73, 74, 75, 76, 77, 84 and 709, coding for enzymes associated with phenylpropanoid/fatty acid metabolism, showed extensive gene amplification in the flax. About 59% of the flax CYP450 genes were present in the EST libraries. Copyright © 2012 Elsevier B.V. All rights reserved.
Assignment of the human PAX4 gene to chromosome band 7q32 by fluorescence in situ hybridization.
Tamura, T; Izumikawa, Y; Kishino, T; Soejima, H; Jinno, Y; Niikawa, N
1994-01-01
Of the nine known members of a human paired box-containing gene family (Pax), only PAX4 has not been precisely localized. We screened a cosmid library of human genomic DNA using polymerase chain reaction products for PAX4 as a probe and isolated three positive cosmid clones. Sequence analysis revealed that at least two of them had exon-like sequences and showed extensive homology to Pax-4 in the mouse. These two cosmid clones were mapped to human chromosome band 7q32 by fluorescence in situ hybridization.
Preparation of highly multiplexed small RNA sequencing libraries.
Persson, Helena; Søkilde, Rolf; Pirona, Anna Chiara; Rovira, Carlos
2017-08-01
MicroRNAs (miRNAs) are ~22-nucleotide-long small non-coding RNAs that regulate the expression of protein-coding genes by base pairing to partially complementary target sites, preferentially located in the 3´ untranslated region (UTR) of target mRNAs. The expression and function of miRNAs have been extensively studied in human disease, as well as the possibility of using these molecules as biomarkers for prognostication and treatment guidance. To identify and validate miRNAs as biomarkers, their expression must be screened in large collections of patient samples. Here, we develop a scalable protocol for the rapid and economical preparation of a large number of small RNA sequencing libraries using dual indexing for multiplexing. Combined with the use of off-the-shelf reagents, more samples can be sequenced simultaneously on large-scale sequencing platforms at a considerably lower cost per sample. Sample preparation is simplified by pooling libraries prior to gel purification, which allows for the selection of a narrow size range while minimizing sample variation. A comparison with publicly available data from benchmarking of miRNA analysis platforms showed that this method captures absolute and differential expression as effectively as commercially available alternatives.
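Dual indexing multiplies the number of samples that can share a sequencing run, because each library is tagged with one i7 and one i5 index and every distinct pair identifies one sample. The sketch below illustrates only that combinatorial idea; the index sequences, sample names and the `assign_index_pairs` helper are hypothetical placeholders and are not taken from the published protocol.

```python
from itertools import product

# Hypothetical placeholder index sequences (not from the protocol).
I7_INDEXES = ["ATCACG", "CGATGT", "TTAGGC"]
I5_INDEXES = ["TATAGCCT", "ATAGAGGC"]


def assign_index_pairs(samples, i7_indexes, i5_indexes):
    """Give each sample a unique (i7, i5) pair, or fail if there are too few."""
    pairs = list(product(i7_indexes, i5_indexes))
    if len(samples) > len(pairs):
        raise ValueError(
            f"{len(samples)} samples need more than {len(pairs)} index pairs"
        )
    return dict(zip(samples, pairs))


samples = [f"sample_{n}" for n in range(1, 7)]
assignment = assign_index_pairs(samples, I7_INDEXES, I5_INDEXES)
for name, (i7, i5) in assignment.items():
    print(name, i7, i5)
```

With three i7 and two i5 indexes, six samples can be pooled; the capacity grows as the product of the two index set sizes rather than their sum.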
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kovacik, William P.; Scholten, Johannes C.; Culley, David E.
2010-08-01
The complexity and diversity of the microbial communities in biogranules from an upflow anaerobic sludge blanket (UASB) bioreactor were determined in response to short-term changes in substrate feeds. The reactor was fed simulated brewery wastewater (SBWW) (70% ethanol, 15% acetate, 15% propionate) for 1.5 months (phase 1), acetate/sulfate for 2 months (phase 2), acetate-alone for 3 months (phase 3), and then a return to SBWW for 2 months (phase 4). Performance of the reactor remained relatively stable throughout the experiment as shown by COD removal and gas production. 16S rDNA, methanogen-associated mcrA and sulfate reducer-associated dsrAB genes were PCR amplified, then cloned and sequenced. Sequence analysis of 16S clone libraries showed a relatively simple community composed mainly of the methanogenic Archaea (Methanobacterium and Methanosaeta), members of the Green Non-Sulfur (Chloroflexi) group of Bacteria, followed by fewer numbers of Syntrophobacter, Spirochaeta, Acidobacteria and Cytophaga-related Bacterial sequences. Methanogen-related mcrA clone libraries were dominated throughout by Methanobacter and Methanospirillum related sequences. Although not numerous enough to be detected in our 16S rDNA libraries, sulfate reducers were detected in dsrAB clone libraries, with sequences related to Desulfovibrio and Desulfomonile. Community diversity levels (Shannon-Wiener index) generally decreased for all libraries in response to a change from SBWW to acetate-alone feed. But there was a large transitory increase noted in 16S diversity at the two-month sampling on acetate-alone, entirely related to an increase in Bacterial diversity. Upon return to SBWW conditions in phase 4, all diversity measures returned to near phase 1 levels.
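The diversity measure named in this abstract is the Shannon index, H' = -sum(p_i ln p_i), where p_i is the fraction of clones assigned to taxon i. A minimal sketch follows, assuming the natural logarithm (the abstract does not state the base); the clone counts are hypothetical and are not data from the study.

```python
import math


def shannon_index(clone_counts):
    """Shannon diversity H' from per-taxon clone counts (natural log assumed)."""
    total = sum(clone_counts)
    h = 0.0
    for count in clone_counts:
        if count > 0:            # empty taxa contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical clone libraries: an even community and a dominated one.
even_library = [25, 25, 25, 25]
dominated_library = [85, 5, 5, 5]
print(round(shannon_index(even_library), 3))
print(round(shannon_index(dominated_library), 3))
```

A library dominated by one taxon scores lower than an even library with the same number of taxa, which is the kind of drop the abstract describes after the switch to an acetate-only feed.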
Genome-scale CRISPR-Cas9 knockout screening in human cells.
Shalem, Ophir; Sanjana, Neville E; Hartenian, Ella; Shi, Xi; Scott, David A; Mikkelson, Tarjei; Heckl, Dirk; Ebert, Benjamin L; Root, David E; Doench, John G; Zhang, Feng
2014-01-03
The simplicity of programming the CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease Cas9 to modify specific genomic loci suggests a new way to interrogate gene function on a genome-wide scale. We show that lentiviral delivery of a genome-scale CRISPR-Cas9 knockout (GeCKO) library targeting 18,080 genes with 64,751 unique guide sequences enables both negative and positive selection screening in human cells. First, we used the GeCKO library to identify genes essential for cell viability in cancer and pluripotent stem cells. Next, in a melanoma model, we screened for genes whose loss is involved in resistance to vemurafenib, a therapeutic RAF inhibitor. Our highest-ranking candidates include previously validated genes NF1 and MED12, as well as novel hits NF2, CUL3, TADA2B, and TADA1. We observe a high level of consistency between independent guide RNAs targeting the same gene and a high rate of hit confirmation, demonstrating the promise of genome-scale screening with Cas9.
Xiao, Yongli; Sheng, Zong-Mei; Taubenberger, Jeffery K.
2015-01-01
The vast majority of surgical biopsy and post-mortem tissue samples are formalin-fixed and paraffin-embedded (FFPE), but this process leads to RNA degradation that limits gene expression analysis. As an example, the viral RNA genome of the 1918 pandemic influenza A virus was previously determined in a 9-year effort by overlapping RT-PCR from post-mortem samples. Using the protocols described here, the full genome of the 1918 virus at high coverage was determined in one high-throughput sequencing run of a cDNA library derived from total RNA of a 1918 FFPE sample after duplex-specific nuclease treatments. This basic methodological approach should assist in the analysis of FFPE tissue samples isolated over the past century from a variety of infectious diseases. PMID:26344216
Liu, Ying; Wang, Li-Hua; Hao, Chun-Bo; Li, Lu; Li, Si-Yuan; Feng, Chuan-Ping
2014-06-01
The main physicochemical parameters of a soil sample collected near an acid mine drainage (AMD) reservoir in Anhui province were analyzed. Microbial diversity and community structure were studied by constructing bacterial and archaeal 16S rRNA gene clone libraries and an archaeal ammonia monooxygenase gene clone library. The functional groups responsible for ammonia oxidation were also discussed. The results indicated that the soil sample had an extremely low pH (pH < 3) and high ion concentrations, both influenced by the AMD. All 16S rRNA gene sequences in the bacterial clone library fell into 11 phyla; Acidobacteria played the most significant role in the ecosystem, followed by Verrucomicrobia. A great number of acidophilic bacteria, such as Candidatus Koribacter versatilis and Holophaga sp., were present in the soil sample. The archaeal clone library consisted of 2 phyla (Thaumarchaeota and Euryarchaeota), and the abundance of Thaumarchaeota was remarkably higher than that of Euryarchaeota. Ammonia oxidation in the soil environment was probably driven by ammonia-oxidizing archaea, and new species of ammonia-oxidizing archaea were present in the soil sample.
Luo, Meizhong; Kim, Hyeran; Kudrna, Dave; Sisneros, Nicholas B; Lee, So-Jeong; Mueller, Christopher; Collura, Kristi; Zuccolo, Andrea; Buckingham, E Bryan; Grim, Suzanne M; Yanagiya, Kazuyo; Inoko, Hidetoshi; Shiina, Takashi; Flajnik, Martin F; Wing, Rod A; Ohta, Yuko
2006-05-03
Sharks are members of the taxonomic class Chondrichthyes, the oldest living jawed vertebrates. Genomic studies of this group, in comparison to representative species in other vertebrate taxa, will allow us to theorize about the fundamental genetic, developmental, and functional characteristics in the common ancestor of all jawed vertebrates. In order to obtain mapping and sequencing data for comparative genomics, we constructed a bacterial artificial chromosome (BAC) library for the nurse shark, Ginglymostoma cirratum. The BAC library consists of 313,344 clones with an average insert size of 144 kb, covering ~4.5 × 10^10 bp and thus providing an 11-fold coverage of the haploid genome. BAC end sequence analyses revealed, in addition to LINEs and SINEs commonly found in other animal and plant genomes, two new groups of nurse shark-specific repetitive elements, NSRE1 and NSRE2, that seem to be major components of the nurse shark genome. Screening the library with single-copy or multi-copy gene probes showed 6-28 primary positive clones per probe of which 50-90% were true positives, demonstrating that the BAC library is representative of the different regions of the nurse shark genome. Furthermore, some BAC clones contained multiple genes, making physical mapping feasible. We have constructed a deep-coverage, high-quality, large insert, and publicly available BAC library for a cartilaginous fish. It will be very useful to the scientific community interested in shark genomic structure, comparative genomics, and functional studies. We found two new groups of repetitive elements specific to the nurse shark genome, which may contribute to the architecture and evolution of the nurse shark genome.
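The coverage figures in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the clone count and insert size come from the abstract, but the haploid genome size is not stated there, so it is back-calculated from the reported 11-fold coverage and should be read as an assumption. The representation probability uses the standard P = 1 - (1 - f)^N estimate for random clone libraries, which the abstract itself does not mention.

```python
def total_cloned_bp(n_clones: int, mean_insert_bp: int) -> int:
    """Total number of cloned bases in the library."""
    return n_clones * mean_insert_bp


def fold_coverage(n_clones: int, mean_insert_bp: int, genome_bp: float) -> float:
    """Fold coverage = total cloned bases / haploid genome size."""
    return total_cloned_bp(n_clones, mean_insert_bp) / genome_bp


def prob_locus_present(n_clones: int, mean_insert_bp: int, genome_bp: float) -> float:
    """Chance that a given locus is in at least one clone: 1 - (1 - f)^N,
    where f is the fraction of the genome carried by a single insert."""
    f = mean_insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


N_CLONES = 313_344        # from the abstract
MEAN_INSERT_BP = 144_000  # 144 kb, from the abstract

total = total_cloned_bp(N_CLONES, MEAN_INSERT_BP)

# ASSUMPTION: genome size is not given in the abstract; this value is simply
# what the stated 11-fold coverage implies.
implied_genome_bp = total / 11

print(f"total cloned bases: {total:.2e} bp")
print(f"implied haploid genome: {implied_genome_bp:.2e} bp")
print(f"coverage: {fold_coverage(N_CLONES, MEAN_INSERT_BP, implied_genome_bp):.1f}x")
print(f"P(locus present): {prob_locus_present(N_CLONES, MEAN_INSERT_BP, implied_genome_bp):.5f}")
```

The total works out to 45,121,536,000 bp, which matches the ~4.5 × 10^10 bp reported.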
Lemgruber, Renato de Souza Pinto; Marshall, Nislanha Ana dos Anjos; Ghelfi, Andrea; Fagundes, Daniel Barros; Val, Adalberto Luis
2013-01-01
This study aims to evaluate the transcriptome alterations, through cDNA libraries, associated with the combined effects of two PAHs, benzo[a]pyrene (0.5 µg/L) and phenanthrene (50 µg/L), present in crude oil, on specimens of Symphysodon aequifasciatus (discus fish) after 48 h of exposure. The cDNA libraries were constructed according to the SOLiD™ SAGE™ protocol for sequencing in the SOLiD v.3 Plus sequencer. The results were analyzed by bioinformatics and differentially expressed genes were categorized using the gene ontology program. The functional categories (terms) found in the gene ontology and the gene network generated using STRING software were used to predict the adverse effects of benzo[a]pyrene and phenanthrene in the liver. In the present study, 27,127 genes (compared to Danio rerio database) were identified. Considering only those genes with a p-value less than or equal to 0.05 and greater than or equal to two-fold change in expression across libraries, we found 804 genes, 438 down-regulated (54%) and 366 up-regulated (46%), in the experimental group compared to the control. Out of this total, 327 genes were successfully categorized, 174 down-regulated and 153 up-regulated, using gene ontology. Using STRING, the gene network comprised 199 nodes, 124 of them resulting in 274 interactions. The results showed that even an acute exposure of 48 h caused metabolic change in response to environmental contaminants, resulting in changes of cell integrity, in oxidation-reduction processes, in the immune response and disturbances of intracellular signaling of discus fish. The gene network also showed no central interplay cluster, exhibiting instead interactions among interconnected clusters and connected sub-networks. These findings highlight that even an acute sublethal exposure to PAHs can cause metabolic changes that may affect survival of discus. SOLiD sequencing coupled with the SAGE method proved a powerful and reliable means for gene expression analysis in discus, a non-model Amazonian fish. PMID:24312524
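The selection rule stated in this abstract (p ≤ 0.05 and at least a two-fold change in either direction) can be sketched as a small filter. This is a minimal illustration, not the authors' pipeline: the record type, the function name, and the toy gene names are all hypothetical, and fold change is assumed to be the exposed/control expression ratio.

```python
import math
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    fold_change: float  # ASSUMPTION: exposed / control expression ratio, > 0
    p_value: float


def classify(results, p_cutoff=0.05, min_fold=2.0):
    """Split significant genes into up- and down-regulated lists."""
    threshold = math.log2(min_fold)
    up, down = [], []
    for r in results:
        if r.p_value > p_cutoff:
            continue  # not significant at the chosen cutoff
        log2fc = math.log2(r.fold_change)
        if log2fc >= threshold:
            up.append(r.gene)
        elif log2fc <= -threshold:
            down.append(r.gene)
    return up, down


# Toy records with made-up gene names, for illustration only.
demo = [
    GeneResult("geneA", 4.0, 0.01),   # up
    GeneResult("geneB", 0.25, 0.03),  # down
    GeneResult("geneC", 1.5, 0.01),   # significant, but below two-fold
    GeneResult("geneD", 8.0, 0.20),   # large change, not significant
]
up, down = classify(demo)
print("up:", up, "down:", down)

# Percentages reported in the abstract: 438 down and 366 up out of 804.
print(f"down: {438 / 804:.0%}, up: {366 / 804:.0%}")
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically.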
Rim, Yeonggil; Kumar, Ritesh; Han, Xiao; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean
2014-01-01
The Korean black raspberry (Rubus coreanus Miquel, KB) on ripening is usually consumed as fresh fruit, whereas the unripe KB has been widely used as a source of traditional herbal medicine. Such a stage specific utilization of KB has been assumed due to the changing metabolite profile during fruit ripening process, but so far molecular and biochemical changes during its fruit maturation are poorly understood. To analyze biochemical changes during fruit ripening process at molecular level, firstly, we have sequenced, assembled, and annotated the transcriptome of KB fruits. Over 4.86 Gb of normalized cDNA prepared from fruits was sequenced using Illumina HiSeq™ 2000, and assembled into 43,723 unigenes. Secondly, we have reported that alterations in anthocyanins and proanthocyanidins are the major factors facilitating variations in these stages of fruits. In addition, up-regulation of F3′H1, DFR4 and LDOX1 resulted in the accumulation of cyanidin derivatives during the ripening process of KB, indicating the positive relationship between the expression of anthocyanin biosynthetic genes and the anthocyanin accumulation. Furthermore, the ability of RcMCHI2 (R. coreanus Miquel chalcone flavanone isomerase 2) gene to complement Arabidopsis transparent testa 5 mutant supported the feasibility of our transcriptome library to provide the gene resources for improving plant nutrition and pigmentation. Taken together, these datasets obtained from transcriptome library and metabolic profiling would be helpful to define the gene-metabolite relationships in this non-model plant. PMID:24505466
Yang, Cheng-Hong; Chuang, Li-Yeh; Shih, Tsung-Mu; Chang, Hsueh-Wei
2010-12-17
SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, the cross-comparison of different tissues is seldom addressed, thus limiting the identification of common- and tissue-specific tumor markers. To improve the SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data by combining the mathematical set theory and logic with a unique "multi-pool method" that analyzes multiple pools of pair-wise case controls individually. When all the settings are in "inclusion", the common SAGE tag sequences are mined. When one tissue type is in "inclusion" and the other types of tissues are not in "inclusion", the selected tissue-specific SAGE tag sequences are generated. They are displayed in tags-per-million (TPM) and fold values, as well as visually displayed in four kinds of scales in a color gradient pattern. In the fold visualization display, the top scores of the SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file is designed for cross-tissue comparison by selecting libraries from publicly available databases or user-defined libraries. The hSAGEing tool provides a combination of friendly cross-tissue analysis and an interface for comparing SAGE libraries for the first time. Some up- or down-regulated genes with tissue-specific or common tumor markers and suppressors are identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing.
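The "inclusion" logic described here maps naturally onto set intersection and difference. The following is a rough sketch of that idea under one reading of the abstract; it is not the hSAGEing implementation, and the function name, tissue labels, and three-letter tags are hypothetical placeholders (real SAGE tags are longer).

```python
def mine_tags(tags_by_tissue, included):
    """Tags found in every 'included' tissue and in none of the others."""
    included = set(included)
    inc_sets = [tags_by_tissue[t] for t in included]
    exc_sets = [tags for t, tags in tags_by_tissue.items() if t not in included]
    common = set.intersection(*inc_sets) if inc_sets else set()
    return common.difference(*exc_sets)


# Toy differentially expressed tag sets per tissue (made up for illustration).
tags_by_tissue = {
    "breast": {"AAA", "CCC", "GGG", "ACG"},
    "colon": {"AAA", "CCC", "TTT"},
    "lung": {"AAA", "GGG", "TTT"},
}

# All tissues in "inclusion": tags common to every tissue.
common_tags = mine_tags(tags_by_tissue, ["breast", "colon", "lung"])

# Only one tissue in "inclusion": tags specific to that tissue.
breast_only = mine_tags(tags_by_tissue, ["breast"])

print(common_tags, breast_only)
```

Including two tissues and excluding the third would, under the same reading, return tags shared by exactly that pair.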
Bhinder, Bhavneet; Shum, David; Djaballah, Hakim
2014-02-01
RNAi screening in combination with the genome-sequencing projects would constitute the Holy Grail of modern genetics, enabling discovery and validation towards a better understanding of fundamental biology leading to novel targets to combat disease. Hit discordance at inter-screen level together with the lack of reproducibility are emerging as the technology's main pitfalls. To examine some of the underlying factors leading to such discrepancies, we reasoned that perhaps there is an inherent difference in knockdown efficiency of the various RNAi technologies. For this purpose, we utilized the two most popular ones, chemically synthesized siRNA duplex and plasmid-based shRNA hairpin, in order to perform a head-to-head comparison. Using a previously developed gain-of-function assay probing modulators of the miRNA biogenesis pathway, we first executed an siRNA screen against the Silencer Select V4.0 library (AMB) nominating 1,273 gene candidates, followed by an shRNA screen against the TRC1 library (TRC1) nominating 497 gene candidates. We observed a poor overlap of only 29 hits given that there are 15,068 overlapping genes between the two libraries, with DROSHA as the only common hit out of the seven known core miRNA biogenesis genes. Distinct genes interacting with the same biogenesis regulators were observed in both screens, with a dismal cross-network overlap of only 3 genes (DROSHA, TGFBR1, and DIS3). Taken together, our study demonstrates differential knockdown activities between the two technologies, possibly due to the inefficient intracellular processing and potential cell-type specificity determinants in generating intended targeting sequences for the plasmid-based shRNA hairpins, and suggests this observed inefficiency as a potential culprit in addressing the lack of reproducibility.
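To put the 29-hit overlap in context, one can ask how many shared hits two unrelated hit lists of these sizes would produce by chance. The sketch below uses a hypergeometric model and makes a strong simplifying assumption that is not in the abstract: all 1,273 siRNA hits and all 497 shRNA hits are treated as drawn from the 15,068 genes common to both libraries. Some hits probably lie outside that shared set, so the numbers are only a rough guide.

```python
from math import comb


def expected_overlap(n_a: int, n_b: int, universe: int) -> float:
    """Mean overlap of two independent random hit lists."""
    return n_a * n_b / universe


def prob_overlap_at_most(k: int, n_a: int, n_b: int, universe: int) -> float:
    """Hypergeometric P(X <= k): chance of k or fewer shared hits."""
    total = comb(universe, n_b)
    favourable = sum(
        comb(n_a, i) * comb(universe - n_a, n_b - i) for i in range(k + 1)
    )
    return favourable / total


SIRNA_HITS = 1_273     # from the abstract
SHRNA_HITS = 497       # from the abstract
SHARED_GENES = 15_068  # from the abstract
OBSERVED = 29          # from the abstract

exp = expected_overlap(SIRNA_HITS, SHRNA_HITS, SHARED_GENES)
p_low = prob_overlap_at_most(OBSERVED, SIRNA_HITS, SHRNA_HITS, SHARED_GENES)

print(f"expected by chance: {exp:.1f} shared hits")
print(f"P(overlap <= {OBSERVED}) under independence: {p_low:.3f}")
```

Under these assumptions the chance expectation is about 42 shared hits, so an observed overlap of 29 is no better than random agreement, which is consistent with the abstract's description of the overlap as poor.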
Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L
2007-01-01
Background As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprised of 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Conclusion Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168
Li, Yongping; Wei, Wei; Feng, Jia; Luo, Huifeng; Pi, Mengting; Liu, Zhongchi; Kang, Chunying
2018-01-01
The genome of the wild diploid strawberry species Fragaria vesca, an ideal model system of cultivated strawberry (Fragaria × ananassa, octoploid) and other Rosaceae family crops, was first published in 2011 and followed by a new assembly (Fvb). However, the annotation for Fvb mainly relied on ab initio predictions and included only predicted coding sequences; therefore, an improved annotation is highly desirable. Here, a new annotation version named v2.0.a2 was created for the Fvb genome by a pipeline utilizing one PacBio library, 90 Illumina RNA-seq libraries, and 9 small RNA-seq libraries. Altogether, 18,641 genes (55.6% out of 33,538 genes) were augmented with information on the 5′ and/or 3′ UTRs, 13,168 (39.3%) protein-coding genes were modified or newly identified, and 7,370 genes were found to possess alternative isoforms. In addition, 1,938 long non-coding RNAs, 171 miRNAs, and 51,714 small RNA clusters were integrated into the annotation. This new annotation of F. vesca is substantially improved in both accuracy and integrity of gene predictions, beneficial to the gene functional studies in strawberry and to the comparative genomic analysis of other horticultural crops in Rosaceae family. PMID:29036429
Subramaniam, R; Reinold, S; Molitor, E K; Douglas, C J
1993-01-01
A heterologous probe encoding phenylalanine ammonia-lyase (PAL) was used to identify PAL clones in cDNA libraries made with RNA from young leaf tissue of two Populus deltoides x P. trichocarpa F1 hybrid clones. Sequence analysis of a 2.4-kb cDNA confirmed its identity as a full-length PAL clone. The predicted amino acid sequence is conserved in comparison with that of PAL genes from several other plants. Southern blot analysis of poplar genomic DNA from parental and hybrid individuals, restriction site polymorphism in PAL cDNA clones, and sequence heterogeneity in the 3' ends of several cDNA clones suggested that PAL is encoded by at least two genes that can be distinguished by HindIII restriction site polymorphisms. Clones containing each type of PAL gene were isolated from a poplar genomic library. Analysis of the segregation of PAL-specific HindIII restriction fragment-length polymorphisms demonstrated the existence of two independently segregating PAL loci, one of which was mapped to a linkage group of the poplar genetic map. Developmentally regulated PAL expression in poplar was analyzed using RNA blots. Highest expression was observed in young stems, apical buds, and young leaves. Expression was lower in older stems and undetectable in mature leaves. Cellular localization of PAL expression by in situ hybridization showed very high levels of expression in subepidermal cells of leaves early during leaf development. In stems and petioles, expression was associated with subepidermal cells and vascular tissues. PMID:8108506
2010-01-01
Background The presence of closely related genomes in polyploid species makes the assembly of total genomic sequence from shotgun sequence reads produced by the current sequencing platforms exceedingly difficult, if not impossible. Genomes of polyploid species could be sequenced following the ordered-clone sequencing approach employing contigs of bacterial artificial chromosome (BAC) clones and BAC-based physical maps. Although BAC contigs can currently be constructed for virtually any diploid organism with the SNaPshot high-information-content-fingerprinting (HICF) technology, it is currently unknown if this is also true for polyploid species. It is possible that BAC clones from orthologous regions of homoeologous chromosomes would share numerous restriction fragments and be therefore included into common contigs. Because of this and other concerns, physical mapping utilizing the SNaPshot HICF of BAC libraries of polyploid species has not been pursued and the possibility of doing so has not been assessed. The sole exception has been in common wheat, an allohexaploid in which it is possible to construct single-chromosome or single-chromosome-arm BAC libraries from DNA of flow-sorted chromosomes and bypass the obstacles created by polyploidy. Results The potential of the SNaPshot HICF technology for physical mapping of polyploid plants utilizing global BAC libraries was evaluated by assembling contigs of fingerprinted clones in an in silico merged BAC library composed of single-chromosome libraries of two wheat homoeologous chromosome arms, 3AS and 3DS, and complete chromosome 3B. Because the chromosome arm origin of each clone was known, it was possible to estimate the fidelity of contig assembly. On average 97.78% or more clones, depending on the library, were from a single chromosome arm. 
A large portion of the remaining clones was shown to be library contamination from other chromosomes, a feature that is unavoidable during the construction of single-chromosome BAC libraries. Conclusions The negligibly low level of incorporation of clones from homoeologous chromosome arms into a contig during contig assembly suggested that it is feasible to construct contigs and physical maps using global BAC libraries of wheat and almost certainly also of other plant polyploid species with genome sizes comparable to that of wheat. Because of the high purity of the resulting assembled contigs, they can be directly used for genome sequencing. It is currently unknown but possible that equally good BAC contigs can be also constructed for polyploid species containing smaller, more gene-rich genomes. PMID:20170511
Analyzing Immunoglobulin Repertoires
Chaudhary, Neha; Wesemann, Duane R.
2018-01-01
Somatic assembly of T cell receptor and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently generated unprecedented opportunities for exploration of adaptive immune responses. With these opportunities have come significant challenges in understanding the analysis techniques that most accurately reflect underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing and statistical analysis of lymphocyte receptor repertoire studies. We discuss the general steps in the process of immune repertoire generation including sample preparation, platforms available for sequencing, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used for analysis and interpretation of the data. Because BCR analysis harbors additional complexities, such as immunoglobulin (Ig) (i.e., antibody) gene somatic hypermutation and class switch recombination, the emphasis of this review is on Ig/BCR sequence analysis. PMID:29593723
Beger, Carmela; Pierce, Leigh N.; Krüger, Martin; Marcusson, Eric G.; Robbins, Joan M.; Welcsh, Piri; Welch, Peter J.; Welte, Karl; King, Mary-Claire; Barber, Jack R.; Wong-Staal, Flossie
2001-01-01
Expression of the breast and ovarian cancer susceptibility gene BRCA1 is down-regulated in sporadic breast and ovarian cancer cases. Therefore, the identification of genes involved in the regulation of BRCA1 expression might lead to new insights into the pathogenesis and treatment of these tumors. In the present study, an “inverse genomics” approach based on a randomized ribozyme gene library was applied to identify cellular genes regulating BRCA1 expression. A ribozyme gene library with randomized target recognition sequences was introduced into human ovarian cancer-derived cells stably expressing a selectable marker [enhanced green fluorescence protein (EGFP)] under the control of the BRCA1 promoter. Cells in which BRCA1 expression was upregulated by particular ribozymes were selected through their concomitant increase in EGFP expression. The cellular target gene of one ribozyme was identified to be the dominant negative transcriptional regulator Id4. Modulation of Id4 expression resulted in inversely regulated expression of BRCA1. In addition, increase in Id4 expression was associated with the ability of cells to exhibit anchorage-independent growth, demonstrating the biological relevance of this gene. Our data suggest that Id4 is a crucial gene regulating BRCA1 expression and might therefore be important for the BRCA1 regulatory pathway involved in the pathogenesis of sporadic breast and ovarian cancer. PMID:11136250
Chen, Chun-lan; Wu, Min-na; Wei, Wen-xue
2011-05-01
The aim of this study was to determine the effect of long-term (16 years) application of nitrogen fertilizer on the diversity of nitrifying genes (amoA and hao) in paddy soil, using a long-term paddy field experimental station (started in 1990) located in Taoyuan and the molecular approaches of PCR, clone library construction and sequencing. The fertilizer was urea (N treatment), and an unfertilized plot served as the control (CK). The Shannon index showed that long-term application of nitrogen fertilizer decreased the diversity of the amoA gene but had no effect on the diversity of the hao gene. LIBSHUFF statistical analyses demonstrated that the amoA and hao libraries of the CK and N treatments were significantly different from each other, and the rarefaction curves of the libraries failed to reach plateaus, indicating that many kinds of genes had not yet been detected. BLAST searches against GenBank and the phylogenetic tree showed that the amoA genes detected in our study were similar to amoA genes from uncultured organisms, which showed some similarity to Nitrosospira. In contrast, the cloned hao genes were related to genes of cultured bacteria such as Silicibacteria, Nitrosospira and Methylococcus, and the hao genes found in the N treatment were predominantly affiliated with alpha-Proteobacteria. These results suggest that long-term nitrogen fertilization had significant impacts on the diversity or community of amoA and hao genes.
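For readers unfamiliar with the Shannon index used to compare the clone libraries, a minimal sketch follows. It computes H' = -Σ p_i ln p_i from the number of clones assigned to each sequence type. The clone counts are invented for illustration and are not data from this study; how the authors grouped clones into types is not stated in the abstract.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over non-empty groups."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical clone counts per sequence type (NOT data from the study).
control_library = [12, 10, 9, 8, 6, 5]  # clones spread evenly over many types
fertilized_library = [35, 8, 4, 3]      # one dominant type, fewer types overall

h_control = shannon_index(control_library)
h_fertilized = shannon_index(fertilized_library)

print(f"control H' = {h_control:.3f}")
print(f"fertilized H' = {h_fertilized:.3f}")
```

A library dominated by a few sequence types yields a lower index than one in which clones are spread evenly across many types, which is the sense in which the abstract reports decreased amoA diversity.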
Ho, Chai-Ling; Kwan, Yen-Yen; Choi, Mei-Chooi; Tee, Sue-Sean; Ng, Wai-Har; Lim, Kok-Ang; Lee, Yang-Ping; Ooi, Siew-Eng; Lee, Weng-Wah; Tee, Jin-Ming; Tan, Siang-Hee; Kulaveerasingam, Harikrishna; Alwee, Sharifah Shahrul Rabiah Syed; Abdullah, Meilina Ong
2007-01-01
Background Oil palm is the second largest source of edible oil which contributes to approximately 20% of the world's production of oils and fats. In order to understand the molecular biology involved in in vitro propagation, flowering, efficient utilization of nitrogen sources and root diseases, we have initiated an expressed sequence tag (EST) analysis on oil palm. Results In this study, six cDNA libraries from oil palm zygotic embryos, suspension cells, shoot apical meristems, young flowers, mature flowers and roots, were constructed. We have generated a total of 14537 expressed sequence tags (ESTs) from these libraries, from which 6464 tentative unique contigs (TUCs) and 2129 singletons were obtained. Approximately 6008 of these tentative unique genes (TUGs) have significant matches to the non-redundant protein database, from which 2361 were assigned to one or more Gene Ontology categories. Predominant transcripts and differentially expressed genes were identified in multiple oil palm tissues. Homologues of genes involved in many aspects of flower development were also identified among the EST collection, such as CONSTANS-like, AGAMOUS-like (AGL)2, AGL20, LFY-like, SQUAMOSA, SQUAMOSA binding protein (SBP) etc. Majority of them are the first representatives in oil palm, providing opportunities to explore the cause of epigenetic homeotic flowering abnormality in oil palm, given the importance of flowering in fruit production. The transcript levels of two flowering-related genes, EgSBP and EgSEP were analysed in the flower tissues of various developmental stages. Gene homologues for enzymes involved in oil biosynthesis, utilization of nitrogen sources, and scavenging of oxygen radicals, were also uncovered among the oil palm ESTs. 
Conclusion The EST sequences generated will allow comparative genomic studies between oil palm and other monocotyledonous and dicotyledonous plants, development of gene-targeted markers for the reference genetic map, design and fabrication of DNA array for future studies of oil palm. The outcomes of such studies will contribute to oil palm improvements through the establishment of breeding program using marker-assisted selection, development of diagnostic assays using gene targeted markers, and discovery of candidate genes related to important agronomic traits of oil palm. PMID:17953740
Kock, K; Ahlers, C; Schmale, H
1994-05-01
The rat von Ebner's gland protein 1 (VEGP 1) is a secretory protein, which is abundantly expressed in the small acinar von Ebner's salivary glands of the tongue. Based on the primary structure of this protein we have previously suggested that it is a member of the lipocalin superfamily of lipophilic-ligand carrier proteins. Although the physiological role of VEGP 1 is not clear, it might be involved in sensory or protective functions in the taste epithelium. Here, we report the purification of VEGP 1 and of a closely related secretory polypeptide, VEGP 2, the isolation of a cDNA clone encoding VEGP 2, and the isolation and structural characterization of the genes for both proteins. Protein purification by gel-filtration and anion-exchange chromatography using Mono Q revealed the presence of two different immunoreactive VEGP species. N-terminal sequence determination of peptide fragments isolated after protease Asp-N digestion allowed the identification of a new VEGP, named VEGP 2, in addition to the previously characterized VEGP 1. The complete VEGP 2 sequence was deduced from a cDNA clone isolated from a von Ebner's gland cDNA library. The VEGP 2 cDNA encodes a protein of 177 amino acids and is 94% identical to VEGP 1. DNA sequence analysis of the rat VEGP 1 and 2 genes isolated from rat genomic libraries revealed that both span about 4.5 kb and contain seven exons. The VEGP 1 and 2 genes are non-allelic distinct genes in the rat genome and probably arose by gene duplication. The high degree of nucleotide sequence identity in introns A-C (94-100%) points to a recent gene conversion event that included the 5' part of the genes. The genomic organization of the rat VEGP genes closely resembles that found in other lipocalins such as beta-lactoglobulin, mouse urinary proteins (MUPs) and prostaglandin D synthase, and therefore provides clear evidence that VEGPs belong to this superfamily of proteins.
Sequence verification as quality-control step for production of cDNA microarrays.
Taylor, E; Cogdell, D; Coombes, K; Hu, L; Ramdas, L; Tabor, A; Hamilton, S; Zhang, W
2001-07-01
To generate cDNA arrays in our core laboratory, we amplified about 2300 PCR products from a human, sequence-verified cDNA clone library. As a quality-control step, we sequenced the PCR products immediately before printing. The sequence information was used to search the GenBank database to confirm the identities. Although these clones were previously sequence verified by the company, we found that only 79% of the clones matched the original database after handling. Our experience strongly indicates the necessity to sequence verify the clones at the final stage before printing on microarray slides and to modify the gene list accordingly.
A method for the further assembly of targeted unigenes in a transcriptome after assembly by Trinity
Xiao, Xinlong; Ma, Jinbiao; Sun, Yufang; Yao, Yinan
2015-01-01
RNA-sequencing has been widely used to obtain high throughput transcriptome sequences in various species, but the assembly of a full set of complete transcripts is still a significant challenge. Judging by the number of expected transcripts and assembled unigenes in a transcriptome library, we believe that some unigenes could be reassembled. In this study, using the nitrate transporter (NRT) gene family and phosphate transporter (PHT) gene family in Salicornia europaea as examples, we introduced an approach to further assemble unigenes found in transcriptome libraries which had been previously generated by Trinity. To find the unigenes of a particular transcript that contained gaps, we respectively selected 16 NRT candidate unigene pairs and 12 PHT candidate unigene pairs for which the two unigenes had the same annotations, the same expression patterns among various RNA-seq samples, and different positions of the proteins coded as mapped to a reference protein. To fill a gap between the two unigenes, PCR was performed using primers that mapped to the two unigenes and the PCR products were sequenced, which demonstrated that 5 unigene pairs of NRT and 3 unigene pairs of PHT could be reassembled when the gaps were filled using the corresponding PCR product sequences. This fast and simple method will reduce the redundancy of targeted unigenes and allow acquisition of complete coding sequences (CDS). PMID:26528307
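The three selection criteria for candidate unigene pairs (same annotation, same expression pattern across samples, different positions on the reference protein) can be sketched as a pairwise filter. This is only an illustration of the idea: the record fields, the use of Pearson correlation with a 0.9 threshold as the measure of "same expression pattern", and the toy unigenes are all assumptions, not details given in the abstract.

```python
from itertools import combinations
from math import sqrt
from typing import NamedTuple


class Unigene(NamedTuple):
    name: str
    annotation: str
    expression: tuple  # expression values across RNA-seq samples
    ref_start: int     # alignment start on the reference protein
    ref_end: int       # alignment end on the reference protein


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def candidate_pairs(unigenes, min_r=0.9):
    """Pairs that may be fragments of one transcript separated by a gap."""
    pairs = []
    for a, b in combinations(unigenes, 2):
        same_annotation = a.annotation == b.annotation
        non_overlapping = a.ref_end < b.ref_start or b.ref_end < a.ref_start
        if same_annotation and non_overlapping:
            if pearson(a.expression, b.expression) >= min_r:
                pairs.append((a.name, b.name))
    return pairs


# Toy unigenes with hypothetical names and values.
demo = [
    Unigene("u1", "NRT1.1", (10, 20, 30), 1, 150),
    Unigene("u2", "NRT1.1", (11, 19, 33), 180, 400),
    Unigene("u3", "NRT1.1", (10, 21, 29), 100, 300),  # overlaps u1 and u2
    Unigene("u4", "PHT1.4", (10, 20, 30), 200, 400),  # different annotation
]
pairs = candidate_pairs(demo)
print(pairs)
```

Pairs passing such a filter would still need the PCR and sequencing step described in the abstract to confirm that the two unigenes belong to one transcript.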
Improving draft genome contiguity with reference-derived in silico mate-pair libraries.
Grau, José Horacio; Hackl, Thomas; Koepfli, Klaus-Peter; Hofreiter, Michael
2018-05-01
Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.
Upregulated Genes In Sporadic, Idiopathic Pulmonary Arterial Hypertension
Edgar, Alasdair J; Chacón, Matilde R; Bishop, Anne E; Yacoub, Magdi H; Polak, Julia M
2006-01-01
Background To elucidate further the pathogenesis of sporadic, idiopathic pulmonary arterial hypertension (IPAH) and identify potential therapeutic avenues, differential gene expression in IPAH was examined by suppression subtractive hybridisation (SSH). Methods Peripheral lung samples were obtained immediately after removal from patients undergoing lung transplant for IPAH without familial disease, and control tissues consisted of similarly sampled pieces of donor lungs not utilised during transplantation. Pools of lung mRNA from IPAH cases containing plexiform lesions and normal donor lungs were used to generate the tester and driver cDNA libraries, respectively. A subtracted IPAH cDNA library was made by SSH. Clones isolated from this subtracted library were examined for upregulated expression in IPAH using dot blot arrays of positive colony PCR products using both pooled cDNA libraries as probes. Clones verified as being upregulated were sequenced. For two genes the increase in expression was verified by northern blotting and data analysed using Student's unpaired two-tailed t-test. Results We present preliminary findings concerning candidate genes upregulated in IPAH. Twenty-seven upregulated genes were identified out of 192 clones examined. Upregulation in individual cases of IPAH was shown by northern blot for tissue inhibitor of metalloproteinase-3 and decorin (P < 0.01) compared with the housekeeping gene glyceraldehyde-3-phosphate dehydrogenase. Conclusion Four of the upregulated genes, magic roundabout, hevin, thrombomodulin and sucrose non-fermenting protein-related kinase-1 are expressed specifically by endothelial cells and one, muscleblind-1, by muscle cells, suggesting that they may be associated with plexiform lesions and hypertrophic arterial wall remodelling, respectively. PMID:16390543
Small gene family encoding an eggshell (chorion) protein of the human parasite Schistosoma mansoni
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bobek, L.A.; Rekosh, D.M.; Lo Verde, P.T.
1988-08-01
The authors isolated six independent genomic clones encoding schistosome chorion or eggshell proteins from a Schistosoma mansoni genomic library. A linkage map of five of the clones spanning 35 kilobase pairs (kbp) of the S. mansoni genome was constructed. The region contained two eggshell protein genes closely linked, separated by 7.5 kbp of intergenic DNA. The two genes of the cluster were arranged in the same orientation, that is, they were transcribed from the same strand. The sixth clone probably represents a third copy of the eggshell gene that is not contained within the 35-kbp region. The 5' end of the mRNA transcribed from these genes was defined by primer extension directly off the RNA. The ATCAT cap site sequence was homologous to a silkmoth chorion PuTCATT cap site sequence, where Pu indicates any purine. DNA sequence analysis showed that there were no introns in these genes. The DNA sequences of the three genes were very homologous to each other and to a cDNA clone, pSMf61-46, differing only in three or four nucleotides. A multiple TATA box was located at positions -23 to -31, and a CAAAT sequence was located at -52 upstream of the eggshell transcription unit. Comparison of sequences in regions further upstream with silkmoth and Drosophila sequences revealed very short elements that were shared. One such element, TCACGT, recently shown to be an essential cis-regulatory element for silkmoth chorion gene promoter function, was found at a similar position in all three organisms.
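The motif descriptions in this abstract map directly onto a pattern search. The following is a minimal sketch, assuming only that "Pu" means A or G; the function name `find_motifs` and the example sequence are invented for illustration and are not schistosome or silkmoth data.

```python
import re

# "Pu" denotes any purine (A or G), so the silkmoth cap site PuTCATT
# can be written as the regular expression [AG]TCATT.
CAP_SITE = re.compile(r"[AG]TCATT")
# The shared upstream element named in the abstract.
CIS_ELEMENT = re.compile(r"TCACGT")

def find_motifs(seq):
    """Return 0-based start positions of each motif in seq (hypothetical helper)."""
    seq = seq.upper()
    return {
        "cap_site": [m.start() for m in CAP_SITE.finditer(seq)],
        "TCACGT": [m.start() for m in CIS_ELEMENT.finditer(seq)],
    }

# Made-up sequence for illustration only.
example = "GGTCACGTAAACCATCATTGG"
hits = find_motifs(example)
```

Positions are reported relative to the start of the input string; converting them to coordinates relative to a transcription start site (such as the -52 position quoted above) would need the cap site position as an offset.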
Rapid and accurate synthesis of TALE genes from synthetic oligonucleotides.
Wang, Fenghua; Zhang, Hefei; Gao, Jingxia; Chen, Fengjiao; Chen, Sijie; Zhang, Cuizhen; Peng, Gang
2016-01-01
Custom synthesis of transcription activator-like effector (TALE) genes has relied upon plasmid libraries of pre-fabricated TALE-repeat monomers or oligomers. Here we describe a novel synthesis method that directly incorporates annealed synthetic oligonucleotides into the TALE-repeat units. Our approach utilizes iterative sets of oligonucleotides and a translational frame check strategy to ensure the high efficiency and accuracy of TALE-gene synthesis. TALE arrays of more than 20 repeats can be constructed, and the majority of the synthesized constructs have perfect sequences. In addition, this novel oligonucleotide-based method can readily accommodate design changes to the TALE repeats. We demonstrated an increased gene targeting efficiency against a genomic site containing a potentially methylated cytosine by incorporating non-conventional repeat variable di-residue (RVD) sequences.
Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan
2014-01-01
Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferrari, S.; Finelli, P.; Rocchi, M.
The human genome contains a large number of sequences related to the cDNA for High Mobility Group 1 protein (HMG1), which so far has hampered the cloning and mapping of the active HMG1 gene. We show that the human HMG1 gene contains introns, while the HMG1-related sequences do not and most likely are retrotransposed pseudogenes. We identified eight YACs from the ICI and CEPH libraries that contain the human HMG1 gene. The HMG1 gene is similar in structure to the previously characterized murine homologue and maps to human chromosome band 13q12, as determined by in situ hybridization. The mouse Hmg1 gene maps to the telomeric region of murine Chromosome 5, which is syntenic to the human 13q12 band. 18 refs., 3 figs.
Bacterial diversity in permanently cold and alkaline ikaite columns from Greenland.
Schmidt, Mariane; Priemé, Anders; Stougaard, Peter
2006-12-01
Bacterial diversity in alkaline (pH 10.4) and permanently cold (4 °C) ikaite tufa columns from the Ikka Fjord, SW Greenland, was investigated using growth characterization of cultured bacterial isolates with terminal restriction fragment length polymorphism (T-RFLP) and sequence analysis of bacterial 16S rRNA gene fragments. More than 200 bacterial isolates were characterized with respect to pH and temperature tolerance, and it was shown that the majority were cold-active alkaliphiles. T-RFLP analysis revealed distinct bacterial communities in different fractions of three ikaite columns, and, along with sequence analysis, it showed the presence of rich and diverse bacterial communities. Rarefaction analysis showed that the 109 sequenced clones in the 16S rRNA gene library represented between 25 and 65% of the predicted species richness in the three ikaite columns investigated. Phylogenetic analysis of the 16S rRNA gene sequences revealed many sequences with similarity to alkaliphilic or psychrophilic bacteria, and showed that 33% of the cloned sequences and 33% of the cultured bacteria showed less than 97% sequence identity to known sequences in databases, and may therefore represent yet unknown species.
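Rarefaction, as mentioned in this abstract, estimates how many distinct taxa would be expected in a smaller random subsample of a clone library. The sketch below uses the standard hypergeometric expectation; the function name `rarefy` and the clone counts are hypothetical, and nothing here reproduces the authors' software or data.

```python
from math import comb

def rarefy(otu_counts, n):
    """Expected number of OTUs seen when n clones are drawn without
    replacement from a library whose OTUs have the given clone counts."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    denom = comb(total, n)
    # For each OTU, 1 minus the probability that none of its clones is drawn.
    return sum(1 - comb(total - c, n) / denom for c in otu_counts)

# Invented library: 10 clones spread over 4 OTUs.
counts = [5, 3, 1, 1]
curve = [rarefy(counts, n) for n in range(sum(counts) + 1)]
```

Plotting `curve` against subsample size gives a rarefaction curve; a curve that is still rising at the full library size suggests that additional sequencing would recover more taxa.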
Walker, M D; Park, C W; Rosen, A; Aronheim, A
1990-01-01
Cell specific expression of the insulin gene is achieved through transcriptional mechanisms operating on multiple DNA sequence elements located in the 5' flanking region of the gene. Of particular importance in the rat insulin I gene are two closely similar 9 bp sequences (IEB1 and IEB2): mutation of either of these leads to 5-10 fold reduction in transcriptional activity. We have screened an expression cDNA library derived from mouse pancreatic endocrine beta cells with a radioactive DNA probe containing multiple copies of the IEB1 sequence. A cDNA clone (A1) isolated by this procedure encodes a protein which shows efficient binding to the IEB1 probe, but much weaker binding to either an unrelated DNA probe or to a probe bearing a single base pair insertion within the recognition sequence. DNA sequence analysis indicates a protein belonging to the helix-loop-helix family of DNA-binding proteins. The ability of the protein encoded by clone A1 to recognize a number of wild type and mutant DNA sequences correlates closely with the ability of each sequence element to support transcription in vivo in the context of the insulin 5' flanking DNA. We conclude that the isolated cDNA may encode a transcription factor that participates in control of insulin gene expression. PMID:2181401
Feltus, Frank A; Saski, Christopher A; Mockaitis, Keithanne; Haiminen, Niina; Parida, Laxmi; Smith, Zachary; Ford, James; Staton, Margaret E; Ficklin, Stephen P; Blackmon, Barbara P; Cheng, Chun-Huai; Schnell, Raymond J; Kuhn, David N; Motamayor, Juan-Carlos
2011-07-27
Background BAC-based physical maps provide for sequencing across an entire genome or a selected sub-genomic region of biological interest. Such a region can be approached with next-generation whole-genome sequencing and assembly as if it were an independent small genome. Using the minimum tiling path as a guide, specific BAC clones representing the prioritized genomic interval are selected, pooled, and used to prepare a sequencing library. Results This pooled BAC approach was taken to sequence and assemble a QTL-rich region, of ~3 Mbp and represented by twenty-seven BACs, on linkage group 5 of the Theobroma cacao cv. Matina 1-6 genome. Using various mixtures of read coverages from paired-end and linear 454 libraries, multiple assemblies of varied quality were generated. Quality was assessed by comparing the assembly of 454 reads with a subset of ten BACs individually sequenced and assembled using Sanger reads. A mixture of reads optimal for assembly was identified. We found, furthermore, that a quality assembly suitable for serving as a reference genome template could be obtained even with a reduced depth of sequencing coverage. Annotation of the resulting assembly revealed several genes potentially responsible for three T. cacao traits: black pod disease resistance, bean shape index, and pod weight. Conclusions Our results, as with other pooled BAC sequencing reports, suggest that pooling portions of a minimum tiling path derived from a BAC-based physical map is an effective method to target sub-genomic regions for sequencing. While we focused on a single QTL region, other QTL regions of importance could be similarly sequenced allowing for biological discovery to take place before a high quality whole-genome assembly is completed. PMID:21794110
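The idea of a minimum tiling path can be illustrated as an interval-cover problem: choose the fewest clones whose mapped positions jointly span the target region. The following is a minimal greedy sketch under simplifying assumptions (clone coordinates are known exactly, and no minimum overlap between neighbouring clones is enforced); the function name, clone names, and coordinates are invented and are not taken from the cacao physical map.

```python
def minimum_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, start, end) tuples in map coordinates.
    Returns the names of a smallest set of clones covering
    [region_start, region_end], or raises ValueError if there is a gap.
    """
    clones = sorted(clones, key=lambda c: c[1])
    path, covered, i = [], region_start, 0
    while covered < region_end:
        best = None
        # Among clones that start at or before the covered position,
        # keep the one that reaches furthest to the right.
        while i < len(clones) and clones[i][1] <= covered:
            if best is None or clones[i][2] > best[2]:
                best = clones[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path

# Hypothetical clones with coordinates in kbp.
example_clones = [
    ("BAC_A", 0, 150),
    ("BAC_B", 100, 260),
    ("BAC_C", 120, 300),
    ("BAC_D", 250, 420),
    ("BAC_E", 290, 450),
]
tiling = minimum_tiling_path(example_clones, 0, 450)
```

In practice a physical map gives fingerprint-based contig order rather than exact coordinates, so a real tiling-path selection would also have to handle uncertain overlaps.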
Zhang, De-Chao; Liu, Yan-Xia; Li, Xin-Zheng
2015-09-01
Deep sea ferromanganese (FeMn) nodules contain metallic mineral resources and have great economic potential. In this study, a combination of culture-dependent and culture-independent (16S rRNA genes clone library and pyrosequencing) methods was used to investigate the bacterial diversity in FeMn nodules from Jiaolong Seamount, the South China Sea. Eleven bacterial strains including some moderate thermophiles were isolated. The majority of strains belonged to the phylum Proteobacteria; one isolate belonged to the phylum Firmicutes. A total of 259 near full-length bacterial 16S rRNA gene sequences in a clone library and 67,079 valid reads obtained using pyrosequencing indicated that members of the Gammaproteobacteria dominated, with the most abundant bacterial genera being Pseudomonas and Alteromonas. Sequence analysis indicated the presence of many organisms whose closest relatives are known manganese oxidizers, iron reducers, hydrogen-oxidizing bacteria and methylotrophs. This is the first reported investigation of bacterial diversity associated with deep sea FeMn nodules from the South China Sea.
Burger, Brian T.; Imam, Saheed; Scarborough, Matthew J.; ...
2017-06-06
Rhodobacter sphaeroides is one of the best-studied alphaproteobacteria from biochemical, genetic, and genomic perspectives. To gain a better systems-level understanding of this organism, we generated a large transposon mutant library and used transposon sequencing (Tn-seq) to identify genes that are essential under several growth conditions. Using newly developed Tn-seq analysis software (TSAS), we identified 493 genes as essential for aerobic growth on a rich medium. We then used the mutant library to identify conditionally essential genes under two laboratory growth conditions, identifying 85 additional genes required for aerobic growth in a minimal medium and 31 additional genes required for photosynthetic growth. In all instances, our analyses confirmed essentiality for many known genes and identified genes not previously considered to be essential. We used the resulting Tn-seq data to refine and improve a genome-scale metabolic network model (GEM) for R. sphaeroides. Together, we demonstrate how genetic, genomic, and computational approaches can be combined to obtain a systems-level understanding of the genetic framework underlying metabolic diversity in bacterial species.
Zhang, Fengjiao; Dong, Wen; Huang, Lulu; Song, Aiping; Wang, Haibin; Fang, Weimin; Chen, Fadi; Teng, Nianjun
2015-01-01
MicroRNAs (miRNAs) are important regulators in plant development. They post-transcriptionally regulate gene expression during various biological and metabolic processes by binding to the 3'-untranslated region of target mRNAs to facilitate mRNA degradation or inhibit translation. Chrysanthemum (Chrysanthemum morifolium) is one of the most important ornamental flowers with increasing demand each year. However, embryo abortion is the main reason for chrysanthemum cross breeding failure. To date, there have been no experiments examining the expression of miRNAs associated with chrysanthemum embryo development. Therefore, we sequenced three small RNA libraries to identify miRNAs and their functions. Our results will provide molecular insights into chrysanthemum embryo abortion. Three small RNA libraries were built from normal chrysanthemum ovules at 12 days after pollination (DAP), and normal and abnormal chrysanthemum ovules at 18 DAP. We validated 228 miRNAs with significant changes in expression frequency during embryonic development. Comparative profiling revealed that 69 miRNAs exhibited significant differential expression between normal and abnormal embryos at 18 DAP. In addition, a total of 1037 miRNA target genes were predicted, and their annotations were defined by transcriptome data. Target genes associated with metabolic pathways were most highly represented according to the annotation. Moreover, 52 predicted target genes were identified to be associated with embryonic development, including 31 transcription factors and 21 additional genes. Gene ontology (GO) annotation also revealed that high-ranking miRNA target genes related to cellular processes and metabolic processes were involved in transcription regulation and the embryo developmental process. The present study generated three miRNA libraries and gained information on miRNAs and their targets in the chrysanthemum embryo. 
These results enrich the growing database of new miRNAs and lay the foundation for the further understanding of miRNA biological function in the regulation of chrysanthemum embryo abortion.
Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko
2014-01-01
Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kB, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750
Han, Jun; Zhao, Xiaojie; Cui, Yu; Song, Wei; Huo, Naxin; Liang, Yong; Xie, Jingzhong; Wang, Zhenzhong; Wu, Qiuhong; Chen, Yong-Xing; Lu, Ping; Zhang, De-Yun; Wang, Lili; Sun, Hua; Yang, Tsomin; Keeble-Gagnere, Gabriel; Appels, Rudi; Doležel, Jaroslav; Ling, Hong-Qing; Luo, Mingcheng; Gu, Yongqiang; Sun, Qixin; Liu, Zhiyong
2014-01-01
Powdery mildew, caused by Blumeria graminis f. sp. tritici, is one of the most important wheat diseases in the world. In this study, a single dominant powdery mildew resistance gene MlIW172 was identified in the IW172 wild emmer accession and mapped to the distal region of chromosome arm 7AL (bin7AL-16-0.86-0.90) via molecular marker analysis. MlIW172 was closely linked with the RFLP probe Xpsr680-derived STS marker Xmag2185 and the EST markers BE405531 and BE637476. This suggested that MlIW172 might be allelic to the Pm1 locus or a new locus closely linked to Pm1. By screening genomic BAC library of durum wheat cv. Langdon and 7AL-specific BAC library of hexaploid wheat cv. Chinese Spring, and after analyzing genome scaffolds of Triticum urartu containing the marker sequences, additional markers were developed to construct a fine genetic linkage map on the MlIW172 locus region and to delineate the resistance gene within a 0.48 cM interval. Comparative genetics analyses using ESTs and RFLP probe sequences flanking the MlIW172 region against other grass species revealed a general co-linearity in this region with the orthologous genomic regions of rice chromosome 6, Brachypodium chromosome 1, and sorghum chromosome 10. However, orthologous resistance gene-like RGA sequences were only present in wheat and Brachypodium. The BAC contigs and sequence scaffolds that we have developed provide a framework for the physical mapping and map-based cloning of MlIW172. PMID:24955773
2013-10-01
proposal for the first phase of the project, gene expression and epigenetic alterations are to be analyzed by next generation sequencing. Laser captured... genes and also alternative splicing events which could provide valuable biomarkers for this project. We required RRBS libraries for the methylation...regulated compared to BP in both the benign tissue adjacent to cancer (BPC) and in prostate cancer. Interestingly, both of these genes can also be
Synthetic muscle promoters: activities exceeding naturally occurring regulatory sequences
NASA Technical Reports Server (NTRS)
Li, X.; Eastman, E. M.; Schwartz, R. J.; Draghia-Akli, R.
1999-01-01
Relatively low levels of expression from naturally occurring promoters have limited the use of muscle as a gene therapy target. Myogenic restricted gene promoters display complex organization usually involving combinations of several myogenic regulatory elements. By random assembly of E-box, MEF-2, TEF-1, and SRE sites into synthetic promoter recombinant libraries, and screening of hundreds of individual clones for transcriptional activity in vitro and in vivo, several artificial promoters were isolated whose transcriptional potencies greatly exceed those of natural myogenic and viral gene promoters.
CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates.
Low, Joel Z B; Khang, Tsung Fei; Tammi, Martti T
2017-12-28
In current statistical methods for calling differentially expressed genes in RNA-Seq experiments, the assumption is that an adjusted observed gene count represents an unknown true gene count. This adjustment usually consists of a normalization step to account for heterogeneous sample library sizes, and then the resulting normalized gene counts are used as input for parametric or non-parametric differential gene expression tests. A distribution of true gene counts, each with a different probability, can result in the same observed gene count. Importantly, sequencing coverage information is currently not explicitly incorporated into any of the statistical models used for RNA-Seq analysis. We developed a fast Bayesian method which uses the sequencing coverage information determined from the concentration of an RNA sample to estimate the posterior distribution of a true gene count. Our method has better or comparable performance compared to NOISeq and GFOLD, according to the results from simulations and experiments with real unreplicated data. We incorporated a previously unused sequencing coverage parameter into a procedure for differential gene expression analysis with RNA-Seq data. Our results suggest that our method can be used to overcome analytical bottlenecks in experiments with limited number of replicates and low sequencing coverage. The method is implemented in CORNAS (Coverage-dependent RNA-Seq), and is available at https://github.com/joel-lzb/CORNAS .
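The central idea of this abstract, that many true counts could have produced one observed count and that sequencing coverage determines how likely each is, can be shown with a deliberately simplified stand-in model. The sketch below assumes that each true transcript is observed independently with probability equal to the coverage (a binomial model) and places a flat prior on the true count up to a cap; these are illustrative assumptions, not the model implemented in CORNAS, and the function name is hypothetical.

```python
from math import exp, lgamma, log

def posterior_true_count(observed, coverage, n_max):
    """Posterior over the true count N given an observed count.

    Assumed model (illustrative only): observed ~ Binomial(N, coverage),
    with a flat prior on N over observed..n_max.
    Returns a dict mapping each candidate N to its posterior probability.
    """
    if not 0 < coverage < 1:
        raise ValueError("coverage must lie strictly between 0 and 1")
    support = range(observed, n_max + 1)
    log_w = []
    for n in support:
        # log of C(n, observed) * coverage^observed * (1 - coverage)^(n - observed)
        log_binom = lgamma(n + 1) - lgamma(observed + 1) - lgamma(n - observed + 1)
        log_w.append(log_binom + observed * log(coverage)
                     + (n - observed) * log(1 - coverage))
    top = max(log_w)
    weights = [exp(x - top) for x in log_w]  # subtract the max for stability
    norm = sum(weights)
    return {n: w / norm for n, w in zip(support, weights)}

# Invented numbers: 10 reads observed at 30% coverage.
post = posterior_true_count(observed=10, coverage=0.3, n_max=200)
mode = max(post, key=post.get)
```

Lower coverage spreads the posterior over a wider range of true counts, which is why two samples with the same normalized count can carry different amounts of evidence.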
Analysis of expressed sequence tags from a NaHCO3-treated alkali-tolerant plant, Chloris virgata.
Nishiuchi, Shunsaku; Fujihara, Kazumasa; Liu, Shenkui; Takano, Tetsuo
2010-04-01
Chloris virgata Swartz (C. virgata) is a gramineous wild plant that can survive in saline-alkali areas in northeast China. To examine the tolerance mechanisms of C. virgata, we constructed a cDNA library from whole plants of C. virgata that had been treated with 100 mM NaHCO3 for 24 h and sequenced 3168 randomly selected clones. Most (2590) of the expressed sequence tags (ESTs) showed significant similarity to sequences in the NCBI database. Of the 2590 genes, 1893 were unique. Gene Ontology (GO) Slim annotations were obtained for 1081 ESTs by BLAST2GO, and 75 of them were annotated with the GO terms "response to stress", "response to abiotic stimulus", and "response to biotic stimulus", indicating that these genes are likely to function in the tolerance mechanisms of C. virgata. In a separate experiment, 24 genes that are known from previous studies to be associated with abiotic stress tolerance were further examined by real-time RT-PCR to see how their expression was affected by NaHCO3 stress. NaHCO3 treatment up-regulated the expression of a pathogenesis-related gene (DC998527), a Win1 precursor gene (DC998617), a catalase gene (DC999385), ribosome inactivating protein 1 (DC999555), a Na+/H+ antiporter gene (DC998043), and a two-component regulator gene (DC998236).
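The counts reported in this abstract can be turned into proportions with a few lines of arithmetic. The sketch below uses only the numbers stated above; the variable names are illustrative.

```python
# Counts taken from the abstract.
clones_sequenced = 3168
ests_with_hits = 2590
unique_genes = 1893
ests_with_go = 1081
stress_annotated = 75

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

hit_rate = percent(ests_with_hits, clones_sequenced)      # ESTs with database similarity
unique_rate = percent(unique_genes, ests_with_hits)       # non-redundant share of those hits
stress_rate = percent(stress_annotated, ests_with_go)     # stress/stimulus terms among GO-annotated ESTs
```

Roughly four in five sequenced clones matched a database entry, and about 7% of the GO-annotated ESTs carried one of the three response terms.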
Feng, X; Happ, G M
1996-11-14
The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.
Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong
2008-09-01
We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a polydT end and a deoxyguanosine [dG] end) combines priming full-length cDNA strand synthesis and directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is decreased while simplifying downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.
Schwarz, Jodi A; Brokstein, Peter B; Voolstra, Christian; Terry, Astrid Y; Miller, David J; Szmant, Alina M; Coffroth, Mary Alice; Medina, Mónica
2008-01-01
Background Scleractinian corals are the foundation of reef ecosystems in tropical marine environments. Their great success is due to interactions with endosymbiotic dinoflagellates (Symbiodinium spp.), with which they are obligately symbiotic. To develop a foundation for studying coral biology and coral symbiosis, we have constructed a set of cDNA libraries and generated and annotated ESTs from two species of corals, Acropora palmata and Montastraea faveolata. Results We generated 14,588 (Ap) and 3,854 (Mf) high quality ESTs from five life history/symbiosis stages (spawned eggs, early-stage planula larvae, late-stage planula larvae either infected with symbionts or uninfected, and adult coral). The ESTs assembled into a set of primarily stage-specific clusters, producing 4,980 (Ap), and 1,732 (Mf) unigenes. The egg stage library, relative to the other developmental stages, was enriched in genes functioning in cell division and proliferation, transcription, signal transduction, and regulation of protein function. Fifteen unigenes were identified as candidate symbiosis-related genes as they were expressed in all libraries constructed from the symbiotic stages and were absent from all of the non symbiotic stages. These include several DNA interacting proteins, and one highly expressed unigene (containing 17 cDNAs) with no significant protein-coding region. A significant number of unigenes (25) encode potential pattern recognition receptors (lectins, scavenger receptors, and others), as well as genes that may function in signaling pathways involved in innate immune responses (toll-like signaling, NFkB p105, and MAP kinases). Comparison between the A. palmata and an A. millepora EST dataset identified ferritin as a highly expressed gene in both datasets that appears to be undergoing adaptive evolution. 
Five unigenes appear to be restricted to the Scleractinia, as they had no homology to any sequences in the nr databases nor to the non-scleractinian cnidarians Nematostella vectensis and Hydra magnipapillata. Conclusion Partial sequencing of 5 cDNA libraries each for A. palmata and M. faveolata has produced a rich set of candidate genes (4,980 genes from A. palmata, and 1,732 genes from M. faveolata) that we can use as a starting point for examining the life history and symbiosis of these two species, as well as to further expand the dataset of cnidarian genes for comparative genomics and evolutionary studies. PMID:18298846
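The criterion used above for candidate symbiosis-related genes (present in every library from a symbiotic stage, absent from every library from a non-symbiotic stage) is a set operation. The following is a minimal sketch of that filter; the function name, library names, and unigene IDs are invented, and treating infected larvae and adults as the symbiotic stages is an assumption based on the stage descriptions in the abstract.

```python
def symbiosis_candidates(libraries, symbiotic):
    """libraries: dict mapping library name to a set of unigene IDs.
    symbiotic: set of library names built from symbiotic stages.
    Returns unigenes found in all symbiotic libraries and in no other library."""
    sym = [genes for name, genes in libraries.items() if name in symbiotic]
    non = [genes for name, genes in libraries.items() if name not in symbiotic]
    if not sym:
        return set()
    in_all_symbiotic = set.intersection(*sym)
    in_any_nonsymbiotic = set().union(*non)
    return in_all_symbiotic - in_any_nonsymbiotic

# Hypothetical libraries and unigene IDs.
example_libraries = {
    "egg": {"u1", "u2"},
    "larva_uninfected": {"u2", "u3"},
    "larva_infected": {"u3", "u4", "u5"},
    "adult": {"u2", "u4", "u5", "u6"},
}
candidates = symbiosis_candidates(example_libraries, {"larva_infected", "adult"})
```

With shallow EST sampling, absence from a library can simply reflect low expression or limited sequencing depth, so a filter like this yields candidates rather than confirmed stage-specific genes.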
DOE Office of Scientific and Technical Information (OSTI.GOV)
Defining Genetic Fitness Determinants and Creating Genomic Resources for an Oral Pathogen
Narayanan, Ajay M.; Ramsey, Matthew M.
2017-01-01
ABSTRACT: Periodontitis is a microbial infection that destroys the structures that support the teeth. Although it is typically a chronic condition, rapidly progressing, aggressive forms are associated with the oral pathogen Aggregatibacter actinomycetemcomitans. One of this bacterium's key virulence traits is its ability to attach to surfaces and form robust biofilms that resist killing by the host and antibiotics. Though much has been learned about A. actinomycetemcomitans since its initial discovery, we lack insight into a fundamental aspect of its basic biology, as we do not know the full set of genes that it requires for viability (the essential genome). Furthermore, research on A. actinomycetemcomitans is hampered by the field's lack of a mutant collection. To address these gaps, we used rapid transposon mutant sequencing (Tn-seq) to define the essential genomes of two strains of A. actinomycetemcomitans, revealing a core set of 319 genes. We then generated an arrayed mutant library comprising >1,500 unique insertions and used a sequencing-based approach to define each mutant's position (well and plate) in the library. To demonstrate its utility, we screened the library for mutants with weakened resistance to subinhibitory erythromycin, revealing the multidrug efflux pump AcrAB as a critical resistance factor. During the screen, we discovered that erythromycin induces A. actinomycetemcomitans to form biofilms. We therefore devised a novel Tn-seq-based screen to identify specific factors that mediate this phenotype and in follow-up experiments confirmed 4 mutants. Together, these studies present new insights and resources for investigating the basic biology and disease mechanisms of a human pathogen. IMPORTANCE: Millions suffer from gum disease, which often is caused by Aggregatibacter actinomycetemcomitans, a bacterium that forms antibiotic-resistant biofilms. To fully understand any organism, we should be able to answer: what genes does it require for life? Here, we address this question for A. actinomycetemcomitans by determining the genes in its genome that cannot be mutated. As for the genes that can be mutated, we archived these mutants into a library, which we used to find genes that contribute to antibiotic resistance, leading us to discover that antibiotics cause A. actinomycetemcomitans to form biofilms. We then devised an approach to find genes that mediate this process and confirmed 4 genes. These results illuminate new fundamental traits of a human pathogen. PMID:28476775
Tetteh, Kevin K. A.; Loukas, Alex; Tripp, Cindy; Maizels, Rick M.
1999-01-01
Larvae of Toxocara canis, a nematode parasite of dogs, infect humans, causing visceral and ocular larva migrans. In noncanid hosts, larvae neither grow nor differentiate but endure in a state of arrested development. Reasoning that parasite protein production is orientated to immune evasion, we undertook a random sequencing project from a larval cDNA library to characterize the most highly expressed transcripts. In all, 266 clones were sequenced, most from both 3′ and 5′ ends, and similarity searches against GenBank protein and dbEST nucleotide databases were conducted. Cluster analyses showed that 128 distinct gene products had been found, all but 3 of which represented newly identified genes. Ninety-five genes were represented by a single clone, but seven transcripts were present at high frequencies, each composing >2% of all clones sequenced. These high-abundance transcripts include a mucin and a C-type lectin, which are both major excretory-secretory antigens released by parasites. Four highly expressed novel gene transcripts, termed ant (abundant novel transcript) genes, were found. Together, these four genes comprised 18% of all cDNA clones isolated, but no similar sequences occur in the Caenorhabditis elegans genome. While the coding regions of the four genes are dissimilar, their 3′ untranslated tracts have significant homology in nucleotide sequence. The discovery of these abundant, parasite-specific genes of newly identified lectins and mucins, as well as a range of conserved and novel proteins, provides defined candidates for future analysis of the molecular basis of immune evasion by T. canis. PMID:10456930
Cloning and characterization of a novel α-amylase from a fecal microbial metagenome.
Xu, Bo; Yang, Fuya; Xiong, Caiyun; Li, Junjun; Tang, Xianghua; Zhou, Junpei; Xie, Zhenrong; Ding, Junmei; Yang, Yunjuan; Huang, Zunxi
2014-04-01
To isolate novel and useful microbial enzymes from uncultured gastrointestinal microorganisms, a fecal microbial metagenomic library of the pygmy loris was constructed. The library was screened for amylolytic activity, and 8 of 50,000 recombinant clones showed amylolytic activity. Subcloning and sequence analysis of a positive clone led to the identification of a novel gene (amyPL) coding for α-amylase. AmyPL was expressed in Escherichia coli BL21 (DE3) and the purified AmyPL was enzymatically characterized. This study is the first to report the molecular and biochemical characterization of a novel α-amylase from a gastrointestinal metagenomic library.
Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping
2013-01-01
Background: Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. Methodology/Principal Findings: We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the locations of cp-like migrated fragments and found several closely linked with mitochondrial genes. Conclusion: The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520
Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping
2013-01-01
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the locations of cp-like migrated fragments and found several closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
Fan, R; Ling, P; Hao, C Y; Li, F P; Huang, L F; Wu, B D; Wu, H S
2015-10-19
Black pepper is a perennial climbing vine. It is widely cultivated because its berries can be utilized not only as a spice in food but also for medicinal use. This study aimed to construct a standardized, high-quality cDNA library to facilitate identification of new Piper hainanense transcripts. For this, 262 unigenes were used to generate raw reads. The average length of these 262 unigenes was 774.8 bp. Of these, 94 genes (35.9%) were newly identified, according to the NCBI protein database. Thus, identification of new genes may broaden the molecular knowledge of P. hainanense on the basis of Clusters of Orthologous Groups and Gene Ontology categories. In addition, certain basic genes linked to physiological processes were identified, which can contribute to disease resistance and thereby to the breeding of black pepper. A total of 26 unigenes were found to contain SSR markers. Dinucleotide SSR was the main repeat motif, accounting for 61.54%, followed by trinucleotide SSR (23.07%). Eight primer pairs successfully amplified DNA fragments and detected significant amounts of polymorphism among twenty-one Piper germplasms. These results present novel sequence information for P. hainanense, which can serve as the foundation for further genetic research on this species.
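The SSR screening mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts (6 for dinucleotide, 5 for trinucleotide motifs) are assumptions chosen only for illustration; the regular expression finds perfect tandem repeats and nothing more.

```python
import re

# Assumed minimum repeat counts per motif length; the abstract does not state
# the thresholds actually used, so these values are illustrative only.
MIN_REPEATS = {2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect di-/trinucleotide SSR."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in min_repeats.items():
        # ([ACGT]{k}) captures one motif; \1{n,} requires at least n more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Counting the hits by motif length over a set of unigenes would give the kind of di-/trinucleotide breakdown the abstract reports.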
Bacterial community composition characterization of a lead-contaminated Microcoleus sp. consortium.
Giloteaux, Ludovic; Solé, Antoni; Esteve, Isabel; Duran, Robert
2011-08-01
A Microcoleus sp. consortium, obtained from the Ebro delta microbial mat, was maintained under different conditions including uncontaminated, lead-contaminated, and acidic conditions. Terminal restriction fragment length polymorphism and 16S rRNA gene library analyses were performed in order to determine the effect of lead and culture conditions on the Microcoleus sp. consortium. The bacterial composition inside the consortium revealed low diversity and the presence of specific terminal-restriction fragments under lead conditions. 16S rRNA gene library analyses showed that members of the consortium were affiliated with the Alpha-, Beta-, and Gammaproteobacteria and Cyanobacteria. Sequences closely related to Achromobacter spp., Alcaligenes faecalis, and Thiobacillus species were exclusively found under lead conditions, while sequences related to Geitlerinema sp., a cyanobacterium belonging to the Oscillatoriales, were not found in the presence of lead. This result showed a strong lead selection of the bacterial members present in the Microcoleus sp. consortium. Several of the 16S rRNA sequences were affiliated with nitrogen-fixing microorganisms including members of the Rhizobiaceae and the Sphingomonadaceae. Additionally, confocal laser scanning microscopy and scanning and transmission electron microscopy showed that under lead-contaminated conditions Microcoleus sp. cells were grouped and the number of electrodense intracytoplasmic inclusions increased.
Bovine mammary gene expression profiling during the onset of lactation.
Gao, Yuanyuan; Lin, Xueyan; Shi, Kerong; Yan, Zhengui; Wang, Zhonghua
2013-01-01
Lactogenesis includes two stages. Stage I begins a few weeks before parturition. Stage II is initiated around the time of parturition and extends for several days afterwards. To better understand the molecular events underlying these changes, genome-wide gene expression profiling was conducted using digital gene expression (DGE) on bovine mammary tissue at three time points: approximately day 35 before parturition (-35 d), day 7 before parturition (-7 d) and day 3 after parturition (+3 d). Approximately 6.2, 5.8 and 6.1 million 21-nt cDNA tags were sequenced in the three cDNA libraries (-35 d, -7 d and +3 d), respectively. After aligning to the reference sequences, the three cDNA libraries included 8,662, 8,363 and 8,359 genes, respectively. With a fold-change cutoff of ≥ 2 or ≤ -2 and a false discovery rate (FDR) of ≤ 0.001, a total of 812 genes were significantly differentially expressed at -7 d compared with -35 d (stage I). Gene ontology analysis showed that those significantly differentially expressed genes were mainly associated with cell cycle, lipid metabolism, immune response and biological adhesion. A total of 1,189 genes were significantly differentially expressed at +3 d compared with -7 d (stage II), and these genes were mainly associated with the immune response and cell cycle. Moreover, there were 1,672 genes significantly differentially expressed at +3 d compared with -35 d. Gene ontology analysis showed that the main differentially expressed genes were those associated with metabolic processes. The results suggest that the mammary gland begins to lactate not only by a gain of function but also by a broad suppression of function to effectively push most of the cell's resources towards lactation.
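The selection rule stated in the abstract above (fold change ≥ 2 or ≤ -2 together with FDR ≤ 0.001) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe how fold changes or FDR values were computed, so the signed fold-change convention, the pseudocount, and the function names `signed_fold_change` and `select_de_genes` are hypothetical.

```python
def signed_fold_change(count_a, count_b, pseudocount=1.0):
    """Signed fold change of condition B relative to condition A.

    Returns +x for an x-fold increase and -x for an x-fold decrease.
    The pseudocount (an assumption, not from the abstract) avoids division by zero.
    """
    num = count_b + pseudocount
    den = count_a + pseudocount
    return num / den if num >= den else -(den / num)


def select_de_genes(records, fc_cutoff=2.0, fdr_cutoff=0.001):
    """Keep gene IDs whose |fold change| >= fc_cutoff and FDR <= fdr_cutoff.

    `records` is an iterable of (gene_id, signed_fold_change, fdr) tuples,
    with the FDR values assumed to be computed elsewhere.
    """
    return [
        gene_id
        for gene_id, fold_change, fdr in records
        if abs(fold_change) >= fc_cutoff and fdr <= fdr_cutoff
    ]


if __name__ == "__main__":
    # Hypothetical gene IDs and values, for illustration only.
    demo = [
        ("gene1", 3.0, 0.0005),   # passes both thresholds
        ("gene2", 1.5, 0.0001),   # fold change too small
        ("gene3", -2.0, 0.001),   # passes (both thresholds are inclusive)
        ("gene4", -4.0, 0.01),    # FDR too high
    ]
    print(select_de_genes(demo))
```

Treating both thresholds as inclusive follows the "≥" and "≤" signs in the abstract; everything else in the sketch is a design choice made for readability.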
The Carnegie Protein Trap Library: A Versatile Tool for Drosophila Developmental Studies
Buszczak, Michael; Paterno, Shelley; Lighthouse, Daniel; Bachman, Julia; Planck, Jamie; Owen, Stephenie; Skora, Andrew D.; Nystul, Todd G.; Ohlstein, Benjamin; Allen, Anna; Wilhelm, James E.; Murphy, Terence D.; Levis, Robert W.; Matunis, Erika; Srivali, Nahathai; Hoskins, Roger A.; Spradling, Allan C.
2007-01-01
Metazoan physiology depends on intricate patterns of gene expression that remain poorly known. Using transposon mutagenesis in Drosophila, we constructed a library of 7404 protein trap and enhancer trap lines, the Carnegie collection, to facilitate gene expression mapping at single-cell resolution. By sequencing the genomic insertion sites, determining splicing patterns downstream of the enhanced green fluorescent protein (EGFP) exon, and analyzing expression patterns in the ovary and salivary gland, we found that 600–900 different genes are trapped in our collection. A core set of 244 lines trapped different identifiable protein isoforms, while insertions likely to act as GFP-enhancer traps were found in 256 additional genes. At least 8 novel genes were also identified. Our results demonstrate that the Carnegie collection will be useful as a discovery tool in diverse areas of cell and developmental biology and suggest new strategies for greatly increasing the coverage of the Drosophila proteome with protein trap insertions. PMID:17194782
Differences in Brain Transcriptomes of Closely Related Baikal Coregonid Species
Bychenko, Oksana S.; Sukhanova, Lyubov V.; Azhikina, Tatyana L.; Skvortsov, Timofey A.; Belomestnykh, Tuyana V.; Sverdlov, Eugene D.
2014-01-01
The aim of this work was to gain deeper insight into genetic factors involved in the adaptive divergence of closely related species, specifically two representatives of Baikal coregonids, Baikal whitefish (Coregonus baicalensis Dybowski) and Baikal omul (Coregonus migratorius Georgi), that diverged from a common ancestor as recently as 10–20 thousand years ago. Using the Serial Analysis of Gene Expression method, we obtained libraries of short representative cDNA sequences (tags) from the brains of Baikal whitefish and omul. A comparative analysis of the libraries revealed quantitative differences among ~4% of the tags of the fishes under study. Based on the similarity of these tags with cDNA of known organisms, we identified candidate genes taking part in adaptive divergence. The most important candidate genes related to the adaptation of Baikal whitefish and Baikal omul, identified in this work, belong to the genes of cell metabolism, nervous and immune systems, protein synthesis, and regulatory genes as well as to DTSsa4 Tc1-like transposons which are widespread among fishes. PMID:24719892
2013-01-01
Background: Olive cDNA libraries were constructed and analyzed to isolate candidate genes that can help elucidate the molecular mechanism of periodicity and/or fruit production. For this purpose, cDNA libraries from the leaves of trees in “on year” and in “off year” in July (when fruits start to appear) and in November (harvest time) were constructed. One hundred randomly selected positive clones from each library were analyzed with respect to sequence and size. A fruit-flesh cDNA library was also constructed and characterized to confirm the reliability of each library’s temporal and spatial properties. Results: Quantitative real-time RT-PCR (qRT-PCR) analyses of the cDNA libraries confirmed cDNA molecules that are associated with different developmental stages (e.g., “on year” leaves in July, “off year” leaves in July, leaves in November) and fruits. Hence, a number of candidate cDNAs associated with “on year” and “off year” were isolated. Comparison of the detected cDNAs to the current EST database of GenBank along with other non-redundant databases of NCBI revealed homologs of previously described genes along with several unknown cDNAs. Of around 500 screened cDNAs, 48 cDNA elements were obtained after eliminating ribosomal RNA sequences. These independent transcripts were analyzed using BLAST searches (cutoff E-value of 1.0E-5) against the KEGG and GenBank nucleotide databases, and 37 putative transcripts corresponding to known gene functions were annotated with gene names and Gene Ontology (GO) terms. Transcripts in the biological process category were found to be related to metabolic process (27%), cellular process (23%), response to stimulus (17%), localization process (8.5%), multicellular organismal process (6.25%), developmental process (6.25%) and reproduction (4.2%). Conclusions: A putative P450 monooxygenase was expressed fivefold more in “on year” leaves than in “off year” leaves in July. Two putative dehydrins were expressed significantly more in “on year” leaves than in “off year” leaves in November. Homologs of UDP-glucose epimerase, acyl-CoA binding protein, triose phosphate isomerase and a putative nuclear core anchor protein were significant in fruits only, while a homolog of an embryo binding protein/small GTPase regulator was detected in “on year” leaves only. One of the two unknown cDNAs was specific to leaves in July while the other was detected in all of the libraries except fruits. KEGG pathway analyses for the obtained sequences correlated with essential metabolisms such as galactose metabolism, amino sugar and nucleotide sugar metabolisms and photosynthesis. Detailed analysis of the results presents candidate cDNAs that can be used to dissect further the genetic basis of fruit production and/or alternate bearing, which causes significant economic loss for olive growers. PMID:23552171
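The E-value cutoff used for the BLAST searches in the abstract above can be illustrated with a short sketch. It assumes the hits are available as BLAST tabular text in the default 12-column layout (query ID in column 1, subject ID in column 2, E-value in column 11); the abstract does not say how the output was stored or parsed, and the function name `best_hits` and all identifiers in the demo are hypothetical. Treating the cutoff as inclusive is also an assumption.

```python
import csv
import io


def best_hits(tabular_text, max_evalue=1e-5):
    """Keep the lowest-E-value hit per query from BLAST tabular text.

    Assumes the default 12-column tabular layout, where column 11 is the E-value.
    Hits with an E-value above `max_evalue` are discarded.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


if __name__ == "__main__":
    # Hypothetical clone and subject names, for illustration only.
    demo = (
        "clone1\tsubjectA\t85.0\t200\t30\t0\t1\t200\t5\t204\t1e-50\t180\n"
        "clone1\tsubjectB\t70.0\t150\t45\t0\t1\t150\t9\t158\t1e-20\t95\n"
        "clone2\tsubjectC\t40.0\t60\t36\t0\t1\t60\t3\t62\t0.01\t30\n"
    )
    for query, (subject, evalue) in best_hits(demo).items():
        print(query, subject, evalue)
```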
Genetic determinants of mate recognition in Brachionus manjavacas (Rotifera)
Snell, Terry W; Shearer, Tonya L; Smith, Hilary A; Kubanek, Julia; Gribble, Kristin E; Welch, David B Mark
2009-01-01
Background: Mate choice is of central importance to most animals, influencing population structure, speciation, and ultimately the survival of a species. Mating behavior of male brachionid rotifers is triggered by the product of a chemosensory gene, a glycoprotein on the body surface of females called the mate recognition pheromone. The mate recognition pheromone has been biochemically characterized, but little was known about the gene(s). We describe the isolation and characterization of the mate recognition pheromone gene through protein purification, N-terminal amino acid sequence determination, identification of the mate recognition pheromone gene from a cDNA library, sequencing, and RNAi knockdown to confirm the functional role of the mate recognition pheromone gene in rotifer mating. Results: A 29 kD protein capable of eliciting rotifer male circling was isolated by high-performance liquid chromatography. Two transcript types containing the N-terminal sequence were identified in a cDNA library; further characterization by screening a genomic library and by polymerase chain reaction revealed two genes belonging to each type. Each gene begins with a signal peptide region followed by nearly perfect repeats of an 87 to 92 codon motif with no codons between repeats and the final motif prematurely terminated by the stop codon. The two Type A genes contain four and seven repeats and the two Type B genes contain three and five repeats, respectively. Only the Type B gene with three repeats encodes a peptide with a molecular weight of 29 kD. Each repeat of the Type B gene products contains three asparagines as potential sites for N-glycosylation; there are no asparagines in the Type A genes. RNAi with Type A double-stranded RNA did not result in less circling than in the phosphate-buffered saline control, but transfection with Type B double-stranded RNA significantly reduced male circling by 17%. The very low divergence between repeat units, even at synonymous positions, suggests that the repeats are kept nearly identical through a process of concerted evolution. Information-rich molecules like surface glycoproteins are well adapted for chemical communication and aquatic animals may have evolved signaling systems based on these compounds, whereas insects use cuticular hydrocarbons. Conclusion: Owing to its critical role in mating, the mate recognition pheromone gene will be a useful molecular marker for exploring the mechanisms and rates of selection and the evolution of reproductive isolation and speciation using rotifers as a model system. The phylogenetic variation in the mate recognition pheromone gene can now be studied in conjunction with the large amount of ecological and population genetic data being gathered for the Brachionus plicatilis species complex to understand better the evolutionary drivers of cryptic speciation. PMID:19740420
Genetic determinants of mate recognition in Brachionus manjavacas (Rotifera).
Snell, Terry W; Shearer, Tonya L; Smith, Hilary A; Kubanek, Julia; Gribble, Kristin E; Welch, David B Mark
2009-09-09
Mate choice is of central importance to most animals, influencing population structure, speciation, and ultimately the survival of a species. Mating behavior of male brachionid rotifers is triggered by the product of a chemosensory gene, a glycoprotein on the body surface of females called the mate recognition pheromone. The mate recognition pheromone has been biochemically characterized, but little was known about the gene(s). We describe the isolation and characterization of the mate recognition pheromone gene through protein purification, N-terminal amino acid sequence determination, identification of the mate recognition pheromone gene from a cDNA library, sequencing, and RNAi knockdown to confirm the functional role of the mate recognition pheromone gene in rotifer mating. A 29 kD protein capable of eliciting rotifer male circling was isolated by high-performance liquid chromatography. Two transcript types containing the N-terminal sequence were identified in a cDNA library; further characterization by screening a genomic library and by polymerase chain reaction revealed two genes belonging to each type. Each gene begins with a signal peptide region followed by nearly perfect repeats of an 87 to 92 codon motif with no codons between repeats and the final motif prematurely terminated by the stop codon. The two Type A genes contain four and seven repeats and the two Type B genes contain three and five repeats, respectively. Only the Type B gene with three repeats encodes a peptide with a molecular weight of 29 kD. Each repeat of the Type B gene products contains three asparagines as potential sites for N-glycosylation; there are no asparagines in the Type A genes. RNAi with Type A double-stranded RNA did not result in less circling than in the phosphate-buffered saline control, but transfection with Type B double-stranded RNA significantly reduced male circling by 17%. 
The very low divergence between repeat units, even at synonymous positions, suggests that the repeats are kept nearly identical through a process of concerted evolution. Information-rich molecules like surface glycoproteins are well adapted for chemical communication and aquatic animals may have evolved signaling systems based on these compounds, whereas insects use cuticular hydrocarbons. Owing to its critical role in mating, the mate recognition pheromone gene will be a useful molecular marker for exploring the mechanisms and rates of selection and the evolution of reproductive isolation and speciation using rotifers as a model system. The phylogenetic variation in the mate recognition pheromone gene can now be studied in conjunction with the large amount of ecological and population genetic data being gathered for the Brachionus plicatilis species complex to understand better the evolutionary drivers of cryptic speciation.
Leal-Alvarado, Daniel A; Martínez-Hernández, A; Calderón-Vázquez, C L; Uh-Ramos, D; Fuentes, G; Ramírez-Prado, J H; Sáenz-Carbonell, L; Santamaría, J M
2017-12-01
Lead (Pb) is one of the most serious environmental pollutants. The aquatic fern Salvinia minima Baker is capable of hyperaccumulating Pb in its tissues. However, the molecular mechanisms involved in its Pb accumulation and tolerance capacity are not fully understood. In order to investigate the molecular mechanisms that are activated by S. minima in response to Pb, we constructed a suppression subtractive hybridization (SSH) library in response to exposure to 40 μM Pb(NO3)2 for 12 h. A total of 365 lead-related differentially expressed sequence tags (ESTs) were isolated and sequenced. Among these ESTs, 143 unique cDNAs were obtained (97 were registered at GenBank and 46 were not registered, because they did not meet the GenBank conditions). Those ESTs were identified and classified into 3 groups according to Blast2GO. In terms of metabolic pathways, they were grouped into 29 KEGG pathways. Among the ESTs, we identified some that might be part of the mechanism this fern uses to deal with this metal, including abiotic-stress-related transcription factors, some that might be involved in tolerance mechanisms such as ROS scavenging and membrane protection, and those involved in the recovery of cell homeostasis. To validate the SSH library, 4 genes were randomly selected from the library and analyzed by qRT-PCR. These 4 genes were transcriptionally up-regulated in response to lead in at least one of the two tested tissues (roots and leaves). The present library is one of the few genomics approaches to study the response to metal stress in an aquatic fern, representing novel molecular information and tools to understand the molecular physiology of its Pb tolerance and hyperaccumulation capacity. Further research is required to elucidate the functions of the lead-induced genes that remain classified as unknown, to perhaps reveal novel molecular mechanisms of Pb tolerance and accumulation capacity in aquatic plants. Copyright © 2017 Elsevier B.V. All rights reserved.
Pauchet, Y; Saski, C A; Feltus, F A; Luyten, I; Quesneville, H; Heckel, D G
2014-06-01
The ability of herbivorous beetles from the superfamilies Chrysomeloidea and Curculionoidea to degrade plant cell wall polysaccharides has only recently begun to be appreciated. The presence of plant cell wall degrading enzymes (PCWDEs) in the beetle's digestive tract makes this degradation possible. Sequences encoding these beetle-derived PCWDEs were originally identified from transcriptomes and strikingly resemble those of saprophytic and phytopathogenic microorganisms, raising questions about their origin; e.g. are they insect- or microorganism-derived? To demonstrate unambiguously that the genes encoding PCWDEs found in beetle transcriptomes are indeed of insect origin, we generated a bacterial artificial chromosome library from the genome of the leaf beetle Chrysomela tremula, containing 18 432 clones with an average size of 143 kb. After hybridizing this library with probes derived from 12 C. tremula PCWDE-encoding genes and sequencing the positive clones, we demonstrated that the latter genes are encoded by the insect's genome and are surrounded by genes possessing orthologues in the genome of Tribolium castaneum as well as in three other beetle genomes. Our analyses showed that although the level of overall synteny between C. tremula and T. castaneum seems high, the degree of microsynteny between both species is relatively low, in contrast to the more closely related Colorado potato beetle. © 2014 The Royal Entomological Society.
Recchia, Gustavo Henrique; Caldas, Danielle Gregorio Gomes; Beraldo, Ana Luiza Ahern; da Silva, Márcio José; Tsai, Siu Mui
2013-01-01
In Brazil, common bean (Phaseolus vulgaris L.) productivity is severely affected by drought stress due to low technology cultivation systems. Our purpose was to identify differentially expressed genes in roots of a genotype tolerant to water deficit (BAT 477) when submitted to an interruption of irrigation during its development. A suppressive subtractive hybridization (SSH) library was constructed using the genotype Carioca 80SH (susceptible to drought) as the “driver”. After clustering and data mining, 1572 valid reads were obtained, resulting in 1120 ESTs (expressed sequence tags). We found sequences for transcription factors, carbohydrate metabolism, proline-rich proteins, aquaporins, chaperones and ubiquitins, all of them organized according to their biological processes. Our SSH library was validated through an RT-qPCR experiment by assessing the expression patterns of 10 selected genes in both genotypes under stressed and control conditions. Finally, the expression patterns of 31 ESTs, putatively related to drought responses, were analyzed in a time-course experiment. Our results confirmed that such genes are more highly expressed in the tolerant genotype during stress; however, they are not exclusive to it, since different levels of these transcripts were also detected in the susceptible genotype. In addition, we observed a fluctuation in gene regulation over time for both genotypes, which seem to adopt and adapt different strategies in order to develop tolerance against this stress. PMID:23538843