Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen
2009-06-01
To obtain the full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis by RACE PCR, and then to investigate the characteristics of the gene. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene was obtained from Dysosma versipellis by 3'-RACE and 5'-RACE. This is the first report of the full-length cDNA sequence of Secoisolariciresinol Dehydrogenase in Dysosma versipellis. The acquired cDNA was 991 bp in full length, including a 5' untranslated region of 42 bp and a 3' untranslated region of 112 bp with a poly(A) tail. The open reading frame (ORF) encodes 278 amino acids, with a molecular weight of 29253.3 Daltons and an isoelectric point of 6.328. The GenBank nucleotide sequence accession number is EU573789. Semi-quantitative RT-PCR analysis revealed that the Secoisolariciresinol Dehydrogenase gene was highly expressed in the stem. Alignment of the amino acid sequence of Secoisolariciresinol Dehydrogenase indicated that there may be significant amino acid sequence differences among species. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis was obtained.
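As a rough consistency check on the lengths reported in this abstract, the following Python sketch adds up the 5' UTR, the coding region and the 3' UTR. It rests on two assumptions that the abstract does not state: that the ORF is counted as 278 codons plus one stop codon, and that the 112 bp 3' UTR figure includes the poly(A) tail. The constant and function names are illustrative only.

```python
# Length bookkeeping for the reported cDNA (numbers taken from the abstract).
FIVE_PRIME_UTR_BP = 42
THREE_PRIME_UTR_BP = 112  # assumption: this figure includes the poly(A) tail
AMINO_ACIDS = 278
STOP_CODON_BP = 3  # assumption: the ORF is counted together with its stop codon


def orf_length_bp(n_amino_acids: int, stop_bp: int = STOP_CODON_BP) -> int:
    """Length of a coding region: three bases per residue plus the stop codon."""
    return 3 * n_amino_acids + stop_bp


def total_cdna_length_bp() -> int:
    """Sum of 5' UTR, ORF and 3' UTR under the assumptions above."""
    return FIVE_PRIME_UTR_BP + orf_length_bp(AMINO_ACIDS) + THREE_PRIME_UTR_BP


if __name__ == "__main__":
    print("ORF length (bp):", orf_length_bp(AMINO_ACIDS))
    print("Total cDNA length (bp):", total_cdna_length_bp())
```

Under these assumptions the parts sum to 991 bp, which agrees with the full length given in the abstract.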
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2013-01-01
A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng
2012-01-01
To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
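The non-redundancy figure quoted in this abstract can be reproduced as the share of sequenced clones that yielded distinct unigenes. The short Python sketch below assumes that this simple ratio is the definition the authors used, which the abstract implies but does not state; the function name is hypothetical.

```python
def non_redundancy_percent(unigenes: int, clones_sequenced: int) -> float:
    """Percentage of sequenced clones that correspond to distinct unigenes."""
    if clones_sequenced <= 0:
        raise ValueError("clones_sequenced must be positive")
    return 100.0 * unigenes / clones_sequenced


if __name__ == "__main__":
    # Counts reported in the abstract: 3320 unigenes from 3875 sequenced clones.
    print(round(non_redundancy_percent(3320, 3875), 1))
```

With the reported counts the ratio rounds to 85.7%, matching the abstract.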
2011-01-01
Background Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. 
In addition the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean. PMID:22118559
NASA Astrophysics Data System (ADS)
Kikuchi, Shoshi
2009-02-01
Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi
2006-01-01
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon
2011-01-01
Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. 
Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
Ning, ZhongHua; Hincke, Maxwell T.; Yang, Ning; Hou, ZhuoCheng
2014-01-01
Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not ‘finished’. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. 
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences. PMID:24676480
Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T; Yang, Ning; Hou, ZhuoCheng
2014-01-01
Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished'. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. 
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences.
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...
2016-06-24
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ~11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.
2016-01-01
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou
2016-11-01
It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. The most widely used high-throughput next-generation sequencing methods mainly generate information for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosomal RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomic background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosomal RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A
2009-01-01
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386
High-Resolution Sequence-Function Mapping of Full-Length Proteins
Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.
2015-01-01
Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet short read lengths of current sequencers makes it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
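The before/after abundance comparison described in this abstract is commonly summarized as a log2 enrichment ratio per variant. The Python sketch below is a minimal illustration under assumed conventions and is not the normalization published by the authors: the function name, the pseudocount and any counts passed to it are hypothetical.

```python
import math
from typing import Dict


def log2_enrichment(
    pre: Dict[str, int],
    post: Dict[str, int],
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Log2 ratio of each variant's frequency after vs. before selection.

    `pre` and `post` map variant names to read counts. The pseudocount
    (an assumed convention) keeps the ratio finite when a count is zero.
    """
    pre_total = sum(pre.values())
    post_total = sum(post.values())
    if pre_total == 0 or post_total == 0:
        raise ValueError("both libraries need at least one read")

    scores = {}
    for variant, pre_count in pre.items():
        post_count = post.get(variant, 0)
        freq_pre = (pre_count + pseudocount) / pre_total
        freq_post = (post_count + pseudocount) / post_total
        scores[variant] = math.log2(freq_post / freq_pre)
    return scores


if __name__ == "__main__":
    # Toy counts, not data from the study.
    before = {"wild_type": 500, "A12G": 300, "L45P": 200}
    after = {"wild_type": 600, "A12G": 390, "L45P": 10}
    for name, score in log2_enrichment(before, after).items():
        print(f"{name}: {score:+.2f}")
```

A positive score means the variant became more frequent under selection and a negative score means it was depleted. In practice scores are often reported relative to the wild-type score, which is a further normalization choice left out of this sketch.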
Hirayama, Junichi; Tazumi, Akihiro; Hayashi, Kyohei; Tasaki, Erina; Kuribayashi, Takashi; Moore, John E; Millar, Beverley C; Matsuda, Motoo
2011-06-01
In the present study, the reliability of full-length gene sequence information for several genes, including 16S rRNA, was examined for the discrimination of the two representative Campylobacter lari taxa, namely urease-negative (UN) C. lari and urease-positive thermophilic Campylobacter (UPTC). As previously described, 16S rRNA gene sequences are not reliable for the molecular discrimination of UN C. lari from UPTC organisms employing either the unweighted pair group method using arithmetic means analysis (UPGMA) or the neighbor joining (NJ) method. In addition, three composite full-length gene sequences (ciaB, flaC and vacJ) out of seven gene loci examined were reliable for discrimination employing dendrograms constructed by the UPGMA method. In contrast, none of the NJ phylogenetic trees constructed from the information on the nine genes was reliable for the discrimination. Three composite full-length gene sequences (ciaB, flaC and vacJ) were reliable for the molecular discrimination between UN C. lari and UPTC organisms employing the UPGMA method, as well as among four thermophilic Campylobacter species. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José
2016-02-01
During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with that of those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of CDV Argentinean strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
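The identity percentages quoted in this abstract are pairwise comparisons of aligned sequences. The Python sketch below shows one common way to compute such a value from two sequences that have already been aligned to the same length. The gap-handling convention (columns with a gap in either sequence are skipped) and the function name are assumptions, and the example strings are toy data, not CDV sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumed convention: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned peptides: five ungapped columns, four of them identical.
    print(percent_identity("MKTA-Y", "MRTAQY"))
```

Other conventions (for example, counting gaps as mismatches or dividing by the shorter sequence length) give different values, so published identities are only comparable when the same definition is used.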
USDA-ARS's Scientific Manuscript database
Sequence comparison between the full-length 2412 bp DNA gyrase subunit B (gyrB) gene of a novobiocin resistant Aeromonas hydrophila AH11NOVO vaccine strain and that of its virulent parent strain AH11P revealed 10 missense mutations. Similarly, sequence comparison between the full-length 4092 bp RNA ...
Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V
2017-07-01
Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, TypeLoader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin
2015-01-01
Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.
Laird Smith, Melissa; Murrell, Ben; Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E; Kosakovsky Pond, Sergei L; Smith, Davey M
2016-07-01
The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences' Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data.
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples
Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E.; Kosakovsky Pond, Sergei L.
2016-01-01
The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences’ Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data. PMID:29492273
Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu
2007-01-01
Background: Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post-harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results: The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories, showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion: The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061
2010-01-01
Background: Salmonids are among the most intensively studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results: From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions: 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate. PMID:20433749
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stapleton, Mark; Liao, Guochun; Brokstein, Peter
2002-08-12
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5′ expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to approximately 40 percent of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5′ ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22 hr embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70 percent of the predicted genes in Drosophila.
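The clone counts quoted in this abstract can be cross-checked with a few lines of Python. This is only an arithmetic sketch: it assumes each cDNA maps to a distinct predicted gene, which is a simplification, so the percentages are upper bounds rather than the authors' gene-level figures. Variable and function names are illustrative.

```python
# Figures taken from the abstract.
PREDICTED_GENES = 13_474
DGCR1_CDNAS = 5_849
DGCR2_ADDED = 5_061


def percent_of_predicted(n_cdnas):
    """Percentage of predicted genes covered if every cDNA were a distinct gene."""
    return 100 * n_cdnas / PREDICTED_GENES


total_cdnas = DGCR1_CDNAS + DGCR2_ADDED

print(total_cdnas)
print(round(percent_of_predicted(DGCR1_CDNAS), 1))
print(round(percent_of_predicted(total_cdnas), 1))
```

The sum reproduces the stated collection size of 10,910 cDNAs, and both ratios sit above the rounded coverage figures in the abstract, as expected if some genes are represented by more than one clone.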
High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.
Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory
2017-12-01
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2016-02-16
The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof
Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius
2015-11-04
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius
2015-09-01
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-09-15
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius
2015-08-18
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome
Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.
2001-01-01
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.
Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M
2001-10-09
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Cheng, Bing; Furtado, Agnelo
2017-01-01
Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology reveals the limitations of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. PMID:29048540
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show that more than 80% of the contigs have an ORF >300 bp and ∼40% of hits extend to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Species identification of mutans streptococci by groESL gene sequence.
Hung, Wei-Chung; Tsai, Jui-Chang; Hsueh, Po-Ren; Chia, Jean-San; Teng, Lee-Jene
2005-09-01
The near full-length sequences of the groESL genes were determined and analysed among eight reference strains (serotypes a to h) representing five species of mutans group streptococci. The groES sequences from these reference strains revealed that there are two lengths (285 and 288 bp) in the five species. The intergenic spacer between groES and groEL appears to be a unique marker for species, with a variable size (ranging from 111 to 310 bp) and sequence. Phylogenetic analysis of groES and groEL separated the eight serotypes into two major clusters. Strains of serotypes b, c, e and f were highly related and had groES gene sequences of the same length, 288 bp, while strains of serotypes a, d, g and h were also closely related and their groES gene sequence lengths were 285 bp. The groESL sequences in clinical isolates of three serotypes of S. mutans were analysed for intraspecies polymorphism. The results showed that the groESL sequences could provide information for differentiation among species, but were unable to distinguish serotypes of the same species. Based on the determined sequences, a PCR assay was developed that could differentiate members of the mutans streptococci by amplicon size and provide an alternative way for distinguishing mutans streptococci from other viridans streptococci.
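The length-based grouping reported in this abstract can be written as a small lookup. The sketch below only encodes the two groES lengths and the serotype groups stated in the abstract; the function name is hypothetical, and it is not the PCR assay the authors developed, which distinguishes species by amplicon size.

```python
# groES gene length (bp) -> serotypes of the reference strains sharing that
# length, as reported in the abstract.
GROES_LENGTH_TO_SEROTYPES = {
    288: ("b", "c", "e", "f"),
    285: ("a", "d", "g", "h"),
}


def serotype_group(groes_length_bp):
    """Return the serotype group for a groES length, or None if unreported."""
    return GROES_LENGTH_TO_SEROTYPES.get(groes_length_bp)


print(serotype_group(288))
print(serotype_group(285))
print(serotype_group(300))
```

A lookup like this narrows a strain to one of the two major clusters only; by the abstract's own account, groESL sequences do not separate serotypes within a species.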
Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts.
Hirozane-Kishikawa, Tomoko; Shiraki, Toshiyuki; Waki, Kazunori; Nakamura, Mari; Arakawa, Takahiro; Kawai, Jun; Fagiolini, Michela; Hensch, Takao K; Hayashizaki, Yoshihide; Carninci, Piero
2003-09-01
The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced the gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.
Duquesne, Véronique; Delcont, Aurélie; Huleux, Anthéa; Beven, Véronique; Touzain, Fabrice; Ribière-Chabert, Magali
2017-11-02
We report here the full mitochondrial genome sequence of Aethina tumida, a beetle of the family Nitidulidae that is a pest of bee hives. The obtained sequence is 16,576 bp in length and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNAs. Copyright © 2017 Duquesne et al.
Carbohydrate degrading polypeptide and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Ahmed, Md Atique; Fauzi, Muh; Han, Eun-Taek
2018-03-14
Human infections due to the monkey malaria parasite Plasmodium knowlesi are on the rise in most Southeast Asian countries, particularly Malaysia. The C-terminal 19 kDa domain of PvMSP1P is a potential vaccine candidate; however, no study has been conducted on the orthologous gene of P. knowlesi. This study investigates the level of polymorphisms, haplotypes and natural selection of full-length pkmsp1p in clinical samples from Malaysia. A total of 36 full-length pkmsp1p sequences along with the reference H-strain and 40 C-terminal pkmsp1p sequences from clinical isolates of Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotype and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using haplotype network tree in NETWORK software v5.0. Population genetic differentiation index (FST) and population structure of the parasite were determined using Arlequin v3.5 and STRUCTURE v2.3.4 software. Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). The nucleotide diversity across the full-length gene was low compared to its ortholog pvmsp1p. The nucleotide diversity was higher toward the N-terminal domains (pkmsp1p-83 and 30) compared to the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low polymorphisms with 16 polymorphisms leading to 18 haplotypes. In total there were 10 synonymous and 6 non-synonymous substitutions and 12 cysteine residues were intact within the two EGF domains. Evidence of strong purifying selection was observed within the full-length sequences as well as in all the domains. Shared haplotypes of 40 pkmsp1p-19 were identified within Malaysian Borneo haplotypes.
This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection were observed in all the domains of pkmsp1p of P. knowlesi, indicating functional constraints. Shared haplotypes were identified within pkmsp1p-19, highlighting the need for further evaluation using a larger number of clinical samples from Malaysia.
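The nucleotide diversity reported in the abstract above is, in its simplest form, the average number of pairwise differences per site across an alignment. The following is a minimal sketch of that calculation, not a reimplementation of DnaSP. The function name and the toy alignment are hypothetical, and gaps or ambiguous bases are not treated specially.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for equal-length aligned sequences.

    Simplification (assumption): every mismatching column counts as a
    difference; gaps and ambiguity codes get no special handling.
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching columns for every pair of sequences.
    total_diffs = sum(
        sum(1 for a, b in zip(s1, s2) if a != b) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment, not data from the study.
toy_alignment = ["ATGCATGC", "ATGCATGA", "ATGTATGC"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 4))
```

Applying the same function to sub-alignments of individual domains would give a rough per-domain comparison of the kind the abstract describes, under the same simplifications.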
Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia
2014-01-01
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allow for a rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of the solution hybrid selection (SHS) for an efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 gene family encoding endo-xylanases for which we designed 35 explorative 31-mer capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 among 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences highlighted hundreds of phylogenetically diverse sequences that were not yet described in public databases. This protocol offers the possibility of performing exhaustive exploration of eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn
2009-01-01
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping to distinguish between duplicate genome regions and to determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). 
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong
2008-06-01
A putative isopentenyltransferase (IPT) encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3 and corresponded closely with the accumulation of isopentenyladenine and isopentenyladenosine in leaves.
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
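The sliding window analysis mentioned in the abstract above can be pictured as cutting an alignment into fixed-width, regularly spaced coordinate ranges. The sketch below only generates such ranges. The function name is hypothetical, and the step size and alignment length in the example are illustrative assumptions, because the abstract gives only the window widths (1,000 bp and 2,000 bp) and the window counts.

```python
def sliding_windows(seq_length, window, step):
    """Return half-open (start, end) coordinates of all full-width windows.

    `step` is an assumed parameter: the abstract does not state how far
    consecutive windows were shifted.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, start + window)
        for start in range(0, seq_length - window + 1, step)
    ]


# Toy example: a 10-column alignment, 4-column windows, shifted by 2 columns.
for start, end in sliding_windows(10, 4, 2):
    print(start, end)
```

Each coordinate pair could then be used to slice every aligned sequence before building a separate tree per window, which is the comparison the abstract reports.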
Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui
2015-01-01
Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi
2018-02-12
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.
Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B
2004-12-15
cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5-5 × 10^5 primary transformants, starting with 5 μg of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine residue codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi
2016-11-01
The nucleotide sequences of nucleocapsid protein (N); phosphoprotein (P); matrix protein (M); hemagglutinin-neuraminidase (HN); and large polymerase protein (L) genes, 3'-end leader, 5'-end trailer and intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on fusion protein (F) gene sequence [46], the determination of the genome sequence of APMV/Shimane67 has been completed in this study. The genome of APMV/Shimane67 comprised 16,146 nucleotides in length and contains six genes in the order of 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of whole genome and each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 was grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 contains specific characteristics and is distinguishable from other types of APMV.
Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun
2013-01-01
Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. 
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Pervasive sequence patents cover the entire human genome.
Rosenfeld, Jeffrey A; Mason, Christopher E
2013-01-01
The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
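The cross-gene 15mer matching described in the abstract above amounts to indexing every k-mer of every gene and reporting which genes share at least one. The sketch below illustrates that idea and is not the authors' pipeline. The function names and toy sequences are hypothetical, a short k is used so the toy data can overlap, and reverse-complement matches are ignored for simplicity.

```python
def kmers(seq, k=15):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genes_sharing_kmers(genes, k=15):
    """Map each gene name to the set of other genes sharing at least one k-mer.

    Simplification (assumption): only the given strand is compared.
    """
    index = {}
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    shared = {name: set() for name in genes}
    for names in index.values():
        if len(names) > 1:
            for name in names:
                shared[name] |= names - {name}
    return shared


# Hypothetical toy sequences; k=4 keeps the example readable.
toy_genes = {
    "geneA": "ACGTACGTTT",
    "geneB": "GGACGTAGG",
    "geneC": "TTTTTTTT",
}
print(genes_sharing_kmers(toy_genes, k=4))
```

With real gene sequences and k=15, the size of each returned set would correspond to the "number of other genes matched" statistic quoted in the abstract, subject to the strand simplification noted above.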
Lv, Daoyuan; Song, Ping; Chen, Yungui; Gong, Wuming; Mo, Saijun
2005-04-08
Using the digital differential display program of the National Center for Biotechnology Information, we identified a contig of expressed sequence tags (ESTs) (Accession No. BM316936), which came from zebrafish ovary and testis libraries. The full-length cDNA of this transcript was cloned and further confirmed by polymerase chain reaction and sequencing. The full-length cDNA of the novel gene is 807 bp and encodes a novel protein of 187 amino acids, which shares no significant homology with any other known proteins. Characterization of genomic sequences of the gene revealed that it spans 6 kb on linkage group 3 and is composed of five exons and four introns. RT-PCR analysis showed that it was expressed in mature oocytes and at the one-cell stage, and persisted until 24 h of development. RT-PCR also revealed that it is expressed in gonad and kidney, with the highest level of expression in the testis. The expression sites of the novel gene in adult gonad were further localized by in situ hybridization to oogonia and growing oocytes in ovary and to spermatogonia and spermatocytes, but not to spermatids, in testis. Based on its abundance in testis and in the germline stem cells (spermatogonia and oogonia), we hypothesize that it may function as a testicular development and gametogenesis related gene that plays important roles in spermatogenesis, and named it Zsrg (zebrafish testis spermatogenesis related gene, Zsrg).
Donald, L. J.; Chernushevich, I. V.; Zhou, J.; Verentchikov, A.; Poppe-Schriemer, N.; Hosfield, D. J.; Westmore, J. B.; Ens, W.; Duckworth, H. W.; Standing, K. G.
1996-01-01
IclR protein, the repressor of the aceBAK operon of Escherichia coli, has been examined by time-of-flight mass spectrometry, with ionization by matrix-assisted laser desorption or by electrospray. The purified protein was found to have a smaller mass than that predicted from the base sequence of the cloned iclR gene. Additional measurements were made on mixtures of peptides derived from IclR by treatment with trypsin and cyanogen bromide. They showed that the amino acid sequence is that predicted from the gene sequence, except that the protein has suffered truncation by removal of the N-terminal eight or, in some cases, nine amino acid residues. The peptide bond whose hydrolysis would remove eight residues is a typical target for the E. coli protease OmpT. We find that, by taking precautions to minimize OmpT proteolysis, or by eliminating it through mutation of the host strain, we can isolate full-length IclR protein (lacking only the N-terminal methionine residue). Full-length IclR is a much better DNA-binding protein than the truncated versions: it binds the aceBAK operator sequence 44-fold more tightly, presumably because of additional contacts that the N-terminal residues make with the DNA. Our experience thus demonstrates the advantages of using mass spectrometry to characterize newly purified proteins produced from cloned genes, especially where proteolysis or other covalent modification is a concern. This technique gives mass spectra from complex peptide mixtures that can be analyzed completely, without any fractionation of the mixtures, by reference to the amino acid sequence inferred from the base sequence of the cloned gene. PMID:8844850
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).
Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E
2005-12-02
cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.
2006-01-01
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprised of an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
NASA Astrophysics Data System (ADS)
Hamid, Nur Athirah Abd; Ismail, Ismanizan
2013-11-01
Polygonum minus, locally named Kesum, is an aromatic herb with a high secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible oxidation of alcohols to aldehydes in the presence of NAD(P)(H) as a co-factor. The main focus of this research is to identify the ADH gene. Total RNA was extracted from leaves of P. minus treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon and intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp, which encodes 266 aa residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that the two sequences are highly similar in the ORF region but divergent in the 3' untranslated regions (UTR). The amino acid sequences differ at residue 107: PmADH1 contains a Gly (G) residue while PmADH2 contains a Cys (C) residue. The intron-exon organization of both sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short chain alcohol dehydrogenases/reductases ((c) SDRs) conserved domain. The results suggest that both sequences are members of the short chain alcohol dehydrogenase family.
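The relationship between the 801 bp ORF and the 266 residues quoted above is plain codon arithmetic, assuming the reported ORF length includes the stop codon. The sketch below makes that assumption explicit through a parameter; the function name is hypothetical.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Number of amino acid residues encoded by an ORF of the given length.

    Assumption: when `includes_stop` is True, the final codon is a stop
    codon and contributes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 801 bp = 267 codons; dropping the stop codon leaves 266 residues.
print(residues_from_orf(801))
```

The same check is a quick way to see whether an ORF length and a protein length reported together are mutually consistent.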
Gene length as a biological timer to establish temporal transcriptional regulation
Kirkconnell, Killeen S.; Magnuson, Brian; Paulsen, Michelle T.; Lu, Brian; Bedi, Karan; Ljungman, Mats
2017-01-01
Transcriptional timing is inherently influenced by gene length, thus providing a mechanism for temporal regulation of gene expression. While gene size has been shown to be important for the expression timing of specific genes during early development, whether it plays a role in the timing of other global gene expression programs has not been extensively explored. Here, we investigate the role of gene length during the early transcriptional response of human fibroblasts to serum stimulation. Using the nascent sequencing techniques Bru-seq and BruUV-seq, we identified immediate genome-wide transcriptional changes following serum stimulation that were linked to rapid activation of enhancer elements. We identified 873 significantly induced and 209 significantly repressed genes. Variations in gene size allowed for a large group of genes to be simultaneously activated but produce full-length RNAs at different times. The median length of the group of serum-induced genes was significantly larger than the median length of all expressed genes, housekeeping genes, and serum-repressed genes. These gene length relationships were also observed in corresponding mouse orthologs, suggesting that relative gene size is evolutionarily conserved. The sizes of transcription factor and microRNA genes immediately induced after serum stimulation varied dramatically, setting up a cascade mechanism for temporal expression arising from a single activation event. The retention and expansion of large intronic sequences during evolution have likely played important roles in fine-tuning the temporal expression of target genes in various cellular response programs. PMID:28055303
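The timing idea in the abstract above can be illustrated with simple arithmetic: if genes are activated together and transcribed at a roughly constant rate, the delay before a full-length RNA appears scales with gene length. The sketch below assumes a constant elongation rate, which the abstract does not report. The function name, the rate, and the gene lengths are illustrative assumptions only.

```python
def minutes_to_full_length(gene_length_kb, elongation_kb_per_min):
    """Time for a polymerase to traverse a gene at a constant elongation rate.

    Assumption: elongation is uniform along the gene; pausing, splicing
    and termination delays are ignored.
    """
    if gene_length_kb < 0 or elongation_kb_per_min <= 0:
        raise ValueError("length must be non-negative and rate positive")
    return gene_length_kb / elongation_kb_per_min


# Hypothetical genes activated at the same moment; 2.0 kb/min is an
# assumed rate chosen for illustration, not a value from the study.
assumed_rate = 2.0
for name, length_kb in [("short_gene", 10), ("long_gene", 300)]:
    print(name, minutes_to_full_length(length_kb, assumed_rate), "min")
```

Under these assumptions, a 30-fold difference in gene length translates directly into a 30-fold difference in the delay before the first complete transcript.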
Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke
2010-03-30
The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies.
The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.
Genome-wide analysis of esterase-like genes in the striped rice stem borer, Chilo suppressalis.
Wang, Baoju; Wang, Ying; Zhang, Yang; Han, Ping; Li, Fei; Han, Zhaojun
2015-06-01
The striped rice stem borer, Chilo suppressalis, a destructive pest of rice, has developed high levels of resistance to certain insecticides. Esterases are reported to be involved in insecticide resistance in several insects. Therefore, this study systematically analyzed esterase-like genes in C. suppressalis. Fifty-one esterase-like genes were identified in the draft genomic sequences of the species, and 20 cDNA sequences were derived which encoded full- or nearly full-length proteins. The putative esterase proteins derived from these full-length genes are overall highly diversified. However, key residues that are functionally important including the serine residue in the active site are conserved in 18 out of the 20 proteins. Phylogenetic analysis revealed that most of these genes have homologues in other lepidoptera insects. Genes CsuEst6, CsuEst10, CsuEst11, and CsuEst51 were induced by the insecticide triazophos, and genes CsuEst9, CsuEst11, CsuEst14, and CsuEst51 were induced by the insecticide chlorantraniliprole. Our results provide a foundation for future studies of insecticide resistance in C. suppressalis and for comparative research with esterase genes from other insect species.
Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong
2015-03-01
The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG
2015-01-01
The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
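The ORF figures quoted in the ChrT abstract (567 bp, 188 amino acids) can be cross-checked with simple codon arithmetic. The sketch below assumes that the reported ORF length includes the terminal stop codon; the function name and the `includes_stop` flag are illustrative, not part of any tool named in the abstract.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumes the ORF is in frame; if includes_stop is True, the final codon
    is a stop codon and does not contribute an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # 567 bp = 189 codons = 188 amino acids plus one stop codon.
    print(orf_protein_length(567))
```

With that assumption, 567 bp corresponds to 189 codons, of which 188 encode amino acids, consistent with the abstract.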
Loss of GATA-1 Full Length as a Cause of Diamond–Blackfan Anemia Phenotype
Parrella, Sara; Aspesi, Anna; Quarello, Paola; Garelli, Emanuela; Pavesi, Elisa; Carando, Adriana; Nardi, Margherita; Ellis, Steven R.; Ramenghi, Ugo; Dianzani, Irma
2015-01-01
Mutations in the hematopoietic transcription factor GATA-1 alter the proliferation/differentiation of hemopoietic progenitors. Mutations in exon 2 interfere with the synthesis of the full-length isoform of GATA-1 and lead to the production of a shortened isoform, GATA-1s. These mutations have been found in patients with Diamond–Blackfan anemia (DBA), a congenital erythroid aplasia typically caused by mutations in genes encoding ribosomal proteins. We sequenced GATA-1 in 23 patients that were negative for mutations in the most frequently mutated DBA genes. One patient showed a c.2T > C mutation in the initiation codon leading to the loss of the full-length GATA-1 isoform. PMID:24453067
Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik
2011-10-01
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa ( http://microbiology.se/software/metaxa/ ), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Rodríguez-Martín, Carlos; Cidre, Florencia; Fernández-Teijeiro, Ana; Gómez-Mariano, Gema; de la Vega, Leticia; Ramos, Patricia; Zaballos, Ángel; Monzón, Sara; Alonso, Javier
2016-05-01
Retinoblastoma (RB, MIM 180200) is the paradigm of hereditary cancer. Individuals harboring a constitutional mutation in one allele of the RB1 gene have a high predisposition to develop RB. Here, we present the first case of familial RB caused by a de novo insertion of a full-length long interspersed element-1 (LINE-1) into intron 14 of the RB1 gene that caused a highly heterogeneous splicing pattern of RB1 mRNA. LINE-1 insertion was inferred by mRNA studies and full-length sequenced by massive parallel sequencing. Some of the aberrant mRNAs were produced by noncanonical acceptor splice sites, a new finding that up to date has not been described to occur upon LINE-1 retrotransposition. Our results clearly show that RNA-based strategies have the potential to detect disease-causing transposon insertions. It also confirms that the incorporation of new genetic approaches, such as massive parallel sequencing, contributes to characterize at the sequence level these unique and exceptional genetic alterations.
USDA-ARS's Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Characterization and chromosomal mapping of the human TFG gene involved in thyroid carcinoma
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mencinger, M.; Panagopoulos, I.; Andreasson, P.
1997-05-01
Homology searches in the Expressed Sequence Tag Database were performed using SPYGQ-rich regions as query sequences to find genes encoding protein regions similar to the N-terminal parts of the sarcoma-associated EWS and FUS proteins. Clone 22911 (T74973), encoding a SPYGQ-rich region in its 5′ end, and several other clones that overlapped 22911 were selected. The combined data made it possible to assemble a full-length cDNA sequence. This cDNA sequence is 1677 bp, containing an initiation codon ATG, an open reading frame of 400 amino acids, a poly(A) signal, and a poly(A) tail. We found 100% identity between the 5′ part of the consensus sequence and the 598-bp-long sequence named TFG. The TFG sequence is fused to the 3′ end of NTRK1, generating the TRK-T3 fusion transcript found in papillary thyroid carcinoma. The cDNA therefore represents the full-length transcript of the TFG gene. TFG was localized to 3q11-q12 by fluorescence in situ hybridization. The 3′ and the 5′ ends of the TFG cDNA probe hybridized to a 2.2-kb band on Northern blot filters in all tissues examined. 28 refs., 5 figs., 1 tab.
High-resolution phylogenetic microbial community profiling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, Esther; Coleman-Derr, Devin; Bowman, Brett
2014-03-17
The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large percentage of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of GC content of microbial genomes and was higher when compared to Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences, e.g. full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.
Stevens, Mark; Viganó, Felicita
2007-04-01
The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.
Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin
2003-06-01
A single clinical isolate of mumps virus designated 88-1961 was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based upon its nucleotide sequence of the SH gene 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps samples showed that the missense to silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences for the NP, M, and the L protein were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B, and G. No identifying molecular patterns in the non-coding (intergenic) or coding regions of 88-1961 were found when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
Birla, Bhagyashree S; Chou, Hui-Hsien
2015-01-01
Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction are much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.
NASA Astrophysics Data System (ADS)
Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid
2017-02-01
Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across membranes by Na+/H+ antiporters. A novel full-length H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. The DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE web server. Primary structural analysis showed LfVP1 to be a stable protein, and the grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and of a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated the presence of 14 transmembrane domains, and a hydrophobicity surface model showed amino acid hydrophobicity. A Ramachandran plot showed that 98% of amino acid residues were predicted in the favored region.
Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.
2015-01-01
Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
Singh, B N; Mudgil, Yashwanti; Sopory, S K; Reddy, M K
2003-07-01
We have successfully expressed enzymatically active plant topoisomerase II in Escherichia coli for the first time, which has enabled its biochemical characterization. Using a PCR-based strategy, we obtained a full-length cDNA and the corresponding genomic clone of tobacco topoisomerase II. The genomic clone has 18 exons interrupted by 17 introns. Most of the 5' and 3' splice junctions follow the typical canonical consensus dinucleotide sequence GU-AG present in other plant introns. The position of introns and phasing with respect to primary amino acid sequence in tobacco TopII and Arabidopsis TopII are highly conserved, suggesting that the two genes are evolved from the common ancestral type II topoisomerase gene. The cDNA encodes a polypeptide of 1482 amino acids. The primary amino acid sequence shows a striking sequence similarity, preserving all the structural domains that are conserved among eukaryotic type II topoisomerases in an identical spatial order. We have expressed the full-length polypeptide in E. coli and purified the recombinant protein to homogeneity. The full-length polypeptide relaxed supercoiled DNA and decatenated the catenated DNA in a Mg(2+)- and ATP-dependent manner, and this activity was inhibited by 4'-(9-acridinylamino)-3'-methoxymethanesulfonanilide (m-AMSA). The immunofluorescence and confocal microscopic studies, with antibodies developed against the N-terminal region of tobacco recombinant topoisomerase II, established the nuclear localization of topoisomerase II in tobacco BY2 cells. The regulated expression of tobacco topoisomerase II gene under the GAL1 promoter functionally complemented a temperature-sensitive TopII(ts) yeast mutant.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was determined, and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete R. tetragona mt genome is 71.4%. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
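A statistic such as the 71.4% A + T content quoted above is computed by counting A and T bases over all unambiguous bases in the sequence. The following is a minimal sketch of that calculation; the short sequence in the example is made up for illustration and is not taken from the R. tetragona genome.

```python
def at_content(seq: str) -> float:
    """Return the percentage of A and T among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)


if __name__ == "__main__":
    toy_sequence = "ATTATGCATTAAGC"  # hypothetical fragment, not real mtDNA
    print(f"A+T content: {at_content(toy_sequence):.1f}%")
```

Ambiguity codes such as N are ignored here, which is one possible convention; other tools may count them in the denominator.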
Einer-Jensen, Katja; Winton, James R.; Lorenzen, Niels
2005-01-01
The aim of this study was to develop a standardized molecular assay that used limited resources and equipment for routine genotyping of isolates of the fish rhabdovirus, viral haemorrhagic septicaemia virus (VHSV). Computer generated restriction maps, based on 62 unique full-length (1524 nt) sequences of the VHSV glycoprotein (G) gene, were used to predict restriction fragment length polymorphism (RFLP) patterns that were subsequently grouped and compared with a phylogenetic analysis of the G-gene sequences of the same set of isolates. Digestion of PCR amplicons from the full-length G-gene by a set of three restriction enzymes was predicted to accurately enable the assignment of the VHSV isolates into the four major genotypes discovered to date. Further sub-typing of the isolates into the recently described sub-lineages of genotype I was possible by applying three additional enzymes. Experimental evaluation of the method consisted of three steps: (i) RT-PCR amplification of the G-gene of VHSV isolates using purified viral RNA as template, (ii) digestion of the PCR products with a panel of restriction endonucleases and (iii) interpretation of the resulting RFLP profiles. The RFLP analysis was shown to approximate the level of genetic discrimination obtained by other, more labour-intensive, molecular techniques such as the ribonuclease protection assay or sequence analysis. In addition, 37 previously uncharacterised isolates from diverse sources were assigned to specific genotypes. While the assay was able to distinguish between marine and continental isolates of VHSV, the differences did not correlate with the pathogenicity of the isolates.
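The computer-generated restriction maps mentioned in this abstract amount to an in-silico digest: locate every recognition site in a sequence, cut at a fixed offset within the site, and report fragment lengths. The sketch below shows the idea for a linear amplicon. The abstract does not name the enzymes used, so EcoRI (recognition site GAATTC, cutting after the first G) and the toy sequence are assumptions chosen only for illustration.

```python
from typing import List


def digest(seq: str, site: str, cut_offset: int) -> List[int]:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the recognition
    site to the cut position on the top strand (1 for EcoRI, G^AATTC).
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip zero-length pieces that arise when a cut falls at a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Hypothetical 24-nt amplicon with two EcoRI sites, not a VHSV sequence.
    amplicon = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
    print(digest(amplicon, "GAATTC", 1))
```

Isolates whose fragment-length lists differ for a given enzyme would show different RFLP band patterns, which is the basis for grouping them into genotypes.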
Prasad, B. C. Narasimha; Kumar, Vinod; Gururaj, H. B.; Parimalan, R.; Giridhar, P.; Ravishankar, G. A.
2006-01-01
Capsaicin is a unique alkaloid of the plant kingdom restricted to the genus Capsicum. Capsaicin is the pungency factor, a bioactive molecule of food and of medicinal importance. Capsaicin is useful as a counterirritant, antiarthritic, analgesic, antioxidant, and anticancer agent. Capsaicin biosynthesis involves condensation of vanillylamine and 8-methyl nonenoic acid, brought about by capsaicin synthase (CS). We found that CS activity correlated with genotype-specific capsaicin levels. We purified and characterized CS (≈35 kDa). Immunolocalization studies confirmed that CS is specifically localized to the placental tissues of Capsicum fruits. Western blot analysis revealed concomitant enhancement of CS levels and capsaicin accumulation during fruit development. We determined the N-terminal amino acid sequence of purified CS, cloned the CS gene (csy1) and sequenced full-length cDNA (981 bp). The deduced amino acid sequence of CS from full-length cDNA was 38 kDa. Functionality of csy1 through heterologous expression in recombinant Escherichia coli was also demonstrated. Here we report the gene responsible for capsaicin biosynthesis, which is unique to Capsicum spp. With this information on the CS gene, speculation on the gene for pungency is unequivocally resolved. Our findings have implications in the regulation of capsaicin levels in Capsicum genotypes. PMID:16938870
K.D. Jermstad; L.A. Sheppard; B.B. Kinloch; A. Delfino-Mix; E.S. Ersoz; K.V. Krutovsky; D.B Neale
2006-01-01
The nucleotide-binding-site and leucine-rich-repeat (NBS-LRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBS-LRR class resistance genes catalogued in GenBank. The polymerase chain reaction products...
Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.
2011-01-01
Heat shock proteins are ubiquitous, induced under a number of environmental and metabolic stresses, with highly conserved DNA sequences among mammalian species. Camelus dromedarius (the Arabian camel), domesticated under semi-desert environments, is well adapted to tolerate and survive severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of a full-length cDNA encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from the Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in pET-b expression vector. Sequence analysis of the HSPA6 gene showed a 1932 bp open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GenBank (accession number HQ214118.1). BLAST analysis indicated that the C. dromedarius HSPA6 nucleotide sequence shared high similarity (77–91%) with heat shock gene sequences of other mammals. The deduced 643-amino-acid sequence (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. Comparative analyses of the camel HSPA6 protein sequence with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). The camel HSPA6 protein structure predicted by Protein 3D structural analysis showed high similarity with human and mouse HSPs. Taken together, this study indicates that the cDNA sequences of HSPA6 gene and its amino acid and protein structure from the Arabian camel are highly conserved and have similarities with other mammalian species. PMID:21845074
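The figures reported in this abstract can be checked with simple arithmetic. The sketch below assumes that the reported 1932 bp open reading frame includes the stop codon, and it uses the common rule of thumb of about 110 Da per residue for a rough mass estimate; both are assumptions for illustration, not statements from the abstract, and the function names are hypothetical.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def orf_to_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of the given length in bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


def rough_mass_kda(n_residues):
    """Very rough protein mass estimate in kDa from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


residues = orf_to_protein_length(1932)  # 1932 / 3 = 644 codons, minus the stop
print(residues, round(rough_mass_kda(residues), 1))
```

Under these assumptions the ORF length is consistent with the 643 residues reported, and the rough mass estimate lands close to the reported 70.5 kDa.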
Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
2015-11-18
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, considerable room remains for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. 
The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants
2014-01-01
Background The source inoculum of gastrointestinal tract (GIT) microbes is largely influenced by delivery mode in full-term infants, but these influences may be decoupled in very low birth weight (VLBW, <1,500 g) neonates via conventional broad-spectrum antibiotic treatment. We hypothesize the built environment (BE), specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the gut of premature VLBW infants. Here, we present the first matched fecal-BE time series analysis of two preterm VLBW neonates housed in a neonatal intensive care unit (NICU) over the first month of life. Results Fresh fecal samples were collected every 3 days and metagenomes sequenced on an Illumina HiSeq2000 device. For each fecal sample, approximately 33 swabs were collected from each NICU room from 6 specified areas: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, and nurse station electronics (keyboard, mouse, and cell phone). Swabs were processed using a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon pipeline in which full-length 16S rRNA amplicons were sheared and sequenced using an Illumina platform, and short reads reassembled into full-length genes. Over 24,000 full-length 16S rRNA sequences were produced, generating an average of approximately 12,000 operational taxonomic units (OTUs) (clustered at 97% nucleotide identity) per room-infant pair. Dominant gut taxa, including Staphylococcus epidermidis, Klebsiella pneumoniae, Bacteroides fragilis, and Escherichia coli, were widely distributed throughout the room environment with many gut colonizers detected in more than half of samples. 
Reconstructed genomes from infant gut colonizers revealed a suite of genes that confer resistance to antibiotics (for example, tetracycline, fluoroquinolone, and aminoglycoside) and sterilizing agents, which likely offer a competitive advantage in the NICU environment. Conclusions We have developed a high-throughput culture-independent approach that integrates room surveys based on full-length 16S rRNA gene sequences with metagenomic analysis of fecal samples collected from infants in the room. The approach enabled identification of discrete ICU reservoirs of microbes that also colonized the infant gut and provided evidence for the presence of certain organisms in the room prior to their detection in the gut. PMID:24468033
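The clustering of full-length 16S rRNA sequences into OTUs at 97% nucleotide identity, mentioned in this abstract, can be illustrated with a toy greedy clustering. This is a minimal sketch under assumptions: the abstract does not state which clustering tool was used, identity here is a simple per-position match fraction on equal-length (pre-aligned) sequences, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity requires equal-length (pre-aligned) sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid becomes the centroid of a
    new OTU, so the result depends on input order (a known property of
    greedy centroid clustering).
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


base = "ACGT" * 25                # 100 nt toy sequence
one_diff = "T" + base[1:]         # 99% identical to base
five_diff = "TGCAT" + base[5:]    # 95% identical to base
print(len(greedy_otus([base, one_diff, five_diff])))
```

In this toy input the 99%-identical sequence joins the first OTU, while the 95%-identical one founds a second OTU.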
PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.
Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng
2018-05-01
The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI, to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq) for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.
Kornthong, Napamanee; Cummins, Scott F; Chotwiwatthanakun, Charoonroj; Khornchatri, Kanjana; Engsusophon, Attakorn; Hanna, Peter J; Sobhon, Prasert
2014-01-01
The central nervous system (CNS) is often intimately involved in reproduction control and is therefore a target organ for transcriptomic investigations to identify reproduction-associated genes. In this study, 454 transcriptome sequencing was performed on pooled brain and ventral nerve cord of the female mud crab (Scylla olivacea) following serotonin injection (5 µg/g BW). A total of 197,468 sequence reads was obtained with an average length of 828 bp. Approximately 38.7% of 2,183 isotigs matched with significant similarity (E value < 1e-4) to sequences within the GenBank non-redundant (nr) database, with most significant matches being to crustacean and insect sequences. Approximately 32 putative neuropeptide genes were identified from nonmatching BLAST sequences. In addition, we identified full-length transcripts for crustacean reproductive-related genes, namely farnesoic acid O-methyltransferase (FAMeT), estrogen sulfotransferase (ESULT) and prostaglandin F synthase (PGFS). Following serotonin injection, which would normally initiate reproductive processes, we found up-regulation of FAMeT, ESULT and PGFS expression in the female CNS and ovary. Our data provide an invaluable new resource for understanding the molecular role of the CNS in reproduction in S. olivacea.
Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth; Welch, Kenneth C; Timp, Winston
2018-03-01
Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, the liver is the principal location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and the whole organism moderate energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6 Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism.
NASA Astrophysics Data System (ADS)
Zhao, Liyuan; Mi, Tiezhu; Zhen, Yu; Yu, Zhigang
2012-05-01
Mitochondrial cytochrome b (Cytb), one of the few proteins encoded by the mitochondrial DNA, plays an important role in transferring electrons. As a mitochondrial gene, it has been widely used for phylogenetic analysis. Previously, a 949-bp fragment of the coding gene and mRNA editing were characterized from Prorocentrum donghaiense, which might prove useful for resolving P. donghaiense from closely related species. However, the full-length coding region has not been characterized. In this study, we used rapid amplification of cDNA ends (RACE) to obtain the full-length, 1,124 bp cDNA. The Cytb transcript contained a standard initiation codon ATG, but did not have a recognizable stop codon. Homology comparison showed that the P. donghaiense Cytb had a high sequence identity to Cytb sequences from other dinoflagellate species. Phylogenetic analysis placed Cytb from P. donghaiense in the clade of dinoflagellates and it clustered together strongly with that from P. minimum. Based on the full-length sequence, we inferred 32 editing events at different positions, accounting for 2.93% of the Cytb gene. 34.4% (11) of the changes were A to G, 25% (8) were T to C, and 25% (8) were C to U, with smaller proportions of G to C and G to A edits (9.4% (3) and 6.2% (2), respectively). The expression level of the Cytb transcript was quantified by real-time PCR with a TaqMan probe at different times during the whole growth phase. On average, the Cytb transcript was present at 39.27±7.46 copies of cDNA per cell during the whole growth cycle, and the expression of Cytb was relatively stable over the different phases. These results deepen our understanding of the structure and characteristics of Cytb in P. donghaiense, and confirm that Cytb in P. donghaiense is a candidate reference gene for studying the expression of other genes.
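The editing statistics in this abstract follow directly from the reported counts. The sketch below recomputes the per-type percentages from those counts; the dictionary layout and function name are hypothetical. The final line estimates the sequence length implied by the 2.93% figure, which is an inference for illustration only, since the abstract does not state which length was used as the denominator.

```python
# Counts of editing events by type, as reported in the abstract.
EDIT_COUNTS = {"A>G": 11, "T>C": 8, "C>U": 8, "G>C": 3, "G>A": 2}


def edit_percentages(counts):
    """Percentage of all editing events contributed by each substitution type."""
    total = sum(counts.values())
    return {kind: 100.0 * n / total for kind, n in counts.items()}


total_events = sum(EDIT_COUNTS.values())
for kind, pct in edit_percentages(EDIT_COUNTS).items():
    print(f"{kind}: {pct:.3f}%")

# Length implied by "32 events = 2.93%" (an inference, not stated in the abstract).
implied_length = round(total_events / 0.0293)
print(total_events, implied_length)
```

The counts sum to the 32 events reported, and the implied denominator comes out somewhat shorter than the 1,124 bp cDNA, which would be consistent with the percentage having been computed over the coding region alone.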
Isoform Sequencing Provides a More Comprehensive View of the Panax ginseng Transcriptome.
Jo, Ick-Hyun; Lee, Jinsu; Hong, Chi Eun; Lee, Dong Jin; Bae, Wonsil; Park, Sin-Gi; Ahn, Yong Ju; Kim, Young Chang; Kim, Jang Uk; Lee, Jung Woo; Hyun, Dong Yun; Rhee, Sung-Keun; Hong, Chang Pyo; Bang, Kyong Hwan; Ryu, Hojin
2017-09-15
Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.
Shitara, M; Tsuboi, Y; Sekizuka, T; Tazumi, A; Moore, J E; Millar, B C; Taneike, I; Matsuda, M
2008-01-01
Nucleotide sequences of approximately 3.1 kbp consisting of the full-length open reading frame (ORF) for grpE, a non-coding (NC) region and a putative ORF for the full-length dnaK gene (1860 bp) were identified from a urease-positive thermophilic Campylobacter (UPTC) CF89-12 isolate. Then, following the construction of a new degenerate polymerase chain reaction (PCR) primer pair for amplification of the dnaK structural gene, including the transcription terminator region of C. lari isolates, the dnaK region was amplified successfully, TA-cloned and sequenced in nine C. lari isolates. The dnaK gene sequences commenced with an ATG and terminated with a TAA in all 10 isolates, including CF89-12. In addition, the putative ORFs for the dnaK gene locus from seven UPTC isolates consisted of 1860 bases, and those from the four urease-negative (UN) C. lari isolates, including the C. lari RM2100 reference strain, consisted of 1866 bases. Interestingly, different probable ribosome binding sites and hypothetically intrinsic ρ-independent terminator structures were identified between the seven UPTC and four UN C. lari isolates, respectively. Moreover, 20 of the 28 polymorphic sites in the amino acid sequences of the dnaK ORF from 11 C. lari isolates were either UPTC-specific or UN C. lari-specific. In the neighbour-joining tree based on the nucleotide sequence information of the dnaK gene, C. lari forms two major distinct clusters consisting of UPTC and UN C. lari isolates, respectively, with UN C. lari being more closely related to other thermophilic campylobacters than to UPTC.
Molecular basis of length polymorphism in the human zeta-globin gene complex.
Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J
1983-01-01
The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features
Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna
2005-01-01
Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full length cDNA was sequenced and an archetypal secretion signal predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...
Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried
2015-01-01
A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
USDA-ARS's Scientific Manuscript database
Polygalacturonase-inhibiting proteins (PGIPs) are leucine-rich repeat (LRR) proteins involved in plant defense. Sugar beet (Beta vulgaris L.) PGIP genes, BvPGIP1, BvPGIP2 and BvPGIP3, were isolated from two breeding lines, F1016 and F1010. Full-length cDNA sequences of the three BvPGIP genes encod...
Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren
2015-01-01
There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
Loperfido, Mariana; Jarmin, Susan; Dastidar, Sumitava; Di Matteo, Mario; Perini, Ilaria; Moore, Marc; Nair, Nisha; Samara-Kuko, Ermira; Athanasopoulos, Takis; Tedesco, Francesco Saverio; Dickson, George; Sampaolesi, Maurilio; VandenDriessche, Thierry; Chuah, Marinee K.
2016-01-01
Duchenne muscular dystrophy (DMD) is a genetic neuromuscular disorder caused by the absence of dystrophin. We developed a novel gene therapy approach based on the use of the piggyBac (PB) transposon system to deliver the coding DNA sequence (CDS) of either full-length human dystrophin (DYS: 11.1 kb) or truncated microdystrophins (MD1: 3.6 kb; MD2: 4 kb). PB transposons encoding microdystrophins were transfected in C2C12 myoblasts, yielding 65±2% MD1 and 66±2% MD2 expression in differentiated multinucleated myotubes. A hyperactive PB (hyPB) transposase was then deployed to enable transposition of the large-size PB transposon (17 kb) encoding the full-length DYS and green fluorescent protein (GFP). Stable GFP expression attaining 78±3% could be achieved in the C2C12 myoblasts that had undergone transposition. Western blot analysis demonstrated expression of the full-length human DYS protein in myotubes. Subsequently, dystrophic mesoangioblasts from a Golden Retriever muscular dystrophy dog were transfected with the large-size PB transposon, resulting in 50±5% GFP-expressing cells after stable transposition. This was consistent with correction of the differentiated dystrophic mesoangioblasts following expression of full-length human DYS. These results pave the way toward a novel non-viral gene therapy approach for DMD using PB transposons, underscoring their potential to deliver large therapeutic genes. PMID:26682797
Molecular cloning of allelopathy related genes and their relation to HHO in Eupatorium adenophorum.
Guo, Huiming; Pei, Xixiang; Wan, Fanghao; Cheng, Hongmei
2011-10-01
In this study, conserved sequence regions of HMGR, DXR, and CHS (encoding 3-hydroxy-3-methylglutaryl-CoA reductase, 1-deoxyxylulose-5-phosphate reductoisomerase and chalcone synthase, respectively) were amplified by reverse transcriptase (RT)-PCR from Eupatorium adenophorum. Quantitative real-time PCR showed that the expression of CHS was related to the level of HHO, an allelochemical isolated from E. adenophorum. Semi-quantitative RT-PCR showed that there was no significant difference in expression of genes among three different tissues, except for CHS. Southern blotting indicated that at least three CHS genes are present in the E. adenophorum genome. A full-length cDNA from CHS genes (named EaCHS1, GenBank ID: FJ913888) was cloned. The 1,455 bp cDNA contained an open reading frame (1,206 bp) encoding a protein of 401 amino acids. Preliminary bioinformatics analysis revealed that EaCHS1 is a member of the CHS family, and subcellular localization prediction indicated that it is a cytoplasmic protein. To the best of our knowledge, this is the first report of conserved sequences of these genes and of a full-length EaCHS1 gene in E. adenophorum. The results indicated that the CHS gene is related to allelopathy of E. adenophorum.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.
2000-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.
2001-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose
2012-02-01
The continuous emergence of new strains of canine parvovirus (CPV), poorly protected by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Therefore, the understanding of the genetic variation in emerging CPV strains is crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding for the main capsid protein (VP2) of 11 canine parvovirus type 2 (CPV-2) Argentine representative field strains, selected from a total of 75 positive samples studied in our laboratory in the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at nucleotide and amino acid levels, respectively). However, the appearance of a new substitution in the 440 position (T440A) in four CPV-2c Argentine strains obtained after the year 2009 gives support to the variability observed for this position located within the VP2, three-fold spike. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America and shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.
USDA-ARS's Scientific Manuscript database
The present work characterized a second endogenous cellulase (endo-β-1,4-glucanase) gene, CfEG4, uncovered in the transcriptome of Formosan subterranean termite (Coptotermes formosanus). The full-length gene was cloned and sequenced. It is similar to the CfEG3a described earlier (Zhang et al. 2009) ...
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of sequence data accessible, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other both in data usage and modelling strategies. We have based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read-length. The differences in classification error between the methods seemed to be small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data is needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
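The feature extraction step that these classifiers share, counting K-mers with a sliding window, can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function name, the choice to skip windows containing ambiguous bases, and the DNA alphabet (16S sequences written with T rather than U) are all assumptions.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping K-mers in seq using a sliding window of width k.

    Assumption: windows containing anything other than A, C, G, T
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Toy sequence; a real 16S rRNA gene is roughly 1,500 nt long.
print(count_kmers("ACGTACGT", 3))
# Counter({'ACG': 2, 'CGT': 2, 'GTA': 1, 'TAC': 1})
```

The resulting count vector per sequence is what a classifier such as a multinomial naïve Bayes model would take as input; the modelling step itself is not shown here.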
Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P
2010-11-01
Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
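The idea of reporting variants above a 1% frequency can be illustrated with a toy per-position tally. This is only a sketch under strong simplifying assumptions: reads are taken to be already aligned and of equal length, and none of the authors' steps (de novo assembly, iterative mapping, error correction) is reproduced. The function name and input format are hypothetical.

```python
from collections import Counter


def minority_variants(aligned_reads, threshold=0.01):
    """List (position, base, frequency) for non-consensus bases above threshold.

    Assumptions: all reads are pre-aligned strings of equal length, and
    characters other than A, C, G, T (gaps, N) are ignored when computing depth.
    """
    hits = []
    if not aligned_reads:
        return hits
    for pos in range(len(aligned_reads[0])):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        depth = sum(counts.values())
        if depth == 0:
            continue
        consensus, _ = counts.most_common(1)[0]
        for base, n in sorted(counts.items()):
            if base != consensus and n / depth > threshold:
                hits.append((pos, base, n / depth))
    return hits


# 100 toy reads: 98 carry G at position 2, 2 carry T (2% frequency).
reads = ["ACGT"] * 98 + ["ACTT"] * 2
print(minority_variants(reads))  # [(2, 'T', 0.02)]
```

With real data, the threshold only becomes meaningful once the per-base error rate has been pushed below it, which is the role of the error correction described in the abstract.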
[cDNA library construction from panicle meristem of finger millet].
Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B
2014-01-01
A protocol for producing full-length cDNA with the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of the finger millet (Eleusine coracana (L.) Gaertn) panicle. The titer of the library averaged 3.01 x 10(5) CFU/ml. The average cDNA insert length was about 1070 base pairs, and the efficiency of cDNA fragment insertion was 99.5%. Selected cDNA clones from the library were sequenced, and their sequences were identified by BLAST search. The library analysis and selective sequencing indicate that the inserted cDNA clones are functional and full-length. The library is a valuable resource for isolating and identifying key genes that regulate metabolism and meristematic development, and for mining new molecular markers for high-quality genetic studies and molecular breeding.
Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing
2009-08-01
Based on sequencing the full-length genomes of rabies viruses from two Chinese ferret badgers, we analyzed the genetic variation of rabies viruses at the molecular level to obtain information on their prevalence and variation in Zhejiang, and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR and full-length genomes were assembled to analyze the nucleotide and deduced protein similarities and to perform phylogenetic analyses of the N genes from Chinese ferret badger, sika deer, vole, dog. Vaccine strains were then determined. The two full-length genomes were completely sequenced and found to have the same genetic structure of 11 923 nts, including a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP, 6386-nt LP, intergenic regions (IGRs) of 2, 5 and 5 nts, a 423-nt pseudogene-like sequence (Psi) and a 70-nt trailer. BLAST and multiple sequence alignment showed that the two full-length genomes were consistent with the properties of the genus Lyssavirus, family Rhabdoviridae. The nucleotide and amino acid sequences among Chinese strains had the highest similarity, especially among animals of the same species. Between the two full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so the nucleotide mutations in these two genomes were most probably synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions showed no changes or recombination, only a few point mutations. The five proteins therefore appeared to be stable. The variation sites and types in the two ferret badger genomes were similar to those of the reference vaccine or street strains. Multiple sequence alignment and phylogenetic analyses showed that the two strains were genotype 1 and possessed the distinct geographic characteristics of China. All the evidence suggested that these two ferret badger rabies viruses were likely to be street viruses already circulating in wildlife.
Wen, Yangming; Lan, Kaijian; Wang, Junjie; Yu, Jingyi; Qu, Yarong; Zhao, Wei; Zhang, Fuchun; Tan, Wanlong; Cao, Hong; Zhou, Chen
2013-06-01
To construct dengue virus-specific full-length fully human antibody libraries using mammalian cell surface display technique. Total RNA was extracted from peripheral blood mononuclear cells (PBMCs) from convalescent patients with dengue fever. The reservoirs of the light chain and heavy chain variable regions (LCκ and VH) of the antibody genes were amplified by RT-PCR and inserted into the vector pDGB-HC-TM separately to construct the light chain and heavy chain libraries. The library DNAs were transfected into CHO cells and the expression of full-length fully human antibodies on the surface of CHO cells was analyzed by flow cytometry. Using 1.2 µg of the total RNA isolated from the PBMCs as the template, the LCκ and VH were amplified and the full-length fully human antibody mammalian display libraries were constructed. The kappa light chain gene library had a size of 1.45×10(4) and the heavy chain gene library had a size of 1.8×10(5). Sequence analysis showed that 8 out of the 10 light chain clones and 7 out of the 10 heavy chain clones randomly picked up from the constructed libraries contained correct open reading frames. FACS analysis demonstrated that all the 15 clones with correct open reading frames expressed full-length antibodies, which could be detected on CHO cell surfaces. After co-transfection of the heavy chain and light chain gene libraries into CHO cells, the expression of full-length antibodies on CHO cell surfaces could be detected by FACS analysis with an expressible diversity of the antibody library reaching 1.46×10(9) [(1.45×10(4)×80%)×(1.8×10(5)×70%)]. Using 1.2 µg of total RNA as template, the LCκ and VH full-length fully human antibody libraries against dengue virus have been successfully constructed with an expressible diversity of 10(9).
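The expressible-diversity figure quoted in the abstract follows from multiplying the two library sizes, each scaled by its fraction of clones with a correct open reading frame. The snippet below simply reproduces that arithmetic; the function and variable names are illustrative and not from the source.

```python
def expressible_diversity(light_size, light_orf_frac, heavy_size, heavy_orf_frac):
    """Combinatorial diversity of paired light/heavy chains with intact ORFs."""
    return (light_size * light_orf_frac) * (heavy_size * heavy_orf_frac)


# Values from the abstract: 8/10 light-chain and 7/10 heavy-chain clones
# had correct ORFs; library sizes were 1.45e4 and 1.8e5.
diversity = expressible_diversity(1.45e4, 8 / 10, 1.8e5, 7 / 10)
print(f"{diversity:.2e}")  # 1.46e+09
```

Note that the ORF fractions rest on only 10 sequenced clones per library, so the result is best read as an order-of-magnitude estimate (about 10(9)), as the abstract itself concludes.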
Deutscher, Ania T; Burke, Catherine M; Darling, Aaron E; Riegler, Markus; Reynolds, Olivia L; Chapman, Toni A
2018-05-05
Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts that laboratory-adaptation (domestication) and mass-rearing have on the tephritid larval gut microbiome. Read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome studies, and the diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets). Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae, but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected within adult tephritid species, was detected in larvae of both wild and domesticated populations (55 out of 56 larval gut samples). Larvae from the same single peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that the wild flies from different locations had different Asaia strains. Variation in the gut bacterial communities of B. tryoni larvae depends on diet, domestication, and horizontal acquisition. Bacterial variation in wild larvae suggests that more than one bacterial species can perform the same functional role; however, Asaia could be an important gut bacterium in larvae and warrants further study. A greater understanding of the functions of the bacteria detected in larvae could lead to increased fly quality and performance as part of the SIT.
USDA-ARS's Scientific Manuscript database
The cDNA of a NADH dehydrogenase -ubiquinone Fe-S protein 8 subunit (NDUFS8) gene from Aedes (Ochlerotatus) taeniorhynchus Wiedemann has been cloned and sequenced. The full-length mRNA sequence (824 bp) of AetNDUFS8 encodes an open reading region of 651 bp (i.e., 217 amino acids). To detect whether ...
Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing
2015-02-01
The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.
Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth
2018-01-01
Background: Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, liver is the principal location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and the whole organism moderate energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. Findings: We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. Conclusions: We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism. PMID:29618047
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia
Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A.; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A.; Fagiolini, Michela; Hensch, Takao K.; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide
2003-01-01
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′ end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5′-end clusters identify regions that are potential promoters for 8637 known genes and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete. PMID:12819125
Mammalian cDNA Library from the NIH Mammalian Gene Collection (MGC) | Office of Cancer Genomics
The MGC provides the research community full-length clones for most of the defined (as of 2006) human and mouse genes, along with selected clones of cow and rat genes. Clones were designed to allow easy transfer of the ORF sequences into nearly any type of expression vector. MGC provides protein ‘expression-ready’ clones for each of the included human genes. MGC is part of the ORFeome Collaboration (OC).
Genome-wide analysis of the WRKY transcription factors in Aegilops tauschii.
Ma, Jianhui; Zhang, Daijing; Shao, Yun; Liu, Pei; Jiang, Lina; Li, Chunxi
2014-01-01
The WRKY transcription factors (TFs) play important roles in responding to abiotic and biotic stress in plants. However, due to its unfinished genome sequencing, relatively few WRKY TFs with full-length coding sequences (CDSs) have been identified in wheat. Instead, the Aegilops tauschii genome, which is the D-genome progenitor of the hexaploid wheat genome, provides important resources for the discovery of new genes. In this study, we performed a bioinformatics analysis to identify WRKY TFs with full-length CDSs from the A. tauschii genome. A detailed evolutionary analysis for all these TFs was conducted, and quantitative real-time PCR was carried out to investigate the expression patterns of the abiotic stress-related WRKY TFs under different abiotic stress conditions in A. tauschii seedlings. A total of 93 WRKY TFs were identified from A. tauschii, and 79 of them were found to be newly discovered genes compared with wheat. Gene phylogeny, gene structure and chromosome location of the 93 WRKY TFs were fully analyzed. These studies provide a global view of the WRKY TFs from A. tauschii and a firm foundation for further investigations in both A. tauschii and wheat. © 2015 S. Karger AG, Basel.
Human AZU-1 gene, variants thereof and expressed gene products
Chen, Huei-Mei; Bissell, Mina
2004-06-22
A human AZU-1 gene, mutants, variants and fragments thereof. Protein products encoded by the AZU-1 gene and homologs encoded by the variants of the AZU-1 gene acting as tumor suppressors or markers of malignancy progression and tumorigenicity reversion. Identification, isolation and characterization of AZU-1 and AZU-2 genes localized to a tumor suppressive locus at chromosome 10q26, highly expressed in nonmalignant and premalignant cells derived from a human breast tumor progression model. Recombinant full-length protein sequences encoded by the AZU-1 gene, and nucleotide sequences of the AZU-1 and AZU-2 genes and variants and fragments thereof. Monoclonal or polyclonal antibodies specific to AZU-1 or AZU-2 encoded protein and to AZU-1 or AZU-2 encoded protein homologs.
Goyal, K; Browne, J A; Burnell, A M; Tunnacliffe, A
2005-06-01
Accumulation of the non-reducing disaccharide trehalose is associated with desiccation tolerance during anhydrobiosis in a number of invertebrates, but there is little information on trehalose biosynthetic genes in these organisms. We have identified two trehalose-6-phosphate synthase (tps) genes in the anhydrobiotic nematode Aphelenchus avenae and determined full length cDNA sequences for both; for comparison, full length tps cDNAs from the model nematode, Caenorhabditis elegans, have also been obtained. The A. avenae genes encode very similar proteins containing the catalytic domain characteristic of the GT-20 family of glycosyltransferases and are most similar to tps-2 of C. elegans; no evidence was found for a gene in A. avenae corresponding to Ce-tps-1. Analysis of A. avenae tps cDNAs revealed several features of interest, including alternative trans-splicing of spliced leader sequences in Aav-tps-1, and four different, novel SL1-related trans-spliced leaders, which were different to the canonical SL1 sequence found in all other nematodes studied. The latter observation suggests that A. avenae does not comply with the strict evolutionary conservation of SL1 sequences observed in other species. Unusual features were also noted in predicted nematode TPS proteins, which distinguish them from homologues in other higher eukaryotes (plants and insects) and in micro-organisms. Phylogenetic analysis confirmed their membership of the GT-20 glycosyltransferase family, but indicated an accelerated rate of molecular evolution. Furthermore, nematode TPS proteins possess N- and C-terminal domains, which are unrelated to those of other eukaryotes: nematode C-terminal domains, for example, do not contain trehalose-6-phosphate phosphatase-like sequences, as seen in plant and insect homologues. During onset of anhydrobiosis, both tps genes in A. avenae are upregulated, but exposure to cold or increased osmolarity also results in gene induction, although to a lesser extent. Trehalose seems likely therefore to play a role in a number of stress responses in nematodes.
Loperfido, Mariana; Jarmin, Susan; Dastidar, Sumitava; Di Matteo, Mario; Perini, Ilaria; Moore, Marc; Nair, Nisha; Samara-Kuko, Ermira; Athanasopoulos, Takis; Tedesco, Francesco Saverio; Dickson, George; Sampaolesi, Maurilio; VandenDriessche, Thierry; Chuah, Marinee K
2016-01-29
Duchenne muscular dystrophy (DMD) is a genetic neuromuscular disorder caused by the absence of dystrophin. We developed a novel gene therapy approach based on the use of the piggyBac (PB) transposon system to deliver the coding DNA sequence (CDS) of either full-length human dystrophin (DYS: 11.1 kb) or truncated microdystrophins (MD1: 3.6 kb; MD2: 4 kb). PB transposons encoding microdystrophins were transfected in C2C12 myoblasts, yielding 65±2% MD1 and 66±2% MD2 expression in differentiated multinucleated myotubes. A hyperactive PB (hyPB) transposase was then deployed to enable transposition of the large-size PB transposon (17 kb) encoding the full-length DYS and green fluorescence protein (GFP). Stable GFP expression attaining 78±3% could be achieved in the C2C12 myoblasts that had undergone transposition. Western blot analysis demonstrated expression of the full-length human DYS protein in myotubes. Subsequently, dystrophic mesoangioblasts from a Golden Retriever muscular dystrophy dog were transfected with the large-size PB transposon resulting in 50±5% GFP-expressing cells after stable transposition. This was consistent with correction of the differentiated dystrophic mesoangioblasts following expression of full-length human DYS. These results pave the way toward a novel non-viral gene therapy approach for DMD using PB transposons underscoring their potential to deliver large therapeutic genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Bedon, Frank; Grima-Pettenati, Jacqueline; Mackay, John
2007-01-01
Background: Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms. Results: We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion: Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco. PMID:17397551
Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O
2004-01-01
Nucleotide sequencing after TA cloning of the amplicon of the almost full-length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, whose phenotypic and genotypic characteristics had been shown to be indistinguishable, revealed differences at three nucleotide positions among the three strains. The recA nucleotide sequences therefore discriminated among strains A1, A2, and A3. The present study thus strongly suggests that nucleotide sequence data from the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group that are otherwise indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.
2002-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang
2005-01-01
In this study, full-length cDNAs of the GH (growth hormone) gene were isolated from six economically important fishes: Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. This is the first time these GH sequences have been cloned, except for E. coioides GH. The lengths of the cDNAs are, respectively, 953 bp, 1,023 bp, 825 bp, 1,082 bp, 1,154 bp and 1,180 bp. Each sequence includes an ORF of about 600 bp that encodes a protein of about 200 amino acids: 204 amino acids for the S. kneri, E. coioides and M. albus GHs, 200 amino acids for the S. asotus GH, and 210 amino acids for the M. anguillicaudatus and C. auratus gibelio GHs. A detailed sequence analysis of the six GHs together with many other fish GH sequences was then performed. All six sequences showed high homology to the other sequences, especially to sequences within the same order, and many conserved residues were identified, most of them localized in five domains. The phylogenetic trees (MP and NJ) built from many fish GH ORF sequences (including the six new ones), with Amia calva as outgroup, were generally resolved and largely congruent with the morphology-based tree, although some incongruities were observed, suggesting that the GH ORF deserves more attention in teleostean phylogeny.
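The relationship between the reported protein lengths and an ORF of "about 600 bp" can be checked with simple codon arithmetic. The sketch below is illustrative only; it assumes that the quoted amino acid counts refer to the full translated ORF and that the ORF length may or may not count the stop codon, neither of which the abstract states.

```python
def orf_length_bp(n_residues, include_stop=True):
    """Length in bp of an ORF encoding n_residues amino acids.

    Each residue is one codon (3 bp); the stop codon adds 3 bp if counted.
    Whether the abstract's ORF lengths include the stop codon is an assumption.
    """
    return 3 * (n_residues + (1 if include_stop else 0))


# Protein lengths quoted in the abstract (amino acids).
for species, residues in [("S. kneri", 204), ("S. asotus", 200), ("C. auratus gibelio", 210)]:
    print(species, orf_length_bp(residues, include_stop=False), orf_length_bp(residues))
```

Under either convention the three protein lengths give ORFs of 600 to 633 bp, which agrees with the "about 600 bp" stated above.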
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies
2014-01-01
Background The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
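The N50 scaffold size quoted above is a standard assembly contiguity statistic. As a minimal sketch of how it is usually computed (the scaffold lengths below are hypothetical toy values, not data from the loblolly pine assembly):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L of the sequence at which the running total,
    taken over sequences sorted from longest to shortest, first reaches
    at least half of the summed length of all sequences.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in kbp, for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

An N50 of 66.9 kbp therefore means that at least half of the assembled bases lie in scaffolds of 66.9 kbp or longer.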
Himuro, Yasuyo; Tanaka, Hidenori; Hashiguchi, Masatsugu; Ichikawa, Takanari; Nakazawa, Miki; Seki, Motoaki; Fujita, Miki; Shinozaki, Kazuo; Matsui, Minami; Akashi, Ryo; Hoffmann, Franz
2011-01-15
Using the full-length cDNA overexpressor (FOX) gene-hunting system, we have generated 130 Arabidopsis FOX-superroot lines in bird's-foot trefoil (Lotus corniculatus) for the systematic functional analysis of genes expressed in roots and for the selection of induced mutants with interesting root growth characteristics. We used the Arabidopsis-FOX Agrobacterium library (constructed by ligating pBIG2113SF) for the Agrobacterium-mediated transformation of superroots (SR) and the subsequent selection of gain-of-function mutants with ectopically expressed Arabidopsis genes. The original superroot culture of L. corniculatus is a unique host system displaying fast root growth in vitro, allowing continuous root cloning, direct somatic embryogenesis and mass regeneration of plants under entirely hormone-free culture conditions. Several of the Arabidopsis FOX-superroot lines show interesting deviations from normal growth and morphology of roots from SR-plants, such as differences in pigmentation, growth rate, length or diameter. Some of these mutations are of potential agricultural interest. Genomic PCR analysis revealed that 100 (76.9%) out of the 130 transgenic lines showed the amplification of single fragments. Sequence analysis of the PCR fragments from these 100 lines identified full-length cDNA in 74 of them. Forty-three out of 74 full-length cDNA carried known genes. The Arabidopsis FOX-superroot lines of L. corniculatus, produced in this study, expand the FOX hunting system and provide a new tool for the genetic analysis and control of root growth in a leguminous forage plant. Copyright © 2010 Elsevier GmbH. All rights reserved.
Sequencing and analysis of 10967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morin, R D; Chang, E; Petrescu, A
2005-10-31
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection initiative. Here we present an analysis of 10967 clones (8049 from X. laevis and 2918 from X. tropicalis). The clone set contains 2013 orthologs between X. laevis and X. tropicalis as well as 1795 paralog pairs within X. laevis. 1199 are in-paralogs, believed to have resulted from an allotetraploidization event approximately 30 million years ago, and the remaining 546 are likely out-paralogs that have resulted from more ancient gene duplications, prior to the divergence between the two species. We do not detect any evidence for positive selection by the Yang and Nielsen maximum likelihood method of approximating dN/dS. However, dN/dS for X. laevis in-paralogs is elevated relative to X. tropicalis orthologs. This difference is highly significant, and indicates an overall relaxation of selective pressures on duplicated gene pairs. Within both groups of paralogs, we found evidence of subfunctionalization, manifested as differential expression of paralogous genes among tissues, as measured by EST information from public resources. We have observed, as expected, a higher instance of subfunctionalization in out-paralogs relative to in-paralogs.
Chen, X L; Lui, E Y; Ip, Y Kwong; Lam, S H
2018-06-21
To obtain transcriptomic insights into branchial responses to salinity challenge in Anabas testudineus, this study employed RNA sequencing (RNA-Seq) to analyse the gill transcriptome of A. testudineus exposed to seawater (SW) for 6 days compared with the freshwater (FW) control group. A combined FW and SW gill transcriptome was de novo assembled from 169.9 million 101 bp paired-end reads. In silico validation employing 17 A. testudineus Sanger full-length coding sequences showed that 15/17 of them had greater than 80% of their sequences aligned to the de novo assembled contigs, where 5/17 had their full length (100%) aligned and 9/17 had greater than 90% of their sequences aligned. The combined FW and SW gill transcriptome was mapped to 13780 unique human identifiers at E-value < 1.0E-20, while 952 and 886 identifiers were determined as up- and down-regulated by 1.5-fold, respectively, in the gills of A. testudineus in SW when compared with FW. These genes were found to be associated with at least 23 biological processes. A larger proportion of genes encoding enzymes and transporters associated with molecular transport, energy production and metabolism were up-regulated, while a larger proportion of genes encoding transmembrane receptors, G-protein coupled receptors, kinases and transcription regulators associated with cell cycle, growth, development, signalling, morphology and gene expression were relatively lower in the gills of A. testudineus in SW when compared with FW. High correlation (R = 0.99) was observed between RNA-Seq data and real-time quantitative PCR validation for 13 selected genes. The transcriptomic sequence information will facilitate development of molecular resources and tools while the findings will provide insights for future studies into branchial iono-osmoregulation and related cellular processes in A. testudineus. This article is protected by copyright. All rights reserved.
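The 1.5-fold criterion mentioned above can be illustrated with a small classification function. This is a sketch under stated assumptions, not the study's pipeline: the abstract does not describe its normalisation or statistical testing, and the function name, arguments and expression values here are hypothetical.

```python
def classify_fold_change(fw_mean, sw_mean, threshold=1.5):
    """Label a gene by its SW/FW expression ratio.

    Assumes both inputs are normalised, strictly positive expression values.
    'up' means SW >= threshold x FW; 'down' means FW >= threshold x SW.
    """
    if fw_mean <= 0 or sw_mean <= 0:
        raise ValueError("expression values must be positive")
    ratio = sw_mean / fw_mean
    if ratio >= threshold:
        return "up"
    if ratio <= 1 / threshold:
        return "down"
    return "unchanged"


# Hypothetical normalised expression values (FW control, SW-exposed).
for gene, fw, sw in [("geneA", 10.0, 30.0), ("geneB", 20.0, 10.0), ("geneC", 10.0, 12.0)]:
    print(gene, classify_fold_change(fw, sw))
```

In practice a fold-change cut-off is usually combined with a significance test; the sketch shows only the ratio threshold.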
Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa
2014-02-03
Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.
Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong
2008-09-01
We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and the modified strand replacement method, this plasmid (assembled with a poly(dT) end and a deoxyguanosine [dG] end) combines priming of full-length cDNA strand synthesis with directional cDNA cloning. As a result, the number of steps involved in cDNA library preparation is reduced, and downstream gene manipulation, sequencing, and subcloning are simplified. The 3G vector-primer plasmid method yields fully represented plasmid-primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.
Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F.; Zhang, Qiuheng
2016-01-01
Background Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Methods Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3’ UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Results Using this automated workflow, over 10,063 NMDP registry donors were successfully typed under high-resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP. 
In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Conclusion Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation. PMID:27798706
Yin, Yuxin; Lan, James H; Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F; Zhang, Qiuheng
2016-01-01
Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3' UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Using this automated workflow, over 10,063 NMDP registry donors were successfully typed under high-resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). 
Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation.
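The abstract does not state the denominator behind the null-allele, rare-allele and exon-novelty percentages. One reading that reproduces all three figures, offered here only as an assumption, is that each count is divided by the total number of allele calls: 10,063 donors × 5 loci (A, B, C, DRB1, DQB1) × 2 alleles per locus.

```python
DONORS = 10_063          # registry donors reported in the abstract
LOCI = 5                 # HLA-A, -B, -C, -DRB1, -DQB1
ALLELES_PER_LOCUS = 2    # assumption: two allele calls per locus per donor


def percent_of_alleles(count, donors=DONORS):
    """Percentage of all allele calls represented by `count`, to 3 decimals."""
    total_alleles = donors * LOCI * ALLELES_PER_LOCUS
    return round(100 * count / total_alleles, 3)


for label, count in [("null alleles", 23), ("rare alleles", 92), ("exon novelties", 42)]:
    print(label, percent_of_alleles(count))
```

Dividing by the number of donors instead would give values ten times larger, so the per-allele reading appears to be the one consistent with the reported percentages.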
Tulman, E. R.; Delhon, G.; Afonso, C. L.; Lu, Z.; Zsak, L.; Sandybaev, N. T.; Kerembekova, U. Z.; Zaitsev, V. L.; Kutish, G. F.; Rock, D. L.
2006-01-01
Here we present the genomic sequence of horsepox virus (HSPV) isolate MNR-76, an orthopoxvirus (OPV) isolated in 1976 from diseased Mongolian horses. The 212-kbp genome contained 7.5-kbp inverted terminal repeats and lacked extensive terminal tandem repetition. HSPV contained 236 open reading frames (ORFs) with similarity to those in other OPVs, with those in the central 100-kbp region most conserved relative to other OPVs. Phylogenetic analysis of the conserved region indicated that HSPV is closely related to sequenced isolates of vaccinia virus (VACV) and rabbitpox virus, clearly grouping together these VACV-like viruses. Fifty-four HSPV ORFs likely represented fragments of 25 orthologous OPV genes, including in the central region the only known fragmented form of an OPV ribonucleotide reductase large subunit gene. In terminal genomic regions, HSPV lacked full-length homologues of genes variably fragmented in other VACV-like viruses but was unique in fragmentation of the homologue of VACV strain Copenhagen B6R, a gene intact in other known VACV-like viruses. Notably, HSPV contained in terminal genomic regions 17 kbp of OPV-like sequence absent in known VACV-like viruses, including fragments of genes intact in other OPVs and approximately 1.4 kb of sequence present only in cowpox virus (CPXV). HSPV also contained seven full-length genes fragmented or missing in other VACV-like viruses, including intact homologues of the CPXV strain GRI-90 D2L/I4R CrmB and D13L CD30-like tumor necrosis factor receptors, D3L/I3R and C1L ankyrin repeat proteins, B19R kelch-like protein, D7L BTB/POZ domain protein, and B22R variola virus B22R-like protein. These results indicated that HSPV contains unique genomic features likely contributing to a unique virulence/host range phenotype. They also indicated that while closely related to known VACV-like viruses, HSPV contains additional, potentially ancestral sequences absent in other VACV-like viruses. PMID:16940536
Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C
2013-01-16
The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the African buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified, cloned and the resulting recombinants sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation possibly explained why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine
2002-06-15
To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40 year old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full length cDNA clones. Approximately 2000 reads from the unamplified library yields 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced and a similar number of reads from this library (fs) yields 1455 unique groups of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflect the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-06-24
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-01-01
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination. PMID:19564928
HUNT: launch of a full-length cDNA database from the Helix Research Institute.
Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y
2001-01-01
The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.
Vergani, Stefano; Korsunsky, Ilya; Mazzarello, Andrea Nicola; Ferrer, Gerardo; Chiorazzi, Nicholas; Bagnara, Davide
2017-01-01
Efficient and accurate high-throughput DNA sequencing of the adaptive immune receptor repertoire (AIRR) is necessary to study immune diversity in healthy subjects and disease-related conditions. The high complexity and diversity of the AIRR, coupled with the limited amount of starting material, which can compromise identification of the full biological diversity, make such sequencing particularly challenging. AIRR sequencing protocols often fail to fully capture the sampled AIRR diversity, especially for samples containing restricted numbers of B lymphocytes. Here, we describe a library preparation method for immunoglobulin sequencing that results in an exhaustive full-length repertoire where virtually every sampled B-cell is sequenced. This maximizes the likelihood of identifying and quantifying the entire IGHV-D-J repertoire of a sample, including the detection of rearrangements present in only one cell in the starting population. The methodology establishes the importance of circumventing genetic material dilution in the preamplification phases and incorporates the use of certain described concepts: (1) balancing the starting material amount and depth of sequencing, (2) avoiding IGHV gene-specific amplification, and (3) using unique molecular identifiers. Together, this methodology is highly efficient, in particular for detecting rare rearrangements in the sampled population and when only a limited amount of starting material is available.
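The role of Unique Molecular Identifiers (UMIs) in the method above can be illustrated with a generic collapsing step: reads sharing a UMI are treated as copies of one starting molecule and reduced to a single consensus. This is a minimal sketch of the general idea, not the authors' pipeline; the function name, the majority-vote consensus and the toy reads are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def collapse_by_umi(reads):
    """Collapse (umi, sequence) reads to one consensus sequence per UMI.

    The consensus is simply the most frequent sequence in each UMI group,
    a simplification of real consensus building.
    """
    groups = defaultdict(list)
    for umi, sequence in reads:
        groups[umi].append(sequence)
    return {umi: Counter(seqs).most_common(1)[0][0] for umi, seqs in groups.items()}


# Hypothetical reads: three PCR copies of one molecule (one with an error),
# plus two further molecules, one of which carries the same rearrangement.
toy_reads = [
    ("AACG", "TGTGCGAGA"),
    ("AACG", "TGTGCGAGA"),
    ("AACG", "TGTGCGTGA"),  # amplification/sequencing error
    ("TTGC", "TGTGCGAGA"),
    ("GGAT", "TGTACCACA"),
]
molecules = collapse_by_umi(toy_reads)
print(len(molecules), Counter(molecules.values()))
```

Counting molecules rather than reads is what allows a rearrangement present in a single starting cell to be distinguished from amplification noise.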
Yang, Xian-Xian; Zhang, Mei; Yan, Zhao-Wen; Zhang, Ru-Hong; Mu, Xiong-Zheng
2008-01-01
To construct a highly efficient eukaryotic expression plasmid, pcDNA3.1-MSX-2, encoding the Sprague-Dawley (SD) rat MSX-2 gene for further study of MSX-2 gene function. The full-length SD rat MSX-2 gene was amplified by PCR, and the full-length DNA was inserted into the pMD18-T vector. It was then excised by restriction digestion with BamHI and XhoI and ligated into the cloning site of the pcDNA3.1 expression plasmid. The positive recombinant was identified by PCR analysis, restriction endonuclease analysis and sequence analysis. Expression of RNA and protein was detected by RT-PCR and Western blot analysis in pcDNA3.1-MSX-2-transfected HEK293 cells. Sequence analysis and restriction endonuclease analysis of pcDNA3.1-MSX-2 demonstrated that the position and size of the MSX-2 cDNA insertion were consistent with the design. RT-PCR and Western blot analysis showed specific expression of MSX-2 mRNA and protein in the transfected HEK293 cells. A highly efficient eukaryotic expression plasmid, pcDNA3.1-MSX-2, encoding the Sprague-Dawley rat MSX-2 gene, which is related to craniofacial development, was successfully constructed. It may serve as the basis for further study of MSX-2 gene function.
USDA-ARS's Scientific Manuscript database
Two different alleles of an ethylene receptor gene (CaETR-1) of chickpea (Cicer arietinum) were isolated and characterized through synteny analysis with genome sequences of Medicago truncatula. The full length of CaETR-1 in cultivar FLIP84-92C (CaETR-1a) is 4,428 bp including the polyadenylation sig...
USDA-ARS's Scientific Manuscript database
The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...
Liu, Wei-long; Yang, Gui-lin; Wei, Qing; Zhang, Ming-xia; Chen, Xin-chun; Liu, Ying-xia; Gao, Yang; Zhou, Bo-ping
2011-02-01
To investigate the molecular epidemiology and molecular evolution of five enterovirus 71 (EV71) strains from five Shenzhen patients with hand-foot-mouth disease associated with EV71 infection. Five EV71 strains were isolated and their full-length gene sequences were determined in order to compare nucleotide and amino acid homology with EV71 strains from other regions and countries, as well as with earlier strains from around the world, using bioinformatics software. Analysis of the VP1 and VP4 nucleotide sequences showed that the five EV71 strains belonged to sub-genotype C4. The differences in nucleotide and amino acid sequences among these five strains were small, with nucleotide homology of 93% and amino acid homology of 98%. Phylogenetic tree analysis indicated that the 2008 Shenzhen epidemic strains were closest to the 2004 Shenzhen circulating strains, and also close to the 1998 Shenzhen epidemic strains and the 2008 Fuyang (Anhui) strains. The dead strain was very close to the 2008 Fuyang (Anhui) epidemic strains. It can be speculated that these epidemic EV71 strains probably originated from the same ancestral strain, possibly the 1998 Shenzhen strain.
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.
Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew
2012-12-20
The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
2011-01-01
Background Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition. PMID:21492485
Natarajan, Purushothaman; Parani, Madasamy
2011-04-15
Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. 
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition.
Rees, D A; Hepburn, P J; McNicol, A M; Francis, K; Jasani, B; Lewis, M D; Farrell, W E; Lewis, B M; Scanlon, M F; Ham, J
2002-03-28
The proopiomelanocortin (POMC) gene is highly expressed in the pituitary gland, where the resulting mRNA of 1200 base pairs (bp) gives rise to a full-length protein sequence. In peripheral tissues, however, both shorter and longer POMC variants have been described; placental tissue, for example, contains 800 bp (truncated at the 5' end) and 1500 bp transcripts as well as the 1200 bp transcript. The importance of the 800 bp transcript is unclear, as the lack of a signal sequence renders the molecule non-functional. This transcript has not been previously demonstrated in the pituitary gland. In this report we show evidence of a 5' truncated POMC gene in human pituitary corticotroph macroadenoma cells (JE) maintained in primary culture for >1 year. The original tumour tissue and the derived cells during early passage (up to passage 4-5) immunostained for ACTH, and in situ hybridisation confirmed the presence of the POMC gene in the cultured cells. These cells also secreted 15-40 pg/10^5 cells/24 h ACTH. In addition, as expected, RT-PCR demonstrated the presence of all three POMC gene exons and is thus indicative of a full-length POMC gene. In late culture passages (passages 8-15) JE cells ceased to express ACTH and cell growth became very slow, presumably due to cells reaching their Hayflick limit. ACTH immunostaining in these cells was undetectable and ACTH secretion was also at the detection limits of the assay and no greater than 10 pg/10^5 cells/24 h. ACTH precursor molecules were also undetectable. RT-PCR for the POMC gene in these late passage cells showed that only exon 3 was detectable, in contrast to early passage cells where all three exons were present. In summary, we isolated in culture human pituitary cells that initially possessed all three exons of the POMC gene and immunostained for ACTH. On further passaging these cells showed a loss of exons 1 and 2 in the POMC gene and a loss of ACTH immunostaining and secretion. 
We would like to suggest that the loss of ACTH peptide expression in these late passage cells is in part due to the loss of the POMC signal sequence. An alternative explanation for our findings is that there were originally two populations of corticotrophs in the cultures, one of which possessed the full-length POMC gene and the other only the 5' truncated POMC transcript and it is these latter cells which survived in culture. In either scenario this is the first report of the 5' truncated POMC gene occurring in pituitary cells.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa
2006-03-01
The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R_free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie.
Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol; Na, Ki-Jeong
2010-12-01
P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie
Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol
2010-01-01
P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance. PMID:21113104
Wang, Bu-Yong; Wen, Rong-Rong; Ma, Ling
2017-09-26
Aphelenchoides besseyi, the nematode agent of rice tip white disease, causes huge economic losses in almost all the rice-growing regions of the world. Glutathione peroxidase (GPx), an esophageal gland secretion protein, plays important roles in the parasitism, immune evasion, reproduction and pathogenesis of many plant-parasitic nematodes (PPNs). Therefore, GPx is a promising target for controlling A. besseyi. Here, the full-length sequence of the GPx gene from A. besseyi (AbGPx1) was cloned using the rapid amplification of cDNA ends method. The full-length 944 bp AbGPx1 sequence, which contains a 678 bp open reading frame, encodes a 225 amino acid protein. The deduced amino acid sequence of AbGPx1 is highly homologous to those of other nematode GPxs, and showed the closest evolutionary relationship with DrGPx. In situ hybridization showed that AbGPx1 was constitutively expressed in the esophageal glands of A. besseyi, suggesting its potential roles in parasitism and reproduction. RNA interference (RNAi) was used to assess the functions of the AbGPx1 gene, and quantitative real-time PCR was used to monitor the RNAi effects. After treatment with dsRNA for 12 h, AbGPx1 expression levels and reproduction in the nematodes decreased compared with the same parameters in the control group; thus, the AbGPx1 gene is likely to be associated with the development, reproduction, and infection ability of A. besseyi. These findings may open new avenues towards nematode control.
Cotesia vestalis parasitization suppresses expression of a Plutella xylostella thioredoxin
USDA-ARS's Scientific Manuscript database
Thioredoxins (Trxs) are a family of small, highly conserved and ubiquitous proteins involved in protecting organisms against toxic reactive oxygen species (ROS). In this study, a typical thioredoxin gene, PxTrx, was isolated from Plutella xylostella. The full-length cDNA sequence is composed of 959 ...
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic
Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis
2016-01-01
HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows the use of other genes. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
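The sub-sampling design described above (four sampling depths, 100 replicates each, from 4662 sequences) can be sketched as follows. This is a minimal illustration assuming uniform random sampling without replacement and rounding to the nearest whole sequence; the abstract does not state how the authors drew their samples. The identifiers and the function name are hypothetical.

```python
import random


def subsample_replicates(sequence_ids, depths, n_replicates, seed=0):
    """Draw n_replicates random subsets of sequence_ids at each sampling depth.

    Assumptions for illustration: sampling is uniform and without
    replacement, and the subset size is rounded to the nearest integer.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    replicates = {}
    for depth in depths:
        k = round(len(sequence_ids) * depth)
        replicates[depth] = [rng.sample(sequence_ids, k) for _ in range(n_replicates)]
    return replicates


# Hypothetical identifiers standing in for the 4662 simulated sequences.
sequence_ids = [f"seq{i}" for i in range(4662)]
depths = [1.0, 0.6, 0.2, 0.05]
replicates = subsample_replicates(sequence_ids, depths, n_replicates=100)

for depth in depths:
    print(depth, len(replicates[depth]), len(replicates[depth][0]))
```

Each subset would then be restricted to the chosen gene region before tree building, a step that lies outside this sketch.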
Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh
2016-12-23
HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows the use of other genes. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
Zhao, Feng; Li, Qiuying; Weng, Manli; Wang, Xiuliang; Guo, Baotai; Wang, Li; Wang, Wei; Duan, Delin; Wang, Bin
2013-12-01
The full-length cDNA sequence (2613 bp) of the trehalose-6-phosphate synthase (TPS) gene of eelgrass Zostera marina (ZmTPS) was identified and cloned. Z. marina is a seed plant that grows in seawater throughout its life history. The open reading frame (ORF) region of the ZmTPS gene encodes a protein of 870 amino acid residues and a stop codon. The corresponding genomic DNA sequence is 3770 bp in length and contains 3 exons and 2 introns. The ZmTPS gene was transformed into rice variety ZH11 via the Agrobacterium-mediated transformation method. After antibiotic screening, molecular characterization, and salt-tolerance and trehalose content determinations, two transgenic lines resistant to 150 mM NaCl solutions were identified. Our results indicated that the ZmTPS gene was integrated into the genomic DNA of the two transgenic rice lines and could be expressed well. Moreover, detection of the transformed ZmTPS gene in the progenies of the two transgenic lines was performed from the T1 to T4 generations, and the results suggested that the transformed ZmTPS gene can be transmitted from parent to progeny in transgenic rice. © 2013.
Patnaik, Bharat Bhusan; Kim, Dong Hyun; Oh, Seung Han; Song, Yong-Su; Chanh, Nguyen Dang Minh; Kim, Jong Sun; Jung, Woo-jin; Saha, Atul Kumar; Bindroo, Bharat Bhushan; Han, Yeon Soo
2012-01-01
Background Silkworm fecal matter is considered one of the richest sources of antimicrobial and antiviral protein (substances) and such economically feasible and eco-friendly proteins acting as secondary metabolites from the insect system can be explored for their practical utility in conferring broad spectrum disease resistance against pathogenic microbial specimens. Methodology/Principal Findings Silkworm fecal matter extracts prepared in 0.02 M phosphate buffer saline (pH 7.4) at a temperature of 60°C were subjected to 40% saturated ammonium sulphate precipitation and purified by gel-filtration chromatography (GFC). SDS-PAGE under denaturing conditions showed a single band at about 21.5 kDa. The peak fraction thus obtained by GFC was tested for homogeneity using C18 reverse-phase high performance liquid chromatography (HPLC). The activity of the purified protein was tested against selected Gram +/− bacteria and phytopathogenic Fusarium species and showed a concentration-dependent inhibition relationship. The purified bioactive protein was subjected to matrix-assisted laser desorption and ionization-time of flight mass spectrometry (MALDI-TOF-MS) and N-terminal sequencing by Edman degradation for its identification. The first 18 N-terminal amino acids following the predicted signal peptide showed homology to plant germin-like proteins (Glp). In order to characterize the full-length gene sequence in detail, the partial cDNA was cloned and sequenced using degenerate primers, followed by 5′- and 3′-rapid amplification of cDNA ends (RACE-PCR). The full-length cDNA sequence comprised 630 bp encoding 209 amino acids and corresponded to germin-like proteins (Glps) involved in plant development and defense. Conclusions/Significance The study reports the characterization of a novel Glp belonging to subfamily 3 from M. alba by the purification of mature active protein from silkworm fecal matter. 
The N-terminal amino acid sequence of the purified protein was found similar to the deduced amino acid sequence (without the transit peptide sequence) of the full length cDNA from M. alba. PMID:23284650
Molecular cloning and nucleotide sequence of CYP6BF1 from the diamondback moth, Plutella xylostella
Li, Hongshan; Dai, Huaguo; Wei, Hui
2005-01-01
A novel cDNA clone encoding a cytochrome P450 was screened from the insecticide-susceptible strain of Plutella xylostella (L.) (Lepidoptera:Yponomeutidae). The nucleotide sequence of the clone, designated CYP6BF1, was determined. This is the first full-length sequence of the CYP6 family from Plutella xylostella (L.). The cDNA is 1661bp in length and contains an open reading frame from base pairs 26 to 1570, encoding a protein of 514 amino acid residues. It is similar to the other insect P450s in gene family 6, including CYP6AE1 from Depressaria pastinacella, (46%). The GenBank accession number is AY971374. PMID:17119627
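The coordinates reported for CYP6BF1 can be checked with simple arithmetic. The sketch below assumes 1-based inclusive coordinates and that the stated ORF end (1570) includes the stop codon; under those assumptions the ORF spans 1545 bp, or 515 codons, which matches the 514 residues reported. The function name is hypothetical.

```python
def orf_summary(cdna_length, orf_start, orf_end):
    """Summarise an ORF given 1-based inclusive coordinates.

    Assumption for illustration: orf_end is the last base of the stop
    codon, so the protein has one residue fewer than the codon count.
    """
    orf_length = orf_end - orf_start + 1
    if orf_length % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    codons = orf_length // 3
    return {
        "orf_length_bp": orf_length,
        "codons": codons,
        "amino_acids": codons - 1,           # stop codon encodes no residue
        "utr5_bp": orf_start - 1,            # bases before the ORF
        "utr3_bp": cdna_length - orf_end,    # bases after the ORF
    }


summary = orf_summary(cdna_length=1661, orf_start=26, orf_end=1570)
print(summary)
```

The same check applies to other records in this listing that report an ORF length and a protein length, for example 678 bp and 225 residues for AbGPx1, where 678 / 3 = 226 codons including the stop.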
2004-01-01
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
Yang, Fengxi; Zhu, Genfa
2015-01-01
Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutants corresponding to different floral morphogenesis, which validated their specialized roles in floral patterning and further supported the effectiveness of our in silico analysis. 
The dataset generated in our study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and constitutes a valuable resource for molecular breeding of the orchid plant. PMID:26580566
Isolation of genes from female sterile flowers in Medicago sativa.
Capomaccio, Stefano; Barone, Pierluigi; Reale, Lara; Veronesi, Fabio; Rosellini, Daniele
2009-06-01
A better knowledge of female sporogenesis and gametogenesis could have several practical applications, from commercial hybrid seed production to gene containment in GM crops. With the purpose of isolating genes involved in the megasporogenesis process, the cDNA-AFLP technique was employed to isolate transcript-derived fragments (TDFs) differentially expressed between female-fertile and female-sterile full-sib alfalfa plants. This female sterility trait involves female-specific arrest of sporogenesis at early prophase associated with ectopic, massive callose deposition within the nucellus. Ninety-six TDFs were generated and BLAST analyses revealed similarities with genes involved in different Gene Ontology categories. Three TDFs were selected based on their putative functions, showing high similarity to a soybean flower-expressed beta 1,3-glucanase, to an Arabidopsis thaliana MAPKKK, and to an A. thaliana eukaryotic translation initiation factor eIF4G III, respectively. The full-length mRNA sequences were obtained. RT-PCR and in situ hybridizations were performed to confirm differential expression during flower development. The genomic organization of the three genes was assessed through sequencing and Southern experiments. Sequence polymorphisms were found between sterile and fertile plants. Our approach based on differential display and bulked segregant analysis was successful in isolating genes that were differentially expressed between fertile and sterile alfalfa plants.
Experimental annotation of the human genome using microarray technology.
Shoemaker, D D; Schadt, E E; Armour, C D; He, Y D; Garrett-Engele, P; McDonagh, P D; Loerch, P M; Leonardson, A; Lum, P Y; Cavet, G; Wu, L F; Altschuler, S J; Edwards, S; King, J; Tsang, J S; Schimmack, G; Schelter, J M; Koch, J; Ziman, M; Marton, M J; Li, B; Cundiff, P; Ward, T; Castle, J; Krolewski, M; Meyer, M R; Mao, M; Burchard, J; Kidd, M J; Dai, H; Phillips, J W; Linsley, P S; Stoughton, R; Scherer, S; Boguski, M S
2001-02-15
The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using 'exon' and 'tiling' arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.
Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J
2007-06-01
As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Comparative Analysis of Type IV Pilin in Desulfuromonadales
Shu, Chuanjun; Xiao, Ke; Yan, Qin; Sun, Xiao
2016-01-01
During anaerobic respiration, the bacterium Geobacter sulfurreducens can transfer electrons to extracellular electron acceptors through its pilus. G. sulfurreducens pili have been reported to have metallic-like conductivity that is similar to doped organic semiconductors. To study the characteristics and origin of conductive pilin proteins found in the pilus structure, their genetic, structural, and phylogenetic properties were analyzed. The genetic relationships, and conserved structures and sequences that were obtained were used to predict the evolution of the pilins. Homologous genes that encode conductive pilin were found using PilFind and Cluster. Sequence characteristics and protein tertiary structures were analyzed with MAFFT and QUARK, respectively. The origin of conductive pilins was explored by building a phylogenetic tree. Truncation is a characteristic of conductive pilin. The structures of truncated pilins and their accompanying proteins were found to be similar to the N-terminal and C-terminal ends of full-length pilins, respectively. The emergence of the truncated pilins can probably be ascribed to the evolutionary pressure of their extracellular electron transporting function. Genes encoding truncated pilins and proteins similar to the C-terminal of full-length pilins, which contain a group of consecutive anti-parallel beta-sheets, are adjacent in bacterial genomes. According to the genetic, structural, and phylogenetic analyses performed in this study, we inferred that the truncated pilins and their accompanying proteins probably evolved from full-length pilins by gene fission through duplication, degeneration, and separation. These findings provide new insights about the molecular mechanisms involved in long-range electron transport along the conductive pili of Geobacter species. PMID:28066394
Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri
2015-12-01
Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), Diversity (D, in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
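The bundling step can be sketched as follows. This is an illustrative simplification, not the published method, which computes likelihoods of unambiguous assignment. Here two aligned germline V genes are treated as indistinguishable when the number of mismatches inside the sequenced window does not exceed a tolerated count, and indistinguishable genes are merged into groups. The gene names, the toy sequences, the parameter `max_ambiguous_mismatches` and the assumption that reads cover the 3' end of the V gene are all hypothetical.

```python
def mismatches(a, b, read_length):
    """Count differing positions in the last `read_length` bases of two
    aligned, equal-length sequences (assumes reads cover the 3' end)."""
    window_a, window_b = a[-read_length:], b[-read_length:]
    return sum(1 for x, y in zip(window_a, window_b) if x != y)

def bundle_genes(genes, read_length, max_ambiguous_mismatches):
    """Merge genes that cannot be told apart into groups (union-find)."""
    names = list(genes)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = mismatches(genes[a], genes[b], read_length)
            if d <= max_ambiguous_mismatches:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())

# Toy germline set (made-up 12-base sequences, already aligned).
toy_genes = {
    "V_a": "ACGTACGTACGT",
    "V_b": "ACGAACGTACGT",
    "V_c": "TTGTACGTAGGA",
}
full_reads = bundle_genes(toy_genes, read_length=12, max_ambiguous_mismatches=0)
short_reads = bundle_genes(toy_genes, read_length=8, max_ambiguous_mismatches=0)
print(full_reads, short_reads)
```

With full-length reads all three toy genes stay separate. With 8-base reads the only difference between V_a and V_b falls outside the window, so they are bundled together.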
2010-01-01
Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes, were assembled and annotated. Analysis of assembly quality, length and diversity shows that this dataset represents the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097
Dynamic Energy Landscapes of Riboswitches Help Interpret Conformational Rearrangements and Function
Quarta, Giulio; Sin, Ken; Schlick, Tamar
2012-01-01
Riboswitches are RNAs that modulate gene expression by ligand-induced conformational changes. However, the way in which sequence dictates alternative folding pathways of gene regulation remains unclear. In this study, we compute energy landscapes, which describe the accessible secondary structures for a range of sequence lengths, to analyze the transcriptional process as a given sequence elongates to full length. In line with experimental evidence, we find that most riboswitch landscapes can be characterized by three broad classes as a function of sequence length in terms of the distribution and barrier type of the conformational clusters: low-barrier landscape with an ensemble of different conformations in equilibrium before encountering a substrate; barrier-free landscape in which a direct, dominant “downhill” pathway to the minimum free energy structure is apparent; and a barrier-dominated landscape with two isolated conformational states, each associated with a different biological function. Sharing concepts with the “new view” of protein folding energy landscapes, we term the three sequence ranges above as the sensing, downhill folding, and functional windows, respectively. We find that these energy landscape patterns are conserved in various riboswitch classes, though the order of the windows may vary. In fact, the order of the three windows suggests either kinetic or thermodynamic control of ligand binding. These findings help understand riboswitch structure/function relationships and open new avenues to riboswitch design. PMID:22359488
Alternative polyadenylation of the gene transcripts encoding a rat DNA polymerase beta.
Konopiński, R; Nowak, R; Siedlecki, J A
1996-10-17
Rat cells produce two different transcripts of DNA polymerase beta (beta-Pol). The low-molecular-weight transcript (1.4 kb) was already sequenced. We report here the cloning and sequencing of the full-length cDNA, corresponding to the high-molecular-weight (HMW) transcript (4.0 kb) of beta-Pol. Sequence data strongly suggest that both transcripts are produced from a single gene by alternative polyadenylation. The HMW transcript contains the entire 1.4 kb transcript sequence and additional 2.2 kb on the 3' end. The 3' UTR of the HMW transcript contains some regulatory sequences which are not present in the 1.4-kb transcript. The A + U-rich fragment and (GU)21 sequence are believed to influence the stability of the mRNA. The functional significance of the A-rich region locally destabilizing double-stranded secondary structure remains unknown.
Miao, Hong-Xia; Qin, Yong-Hua; Ye, Zi-Xing; Hu, Gui-Bing
2013-01-25
Ubiquitin-activating enzyme E1 (UBE1) catalyzes the first step in the ubiquitination reaction, which targets a protein for degradation via a proteasome pathway. UBE1 plays an important role in metabolic processes. In this study, full-length cDNA and DNA sequences of the UBE1 gene, designated CrUBE1, were obtained from 'Wuzishatangju' (self-incompatible, SI) and 'Shatangju' (self-compatible, SC) mandarins. The cDNA and DNA sequences of CrUBE1 differed between 'Wuzishatangju' and 'Shatangju' by 5 amino acids and 8 bases, respectively. Southern blot analysis showed that only one copy of the CrUBE1 gene exists in the genomes of 'Wuzishatangju' and 'Shatangju'. The temporal and spatial expression characteristics of the CrUBE1 gene were investigated using semi-quantitative RT-PCR (SqPCR) and quantitative real-time PCR (qPCR). The expression level of the CrUBE1 gene in anthers of 'Shatangju' was approximately 10-fold higher than in anthers of 'Wuzishatangju'. The highest expression level of CrUBE1 was detected in pistils at 7 days after self-pollination of 'Wuzishatangju', which was approximately 5-fold higher than at 0 h. To obtain CrUBE1 protein, the full-length cDNAs of the CrUBE1 genes from 'Wuzishatangju' and 'Shatangju' were successfully expressed in Pichia pastoris. Pollen germination frequency of 'Wuzishatangju' was significantly inhibited by increasing concentrations of CrUBE1 protein from 'Wuzishatangju'.
NASA Astrophysics Data System (ADS)
Omar, Aimi Farehah; Ismail, Ismanizan
2016-11-01
Sesquiterpene synthase (SS) catalyzes the formation of sesquiterpenes from farnesyl diphosphate (FDP) via carbocation intermediates. In this study, the promoter region of sesquiterpene synthase was isolated from Persicaria minor to identify possible cis-acting elements in the promoter. The full-length PmSS promoter of P. minor is 1824 bp long. The sequence was analyzed and several putative cis-acting regulatory elements were identified. Three cis-acting regulatory elements were selected for deletion analysis: a cis-acting element involved in wound responsiveness (WUN), a cis-acting element involved in defense and stress responsiveness (TC), and a cis-acting element involved in ABA responsiveness (ABRE). A series of deletions was made to assess promoter activity, producing three truncated promoter fragments: Prom 2 (1606 bp), Prom 3 (1144 bp), and Prom 4 (921 bp). The full-length promoter and its deletion series were cloned into the pBGWFS7 vector, which contains the β-glucuronidase (GUS) and green fluorescent protein (GFP) genes as reporters. All constructs were successfully transformed into Arabidopsis thaliana, as confirmed by PCR of BASTA-resistant plants.
Characterisation of single domain ATP-binding cassette protein homologues of Theileria parva.
Kibe, M K; Macklin, M; Gobright, E; Bishop, R; Urakawa, T; ole-MoiYoi, O K
2001-09-01
Two distinct genes encoding single domain, ATP-binding cassette transport protein homologues of Theileria parva were cloned and sequenced. Neither of the genes is tandemly duplicated. One gene, TpABC1, encodes a predicted protein of 593 amino acids with an N-terminal hydrophobic domain containing six potential membrane-spanning segments. A single discontinuous ATP-binding element was located in the C-terminal region of TpABC1. The second gene, TpABC2, also contains a single C-terminal ATP-binding motif. Copies of TpABC2 were present at four loci in the T. parva genome on three different chromosomes. TpABC1 exhibited allelic polymorphism between stocks of the parasite. Comparison of cDNA and genomic sequences revealed that TpABC1 contained seven short introns, between 29 and 84 bp in length. The full-length TpABC1 protein was expressed in insect cells using the baculovirus system. Application of antibodies raised against the recombinant antigen to western blots of T. parva piroplasm lysates detected an 85 kDa protein in this life-cycle stage.
Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes
An, Dong; Li, Changsheng; Humbeck, Klaus
2018-01-01
Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than the second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview and a comprehensive guidance to Iso-Seq, and thus to promote its applications in plant research. PMID:29346292
Urantowka, Adam Dawid; Hajduk, Kacper; Kosowska, Barbara
2013-08-01
Amazona barbadensis is an endangered species of parrot living in northern coastal Venezuela and on several Caribbean islands. In this study, we sequenced the full mitochondrial genome of this species. The mitogenome was 18,983 bp long and contained 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, a duplicated control region, and degenerate copies of the ND6 and tRNA (Glu) genes. The high degree of identity between the two copies of the control region suggests their coincident evolution and functionality. Comparative analysis of both control region sequences from four Amazona species revealed 89.1% identity over a region of 1300 bp and indicates the presence of distinctive parts in the two control region copies.
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags
de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.
2000-01-01
Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by GENSCAN (http://genes.mit.edu/GENSCAN.html). PMID:11070084
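The counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only recomputes the reported proportions from the reported counts; the variable names are illustrative.

```python
def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole

# Counts as reported in the abstract.
contigs_matching_chr22 = percent(1181, 81429)   # contigs matching chromosome 22
known_genes_hit = percent(162, 247)             # known genes with a contig
related_genes_hit = percent(67, 150)            # related genes with a contig
est_predicted_hit = percent(45, 148)            # EST-predicted genes with a contig

# 219 previously unannotated sequences, 171 of them also covered by
# other GenBank ESTs or full-length cDNAs.
defined_only_by_orestes = 219 - 171

print(round(contigs_matching_chr22, 2), round(known_genes_hit, 1),
      round(related_genes_hit, 2), round(est_predicted_hit, 1),
      defined_only_by_orestes)
```

The recomputed values agree with the abstract, with one small caveat: 67 of 150 is 44.67%, which the abstract gives as 44.6%, apparently truncated rather than rounded.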
NASA Astrophysics Data System (ADS)
Zhao, Chunling; Ju, Jiyu
2015-06-01
The full-length cDNA of a protease gene from a marine annelid Arenicola cristata was amplified through the rapid amplification of cDNA ends technique and sequenced. The cDNA was 936 bp in length, including an open reading frame encoding a polypeptide of 270 amino acid residues. The deduced amino acid sequence consisted of pro- and mature sequences. The protease belonged to the serine protease family because it contained the highly conserved sequence GDSGGP. This protease was novel as it showed a low amino acid sequence similarity (< 40%) to other serine proteases. The gene encoding the active form of A. cristata serine protease was cloned and expressed in E. coli. Purified recombinant protease in a supernatant could dissolve an artificial fibrin plate with plasminogen-rich fibrin, whereas the plasminogen-free fibrin showed no clear zone caused by hydrolysis. This result suggested that the recombinant protease showed an indirect fibrinolytic activity of dissolving fibrin, and was probably a plasminogen activator. A rat model with venous thrombosis was established to demonstrate that the recombinant protease could also hydrolyze blood clot in vivo. Therefore, this recombinant protease may be used as a thrombolytic agent for thrombosis treatment. To our knowledge, this study is the first to report the fibrinolytic serine protease gene in A. cristata.
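Two of the statements above lend themselves to a quick illustration: scanning a deduced protein for the conserved GDSGGP motif, and relating the ORF length to the cDNA length. The sketch below is a minimal example under stated assumptions. The toy protein is invented (the real A. cristata sequence is not given here), and the ORF length assumes a single stop codon.

```python
def find_motif(protein, motif="GDSGGP"):
    """Return the 0-based start positions of every occurrence of `motif`."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start)
        start = protein.find(motif, start + 1)
    return positions

def orf_length_bp(n_residues, include_stop=True):
    """Length in bp of an ORF encoding `n_residues` amino acids."""
    return 3 * n_residues + (3 if include_stop else 0)

# Hypothetical fragment containing the serine protease motif.
toy_protein = "MKTLLVAGDSGGPLVCNGQ"
motif_hits = find_motif(toy_protein)

# Figures from the abstract: 936 bp cDNA, 270-residue polypeptide.
orf_bp = orf_length_bp(270)
non_coding_bp = 936 - orf_bp
print(motif_hits, orf_bp, non_coding_bp)
```

Under these assumptions the ORF spans 813 bp, leaving 123 bp of the 936 bp cDNA for the untranslated regions and any poly(A) tail included in the reported length.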
Virtual Northern analysis of the human genome.
Hurowitz, Evan H; Drori, Iddo; Stodden, Victoria C; Donoho, David L; Brown, Patrick O
2007-05-23
We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
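The proportions reported for the SUCEST data set follow directly from the counts in the abstract. The short sketch below recomputes them; the variable names are illustrative and nothing beyond the published counts is assumed.

```python
# Counts as reported in the abstract.
high_quality_ests = 237_954
assembled_transcripts = 43_141
with_full_length_clone = 14_409
unique_gene_estimate = 33_620

# Share of assembled sequences removed by the second round of assembly.
redundancy_pct = 100.0 * (assembled_transcripts - unique_gene_estimate) / assembled_transcripts

# Share of assembled sequences containing at least one full-length clone.
full_length_pct = 100.0 * with_full_length_clone / assembled_transcripts

# Average number of ESTs contributing to each assembled sequence.
ests_per_transcript = high_quality_ests / assembled_transcripts

print(round(redundancy_pct, 1), round(full_length_pct, 1), round(ests_per_transcript, 2))
```

The reduction from 43,141 to 33,620 sequences corresponds to about 22% redundancy, and 14,409 of 43,141 is about 33%, both consistent with the abstract.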
Gaby, John Christian; Buckley, Daniel H.
2014-01-01
We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
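The dissimilarity values compared above are percentages of differing positions between aligned sequences. A minimal sketch of that calculation follows. It is an assumption about the general approach rather than the database's own procedure: columns containing a gap in either sequence are skipped, and the two short sequences are invented.

```python
def percent_dissimilarity(a, b, gap="-"):
    """Percent of compared alignment columns at which two sequences differ.

    Columns where either sequence has a gap are skipped (one common
    convention; other treatments of gaps give different values).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differing / compared

# Made-up aligned fragments: one mismatch in ten compared columns.
toy_a = "ATGGCTAAGC"
toy_b = "ATGGCAAAGC"
toy_dissimilarity = percent_dissimilarity(toy_a, toy_b)
print(toy_dissimilarity)
```

Applying the same function to the 16S rRNA and nifH alignments of a pair of strains gives the two values whose poor correlation the abstract describes.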
Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou
2013-08-01
Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are social insects harmful to wooden constructions. Current control methods depend heavily on chemical insecticides, to which resistance is increasing. Analysis of the genes differentially expressed in response to chemical insecticides will contribute to the understanding of termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from termites induced for 24 h by a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron), using the RNA ligase-mediated rapid amplification of cDNA ends method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified the 33 unique sequences into different functional categories. In general, most of the differentially expressed genes were involved in binding and catalytic activity.
Livernois, Alexandra; Hardy, Kristine; Domaschenz, Renae; Papanicolaou, Alexie; Georges, Arthur; Sarre, Stephen D; Rao, Sudha; Ezaz, Tariq; Deakin, Janine E
2016-10-01
Interleukins are a group of cytokines with complex immunomodulatory functions that are important for regulating immunity in vertebrate species. Reptiles and mammals last shared a common ancestor more than 350 million years ago, so it is not surprising that low sequence identity has prevented divergent interleukin genes from being identified in the central bearded dragon lizard, Pogona vitticeps, in its genome assembly. To determine the complete nucleotide sequences of key interleukin genes, we constructed full-length transcripts, using the Trinity platform, from short paired-end read RNA sequences from stimulated spleen cells. De novo transcript reconstruction and analysis allowed us to identify interleukin genes that are missing from the published P. vitticeps assembly. Identification of key cytokines in P. vitticeps will provide insight into the essential molecular mechanisms and evolution of interleukin gene families and allow for characterization of the immune response in a lizard for comparison with mammals.
Metatranscriptomics of Soil Eukaryotic Communities.
Yadav, Rajiv K; Bragalini, Claudia; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia
2016-01-01
Functions expressed by eukaryotic organisms in soil can be specifically studied by analyzing the pool of eukaryotic-specific polyadenylated mRNA directly extracted from environmental samples. In this chapter, we describe two alternative protocols for the extraction of high-quality RNA from soil samples. Total soil RNA or mRNA can be converted to cDNA for direct high-throughput sequencing. Polyadenylated mRNA-derived full-length cDNAs can also be cloned in expression plasmid vectors to constitute soil cDNA libraries, which can be subsequently screened for functional gene categories. Alternatively, the diversity of specific gene families can also be explored following cDNA sequence capture using exploratory oligonucleotide probes.
Horse cDNA clones encoding two MHC class I genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barbis, D.P.; Maher, J.K.; Stanek, J.
1994-12-31
Two full-length clones encoding MHC class I genes were isolated by screening a horse cDNA library, using a probe encoding a human HLA-A2.2Y allele. The library was made in the pcDNA1 vector (Invitrogen, San Diego, CA), using mRNA from peripheral blood lymphocytes obtained from a Thoroughbred stallion (No. 0834) homozygous for a common horse MHC haplotype (ELA-A2, -B2, -D2; Antczak et al. 1984; Donaldson et al. 1988). The clones were sequenced, using SP6 and T7 universal primers and horse-specific oligonucleotides designed to extend previously determined sequences.
Cloning and characterization of an abalone (Haliotis discus hannai) actin gene
NASA Astrophysics Data System (ADS)
Ma, Hongming; Xu, Wei; Mai, Kangsen; Liufu, Zhiguo; Chen, Hong
2004-10-01
An actin-encoding gene was cloned by using RT-PCR, 3′ RACE and 5′ RACE from the abalone Haliotis discus hannai. The full length of the gene is 1532 base pairs, which contains a long 3′ untranslated region of 307 base pairs and 79 base pairs of 5′ untranslated sequence. The open reading frame encodes 376 amino acid residues. Sequence comparison with those of human and other mollusks showed high conservation among species at the amino acid level. The identities were 96%, 97% and 96%, respectively, compared with Aplysia californica, Biomphalaria glabrata and Homo sapiens β-actin. It is also indicated that this actin is more similar to the human cytoplasmic actin (β-actin) than to human muscle actin.
Inada, Mari; Kihara, Keisuke; Kono, Tomoya; Sudhakaran, Raja; Mekata, Tohru; Sakai, Masahiro; Yoshida, Terutoyo; Itami, Toshiaki
2013-02-01
In many physiological processes, including the innate immune system, free radicals such as nitric oxide (NO) and reactive oxygen species (ROS) play significant roles. In humans, 2 homologs of Dual oxidases (Duox) generate hydrogen peroxide (H2O2), which is a type of ROS. Here, we report the identification and characterization of a Duox from kuruma shrimp, Marsupenaeus japonicus. The full-length cDNA sequence of the M. japonicus Dual oxidase (MjDuox) gene contains 4695 bp and was generated using reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). The open reading frame of MjDuox encodes a protein of 1498 amino acids with an estimated mass of 173 kDa. In a homology analysis using amino acid sequences, MjDuox exhibited 69.3% sequence homology with the Duox of the red flour beetle, Tribolium castaneum. A transcriptional analysis revealed that the MjDuox mRNA is highly expressed in the gills of healthy kuruma shrimp. In the gills, MjDuox expression reached its peak 60 h after injection with WSSV and decreased to its normal level at 72 h. In gene knockdown experiments of free radical-generating enzymes, the survival rates decreased during the early stages of a white spot syndrome virus (WSSV) infection following the knockdown of the NADPH oxidase (MjNox) or MjDuox genes. In the present study, the identification, cloning and gene knockdown of the kuruma shrimp MjDuox are reported. Duoxes have been identified in vertebrates and some insects; however, few reports have investigated Duoxes in crustaceans. This study is the first to identify and clone a Dual oxidase from a crustacean species.
2014-01-01
Background: Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). Results: We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. Conclusions: This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar. PMID:24490620
Phylotranscriptomic consolidation of the jawed vertebrate timetree.
Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé
2017-09-01
Phylogenomics is extremely powerful but introduces new challenges as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as a model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included the full phylogenetic diversity of gnathostomes and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10 Gbp allows recovering more genes, but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang
2017-03-09
In order to characterize the female or male transcriptome of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 genes that coded for proteins were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 isoforms were genome-widely identified with at least two isoforms from female and male transcriptome databases. We found that the number of isoforms and their alternatively spliced patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome database and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone.
Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang
2017-01-01
In order to characterize the female or male transcriptome of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 genes that coded for proteins were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 isoforms were genome-widely identified with at least two isoforms from female and male transcriptome databases. We found that the number of isoforms and their alternatively spliced patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome database and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone. PMID:28282934
Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice; Grasland, Béatrice; Frossard, Jean-Pierre; Dastjerdi, Akbar; Hulst, Marcel; Hanke, Dennis; Pohlmann, Anne; Blome, Sandra; van der Poel, Wim H. M.; Steinbach, Falko; Blanchard, Yannick; Lavazza, Antonio; Bøtner, Anette
2018-01-01
Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28 kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences was distinct from the others and from the reference sequence (Accession number AF353511), but all were >99% identical. Unique and shared differences between sequences were identified. The coding region for the surface-exposed spike protein showed the highest proportion of variability including both point mutations and small deletions. The predicted expression of the ORF3 gene product was more dramatically affected in three different variants of this virus through either loss of the initiation codon or gain of a premature termination codon. The genome of one isolate had a substantially rearranged 5′-terminal sequence. This rearrangement was validated through the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies. PMID:29494671
Au, Chun Hang; Wa, Anna; Ho, Dona N; Chan, Tsun Leung; Ma, Edmond S K
2016-01-22
Genomic techniques in recent years have allowed the identification of many mutated genes important in the pathogenesis of acute myeloid leukemia (AML). Together with cytogenetic aberrations, these gene mutations are powerful prognostic markers in AML and can be used to guide patient management, for example selection of optimal post-remission therapy. The mutated genes also hold promise as therapeutic targets themselves. We evaluated the applicability of a gene panel for the detection of AML mutations in a diagnostic molecular pathology laboratory. Fifty patient samples comprising 46 AML and 4 other myeloid neoplasms were accrued for the study. They consisted of 19 males and 31 females at a median age of 60 years (range: 18-88 years). A total of 54 genes (full coding exons of 15 genes and exonic hotspots of 39 genes) were targeted by 568 amplicons that ranged from 225 to 275 bp. The combined coverage was 141 kb in sequence length. Amplicon libraries were prepared by TruSight myeloid sequencing panel (Illumina, CA) and paired-end sequencing runs were performed on a MiSeq (Illumina) genome sequencer. Sequences obtained were analyzed by in-house bioinformatics pipeline, namely BWA-MEM, Samtools, GATK, Pindel, Ensembl Variant Effect Predictor and a novel algorithm ITDseek. The mean count of sequencing reads obtained per sample was 3.81 million and the mean sequencing depth was over 3000X. Seventy-seven mutations in 24 genes were detected in 37 of 50 samples (74 %). On average, 2 mutations (range 1-5) were detected per positive sample. TP53 gene mutations were found in 3 out of 4 patients with complex and unfavorable cytogenetics. Comparing NGS results with that of conventional molecular testing showed a concordance rate of 95.5 %. After further resolution and application of a novel bioinformatics algorithm ITDseek to aid the detection of FLT3 internal tandem duplication (ITD), the concordance rate was revised to 98.2 %. 
Gene panel testing by NGS approach was applicable for sensitive and accurate detection of actionable AML gene mutations in the clinical laboratory to individualize patient management. A novel algorithm ITDseek was presented that improved the detection of FLT3-ITD of varying length, position and at low allelic burden.
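The panel figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers quoted above, not part of the authors' pipeline; the helper names `coverage_bounds_kb` and `percent` are hypothetical, and the bounds ignore any overlap between amplicons, which the abstract does not describe.

```python
def coverage_bounds_kb(n_amplicons: int, min_len_bp: int, max_len_bp: int):
    """Lower and upper bounds (kb) on summed amplicon length, ignoring overlap."""
    return n_amplicons * min_len_bp / 1000, n_amplicons * max_len_bp / 1000


def percent(part: float, whole: float) -> float:
    """Express part/whole as a percentage."""
    return 100 * part / whole


# Figures taken from the abstract: 568 amplicons of 225-275 bp, 141 kb combined
# coverage, 77 mutations found in 37 of 50 samples.
low_kb, high_kb = coverage_bounds_kb(568, 225, 275)
print(f"summed amplicon length: {low_kb:.1f}-{high_kb:.1f} kb (reported: 141 kb)")
print(f"samples with at least one mutation: {percent(37, 50):.0f}%")
print(f"mean mutations per positive sample: {77 / 37:.2f}")
```

The reported 141 kb falls inside the computed range, and 77 mutations over 37 positive samples is consistent with the stated average of about 2 per positive sample.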
Candidate Genes Expressed in Tolerant Common Wheat With Resistant to English Grain Aphid.
Luo, Kun; Zhang, Gaisheng; Wang, Chunping; Ouellet, Thérèse; Wu, Jingjing; Zhu, Qidi; Zhao, Huiyan
2014-10-01
The English grain aphid, Sitobion avenae (F.) (Hemiptera: Aphididae), is a common worldwide pest of wheat (Triticum aestivum L.). The use of improved resistant cultivars by farmers is the most effective and environmentally friendly method to control this aphid in the field. The winter wheat genotypes 98-10-35 and Amigo are resistant to S. avenae. To identify genes responsible for resistance to S. avenae in these genotypes, differential-display reverse transcription-polymerase chain reaction was used in the current study to identify the corresponding differentially expressed sequences. Two backcross progenies were obtained by crossing the two resistant genotypes with the susceptible genotype 1376. Six candidate differential bands were sequenced. Lengths of the expressed sequence tags ranged from 128 to 532 bp. Although these expressed sequences were likely associated with S. avenae resistance, one expressed sequence tag was located on chromosome 7DL, and its potential function may be associated with the ability to maintain photosynthesis in wheat. This may serve as an active tolerance mechanism in common wheat resistant to S. avenae. Cloning the full length of these sequences would help us thoroughly understand the mechanism of wheat resistance to S. avenae and be valuable for breeding cultivars with S. avenae resistance. © 2014 Entomological Society of America.
Identification and characterization of a novel serine-threonine kinase gene from the Xp22 region.
Montini, E; Andolfi, G; Caruso, A; Buchner, G; Walpole, S M; Mariani, M; Consalez, G; Trump, D; Ballabio, A; Franco, B
1998-08-01
Eukaryotic protein kinases are part of a large and expanding family of proteins. Through our transcriptional mapping effort in the Xp22 region, we have isolated and sequenced the full-length transcript of STK9, a novel cDNA highly homologous to serine-threonine kinases. A number of human genetic disorders have been mapped to the region where STK9 has been localized including Nance-Horan (NH) syndrome, oral-facial-digital syndrome type 1 (OFD1), and a novel locus for nonsyndromic sensorineural deafness (DFN6). To evaluate the possible involvement of STK9 in any of the above-mentioned disorders, a 2416-bp full-length cDNA was assembled. The entire genomic structure of the gene, which is composed of 20 coding exons, was determined. Northern analysis revealed a transcript larger than 9.5 kb in several tissues including brain, lung, and kidney. The mouse homologue (Stk9) was identified and mapped in the mouse in the region syntenic to human Xp. This location is compatible with the location of the Xcat mutant, which shows congenital cataracts very similar to those observed in NH patients. Sequence homologies, expression pattern, and mapping information in both human and mouse make STK9 a candidate gene for the above-mentioned disorders. Copyright 1998 Academic Press.
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-01-01
In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of the primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding for ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-05-24
In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from cultured Bengal tiger fibroblasts in vitro. The titers of the primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding for ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, coding for the TAT-Ferritin fusion protein with two 6× His-tags at the N and C termini. After BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers.
2011-01-01
Transmission from pet rats and cats to humans as well as severe infection in felids and other animal species have recently drawn increasing attention to cowpox virus (CPXV). We report the cloning of the entire genome of cowpox virus strain Brighton Red (BR) as a bacterial artificial chromosome (BAC) in Escherichia coli and the recovery of infectious virus from cloned DNA. Generation of a full-length CPXV DNA clone was achieved by first introducing a mini-F vector, which allows maintenance of large circular DNA in E. coli, into the thymidine kinase locus of CPXV by homologous recombination. Circular replication intermediates were then electroporated into E. coli DH10B cells. Upon successful establishment of the infectious BR clone, we modified the full-length clone such that recombination-mediated excision of bacterial sequences can occur upon transfection in eukaryotic cells. This self-excision of the bacterial replicon is made possible by a sequence duplication within mini-F sequences and allows recovery of recombinant virus progeny without remaining marker or vector sequences. The in vitro growth properties of viruses derived from both BAC clones were determined and found to be virtually indistinguishable from those of parental, wild-type BR. Finally, the complete genomic sequence of the infectious clone was determined and the cloned viral genome was shown to be identical to that of the parental virus. In summary, the generated infectious clone will greatly facilitate studies on individual genes and pathogenesis of CPXV. Moreover, the vector potential of CPXV can now be more systematically explored using this newly generated tool. PMID:21314965
Molecular cloning and characterization of Hymenolepis diminuta alpha-tubulin gene.
Mohajer-Maghari, Behrokh; Amini-Bavil-Olyaee, Samad; Webb, Rodney A; Coe, Imogen R
2007-02-01
To isolate a full-length alpha-tubulin cDNA from a eucestode, Hymenolepis diminuta, a lambda phage cDNA library was constructed. The alpha-tubulin gene was cloned, sequenced and characterized. The H. diminuta alpha-tubulin consisted of 450 amino acids. This protein contained putative sites for all posttranslational modifications, such as detyrosination/tyrosination at the carboxyl terminus of the protein, phosphorylation at residues R79 and K336, glycylation/glutamylation at residue G445 and acetylation at residue K40. Comparisons of H. diminuta alpha-tubulin with all full-length alpha-tubulin proteins revealed that H. diminuta alpha-tubulin possesses 10 distinctive residues, which are not found in any other alpha-tubulins. Phylogenetic analysis showed that H. diminuta alpha-tubulin grouped in a separate branch adjacent to the eucestode and trematode branch, with a 92% bootstrap value (1000 replicates). In conclusion, this is the first report of H. diminuta cDNA library construction, cloning and characterization of the H. diminuta alpha-tubulin gene.
Spliced synthetic genes as internal controls in RNA sequencing experiments.
Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R
2016-09-01
RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming
2012-07-01
We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Description and physical localization of the bovine survival of motor neuron gene (SMN).
Pietrowski, D; Goldammer, T; Meinert, S; Schwerin, M; Förster, M
1998-01-01
Proximal spinal muscular atrophy (SMA) is an autosomal recessive disease in humans and other mammals, characterized by degeneration of anterior horn cells of the spinal cord. In humans, the survival of motor neuron gene (SMN) has been recognized as the SMA-determining gene and has been mapped to 5q13. In cattle, SMA is a recurrent, inherited disease that plays an important economic role in breeding programs of Brown Swiss stock. Now we have identified the full-length cDNA sequence of the bovine SMN gene. Molecular analysis and characterization of the sequence document 85% identity to its human counterpart and three evolutionarily conserved domains in different species. Physical mapping data reveal that bovine SMN is localized to chromosome region 20q12→q13, supporting the conserved synteny of this chromosomal region between humans and cattle.
dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães
2017-01-01
Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligosaccharides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable element (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages. PMID:28068432
Ivamoto, Suzana Tiemi; Reis, Osvaldo; Domingues, Douglas Silva; Dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães; Pereira, Luiz Filipe Protasio
2017-01-01
Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligosaccharides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable element (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages.
Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee
2015-09-21
Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and unveiling the structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing functional relationships between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observe high precision and recall under various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
Capsicum annuum dehydrin, an osmotic-stress gene in hot pepper plants.
Chung, Eunsook; Kim, Soo-Yong; Yi, So Young; Choi, Doil
2003-06-30
Osmotic stress-related genes were selected from an EST database constructed from 7 cDNA libraries from different tissues of the hot pepper. A full-length cDNA of Capsicum annuum dehydrin (Cadhn), a late embryogenesis abundant (lea) gene, was selected from the 5' single pass sequenced cDNA clones and sequenced. The deduced polypeptide has 87% identity with potato dehydrin C17, but very little identity with the dehydrin genes of other organisms. It contains a serine-tract (S-segment) and 3 conserved lysine-rich domains (K-segments). Southern blot analysis showed that 2 copies are present in the hot pepper genome. Cadhn was induced by osmotic stress in leaf tissues as well as by the application of abscisic acid. The RNA was most abundant in green fruit. The expression of several osmotic stress-related genes was examined and Cadhn proved to be the most abundantly expressed of these in response to osmotic stress.
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moeis, Maelita R., E-mail: sony@sith.itb.ac.id; Berlian, Liska, E-mail: sony@sith.itb.ac.id; Suhandono, Sony, E-mail: sony@sith.itb.ac.id
Klebsiella oxytoca produces sucrose isomerase, which catalyses the conversion of sucrose to isomaltulose, a new generation of sugar. In a previous study, the palI gene from Klebsiella oxytoca was successfully isolated from sapodilla fruit (Manilkara zapota). The full-length palI gene sequence of Klebsiella oxytoca was cloned in E. coli DH5α. The deduced amino acid sequence has 498 residues, includes the conserved motif for sucrose isomerisation ³²⁵RLDRD³²⁹, and is 97% identical to that of palI from Klebsiella sp. LX3 (GenBank: AAK82938.1). This fragment was successfully ligated into the expression vector pET-32b using overlap-extension PCR and cloned in Escherichia coli BL21 (DE3) pLysS. The DNA sequencing result shows that the palI gene of Klebsiella oxytoca was inserted in-frame in pET-32b. This is the first report on cloning of the palI gene from Klebsiella oxytoca.
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.
Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi
2017-07-01
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II's sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.
Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium
Cai, Yongping; Lin, Yi
2013-01-01
In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The full-length PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) is 2,458 bp long and contains a complete open reading frame (ORF) of 2,142 bp, which encodes 713 amino acid residues. The amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL sequences of other plants, as indicated by multiple alignments. The dominant sites and catalytic active sites, which are similar to those in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum, are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from Orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
Sun, Mei-Yu; Li, Jing-Yi; Li, Dong; Huang, Feng-Jie; Wang, Di; Li, Hui; Xing, Quan; Zhu, Hui-Bin; Shi, Lei
2018-04-12
Drynaria roosii (Nakaike) is a traditional Chinese medicinal fern, known as 'GuSuiBu'. Its effective components, naringin and neoeriocitrin, share highly similar chemical structures and medicinal functions. Our HPLC-MS/MS results showed that the accumulation of naringin/neoeriocitrin depended on specific tissues or ages. However, little was known about the expression patterns of naringin/neoeriocitrin-related genes involved in their regulatory pathways. Because basic genetic information was lacking, we applied a combination of SMRT sequencing and SGS to generate the complete and full-length transcriptome of D. roosii. According to the SGS data, the DEG-based heat map analysis revealed that naringin/neoeriocitrin-related gene expression exhibited obvious tissue- and time-specific transcriptomic differences. Using the systems biology method of modular organization analysis, we clustered 16,472 DEGs into 17 gene modules and studied the relationships between modules and tissue/time point samples, as well as between modules and naringin/neoeriocitrin contents. Among these, naringin/neoeriocitrin-related DEGs were distributed across nine distinct modules, and DEGs in these modules showed significantly different patterns of transcript abundance linked with specific tissues or ages. Moreover, WGCNA results further identified that PAL, 4CL, C4H and C3H, HCT acted as the major hub genes involved in naringin and neoeriocitrin synthesis, respectively, and exhibited high co-expression with MYB- and bHLH-regulated genes. In this work, modular organization and co-expression networks elucidated the tissue- and time-specificity of gene expression patterns, as well as hub genes associated with naringin/neoeriocitrin synthesis in D. roosii. In addition, the comprehensive transcriptome dataset provides important genetic information for further research on D. roosii.
The complete chloroplast genome sequence of Hibiscus syriacus.
Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin
2016-09-01
The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp each, separated by a large single-copy region of 89 698 bp and a small single-copy region of 19 831 bp. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
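The region lengths reported above can be cross-checked with simple arithmetic: a circular plastome with the typical quadripartite layout should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The following is a minimal Python sketch of that check; the function and constant names are illustrative and do not come from the study.

```python
# Region lengths (bp) as reported in the abstract.
LSC = 89_698            # large single-copy region
SSC = 19_831            # small single-copy region
IR = 25_745             # one inverted repeat; the genome carries two copies
REPORTED_TOTAL = 161_019

# Gene counts as reported in the abstract.
PROTEIN_CODING, RRNA, TRNA = 81, 4, 29
REPORTED_GENES = 114


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular plastome made of LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
gene_sum = PROTEIN_CODING + RRNA + TRNA

print("region lengths consistent:", total == REPORTED_TOTAL)
print("gene counts consistent:", gene_sum == REPORTED_GENES)
```

Both checks agree with the totals stated in the abstract, which supports reading the 25 745 bp figure as the length of each inverted repeat rather than of the pair.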
Bioinformatic analysis of phage AB3, a phiKMV-like virus infecting Acinetobacter baumannii.
Zhang, J; Liu, X; Li, X-J
2015-01-16
The phages of Acinetobacter baumannii have drawn increasing attention because of the multi-drug resistance of A. baumannii. The aim of this study was to sequence Acinetobacter baumannii phage AB3 and conduct bioinformatic analysis to lay a foundation for genome remodeling and phage therapy. We isolated and sequenced A. baumannii phage AB3 and attempted to annotate and analyze its genome. The results showed that the genome is a double-stranded DNA with a total length of 31,185 base pairs (bp) and 97 open reading frames greater than 100 bp. The genome includes 28 predicted genes, of which 24 are homologous to phage AB1. The entire coding sequence is located on the negative strand, representing 90.8% of the total length. The G+C mol% was 39.18%, without areas of high G+C content over 200 bp in length. No GC island, tRNA gene, or repeated sequence was identified. Gene lengths were 120-3099 bp, with an average of 1011 bp. Six genes were found to be greater than 2000 bp in length. Genomic alignment and phylogenetic analysis of the RNA polymerase gene showed that, similar to phage AB1, phage AB3 is a phiKMV-like virus in the T7 phage family.
Sitthithaworn, W; Kojima, N; Viroonchatapan, E; Suh, D Y; Iwanami, N; Hayashi, T; Noji, M; Saito, K; Niwa, Y; Sankawa, U
2001-02-01
cDNAs encoding geranylgeranyl diphosphate synthase (GGPPS) of two diterpene-producing plants, Scoparia dulcis and Croton sublyratus, have been isolated using the homology-based polymerase chain reaction (PCR) method. Both clones contained highly conserved aspartate-rich motifs (DDXX(XX)D) and their N-terminal residues exhibited the characteristics of chloroplast targeting sequence. When expressed in Escherichia coli, both the full-length and truncated proteins in which the putative targeting sequence was deleted catalyzed the condensation of farnesyl diphosphate and isopentenyl diphosphate to produce geranylgeranyl diphosphate (GGPP). The structural factors determining the product length in plant GGPPSs were investigated by constructing S. dulcis GGPPS mutants on the basis of sequence comparison with the first aspartate-rich motif (FARM) of plant farnesyl diphosphate synthase. The result indicated that in plant GGPPSs small amino acids, Met and Ser, at the fourth and fifth positions before FARM and Pro and Cys insertion in FARM play essential roles in determination of product length. Further, when a chimeric gene comprised of the putative transit peptide of the S. dulcis GGPPS gene and a green fluorescent protein was introduced into Arabidopsis leaves by particle gun bombardment, the chimeric protein was localized in chloroplasts, indicating that the cloned S. dulcis GGPPS is a chloroplast protein.
Virtual Northern Analysis of the Human Genome
Hurowitz, Evan H.; Drori, Iddo; Stodden, Victoria C.; Donoho, David L.; Brown, Patrick O.
2007-01-01
Background We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. Methodology/Principal Findings We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Conclusions/Significance Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes. PMID:17520019
Barman, Lalita Rani; Nooruzzaman, Mohammed; Sarker, Rahul Deb; Rahman, Md Tazinur; Saife, Md Rajib Bin; Giasuddin, Mohammad; Das, Bidhan Chandra; Das, Priya Mohan; Chowdhury, Emdadul Haque; Islam, Mohammad Rafiqul
2017-10-01
A total of 23 Newcastle disease virus (NDV) isolates from Bangladesh taken between 2010 and 2012 were characterized on the basis of partial F gene sequences. All the isolates belonged to genotype XIII of class II NDV but segregated into three sub-clusters. One sub-cluster with 17 isolates aligned with sub-genotype XIIIc. The other two sub-clusters were phylogenetically distinct from the previously described sub-genotypes XIIIa, XIIIb and XIIIc and could be candidates of new sub-genotypes; however, that needs to be validated through full-length F gene sequence data. The results of the present study suggest that genotype XIII NDVs are under continuing evolution in Bangladesh.
Improved maize reference genome with single-molecule technologies.
Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen
2017-06-22
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
Xander: employing a novel method for efficient gene-targeted metagenomic assembly
Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; ...
2015-08-05
Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.
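To make the de Bruijn graph component mentioned above concrete, here is a minimal Python sketch that builds such a graph from a few toy reads and walks it greedily. It is not Xander's implementation: the abstract describes a graph weighted by a protein profile HMM, whereas this sketch has no HMM and simply follows unambiguous edges. The reads, the value of k, and the function names are all hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer prefix to the set of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_path(graph, start):
    """Greedy walk from `start`, stopping at a branch, a dead end, or a repeat.

    A gene-targeted assembler would instead score competing branches
    (for example with an HMM); here any ambiguity simply ends the walk.
    """
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (next_node,) = graph[node]
        if next_node in seen:      # avoid looping forever on a cycle
            break
        path += next_node[-1]
        seen.add(next_node)
        node = next_node
    return path


# Toy, made-up reads that overlap end to end.
reads = ["ATGGCG", "GGCGTA", "CGTAAC"]
graph = build_de_bruijn(reads, k=4)
contig = extend_path(graph, "ATG")
print(contig)
```

With these three overlapping reads the walk reconstructs a single contig spanning all of them; real metagenomic data contain branches at almost every node, which is where a guiding model for the gene of interest becomes useful.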
Cloning a Chymotrypsin-Like 1 (CTRL-1) Protease cDNA from the Jellyfish Nemopilema nomurai
Heo, Yunwi; Kwon, Young Chul; Bae, Seong Kyeong; Hwang, Duhyeon; Yang, Hye Ryeon; Choudhary, Indu; Lee, Hyunkyoung; Yum, Seungshic; Shin, Kyoungsoon; Yoon, Won Duk; Kang, Changkeun; Kim, Euikyung
2016-01-01
An enzyme in a nematocyst extract of the Nemopilema nomurai jellyfish, caught off the coast of the Republic of Korea, catalyzed the cleavage of chymotrypsin substrate in an amidolytic kinetic assay, and this activity was inhibited by the serine protease inhibitor, phenylmethanesulfonyl fluoride. We isolated the full-length cDNA sequence of this enzyme, which contains 850 nucleotides, with an open reading frame of 801 encoding 266 amino acids. A blast analysis of the deduced amino acid sequence showed 41% identity with human chymotrypsin-like (CTRL) and the CTRL-1 precursor. Therefore, we designated this enzyme N. nomurai CTRL-1. The primary structure of N. nomurai CTRL-1 includes a leader peptide and a highly conserved catalytic triad of His69, Asp117, and Ser216. The disulfide bonds of chymotrypsin and the substrate-binding sites are highly conserved compared with the CTRLs of other species, including mammalian species. Nemopilema nomurai CTRL-1 is evolutionarily more closely related to Actinopterygii than to Scyphozoan (Aurelia aurita) or Hydrozoan (Hydra vulgaris). The N. nomurai CTRL1 was amplified from the genomic DNA with PCR using specific primers designed based on the full-length cDNA, and then sequenced. The N. nomurai CTRL1 gene contains 2434 nucleotides and four distinct exons. The 5′ donor splice (GT) and 3′ acceptor splice sequences (AG) are wholly conserved. This is the first report of the CTRL1 gene and cDNA structures in the jellyfish N. nomurai. PMID:27399771
Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus
NASA Astrophysics Data System (ADS)
Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat
2016-11-01
In this study, we performed bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify the gene encoding gelatinase. L. sphaericus was isolated from soil, and its gelatinase is species-specific to porcine and bovine gelatin. This bacterium offers the possibility of producing enzymes that are specific to each of the two species of meat. The main focus of this research is to identify the gelatinase-encoding gene in L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) in size with a sequence of 520 amino acids; gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp in size with a sequence of 591 amino acids; and gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp in size with a sequence of 566 amino acids. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short sequences of cDNA by PCR. The amplicons were reliably 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Thus, bioinformatics analysis of L. sphaericus identified genes encoding gelatinase.
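The gene sizes and amino acid counts reported for the three candidates can be checked against each other. The Python sketch below rests on two assumptions that the abstract does not state: that each reported size in bp includes a terminal stop codon, and that the identifiers follow the layout NODE_<contig>_length_<contig bp>_cov_<coverage>_<gene index>. The helper names are hypothetical.

```python
import re

# Identifier, reported gene size (bp) and reported amino acid count, from the abstract.
CANDIDATES = {
    "P1": ("NODE_71_length_93919_cov_158.931839_21", 1563, 520),
    "P2": ("NODE_23_length_52851_cov_190.061386_17", 1776, 591),
    "P3": ("NODE_106_length_32943_cov_169.147919_8", 1701, 566),
}

# Assumed identifier layout; the meaning of each field is an interpretation.
ID_PATTERN = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)_(\d+)")


def parse_identifier(identifier):
    """Split an identifier into its assumed fields."""
    match = ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"unexpected identifier format: {identifier}")
    contig, length, coverage, gene_index = match.groups()
    return {
        "contig": int(contig),
        "contig_length": int(length),
        "coverage": float(coverage),
        "gene_index": int(gene_index),
    }


def residues_from_orf(orf_bp):
    """Amino acids encoded by an ORF whose length includes one stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3 - 1


for name, (identifier, orf_bp, reported_aa) in CANDIDATES.items():
    fields = parse_identifier(identifier)
    consistent = residues_from_orf(orf_bp) == reported_aa
    print(name, fields["contig_length"], consistent)
```

Under the stop-codon assumption, all three reported sizes match their reported amino acid counts.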
Harrison, Robert A; Ibison, Frances; Wilbraham, Davina; Wagstaff, Simon C
2007-05-01
The immobilisation of prey by snakes is most efficiently achieved by the rapid dissemination of venom from its site of injection into the blood stream. Hyaluronidase is a common component of snake venoms and has been termed the "venom spreading factor". In the absence of nucleotide or protein sequence data to confirm the functional identity of this venom component, we interrogated a venom gland EST database for the saw-scaled viper, Echis ocellatus (Nigeria), using the gene ontology (GO) term "carbohydrate metabolism". A single hyalurononglucosaminadase-activity matching sequence (EOC00242) was found and used to design PCR primers to acquire the full-length cDNA sequence. Although very different from the bee venom and mammalian hyaluronidase sequences, the E. ocellatus sequence retained all the catalytic, positional and structural residues that characterise this class of carbohydrate metabolising hydrolases. An extraordinarily high level of sequence identity (>95%) was observed in analogous venom gland cDNA sequences isolated (by PCR) from another saw-scaled viper species, E. pyramidum leakeyi (Kenya), and from the sahara horned viper, Cerastes cerastes cerastes (Egypt) and the puff adder, Bitis arietans (Nigeria). Smaller amplicons, lacking hyaluronidase catalytic residues because of 768 bp or 855 bp central deletions, appear to encode either truncated peptides without hyaluronidase activity, or are non-translated transcripts because they lack consensus translation initiating motifs.
Kim, Hyung Jun; Jang, Soojin
2017-12-01
Staphylococcus haemolyticus is the second most frequently isolated coagulase-negative staphylococcus from blood cultures. Moreover, multidrug resistance associated with the genome flexibility of S. haemolyticus has been increasingly reported worldwide. Here we report the draft genome sequence of multidrug-resistant S. haemolyticus IPK_TSA25 isolated from a building surface in South Korea. Genomic DNA of S. haemolyticus IPK_TSA25 was sequenced using the PacBio RS II sequencing platform. Generated reads were assembled using PacBio SMRT Analysis 2.3.0. The draft genome was annotated and antibiotic resistance genes were identified. The genome of 2,517,398 bp contains various antibiotic resistance genes associated with resistance to β-lactams, aminoglycosides and macrolides. Genome analysis also revealed chromosomal integration of the full-length Staphylococcus aureus plasmid pS0385-1 containing a tetracycline resistance gene. The genome sequence reported in this study will provide valuable information to understand the flexibility of the S. haemolyticus genome, which facilitates acquisition of antibiotic resistance genes and contributes to the dissemination of antibiotic resistance by this emerging pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Xu, Yan; Zou, Peng; Liu, Yao; Deng, Fengjiao
2010-06-01
Genes specifically expressed in the notochord may be crucial for proper notochord development. Using the digital differential display program offered by the National Center for Biotechnology Information, we identified a novel EST sequence from a zebrafish ovary library (No. XM_701450). The full-length cDNA of this transcript was cloned by performing 3' and 5'-RACE and was further confirmed by PCR and sequencing. The resulting 614 bp gene was found to encode a novel 94 amino acid protein that did not share significant homology with any other known protein. Characterization of the genomic sequence revealed that the gene spanned 4.9 kb and was composed of four exons and three introns. RT-PCR gene expression analysis revealed that our gene of interest was expressed in ovary, kidney, brain, mature oocytes and during the early stages of embryogenesis. During embryonic development, znfr mRNA was found to be expressed in the embryonic shield, chordamesoderm and the vacuolated notochord cells by in situ hybridization. Based on this information, we hypothesize that this novel gene is an important maternal factor required for zebrafish notochord formation during early embryonic development. We have thus named this gene znfr (zebrafish notochord formation related).
Detailed transcriptome description of the neglected cestode Taenia multiceps.
Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou
2012-01-01
The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and the new Trinity de novo assembler without a reference genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniProtKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcript analysis with other cestodes (Taenia pisiformis, Taenia solium, Echinococcus granulosus and Echinococcus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival and involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and parasite-host interaction studies.
USDA-ARS's Scientific Manuscript database
In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise sequence similarity values based on alignment of near full-length 16SrRNA genes (1530 bp) reve...
USDA-ARS's Scientific Manuscript database
Leafy spurge is an invasive perennial weed infesting range and recreational lands of North America. Previous research and omics projects with leafy spurge have helped develop it as a model for studying numerous aspects of perennial plant development and response to abiotic stress. However, the lack ...
The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes
Li, Jia; Su, Yingjuan; Wang, Ting
2018-01-01
The plastid accD gene encodes a subunit of the acetyl-CoA carboxylase (ACCase) enzyme. The accD gene is thought to have expanded in length in Cryptomeria japonica, Taiwania cryptomerioides, Cephalotaxus, Taxus chinensis, and Podocarpus lambertii, mainly because of tandemly repeated sequences. However, it is still unknown whether the accD gene has expanded in other cupressophytes. Here, to investigate how widespread this phenomenon is, 18 accD sequences and their surrounding regions from cupressophytes were sequenced and analyzed. Together with 39 GenBank sequences, our taxon sampling covered all the extant gymnosperm orders. The repetitive elements and substitution rates of accD among 57 gymnosperm species were analyzed, and the results show that: (1) the reading frame length of the accD gene in the 18 cupressophyte species has also expanded; (2) many repetitive elements were identified in the accD gene of cupressophyte lineages; (3) the synonymous and non-synonymous substitution rates of accD were accelerated in cupressophytes; and (4) accD was located at rearrangement endpoints. These results suggest that repetitive elements may mediate chloroplast genome rearrangement and accelerate substitution rates. PMID:29731764
Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi
2015-07-01
A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of the lotus field. To obtain detailed information on the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There exist twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there is a significant difference in the distribution of RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information on a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.
2014-01-01
The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
Sampathkumar, Raghavan; Sivaraman, Karthi; U. K. J., Anto Jesuraj; Dhar, Chirag; D. Souza, George; Berry, Neil
2017-01-01
India has the third largest number of HIV-1-infected individuals accounting for approximately 2.1 million people, with a predominance of circulating subtype C strains and a low prevalence of subtype A and A1C and BC recombinant forms, identified over the past two decades. Recovery of near full-length HIV-1 genomes from a plasma source coupled with advances in next generation sequencing (NGS) technologies and development of universal methods for amplifying whole genomes of HIV-1 circulating in a target geography or population provides the opportunity for a detailed analysis of HIV-1 strain identification, evolution and dynamics. Here we describe the development and implementation of approaches for HIV-1 NGS analysis in a southern Indian cohort. Plasma samples (n = 20) were obtained from HIV-1-confirmed individuals living in and around the city of Bengaluru. Near full-length genome recovery was obtained for 9 Indian HIV-1 patients, with recovery of full-length gag and env genes for 10 and 2 additional subjects, respectively. Phylogenetic analyses indicate the majority of sequences to be represented by subtype C viruses branching within a monophyletic clade, comprising viruses from India, Nepal, Myanmar and China and closely related to a southern African cluster, with a low prevalence of the A1C recombinant form also present. Development of algorithms for bespoke recovery and analysis at a local level will further aid clinical management of HIV-1 infected Indian subjects and delineate the progress of the HIV-1 pandemic in this and other geographical regions. PMID:29220350
Characterization of defensin gene from abalone Haliotis discus hannai and its deduced protein
NASA Astrophysics Data System (ADS)
Hong, Xuguang; Sun, Xiuqin; Zheng, Minggang; Qu, Lingyun; Zan, Jindong; Zhang, Jinxing
2008-11-01
Defensin is one of the ancient, conserved host-defense molecules that arose during biological evolution. As a regulator and effector molecule, it is very important in animals' acquired immune system. This paper reports the defensin gene from the mixed liver and kidney cDNA library of abalone Haliotis discus hannai Ino. Sequence analysis shows that the full-length cDNA encodes a mature peptide of 42 amino acids (including six Cys), with a molecular weight of 4,323 Da and a pI of 8.02. Amino acid sequence homology analysis shows that the peptide is highly similar (70% in common) to insect defensins. Because the secondary structure of the mature peptide has the typical structural character of an insect defensin, the polypeptide, named Haliotis discus defensin (hd-def), is a novel antimicrobial peptide belonging to the insect defensin subfamily. RT-PCR shows that the gene is expressed only in the hepatopancreas upon stimulation with Gram-negative and Gram-positive bacteria, indicating inducible expression. Therefore, expression of the Haliotis discus defensin gene is related to the defense against bacterial infection in Haliotis discus hannai Ino.
Rowe, Will; Baker, Kate S; Verner-Jeffreys, David; Baker-Austin, Craig; Ryan, Jim J; Maskell, Duncan; Pearce, Gareth
2015-01-01
Antimicrobial resistance remains a growing and significant concern in human and veterinary medicine. Current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria are limited in their effectiveness and scope. With the rapidly developing field of whole genome sequencing beginning to be utilised in clinical practice, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes will become increasingly important and useful for informing clinical decisions. Additionally, use of such tools will provide insight into the dynamics of antimicrobial resistance genes in metagenomic samples such as those used in environmental monitoring. Here we present the Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. The pipeline provides gene information, abundance estimation and the reconstructed sequence of antimicrobial resistance genes; it also provides web links to additional information on each gene. The pipeline utilises clustering and read mapping to annotate full-length genes relative to a user-defined database. It also uses local alignment of annotated genes to a range of online databases to provide additional information. We demonstrate SEAR's application in the detection and abundance estimation of antimicrobial resistance genes in two novel environmental metagenomes, 32 human faecal microbiome datasets and 126 clinical isolates of Shigella sonnei. We have developed a pipeline that contributes to the improved capacity for antimicrobial resistance detection afforded by next generation sequencing technologies, allowing for rapid detection of antimicrobial resistance genes directly from sequencing data. SEAR uses raw sequencing data via an intuitive interface so can be run rapidly without requiring advanced bioinformatic skills or resources. 
Finally, we show that SEAR is effective in detecting antimicrobial resistance genes in metagenomic and isolate sequencing data from both environmental metagenomes and sequencing data from clinical isolates.
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes
NASA Astrophysics Data System (ADS)
Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat
2016-11-01
In this study, bioinformatic analysis of the genome sequence of E. aerogenes was carried out to identify the gene encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and shows gelatinase activity that is species-specific to porcine and fish gelatin. This bacterium therefore offers the possibility of producing enzymes specific to the gelatin of each species. The E. aerogenes genome was partially sequenced, yielding a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads, of which 68.8 Mbp were high-quality reads (78.58% high quality). Genome assembly produced 120 contigs, 67.5% of which were longer than 1 kilobase pair (kbp), with an N50 contig length of 124,856 bp and a GC content of 55.17%. About 4,705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Thus, two pairs of primers (forward and reverse) were designed, based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
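The figures quoted in this abstract can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: it assumes that each reported ORF length includes a terminal stop codon (the abstract does not say so), the helper names `orf_to_residues` and `n50` are hypothetical, and the contig lengths passed to `n50` are made up to show how an N50 is computed.

```python
def orf_to_residues(orf_bp, includes_stop=True):
    """Amino acids encoded by an ORF of orf_bp bases.

    Assumption: the quoted ORF length counts the stop codon.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def n50(contig_lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downwards, first reaches half of the assembly size."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty contig list")


# ORF lengths from the abstract: 1029 bp and 717 bp.
print(orf_to_residues(1029), orf_to_residues(717))

# High-quality fraction recomputed from the rounded read totals.
print(round(100 * 68.8 / 87.6, 2))

# Made-up contig lengths, only to demonstrate the N50 definition.
print(n50([100, 80, 60, 40, 20]))
```

Under the stop-codon assumption the two ORFs give 342 and 238 residues, matching the abstract. The recomputed high-quality percentage comes out near 78.5, slightly below the reported 78.58, which is expected when starting from totals rounded to one decimal place.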
Sakurai, Tetsuya; Kondou, Youichi; Akiyama, Kenji; Kurotani, Atsushi; Higuchi, Mieko; Ichikawa, Takanari; Kuroda, Hirofumi; Kusano, Miyako; Mori, Masaki; Saitou, Tsutomu; Sakakibara, Hitoshi; Sugano, Shoji; Suzuki, Makoto; Takahashi, Hideki; Takahashi, Shinya; Takatsuji, Hiroshi; Yokotani, Naoki; Yoshizumi, Takeshi; Saito, Kazuki; Shinozaki, Kazuo; Oda, Kenji; Hirochika, Hirohiko; Matsui, Minami
2011-01-01
Identification of gene function is important not only for basic research but also for applied science, especially with regard to improvements in crop production. For rapid and efficient elucidation of useful traits, we developed a system named FOX hunting (Full-length cDNA Over-eXpressor gene hunting) using full-length cDNAs (fl-cDNAs). A heterologous expression approach provides a solution for the high-throughput characterization of gene functions in agricultural plant species. Since fl-cDNAs contain all the information of functional mRNAs and proteins, we introduced rice fl-cDNAs into Arabidopsis plants for systematic gain-of-function mutation. We generated >30,000 independent Arabidopsis transgenic lines expressing rice fl-cDNAs (rice FOX Arabidopsis mutant lines). These rice FOX Arabidopsis lines were screened systematically for various criteria such as morphology, photosynthesis, UV resistance, element composition, plant hormone profile, metabolite profile/fingerprinting, bacterial resistance, and heat and salt tolerance. The information obtained from these screenings was compiled into a database named ‘RiceFOX’. This database contains around 18,000 records of rice FOX Arabidopsis lines and allows users to search against all the observed results, ranging from morphological to invisible traits. The number of searchable items is approximately 100; moreover, the rice FOX Arabidopsis lines can be searched by rice and Arabidopsis gene/protein identifiers, sequence similarity to the introduced rice fl-cDNA and traits. The RiceFOX database is available at http://ricefox.psc.riken.jp/. PMID:21186176
Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu
2014-01-01
Little information is available on gene expression profiling of the halophyte A. canescens. To elucidate the molecular mechanism of stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, which provided 343 high-quality ESTs. In an evaluation of the 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% of ESTs) were identified according to their significant similarities with proteins of known functions. All 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotated ESTs, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource, and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigene sequences, 22 simple sequence repeats (SSRs) were also identified, contributing to the study of A. canescens resources. PMID:24960361
Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M
2015-01-01
In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, both from cut roses and from garden roses. In total, 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed for the construction of a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.
Expression of a polyubiquitin promoter isolated from Gladiolus.
Joung, Young Hee; Kamo, Kathryn
2006-10-01
A polyubiquitin promoter (GUBQ1) including its 5'UTR and intron was isolated from the floral monocot Gladiolus because high levels of expression could not be obtained using publicly available promoters isolated from either cereals or dicots. Sequencing of the promoter revealed highly conserved 5' and 3' intron splicing sites for the 1.234 kb intron. The coding sequence of the first two ubiquitin genes showed the highest homology (87 and 86%, respectively) to the ubiquitin genes of Nicotiana tabacum and Oryza sativa RUBQ2. Transient expression following gene gun bombardment showed that relative levels of GUS activity with the GUBQ1 promoter were comparable to the CaMV 35S promoter in gladiolus, tobacco, rose, rice, and the floral monocot freesia. The highest levels of GUS expression with GUBQ1 were attained with Gladiolus. The full-length GUBQ1 promoter including 5'UTR and intron was necessary for maximum GUS expression in Gladiolus. The relative GUS activity for the promoter only was 9%, and the activity for the promoter with 5'UTR and 399 bp of the full-length 1.234 kb intron was 41%. Arabidopsis plants transformed with uidA under GUBQ1 showed moderate GUS expression throughout young leaves and in the vasculature of older leaves. The highest levels of transient GUS expression in Gladiolus have been achieved using the GUBQ1 promoter. This promoter should be useful for genetic engineering of disease resistance in Gladiolus, rose, and freesia, where high levels of gene expression are important.
Lee, Ra Mi; Ryu, Rae Hyung; Jeong, Seong Won; Oh, Soo Jin; Huang, Hue; Han, Jin Soo; Lee, Chi Ho; Lee, C. Justin; Jan, Lily Yeh
2011-01-01
To clone the first anion channel from Xenopus laevis (X. laevis), we isolated a calcium-activated chloride channel (CLCA)-like membrane protein 6 gene (CMP6) in X. laevis. As a first step in gene isolation, an expressed sequence tag database was screened to find the partial cDNA fragment. A putative partial cDNA sequence was obtained by comparison with rat CLCAs identified in our laboratory. First-strand cDNA was synthesized by reverse transcription polymerase chain reaction (RT-PCR) using a specific primer designed for the target cDNA. By repeating 5' and 3' rapid amplification of cDNA ends (RACE), the full-length cDNA was constructed from the cDNA pool. The full-length CMP6 cDNA completed via 5'- and 3'-RACE was 2,940 bp long and had an open reading frame (ORF) of 940 amino acids. The predicted 940-residue polypeptide has four major transmembrane domains and showed about 50% identity with the rat brain CLCAs in our previously published data. Semi-quantitative analysis revealed that CMP6 was most abundantly expressed in small intestine, colon and liver, whereas all other tissues had undetectable levels. This result was corroborated by real-time PCR quantification of the target gene. Given that all CLCA studies have focused on human or murine channels, this finding suggests a hypothetical protein as an ion channel, an X. laevis CLCA. PMID:21826170
Niira, Kazutaka; Ito, Mika; Masuda, Tsuneyuki; Saitou, Toshiya; Abe, Tadatsugu; Komoto, Satoshi; Sato, Mitsuo; Yamasato, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Tuchiaka, Shinobu; Okada, Takashi; Omatsu, Tsutomu; Furuya, Tetsuya; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Shirai, Junsuke; Taniguchi, Koki; Mizutani, Tetsuya; Nagai, Makoto
2016-10-01
Porcine rotavirus C (RVC) is distributed throughout the world and is thought to be a pathogenic agent of diarrhea in piglets. Although the VP7, VP4, and VP6 gene sequences of Japanese porcine RVCs are currently available, there is no whole-genome sequence data of Japanese RVC. Furthermore, only one to three sequences are available for porcine RVC VP1-VP3 and NSP1-NSP3 genes. Therefore, we determined nearly full-length whole-genome sequences of nine Japanese porcine RVCs from seven piglets with diarrhea and two healthy pigs and compared them with published RVC sequences from a database. The VP7 genes of two Japanese RVCs from healthy pigs were highly divergent from other known RVC strains and were provisionally classified as G12 and G13 based on the 86% nucleotide identity cut-off value. Pairwise sequence identity calculations and phylogenetic analyses identified candidate novel genotypes of porcine Japanese RVC in each of the NSP1, NSP2 and NSP3 encoding genes. Furthermore, VP3 of Japanese porcine RVCs was shown to be closely related to human RVCs, suggesting a gene reassortment event between porcine and human RVCs and past interspecies transmission. The present study demonstrated that porcine RVCs show greater genetic diversity among strains than human and bovine RVCs. Copyright © 2016 Elsevier B.V. All rights reserved.
Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf
2015-10-01
Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Macaca specific exon creation event generates a novel ZKSCAN5 transcript.
Kim, Young-Hyun; Choe, Se-Hee; Song, Bong-Seok; Park, Sang-Je; Kim, Myung-Jin; Park, Young-Ho; Yoon, Seung-Bin; Lee, Youngjeon; Jin, Yeung Bae; Sim, Bo-Woong; Kim, Ji-Su; Jeong, Kang-Jin; Kim, Sun-Uk; Lee, Sang-Rae; Park, Young-Il; Huh, Jae-Won; Chang, Kyu-Tae
2016-02-15
ZKSCAN5 (also known as ZFP95) is a zinc-finger protein belonging to the Krüppel family. ZKSCAN5 contains a SCAN box and a KRAB A domain and is proposed to play a distinct role during spermatogenesis. In humans, alternatively spliced ZKSCAN5 transcripts with different 5'-untranslated regions (UTRs) have been identified. However, investigation of our Macaca UniGene Database revealed novel alternative ZKSCAN5 transcripts that arose due to an exon creation event. Therefore, in this study, we identified the full-length sequences of ZKSCAN5 and its alternative transcripts in Macaca spp. Additionally, we investigated different nonhuman primate sequences to elucidate the molecular mechanism underlying the exon creation event. We analyzed the evolutionary features of the ZKSCAN5 transcripts by reverse transcription polymerase chain reaction (RT-PCR) and genomic PCR, and by sequencing various nonhuman primate DNA and RNA samples. The exon-created transcript was only detected in the Macaca lineage (crab-eating monkey and rhesus monkey). Full-length sequence analysis by rapid amplification of cDNA ends (RACE) identified ten full-length transcripts and four functional isoforms of ZKSCAN5. Protein sequence analyses revealed the presence of two groups of isoforms that arose because of differences in start-codon usage. Together, our results demonstrate that there has been specific selection for a discrete set of ZKSCAN5 variants in the Macaca lineage. Furthermore, study of this locus (and perhaps others) in Macaca spp. might facilitate our understanding of the evolutionary pressures that have shaped the mechanism of exon creation in primates. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Guo, Yan; Zhang, Jinliang; Yan, Yongfeng; Wu, Jian; Zhu, Nengwu; Deng, Changyan
2015-01-01
Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) and subsequent sub-cloning and sequencing were used in this study to analyze the molecular phylogenetic diversity and spatial distribution of bacterial communities in different spatial locations during the cooling stage of composted swine manure. Total microbial DNA was extracted, and bacterial near full-length 16S rRNA genes were subsequently amplified, cloned, RFLP-screened, and sequenced. A total of 420 positive clones were classified by RFLP and near-full-length 16S rDNA sequences. Approximately 48 operational taxonomic units (OTUs) were found among 139 positive clones from the superstratum sample; 26 among 149 were from the middle-level sample and 35 among 132 were from the substrate sample. Thermobifida fusca was common in the superstratum layer of the pile. Some Bacillus spp. were remarkable in the middle-level layer, and Clostridium sp. was dominant in the substrate layer. Among 109 OTUs, 99 displayed homology with those in the GenBank database. Ten OTUs were not closely related to any known species. The superstratum sample had the highest microbial diversity, and different and distinct bacterial communities were detected in the three different layers. This study demonstrated the spatial characteristics of the microbial community distribution in the cooling stage of swine manure compost. PMID:25925066
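The per-layer clone and OTU counts in this abstract can be tallied to confirm the stated totals (420 clones, 109 OTUs). The Python sketch below does that and also computes OTUs per screened clone for each layer. That ratio is only a crude, hypothetical proxy for richness chosen for this illustration; the abstract does not say which diversity measure the authors used.

```python
# Clone and OTU counts per compost layer, copied from the abstract.
layers = {
    "superstratum": {"clones": 139, "otus": 48},
    "middle": {"clones": 149, "otus": 26},
    "substrate": {"clones": 132, "otus": 35},
}

total_clones = sum(counts["clones"] for counts in layers.values())
total_otus = sum(counts["otus"] for counts in layers.values())


def otus_per_clone(name):
    """OTUs found per screened clone in one layer (illustrative proxy only)."""
    counts = layers[name]
    return counts["otus"] / counts["clones"]


# Layer with the most OTUs per clone under this proxy.
richest = max(layers, key=otus_per_clone)

print(total_clones, total_otus)
for name in layers:
    print(name, round(otus_per_clone(name), 3))
print("richest layer by this proxy:", richest)
```

The tallies reproduce the totals given in the abstract, and under this simple ratio the superstratum layer ranks first, in line with the statement that it had the highest microbial diversity.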
DOE Office of Scientific and Technical Information (OSTI.GOV)
Markussen, Turhan; Jonassen, Christine Monceyron; Numanovic, Sanela
2008-05-10
Infectious salmon anaemia virus (ISAV) is an orthomyxovirus causing a multisystemic, emerging disease in Atlantic salmon. Here we present, for the first time, detailed sequence analyses of the full-genome sequence of a presumed avirulent isolate displaying a full-length hemagglutinin-esterase (HE) gene (HPR0), and compare this with full-genome sequences of 11 Norwegian ISAV isolates from clinically diseased fish. These analyses revealed the presence of a virulence marker right upstream of the putative cleavage site R267 in the fusion (F) protein, suggesting a Q266 → L266 substitution to be a prerequisite for virulence. To gain virulence in isolates lacking this substitution, a sequence insertion near the cleavage site seems to be required. This strongly suggests the involvement of a protease recognition pattern at the cleavage site of the fusion protein as a determinant of virulence, as seen in highly pathogenic influenza A virus H5 or H7 and the paramyxovirus Newcastle disease virus.
Tao, Junjie; Feng, Chao; Ai, Bin; Kang, Ming
2016-01-01
Background and Aims: Limestone karst areas possess high floral diversity and endemism. The genus Primulina, which contributes to the unique calcicole flora, has high species richness and exhibits specific soil-based habitat associations that are mainly distributed on calcareous karst soils. The adaptive molecular evolutionary mechanism of the genus to karst calcium-rich environments is still not well understood. The Ca2+-permeable channel TPC1 was used in this study to test whether its gene is involved in the local adaptation of Primulina to karst high-calcium soil environments. Methods: Specific amplification and sequencing primers were designed and used to amplify the full-length coding sequences of TPC1 from cDNA of 76 Primulina species. The sequence alignment without recombination and the corresponding reconstructed phylogeny tree were used in molecular evolutionary analyses at the nucleic acid level and amino acid level, respectively. Finally, the identified sites under positive selection were labelled on the predicted secondary structure of TPC1. Key Results: Seventy-six full-length coding sequences of Primulina TPC1 were obtained. The length of the sequences varied between 2220 and 2286 bp and the insertion/deletion was located at the 5′ end of the sequences. No signal of substitution saturation was detected in the sequences, while significant recombination breakpoints were detected. The molecular evolutionary analyses showed that TPC1 was dominated by purifying selection and the selective pressures were not significantly different among species lineages. However, significant signals of positive selection were detected at both TPC1 codon level and amino acid level, and five sites under positive selective pressure were identified by at least three different methods. Conclusions: The Ca2+-permeable channel TPC1 may be involved in the local adaptation of Primulina to karst Ca2+-rich environments. 
Different species lineages suffered similar selective pressure associated with calcium in karst environments, and episodic diversifying selection at a few sites may play a major role in the molecular evolution of Primulina TPC1. PMID:27582362
Unit-length line-1 transcripts in human teratocarcinoma cells.
Skowronski, J; Fanning, T G; Singer, M F
1988-01-01
We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF 2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide. The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons. PMID:2454389
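The segment lengths quoted for the L1 cDNAs can be added up to see how they relate to the roughly 6-kilobase clones, and the statement that both ORFs share a reading frame can be checked from the same numbers. The Python sketch below is a worked illustration only: the flanking lengths are the approximate values given in the abstract ("about 800 and 200 bases"), and the variable names are hypothetical.

```python
# Segment lengths in bases, in 5' to 3' order, taken from the abstract.
# The two flanking values are approximate ("about 800 and 200").
segments = [
    ("5' flank", 800),
    ("ORF1", 1122),
    ("inter-ORF spacer", 33),
    ("ORF2", 3852),
    ("3' flank", 200),
]

# Approximate length of a unit-length element built from these pieces.
total_bp = sum(length for _, length in segments)

# ORF2 starts ORF1 + spacer bases downstream of the ORF1 start.
# If that offset is a multiple of 3, the two ORFs are in the same frame.
orf2_offset = 1122 + 33
orf2_in_frame = orf2_offset % 3 == 0

print("approximate total length:", total_bp)
print("ORF2 offset from ORF1 start:", orf2_offset)
print("same reading frame:", orf2_in_frame)
```

The pieces sum to about 6.0 kb, close to the 5,975 bp fully sequenced clone, and the 1,155-base offset is divisible by three, which agrees with the abstract's statement that the two ORFs are in the same frame.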
Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis
2016-09-02
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Quiroz Velasquez, Paula F.; Abiff, Sumayyah K.; Fins, Katrina C.; Conway, Quincy B.; Salazar, Norma C.; Delgado, Ana Paula; Dawes, Jhanelle K.; Douma, Lauren G.
2014-01-01
A combination of 454 pyrosequencing and Sanger sequencing was used to sample and characterize the transcriptome of the entomopathogenic oomycete Lagenidium giganteum. More than 50,000 high-throughput reads were annotated through homology searches. Several selected reads served as seeds for the amplification and sequencing of full-length transcripts. Phylogenetic analyses inferred from full-length cellulose synthase alignments revealed that L. giganteum is nested within the peronosporalean galaxy and as such appears to have evolved from a phytopathogenic ancestor. In agreement with the phylogeny reconstructions, full-length L. giganteum oomycete effector orthologs, corresponding to the cellulose-binding elicitor lectin (CBEL), crinkler (CRN), and elicitin proteins, were characterized by domain organizations similar to those of pathogenicity factors of plant-pathogenic oomycetes. Importantly, the L. giganteum effectors provide a basis for detailing the roles of canonical CRN, CBEL, and elicitin proteins in the infectious process of an oomycete known principally as an animal pathogen. Finally, phylogenetic analyses and genome mining identified members of glycoside hydrolase family 5 subfamily 27 (GH5_27) as putative virulence factors active on the host insect cuticle, based in part on the fact that GH5_27 genes are shared by entomopathogenic oomycetes and fungi but are underrepresented in nonentomopathogenic genomes. The genomic resources gathered from the L. giganteum transcriptome analysis strongly suggest that filamentous entomopathogens (oomycetes and fungi) exhibit convergent evolution: they have evolved independently from plant-associated microbes, have retained genes indicative of plant associations, and may share similar cores of virulence factors, such as GH5_27 enzymes, that are absent from the genomes of their plant-pathogenic relatives. PMID:25107973
TIAN, PENG; LI, JIE; LIU, XIANG; LI, YUXI; CHEN, MEIHENG; MA, YUN; ZHENG, YI QING; FU, YONGGUI; ZOU, HUA
2014-01-01
Nasal polyps (NP) are highly associated with disorders of immune cells. Alternative polyadenylation (APA) produces mRNA isoforms with different lengths of the 3′-untranslated region (UTR) and regulates gene expression. It has been shown that this APA-mediated regulation of 3′UTR length is an immune-associated phenomenon. The aim of this study was to investigate genome-wide alternative tandem 3′UTR length switching events in non-eosinophilic nasal polyp tissue. Thirteen patients diagnosed with non-eosinophilic nasal polyps were included in this study. Nasal polyp tissue and control mucosa were collected during surgery. A 3′ end cDNA library was constructed. The recovered libraries were sequenced with second-generation sequencing technology, and the sequencing data were analyzed by an in-house bioinformatics pipeline. Tandem 3′UTR length switching between samples was detected by a test of linear trend alternative to independence. We found a significant alteration in the tandem 3′UTR length of 1,920 genes in nasal polyp samples. Functional annotation results showed that several gene ontology (GO) terms were enriched in the list of genes with switched APA sites, including regulation of transcription, macromolecule catabolic localization and mRNA processing. The results suggested that APA-mediated alternative 3′UTR regulation plays an important role in the post-transcriptional regulation of gene expression in non-eosinophilic nasal polyps. PMID:24715051
Systematic Characterization and Comparative Analysis of the Rabbit Immunoglobulin Repertoire
Lavinder, Jason J.; Hoi, Kam Hon; Reddy, Sai T.; Wine, Yariv; Georgiou, George
2014-01-01
Rabbits have been used extensively as a model system for the elucidation of the mechanism of immunoglobulin diversification and for the production of antibodies. We employed Next Generation Sequencing to analyze Ig germline V and J gene usage, CDR3 length and amino acid composition, and gene conversion frequencies within the functional (transcribed) IgG repertoire of the New Zealand white rabbit (Oryctolagus cuniculus). Several previously unannotated rabbit heavy chain variable (VH) and light chain variable (VL) germline elements were deduced bioinformatically using multidimensional scaling and k-means clustering methods. We estimated the gene conversion frequency in the rabbit at 23% of IgG sequences with a mean gene conversion tract length of 59±36 bp. Sequencing and gene conversion analysis of the chicken, human, and mouse repertoires revealed that gene conversion occurs much more extensively in the chicken (frequency 70%, tract length 79±57 bp), was observed to a small, yet statistically significant extent in humans, but was virtually absent in mice. PMID:24978027
Enzmann, P J; Kurath, G; Fichtner, D; Bergmann, S M
2005-09-23
Infectious hematopoietic necrosis virus (IHNV) was first detected in Europe in 1987 in France and Italy, and later, in 1992, in Germany. The source of the virus and the route of introduction are unknown. The present study investigates the molecular epidemiology of IHNV outbreaks in Germany since its first introduction. The complete nucleotide sequences of the glycoprotein (G) and non-virion (NV) genes from 9 IHNV isolates from Germany have been determined, and this has allowed the identification of characteristic differences between these isolates. Phylogenetic analysis of partial G gene sequences (mid-G, 303 nucleotides) from North American IHNV isolates (Kurath et al. 2003) has revealed 3 major genogroups, designated U, M and L. Using this gene region with 2 different North American IHNV data sets, it was possible to group the European IHNV strains within the M genogroup, but not in any previously defined subgroup. Analysis of the full length G gene sequences indicated that an independent evolution of IHN viruses had occurred in Europe. IHN viruses in Europe seem to be of a monophyletic origin, again most closely related to North American isolates in the M genogroup. Analysis of the NV gene sequences also showed the European isolates to be monophyletic, but resolution of the 3 genogroups was poor with this gene region. As a result of comparative sequence analyses, several different genotypes have been identified circulating in Europe.
Flanking sequence determination and specific PCR identification of transgenic wheat B102-1-2.
Cao, Jijuan; Xu, Junyi; Zhao, Tongtong; Cao, Dongmei; Huang, Xin; Zhang, Piqiao; Luan, Fengxia
2014-01-01
The exogenous fragment sequence and flanking sequence between the exogenous fragment and recombinant chromosome of transgenic wheat B102-1-2 were successfully acquired using genome walking technology. The newly acquired exogenous fragment encoded the full-length sequence of transformed genes with transformed plasmid and corresponding functional genes including ubi, vector pBANF-bar, vector pUbiGUSPlus, vector HSP, reporter vector pUbiGUSPlus, promoter ubiquitin, and coli DH1. A specific polymerase chain reaction (PCR) identification method for transgenic wheat B102-1-2 was established on the basis of designed primers according to flanking sequence. This established specific PCR strategy was validated by using transgenic wheat, transgenic corn, transgenic soybean, transgenic rice, and non-transgenic wheat. A specifically amplified target band was observed only in transgenic wheat B102-1-2. Therefore, this method is characterized by high specificity, high reproducibility, rapid identification, and excellent accuracy for the identification of transgenic wheat B102-1-2.
Genomic overview of mRNA 5′-leader trans-splicing in the ascidian Ciona intestinalis
Satou, Yutaka; Hamaguchi, Makoto; Takeuchi, Keisuke; Hastings, Kenneth E. M.; Satoh, Nori
2006-01-01
Although spliced leader (SL) trans-splicing in the chordates was discovered in the tunicate Ciona intestinalis there has been no genomic overview analysis of the extent of trans-splicing or the make-up of the trans-spliced and non-trans-spliced gene populations of this model organism. Here we report such an analysis for Ciona based on the oligo-capping full-length cDNA approach. We randomly sampled 2078 5′-full-length ESTs representing 668 genes, or 4.2% of the entire genome. Our results indicate that Ciona contains a single major SL, which is efficiently trans-spliced to mRNAs transcribed from a specific set of genes representing ∼50% of the total number of expressed genes, and that individual trans-spliced mRNA species are, on average, 2–3-fold less abundant than non-trans-spliced mRNA species. Our results also identify a relationship between trans-splicing status and gene functional classification; ribosomal protein genes fall predominantly into the non-trans-spliced category. In addition, our data provide the first evidence for the occurrence of polycistronic transcription in Ciona. An interesting feature of the Ciona polycistronic transcription units is that the great majority entirely lack intercistronic sequences. PMID:16822859
Liu, Tong; Pan, Luqing; Cai, Yuefeng; Miao, Jingjing
2015-01-25
HSP70 and HSP90 are the most important heat shock proteins (HSPs); they play key roles in the cell as molecular chaperones and may be involved in metabolic detoxification. In this study, full-length cDNAs of the HSP70 and HSP90 genes were obtained from the clam Ruditapes philippinarum, and the transcriptional responses of the two genes to benzo(a)pyrene (BaP) exposure were studied. The full-length RpHSP70 cDNA was 2336 bp, containing a 5' untranslated region (UTR) of 51 bp, a 3' UTR of 335 bp and an open reading frame (ORF) of 1950 bp encoding 650 amino acid residues. The full-length RpHSP90 cDNA was 2839 bp, containing a 107-bp 5' UTR, a 554-bp 3' UTR and a 2178-bp ORF encoding 726 amino acid residues. The deduced amino acid sequences of RpHSP70 and RpHSP90 shared the highest identity with the sequences of Paphia undulata, and the phylogenetic trees showed that the evolution of RpHSP70 and RpHSP90 was largely in accord with the evolution of the species. RpHSP70 and RpHSP90 mRNA expression was detected in all tested tissues of adult clams (digestive gland, gill, adductor muscle and mantle), and the highest mRNA expression level was observed in the digestive gland. Quantitative real-time RT-PCR analysis revealed that mRNA expression levels of RpHSP70, RpHSP90 and other xenobiotic metabolizing enzymes (XMEs) (AhR, DD, GST, GPx) in the digestive gland of R. philippinarum were induced by BaP, and the absolute expression levels of these genes showed a temporal and dose-dependent response. The results suggested that RpHSP70 and RpHSP90 were involved in the metabolic detoxification of BaP in the clam R. philippinarum. Copyright © 2014 Elsevier B.V. All rights reserved.
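The cDNA dimensions reported in the abstract above are internally consistent, which a few lines of arithmetic can confirm. This is a minimal sketch; the `CdnaRecord` class and its field names are hypothetical helpers introduced only for illustration, and the numbers are taken directly from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnaRecord:
    name: str
    total_bp: int   # reported full-length cDNA size
    utr5_bp: int    # 5' untranslated region
    orf_bp: int     # open reading frame
    utr3_bp: int    # 3' untranslated region
    residues: int   # reported number of amino acid residues

    def parts_sum(self) -> int:
        """Length obtained by adding the three reported regions."""
        return self.utr5_bp + self.orf_bp + self.utr3_bp

    def codons(self) -> int:
        """Number of whole codons in the reported ORF length."""
        return self.orf_bp // 3


RECORDS = [
    CdnaRecord("RpHSP70", 2336, 51, 1950, 335, 650),
    CdnaRecord("RpHSP90", 2839, 107, 2178, 554, 726),
]

for rec in RECORDS:
    print(rec.name,
          "regions sum to total:", rec.parts_sum() == rec.total_bp,
          "| ORF divisible by 3:", rec.orf_bp % 3 == 0,
          "| codons:", rec.codons(),
          "| reported residues:", rec.residues)
```

For both genes the ORF length divided by 3 equals the reported residue count. The abstract does not say whether the stop codon is counted within the stated ORF length, so the sketch makes no assumption about it.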
Roy Choudhury, Swarup; Roy, Sujit; Nag, Anish; Singh, Sanjay Kumar; Sengupta, Dibyendu N.
2012-01-01
The MADS-box family of genes has been shown to play a significant role in the development of reproductive organs, including dry and fleshy fruits. In this study, the molecular properties of an AGAMOUS like MADS box transcription factor in banana cultivar Giant governor (Musa sp, AAA group, subgroup Cavendish) has been elucidated. We have detected a CArG-box sequence binding AGAMOUS MADS-box protein in banana flower and fruit nuclear extracts in DNA-protein interaction assays. The protein fraction in the DNA-protein complex was analyzed by mass spectrometry and using this information we have obtained the full length cDNA of the corresponding protein. The deduced protein sequence showed ∼95% amino acid sequence homology with MA-MADS5, a MADS-box protein described previously from banana. We have characterized the domains of the identified AGAMOUS MADS-box protein involved in DNA binding and homodimer formation in vitro using full-length and truncated versions of affinity purified recombinant proteins. Furthermore, in order to gain insight about how DNA bending is achieved by this MADS-box factor, we performed circular permutation and phasing analysis using the wild type recombinant protein. The AGAMOUS MADS-box protein identified in this study has been found to predominantly accumulate in the climacteric fruit pulp and also in female flower ovary. In vivo and in vitro assays have revealed specific binding of the identified AGAMOUS MADS-box protein to CArG-box sequence in the promoters of major ripening genes in banana fruit. Overall, the expression patterns of this MADS-box protein in banana female flower ovary and during various phases of fruit ripening along with the interaction of the protein to the CArG-box sequence in the promoters of major ripening genes lead to interesting assumption about the possible involvement of this AGAMOUS MADS-box factor in banana fruit ripening and floral reproductive organ development. PMID:22984496
Gambetta, Gregory A; Matthews, Mark A; Syvanen, Michael
2018-05-04
Xylella fastidiosa (Xf) is a gram negative bacterium inhabiting the plant vascular system. In most species this bacterium lives as a benign symbiote, but in several agriculturally important plants (e.g. coffee, citrus, grapevine) Xf is pathogenic. Xf has four loci encoding homologues to hemolysin RTX proteins, virulence factors involved in a wide range of plant pathogen interactions. We show that all four genes are expressed during pathogenesis in grapevine. The sequences from these four genes have a complex repetitive structure. At the C-termini, sequence diversity between strains is what would be expected from orthologous genes. However, within strains there is no N-terminal homology, indicating these loci encode RTXs of different functions and/or specificities. More striking is that many of the orthologous loci between strains share this extreme variation at the N-termini. Thus these RTX orthologues are most easily visualized as fusions between the orthologous C-termini and different N-termini. Further, the four genes are found in operons having a peculiar structure with an extensively duplicated module encoding a small protein with homology to the N-terminal region of the full length RTX. Surprisingly, some of these small peptides are most similar not to their corresponding full length RTX, but to the N-termini of RTXs from other Xf strains, and even other remotely related species. These results demonstrate that these genes are expressed in planta during pathogenesis. Their structure suggests extensive evolutionary restructuring through horizontal gene transfers and heterologous recombination mechanisms. The sum of the evidence suggests these repetitive modules are a novel kind of mobile genetic element.
Mining biological databases for candidate disease genes
NASA Astrophysics Data System (ADS)
Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.
2001-07-01
The publicly funded effort to determine the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has to date produced more than 93% of the 3 billion nucleotides of the human genome in a preliminary 'draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of these data identified approximately 30,000 - 40,000 human 'genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases of biological information that may contain data relevant to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high-speed cluster of Linux workstations is used to analyze sequences and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl Syndrome (BBS).
Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang; Zhu, Yong-Qiang
2013-08-01
To find and clone new antigen genes from the lambda-ZAP cDNA expression library of adult Clonorchis sinensis, and determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened by pooled sera of clonorchiasis patients. The sequences of the positive phage clones were compared with the sequences in EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing 2 and 3 times tandem repeat sequences were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeat sequence was respectively cloned into the prokaryotic expression vector pET28a (+), and then transformed into E. coli Rosetta DE3 cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind-resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was identified by ELISA. To evaluate the immunological diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis westermani patients and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, the pooled sera of clonorchiasis patients and healthy persons were analyzed for specific antibodies by ELISA with recombinant protein rCs22M-2r and rCs22M-3r containing the tandem repeat sequences. The full-length sequence of Cs22 antigen gene of C. sinensis was obtained. It contained 13 times tandem repeat sequences of EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that the protein (Cs22) belonged to GPI-anchored proteins family. The recombinant proteins rCs22-2r and rCs22-3r showed a certain level of immunogenicity. 
By ELISA with purified PrCs22-2r or PrCs22-3r as the coating antigen, the positive rate with both proteins was 45.7% (16/35) for sera of clonorchiasis patients and 3.2% (1/31) for sera of healthy persons. There was no cross reaction with sera of schistosomiasis or cysticercosis patients, and 1 of 15 sera from paragonimiasis westermani patients cross-reacted. The recombinant proteins rCs22M-2r and rCs22M-3r, which contained only tandem repeats, were specifically recognized by pooled sera of clonorchiasis patients. The Cs22 antigen gene of Clonorchis sinensis was obtained, and the recombinant proteins have some diagnostic value. The antigenic determinant is located in the tandem repeat sequences.
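The ELISA percentages quoted in the abstract above follow directly from the reported counts. The sketch below recomputes them; the `rate` helper is an illustrative name, and the paragonimiasis percentage is derived here from the reported 1/15 rather than stated in the abstract.

```python
def rate(positives: int, total: int) -> float:
    """Percentage of sera scored positive, rounded to one decimal place."""
    return round(100.0 * positives / total, 1)


# Counts taken from the abstract.
clonorchiasis = rate(16, 35)    # clonorchiasis patients
healthy = rate(1, 31)           # healthy individuals
paragonimiasis = rate(1, 15)    # cross reaction, derived from 1/15

print("clonorchiasis positive rate (%):", clonorchiasis)
print("healthy positive rate (%):", healthy)
print("paragonimiasis cross-reaction rate (%):", paragonimiasis)
```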
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu
2011-09-07
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D*2 and D(s)2 showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D*2 and D(s)2 by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu
2011-01-01
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
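For readers unfamiliar with the D2 statistic named in the abstract above, the following is a minimal sketch of its usual definition: the sum, over all words of length k, of the product of the word's counts in the two sequences. It does not implement the centered variants or the new local statistics proposed in the paper, and the function names and example sequences are illustrative assumptions.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """D2: sum over all k-words w of count_a(w) * count_b(w)."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Words absent from seq_b contribute 0 (Counter returns 0 for missing keys).
    return sum(n * counts_b[word] for word, n in counts_a.items())


# Toy sequences, chosen only to make the counts easy to follow.
score = d2("ACGTACGT", "ACGTTTTT", k=2)
print("D2 score:", score)
```

A larger value indicates more shared k-words, weighted by how often each occurs in both sequences.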
Waite, David W; Dsouza, Melissa; Biswas, Kristi; Ward, Darren F; Deines, Peter; Taylor, Michael W
2015-05-01
The endemic New Zealand weta is an enigmatic insect. Although the insect is well known by its distinctive name, considerable size, and morphology, many basic aspects of weta biology remain unknown. Here, we employed cultivation-independent enumeration techniques and rRNA gene sequencing to investigate the gut microbiota of the Auckland tree weta (Hemideina thoracica). Fluorescence in situ hybridisation performed on different sections of the gut revealed a bacterial community of fluctuating density, while rRNA gene-targeted amplicon pyrosequencing revealed the presence of a microbial community containing high bacterial diversity, but an apparent absence of archaea. Bacteria were further studied using full-length 16S rRNA gene sequences, with statistical testing of bacterial community membership against publicly available termite- and cockroach-derived sequences, revealing that the weta gut microbiota is similar to that of cockroaches. These data represent the first analysis of the weta microbiota and provide initial insights into the potential function of these microorganisms.
Xander: employing a novel method for efficient gene-targeted metagenomic assembly.
Wang, Qiong; Fish, Jordan A; Gilman, Mariah; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R
2015-01-01
Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.
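The de Bruijn graph mentioned in the abstract above can be illustrated with a short sketch. This is a generic textbook construction and not Xander's implementation; it omits the profile HMM weighting that the abstract describes, and the function name and toy reads are assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer found in a read adds an edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads.
graph = de_bruijn(["ACGTC", "CGTCA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Walking the edges from `AC` spells out the merged sequence of the two overlapping reads, which is the basic idea that graph-based assemblers build on.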
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena; Nair, Achuthsankar S.
2010-01-22
The subcellular location of a protein is useful information for determining its function, screening for drug candidates, vaccine design, annotation of gene products and selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is effectively used with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules, and prediction is made by a weighted voting system. Our system makes predictions with accuracies of 100, 82.47, and 88.81% for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence gives better accuracy than prediction based on features derived from the N-terminal alone. Considering the features as a distribution within the entire sequence brings out the underlying property distribution in greater detail and enhances the prediction accuracy.
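The weighted voting step described in the abstract above can be sketched as follows. The abstract does not give the module names, weights, or class labels, so everything named in this example is a hypothetical placeholder; only the idea of combining per-module predictions by weighted vote comes from the source.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: module name -> predicted label
    weights:     module name -> voting weight of that module
    """
    scores = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    return max(scores, key=scores.get)


# Hypothetical outputs of four modules for one protein.
predictions = {
    "module_1": "nucleus",
    "module_2": "cytoplasm",
    "module_3": "nucleus",
    "module_4": "cytoplasm",
}
# Hypothetical weights; in practice they might reflect each
# module's accuracy on held-out data.
weights = {"module_1": 0.3, "module_2": 0.2, "module_3": 0.2, "module_4": 0.25}

winner = weighted_vote(predictions, weights)
print("predicted location:", winner)
```

With these placeholder values the two modules voting for the nucleus carry more combined weight than the two voting for the cytoplasm, so a tie in raw votes is resolved by the weights.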
Miura, Toru; Kamikouchi, Azusa; Sawata, Miyuki; Takeuchi, Hideaki; Natori, Syunji; Kubo, Takeo; Matsumoto, Tadao
1999-01-01
Although “polymorphic castes” in social insects are well known as one of the most important phenomena of polyphenism, few studies of caste-specific gene expressions have been performed in social insects. To identify genes specifically expressed in the soldier caste of the Japanese damp-wood termite Hodotermopsis japonica, we employed the differential-display method using oligo(dT) and arbitrary primers, compared mRNA from the heads of mature soldiers and pseudergates (worker caste), and identified a clone (PCR product) 329 bp in length termed SOL1. Northern blot analysis showed that the SOL1 mRNA is about 1.0 kb in length and is expressed specifically in mature soldiers, but not in pseudergates, even in the presoldier induction by juvenile hormone analogue, suggesting that the product is specific for terminally differentiated soldiers. By using the method of 5′- and 3′-rapid amplification of cDNA ends, we isolated the full length of SOL1 cDNA, which contained an ORF with a putative signal peptide at the N terminus. The sequence showed no significant homology with any other known protein sequences. In situ hybridization analysis showed that SOL1 is expressed specifically in the mandibular glands. These results strongly suggest that the SOL1 gene encodes a secretory protein specifically synthesized in the mandibular glands of the soldiers. Histological observations revealed that the gland actually develops during the differentiation into the soldier caste. PMID:10570166
Molecular cloning and characterization of SoxB2 gene from Zhikong scallop Chlamys farreri
NASA Astrophysics Data System (ADS)
He, Yan; Bao, Zhenmin; Guo, Huihui; Zhang, Yueyue; Zhang, Lingling; Wang, Shi; Hu, Jingjie; Hu, Xiaoli
2013-11-01
The Sox proteins play critical roles during the development of animals, including sex determination and central nervous system development. In this study, the SoxB2 gene was cloned from a mollusk, the Zhikong scallop (Chlamys farreri), and characterized with respect to phylogeny and tissue distribution. The full-length cDNA and genomic DNA sequences of C. farreri SoxB2 (Cf SoxB2) were obtained by rapid amplification of cDNA ends and genome walking, respectively, using a partial cDNA fragment from the highly conserved DNA-binding domain, i.e., the High Mobility Group (HMG) box. The full-length cDNA sequence of Cf SoxB2 was 2 048 bp and encoded a protein of 268 amino acids. The genomic sequence was 5 551 bp in length with only one exon. Several conserved elements, such as the TATA-box, GC-box, CAAT-box, GATA-box, and Sox/sry-sex/testis-determining and related HMG box factors, were found in the promoter region. Furthermore, real-time quantitative reverse transcription PCR assays were carried out to assess the mRNA expression of Cf SoxB2 in different tissues. SoxB2 was highly expressed in the mantle, moderately in the digestive gland and gill, and weakly in the gonad, kidney and adductor muscle. In male and female gonads at different developmental stages of reproduction, the expression levels of Cf SoxB2 were similar. Considering the specific expression and roles of SoxB2 in other animals, in particular vertebrates, and the fact that there are many pallial nerves in the mantle, cerebral ganglia in the digestive gland and gill nerves in the gill, we propose a possible essential role in nervous tissue function for SoxB2 in C. farreri.
Maestre, Juan P; Rovira, Roger; Gamisans, Xavier; Kinney, Kerry A; Kirisits, Mary Jo; Lafuente, Javier; Gabriel, David
2009-01-01
The diversity and spatial distribution of bacteria in a lab-scale biotrickling filter treating high loads of hydrogen sulfide (H(2)S) were investigated. Diversity and community structure were studied by terminal-restriction fragment length polymorphism (T-RFLP). A 16S rRNA gene clone library was established. Near full-length 16S rRNA gene sequences were obtained, and clones were clustered into 24 operational taxonomic units (OTUs). Nearly 74% and 26% of the clones were affiliated with the phyla Proteobacteria and Bacteroidetes, respectively. Beta-, epsilon- and gamma-proteobacteria accounted for 15, 9 and 48%, respectively. Around 45% of the sequences retrieved were affiliated with bacteria of the sulfur cycle, including Thiothrix spp., Thiobacillus spp. and Sulfurimonas denitrificans. Sequences related to Thiothrix lacustris accounted for 38%. The rarefaction curve demonstrated that the clone library constructed is sufficient to describe the vast majority of the bacterial diversity of this reactor operating under strict conditions (2,000 ppm(v) of H(2)S). A spatial distribution of bacteria was found along the length of the reactor by means of the T-RFLP technique. Although aerobic species were predominant along the reactor, facultative anaerobes had a higher relative abundance in the inlet part of the reactor, where the sulfide to oxygen ratio is higher.
Genetic speciation of environmental Legionella isolates in Thailand.
Paveenkittiporn, Wantana; Dejsirilert, Surang; Kalambaheti, Thareerat
2012-10-01
Legionella-like organisms were isolated during 2003-2007 from various water resources by culturing on the selective medium Wadowsky-Yee-Okuda agar. The 256 isolates were identified as belonging to the genus Legionella based on detection of a 108 bp PCR product of the 5S rRNA gene, while identification as Legionella pneumophila was confirmed by PCR detection of a specific mip gene region of 168 bp. The 50 isolates identified as non-pneumophila were then subjected to DNA tree analysis based on a mip gene product of ~650 bp and rnpB gene products ranging from 304 to 354 bp. A phylogenetic tree was constructed to predict their species relative to the available database. Isolates whose speciation based on those two genes was inconclusive were then investigated using almost full-length 16S rRNA sequences. The isolates were assigned to 16 known Legionella species, and seven novel species were proposed based on their unique 16S rRNA sequences. Copyright © 2012 Elsevier B.V. All rights reserved.
Characterization of a New HIV-1 CRF01_AE/ CRF07_BC recombinant virus in Tianjin, China.
Zhou, Zhehua; Ma, Ping; Feng, Yi; Ou, Weidong; Qian, Jing; Gao, Liying; Zhang, Defa; Shao, Yiming; Wei, Min
2018-05-04
Human immunodeficiency virus (HIV) is notorious for its rapid evolution since its transmission from monkey to human. Currently, HIV comprises multiple subtypes, circulating recombinant forms (CRFs) and unique recombinant forms (URFs). Here, from an HIV-positive mother and her child in Tianjin, China, we identified a novel HIV-1 second-generation recombinant virus (TJ20170316 and TJ20170317) between CRF01_AE and CRF07_BC. Near full-length genomes were obtained from both samples, and they shared very similar sequences, except for some point mutations. Phylogenetic analyses of the near full-length genomes showed that they consist of a CRF01_AE backbone and partial CRF07_BC sequences. Recombinant Identification Program (RIP) and Simplot software identified four breakpoints in the gag, pol, vif and tat genes in TJ20170316, totally different from other reported CRFs and URFs. The emergence of such a URF in Tianjin, China, highlights the complexity of the HIV-1 epidemic, and more measures should be taken to prevent HIV transmission.
Zhao, Xing; Liang, Ai-Ping
2016-09-01
The first complete DNA sequence of the mitochondrial genome (mitogenome) of Leptobelus gazelle (Membracoidea: Hemiptera) is determined in this study. The circular molecule is 16,007 bp in its full length, which encodes a set of 37 genes, including 13 proteins, 2 ribosomal RNAs, 22 transfer RNAs, and contains an A + T-rich region (CR). The gene numbers, content, and organization of L. gazelle are similar to other typical metazoan mitogenomes. Twelve of the 13 PCGs are initiated with ATR methionine or ATT isoleucine codons, except the atp8 gene that uses the ATC isoleucine as start signal. Ten of the 13 PCGs have complete termination codons, either TAA (nine genes) or TAG (cytb). The remaining 3 PCGs (cox1, cox2 and nad5) have incomplete termination codons T (AA). All of the 22 tRNAs can be folded in the form of a typical clover-leaf structure. The complete mitogenome sequence data of L. gazelle is useful for the phylogenetic and biogeographic studies of the Membracoidea and Hemiptera.
Jang, Kuem Hee; Hwang, Ui Wook
2016-05-01
The complete mitogenome sequence of Martes flavigula, which is an endangered and endemic species in South Korea, was determined. The genome is 16,533 bp in length and its gene arrangement pattern, gene content, and gene organization are identical to those of martens. The control region is located between the tRNAPro and tRNAPhe genes and is 1087 bp in length. These mitogenome sequence data might play an important role in the preservation of genetic resources by allowing researchers to conduct phylogenetic and systematic analyses of Mustelidae.
Tian, Wenlan; Paudel, Dev
2017-01-01
Jatropha (Jatropha curcas L.) is an economically important species with a great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the transcriptome of jatropha and developed SSR and SNP markers from the transcriptome sequences. In total 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembling and clustering resulted in 115,611 uniquely assembled sequences (UASs) including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole set of UASs were fully annotated, out of which 59,903 (51.81%) were assigned with gene ontology (GO) term, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories in Kyoto Encyclopedia of Genes and Genome (KEGG) database, and it contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered with AG/CT as the most frequent (45.8%) SSR motif type. Further 38,693 SNPs were detected and 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
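The annotation percentages quoted in the abstract above can be checked directly from the reported counts. The short Python sketch below does only that arithmetic; the variable names are illustrative and nothing beyond the numbers in the abstract is assumed.

```python
# Counts taken from the abstract: total uniquely assembled sequences (UASs)
# and the number annotated in each resource.
TOTAL_UAS = 115_611
annotated = {"GO": 59_903, "KOG": 12_584, "KEGG": 8_822}


def percent_of_total(count, total=TOTAL_UAS):
    """Percentage of all UASs, rounded to two decimals as in the abstract."""
    return round(100 * count / total, 2)


percentages = {name: percent_of_total(n) for name, n in annotated.items()}
```

The computed values agree with the 51.81%, 10.88% and 7.63% stated in the abstract.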
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing
Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro
2017-01-01
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
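The idea of using spike-in reads to convert relative read counts into absolute abundances can be sketched as follows. This is a deliberately simplified illustration, not the calibration used in the paper: it assumes a single copies-per-read factor derived from the pooled spike-ins, and the function name, spike-in identifiers and numbers are all hypothetical.

```python
def estimate_absolute_abundance(taxon_reads, spike_reads, spike_copies_added):
    """Scale taxon read counts by a copies-per-read factor from the spike-ins.

    Assumption: one pooled scaling factor is adequate; a real analysis would
    more likely fit a calibration curve across the staggered spike-in levels.
    """
    total_spike_reads = sum(spike_reads.values())
    if total_spike_reads == 0:
        raise ValueError("no spike-in reads detected; cannot calibrate")
    copies_per_read = sum(spike_copies_added.values()) / total_spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


# Hypothetical sample: two spike-ins added at known copy numbers.
abundances = estimate_absolute_abundance(
    taxon_reads={"taxon_A": 50, "taxon_B": 200},
    spike_reads={"spike_1": 100, "spike_2": 900},
    spike_copies_added={"spike_1": 1000, "spike_2": 9000},
)
```

Here 10,000 added copies yield 1,000 spike-in reads, so each read is taken to represent 10 copies and the taxon counts are scaled accordingly.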
Subramaniam, R; Reinold, S; Molitor, E K; Douglas, C J
1993-01-01
A heterologous probe encoding phenylalanine ammonia-lyase (PAL) was used to identify PAL clones in cDNA libraries made with RNA from young leaf tissue of two Populus deltoides x P. trichocarpa F1 hybrid clones. Sequence analysis of a 2.4-kb cDNA confirmed its identity as a full-length PAL clone. The predicted amino acid sequence is conserved in comparison with that of PAL genes from several other plants. Southern blot analysis of poplar genomic DNA from parental and hybrid individuals, restriction site polymorphism in PAL cDNA clones, and sequence heterogeneity in the 3' ends of several cDNA clones suggested that PAL is encoded by at least two genes that can be distinguished by HindIII restriction site polymorphisms. Clones containing each type of PAL gene were isolated from a poplar genomic library. Analysis of the segregation of PAL-specific HindIII restriction fragment-length polymorphisms demonstrated the existence of two independently segregating PAL loci, one of which was mapped to a linkage group of the poplar genetic map. Developmentally regulated PAL expression in poplar was analyzed using RNA blots. Highest expression was observed in young stems, apical buds, and young leaves. Expression was lower in older stems and undetectable in mature leaves. Cellular localization of PAL expression by in situ hybridization showed very high levels of expression in subepidermal cells of leaves early during leaf development. In stems and petioles, expression was associated with subepidermal cells and vascular tissues. PMID:8108506
Ma, Yuyuan; Lv, Maomin; Xu, Shu; Wu, Jianmin; Tian, Kegong; Zhang, Jingang
2010-07-01
The existence of porcine endogenous retrovirus (PERV) hinders the use of pigs in clinical xenotransplantation to alleviate the shortage of human transplants. Chinese miniature pigs are potential organ donors for xenotransplantation in China. However, so far, adequate information on the molecular characteristics of PERV from Chinese miniature pigs has not been available. We describe here the cloning and characterization of full-length proviral DNA of PERV from inbred Chinese Wuzhishan miniature pigs (WZSP). Full-length nucleotide sequences of PERV-WZSP and other PERVs were aligned and a phylogenetic tree was constructed from deduced amino-acid sequences of env. The results demonstrated that the full-length proviral DNA of PERV-WZSP belongs to gammaretrovirus and shares high similarity with other PERVs. Sequence analysis also suggested that different patterns of LTR exist in the same porcine germ line and that partial PERV-C sequence may recombine with PERV-A sequence in the LTR. (c) 2008 Elsevier Ltd. All rights reserved.
Brizuela, Leonardo; Richardson, Aaron; Marsischky, Gerald; Labaer, Joshua
2002-01-01
Thanks to the results of the multiple completed and ongoing genome sequencing projects and to the newly available recombination-based cloning techniques, it is now possible to build gene repositories with no precedent in their composition, formatting, and potential. This new type of gene repository is necessary to address the challenges imposed by the post-genomic era, i.e., experimentation on a genome-wide scale. We are building the FLEXGene (Full Length EXpression-ready) repository. This unique resource will contain clones representing the complete ORFeome of different organisms, including Homo sapiens as well as several pathogens and model organisms. It will consist of a comprehensive, characterized (sequence-verified), and arrayed gene repository. This resource will allow full exploitation of the genomic information by enabling genome-wide scale experimentation at the level of functional/phenotypic assays as well as at the level of protein expression, purification, and analysis. Here we describe the rationale and construction of this resource and focus on the data obtained from the Saccharomyces cerevisiae project.
Chou, A; Burke, J
1999-05-01
DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
Equid herpesvirus 9 (EHV-9) isolates from zebras in Ontario, Canada, 1989 to 2007.
Rebelo, Ana Rita; Carman, Susy; Shapiro, Jan; van Dreumel, Tony; Hazlett, Murray; Nagy, Éva
2015-04-01
The objective of this study was to identify and partially characterize 3 equid herpesviruses that were isolated postmortem from zebras in Ontario, Canada in 1989, 2002, and 2007. These 3 virus isolates were characterized by plaque morphology, restriction fragment length polymorphism (RFLP) of their genomic deoxyribonucleic acid (DNA), real-time polymerase chain reaction (PCR) assay, and sequence analyses of the full length of the glycoprotein G (gG) gene (ORF70) and a portion of the DNA polymerase gene (ORF30). The isolates were also compared to 3 reference strains of equid herpesvirus 1 (EHV-1). Using rabbit kidney cells, the plaques for the isolates from the zebras were found to be much larger in size than the EHV-1 reference strains. The RFLP patterns of the zebra viruses differed among each other and from those of the EHV-1 reference strains. Real-time PCR and sequence analysis of a portion of the DNA polymerase gene determined that the herpesvirus isolates from the zebras contained a G at nucleotide 2254 and a corresponding N at amino acid position 752, which suggested that they could be neuropathogenic EHV-1 strains. However, subsequent phylogenetic analysis of the gG gene suggested that they were EHV-9 and not EHV-1.
Castresana, C; Garcia-Luque, I; Alonso, E; Malik, V S; Cashmore, A R
1988-01-01
We have analyzed promoter regulatory elements from a photoregulated CAB gene (Cab-E) isolated from Nicotiana plumbaginifolia. These studies have been performed by introducing chimeric gene constructs into tobacco cells via Agrobacterium tumefaciens-mediated transformation. Expression studies on the regenerated transgenic plants have allowed us to characterize three positive and one negative cis-acting elements that influence photoregulated expression of the Cab-E gene. Within the upstream sequences we have identified two positive regulatory elements (PRE1 and PRE2) which confer maximum levels of photoregulated expression. These sequences contain multiple repeated elements related to the sequence -ACCGGCCCACTT-. We have also identified within the upstream region a negative regulatory element (NRE) extremely rich in AT sequences, which reduces the level of gene expression in the light. We have defined a light regulatory element (LRE) within the promoter region extending from -396 to -186 bp which confers photoregulated expression when fused to a constitutive nopaline synthase ('nos') promoter. Within this region there is a 132-bp element, extending from -368 to -234 bp, which on deletion from the Cab-E promoter reduces gene expression from high levels to undetectable levels. Finally, we have demonstrated for a full-length Cab-E promoter conferring high levels of photoregulated expression, that sequences proximal to the Cab-E TATA box are not replaceable by corresponding sequences from a 'nos' promoter. This contrasts with the apparent equivalence of these Cab-E and 'nos' TATA box-proximal sequences in truncated promoters conferring low levels of photoregulated expression. PMID:2901343
Hu, Beixia; Huang, Yanyan; He, Yefeng; Xu, Chuantian; Lu, Xishan; Zhang, Wei; Meng, Bin; Yan, Shigan; Zhang, Xiumei
2010-07-29
In order to determine the actual prevalence of avian influenza virus (AIV) and Newcastle disease virus (NDV) in ducks in Shandong province of China, extensive surveillance studies were carried out in the breeding ducks of an intensive farm from July 2007 to September 2008. Each month cloacal and tracheal swabs were taken from 30 randomly selected birds that appeared healthy. All of the swabs were negative for influenza A virus recovery, whereas 87.5% of tracheal swabs and 100% of cloacal swabs collected in September 2007 were positive for Newcastle disease virus isolation. Several NDV isolates were recovered from tracheal and cloacal swabs of apparently healthy ducks. All of the isolates were apathogenic as determined by the MDT and ICPI. The HN gene and the variable region of the F gene (nt 47-420) of four isolates selected at random were sequenced. A 374 bp region of the F gene and the full length of the HN gene were used for phylogenetic analysis. The four isolates were identified as the same isolate based on nucleotide sequence identities of 99.2-100%, displaying a closer phylogenetic relationship to lentogenic Class I viruses. There were 1.9-9.9% nucleotide differences between the isolates and other Class I viruses in the variable region of the F gene (nt 47-420), whereas there were 38.5-41.2% nucleotide differences between the isolates and Class II viruses. The amino acid sequences of the F protein cleavage sites in these isolates were 112-ERQERL-117. The full length of the HN gene of these isolates was 1851 bp, coding for 585 amino acids. The homology analysis of the nucleotide sequence of the HN gene indicated that there were 2.0-4.2% nucleotide differences between the isolates and other Class I viruses, whereas there were 29.5-40.9% differences between the isolates and Class II viruses. The results show that these isolates are not phylogenetically related to the vaccine strain (LaSota).
This study adds to the understanding of the ecology of influenza viruses and Newcastle disease viruses in ducks and emphasizes the need for constant surveillance in times of an ongoing and expanding epidemic of AIV and NDV. Copyright (c) 2010 Elsevier B.V. All rights reserved.
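Two coordinate statements in the abstract above (a 374 bp region spanning nt 47-420, and a six-residue cleavage site at positions 112-117) rely on inclusive, 1-based coordinates. The small Python sketch below simply checks that arithmetic; the helper name is illustrative.

```python
def inclusive_length(start, end):
    """Length of a 1-based, inclusive coordinate range such as nt 47-420."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# F gene variable region reported as nt 47-420.
f_region_bp = inclusive_length(47, 420)

# F protein cleavage site reported as 112-ERQERL-117.
cleavage_site = "ERQERL"
cleavage_span = inclusive_length(112, 117)
```

Both checks are consistent with the abstract: the range covers 374 positions, and the span 112-117 matches the six residues of the quoted motif.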
HLA-E regulatory and coding region variability and haplotypes in a Brazilian population sample.
Ramalho, Jaqueline; Veiga-Castelli, Luciana C; Donadi, Eduardo A; Mendes-Junior, Celso T; Castelli, Erick C
2017-11-01
The HLA-E gene is characterized by low but wide expression in different tissues. HLA-E is considered a conserved gene, being one of the least polymorphic class I HLA genes. The HLA-E molecule interacts with Natural Killer cell receptors and T lymphocyte receptors, and might activate or inhibit immune responses depending on the peptide associated with HLA-E and on which receptors HLA-E interacts with. Variable sites within the HLA-E regulatory and coding segments may influence the gene function by modifying its expression pattern or encoded molecule, thus influencing its interaction with receptors and the peptide. Here we propose an approach to evaluate the gene structure, haplotype pattern and the complete HLA-E variability, including regulatory (promoter and 3'UTR) and coding segments (with introns), by using massively parallel sequencing. We investigated the variability of 420 samples from a highly admixed population, Brazilians, by using this approach. Considering a segment of about 7 kb, 63 variable sites were detected, arranged into 75 extended haplotypes. We detected 37 different promoter sequences (but few frequent ones), 27 different coding sequences (15 representing new HLA-E alleles) and 12 haplotypes at the 3'UTR segment, two of them presenting a summed frequency of 90%. Despite the number of coding alleles, they encode mainly two different full-length molecules, known as E*01:01 and E*01:03, which correspond to about 90% of all. In addition, differently from what has been previously observed for other non-classical HLA genes, the relationship among the HLA-E promoter, coding and 3'UTR haplotypes is not straightforward because the same promoter and 3'UTR haplotypes were many times associated with different HLA-E coding haplotypes. These data reinforce the presence of only two main full-length HLA-E molecules encoded by the many HLA-E alleles detected in our population sample.
In addition, this data does indicate that the distal HLA-E promoter is by far the most variable segment. Further analyses involving the binding of transcription factors and non-coding RNAs, as well as the HLA-E expression in different tissues, are necessary to evaluate whether these variable sites at regulatory segments (or even at the coding sequence) may influence the gene expression profile. Copyright © 2017 Elsevier Ltd. All rights reserved.
An improved model for whole genome phylogenetic analysis by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2015-10-07
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the 2D mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metrics. The improved DFT method, with computational performance increased by the 2D numerical representation, is applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets.
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
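The pipeline described in the abstract above (numerical mapping, DFT power spectrum, scaling spectra to a common length, Euclidean distance) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' method: the 2D mapping used here (each nucleotide as a point in the plane, stored as a complex number) is a hypothetical stand-in, and simple linear interpolation stands in for the paper's even scaling algorithm, whose exact definition is not given in the abstract.

```python
import cmath
import math

# Hypothetical 2D mapping (assumption): one point in the plane per nucleotide.
NUCLEOTIDE_TO_POINT = {"A": 1 + 0j, "T": -1 + 0j, "C": 0 + 1j, "G": 0 - 1j}


def power_spectrum(seq):
    """Naive O(n^2) DFT power spectrum |X(k)|^2 of the mapped sequence."""
    x = [NUCLEOTIDE_TO_POINT[base] for base in seq.upper()]  # KeyError on N etc.
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch(spectrum, target_len):
    """Linearly interpolate a spectrum onto target_len evenly spaced points."""
    n = len(spectrum)
    if n == target_len:
        return list(spectrum)
    if n < 2:
        raise ValueError("cannot stretch a spectrum with fewer than 2 points")
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(spectrum[lo] * (1 - frac) + spectrum[hi] * frac)
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between power spectra scaled to the longer length."""
    pa, pb = power_spectrum(seq_a), power_spectrum(seq_b)
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pa, pb)))
```

The naive transform is quadratic in sequence length and is only meant for short examples; a whole-genome comparison would need a fast Fourier transform. A pairwise matrix of `dft_distance` values could then be passed to any hierarchical clustering routine to build a tree.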
Flanking sequence determination and event-specific detection of genetically modified wheat B73-6-1.
Xu, Junyi; Cao, Jijuan; Cao, Dongmei; Zhao, Tongtong; Huang, Xin; Zhang, Piqiao; Luan, Fengxia
2013-05-01
In order to establish a specific identification method for genetically modified (GM) wheat, exogenous insert DNA and flanking sequence between exogenous fragment and recombinant chromosome of GM wheat B73-6-1 were successfully acquired by means of conventional polymerase chain reaction (PCR) and thermal asymmetric interlaced (TAIL)-PCR strategies. Newly acquired exogenous fragment covered the full-length sequence of transformed genes such as transformed plasmid and corresponding functional genes including marker uidA, herbicide-resistant bar, ubiquitin promoter, and high-molecular-weight gluten subunit. The flanking sequence between insert DNA revealed high similarity with Triticum turgidum A gene (GenBank: AY494981.1). A specific PCR detection method for GM wheat B73-6-1 was established on the basis of primers designed according to the flanking sequence. This specific PCR method was validated by GM wheat, GM corn, GM soybean, GM rice, and non-GM wheat. The specifically amplified target band was observed only in GM wheat B73-6-1. This method is of high specificity, high reproducibility, rapid identification, and excellent accuracy for the identification of GM wheat B73-6-1.
Houtz, Robert L.
1998-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Houtz, Robert L.
1999-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Houtz, R.L.
1998-03-03
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 5 figs.
Houtz, R.L.
1999-02-02
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the ε-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 8 figs.
Sun, Jie; Li, Yuan-Li; Wang, Ruo-Hai; Xia, Gui-Xian
2004-01-01
Fluorescence differential display (FDD) technique was used to identify genes that are specifically or preferentially expressed in different developmental stages of cotton fiber cells. One hundred and nine differentially displayed cDNA fragments were isolated using 9, 21 and 27 DPA (days postanthesis) fibers as experimental materials. By a combination of two rounds of reverse Northern hybridization and Northern blot analyses, a number of such cDNA fragments were proved to represent fiber-specific/preferential genes. Sequencing determination and database searching indicated that most of these genes are novel. This work is an important step towards cloning the full-length cDNAs and characterizing the cellular functions of aforementioned genes in fiber development.
Gautier, Philippe; Loosli, Felix; Tay, Boon-Hui; Tay, Alice; Murdoch, Emma; Coutinho, Pedro; van Heyningen, Veronica; Brenner, Sydney; Venkatesh, Byrappa; Kleinjan, Dirk A.
2013-01-01
Pax6 is a developmental control gene essential for eye development throughout the animal kingdom. In addition, Pax6 plays key roles in other parts of the CNS, olfactory system, and pancreas. In mammals a single Pax6 gene encoding multiple isoforms delivers these pleiotropic functions. Here we provide evidence that the genomes of many other vertebrate species contain multiple Pax6 loci. We sequenced Pax6-containing BACs from the cartilaginous elephant shark (Callorhinchus milii) and found two distinct Pax6 loci. Pax6.1 is highly similar to mammalian Pax6, while Pax6.2 encodes a paired-less Pax6. Using synteny relationships, we identify homologs of this novel paired-less Pax6.2 gene in lizard and in frog, as well as in zebrafish and in other teleosts. In zebrafish two full-length Pax6 duplicates were known previously, originating from the fish-specific genome duplication (FSGD) and expressed in divergent patterns due to paralog-specific loss of cis-elements. We show that teleosts other than zebrafish also maintain duplicate full-length Pax6 loci, but differences in gene and regulatory domain structure suggest that these Pax6 paralogs originate from a more ancient duplication event and are hence renamed as Pax6.3. Sequence comparisons between mammalian and elephant shark Pax6.1 loci highlight the presence of short- and long-range conserved noncoding elements (CNEs). Functional analysis demonstrates the ancient role of long-range enhancers for Pax6 transcription. We show that the paired-less Pax6.2 ortholog in zebrafish is expressed specifically in the developing retina. Transgenic analysis of elephant shark and zebrafish Pax6.2 CNEs with homology to the mouse NRE/Pα internal promoter revealed highly specific retinal expression. Finally, morpholino depletion of zebrafish Pax6.2 resulted in a “small eye” phenotype, supporting a role in retinal development. 
In summary, our study reveals that the pleiotropic functions of Pax6 in vertebrates are served by a divergent family of Pax6 genes, forged by ancient duplication events and by independent, lineage-specific gene losses. PMID:23359656
An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.
Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J
2014-11-01
The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of the HLA class I alleles are defined by exons 2 and 3 sequences only. For routine application on clinical samples we improved and validated the HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full length sequencing. Locus-specific HLA typing with RNA SBT of a reference panel, representing the major antigen groups, showed identical results compared to DNA SBT typing. Alleles encountered with unknown exons in the IMGT/HLA database and three samples, two with Null and one with a Low expressed allele, have been addressed by the group-specific RNA SBT approach to obtain full length coding sequences. This RNA SBT approach has proven its value in our routine full length definition of alleles. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami
2018-01-19
Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis, is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies have yielded draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome-wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publicly available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers working on rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .
Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop
2012-01-01
Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many projects are currently underway, leading to the accumulation of voluminous genomic and expressed sequence tag (EST) sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. 25,495 EST sequences were examined and assembled to get full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motif coded for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function. PMID:22368382
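The SSR mining step described in this abstract can be illustrated with a minimal sketch. The function name `find_ssrs` and the minimum repeat thresholds below are hypothetical choices for illustration; the abstract does not state which tool or thresholds the authors used.

```python
import re

# Minimum number of repeat units per motif length (1 = mono ... 6 = hexa).
# These thresholds are illustrative assumptions, not values from the study.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so a mononucleotide run is not also reported as a dinucleotide SSR.
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence with one mono-, one di- and one trinucleotide repeat.
example = "GGC" + "A" * 12 + "CGC" + "AT" * 7 + "GCC" + "GAA" * 5 + "TTC"
ssrs = find_ssrs(example)
```

On the toy sequence this reports an A run, an AT repeat and a GAA repeat; primer design for the flanking regions would be a separate step.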
Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming
2017-06-01
HIV-1 transmission in intravenous drug users (IDUs) has been characterized by high genetic multiplicity, which suggests a greater challenge for blocking HIV-1 infection. We investigated a total of 749 sequences of the full-length gp160 gene obtained by single genome sequencing (SGS) from 22 HIV-1 early infected IDUs in Xinjiang province, northwest China, and generated a transmitted and founder virus (T/F virus) consensus sequence (IDU.CON). The T/F virus was classified as subtype CRF07_BC and predicted to be a CCR5-tropic virus. The variable regions (V1, V2, and V4 loops) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were discovered in the IDU.CON sequence, fewer than in MSM.CON and HSX.CON. Characterization of T/F virus from IDUs highlights the genetic make-up and complexity of virus near the moment of transmission or in early infection preceding systemic dissemination and is important for the development of effective HIV-1 preventive methods, including vaccines.
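Counting potential N-linked glycosylation sites, as reported above, is commonly done by scanning a protein sequence for the N-X-S/T sequon, where X is any residue except proline. The sketch below assumes that convention; the function name `nglyc_sites` and the toy peptide are hypothetical, and the abstract does not say which tool the authors used.

```python
import re

# N-X-[S/T] sequon with X != P; the lookahead lets overlapping sites be found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def nglyc_sites(protein):
    """Return 0-based positions of the Asn in each potential N-glycosylation sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


# Toy peptide (not a real gp160 fragment): NNT and NTS overlap, NPS is excluded.
sites = nglyc_sites("MNNTSANPSNGTK")
```

A sequon match only marks a potential site; whether a site is actually glycosylated cannot be read from the sequence alone.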
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion
Padhukasahasram, Badri; Rannala, Bruce
2011-01-01
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms, and its different forms, crossing over and gene conversion, both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Dubé, Marie-Pier; Castonguay, Yves; Cloutier, Jean; Michaud, Josée; Bertrand, Annick
2013-03-01
Dehydrins define a complex family of intrinsically disordered proteins with potential adaptive value with regard to freeze-induced cell dehydration. A search within an expressed sequence tag library from cDNAs of cold-acclimated crowns of alfalfa (Medicago sativa spp. sativa L.) identified transcripts putatively encoding K(3)-type dehydrins. Analysis of full-length coding sequences unveiled two highly homologous sequence variants, K(3)-A and K(3)-B. An increase in the frequency of genotypes yielding positive genomic amplification of the K(3)-dehydrin variants in response to selection for superior tolerance to freezing, together with the induction of their expression at low temperature, strongly supports a link with cold adaptation. The presence of multiple allelic forms within single genotypes and independent segregation indicate that the two K(3) dehydrin variants are encoded by distinct genes located at unlinked loci. The co-inheritance of the K(3)-A dehydrin with a Y(2)K(4) dehydrin restriction fragment length polymorphism with a demonstrated impact on freezing tolerance suggests the presence of a genome domain where these functionally related genes are located. These results provide additional evidence that dehydrins play important roles with regard to tolerance to subfreezing temperatures. They also underscore the value of recurrent selection to help identify variants within a large multigene family in allopolyploid species like alfalfa.
Sequence analysis of the DBL2β domain of the var gene of Indonesian Plasmodium falciparum
NASA Astrophysics Data System (ADS)
Sulistyaningsih, E.; Romadhon, B. D.; Palupi, I.; Hidayah, F.; Dewi, R.; Prasetyo, A.
2018-03-01
Malaria is a major health problem in tropical countries including Indonesia. The most deadly agent is Plasmodium falciparum. In P. falciparum infection, PfEMP1 is thought to play an important role in the pathogenesis of malaria. PfEMP1 is encoded by the var gene family; it is a polymorphic protein whose extracellular portion contains three distinct binding domains: Duffy binding-like (DBL), cysteine-rich interdomain regions (CIDR) and C2. PfEMP1 varies in domain composition and binding specificity. The study explored the characteristics of Indonesian DBL2β-var genes and investigated their role in malaria outcome. Twenty blood samples from patients with clinically mild to severe malaria in Jember, East Java were collected for DNA extraction. Diagnosis was confirmed by Giemsa-stained thick blood smear. PCR was conducted using specific primers targeting the full length of DBL2β and yielded a single band of approximately 1.7 kb in a sample. This band was observed only in a severe malaria sample. Sequence analysis directly from the PCR product showed 74-99% similarity with previous sequences in GenBank. In conclusion, the DBL2β domain of the var gene of Indonesian isolates was 1603 nucleotides in length, and there was a possible association between the presence of the DBL2β domain and the severity of malaria outcome.
Large-scale collection of full-length cDNA and transcriptome analysis in Hevea brasiliensis
Makita, Yuko; Ng, Kiaw Kiaw; Veera Singham, G.; Kawashima, Mika; Hirakawa, Hideki; Sato, Shusei
2017-01-01
Natural rubber has unique physical properties that cannot be replaced by products from other latex-producing plants or petrochemically produced synthetic rubbers. Rubber from Hevea brasiliensis is the main commercial source for this natural rubber that has a cis-polyisoprene configuration. For sustainable production of enough rubber to meet demand, elucidation of the molecular mechanisms involved in the production of latex is vital. To this end, we first constructed rubber full-length cDNA libraries of the RRIM 600 cultivar and sequenced around 20,000 clones by the Sanger method and over 15,000 contigs by Illumina sequencer. With these data, we updated around 5,500 gene structures and newly annotated around 9,500 transcription start sites. Second, to elucidate the rubber biosynthetic pathways and their transcriptional regulation, we carried out tissue- and cultivar-specific RNA-Seq analysis. By using our recently published genome sequence, we confirmed the expression patterns of the rubber biosynthetic genes. Our data suggest that the cytoplasmic mevalonate (MVA) pathway is the main route for isoprenoid biosynthesis in latex production. In addition to the well-studied polymerization factors, we suggest that rubber elongation factor 8 (REF8) is a candidate factor in cis-polyisoprene biosynthesis. We have also identified 39 transcription factors that may be key regulators in latex production. Expression profile analysis using two additional cultivars, RRIM 901 and PB 350, via an RNA-Seq approach revealed possible expression differences between a high latex-yielding cultivar and a disease-resistant cultivar. PMID:28431015
Structure, organization and expression of common carp (Cyprinus carpio L.) SLP-76 gene.
Huang, Rong; Sun, Xiao-Feng; Hu, Wei; Wang, Ya-Ping; Guo, Qiong-Lin
2008-05-01
SLP-76 is an important member of the SLP-76 family of adapters, and it plays a key role in TCR signaling and T cell function. A partial cDNA sequence of SLP-76 of common carp (Cyprinus carpio L.) was isolated from a thymus cDNA library by the method of suppression subtractive hybridization (SSH). Subsequently, the full-length cDNA of carp SLP-76 was obtained by means of 3' RACE and 5' RACE. The full-length cDNA of carp SLP-76 was 2007 bp, consisting of a 5'-terminal untranslated region (UTR) of 285 bp, a 3'-terminal UTR of 240 bp, and an open reading frame of 1482 bp. Sequence comparison showed that the deduced amino acid sequence of carp SLP-76 had an overall similarity of 34-73% to that of homologues from other species, and it was composed of an NH2-terminal domain, a central proline-rich domain, and a C-terminal SH2 domain. Amino acid sequence analysis indicated the existence of a Gads binding site R-X-X-K, a 10-aa-long sequence which binds to the SH3 domain of LCK in vitro, and three conserved tyrosine-containing sequences in the NH2-terminal domain. We then used PCR to obtain a genomic DNA which covers the entire coding region of carp SLP-76. In the 9.2-kb-long genomic sequence, twenty-one exons and twenty introns were identified. RT-PCR results showed that carp SLP-76 was expressed predominantly in hematopoietic tissues, and was upregulated in thymus tissue of four-month-old carp compared to one-year-old carp. RT-PCR and virtual northern hybridization results showed that carp SLP-76 was also upregulated in thymus tissue of GH transgenic carp at the age of four months. These results suggest that the expression level of the SLP-76 gene may be related to thymocyte development in teleosts.
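The lengths reported for the carp SLP-76 cDNA can be cross-checked with simple arithmetic: the 5' UTR, open reading frame and 3' UTR should sum to the full length, and the ORF should be a multiple of three. The helper below, `check_cdna_layout`, is a hypothetical name used only for this sketch.

```python
def check_cdna_layout(total_bp, utr5_bp, orf_bp, utr3_bp):
    """Check that reported region lengths are mutually consistent."""
    return {
        # 5' UTR + ORF + 3' UTR should equal the reported full length.
        "consistent": utr5_bp + orf_bp + utr3_bp == total_bp,
        # An ORF must contain a whole number of codons.
        "in_frame": orf_bp % 3 == 0,
        # Codon count; whether this includes the stop codon depends on how
        # the ORF length was reported, which the abstract does not state.
        "codons": orf_bp // 3,
    }


# Values taken from the abstract: 2007 bp total, 285 bp 5' UTR,
# 1482 bp ORF, 240 bp 3' UTR.
result = check_cdna_layout(2007, 285, 1482, 240)
```

For these values, 285 + 1482 + 240 = 2007 and 1482 / 3 = 494, so the reported figures agree with one another.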
Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R
2007-04-01
We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of each complete viral genome for EBLV-1 and EBLV-2 were 11 966 and 11 930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.
Mahesh, Venkataramaiah; Rakotomalala, Jean Jacques; Le Gal, Lénaïg; Vigne, Hélène; de Kochko, Alexandre; Hamon, Serge; Noirot, Michel; Campa, Claudine
2006-09-01
Biosynthesis of caffeoylquinic acids occurs via the phenylpropanoid pathway in which the phenylalanine ammonia-lyase (PAL) acts as a key-control enzyme. A full-length cDNA (pF6), corresponding to a PAL gene (CcPAL1), was isolated by screening a Coffea canephora fruit cDNA library and its corresponding genomic sequence was characterized. Amplification of total DNA from seven Coffea species revealed differences in intronic length. This interspecific polymorphism was used to locate the gene on a genetic map established for a backcross progeny between Coffea pseudozanguebariae and C. dewevrei. The CcPAL1 gene was found on the same linkage group, but genetically independent, as a caffeoyl-coenzyme A-O-methyltransferase gene, another gene intervening in the phenylpropanoid pathway. In the same backcross, a lower caffeoylquinic acid content was observed in seeds harvested from plants harbouring the C. pseudozanguebariae CcPAL1 allele. Involvement of the CcPAL1 allelic form in the differential accumulation of caffeoylquinic acids in coffee green beans is then discussed.
Perkins, J B; Bower, S; Howitt, C L; Yocum, R R; Pero, J
1996-01-01
Northern (RNA) blot analysis of the Bacillus subtilis biotin operon, bioWAFDBIorf2, detected at least two steady-state polycistronic transcripts initiated from a putative vegetative (Pbio) promoter that precedes the operon, i.e., a full-length 7.2-kb transcript covering the entire operon and a more abundant 5.1-kb transcript covering just the first five genes of the operon. Biotin and the B. subtilis birA gene product regulated synthesis of the transcripts. Moreover, replacing the putative Pbio promoter and regulatory sequence with a constitutive SP01 phage promoter resulted in higher-level constitutive synthesis. Removal of a rho-independent terminator-like sequence located between the fifth (bioB) and sixth (bioI) genes prevented accumulation of the 5.1-kb transcript, suggesting that the putative terminator functions to limit expression of bioI, which is thought to be involved in an early step in biotin synthesis. PMID:8892842
Perkins, J B; Bower, S; Howitt, C L; Yocum, R R; Pero, J
1996-11-01
Northern (RNA) blot analysis of the Bacillus subtilis biotin operon, bioWAFDBIorf2, detected at least two steady-state polycistronic transcripts initiated from a putative vegetative (Pbio) promoter that precedes the operon, i.e., a full-length 7.2-kb transcript covering the entire operon and a more abundant 5.1-kb transcript covering just the first five genes of the operon. Biotin and the B. subtilis birA gene product regulated synthesis of the transcripts. Moreover, replacing the putative Pbio promoter and regulatory sequence with a constitutive SP01 phage promoter resulted in higher-level constitutive synthesis. Removal of a rho-independent terminator-like sequence located between the fifth (bioB) and sixth (bioI) genes prevented accumulation of the 5.1-kb transcript, suggesting that the putative terminator functions to limit expression of bioI, which is thought to be involved in an early step in biotin synthesis.
Phylogenetic analysis of canine distemper virus in domestic dogs in Nanjing, China.
Bi, Zhenwei; Wang, Yongshan; Wang, Xiaoli; Xia, Xingxia
2015-02-01
Canine distemper virus (CDV) infects a broad range of carnivores, including wild and domestic Canidae. The hemagglutinin gene, which encodes the attachment protein that determines viral tropism, has been widely used to determine the relationship between CDV strains of different lineages circulating worldwide. We determined the full-length H gene sequences of seven CDV field strains detected in domestic dogs in Nanjing, China. A phylogenetic analysis of the H gene sequences of CDV strains from different geographic regions and vaccine strains was performed. Four of the seven CDV strains were grouped in the same cluster of the Asia-1 lineage to which the vast majority of Chinese CDV strains belong, whereas the other three were clustered within the Asia-4 lineage, which has never been detected in China. This represents the first record of detection of strains of the Asia-4 lineage in China since this lineage was reported in Thailand in 2013.
High-resolution phylogenetic microbial community profiling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.
High-resolution phylogenetic microbial community profiling
Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin; ...
2016-02-09
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structures at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.
Analysis of the Citrullus colocynthis Transcriptome during Water Deficit Stress
Wang, Zhuoyu; Hu, Hongtao; Goertzen, Leslie R.; McElroy, J. Scott; Dane, Fenny
2014-01-01
Citrullus colocynthis is a very drought-tolerant species, closely related to watermelon (C. lanatus var. lanatus), an economically important cucurbit crop. Drought is a threat to plant growth and development, and the discovery of drought-inducible genes with various functions is of great importance. We used high-throughput mRNA Illumina sequencing technology and bioinformatic strategies to analyze the C. colocynthis leaf transcriptome under drought treatment. Leaf samples at four different time points (0, 24, 36, or 48 hours of withholding water) were used for RNA extraction and Illumina sequencing. qRT-PCR of several drought-responsive genes was performed to confirm the accuracy of RNA sequencing. Leaf transcriptome analysis provided the first glimpse of the drought-responsive transcriptome of this unique cucurbit species. A total of 5038 full-length cDNAs were detected, with 2545 genes showing significant changes during drought stress. Principal component analysis indicated that drought was the major contributing factor regulating transcriptome changes. Upregulation of many transcription factors, stress signaling factors, detoxification genes, and genes involved in phytohormone signaling and citrulline metabolism occurred under the water deficit conditions. The C. colocynthis transcriptome data highlight the activation of a large set of drought-related genes in this species, thus providing a valuable resource for future functional analysis of candidate genes in defense against drought stress. PMID:25118696
Pelnena, Dita; Burnyte, Birute; Jankevics, Eriks; Lace, Baiba; Dagyte, Evelina; Grigalioniene, Kristina; Utkus, Algirdas; Krumina, Zita; Rozentale, Jolanta; Adomaitiene, Irina; Stavusis, Janis; Pliss, Liana; Inashkina, Inna
2017-12-12
The most common mitochondrial disorder in children is Leigh syndrome, which is a progressive and genetically heterogeneous neurodegenerative disorder caused by mutations in nuclear genes or mitochondrial DNA (mtDNA). In the present study, a novel and robust method of complete mtDNA sequencing, which allows amplification of the whole mitochondrial genome, was tested. Complete mtDNA sequencing was performed in a cohort of patients with suspected mitochondrial mutations. Patients from Latvia and Lithuania (n = 92 and n = 57, respectively) referred by clinical geneticists were included. The de novo point mutations m.9185T>C and m.13513G>A, respectively, were detected in two patients with lactic acidosis and neurodegenerative lesions. In one patient with neurodegenerative lesions, the mutation m.9185T>C was identified. These mutations are associated with Leigh syndrome. The present data suggest that full-length mtDNA sequencing is recommended as a supplement to nuclear gene testing and enzymatic assays to enhance mitochondrial disease diagnostics.
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer-walking sequencing methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Creager, Hannah M; Becker, Ericka A; Sandman, Kelly K; Karl, Julie A; Lank, Simon M; Bimber, Benjamin N; Wiseman, Roger W; Hughes, Austin L; O'Connor, Shelby L; O'Connor, David H
2011-09-01
In recent years, the use of cynomolgus macaques in biomedical research has increased greatly. However, with the exception of the Mauritian population, knowledge of the MHC class II genetics of the species remains limited. Here, using cDNA cloning and Sanger sequencing, we identified 127 full-length MHC class II alleles in a group of 12 Indonesian and 12 Vietnamese cynomolgus macaques. Forty-two of these were completely novel to cynomolgus macaques, while 61 extended the sequence of previously identified alleles from partial to full length. This more than doubles the number of full-length cynomolgus macaque MHC class II alleles available in GenBank, significantly expanding the allele library for the species and laying the groundwork for future evolutionary and functional studies.
Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai
2017-06-01
Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Diray-Arce, Joann; Liu, Bin; Cupp, John D; Hunt, Travis; Nielsen, Brent L
2013-03-04
The Arabidopsis thaliana genome encodes a homologue of the full-length bacteriophage T7 gp4 protein, which is also homologous to the eukaryotic Twinkle protein. While the phage protein has both DNA primase and DNA helicase activities, in animal cells Twinkle is localized to mitochondria and has only DNA helicase activity due to sequence changes in the DNA primase domain. However, Arabidopsis and other plant Twinkle homologues retain sequence homology for both functional domains of the phage protein. The Arabidopsis Twinkle homologue has been shown by others to be dual targeted to mitochondria and chloroplasts. To determine the functional activity of the Arabidopsis protein we obtained the gene for the full-length Arabidopsis protein and expressed it in bacteria. The purified protein was shown to have both DNA primase and DNA helicase activities. Western blot and qRT-PCR analysis indicated that the Arabidopsis gene is expressed most abundantly in young leaves and shoot apex tissue, as expected if this protein plays a role in organelle DNA replication. This expression is closely correlated with the expression of organelle-localized DNA polymerase in the same tissues. Homologues from other plant species show close similarity by phylogenetic analysis. The results presented here indicate that the Arabidopsis phage T7 gp4/Twinkle homologue has both DNA primase and DNA helicase activities and may provide these functions for organelle DNA replication.
Enzmann, P.-J.; Kurath, G.; Fichtner, D.; Bergmann, S.M.
2005-01-01
Infectious hematopoietic necrosis virus (IHNV) was first detected in Europe in 1987 in France and Italy, and later, in 1992, in Germany. The source of the virus and the route of introduction are unknown. The present study investigates the molecular epidemiology of IHNV outbreaks in Germany since its first introduction. The complete nucleotide sequences of the glycoprotein (G) and non-virion (NV) genes from 9 IHNV isolates from Germany have been determined, and this has allowed the identification of characteristic differences between these isolates. Phylogenetic analysis of partial G gene sequences (mid-G, 303 nucleotides) from North American IHNV isolates (Kurath et al. 2003) has revealed 3 major genogroups, designated U, M and L. Using this gene region with 2 different North American IHNV data sets, it was possible to group the European IHNV strains within the M genogroup, but not in any previously defined subgroup. Analysis of the full-length G gene sequences indicated that an independent evolution of IHN viruses had occurred in Europe. IHN viruses in Europe seem to be of a monophyletic origin, again most closely related to North American isolates in the M genogroup. Analysis of the NV gene sequences also showed the European isolates to be monophyletic, but resolution of the 3 genogroups was poor with this gene region. As a result of comparative sequence analyses, several different genotypes have been identified circulating in Europe. © Inter-Research 2005.
Molecular cloning of pepsinogens A and C from adult newt (Cynops pyrrhogaster) stomach.
Inokuchi, Tomofumi; Ikuzawa, Masayuki; Yamazaki, Shin; Watanabe, Yukari; Shiota, Koushiro; Katoh, Takuma; Kobayashi, Ken-Ichiro
2013-08-01
The full-length cDNAs of three pepsinogens (Pgs) were cloned from the stomach of the newt, Cynops pyrrhogaster, and their nucleotide sequences were determined. Molecular phylogenetic analysis showed that two Pgs, named PgC1 and PgC2, belong to the pepsinogen C group, and one Pg, named PgA, belongs to the pepsinogen A group. The sequences contain an open reading frame (ORF) encoding 385 amino acid residues for PgC1, 383 amino acid residues for PgC2 and 377 amino acid residues for PgA. In addition, all three amino acid sequences conserve characteristic features such as six cysteine residues and two putative active-site aspartic acid residues. All of the pepsinogen mRNAs were detected in the stomach by RT-PCR but not in other organs. Although the three pepsinogen genes differed slightly in the time at which expression started, all of them were expressed in the larval stage after hatching. This is the first report on cloning of pepsinogens from urodele stomach. Copyright © 2013 Elsevier Inc. All rights reserved.
[Construction of the superantigen SEA transfected laryngocarcinoma cells].
Ji, Xiaobin; Jingli, J V; Liu, Qicai; Xie, Jinghua
2013-04-01
To construct a eukaryotic expression vector containing the superantigen staphylococcal enterotoxin A (SEA) gene, and to identify its expression in laryngeal squamous carcinoma cells. The full-length SEA gene fragment was obtained from the genome of Staphylococcus strain ATCC13565, a standard SEA-producing reference strain. The coding sequence of SEA was artificially synthesized. Then, the SEA gene fragment was subcloned into the eukaryotic expression vector pIRES-EGFP. The recombinant plasmid pSEA-IRES-EGFP was constructed and transfected into laryngocarcinoma Hep-2 cells. Resistant clones were screened with G418. The expression of SEA in laryngocarcinoma cells was identified by ELISA and RT-PCR. The artificially synthesized SEA gene was subcloned into the eukaryotic expression vector pIRES-EGFP. Flanking sequence analysis confirmed that the SEA sequence was fully identical to the coding sequence of the standard Staphylococcus strain ATCC13565 in GenBank. After the recombinant plasmid was transfected into laryngocarcinoma cells, resistant clones were obtained after screening for two weeks. The clones were selected. The specific gene fragment was obtained by RT-PCR amplification. ELISA confirmed that the content of SEA protein in the cell culture supernatant had reached about the pg level. The recombinant eukaryotic expression vector containing the superantigen SEA gene was successfully constructed, and is capable of effective expression and continued secretion of SEA protein in laryngocarcinoma Hep-2 cells after transfection.
Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz
2014-01-01
The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed dispersing fruit eaters, and has large economical relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., ‘Regina’), a fruit crop species with little public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop. PMID:26504533
Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai
2016-05-01
In this study, we sequenced the complete mitochondrial genome of a cardiomyopathic Syrian hamster (Mesocricetus auratus), a heart failure model, for the first time. The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Hingston, Patricia; Chen, Jessica; Dhillon, Bhavjinder K.; Laing, Chad; Bertelli, Claire; Gannon, Victor; Tasara, Taurai; Allen, Kevin; Brinkman, Fiona S. L.; Truelstrup Hansen, Lisbeth; Wang, Siyun
2017-01-01
The human pathogen Listeria monocytogenes is a large concern in the food industry where its continuous detection in food products has caused a string of recalls in North America and Europe. Most recognized for its ability to grow in foods during refrigerated storage, L. monocytogenes can also tolerate several other food-related stresses with some strains possessing higher levels of tolerances than others. The objective of this study was to use a combination of phenotypic analyses and whole genome sequencing to elucidate potential relationships between L. monocytogenes genotypes and food-related stress tolerance phenotypes. To accomplish this, 166 L. monocytogenes isolates were sequenced and evaluated for their ability to grow in cold (4°C), salt (6% NaCl, 25°C), and acid (pH 5, 25°C) stress conditions as well as survive desiccation (33% RH, 20°C). The results revealed that the stress tolerance of L. monocytogenes is associated with serotype, clonal complex (CC), full length inlA profiles, and the presence of a plasmid which was identified in 55% of isolates. Isolates with full length inlA exhibited significantly (p < 0.001) enhanced cold tolerance relative to those harboring a premature stop codon (PMSC) in this gene. Similarly, isolates possessing a plasmid demonstrated significantly (p = 0.013) enhanced acid tolerance. We also identified nine new L. monocytogenes sequence types, a new inlA PMSC, and several connections between CCs and the presence/absence or variations of specific genetic elements. A whole genome single-nucleotide-variants phylogeny revealed sporadic distribution of tolerant isolates and closely related sensitive and tolerant isolates, highlighting that minor genetic differences can influence the stress tolerance of L. monocytogenes. Specifically, a number of cold and desiccation sensitive isolates contained PMSCs in σB regulator genes (rsbS, rsbU, rsbV). 
Collectively, the results suggest that knowing the sequence type of an isolate in addition to screening for the presence of full-length inlA and a plasmid, could help food processors and food agency investigators determine why certain isolates might be persisting in a food processing environment. Additionally, increased sequencing of L. monocytogenes isolates in combination with stress tolerance profiling, will enhance the ability to identify genetic elements associated with higher risk strains. PMID:28337186
Detailed Transcriptome Description of the Neglected Cestode Taenia multiceps
Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou
2012-01-01
Background The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. Methodology/Principal Findings We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and the new Trinity de novo assembler without a reference genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniProtKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcript analysis with other cestodes (Taenia pisiformis, Taenia solium, Echinococcus granulosus and Echinococcus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival and involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. Conclusions/Significance This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve further study.
This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and parasite-host interaction studies. PMID:23049872
[Cloning and characterization of Caveolin-1 gene in pigeon, Columba livia domestica].
Zhang, Ying; Yu, Jian-Feng; Yang, Li; Wang, Xing-Guo; Gu, Zhi-Liang
2010-10-01
Caveolins, a class of principal proteins forming the structure of caveolae in the plasmalemma, are encoded by the caveolin gene family, of which caveolin-1 is a member. In the present study, a full-length 2605 bp caveolin-1 cDNA sequence from Columba livia domestica, which included a complete 537 bp ORF encoding a putative peptide of 178 amino acids, was obtained using RT-PCR and RACE. The Columba livia domestica caveolin-1 CDS shared 80.1% - 93.4% homology with those of Bos taurus, Canis lupus familiaris, Gallus gallus and Rattus norvegicus. Meanwhile, the putative amino acid sequence of Columba livia domestica caveolin-1 shared 85.4% - 97.2% homology with those of the above species. Semi-quantitative RT-PCR revealed that caveolin-1 expression was detectable in all the Columba livia domestica tissues, and that the expression level was high in adipose, medium in various muscles, and low in liver. These results indicated that the caveolin-1 gene is potentially involved in some metabolic pathways in adipose and muscle.
Yadav, Narendra Singh; Rashmi, Deo; Singh, Dinkar; Agarwal, Pradeep K; Jha, Bhavanath
2012-02-01
Salicornia brachiata is an extremely salt-tolerant plant that grows luxuriantly in coastal areas. Previously we reported the isolation and characterization of ESTs from S. brachiata, including a large number of unknown gene sequences. Reverse Northern analysis showed upregulation and downregulation of a few unknown genes in response to salinity. Some of these unknown genes were made full length and their functional analysis is under way. In this study, we selected a novel unknown salt-inducible gene, SbSI-1 (Salicornia brachiata salt inducible-1), for functional validation. SbSI-1 (GenBank accession number JF 965339) was made full length and characterized in detail for its functional validation under desiccation and salinity. The SbSI-1 gene is 917 bp long and contains a 437 bp 3' UTR and a 480 bp ORF encoding a 159-amino-acid protein with an estimated molecular mass of 18.39 kDa and pI of 8.58. Real-time PCR analysis revealed high transcript expression under salt, desiccation, cold and heat stresses, with the maximum expression obtained under desiccation. The ORF of SbSI-1 was cloned into the pET28a vector and transformed into BL21 (DE3) E. coli cells. The SbSI-1 recombinant E. coli cells showed tolerance to desiccation and salinity stress compared with cells carrying the vector alone.
Ngernyuang, Nipaporn; Kobayashi, Isao; Promboon, Amornrat; Ratanapo, Sunanta; Tamura, Toshiki; Ngernsiri, Lertluk
2011-01-01
α-Amylase is a common enzyme for hydrolyzing starch. In the silkworm, Bombyx mori L. (Lepidoptera: Bombycidae), α-amylase is found in both digestive fluid and hemolymph. Here, the complete genomic sequence of the Amy gene encoding α-amylase from a local Thai silkworm, the Nanglai strain, was obtained. This gene was 7981 bp long with 9 exons. The full-length Amy cDNA sequence was 1749 bp containing a 1503 bp open reading frame. The ORF encoded 500 amino acid residues. The deduced protein showed 81–54% identity to other insect α-amylases and more than 50% identity to mammalian enzymes. Southern blot analysis revealed that in the Nanglai strain Amy is a single-copy gene. RT-PCR showed that Amy was transcribed only in the foregut. Transgenic B. mori also showed that the Amy promoter activates expression of the transgene only in the foregut. PMID:21529256
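The relationship between the ORF length and the residue count reported in the abstract above (1503 bp, 500 amino acids) can be checked with a short calculation. This is a minimal sketch, assuming the stated ORF length includes the terminal stop codon; the function name and the `includes_stop` flag are illustrative and do not come from the source.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an ORF of the given length in bp.

    Assumption (not stated in the abstract): when includes_stop is True,
    the final codon is a stop codon and encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Value from the abstract: a 1503 bp ORF.
amy_residues = residues_from_orf(1503)
```

Under that assumption, 1503 bp corresponds to 501 codons, one of which is the stop codon, which agrees with the 500 residues reported.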
Identification of a mouse synaptic glycoprotein gene in cultured neurons.
Yu, Albert Cheung-Hoi; Sun, Chun Xiao; Li, Qiang; Liu, Hua Dong; Wang, Chen Ran; Zhao, Guo Ping; Jin, Meilei; Lau, Lok Ting; Fung, Yin-Wan Wendy; Liu, Shuang
2005-10-01
Neuronal differentiation and aging are known to involve many genes, which may also be differentially expressed during these developmental processes. We have previously identified various differentially expressed gene transcripts from primary cultured cerebral cortical neurons using the technique of arbitrarily primed PCR (RAP-PCR). Among these transcripts, clone 0-2 was found to have high homology to rat and human synaptic glycoprotein. By in silico analysis using an EST database and the FACTURA software, the full-length sequence of 0-2 was assembled and the clone was named mouse synaptic glycoprotein homolog 2 (mSC2). DNA sequencing revealed that the mSC2 transcript is smaller than its human and rat homologs. RT-PCR indicated that mSC2 was expressed differentially at various culture days. The mSC2 gene was expressed in various tissues, with higher expression in brain, lung, and liver. The functions of mSC2 in neurons and other tissues remain elusive and will require more investigation.
Houtz, Robert L.
2001-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) from a plant which has a des(methyl) lysyl residue in the LS is disclosed. In addition, the full-length cDNA clones for Rubisco LSMT are disclosed. Transgenic plants and methods of producing same which have the Rubisco LSMT gene inserted into the DNA are also provided. Further, methods of inactivating the enzymatic activity of Rubisco LSMT are also disclosed.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
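The predicted non-polyadenylated mRNA length in the PGK abstract above follows from adding the untranslated stretches to the reading frame. The sketch below reproduces that arithmetic from the numbers given in the abstract; the constant and function names are illustrative, not taken from the source.

```python
# Values taken from the abstract; names are illustrative.
LEADER_NT = 36            # transcription start, nt upstream of the translational start
READING_FRAME_NT = 1251   # length of the reading frame in bp
TRAILER_NT_RANGE = (86, 93)  # transcription stop, nt downstream of the translational stop


def mrna_length_range(leader, reading_frame, trailer_range):
    """Return (min, max) transcript length without the poly(A) tail."""
    shortest, longest = trailer_range
    return (leader + reading_frame + shortest,
            leader + reading_frame + longest)


pgk_mrna_range = mrna_length_range(LEADER_NT, READING_FRAME_NT, TRAILER_NT_RANGE)
```

The result, 1373 to 1380 nucleotides, matches the range stated in the abstract; the difference from the observed length of about 1500 nucleotides is what the abstract attributes to polyadenylation.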
Immune-Related Transcriptome of Coptotermes formosanus Shiraki Workers: The Defense Mechanism
Hussain, Abid; Li, Yi-Feng; Cheng, Yu; Liu, Yang; Chen, Chuan-Cheng; Wen, Shuo-Yang
2013-01-01
Formosan subterranean termites, Coptotermes formosanus Shiraki, live socially in microbial-rich habitats. To understand the molecular mechanism by which termites combat pathogenic microbes, a full-length normalized cDNA library and four Suppression Subtractive Hybridization (SSH) libraries were constructed from termite workers infected with entomopathogenic fungi (Metarhizium anisopliae and Beauveria bassiana), Gram-positive Bacillus thuringiensis and Gram-negative Escherichia coli, and the libraries were analyzed. From the high quality normalized cDNA library, 439 immune-related sequences were identified. These sequences were categorized as pattern recognition receptors (47 sequences), signal modulators (52 sequences), signal transducers (137 sequences), effectors (39 sequences) and others (164 sequences). From the SSH libraries, 27, 17, 22 and 15 immune-related genes were identified from the libraries treated with M. anisopliae, B. bassiana, B. thuringiensis and E. coli, respectively. When the normalized cDNA library was compared with the SSH libraries, 37 immune-related clusters were found in common; 56 clusters were identified in the SSH libraries, and 259 were identified in the normalized cDNA library. The immune-related gene expression pattern was further investigated using quantitative real time PCR (qPCR). Important immune-related genes were characterized, and their potential functions were discussed based on the integrated analysis of the results. We suggest that normalized cDNA and SSH libraries enable the discovery of functional genes in the transcriptome. The results remarkably expand our knowledge about immune-inducible genes in C. formosanus Shiraki and enable the future development of novel control strategies for the management of Formosan subterranean termites. PMID:23874972
USDA-ARS's Scientific Manuscript database
The genetically engineered plum 'HoneySweet' (aka C5) has proven to be highly resistant to Plum pox virus (PPV) for over 10 years in field trials. The original vector used for transformation to develop 'HoneySweet' carried a single sense sequence of the full length PPV coat protein (ppv-cp) gene, y...
Zhu, Shengming; Wang, Yanping; Zheng, Hong; Cheng, Jingqiu; Lu, Yanrong; Zeng, Yangzhi; Wang, Yu; Wang, Zhu
2009-04-01
This study sought to clone the Chinese Banna minipig inbred-line (BMI) alpha1,3-galactosyltransferase (alpha1,3-GT) gene and construct its recombinant eukaryotic expression vector. Total RNA was isolated from BMI liver. Full-length cDNA of the alpha1,3-GT gene was amplified by RT-PCR and cloned into the pMD18-T vector for sequencing. Subsequently, the alpha1,3-GT gene was inserted into pEGFP-N1 to construct the eukaryotic expression vector pEGFP-N1-GT. The recombinant plasmid pEGFP-N1-GT was then transiently transfected into the human lung cancer cell line A549. The expression of alpha1,3-GT mRNA in transfected cells was detected by RT-PCR. Direct immunofluorescence with FITC-BS-IB4 lectin was performed to observe the alpha-Gal synthesis function of BMI alpha1,3-GT in transfected cells. The results showed that the full-length BMI alpha1,3-GT cDNA was 1116 bp. The BMI alpha1,3-GT cDNA sequence was highly homologous with those of mouse and bovine, and was exactly the same as the complete sequence of swine. pEGFP-N1-GT was confirmed by enzyme digestion and PCR. The expression of alpha1,3-GT mRNA was detected in A549 cells transfected with pEGFP-N1-GT. The expression of alpha-Gal was observed on the membrane of A549 cells transfected with pEGFP-N1-GT. Successful cloning of BMI alpha1,3-GT cDNA and construction of its eukaryotic expression vector have established a foundation for further research and application of BMI alpha1,3-GT in the fields of xenotransplantation and immunological therapy of cancer.
Community analysis of a full-scale anaerobic bioreactor treating paper mill wastewater.
Roest, Kees; Heilig, Hans G H J; Smidt, Hauke; de Vos, Willem M; Stams, Alfons J M; Akkermans, Antoon D L
2005-03-01
To get insight into the microbial community of an Upflow Anaerobic Sludge Blanket reactor treating paper mill wastewater, conventional microbiological methods were combined with 16S rRNA gene analyses. Particular attention was paid to microorganisms able to degrade propionate or butyrate in the presence or absence of sulphate. Serial enrichment dilutions allowed estimating the number of microorganisms per ml sludge that could use butyrate with or without sulphate (10^5), propionate without sulphate (10^6), or propionate and sulphate (10^8). Quantitative RNA dot-blot hybridisation indicated that Archaea were two times more abundant in the microbial community of anaerobic sludge than Bacteria. The microbial community composition was further characterised by 16S rRNA-gene-targeted Denaturing Gradient Gel Electrophoresis (DGGE) fingerprinting, and via cloning and sequencing of dominant amplicons from the bacterial and archaeal patterns. Most of the nearly full-length (approximately 1.45 kb) bacterial 16S rRNA gene sequences showed less than 97% similarity to sequences present in public databases, in contrast to the archaeal clones (approximately 1.3 kb) that were highly similar to known sequences. While Methanosaeta was found as the most abundant genus, Crenarchaeote relatives were also identified. The microbial community was relatively stable over a period of 3 years (samples taken in July 1999, May 2001, March 2002 and June 2002) as indicated by the high similarity index calculated from DGGE profiles (81.9 ± 2.7% for Bacteria and 75.1 ± 3.1% for Archaea). 16S rRNA gene sequence analysis indicated the presence of unknown and yet uncultured microorganisms, but also showed that known sulphate-reducing bacteria and syntrophic fatty acid-oxidising microorganisms dominated the enrichments.
Trojan, Daniela; Schreiber, Lars; Bjerg, Jesper T; Bøggild, Andreas; Yang, Tingting; Kjeldsen, Kasper U; Schramm, Andreas
2016-07-01
Cable bacteria are long, multicellular filaments that can conduct electric currents over centimeter-scale distances. All cable bacteria identified to date belong to the deltaproteobacterial family Desulfobulbaceae and have not been isolated in pure culture yet. Their taxonomic delineation and exact phylogeny is uncertain, as most studies so far have reported only short partial 16S rRNA sequences or have relied on identification by a combination of filament morphology and 16S rRNA-targeted fluorescence in situ hybridization with a Desulfobulbaceae-specific probe. In this study, nearly full-length 16S rRNA gene sequences of 16 individual cable bacteria filaments from freshwater, salt marsh, and marine sites of four geographic locations are presented. These sequences formed a distinct, monophyletic sister clade to the genus Desulfobulbus and could be divided into six coherent, species-level clusters, arranged as two genus-level groups. The same grouping was retrieved by phylogenetic analysis of full or partial dsrAB genes encoding the dissimilatory sulfite reductase. Based on these results, it is proposed to accommodate cable bacteria within two novel candidate genera: the mostly marine "Candidatus Electrothrix", with four candidate species, and the mostly freshwater "Candidatus Electronema", with two candidate species. This taxonomic framework can be used to assign environmental sequences confidently to the cable bacteria clade, even without morphological information. Database searches revealed 185 16S rRNA gene sequences that affiliated within the clade formed by the proposed cable bacteria genera, of which 120 sequences could be assigned to one of the six candidate species, while the remaining 65 sequences indicated the existence of up to five additional species. Copyright © 2016 The Author(s). Published by Elsevier GmbH. All rights reserved.
Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.
Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene
2017-02-01
Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function, and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded, but near full length, ribosomal DNA (rDNA) units, including both 45S and Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays, 2) that these rDNA sequences have a propensity for being centromere proximal, and 3) that sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) can be found distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The size of sequence strings found in the rDNA-like signal intergrade into the size of sequence strings that make up the full-length degrading rDNA units found scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of a similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggests that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The concept that the production of rDNA pseudogenes is a by-product of concerted evolution represents a previously under-appreciated process; we demonstrate here its importance. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Zhang, Sufang; Zhang, Zhen; Wang, Hongbin; Kong, Xiangbo
2014-09-01
The Yunnan pine and Simao pine caterpillar moths, Dendrolimus houi Lajonquière and Dendrolimus kikuchii Matsumura (Lepidoptera: Lasiocampidae), are two closely related and sympatric pests of coniferous forests in southwestern China, and the olfactory communication systems of these two insects have received considerable attention because of their economic importance. However, there is little information on the molecular aspects of odor detection in these insects. Furthermore, although lepidopteran species have been widely used in studies of insect olfaction, few studies have compared olfactory recognition mechanisms between sister moths. In this study, next-generation sequencing of the antennal transcriptomes of these two moths was performed to identify the major olfactory genes. After comparing the antennal transcriptomes of these two moths, we found that they exhibit highly similar transcript-associated GO terms. Chemosensory gene families were further analyzed in both species. We identified 23 putative odorant binding proteins (OBP), 17 chemosensory proteins (CSP), two sensory neuron membrane proteins (SNMP), 33 odorant receptors (OR), and 10 ionotropic receptors (IR) in D. houi; and 27 putative OBPs, 17 CSPs, two SNMPs, 33 ORs, and nine IRs in D. kikuchii. All these transcripts were full-length or almost full-length. The predicted protein sequences were compared with orthologs in other species of Lepidoptera and model insects, including Bombyx mori, Manduca sexta, Heliothis virescens, Danaus plexippus, Sesamia inferens, Cydia pomonella, and Drosophila melanogaster. The sequence homologies of the orthologous genes in D. houi and D. kikuchii are very high. Furthermore, the olfactory genes were classed according to their expression level, and the highly expressed genes are our targets for further functional investigation. Interestingly, many highly expressed genes are orthologous between D. houi and D. kikuchii.
We also found that the Classic OBPs were further separated into three groups according to their motifs, which will help future functional research. Surprisingly, no pheromone receptor was identified in the two Dendrolimus species, which may indicate a special pheromone identification mechanism in Dendrolimus. Our work allows for further functional studies of pheromone and host volatile recognition genes, and provides novel candidate targets for pest management. Copyright © 2014 Elsevier Ltd. All rights reserved.
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei
2016-07-01
The complete sequence of the mitochondrial genome was determined for Asio flammeus, a species with a wide geographic distribution. The complete mitochondrial genome was 18,966 bp long and contained 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were located on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny showed that A. flammeus is the sister group to A. capensis and supported Asio as a monophyletic group.
Clonal expansion of genome-intact HIV-1 in functionally polarized Th1 CD4+ T cells
Orlova-Fink, Nina; Einkauf, Kevin; Chowdhury, Fatema Z.; Sun, Xiaoming; Harrington, Sean; Kuo, Hsiao-Hsuan; Hua, Stephane; Chen, Hsiao-Rong; Ouyang, Zhengyu; Reddy, Kavidha; Dong, Krista; Ndung’u, Thumbi; Walker, Bruce D.; Rosenberg, Eric S.; Yu, Xu G.
2017-01-01
HIV-1 causes a chronic, incurable disease due to its persistence in CD4+ T cells that contain replication-competent provirus, but exhibit little or no active viral gene expression and effectively resist combination antiretroviral therapy (cART). These latently infected T cells represent an extremely small proportion of all circulating CD4+ T cells but possess a remarkable long-term stability and typically persist throughout life, for reasons that are not fully understood. Here we performed massive single-genome, near-full-length next-generation sequencing of HIV-1 DNA derived from unfractionated peripheral blood mononuclear cells, ex vivo-isolated CD4+ T cells, and subsets of functionally polarized memory CD4+ T cells. This approach identified multiple sets of independent, near-full-length proviral sequences from cART-treated individuals that were completely identical, consistent with clonal expansion of CD4+ T cells harboring intact HIV-1. Intact, near-full-genome HIV-1 DNA sequences that were derived from such clonally expanded CD4+ T cells constituted 62% of all analyzed genome-intact sequences in memory CD4 T cells, were preferentially observed in Th1-polarized cells, were longitudinally detected over a duration of up to 5 years, and were fully replication- and infection-competent. Together, these data suggest that clonal proliferation of Th1-polarized CD4+ T cells encoding for intact HIV-1 represents a driving force for stabilizing the pool of latently infected CD4+ T cells. PMID:28628034
Clonal expansion of genome-intact HIV-1 in functionally polarized Th1 CD4+ T cells.
Lee, Guinevere Q; Orlova-Fink, Nina; Einkauf, Kevin; Chowdhury, Fatema Z; Sun, Xiaoming; Harrington, Sean; Kuo, Hsiao-Hsuan; Hua, Stephane; Chen, Hsiao-Rong; Ouyang, Zhengyu; Reddy, Kavidha; Dong, Krista; Ndung'u, Thumbi; Walker, Bruce D; Rosenberg, Eric S; Yu, Xu G; Lichterfeld, Mathias
2017-06-30
HIV-1 causes a chronic, incurable disease due to its persistence in CD4+ T cells that contain replication-competent provirus, but exhibit little or no active viral gene expression and effectively resist combination antiretroviral therapy (cART). These latently infected T cells represent an extremely small proportion of all circulating CD4+ T cells but possess a remarkable long-term stability and typically persist throughout life, for reasons that are not fully understood. Here we performed massive single-genome, near-full-length next-generation sequencing of HIV-1 DNA derived from unfractionated peripheral blood mononuclear cells, ex vivo-isolated CD4+ T cells, and subsets of functionally polarized memory CD4+ T cells. This approach identified multiple sets of independent, near-full-length proviral sequences from cART-treated individuals that were completely identical, consistent with clonal expansion of CD4+ T cells harboring intact HIV-1. Intact, near-full-genome HIV-1 DNA sequences that were derived from such clonally expanded CD4+ T cells constituted 62% of all analyzed genome-intact sequences in memory CD4 T cells, were preferentially observed in Th1-polarized cells, were longitudinally detected over a duration of up to 5 years, and were fully replication- and infection-competent. Together, these data suggest that clonal proliferation of Th1-polarized CD4+ T cells encoding for intact HIV-1 represents a driving force for stabilizing the pool of latently infected CD4+ T cells.
Laura, Marina; Borghi, Cristina; Bobbio, Valentina; Allavena, Andrea
2015-01-01
In order to understand plant/pathogen interaction, the transcriptome of uninfected (1S) and infected (2I) plants was sequenced at the 3' end using the GS FLX 454 platform. De novo assembly of high-quality reads generated 27,231 contigs, leaving 37,191 singletons in the 1S and 38,393 in the 2I libraries. The ESTcalc tool suggested that 71% of the transcriptome had been captured, with 99% of the genes present being represented by at least one read. Unigene annotation showed that 50.5% of the predicted translation products shared significant homology with protein sequences in GenBank. In all, 253 differential transcript abundance (DTA) genes were in higher abundance and 52 in lower abundance in the 2I library. 128 higher abundance DTA genes were of fungal origin and 49 were clearly plant sequences. A tBLASTn-based search of the sequences, using as query the full-length predicted polypeptide products of 50 R genes, identified 16 R gene products. Only one R gene (PGIP) was up-regulated. The response of the plant to fungal invasion included the up-regulation of several pathogenesis-related protein (PR) genes involved in JA signaling and other genes associated with defense response, and the down-regulation of cell wall associated genes, non-race-specific disease resistance1 (NDR1) and other genes such as myb, presqualene diphosphate phosphatase (PSDPase), and a UDP-glycosyltransferase 74E2-like (UGT). The DTA genes identified here should provide a basis for understanding the A. coronaria/T. discolor interaction and leads for biotechnology-based disease resistance breeding. PMID:25768012
Huang, Shengbing; Song, Wei; Lin, Qishui
2005-08-01
A membrane-bound protein was purified from rat liver mitochondria. After being digested with V8 protease, two peptides containing identical 14 amino acid residue sequences were obtained. Using the 14 amino acid peptide derived DNA sequence as gene specific primer, the cDNA of correspondent gene 5'-terminal and 3'-terminal were obtained by RACE technique. The full-length cDNA that encoded a protein of 616 amino acids was thus cloned, which included the above mentioned peptide sequence. The full length cDNA was highly homologous to that of human ETF-QO, indicating that it may be the cDNA of rat ETF-QO. ETF-QO is an iron sulfur protein located in mitochondria inner membrane containing two kinds of redox center: FAD and [4Fe-4S] center. After comparing the sequence from the cDNA of the 616 amino acids protein with that of the mature protein of rat liver mitochondria, it was found that the N terminal 32 amino acid residues did not exist in the mature protein, indicating that the cDNA was that of ETF-QOp. When the cDNA was expressed in Saccharomyces cerevisiae with inducible vectors, the protein product was enriched in mitochondrial fraction and exhibited electron transfer activity (NBT reductase activity) of ETF-QO. Results demonstrated that the 32 amino acid peptide was a mitochondrial targeting peptide, and both FAD and iron-sulfur cluster were inserted properly into the expressed ETF-QO. ETF-QO had a high level expression in rat heart, liver and kidney. The fusion protein of GFP-ETF-QO co-localized with mitochondria in COS-7 cells.
van der Walt, Elizna M; Smuts, Izelle; Taylor, Robert W; Elson, Joanna L; Turnbull, Douglass M; Louw, Roan; van der Westhuizen, Francois H
2012-06-01
Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little is known that clarifies the aetiology in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After assigning haplogroups, it was determined that although the nature of non-haplogroup-defining variants was similar in African and non-African haplogroup patients, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when using single mutation screening strategies based on (predominantly non-African) mtDNA disease-based information. It is finally concluded that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations where specific mutations are not present. This, in addition to nuclear DNA gene mutation and pathogenicity evaluations, will be required to better unravel the aetiology of these disorders in African patients.
Yeoh, Keat-Ai; Othman, Abrizah; Meon, Sariah; Abdullah, Faridah; Ho, Chai-Ling
2012-10-15
Glucanases are enzymes that hydrolyze a variety of β-d-glucosidic linkages. Plant β-1,3-glucanases are able to degrade fungal cell walls and promote the release of cell-wall derived fungal elicitors. In this study, three full-length cDNA sequences encoding oil palm (Elaeis guineensis) glucanases were analyzed. Sequence analyses of the cDNA sequences suggested that EgGlc1-1 is a putative β-d-glucan exohydrolase belonging to glycosyl hydrolase (GH) family 3, while EgGlc5-1 and EgGlc5-2 are putative glucan endo-1,3-β-glucosidases belonging to GH family 17. The transcript abundance of these genes in the roots and leaves of oil palm seedlings treated with Ganoderma boninense and Trichoderma harzianum was profiled to investigate the involvement of these glucanases in oil palm during fungal infection. The expression of EgGlc1-1 in the roots of oil palm seedlings was increased by T. harzianum but suppressed by G. boninense, while the expression of both EgGlc5-1 and EgGlc5-2 in the roots of oil palm seedlings was suppressed by G. boninense and/or T. harzianum. Copyright © 2012 Elsevier GmbH. All rights reserved.
Goller, Katja V; Gabriel, Claudia; Dimna, Mireille Le; Le Potier, Marie-Frédérique; Rossi, Sophie; Staubach, Christoph; Merboth, Matthias; Beer, Martin; Blome, Sandra
2016-03-01
Classical swine fever is a viral disease of pigs that carries tremendous socio-economic impact. In outbreak situations, genetic typing is carried out for the purpose of molecular epidemiology in both domestic pigs and wild boar. These analyses are usually based on harmonized partial sequences. However, for high-resolution analyses towards the understanding of genetic variability and virus evolution, full-genome sequences are more appropriate. In this study, a unique set of representative virus strains was investigated that was collected during an outbreak in French free-ranging wild boar in the Vosges-du-Nord mountains between 2003 and 2007. Comparative sequence and evolutionary analyses of the nearly full-length sequences showed only slow evolution of classical swine fever virus strains over the years and no impact of vaccination on mutation rates. However, substitution rates varied amongst protein genes; furthermore, a spatial and temporal pattern could be observed whereby two separate clusters were formed that coincided with physical barriers.
Malouli, Daniel; Howell, Grant L; Legasse, Alfred W; Kahl, Christoph; Axthelm, Michael K; Hansen, Scott G; Früh, Klaus
2014-09-01
Multiple novel simian adenoviruses have been isolated over the past years and their potential to cross the species barrier and infect the human population is an ever present threat. Here we describe the isolation and full genome sequencing of a novel simian adenovirus (SAdV) isolated from the urine of two independent, never co-housed, late stage simian immunodeficiency virus (SIV)-infected rhesus macaques. The viral genome sequences revealed a novel type with a unique genome length, GC content, E3 region and DNA polymerase amino acid sequence that is sufficiently distinct from all currently known human- or simian adenovirus species to warrant classifying these isolates as a novel species of simian adenovirus. This new species, termed Simian mastadenovirus D (SAdV-D), displays the standard genome organization for the genus Mastadenovirus containing only one copy of the fiber gene which sets it apart from the old world monkey adenovirus species HAdV-G, SAdV-B and SAdV-C.
Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis
2016-08-24
To assess the performance of the Oxford Nanopore Technologies MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts with known concentration. cDNA libraries were generated using a template switching protocol to facilitate the direct comparison between different sequencing platforms. The MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full length. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentration. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from a human HEK-293 cell line was sequenced on Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. We observed good agreement in the measured cDNA abundance between PacBio RS II and ONT MinION (Pearson r = 0.82, isoforms longer than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can quantitatively sequence both long and short full-length cDNA molecules.
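The platform agreement reported above is summarized as a Pearson correlation between per-isoform abundance estimates. A minimal sketch of that calculation follows. The read counts are hypothetical placeholders, and the log10 transform with a pseudocount of 1 is an assumption, since the abstract does not state how abundances were scaled before correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_abundance(counts, pseudocount=1.0):
    """log10-transform counts; the pseudocount (an assumption) avoids log(0)."""
    return [math.log10(c + pseudocount) for c in counts]


# Hypothetical per-isoform read counts from two platforms (not data from the study).
minion_counts = [12, 250, 31, 980, 4]
pacbio_counts = [9, 310, 40, 1100, 2]

r = pearson_r(log_abundance(minion_counts), log_abundance(pacbio_counts))
print(f"Pearson r = {r:.2f}")
```

Correlating log-scaled counts keeps a few highly expressed isoforms from dominating the statistic; correlating raw counts is an equally valid choice with a different interpretation.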
The complete mitochondrial genome of a chronic hepatitis associated liver cancer LEC rat strain.
Zhang, Sihao; Jiang, Zhaoming; Zhang, Shuai; Xia, Mingfeng; Tian, Fang; Tian, Hu
2016-05-01
We sequenced the complete mitochondrial genome of a chronic hepatitis-associated liver cancer LEC rat strain for the first time. The total length of the mitogenome was 16,316 bp, with 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. This mitochondrial genome sequence will provide a new genetic resource for research into liver cancer.
Identification of the Drosophila eIF4A gene as a target of the DREF transcription factor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ida, Hiroyuki; Insect Biomedical Research Center, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585; Yoshida, Hideki
2007-12-10
The DNA replication-related element-binding factor (DREF) regulates cell proliferation-related gene expression in Drosophila. We have carried out a genetic screening, taking advantage of the rough eye phenotype of transgenic flies that express full-length DREF in the eye imaginal discs and identified the eukaryotic initiation factor 4A (eIF4A) gene as a dominant suppressor of the DREF-induced rough eye phenotype. The eIF4A gene was here found to carry three DRE sequences, DRE1 (-40 to -47), DRE2 (-48 to -55), and DRE3 (-267 to -274) in its promoter region, these all being important for the eIF4A gene promoter activity in cultured Drosophila Kc cells and in living flies. Knockdown of DREF in Drosophila S2 cells decreased the eIF4A mRNA level and the eIF4A gene promoter activity. Furthermore, specific binding of DREF to genomic regions containing DRE sequences was demonstrated by chromatin immunoprecipitation assays using anti-DREF antibodies. Band mobility shift assays using Kc cell nuclear extracts revealed that DREF could bind to DRE1 and DRE3 sequences in the eIF4A gene promoter in vitro, but not to the DRE2 sequence. The results suggest that the eIF4A gene is under the control of the DREF pathway and DREF is therefore involved in the regulation of protein synthesis.
Arthur, A K; Höss, A; Fanning, E
1988-01-01
The genomic coding sequence of the large T antigen of simian virus 40 (SV40) was cloned into an Escherichia coli expression vector by joining new restriction sites, BglII and BamHI, introduced at the intron boundaries of the gene. Full-length large T antigen, as well as deletion and amino acid substitution mutants, were inducibly expressed from the lac promoter of pUC9, albeit with different efficiencies and protein stabilities. Specific interaction with SV40 origin DNA was detected for full-length T antigen and certain mutants. Deletion mutants lacking T-antigen residues 1 to 130 and 260 to 708 retained specific origin-binding activity, demonstrating that the region between residues 131 and 259 must carry the essential binding domain for DNA-binding sites I and II. A sequence between residues 302 and 320 homologous to a metal-binding "finger" motif is therefore not required for origin-specific binding. However, substitution of serine for either of two cysteine residues in this motif caused a dramatic decrease in origin DNA-binding activity. This region, as well as other regions of the full-length protein, may thus be involved in stabilizing the DNA-binding domain and altering its preference for binding to site I or site II DNA. PMID:2835505
Devi, Kamalakshi; Dehury, Budheswar; Phukon, Munmi; Modi, Mahendra Kumar; Sen, Priyabrata
2015-01-01
The 1-deoxy-d-xylulose-5-phosphate reductoisomerase (DXR; EC1.1.1.267), an NADPH-dependent reductase, plays a pivotal role in the methylerythritol 4-phosphate pathway (MEP), in the conversion of 1-deoxy-d-xylulose-5-phosphate (DXP) into MEP. The sheath and leaf of citronella (Cymbopogon winterianus) accumulate large amounts of terpenes and sesquiterpenes with proven medicinal value and economic uses. Thus, sequencing and characterization of the full-length dxr gene could provide a valuable resource for metabolic engineering to alter the flux of isoprenoid active ingredients in plants. In this study, full-length DXR from citronella was characterized through in silico and tissue-specific expression studies to explain its structure–function mechanism, mode of cofactor recognition and differential expression. The modelled DXR has a three-domain architecture and its active site comprises a cofactor (NADPH) binding pocket and a substrate-binding pocket. Molecular dynamics simulation studies indicated that the DXR model retained most of its secondary structure during a 10 ns simulation in aqueous solution. The modelled DXR superimposes well with its closest structural homolog, but subtle variations in the charge distribution over the cofactor recognition site were noticed. A molecular docking study revealed critical residues aiding tight anchoring of NADPH within the active pocket of DXR. Tissue-specific differential expression analysis using semi-quantitative RT-PCR and qRT-PCR in various tissues of the citronella plant revealed distinct differential expression of DXR. To our knowledge, this is the first report on DXR from the important medicinal plant citronella, and further characterization of this gene will open up better avenues for metabolic engineering of secondary metabolite pathway genes from medicinal plants in the near future. PMID:25941629
Lu, W; Wainwright, G; Olohan, L A; Webster, S G; Rees, H H; Turner, P C
2001-10-31
Synthesis of ecdysteroids (molting hormones) by crustacean Y-organs is regulated by a neuropeptide, molt-inhibiting hormone (MIH), produced in eyestalk neural ganglia. We report here the molecular cloning of a cDNA encoding MIH of the edible crab, Cancer pagurus. Full-length MIH cDNA was obtained by using reverse transcription-polymerase chain reaction (RT-PCR) with degenerate oligonucleotides based upon the amino acid sequence of MIH, in conjunction with 5'- and 3'-RACE. Full-length clones of MIH cDNA were obtained that encoded a 35 amino acid putative signal peptide and the mature 78 amino acid peptide. Of various tissues examined by Northern blot analysis, the X-organ was the sole major site of expression of the MIH gene. However, a nested-PCR approach using non-degenerate MIH-specific primers indicated the presence of MIH transcripts in other tissues. Southern blot analysis indicated a simple gene arrangement with at least two copies of the MIH gene in the genome of C. pagurus. Additional Southern blotting experiments detected MIH-hybridizing bands in another Cancer species, Cancer antennarius and another crab species, Carcinus maenas.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.
Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji
2017-02-28
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
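The idea of using a spike-in added at a known amount to convert read counts into absolute abundances can be illustrated with a simple proportional scaling. This is a simplified sketch, not the authors' published procedure: it assumes a single spike-in level, equal extraction and amplification efficiency for spike-in and native templates, and hypothetical names and numbers throughout.

```python
def estimate_absolute_abundance(taxon_reads, spike_reads, spike_copies_added):
    """Scale per-taxon read counts by the copies-per-read ratio seen for the spike-in.

    taxon_reads: dict mapping taxon name -> read count in the sample
    spike_reads: reads assigned to the spike-in standard in the same sample
    spike_copies_added: known number of spike-in 16S rRNA gene copies added
    """
    if spike_reads <= 0:
        raise ValueError("no spike-in reads recovered; cannot calibrate")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


# Hypothetical sample: read counts per taxon plus 500 reads from the spike-in.
sample = {"taxon_A": 5000, "taxon_B": 1500}
estimates = estimate_absolute_abundance(
    sample, spike_reads=500, spike_copies_added=1_000_000
)
for taxon, copies in estimates.items():
    print(f"{taxon}: ~{copies:.0f} 16S rRNA gene copies")
```

The result is in 16S rRNA gene copies, not cells; converting to cell numbers would additionally require each taxon's 16S copy number per genome. With a staggered mixture of several spike-ins, as described in the abstract, a calibration curve could replace the single ratio used here.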
2011-01-01
Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195,088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24,037 unique sequences (22,748 contigs and 1,289 singletons), 12,649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3,111). The 23,173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues.
Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182
Cloning and bioinformatics analysis of CcPILS gene of Hickory (Carya cathayensis)
NASA Astrophysics Data System (ADS)
Guo, Wenbin; Yuan, Huwei; Gao, Liuxiao; Guo, Haipeng; Qiu, Lingling; Xu, Dongbin; Yan, Daoliang; Zheng, Bingsong
2017-04-01
PILS is a key auxin efflux carrier protein in auxin signal transduction. A CcPILS gene related to the hickory (Carya cathayensis) grafting process was obtained by RACE techniques. The full length of the CcPILS gene was 1541 bp and contained a 1263 bp open reading frame (ORF). CcPILS encoded 294 amino acids with a molecular weight of 46 kDa and a pI of 5.38, and was localized at the endoplasmic reticulum membrane. The protein contained a central hydrophilic loop separating two hydrophobic domains of about five transmembrane regions each. CcPILS belonged to the Clade III sub-family of PILS and its sequence had high homology with Arabidopsis. Real-time RT-PCR analysis showed that gene expression was weakly induced in bud, inflorescence, fruit, leaf and stem, but strongly induced in root. The expression levels were strongly induced and reached a peak on the third day of grafting in the scion and rootstock of hickory, which were 1.45 and 3.45 times higher, respectively, than those of the control. The results indicated that CcPILS may be involved in regulating the expression of genes related to auxin signal transduction during the hickory grafting process.
Meesapyodsuk, Dauenpen; Balsevich, John; Reed, Darwin W.; Covello, Patrick S.
2007-01-01
Saponaria vaccaria (Caryophyllaceae), a soapwort, known in western Canada as cowcockle, contains bioactive oleanane-type saponins similar to those found in soapbark tree (Quillaja saponaria; Rosaceae). To improve our understanding of the biosynthesis of these saponins, a combined polymerase chain reaction and expressed sequence tag approach was taken to identify the genes involved. A cDNA encoding a β-amyrin synthase (SvBS) was isolated by reverse transcription-polymerase chain reaction and characterized by expression in yeast (Saccharomyces cerevisiae). The SvBS gene is predominantly expressed in leaves. A S. vaccaria developing seed expressed sequence tag collection was developed and used for the isolation of a full-length cDNA bearing sequence similarity to ester-forming glycosyltransferases. The gene product of the cDNA, classified as UGT74M1, was expressed in Escherichia coli, purified, and identified as a triterpene carboxylic acid glucosyltransferase. UGT74M1 is expressed in roots and leaves and appears to be involved in monodesmoside biosynthesis in S. vaccaria. PMID:17172290
Korkusol, Achareeya; Takhampunya, Ratree; Hang, Jun; Jarman, Richard G; Tippayachai, Bousaraporn; Kim, Heung-Chul; Chong, Sung-Tae; Davidson, Silas A; Klein, Terry A
2017-05-01
Flaviviruses comprise a large and diverse group of positive-stranded RNA viruses, including tick-, mosquito- and unknown-vector-borne flaviviruses. A novel flavivirus was detected in pools of Aedes vexans nipponii (n=1) and Aedes esoensis (n=3) collected in 2012 and 2013 near the demilitarized zone (DMZ), Republic of Korea (ROK). Phylogenetic analyses of the NS5, E gene and complete polyprotein coding sequence (CDS) showed that the novel virus fell within the Aedes-borne flaviviruses (ABFVs), with nucleotide identity ranging from 57.8-75.1 %, 46.1-74.2 % and 51.1-76.2 %, respectively. While the novel ABFV was distant from other flaviviruses within the group, it formed a clade with Ilomantsi virus (ILOV). Sequence alignments of the partial NS5 gene, full-length E gene and polyprotein CDS between the novel virus and ILOV showed approximately 76.2 % nucleotide identity and 90 % amino acid identity, respectively. The ABFV identified in Aedes mosquitoes from the ROK is a novel ABFV based on the sequence analyses and is designated as Panmunjeom flavivirus (PANFV).
Manswr, Basim; Ball, Christopher; Forrester, Anne; Chantrey, Julian; Ganapathy, Kannan
2018-08-01
Sequence variability in the S1 gene determines the genotype of infectious bronchitis virus (IBV) strains. A single RT-PCR assay was developed to amplify and sequence the full S1 gene for six classical and variant IBVs (M41, D274, 793B, IS/885/00, IS/1494/06 and Q1) enriched in allantoic fluid (AF) or the same AF inoculated onto Flinders Technology Association (FTA) cards. Representative strains from each genotype were grown in specific-pathogen-free eggs and RNA was extracted from AF. Full S1 gene amplification was achieved using primer A and primer 22.51. Products were sequenced using primers A, 1050+, 1380+ and SX3+ to obtain short sequences covering the full gene. Following serial dilutions of AF, detection limits of the partial assay were higher than those of the full S1 gene. Partial S1 sequences exhibited higher-than-average nucleotide similarity percentages (79%; 352 bp) compared to full S1 sequences (77%; 1756 bp), suggesting that full S1 analysis allows greater strain differentiation. For IBV detection from AF-inoculated FTA cards, four serotypes were incubated for up to 21 days at three temperatures, 4°C, room temperature (approximately 24°C) and 40°C. RNA was extracted and tested with partial and full S1 protocols. Through partial sequencing, all IBVs were successfully detected at all sampling points and storage temperatures. In contrast, using full S1 sequencing it was not possible to amplify the gene beyond 14 days or when stored at 40°C. Data presented show that for full S1 sequencing, a substantial amount of RNA is needed. Field samples collected onto FTA cards are unlikely to yield such quantity or quality. AF: allantoic fluid; CD50: ciliostatic dose 50; FTA: Flinders Technology Association; IB: infectious bronchitis; IBV: infectious bronchitis virus.
Novel rare variations of the oxytocin receptor (OXTR) gene in autism spectrum disorder individuals.
Liu, Xiaoxi; Kawashima, Minae; Miyagawa, Taku; Otowa, Takeshi; Latt, Khun Zaw; Thiri, Myo; Nishida, Hisami; Sugiyama, Toshiro; Tsurusaki, Yoshinori; Matsumoto, Naomichi; Mabuchi, Akihiko; Tokunaga, Katsushi; Sasaki, Tsukasa
2015-01-01
The oxytocin receptor (OXTR) gene has been implicated as a risk gene for autism spectrum disorder (ASD), a neurodevelopmental disorder whose essential features are impairments in social communication and reciprocal interaction. Genetic associations between common variations in OXTR and ASD have been reported in multiple ethnic populations. However, little is known about the distribution of rare variations within OXTR in ASD patients. In this study, we resequenced the full length of OXTR in 105 ASD individuals using an approach that combined the power of next-generation sequencing technology, long-range PCR and DNA pooling. We demonstrated that rare variants with minor allele frequency as low as 0.05% could be reliably detected by our method. We identified 28 novel variants, including potential functional variants in the intron region and one rare missense variant (R150S). We subsequently performed Sanger sequencing and validated five novel variants located in previously suggested candidate regions in ASD individuals. Further sequencing of 312 healthy subjects showed that the burden of rare variants is significantly higher in ASD individuals than in healthy individuals. Our results support the idea that rare variation in the OXTR gene might be involved in ASD.
Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout
Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo
2015-01-01
Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler, yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877
Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing
2010-01-01
To analyze the genetic variation of rabies viruses at the molecular level, obtain information on rabies virus prevalence and variation in Zhejiang, and enrich the genome database of rabies virus street strains isolated in China, we sequenced the full-length genomes of four isolates from Chinese ferret-badger and dog. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled. Nucleotide and deduced protein similarities were determined, and phylogenetic analyses were performed with strains from Chinese ferret-badger, dog, sika deer and vole and with the vaccine strain in use. The four full-length genomes were sequenced completely and had the same genetic structure, with a length of 11,923 nt or 11,925 nt, including a 58 nt leader, 1353 nt NP, 894 nt PP, 609 nt MP, 1575 nt GP and 6386 nt LP, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423 nt pseudogene-like sequence (psi) and a 70 nt trailer. By BLAST and multiple sequence alignment, the four full-length genomes were consistent with the properties of the genus Lyssavirus in the family Rhabdoviridae. The nucleotide and amino acid sequences of the Chinese strains had the highest similarity to one another, especially among strains from animals of the same species. For the four full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, indicating that most of the nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged and no recombination was found; there were only a few point mutations. The five proteins therefore appeared to be stable. The variation sites and types in the four genomes were similar to those of the reference vaccine and street strains. According to the multiple sequence alignment and phylogenetic analyses, the four strains belonged to genotype 1 and showed distinct regional characteristics of China. Therefore, these four rabies viruses are likely to be street viruses already existing in nature.
Durso, Lisa M; Harhay, Gregory P; Bono, James L; Smith, Timothy P L
2011-02-01
The bovine fecal microbiota impacts human food safety as well as animal health. Although the bacteria of cattle feces have been well characterized using culture-based and culture-independent methods, techniques have been lacking to correlate total community composition with community function. We used high throughput sequencing of total DNA extracted from fecal material to characterize general community composition and examine the repertoire of microbial genes present in beef cattle feces, including genes associated with antibiotic resistance and bacterial virulence. Results suggest that traditional 16S sequencing using "universal" primers to generate full-length sequence may underrepresent Actinobacteria and Proteobacteria. Over eight percent (8.4%) of the sequences from our beef cattle fecal pool sample could be categorized as virulence genes, including a suite of genes associated with resistance to antibiotics and toxic compounds (RATC). This is a higher proportion of virulence genes than that found in Sargasso Sea, chicken cecum, and cow rumen samples, but comparable to the proportion found in Antarctic marine derived lake, human fecal, and farm soil samples. The quantitative nature of metagenomic data, combined with the large number of RATC classes represented in samples from widely different habitats, indicates that metagenomic data can be used to track relative amounts of antibiotic resistance genes in individual animals over time. Consequently, these data can be used to generate sample-specific and temporal antibiotic resistance gene profiles to facilitate an understanding of the ecology of the microbial communities in each habitat as well as the epidemiology of antibiotic resistance gene transport between and among habitats.
Roux, Michelle M.; Pain, Arnab; Klimpel, Kurt R.; Dhar, Arun K.
2002-01-01
Pattern recognition proteins such as lipopolysaccharide and β-1,3-glucan binding protein (LGBP) play an important role in the innate immune response of crustaceans and insects. Random sequencing of cDNA clones from a hepatopancreas cDNA library of white spot virus (WSV)-infected shrimp provided a partial cDNA (PsEST-289) that showed similarity to the LGBP gene of crayfish and insects. Subsequently full-length cDNA was cloned by the 5′-RACE (rapid amplification of cDNA ends) technique and sequenced. The shrimp LGBP gene is 1,352 bases in length and is capable of encoding a polypeptide of 376 amino acids that showed significant similarity to homologous genes from crayfish, insects, earthworms, and sea urchins. Analysis of the shrimp LGBP deduced amino acid sequence identified conserved features of this gene family including a potential recognition motif for β-(1→3) linkage of polysaccharides and putative RGD cell adhesion sites. It is known that LGBP gene expression is upregulated in bacterial and fungal infection and that the binding of lipopolysaccharide and β-1,3-glucan to LGBP activates the prophenoloxidase (proPO) cascade. The temporal expression of LGBP and proPO genes in healthy and WSV-challenged Penaeus stylirostris shrimp was measured by real-time quantitative reverse transcription-PCR, and we showed that LGBP gene expression in shrimp was upregulated as the WSV infection progressed. Interestingly, the proPO expression was upregulated initially after infection followed by a downregulation as the viral infection progressed. The downward trend in the expression of proPO coincided with the detection of WSV in the infected shrimp. Our data suggest that shrimp LGBP is an inducible acute-phase protein that may play a critical role in shrimp-WSV interaction and that the WSV infection regulates the activation and/or activity of the proPO cascade in a novel way. PMID:12072514
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
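The adjustment described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it assumes one possible formulation in which Gene Ontology membership is the response and a differential-expression significance score plus log transcript length are the covariates, and it uses a hand-written Newton-Raphson fit on simulated data. All names (`fit_logistic`, `signif`, `log_len`, `member`) and all simulation parameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(a, b):
    # Solve a small linear system a x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(rows, y, iterations=25):
    # Maximum-likelihood logistic regression by Newton-Raphson.
    # An intercept column is prepended; returns [intercept, coef_1, coef_2, ...].
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Simulated genes (assumption): length drives BOTH the significance score and
# category membership, but membership has no direct link to significance.
rng = random.Random(0)
log_len, signif, member = [], [], []
for _ in range(3000):
    length = rng.gauss(0.0, 1.0)            # standardized log transcript length
    log_len.append(length)
    signif.append(length + rng.gauss(0.0, 1.0))
    member.append(1 if rng.random() < sigmoid(-1.0 + 1.5 * length) else 0)

# Without the length covariate, membership looks associated with significance.
naive = fit_logistic([[s] for s in signif], member)
# With length as a covariate, the significance coefficient shrinks toward zero.
adjusted = fit_logistic([[s, l] for s, l in zip(signif, log_len)], member)

print("significance coefficient, unadjusted:", round(naive[1], 3))
print("significance coefficient, length-adjusted:", round(adjusted[1], 3))
```

In this toy setting the apparent enrichment comes entirely from confounding by length, so the significance coefficient should be clearly positive in the unadjusted fit and close to zero once length is included.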
Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi
2017-09-04
Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
Chandramohan, Bathrachalam; Renieri, Carlo; La Manna, Vincenzo; La Terza, Antonietta
2015-01-01
The objectives of the present study were to characterize the MC1R gene, its transcripts and the single nucleotide polymorphisms (SNPs) associated with coat color in alpaca. Full-length cDNA amplification revealed the presence of two transcripts, named F1 and F2, differing only in the length of their 5'-terminal untranslated region (UTR) sequences and showing color-specific expression. Whereas the F1 transcript was common to white and colored (black and brown) alpaca phenotypes, the shorter F2 transcript was specific to white alpaca. Further sequencing of the MC1R gene in white and colored alpaca identified a total of twelve SNPs; of these, nine were observed in the coding region (four silent mutations: c.126C>A, c.354T>C, c.618G>A, and c.933G>A; five missense mutations: c.82A>G, c.92C>T, c.259A>G, c.376A>G, and c.901C>T) and three in the 3' UTR. A 4 bp deletion (c.224_227del) was also identified in the coding region. Molecular segregation analysis uncovered that the combinatory mutations in the MC1R locus could cause eumelanin and pheomelanin synthesis in alpaca. Overall, our data refine what is known about the MC1R gene and provide additional information on its role in alpaca pigmentation.
Tissue plasminogen activator (tPA) as a reporter gene in transient gene expression.
Cheng, S M; Lee, S G; Kalyan, N K; McCloud, S; Levner, M; Hung, P P
1987-01-01
Using the gene coding for tissue plasminogen activator (tPA) as a reporter gene, a transient gene expression system has been established. Vectors containing the full-length cDNA of tPA with its signal sequences were introduced into mammalian recipient cells by a modified gene transfer procedure. Thirty hours after transfection, the secreted tPA was found in serum-free medium and measured by a fibrin-agarose plate assay (FAPA). In this assay, tPA converts plasminogen into plasmin which then degrades high-Mr fibrin to produce cleared zones. The sizes of these zones correspond to quantities of tPA. The combination of transient tPA expression system and the FAPA provides a quick, sensitive, quantitative and non-destructive method to examine the strength of eukaryotic regulatory elements in tissue-culture cells.
Suzuki, H; Katayama, K; Takenaka, M; Amakasu, K; Saito, K; Suzuki, K
2009-10-01
The lde/lde rat is characterized by dwarfism, postnatal lethality, male hypogonadism, a high incidence of epilepsy and many vacuoles in the hippocampus and amygdala. We used a candidate approach to identify the gene responsible for the lde phenotype and assessed the susceptibility of lde/lde rats to audiogenic seizures. Following backcross breeding of lethal dwarfism with epilepsy (LDE) rats to Brown Norway rats, the lde/lde rats with an altered genetic background showed all pleiotropic phenotypes. The lde locus was mapped to a 1.5-Mbp region on rat chromosome 19 that included the latter half of the Wwox gene. Sequencing of the full-length Wwox transcript identified a 13-bp deletion in exon 9 in lde/lde rats. This mutation causes a frame shift, resulting in aberrant amino acid sequences at the C-terminus. Western blotting showed that both the full-length product of the Wwox gene and its isoform were present in normal testes and hippocampi, whereas both products were undetectable in the testes and hippocampi of lde/lde rats. Sound stimulation induced epileptic seizures in 95% of lde/lde rats, starting as wild running (WR) and sometimes progressing to tonic-clonic convulsions. Electroencephalogram (EEG) analysis showed interictal spikes, fast waves during WR and bursts of spikes during clonic phases. The Wwox protein is expressed in the central nervous system (CNS), indicating that abnormal neuronal excitability in lde/lde rats may be because of a lack of Wwox function. The lde/lde rat is not only useful for understanding the multiple functions of Wwox but is also a unique model for studying the physiological function of Wwox in the CNS.
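Why a 13-bp deletion shifts the reading frame can be shown with a few lines of code. This is a minimal sketch: the toy coding sequence and the deletion position are made up for illustration and are not the Wwox sequence, and the helper names are hypothetical.

```python
def is_frameshift(indel_length):
    # An insertion or deletion preserves the reading frame only if its
    # length is a multiple of 3 (one codon = 3 nucleotides).
    return indel_length % 3 != 0

def delete(seq, start, length):
    # Remove `length` nucleotides beginning at 0-based position `start`.
    return seq[:start] + seq[start + length:]

def codons(seq):
    # Split a sequence into complete in-frame codons, dropping any remainder.
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

# Toy open reading frame (assumption): start codon, ten GCT codons, stop codon.
reference = "ATG" + "GCT" * 10 + "TAA"
mutant = delete(reference, 6, 13)   # a 13-nt deletion, as in the abstract

print(is_frameshift(13))   # True: 13 is not a multiple of 3
print(codons(reference))
print(codons(mutant))      # downstream codons change and the original stop is lost
```

In the toy mutant every codon after the deletion is read in a different frame, which is the mechanism behind the aberrant C-terminal amino acid sequence described in the abstract.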
Yang, G; Liu, X G; Qiu, B S
2000-07-01
The complete nucleotide sequences of two Chinese tobacco mosaic virus (TMV) isolates, TMV-Cv (vulgare strain) and TMV-N14 (an attenuated virus derived from a tomato strain), were determined from their respective full-length infectious cDNA clones and compared with published TMV sequences. The TMV-Cv genome contained 6395 nucleotides, in which four functional open reading frames (ORFs), coding for replicase (126 kD/183 kD), movement protein (MP, 30 kD) and coat protein (CP, 17.6 kD) respectively, could be recognized. TMV-N14 contained 6384 nucleotides in its genome. In contrast to TMV-Cv, five functional ORFs encoding the replicase 98.5 kD/126 kD/183 kD, MP (27 kD) and CP (17.6 kD), respectively, were detected in the TMV-N14 genome. TMV-Cv is 99% homologous to a Korean TMV isolate belonging to the vulgare strain at the nucleotide level. TMV-N14 is 99% homologous to a highly virulent Japanese isolate TMV-L (tomato strain) at the nucleotide level. In TMV-N14, one opal mutation (UGA) occurred in the replicase gene and one ochre mutation (UAA) in the MP gene. The former mutation created a potential, additional ORF within the replicase gene; the latter reduced the size of the MP to 27 kD. In addition, there were also 13 amino acid substitutions in the replicase gene of TMV-N14 when compared to that of TMV-L. Collectively, these changes may have significant implications in the attenuation of the virulence of TMV-N14.
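The opal (UGA) and ochre (UAA) codons mentioned above are two of the three standard stop codons; the third, UAG, is called amber. The following minimal sketch shows how a premature stop truncates an ORF. The RNA sequences are toy examples, not TMV sequences, and `first_stop` is a hypothetical helper.

```python
# Conventional names of the three stop codons in the standard genetic code.
STOP_NAMES = {"UAG": "amber", "UAA": "ochre", "UGA": "opal"}

def first_stop(rna, start=0):
    # Walk the sequence codon by codon from `start` and return
    # (position, codon, name) for the first in-frame stop, or None.
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_NAMES:
            return i, codon, STOP_NAMES[codon]
    return None

wild_type = "AUGGCUCAAGGUUAA"   # AUG GCU CAA GGU UAA
mutant = "AUGGCUUAAGGUUAA"      # CAA -> UAA: a premature ochre stop

print(first_stop(wild_type))
print(first_stop(mutant))
```

A stop that appears earlier in frame shortens the translated product, which is consistent with the abstract's report that the ochre mutation reduced the MP to 27 kD.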
Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule
2017-06-26
IMGT®, the international ImMunoGeneTics information system® ( http://www.imgt.org ), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of the antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain and domain. The analysis of the IG and TR variable (V) domain rearranged nucleotide sequences is performed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next generation sequencing (NGS), by IMGT/HighV-QUEST, the high throughput version of IMGT/V-QUEST (portal begun in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv) which mimic the in vivo natural diversity of the immune adaptive responses are extensively screened for the discovery of novel antigen binding specificities. However, the analysis of NGS full length scFv (~850 bp) represents a challenge as they contain two V domains connected by a linker and there is no tool for the analysis of two V domains in a single chain. The functionality "Analysis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST for the analysis of the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, then search for a second V-REGION and full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, positions of the 5'V-DOMAIN, linker and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, characterization of mutations and amino acid changes).
The functionality is generic and can analyse any IG or TR single chain nucleotide sequence containing two V domains, provided that the corresponding species IMGT reference directory is available. The "Analysis of single chain Fragment variable (scFv)" implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST provides the identification and full characterization of the two V domains of full-length scFv (~850 bp) nucleotide sequences from combinatorial libraries. The analysis can also be performed on concatenated paired chains of expressed antigen receptor IG or TR repertoires.
Tian, Wenzhi; Chua, Kevin; Strober, Warren; Chu, Charles C.
2002-01-01
BACKGROUND: Identification of differentially expressed genes between normal and diseased states is an area of intense current medical research that can lead to the discovery of new therapeutic targets. However, isolation of differentially expressed genes by subtraction often suffers from unreported contamination of the resulting subtraction library with clones containing DNA sequences not from the original RNA samples. MATERIALS AND METHODS: Subtraction using cDNA representational difference analysis (RDA) was performed on human B cells from normal or common variable immunodeficiency patients. The material remaining after the subtraction was cloned and individual clones were sequenced. The sequence of one clone with similarity to integrases (ILG1, integrase-like gene-1) was used to obtain the full length cDNA sequence and as a probe for the presence of this sequence in RNA or genomic DNA samples. RESULTS: After five rounds of cDNA RDA, 23.3% of the clones from the resulting subtraction library contained Escherichia coli DNA. In addition, three clones contained the sequence of a new integrase, ILG1. The full length cDNA sequence of ILG1 exhibits prokaryotic, but not eukaryotic, features. At the DNA level, ILG1 is not similar to any known gene. At the protein level, ILG1 has 58% similarity to integrases from the cryptic P4 bacteriophage family (S clade). The catalytic domain of ILG1 contains the conserved features found in site-specific recombinases. The critical residues that form the catalytic active site pocket are conserved, including the highly conserved R-H-R-Y hallmark of these recombinases. Interestingly, ILG1 was not present in the original B cell populations. By probing genomic DNA, ILG1 could only be detected in the E. coli TOP10F' strain used in our laboratory for molecular cloning, but not in any of its precursor strains, including TOP10. 
Furthermore, bacteria cultured from the mouth of the laboratory worker who performed cDNA RDA were also positive for ILG1. CONCLUSIONS: In the course of our studies using cDNA RDA, we have isolated and identified ILG1, a likely active site-specific recombinase and new member of the bacteriophage P4 family of integrases. This family of integrases is implicated in the horizontal DNA transfer of pathogenic genes between bacterial species, such as those found in pathogenic strains of E. coli, Shigella, Yersinia, and Vibrio cholerae. Using ILG1 as a marker of our laboratory E. coli strain TOP10F', our evidence suggests that contaminating bacterial DNA in our subtraction experiment is due to this laboratory bacterial strain, which colonized exposed surfaces of the laboratory worker. Thus, identification of differentially expressed genes between normal and diseased states could be dramatically improved by using extra precaution to prevent bacterial contamination of samples. PMID:12393938
de Gier, Camilla; Kirkham, Lea-Ann S.
2015-01-01
Nonhemolytic variants of Haemophilus haemolyticus are difficult to differentiate from Haemophilus influenzae despite a wide difference in pathogenic potential. A previous investigation characterized a challenging set of 60 clinical strains using multiple PCRs for marker genes and described strains that could not be unequivocally identified as either species. We have analyzed the same set of strains by multilocus sequence analysis (MLSA) and near-full-length 16S rRNA gene sequencing. MLSA unambiguously allocated all study strains to either of the two species, while identification by 16S rRNA sequence was inconclusive for three strains. Notably, the two methods yielded conflicting identifications for two strains. Most of the “fuzzy species” strains were identified as H. influenzae that had undergone complete deletion of the fucose operon. Such strains, which are untypeable by the H. influenzae multilocus sequence type (MLST) scheme, have sporadically been reported and predominantly belong to a single branch of H. influenzae MLSA phylogenetic group II. We also found evidence of interspecies recombination between H. influenzae and H. haemolyticus within the 16S rRNA genes. Establishing an accurate method for rapid and inexpensive identification of H. influenzae is important for disease surveillance and treatment. PMID:26378279
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.
Sasaki, H; Yokoyama, E; Kuroiwa, A
1990-01-01
The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
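The two consensus sequences reported above can be written as simple patterns and searched for in a DNA sequence. This is an illustrative sketch only: it scans the forward strand, treats the consensus as an exact match, and uses a made-up test sequence; `find_sites` and the labels are hypothetical names.

```python
import re

# Consensus binding sites from the abstract; (C/G) becomes the character class [CG].
SITES = {
    "high": re.compile("TAATGA[CG]"),
    "low": re.compile("CTAATTTT"),
}

def find_sites(dna):
    # Return (0-based start, affinity label, matched sequence) for every
    # non-overlapping forward-strand match, ordered by position.
    hits = []
    for label, pattern in SITES.items():
        for match in pattern.finditer(dna.upper()):
            hits.append((match.start(), label, match.group()))
    return sorted(hits)

# Toy sequence (assumption) containing one site of each class.
toy = "GGCTAATGACTTCCTAATTTTAGG"
for start, label, site in find_sites(toy):
    print(start, label, site)
```

A real analysis would also scan the reverse complement and allow for mismatches, since footprinted sites rarely match a consensus perfectly.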
Quality scores for 32,000 genomes
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...
2014-12-08
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
Quality scores for 32,000 genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
Tran, Thi Kim Anh; MacFarlane, Geoff R; Kong, Richard Yuen Chong; O'Connor, Wayne A; Yu, Richard Man Kit
2016-05-01
Marine molluscs, such as oysters, respond to estrogenic compounds with the induction of the egg yolk protein precursor, vitellogenin (Vtg), availing a biomarker for estrogenic pollution. Despite this application, the precise molecular mechanism through which estrogens exert their action to induce molluscan vitellogenesis is unknown. As a first step to address this question, we cloned a gene encoding Vtg from the Sydney rock oyster Saccostrea glomerata (sgVtg). Using primers designed from a partial sgVtg cDNA sequence available in Genbank, a full-length sgVtg cDNA of 8498bp was obtained by 5'- and 3'-RACE. The open reading frame (ORF) of sgVtg was determined to be 7980bp, which is substantially longer than the orthologs of other oyster species. Its deduced protein sequence shares the highest homology at the N- and C-terminal regions with other molluscan Vtgs. The full-length genomic DNA sequence of sgVtg was obtained by genomic PCR and genome walking targeting the gene body and flanking regions, respectively. The genomic sequence spans 20kb and consists of 30 exons and 29 introns. Computer analysis identified three closely spaced half-estrogen responsive elements (EREs) in the promoter region and a 210-bp CpG island 62bp downstream of the transcription start site. Upregulation of sgVtg mRNA expression was observed in the ovaries following in vitro (explants) and in vivo (tank) exposure to 17β-estradiol (E2). Notably, treatment with an estrogen receptor (ER) antagonist in vitro abolished the upregulation, suggesting a requirement for an estrogen-dependent receptor for transcriptional activation. DNA methylation of the 5' CpG island was analysed using bisulfite genomic sequencing of the in vivo exposed ovaries. The CpG island was found to be hypomethylated (with 0-3% methylcytosines) in both control and E2-exposed oysters. However, no significant differential methylation or any correlation between methylation and sgVtg expression levels was observed. 
Overall, the results support the possible involvement of an ERE-containing promoter and an estrogen-activated receptor in estrogen signalling in marine molluscs. Copyright © 2016 Elsevier B.V. All rights reserved.
De Silva, Jeremy Ryan; Lau, Yee Ling; Fong, Mun Yik
2017-01-03
The simian malaria parasite Plasmodium knowlesi has been reported to cause significant numbers of human infections in South East Asia. Its merozoite surface protein-3 (MSP3) belongs to a multi-gene family of proteins first found in Plasmodium falciparum. Several studies have evaluated P. falciparum MSP3 as a potential vaccine candidate. However, to date no detailed studies have been carried out on the P. knowlesi MSP3 gene (pkmsp3). The present study investigates the genetic diversity and haplotype groups of pkmsp3 in P. knowlesi clinical samples from Peninsular Malaysia. Blood samples were collected from P. knowlesi malaria patients within a period of 4 years (2008-2012). The pkmsp3 gene of the isolates was amplified via PCR, and subsequently cloned and sequenced. The full length pkmsp3 sequence was divided into Domain A and Domain B. Natural selection, genetic diversity, and haplotypes of pkmsp3 were analysed using MEGA6 and DnaSP ver. 5.10.00 programmes. From 23 samples, 48 pkmsp3 sequences were successfully obtained. At the nucleotide level, 101 synonymous and 238 non-synonymous mutations were observed. Tests of neutrality were not significant for the full length, Domain A or Domain B sequences. However, the dN/dS ratio of Domain B indicates purifying selection for this domain. Analysis of the deduced amino acid sequences revealed 42 different haplotypes. Neighbour Joining phylogenetic tree and haplotype network analyses revealed that the haplotypes clustered into two distinct groups. A moderate level of genetic diversity was observed in the pkmsp3 and only the C-terminal region (Domain B) appeared to be under purifying selection. The separation of the pkmsp3 into two haplotype groups provides further evidence of the existence of two distinct P. knowlesi types or lineages. Future studies should investigate the diversity of pkmsp3 among P. knowlesi isolates in North Borneo, where large numbers of human knowlesi malaria infection still occur.
Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System
Smith, L. Courtney
2012-01-01
The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species including the Sp185/333 gene family that has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphism-based variants of the elements result in significant sequence diversity among the genes yet maintain a similar structure among the members of the family. Sequence of a bacterial artificial chromosome insert shows a cluster of six, tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites, in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites, suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectable throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes. 
The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Lu, Min; An, Huaming; Li, Liangliang
2016-01-01
Rosa roxburghii Tratt is an important commercial horticultural crop in China that is recognized for its nutritional and medicinal values. In spite of its economic significance, genomic information on this rose species is currently unavailable. In the present research, a genome survey of R. roxburghii was carried out using next-generation sequencing (NGS) technologies. A total of 30.29 Gb of sequence data was obtained by HiSeq 2500 sequencing, and the estimated genome size of R. roxburghii was 480.97 Mb, with a guanine plus cytosine (GC) content of 38.63%. The reads were assembled into 627,554 contigs with an N50 length of 1.484 kb and, further, into 335,902 scaffolds with a total length of 409.36 Mb. Transposable element (TE) sequences totalling 90.84 Mb, reported as 29.20% of the genome, and 167,859 simple sequence repeats (SSRs) were identified from the scaffolds. Among these, the mono- (66.30%), di- (25.67%), and tri- (6.64%) nucleotide repeats contributed nearly 99% of the SSRs, and the sequence motifs AG/CT (28.81%) and GAA/TTC (14.76%) were the most abundant among the dinucleotide and trinucleotide repeat motifs, respectively. Genome analysis predicted a total of 22,721 genes with an average length of 2311.52 bp, an average exon length of 228.15 bp, and an average intron length of 401.18 bp. Eleven genes putatively involved in ascorbate metabolism were identified and their expression in R. roxburghii leaves was validated by quantitative real-time PCR (qRT-PCR). This is the first report of genome-wide characterization of this rose species.
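The contig N50 quoted in the abstract above is a standard assembly statistic: the length of the contig at which the running total, counted from the longest contig downwards, first reaches half of the total assembled length. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Toy example: total = 290, half = 145; 80 + 70 = 150 reaches it.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```

A low N50 relative to gene length, as in a survey-depth assembly, indicates that many genes are likely to be split across contigs.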
Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues
2017-02-01
The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rRNA genes, 22 tRNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.
Large-scale collection of full-length cDNA and transcriptome analysis in Hevea brasiliensis.
Makita, Yuko; Ng, Kiaw Kiaw; Veera Singham, G; Kawashima, Mika; Hirakawa, Hideki; Sato, Shusei; Othman, Ahmad Sofiman; Matsui, Minami
2017-04-01
Natural rubber has unique physical properties that cannot be replaced by products from other latex-producing plants or petrochemically produced synthetic rubbers. Rubber from Hevea brasiliensis is the main commercial source for this natural rubber that has a cis-polyisoprene configuration. For sustainable production of enough rubber to meet demand, elucidation of the molecular mechanisms involved in the production of latex is vital. To this end, we first constructed rubber full-length cDNA libraries of RRIM 600 cultivar and sequenced around 20,000 clones by the Sanger method and over 15,000 contigs by Illumina sequencer. With these data, we updated around 5,500 gene structures and newly annotated around 9,500 transcription start sites. Second, to elucidate the rubber biosynthetic pathways and their transcriptional regulation, we carried out tissue- and cultivar-specific RNA-Seq analysis. By using our recently published genome sequence, we confirmed the expression patterns of the rubber biosynthetic genes. Our data suggest that the cytoplasmic mevalonate (MVA) pathway is the main route for isoprenoid biosynthesis in latex production. In addition to the well-studied polymerization factors, we suggest that rubber elongation factor 8 (REF8) is a candidate factor in cis-polyisoprene biosynthesis. We have also identified 39 transcription factors that may be key regulators in latex production. Expression profile analysis using two additional cultivars, RRIM 901 and PB 350, via an RNA-Seq approach revealed possible expression differences between a high latex-yielding cultivar and a disease-resistant cultivar. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Tian, Xue; Meng, Xiaolin; Wang, Liangyan; Song, Yunfei; Zhang, Danli; Ji, Yuankai; Li, Xuejun; Dong, Changsheng
2015-01-25
Slc7a11, encoding solute carrier family 7 member 11 (anionic amino acid transporter light chain, xCT), has been identified as a critical genetic regulator of pheomelanin synthesis in hair and melanocytes. To better understand the molecular characterization of Slc7a11 and its expression patterns in the skin of white versus brown alpaca (Lama pacos), we cloned the full length coding sequence (CDS) of the alpaca Slc7a11 gene and analyzed the expression patterns using Real Time PCR, Western blotting and immunohistochemistry. The full length CDS of 1512bp encodes a 503 amino acid polypeptide. Sequence analysis showed that alpaca xCT contains 12 transmembrane regions consistent with the highly conserved amino acid permease (AA_permease_2) domain similar to other vertebrates. Sequence alignment and phylogenetic analysis revealed that alpaca xCT had the highest identity and shared the same branch with Camelus ferus. Real Time PCR and Western blotting suggested that xCT was expressed at significantly high levels in brown alpaca skin, and transcripts and protein possessed the same expression pattern in white and brown alpaca skins. Additionally, immunohistochemical analysis further demonstrated that xCT staining was robustly increased in the matrix and root sheath of brown alpaca skin compared with that of white. These results suggest that Slc7a11 functions in alpaca coat color regulation and offer essential information for further exploration on the role of Slc7a11 in melanogenesis. Copyright © 2014 Elsevier B.V. All rights reserved.
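The CDS and protein lengths reported in the abstract above are consistent with each other if the 1512 bp figure includes the stop codon, which is an assumption made here rather than something the abstract states. The short Python sketch below works through that arithmetic; the function name `protein_length_from_cds` is hypothetical.

```python
def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Return the number of encoded amino acids for a CDS of cds_bp bases.

    Each codon is three bases. When the CDS length counts the stop codon
    (an assumption, controlled by includes_stop), that codon adds no residue.
    """
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


# 1512 bp -> 504 codons -> 503 residues once the stop codon is excluded.
print(protein_length_from_cds(1512))  # 503
```

A length that is not a multiple of three would point to a frameshift or a partial sequence, so the function rejects it instead of rounding.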
Development and characterization of a eukaryotic expression system for human type II procollagen.
Wieczorek, Andrew; Rezaei, Naghmeh; Chan, Clara K; Xu, Chuan; Panwar, Preety; Brömme, Dieter; Merschrod S, Erika F; Forde, Nancy R
2015-12-15
Triple helical collagens are the most abundant structural protein in vertebrates and are widely used as biomaterials for a variety of applications including drug delivery and cellular and tissue engineering. In these applications, the mechanics of this hierarchically structured protein play a key role, as does its chemical composition. To facilitate investigation into how gene mutations of collagen lead to disease as well as the rational development of tunable mechanical and chemical properties of this full-length protein, production of recombinant expressed protein is required. Here, we present a human type II procollagen expression system that produces full-length procollagen utilizing a previously characterized human fibrosarcoma cell line for production. The system exploits a non-covalently linked fluorescence readout for gene expression to facilitate screening of cell lines. Biochemical and biophysical characterization of the secreted, purified protein are used to demonstrate the proper formation and function of the protein. Assays to demonstrate fidelity include proteolytic digestion, mass spectrometric sequence and posttranslational composition analysis, circular dichroism spectroscopy, single-molecule stretching with optical tweezers, atomic-force microscopy imaging of fibril assembly, and transmission electron microscopy imaging of self-assembled fibrils. Using a mammalian expression system, we produced full-length recombinant human type II procollagen. The integrity of the collagen preparation was verified by various structural and degradation assays. This system provides a platform from which to explore new directions in collagen manipulation.
Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng
2014-01-01
Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence across the whole genome, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
Tuo, Decai; Shen, Wentao; Yan, Pu; Li, Xiaoying; Zhou, Peng
2015-01-01
Papaya leaf distortion mosaic virus (PLDMV) is becoming a threat to papaya and transgenic papaya resistant to the related pathogen, papaya ringspot virus (PRSV). The generation of infectious viral clones is an essential step for reverse-genetics studies of viral gene function and cross-protection. In this study, a sequence- and ligation-independent cloning system, the In-Fusion® Cloning Kit (Clontech, Mountain View, CA, USA), was used to construct intron-less or intron-containing full-length cDNA clones of the isolate PLDMV-DF, with the simultaneous scarless assembly of multiple viral and intron fragments into a plasmid vector in a single reaction. The intron-containing full-length cDNA clone of PLDMV-DF was stably propagated in Escherichia coli. In vitro intron-containing transcripts were processed and spliced into biologically active intron-less transcripts following mechanical inoculation and then initiated systemic infections in Carica papaya L. seedlings, which developed similar symptoms to those caused by the wild-type virus. However, no infectivity was detected when the plants were inoculated with RNA transcripts from the intron-less construct because the instability of the viral cDNA clone in bacterial cells caused a non-sense or deletion mutation of the genomic sequence of PLDMV-DF. To our knowledge, this is the first report of the construction of an infectious full-length cDNA clone of PLDMV and the splicing of intron-containing transcripts following mechanical inoculation. In-Fusion cloning shortens the construction time from months to days. Therefore, it is a faster, more flexible, and more efficient method than the traditional multistep restriction enzyme-mediated subcloning procedure. PMID:26633465
Tuo, Decai; Shen, Wentao; Yan, Pu; Li, Xiaoying; Zhou, Peng
2015-12-01
Papaya leaf distortion mosaic virus (PLDMV) is becoming a threat to papaya and transgenic papaya resistant to the related pathogen, papaya ringspot virus (PRSV). The generation of infectious viral clones is an essential step for reverse-genetics studies of viral gene function and cross-protection. In this study, a sequence- and ligation-independent cloning system, the In-Fusion® Cloning Kit (Clontech, Mountain View, CA, USA), was used to construct intron-less or intron-containing full-length cDNA clones of the isolate PLDMV-DF, with the simultaneous scarless assembly of multiple viral and intron fragments into a plasmid vector in a single reaction. The intron-containing full-length cDNA clone of PLDMV-DF was stably propagated in Escherichia coli. In vitro intron-containing transcripts were processed and spliced into biologically active intron-less transcripts following mechanical inoculation and then initiated systemic infections in Carica papaya L. seedlings, which developed similar symptoms to those caused by the wild-type virus. However, no infectivity was detected when the plants were inoculated with RNA transcripts from the intron-less construct because the instability of the viral cDNA clone in bacterial cells caused a non-sense or deletion mutation of the genomic sequence of PLDMV-DF. To our knowledge, this is the first report of the construction of an infectious full-length cDNA clone of PLDMV and the splicing of intron-containing transcripts following mechanical inoculation. In-Fusion cloning shortens the construction time from months to days. Therefore, it is a faster, more flexible, and more efficient method than the traditional multistep restriction enzyme-mediated subcloning procedure.
Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B
2015-08-01
The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
2009-01-01
Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. 
We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg
2009-08-06
Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.
Henderson, James B.; Sellas, Anna B.; Fuchs, Jérôme; Bowie, Rauri C.K.; Dumbacher, John P.
2017-01-01
We report here the successful assembly of the complete mitochondrial genomes of the northern spotted owl (Strix occidentalis caurina) and the barred owl (S. varia). We utilized sequence data from two sequencing methodologies, Illumina paired-end sequence data with insert lengths ranging from approximately 250 nucleotides (nt) to 9,600 nt and read lengths from 100–375 nt and Sanger-derived sequences. We employed multiple assemblers and alignment methods to generate the final assemblies. The circular genomes of S. o. caurina and S. varia are comprised of 19,948 nt and 18,975 nt, respectively. Both code for two rRNAs, twenty-two tRNAs, and thirteen polypeptides. They both have duplicated control region sequences with complex repeat structures. We were not able to assemble the control regions solely using Illumina paired-end sequence data. By fully spanning the control regions, Sanger-derived sequences enabled accurate and complete assembly of these mitochondrial genomes. These are the first complete mitochondrial genome sequences of owls (Aves: Strigiformes) possessing duplicated control regions. We searched the nuclear genome of S. o. caurina for copies of mitochondrial genes and found at least nine separate stretches of nuclear copies of gene sequences originating in the mitochondrial genome (Numts). The Numts ranged from 226–19,522 nt in length and included copies of all mitochondrial genes except tRNAPro, ND6, and tRNAGlu. Strix occidentalis caurina and S. varia exhibited an average of 10.74% (8.68% uncorrected p-distance) divergence across the non-tRNA mitochondrial genes. PMID:29038757
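The abstract above reports divergence both as an uncorrected p-distance and as a corrected value. The uncorrected p-distance is simply the proportion of compared alignment positions at which two sequences differ. The Python sketch below illustrates that definition under the assumption that positions containing a gap or an ambiguous base (N) are skipped; the function name `p_distance` and the toy sequences are illustrative and are not the study's data or pipeline.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Return the uncorrected p-distance between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous base
    ('N') are excluded from the comparison (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


# Two differences over ten compared sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

Corrected distances are larger than the p-distance because substitution models account for multiple changes at the same site; which model produced the corrected figure is not stated in the abstract.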
Dynamics of actin evolution in dinoflagellates.
Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F
2011-04-01
Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provides broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. 
Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Yan, Jie; Liang, Xiao; Zhang, Yin; Li, Yang; Cao, Xiaojuan; Gao, Jian
2017-07-01
Heat shock protein 70 (HSP70) and 90 (HSP90) are the most broadly studied proteins in HSP families. They play key roles in cells as molecular chaperones, in response to stress conditions such as thermal stress. In this study, full-length cDNA sequences of HSP70, HSP90α and HSP90β from loach Misgurnus anguillicaudatus were cloned. The full-length cDNA of HSP70 in loach was 2332bp encoding 644 amino acids, while HSP90α and HSP90β were 2586bp and 2678bp in length, encoding 729 and 727 amino acids, respectively. The deduced amino acid sequences of HSP70 in loach shared the highest identity with those of Megalobrama amblycephala and Cyprinus carpio. The deduced amino acid sequences of HSP90α and HSP90β in loach both shared the highest identity with those of M. amblycephala. Their mRNA tissue expression results showed that the maximum expressions of HSP70, HSP90α and HSP90β were respectively present in the intestine, brain and kidney of loach. Quantitative real-time PCR was employed to analyze the temporal expressions of HSP70, HSP90α and HSP90β in livers of loaches fed with different levels of vitamin C under thermal stress. Expression levels of the three HSP genes in loach fed the diet without vitamin C supplemented at 0 h of thermal stress were significantly lower than those at 2 h, 6 h, 12 h and 24 h of thermal stress. It indicated that expressions of the three HSP genes were sensitive to thermal stress in loach. The three HSP genes in loaches fed with 1000 mg/kg vitamin C expressed significantly lower than other vitamin C groups at many time points of thermal stress, suggesting 1000 mg/kg dietary vitamin C might decrease the body damages caused by the thermal stress. This study will be of value for further studies into thermal stress tolerance in loach. Copyright © 2017 Elsevier Ltd. All rights reserved.
Characterization of a novel ADAM protease expressed by Pneumocystis carinii.
Kennedy, Cassie C; Kottom, Theodore J; Limper, Andrew H
2009-08-01
Pneumocystis species are opportunistic fungal pathogens that cause severe pneumonia in immunocompromised hosts. Recent evidence has suggested that unidentified proteases are involved in Pneumocystis life cycle regulation. Proteolytically active ADAM (named for "a disintegrin and metalloprotease") family molecules have been identified in some fungal organisms, such as Aspergillus fumigatus and Schizosaccharomyces pombe, and some have been shown to participate in life cycle regulation. Accordingly, we sought to characterize ADAM-like molecules in the fungal opportunistic pathogen, Pneumocystis carinii (PcADAM). After an in silico search of the P. carinii genomic sequencing project identified a 329-bp partial sequence with homology to known ADAM proteins, the full-length PcADAM sequence was obtained by PCR extension cloning, yielding a final coding sequence of 1,650 bp. Sequence analysis detected the presence of a typical ADAM catalytic active site (HEXXHXXGXXHD). Expression of PcADAM over the Pneumocystis life cycle was analyzed by Northern blot. Southern and contour-clamped homogenous electronic field blot analysis demonstrated its presence in the P. carinii genome. Expression of PcADAM was observed to be increased in Pneumocystis cysts compared to trophic forms. The full-length gene was subsequently cloned and heterologously expressed in Saccharomyces cerevisiae. Purified PcADAMp protein was proteolytically active in casein zymography, requiring divalent zinc. Furthermore, native PcADAMp extracted directly from freshly isolated Pneumocystis organisms also exhibited protease activity. This is the first report of protease activity attributable to a specific, characterized protein in the clinically important opportunistic fungal pathogen Pneumocystis.
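The catalytic-site consensus HEXXHXXGXXHD mentioned above can be scanned for with a regular expression in which each X matches any residue. A minimal Python sketch, assuming a one-letter protein sequence as input. The function name and the toy sequence are hypothetical, and this is not the sequence-analysis tool used in the study.

```python
import re

# X in the consensus means "any residue", written here as "." in the pattern.
# The lookahead lets overlapping occurrences be reported as well.
ADAM_MOTIF = re.compile(r"(?=(HE.{2}H.{2}G.{2}HD))")


def find_adam_motif(protein: str):
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group(1)) for m in ADAM_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy sequence (hypothetical), not the PcADAM protein.
    toy = "MKT" + "HEAAHAAGAAHD" + "LLK"
    print(find_adam_motif(toy))
```

A pattern hit only flags a candidate active site; it says nothing about whether the protein is actually proteolytically active, which the study tested by zymography.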
Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N
2016-11-01
Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.
Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.
2016-01-01
Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections. PMID:28361829
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites
Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng
2016-01-01
HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined from the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses, and the sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance.
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.
Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng
2016-01-01
HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 10^7 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results for the majority of the viruses (83%-97.9%) were correctly determined from the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses, and the sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance.
Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.
Deep RNA-Seq to unlock the gene bank of floral development in Sinapis arvensis.
Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong
2014-01-01
Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops.
Deep RNA-Seq to Unlock the Gene Bank of Floral Development in Sinapis arvensis
Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong
2014-01-01
Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops. PMID:25192023
Integrating alternative splicing detection into gene prediction.
Foissac, Sylvain; Schiex, Thomas
2005-02-10
Alternative splicing (AS) is now considered a major contributor to transcriptome/proteome diversity and cannot be neglected when annotating a new genome. Despite considerable progress in the accuracy of computational gene prediction, reliably predicting AS variants when local experimental evidence for them exists remains an open challenge for gene finders. We have used a new integrative approach that incorporates AS detection into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs), and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGENE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the set of transcripts carrying evidence of alternative splicing events, and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those which are consistent with the AS events detected). This allows for multiple annotations of a single gene such that each predicted variant is supported by transcript evidence (but not necessarily with full-length coverage). This automatic combination of experimental data analysis and ab initio gene finding integrates the prediction of alternatively spliced genes into a single annotation pipeline.
Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F
2004-09-01
Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
Characterization and mapping of cDNA encoding aspartate aminotransferase in rice, Oryza sativa L.
Song, J; Yamamoto, K; Shomura, A; Yano, M; Minobe, Y; Sasaki, T
1996-10-31
Fifteen cDNA clones, putatively identified as encoding aspartate aminotransferase (AST, EC 2.6.1.1), were isolated and partially sequenced. Together with six previously isolated clones putatively identified as encoding ASTs (Sasaki, et al. 1994, Plant Journal 6, 615-624), their sequences were characterized and classified into 4 cDNA species. Two of the isolated clones, C60213 and C2079, were full-length cDNAs, and their complete nucleotide sequences were determined. C60213 was 1612 bp long and its deduced amino acid sequence showed 88% homology with that of Panicum miliaceum L. mitochondrial AST. The C60213-encoded protein had an N-terminal amino acid sequence that was characteristic of a mitochondrial transit peptide. On the other hand, C2079 was 1546 bp long and had 91% amino acid sequence homology with P. miliaceum L. cytosolic AST but lacked the transit peptide sequence. The homologies of nucleotide sequences and deduced amino acid sequences of C2079 and C60213 were 54% and 52%, respectively. C2079 and C60213 were mapped on chromosomes 1 and 6, respectively, by restriction fragment length polymorphism linkage analysis. Northern blot analysis using C2079 as a probe revealed much higher transcript levels in callus and root than in green and etiolated shoots, suggesting tissue-specific variations of AST gene expression.
Zhu, Haisheng; Liu, Jianting; Wen, Qingfang; Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng
2017-01-01
Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.
Houtz, Robert L.
1999-01-01
The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) εN-methyltransferase (protein methylase III or Rubisco LSMT) from a plant which has a des(methyl) lysyl residue in the LS is disclosed. In addition, the full-length cDNA clones for Rubisco LSMT are disclosed. Transgenic plants and methods of producing same which have the Rubisco LSMT gene inserted into the DNA are also provided. Further, methods of inactivating the enzymatic activity of Rubisco LSMT are also disclosed.
Analysis of the complete genome of subgroup A' hepatitis B virus isolates from South Africa.
Kramvis, Anna; Weitzmann, Louise; Owiredu, William K B A; Kew, Michael C
2002-04-01
A phylogenetic analysis is presented of six complete and seven pre-S1/S2/S gene sequences of hepatitis B virus (HBV) isolates from South Africa. Five of the full-length sequences and all of the pre-S2/S sequences have been previously reported. Four of the six complete genomes and three of the five incomplete sequences clustered with subgroup A', a unique segment of genotype A of HBV previously identified in 60% of South African isolates using analysis of the pre-S2/S region alone. This separation was also evident when the polymerase open reading frame was analysed, but not on analysis of either the X or pre-core/core genes. Amino acids were identified in the pre-S1 and polymerase regions specific to subgroup A'. In common with genotype D, 10 of 11 genotype A South African isolates had an 11 amino acid deletion in the amino end of the pre-S1 region. This deletion is also found in hepadnaviruses from non-human primates.
Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology
Udy, Dylan B.; Voorhies, Mark; Chan, Patricia P.; Lowe, Todd M.; Dumont, Sophie
2015-01-01
The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes—and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics. PMID:26252667
Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron
2016-01-01
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA 10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology.
Udy, Dylan B; Voorhies, Mark; Chan, Patricia P; Lowe, Todd M; Dumont, Sophie
2015-01-01
The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes, and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor its transcriptome has been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics.
2013-01-01
Background Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. Results The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. 
Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. Conclusions A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci. PMID:23379821
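The polymorphism information content (PIC) values quoted above are computed from allele frequencies at each locus. The abstract does not state which formula was used; the sketch below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and the allele frequencies in the example are hypothetical, not taken from the study.

```python
from itertools import combinations

def pic(freqs):
    """Polymorphism information content for one locus (Botstein et al. definition)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Probability that two randomly drawn alleles are identical.
    homozygosity = sum(p * p for p in freqs)
    # Correction for matings whose offspring are uninformative.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross

# Hypothetical three-allele locus, for illustration only.
print(round(pic([0.5, 0.3, 0.2]), 4))  # 0.5478
```

Under this definition a locus with two equally frequent alleles has a PIC of 0.375, so values approaching 0.82 require several alleles at comparable frequencies, in line with the 3 to 8 alleles per locus reported.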
Wang, Wei-Ming; Lee, A-Young; Chiang, Cheng-Ming
2008-01-01
The AP-1 transcription factor is a dimeric protein complex formed primarily between Jun (c-Jun, JunB, JunD) and Fos (c-Fos, FosB, Fra-1, Fra-2) family members. These distinct AP-1 complexes are expressed in many cell types and modulate target gene expression implicated in cell proliferation, differentiation, and stress responses. Although the importance of AP-1 has long been recognized, the biochemical characterization of AP-1 remains limited in part due to the difficulty in purifying full-length, reconstituted dimers with active DNA-binding and transcriptional activity. Using a combination of bacterial coexpression and epitope-tagging methods, we successfully purified all 12 heterodimers (3 Jun × 4 Fos) of full-length human AP-1 complexes as well as c-Jun/c-Jun, JunD/JunD, and c-Jun/JunD dimers from bacterial inclusion bodies using one-step nickel-NTA affinity tag purification following denaturation and renaturation of coexpressed AP-1 subunits. Coexpression of two constitutive components in a dimeric AP-1 complex helps stabilize the proteins when compared with individual protein expression in bacteria. Purified dimeric AP-1 complexes are functional in sequence-specific DNA binding, as illustrated by electrophoretic mobility shift assays and DNase I footprinting, and are also active in transcription with in vitro-reconstituted human papillomavirus (HPV) chromatin containing AP-1-binding sites in the native configuration of HPV nucleosomes. The availability of these recombinant full-length human AP-1 complexes has greatly facilitated mechanistic studies of AP-1-regulated gene transcription in many biological systems. PMID:18329890
Gene and translation initiation site prediction in metagenomic sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John
2012-01-01
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, the ability to identify sequences that use alternate genetic codes, and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
Camarena-Rosales, Faustino; Del Río-Portilla, Miguel A; Ruiz-Campos, Gorgonio; García-De-León, Francisco J
2016-11-01
The complete mitochondrial genome sequence of the Desert Pupfish, Cyprinodon macularius (accession number KM985373), is 16,940 bp long and comprises 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes and 22 transfer RNA (tRNA) genes, an arrangement similar to that of other known mitogenomes in the family Cyprinodontidae.
Zeng, Xian-Chun; Nie, Yao; Luo, Xuesong; Wu, Shifen; Shi, Wanxia; Zhang, Lei; Liu, Yichen; Cao, Hanjun; Yang, Ye; Zhou, Jianping
2013-03-01
The full-length cDNA sequences of two novel cysteine-rich peptides (referred to as HsVx1 and MmKTx1) were obtained from scorpions. The two peptides represent a novel class of cysteine-rich peptides with a unique cysteine pattern. The genomic sequence of HsVx1 is composed of three exons interrupted by two introns that are localized in the mature peptide encoding region and inserted in phase 1 and phase 2, respectively. Such a genomic organization markedly differs from those of other peptides from scorpions described previously. A genome-wide search for orthologs of HsVx1 identified 59 novel cysteine-rich peptides from arthropods. These peptides share a consistent cysteine pattern with HsVx1. Genomic comparison revealed extensive intron length differences and polymorphisms in intron number and position among the genes of these peptides. Further analysis identified 30 cases of intron sliding, 1 case of intron gain and 22 cases of intron loss in the genes of the HsVx1 and HsVx1-like peptides. Notably, three HsVx1-like peptides (XP_001658928, XP_001658929 and XP_001658930) were derived from a single gene (XP gene): the former two were generated by alternative splicing; the third was encoded by a DNA region on the reverse complementary strand of the third intron of the XP gene. These findings strongly suggest that the genes of these cysteine-rich peptides evolved through intron sliding, intron gain/loss, gene recombination and alternative splicing events in response to selective forces without changing their cysteine pattern. The evolution of these genes is dominated by intron sliding and intron loss. Copyright © 2012 Elsevier Inc. All rights reserved.
Weiss, Eric R; Lamers, Susanna L; Henderson, Jennifer L; Melnikov, Alexandre; Somasundaran, Mohan; Garber, Manuel; Selin, Liisa; Nusbaum, Chad; Luzuriaga, Katherine
2018-01-15
Over 90% of the world's population is persistently infected with Epstein-Barr virus. While EBV does not cause disease in most individuals, it is the common cause of acute infectious mononucleosis (AIM) and has been associated with several cancers and autoimmune diseases, highlighting a need for a preventive vaccine. At present, very few primary, circulating EBV genomes have been sequenced directly from infected individuals. While low levels of diversity and low viral evolution rates have been predicted for double-stranded DNA (dsDNA) viruses, recent studies have demonstrated appreciable diversity in common dsDNA pathogens (e.g., cytomegalovirus). Here, we report 40 full-length EBV genome sequences obtained from matched oral wash and B cell fractions from a cohort of 10 AIM patients. Both intra- and interpatient diversity were observed across the length of the entire viral genome. Diversity was most pronounced in viral genes required for establishing latent infection and persistence, with appreciable levels of diversity also detected in structural genes, including envelope glycoproteins. Interestingly, intrapatient diversity declined significantly over time ( P < 0.01), and this was particularly evident on comparison of viral genomes sequenced from B cell fractions in early primary infection and convalescence ( P < 0.001). B cell-associated viral genomes were observed to converge, becoming nearly identical to the B95.8 reference genome over time (Spearman rank-order correlation test; r = -0.5589, P = 0.0264). The reduction in diversity was most marked in the EBV latency genes. In summary, our data suggest independent convergence of diverse viral genome sequences toward a reference-like strain within a relatively short period following primary EBV infection. IMPORTANCE Identification of viral proteins with low variability and high immunogenicity is important for the development of a protective vaccine. 
Knowledge of genome diversity within circulating viral populations is a key step in this process, as is the expansion of intrahost genomic variation during infection. We report full-length EBV genomes sequenced from the blood and oral wash of 10 individuals early in primary infection and during convalescence. Our data demonstrate considerable diversity within the pool of circulating EBV strains, as well as within individual patients. Overall viral diversity decreased from early to persistent infection, particularly in latently infected B cells, which serve as the viral reservoir. Reduction in B cell-associated viral genome diversity coincided with a convergence toward a reference-like EBV genotype. Greater convergence positively correlated with time after infection, suggesting that the reference-like genome is the result of selection. Copyright © 2018 American Society for Microbiology.
Sequence analysis and expression of the M1 and M2 matrix protein genes of hirame rhabdovirus (HIRRV)
Nishizawa, T.; Kurath, G.; Winton, J.R.
1997-01-01
We have cloned and sequenced a 2318 nucleotide region of the genomic RNA of hirame rhabdovirus (HIRRV), an important viral pathogen of Japanese flounder Paralichthys olivaceus. This region comprises approximately two-thirds of the 3' end of the nucleocapsid protein (N) gene and the complete matrix protein (M1 and M2) genes with the associated intergenic regions. The partial N gene sequence was 812 nucleotides in length with an open reading frame (ORF) that encoded the carboxyl-terminal 250 amino acids of the N protein. The M1 and M2 genes were 771 and 700 nucleotides in length, respectively, with ORFs encoding proteins of 227 and 193 amino acids. The M1 gene sequence contained an additional small ORF that could encode a highly basic, arginine-rich protein of 25 amino acids. Comparisons of the N, M1, and M2 gene sequences of HIRRV with the corresponding sequences of the fish rhabdoviruses, infectious hematopoietic necrosis virus (IHNV) or viral hemorrhagic septicemia virus (VHSV) indicated that HIRRV was more closely related to IHNV than to VHSV, but was clearly distinct from either. The putative consensus gene termination sequence for IHNV and VHSV, AGAYAG(A)(7), was present in the N-M1, M1-M2, and M2-G intergenic regions of HIRRV as were the putative transcription initiation sequences YGGCAC and AACA. An Escherichia coli expression system was used to produce recombinant proteins from the M1 and M2 genes of HIRRV. These were the same size as the authentic M1 and M2 proteins and reacted with anti-HIRRV rabbit serum in western blots. These reagents can be used for further study of the fish immune response and to test novel control methods.
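The consensus termination signal quoted above, AGAYAG(A)(7), combines the IUPAC ambiguity code Y (C or T) with a run of seven adenines. As a minimal sketch of how such a motif can be located, the code below expands it into a regular expression. It assumes the consensus is written in DNA letters on the strand being searched; the helper names and the example sequence are made up for illustration and are not HIRRV data.

```python
import re

# Subset of IUPAC nucleotide codes; Y = pyrimidine, R = purine, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "[CT]", "R": "[AG]", "N": "[ACGT]"}

def motif_to_regex(motif, poly_a=0):
    """Compile an IUPAC motif, optionally followed by a run of poly_a adenines."""
    body = "".join(IUPAC[base] for base in motif)
    if poly_a:
        body += "A{%d}" % poly_a
    return re.compile(body)

def find_motif(seq, pattern):
    """Return 0-based start positions of non-overlapping matches."""
    return [m.start() for m in pattern.finditer(seq.upper())]

TERMINATION = motif_to_regex("AGAYAG", poly_a=7)

# Hypothetical sequence containing both the T and the C variant of the motif.
example = ("CCGG" + "AGATAG" + "A" * 7 + "TTGGCAC"
           + "GG" + "AGACAG" + "A" * 7 + "CC")
print(find_motif(example, TERMINATION))  # [4, 26]
```

The same helper could be pointed at the putative initiation motif YGGCAC by calling motif_to_regex("YGGCAC") with no poly(A) run.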
Phylogenetic tree of 16s rRNA sequences from sulfate-reducing bacteria in a sandy marine sediment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devereux, R.; Mundfrom, G.W.
1994-01-01
Phylogenetic divergence among sulfate-reducing bacteria in an estuarine sediment sample was investigated by PCR amplification and comparison of partial 16S rDNA sequences. Twenty unique 16S rDNA sequences were found, 12 from delta subclass bacteria based on overall sequence similarity (82-91%). Two successive PCR amplifications were used to obtain and clone the 16S rDNA. The first reaction used templates derived from phosphate-buffered saline washed sediment with primers designed to amplify nearly full-length bacterial domain 16S rDNA. A product from the first reaction was used as template in a second reaction with primers designed to selectively amplify a region of the 16S rDNA genes of sulfate-reducing bacteria. A phylogenetic tree incorporating the cloned sequences suggests the presence of yet to be cultivated lines of sulfate-reducing bacteria within the sediment sample.
Pyrin gene and mutants thereof, which cause familial Mediterranean fever
Kastner, Daniel L [Bethesda, MD; Aksentijevichh, Ivona [Bethesda, MD; Centola, Michael [Tacoma Park, MD; Deng, Zuoming [Gaithersburg, MD; Sood, Ramen [Rockville, MD; Collins, Francis S [Rockville, MD; Blake, Trevor [Laytonsville, MD; Liu, P Paul [Ellicott City, MD; Fischel-Ghodsian, Nathan [Los Angeles, CA; Gumucio, Deborah L [Ann Arbor, MI; Richards, Robert I [North Adelaide, AU; Ricke, Darrell O [San Diego, CA; Doggett, Norman A [Santa Cruz, NM; Pras, Mordechai [Tel-Hashomer, IL
2003-09-30
The invention provides the nucleic acid sequence encoding the protein associated with familial Mediterranean fever (FMF). The cDNA sequence is designated as MEFV. The invention is also directed towards fragments of the DNA sequence, as well as the corresponding sequence for the RNA transcript and fragments thereof. Another aspect of the invention provides the amino acid sequence for a protein (pyrin) associated with FMF. The invention is directed towards the full-length amino acid sequence, fusion proteins containing the amino acid sequence, and fragments thereof. The invention is also directed towards mutants of the nucleic acid and amino acid sequences associated with FMF. In particular, the invention discloses three missense mutations, clustered within about 40 to 50 amino acids, in the highly conserved rfp (B30.2) domain at the C-terminus of the protein. These mutants include M680I, M694V, K695R, and V726A. Additionally, the invention includes methods for diagnosing a patient at risk for having FMF and kits therefor.
Kaur, G; Chandra, M; Dwivedi, P N
2016-03-01
Canine parvovirus (CPV) causes hemorrhagic enteritis, especially in young dogs, leading to high morbidity and mortality. It has four main antigenic types: CPV-2, CPV-2a, CPV-2b and CPV-2c. Virus protein 2 (VP2) is the main capsid protein, and mutations affecting the VP2 gene are responsible for the evolution of the various antigenic types of CPV. The full-length VP2 gene from field isolates was amplified and cloned for sequence analysis. The sequences were submitted to GenBank and were assigned the accession numbers KP406928.1 for P12, KP406927.1 for P15, KP406930.1 for P32, KP406926.1 for Megavac-6 and KP406929.1 for NobivacDHPPi. Phylogenetic analysis indicated that the samples formed a separate clade with the vaccine strains. When the samples were compared with world and Indian isolates, they formed a separate node, indicating regional genetic variation in CPV.
Characterization of Toll-like receptor 3 gene in large yellow croaker, Pseudosciaena crocea.
Huang, Xue-Na; Wang, Zhi-Yong; Yao, Cui-Luan
2011-07-01
Toll-like receptor 3 (TLR3) plays an important role in innate immune responses. In this report, the full-length cDNA sequence and genomic structure of Pseudosciaena crocea TLR3 (PcTLR3) were identified and characterized. The full-length cDNA of PcTLR3 was of 3384 bp, including a 5'-terminal untranslated region (UTR) of 65 bp, a 3'-terminal UTR of 589 bp and an open reading frame (ORF) of 2730 bp encoding a polypeptide of 909 amino acid residues. The full-length genome sequence of PcTLR3 was composed of 5721 nucleotides, including five exons and four introns. The putative PcTLR3 protein contained a signal peptide sequence, 16 leucine-rich repeat (LRR) motifs, a transmembrane region and a Toll/interleukin-1 receptor (TIR) domain. Quantitative real-time reverse transcription PCR analysis revealed a broad expression of PcTLR3 in most tissues, with the predominant expression in liver, then intestine, and the weakest expression in blood cells. The expression of PcTLR3 after injection with poly inosinic:cytidylic (I:C) and Vibrio parahemolyticus was tested in spleen, blood cells and liver. The results indicated that PcTLR3 transcripts could be induced in the three tissues by injection with poly I:C. The highest expression was in the blood cells with 43.5 times (at 6h) greater expression than in the control (p<0.05). In addition, after V. parahemolyticus challenge, a moderate up-regulation and down-regulation of PcTLR3 was found in blood cells and liver, respectively. Our results suggested that PcTLR3 might play an important role in fish's defense against both viral and bacterial infection. Copyright © 2011 Elsevier Ltd. All rights reserved.
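The lengths reported above can be cross-checked with simple arithmetic: the two UTRs plus the ORF should sum to the cDNA length, and the ORF length should be a multiple of three. The sketch below assumes the stated ORF length includes the stop codon (the abstract does not say so explicitly), in which case the number of codons is one more than the number of residues. The function name is illustrative.

```python
def check_cdna(total_bp, utr5_bp, orf_bp, utr3_bp, n_residues):
    """Return (lengths_add_up, orf_matches_protein) for a reported cDNA."""
    lengths_add_up = utr5_bp + orf_bp + utr3_bp == total_bp
    codons, remainder = divmod(orf_bp, 3)
    # Assumption: the ORF length counts the stop codon,
    # so codons = residues + 1.
    orf_matches_protein = remainder == 0 and codons - 1 == n_residues
    return lengths_add_up, orf_matches_protein

# Values from the PcTLR3 abstract: 65 + 2730 + 589 = 3384 bp; 2730 / 3 = 910 codons.
print(check_cdna(3384, 65, 2730, 589, 909))  # (True, True)
```

Both checks pass for PcTLR3, so the figures in the abstract are internally consistent under that convention.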
Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature
Pozzoli, Uberto; Elgar, Greg; Cagliani, Rachele; Riva, Laura; Comi, Giacomo P.; Bresolin, Nereo; Bardoni, Alessandra; Sironi, Manuela
2003-01-01
The human DMD gene is the largest known to date, spanning > 2000 kb on the X chromosome. The gene size is mainly accounted for by huge intronic regions. We sequenced 190 kb of Fugu rubripes (pufferfish) genomic DNA corresponding to the complete dystrophin gene (FrDMD) and provide the first report of gene structure and sequence comparison among dystrophin genomic sequences from different vertebrate organisms. Almost all intron positions and phases are conserved between FrDMD and its mammalian counterparts, and the predicted protein product of the Fugu gene displays 55% identity and 71% similarity to human dystrophin. In analogy to the human gene, FrDMD presents several-fold longer than average intronic regions. Analysis of intron sequences of the human and murine genes revealed that they are extremely conserved in size and that a similar fraction of total intron length is represented by repetitive elements; moreover, our data indicate that intron expansion through repeat accumulation in the two orthologs is the result of independent insertional events. The hypothesis that intron length might be functionally relevant to the DMD gene regulation is proposed and substantiated by the finding that dystrophin intron gigantism is common to the three vertebrate genes. [Supplemental material is available online at www.genome.org.] PMID:12727896
[Cloning and functional characterization of phytoene desaturase in Andrographis paniculata].
Shen, Qin-qin; Li, Li-xia; Zhan, Peng-lin; Wang, Qiang
2015-10-01
A full-length cDNA of the phytoene desaturase (PDS) gene from Andrographis paniculata was obtained through RACE-PCR. The cDNA sequence consists of 2 224 bp with an intact ORF of 1 752 bp (GenBank: KP982892), encoding a polypeptide of 584 amino acids. Homology analysis showed that the deduced protein has extensive sequence similarities to PDS from other plants, and contains a conserved NAD(H)-binding domain of the plant dehydrase cofactor-binding domain at the N-terminus. Phylogenetic analysis demonstrated that ApPDS was more closely related to the PDS of Sesamum indicum and Pogostemon cablin. Semi-quantitative RT-PCR analysis revealed that ApPDS was expressed in all aboveground tissues, with the highest expression in leaves. Virus-induced gene silencing (VIGS) was performed to characterize the function of ApPDS in planta. Significant photobleaching was not observed in infiltrated leaves, while the PDS gene was significantly down-regulated in the yellowish area. To the best of our knowledge, this represents the first report of PDS gene cloning and functional characterization from A. paniculata, which lays the foundation for further investigation of new genes, especially those related to the andrographolide biosynthetic pathway.
Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale
He, Liu; Fu, Shuhua; Xu, Zhichao; Yan, Jun; Xu, Jiang; Zhou, Hong; Zhou, Jianguo; Chen, Xinlian; Li, Ying; Au, Kin Fai; Yao, Hui
2017-01-01
Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem. PMID:28981454
Diop, Khoudia; Diop, Awa; Levasseur, Anthony; Mediannikov, Oleg; Robert, Catherine; Armstrong, Nicholas; Couderc, Carine; Bretelle, Florence; Raoult, Didier; Fournier, Pierre-Edouard; Fenollar, Florence
2018-03-01
Microbial culturomics is a new subfield of postgenomic medicine and omics biotechnology application that has broadened our awareness on bacterial diversity of the human microbiome, including the human vaginal flora bacterial diversity. Using culturomics, a new obligate anaerobic Gram-stain-negative rod-shaped bacterium designated strain khD1 T was isolated in the vagina of a patient with bacterial vaginosis and characterized using taxonogenomics. The most abundant cellular fatty acids were C 15:0 anteiso (36%), C 16:0 (19%), and C 15:0 iso (10%). Based on an analysis of the full-length 16S rRNA gene sequences, phylogenetic analysis showed that the strain khD1 T exhibited 90% sequence similarity with Prevotella loescheii, the phylogenetically closest validated Prevotella species. With 3,763,057 bp length, the genome of strain khD1 T contained (mol%) 48.7 G + C and 3248 predicted genes, including 3194 protein-coding and 54 RNA genes. Given the phenotypical and biochemical characteristic results as well as genome sequencing, strain khD1 T is considered to represent a novel species within the genus Prevotella, for which the name Prevotella lascolaii sp. nov. is proposed. The type strain is khD1 T ( = CSUR P0109, = DSM 101754). These results show that microbial culturomics greatly improves the characterization of the human microbiome repertoire by isolating potential putative new species. Further studies will certainly clarify the microbial mechanisms of pathogenesis of these new microbes and their role in health and disease. Microbial culturomics is an important new addition to the diagnostic medicine toolbox and warrants attention in future medical, global health, and integrative biology postgraduate teaching curricula.
Jung, Woongsic; Kim, Eun Jae; Han, Se Jong; Choi, Han-Gu; Kim, Sanghee
2016-10-01
Stearoyl-CoA desaturase is a key regulator in fatty acid metabolism that catalyzes the desaturation of stearic acid to oleic acid and controls the intracellular levels of monounsaturated fatty acids (MUFAs). Two stearoyl-CoA desaturase (SCD, Δ9 desaturase) genes were identified in an Antarctic copepod, Tigriopus kingsejongensis, that was collected in a tidal pool near the King Sejong Station, King George Island, Antarctica. Full-length complementary DNA (cDNA) sequences of two T. kingsejongensis SCDs (TkSCDs) were obtained from next-generation sequencing and isolated by reverse transcription PCR. The open reading frames of TkSCD-1 and TkSCD-2 were determined to be 1110 and 681 bp long, respectively. The molecular weights deduced from the corresponding genes were estimated to be 43.1 kDa (TkSCD-1) and 26.1 kDa (TkSCD-2). The amino acid sequences were compared with those of fatty acid desaturases and sterol desaturases from various organisms and used to analyze the relationships among TkSCDs. When the enzymatic functions of both stearoyl-CoA desaturases were assessed by heterologous expression of recombinant proteins in Escherichia coli, the amount of C16:1 and C18:1 fatty acids increased by greater than 3-fold after induction with isopropyl β-D-thiogalactopyranoside. In particular, C18:1 fatty acid production increased greater than 10-fold in E. coli expressing TkSCD-1 and TkSCD-2. The results of this study suggest that both SCD genes from an Antarctic marine copepod encode a functional desaturase that is capable of increasing the amounts of palmitoleic acid and oleic acid in a prokaryotic expression system.
Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis
Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.
2011-01-01
The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189
Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun
2012-01-01
The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of the resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined, with lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979
Afouda, Pamela; Durand, Guillaume A; Lagier, Jean-Christophe; Labas, Noémie; Cadoret, Fréderic; Armstrong, Nicholas; Raoult, Didier; Dubourg, Grégory
2018-04-14
Intestinimonas massiliensis sp. nov. strain GD2 T is a new species of the genus Intestinimonas (the second, following Intestinimonas butyriciproducens gen. nov., sp. nov.). First isolated from the gut microbiota of a healthy subject of French origin using a culturomics approach combined with taxono-genomics, it is strictly anaerobic, nonspore-forming, rod-shaped, with catalase- and oxidase-negative reactions. Its growth was observed after preincubation in an anaerobic blood culture enriched with sheep blood (5%) and rumen fluid (5%), incubated at 37°C. Its phenotypic and genotypic descriptions are presented in this paper with a full annotation of its genome sequence. The genome is 3,104,261 bp in length and contains 3,074 predicted genes, including 3,012 protein-coding genes and 62 RNA-coding genes. Strain GD2 T produces significant amounts of butyrate and is frequently found among available 16S rRNA gene amplicon datasets, which suggests that Intestinimonas massiliensis is an important human gut commensal. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
Evidence for Phex haploinsufficiency in murine X-linked hypophosphatemia.
Wang, L; Du, L; Ecarot, B
1999-04-01
Mutations in the PHEX gene (phosphate-regulating gene with homology to endopeptidases on the X-chromosome) are responsible for X-linked hypophosphatemia (HYP). We previously reported the full-length coding sequence of murine Phex cDNA and provided evidence of Phex expression in bone and tooth. Here, we report the cloning of the entire 3.5-kb 3'UTR of the Phex gene, yielding a total of 6248 bp for the Phex transcript. Southern blot and RT-PCR analyses revealed that the 3' end of the coding sequence and the 3'UTR of the Phex gene, spanning exons 16 to 22, are deleted in Hyp, the mouse model for HYP. Northern blot analysis of bone revealed lack of expression of stable Phex mRNA from the mutant allele and expression of Phex transcripts from the wild-type allele in Hyp heterozygous females. Expression of the Phex protein in heterozygotes was confirmed by Western analysis with antibodies raised against a COOH-terminal peptide of the mouse Phex protein. Taken together, these results indicate that the dominant pattern of Hyp inheritance in mice is due to Phex haploinsufficiency.
Genomic organization of the neurofibromatosis 1 gene (NF1)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Y.; O'Connell, P.; Huntsman Breidenbach, H.
Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 locus has been partially characterized. Even though the full-length NF1 cDNA has been sequenced, the complete genomic structure of the NF1 gene has not been elucidated. The 5' end of NF1 is embedded in a CpG island containing a NotI restriction site, and the remainder of the gene lies in the adjacent 350-kb NotI fragment. In our efforts to develop a comprehensive screen for NF1 mutations, we have isolated genomic DNA clones that together harbor the entire NF1 cDNA sequence. We have identified all intron-exon boundaries of the coding region and established that it is composed of 59 exons. Furthermore, we have defined the 3'-untranslated region (3'-UTR) of the NF1 gene; it spans approximately 3.5 kb of genomic DNA sequence and is continuous with the stop codon. Oligonucleotide primer pairs synthesized from exon-flanking DNA sequences were used in the polymerase chain reaction with cloned, chromosome 17-specific genomic DNA as template to amplify NF1 exons 1 through 27b and the exon containing the 3'-UTR separately. This information should be useful for implementing a comprehensive NF1 mutation screen using genomic DNA as template. 41 refs., 3 figs., 2 tabs.
Costa, José Hélio; de Melo, Dirce Fernandes; Gouveia, Zélia; Cardoso, Hélia Guerra; Peixe, Augusto; Arnholdt-Schmitt, Birgit
2009-12-01
'Genomic design' refers to the structural organization of gene sequences. Recently, the role of intron sequences for gene regulation is being better understood. Further, introns possess high rates of polymorphism that are considered as the major source for speciation. In molecular breeding, the length of gene-specific introns is recognized as a tool to discriminate genotypes with diverse traits of agronomic interest. 'Economy selection' and 'time-economy selection' have been proposed as models for explaining why highly expressed genes typically contain small introns. However, in contrast to these theories, plant-specific selection reveals that highly expressed genes contain introns that are large. In the presented research, 'wet'Aox gene identification from grapevine is advanced by a bioinformatics approach to study the species-specific organization of Aox gene structures in relation to available expressed sequence tag (EST) data. Two Aox1 and one Aox2 gene sequences have been identified in Vitis vinifera using grapevine cultivars from Portugal and Germany. Searching the complete genome sequence data of two grapevine cultivars confirmed that V. vinifera alternative oxidase (Aox) is encoded by a small multigene family composed of Aox1a, Aox1b and Aox2. An analysis of EST distribution revealed high expression of the VvAox2 gene. A relationship between the atypical long primary transcript of VvAox2 (in comparison to other plant Aox genes) and its expression level is suggested. V. vinifera Aox genes contain four exons interrupted by three introns except for Aox1a which contains an additional intron in the 3'-UTR. The lengths of primary Aox transcripts were estimated for each gene in two V. vinifera varieties: PN40024 and Pinot Noir. In both varieties, Aox1a and Aox1b contained small introns that corresponded to primary transcript lengths ranging from 1501 to 1810 bp. 
The Aox2 of PN40024 (12 329 bp) was longer than that from Pinot Noir (7279 bp) because of selection against a transposable-element insertion that is 5028 bp in size. A basic local alignment search tool (BLAST) search of the GenBank EST database revealed the following EST percentages for each gene: Aox1a (26.2%), Aox1b (11.9%) and Aox2 (61.9%). Aox1a was expressed in fruits and roots, Aox1b expression was confined to flowers and Aox2 was ubiquitously expressed. These data for V. vinifera show that atypically long Aox intron lengths are related to high levels of gene expression. Furthermore, it is shown for the first time that two grapevine cultivars can be distinguished by Aox intron length polymorphism.
NASA Astrophysics Data System (ADS)
Qi, Fei; Guo, Huarong; Wang, Jian
2008-02-01
Reversible protein phosphorylation, catalyzed by protein kinases and phosphatases, is an important and versatile mechanism by which eukaryotic cells regulate almost all signaling processes. Protein phosphatase 1 (PP1) is the first and a well-characterized member of the protein serine/threonine phosphatase family. In the present study, a full-length cDNA encoding the beta isoform of the catalytic subunit of protein phosphatase 1 (PP1cb), designated SmPP1cb, was isolated and sequenced for the first time from the skin tissue of the flatfish turbot Scophthalmus maximus by the rapid amplification of cDNA ends (RACE) technique. The SmPP1cb cDNA contains a 984 bp open reading frame (ORF), flanked by a complete 39 bp 5' untranslated region and a 462 bp 3' untranslated region. The ORF encodes a putative 327 amino acid protein whose N-terminal section, Met-Ala-Glu-Gly-Glu-Leu-Asp-Val-Asp, is highly acidic, a common feature of PP1 catalytic subunits that is absent in protein phosphatase 2B (PP2B). Its calculated molecular mass is 37 193 Da and its pI is 5.8. Sequence analysis indicated that SmPP1cb is extremely conserved at both the amino acid and nucleotide levels compared with the PP1cb of other vertebrates and invertebrates. The Kozak motif in the 5'UTR around the ATG start codon is GXXAXXGXX ATGG, which differs from the mammalian motif at two positions, A-6 and G-3, indicating the possibility of different initiation of translation in turbot. The 3'UTR of SmPP1cb is highly diverse in sequence similarity and length compared with those of other animals, especially zebrafish. The cloning and sequencing of the SmPP1cb gene lays a good foundation for future work on the biological functions of PP1 in the flatfish turbot.
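The lengths reported in this abstract can be cross-checked with simple codon arithmetic. The following is a minimal sketch, not part of the original study; the function names are hypothetical, and it assumes that the stated ORF length includes the stop codon and that the UTR lengths exclude any poly(A) tail.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` nucleotides.

    Assumption: when `includes_stop` is True, the last codon is a stop
    codon and therefore does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def cdna_length(utr5_bp: int, orf_bp: int, utr3_bp: int) -> int:
    """Total cDNA length as 5' UTR + ORF + 3' UTR (poly(A) tail not counted)."""
    return utr5_bp + orf_bp + utr3_bp


# Figures taken from the abstract: 984 bp ORF, 39 bp 5' UTR, 462 bp 3' UTR.
print(residues_from_orf(984))     # 327, matching the reported protein length
print(cdna_length(39, 984, 462))  # 1485 (a derived total, not stated in the abstract)
```

Under these assumptions the 984 bp ORF corresponds to 328 codons, that is, 327 residues plus a stop codon, which agrees with the reported 327 amino acid protein.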
A comparative molecular analysis of water-filled limestone sinkholes in north-eastern Mexico.
Sahl, Jason W; Gary, Marcus O; Harris, J Kirk; Spear, John R
2011-01-01
Sistema Zacatón in north-eastern Mexico is host to several deep, water-filled, anoxic, karstic sinkholes (cenotes). These cenotes were explored, mapped, and geochemically and microbiologically sampled by the autonomous underwater vehicle deep phreatic thermal explorer (DEPTHX). The community structure of the filterable fraction of the water column and of the extensive microbial mats that coat the cenote walls was investigated by comparative analysis of small-subunit (SSU) 16S rRNA gene sequences. Full-length Sanger gene sequence analysis revealed novel microbial diversity that included three putative bacterial candidate phyla and three additional groups that showed high intra-clade distance with poorly characterized bacterial candidate phyla. Limited functional gene sequence analysis in these anoxic environments identified genes associated with methanogenesis, sulfate reduction and anaerobic ammonium oxidation. A directed, barcoded amplicon, multiplex pyrosequencing approach was employed to compare ∼100,000 bacterial SSU gene sequences from water column and wall microbial mat samples from five cenotes in Sistema Zacatón. A new, high-resolution sequence distribution profile (SDP) method identified changes in specific phylogenetic types (phylotypes) in microbial mats at varied depths; Mantel tests showed a correlation of the genetic distances between mat communities in two cenotes and the geographic location of each cenote. Community structure profiles from the water column of three neighbouring cenotes showed distinct variation; statistically significant differences in the concentration of geochemical constituents suggest that the variation observed in microbial communities between neighbouring cenotes is due to geochemical variation. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.
Lumkul, Lalita; Sawaswong, Vorthon; Simpalipan, Phumin; Kaewthamasorn, Morakot; Harnyuttanakorn, Pongchai; Pattaradilokrat, Sittiporn
2018-01-01
Development of an effective vaccine is critically needed for the prevention of malaria. One of the key antigens for malaria vaccines is the apical membrane antigen 1 (AMA-1) of the human malaria parasite Plasmodium falciparum, the surface protein for erythrocyte invasion of the parasite. The gene encoding AMA-1 has been sequenced from populations of P. falciparum worldwide, but the haplotype diversity of the gene in P. falciparum populations in the Greater Mekong Subregion (GMS), including Thailand, remains to be characterized. In the present study, the AMA-1 gene was PCR amplified and sequenced from the genomic DNA of 65 P. falciparum isolates from 5 endemic areas in Thailand. The nearly full-length 1,848 nucleotide sequence of AMA-1 was subjected to molecular analyses, including nucleotide sequence diversity, haplotype diversity and deduced amino acid sequence diversity and neutrality tests. Phylogenetic analysis and pairwise population differentiation (Fst indices) were performed to infer the population structure. The analyses identified 60 single nucleotide polymorphic loci, predominately located in domain I of AMA-1. A total of 31 unique AMA-1 haplotypes were identified, which included 11 novel ones. The phylogenetic tree of the AMA-1 haplotypes revealed multiple clades of AMA-1, each of which contained parasites of multiple geographical origins, consistent with the Fst indices indicating genetic homogeneity or gene flow among geographically distinct populations of P. falciparum in Thailand’s borders with Myanmar, Laos and Cambodia. In summary, the study revealed novel haplotypes and population structure needed for the further advancement of AMA-1-based malaria vaccines in the GMS. PMID:29742870
Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.
Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong
2016-05-01
Eastern keelback mullet (Liza affinis) inhabits inlet waters and estuaries of rivers. In this paper, we initially determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its order and numbers of genes are similar to most bony fishes.
Jimenez-Guardeño, Jose M; Regla-Nava, Jose A; Nieto-Torres, Jose L; DeDiego, Marta L; Castaño-Rodriguez, Carlos; Fernandez-Delgado, Raul; Perlman, Stanley; Enjuanes, Luis
2015-10-01
A SARS-CoV lacking the full-length E gene (SARS-CoV-∆E) was attenuated and an effective vaccine. Here, we show that this mutant virus regained fitness after serial passages in cell culture or in vivo, resulting in the partial duplication of the membrane gene or in the insertion of a new sequence in gene 8a, respectively. The chimeric proteins generated in cell culture increased virus fitness in vitro but remained attenuated in mice. In contrast, during SARS-CoV-∆E passage in mice, the virus incorporated a mutated variant of 8a protein, resulting in reversion to a virulent phenotype. When the full-length E protein was deleted or its PDZ-binding motif (PBM) was mutated, the revertant viruses either incorporated a novel chimeric protein with a PBM or restored the sequence of the PBM on the E protein, respectively. Similarly, after passage in mice, SARS-CoV-∆E protein 8a mutated, to now encode a PBM, and also regained virulence. These data indicated that the virus requires a PBM on a transmembrane protein to compensate for removal of this motif from the E protein. To increase the genetic stability of the vaccine candidate, we introduced small attenuating deletions in E gene that did not affect the endogenous PBM, preventing the incorporation of novel chimeric proteins in the virus genome. In addition, to increase vaccine biosafety, we introduced additional attenuating mutations into the nsp1 protein. Deletions in the carboxy-terminal region of nsp1 protein led to higher host interferon responses and virus attenuation. Recombinant viruses including attenuating mutations in E and nsp1 genes maintained their attenuation after passage in vitro and in vivo. Further, these viruses fully protected mice against challenge with the lethal parental virus, and are therefore safe and stable vaccine candidates for protection against SARS-CoV. PMID:26513244
Hofman, Sebastian; Pabijan, Maciej; Osikowski, Artur; Litvinchuk, Spartak N; Szymura, Jacek M
2016-09-01
We present the full-length mitogenome sequences of four European water frog species: Pelophylax cypriensis, P. epeiroticus, P. kurtmuelleri and P. shqipericus. The mtDNA size varied from 17,363 to 17,895 bp, and its organization, with the LPTF tRNA gene cluster preceding the 12S rRNA gene, displayed the typical Neobatrachian arrangement. Maximum likelihood and Bayesian inference revealed a well-resolved mtDNA phylogeny of seven European Pelophylax species. The uncorrected p-distance among Pelophylax mitogenomes was 9.6 (range 0.01-0.13). Most divergent was the P. shqipericus mitogenome, clustering with the "P. lessonae" group, in contrast to the other three new Pelophylax mitogenomes related to the "P. bedriagae/ridibundus" lineage. The new mitogenomes resolve ambiguities of the phylogenetic placement of P. cretensis and P. epeiroticus.
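For readers unfamiliar with the statistic, the uncorrected p-distance is the proportion of aligned sites at which two sequences differ. The sketch below is illustrative only: the function name, the treatment of gaps and ambiguous bases (skipped pairwise), and the toy sequences are assumptions, not the authors' pipeline or data.

```python
def p_distance(seq_a: str, seq_b: str, skip: str = "-N?") -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence carries a gap or ambiguity symbol
    (characters listed in `skip`) are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # pairwise deletion of uninformative sites
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return differing / compared


# Toy alignment (hypothetical): one mismatch over ten compared sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))  # 0.1
```

A value of 0.1 here means 10% of compared sites differ; the range quoted in the abstract (0.01-0.13) is expressed on the same proportion scale.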
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xian-Wen, E-mail: xianwenli01@sina.com; College of Life Science, Xinyang Normal University, Xinyang 464000; Key Laboratory of Horticultural Plant Biology of the Ministry of Education, Huazhong Agricultural University, Wuhan 430070
In the present research, the full-length cDNA and the genomic sequence of a novel cold-regulated gene, CsCOR1, were isolated from Camellia sinensis L. The deduced protein CsCOR1 contains a hydrophobic N-terminus as a signal peptide and a hydrophilic C-terminal domain that is rich in glycine, arginine and proline. Two internal repetitive tridecapeptide fragments (HSVTAGRGGYNRG) exist in the middle of the C-terminal domain, and the two nucleotide sequences encoding them are identical. CsCOR1 was localized in the cell walls of transgenic tobacco plants via a CsCOR1::GFP fusion approach. The expression of CsCOR1 in tea leaves was enhanced dramatically by both cold and dehydration stress, and overexpression of CsCOR1 in transgenic tobacco plants clearly improved tolerance to salinity and dehydration.
Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T
2013-06-01
Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.
The Essential Genome of Escherichia coli K-12
2018-01-01
Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. PMID:29463657
Feeding-Related Traits Are Affected by Dosage of the foraging Gene in Drosophila melanogaster
Allen, Aaron M.; Anreiter, Ina; Neville, Megan C.; Sokolowski, Marla B.
2017-01-01
Nutrient acquisition and energy storage are critical parts of achieving metabolic homeostasis. The foraging gene in Drosophila melanogaster has previously been implicated in multiple feeding-related and metabolic traits. Before foraging’s functions can be further dissected, we need a precise genetic null mutant to definitively map its amorphic phenotypes. We used homologous recombination to precisely delete foraging, generating the for0 null allele, and used recombineering to reintegrate a full copy of the gene, generating the {forBAC} rescue allele. We show that a total loss of foraging expression in larvae results in reduced larval path length and food intake behavior, while conversely showing an increase in triglyceride levels. Furthermore, varying foraging gene dosage demonstrates a linear dose-response on these phenotypes in relation to foraging gene expression levels. These experiments have unequivocally proven a causal, dose-dependent relationship between the foraging gene and its pleiotropic influence on these feeding-related traits. Our analysis of foraging’s transcription start sites, termination sites, and splicing patterns using rapid amplification of cDNA ends (RACE) and full-length cDNA sequencing, revealed four independent promoters, pr1–4, that produce 21 transcripts with nine distinct open reading frames (ORFs). The use of alternative promoters and alternative splicing at the foraging locus creates diversity and flexibility in the regulation of gene expression, and ultimately function. Future studies will exploit these genetic tools to precisely dissect the isoform- and tissue-specific requirements of foraging’s functions and shed light on the genetic control of feeding-related traits involved in energy homeostasis. PMID:28007892
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pestov, Nikolay B., E-mail: korn@mail.ibch.ru; Dmitriev, Ruslan I.; Kostina, Maria B.
Highlights: Full-length secretory pathway Ca-ATPase (SPCA2) cloned from rat duodenum. ATP2C2 gene (encoding SPCA2) exists only in genomes of Tetrapoda. Rat and pig SPCA2 are expressed in intestines, lung and some secretory glands. Subcellular localization of SPCA2 may depend on tissue type. In rat duodenum, SPCA2 is localized in plasma membrane-associated compartments.
Abstract: Secretory pathway Ca-ATPases are less characterized mammalian calcium pumps than plasma membrane Ca-ATPases and sarco-endoplasmic reticulum Ca-ATPases. Here we report analysis of molecular evolution, alternative splicing, tissue-specific expression and subcellular localization of the second isoform of the secretory pathway Ca-ATPase (SPCA2), the product of the ATP2C2 gene. The primary structure of SPCA2 from rat duodenum deduced from the full-length transcript contains 944 amino acid residues and exhibits 65% sequence identity with known SPCA1. The rat SPCA2 sequence is also highly homologous to the putative human protein KIAA0703; however, the latter seems to have an aberrant N-terminus originating from intron 2. The tissue specificity of SPCA2 expression differs from that of the ubiquitous SPCA1. Rat SPCA2 transcripts were detected predominantly in gastrointestinal tract, lung, trachea, lactating mammary gland, skin and preputial gland. In the newborn pig, the expression profile is very similar with one remarkable exception: porcine bulbourethral gland gave the strongest signal. Upon overexpression in cultured cells, SPCA2 shows an intracellular distribution with remarkable enrichment in Golgi. However, in vivo SPCA2 may be localized in compartments that differ among various tissues: it is intracellular in epidermis, but enriched in plasma membranes of the intestinal epithelium. Analysis of SPCA2 sequences from various vertebrate species argues that the ATP2C2 gene radiated from ATP2C1 (encoding SPCA1) during adaptation of tetrapod ancestors to terrestrial habitats.
The complete mitochondrial genome sequence of Diaphorina citri (Hemiptera: Psyllidae)
USDA-ARS's Scientific Manuscript database
The first complete mitochondrial genome (mitogenome) sequence of Asian citrus psyllid, Diaphorina citri (Hemiptera: Psyllidae), from Guangzhou, China is presented. The circular mitogenome is 14,996 bp in length with an A+T content of 74.5%, and contains 13 protein-coding genes (PCGs), 22 tRNA genes ...
Ma, Kaifeng; Sun, Lidan; Cheng, Tangren; Pan, Huitang; Wang, Jia; Zhang, Qixiang
2018-01-01
Increasing evidence shows that epigenetics plays an important role in phenotypic variance. However, little is known about epigenetic variation in the important ornamental tree Prunus mume. We used amplified fragment length polymorphism (AFLP) and methylation-sensitive amplified polymorphism (MSAP) techniques, and association analysis and sequencing to investigate epigenetic variation and its relationships with genetic variance, environment factors, and traits. By performing leaf sampling, the relative total methylation level (29.80%) was detected in 96 accessions of P. mume. And the relative hemi-methylation level (15.77%) was higher than the relative full methylation level (14.03%). The epigenetic diversity (I∗ = 0.575, h∗ = 0.393) was higher than the genetic diversity (I = 0.484, h = 0.319). The cultivated population displayed greater epigenetic diversity than the wild populations in both southwest and southeast China. We found that epigenetic variance and genetic variance, and environmental factors performed cooperative structures, respectively. In particular, leaf length, width and area were positively correlated with relative full methylation level and total methylation level, indicating that the DNA methylation level played a role in trait variation. In total, 203 AFLP and 423 MSAP associated markers were detected and 68 of them were sequenced. Homologous analysis and functional prediction suggested that the candidate marker-linked genes were essential for leaf morphology development and metabolism, implying that these markers play critical roles in the establishment of leaf length, width, area, and ratio of length to width. PMID:29441078
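The methylation percentages quoted above can be checked for internal consistency. This is a minimal sketch with a hypothetical helper name, and it assumes that the relative total methylation level is defined as the sum of the hemi-methylation and full methylation levels, which is how the reported figures relate.

```python
def total_methylation_pct(hemi_pct: float, full_pct: float) -> float:
    """Relative total methylation level as hemi-methylated plus fully methylated.

    Assumption: both inputs are percentages of the same set of scored
    MSAP fragments, so they can be added directly.
    """
    for value in (hemi_pct, full_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("percentages must lie between 0 and 100")
    return hemi_pct + full_pct


# Figures from the abstract: 15.77% hemi-methylation, 14.03% full methylation.
print(round(total_methylation_pct(15.77, 14.03), 2))  # 29.8, i.e. the reported 29.80%
```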
Renieri, Carlo; La Terza, Antonietta
2015-01-01
The objectives of the present study were to characterize the MC1R gene, its transcripts and the single nucleotide polymorphisms (SNPs) associated with coat color in alpaca. Full-length cDNA amplification revealed the presence of two transcripts, named F1 and F2, differing only in the length of their 5′-terminal untranslated region (UTR) sequences and presenting a color-specific expression. Whereas the F1 transcript was common to white and colored (black and brown) alpaca phenotypes, the shorter F2 transcript was specific to white alpaca. Further sequencing of the MC1R gene in white and colored alpaca identified a total of twelve SNPs; of these, nine (four silent mutations (c.126C>A, c.354T>C, c.618G>A, and c.933G>A) and five missense mutations (c.82A>G, c.92C>T, c.259A>G, c.376A>G, and c.901C>T)) were observed in the coding region and three in the 3′UTR. A 4 bp deletion (c.224_227del) was also identified in the coding region. Molecular segregation analysis uncovered that the combinatory mutations in the MC1R locus could cause eumelanin and pheomelanin synthesis in alpaca. Overall, our data refine what is known about the MC1R gene and provide additional information on its role in alpaca pigmentation. PMID:25685836
Uzbekova, Svetlana; Roy-Sabau, Monica; Dalbiès-Tran, Rozenn; Perreau, Christine; Papillier, Pascal; Mompart, Florence; Thelie, Aurore; Pennetier, Sophie; Cognie, Juliette; Cadoret, Veronique; Royere, Dominique; Monget, Philippe; Mermillod, Pascal
2006-01-01
Background: Zygote arrest 1 (ZAR1) is one of the few known oocyte-specific maternal-effect genes essential for the beginning of embryo development discovered in mice. This gene is evolutionarily conserved in vertebrates, and the ZAR1 protein is characterized by the presence of an atypical plant homeobox zinc finger domain, suggesting a role in transcription regulation. This work was aimed at the study of this gene, which could be one of the key regulators of successful preimplantation development of domestic animals, in pig and cattle, as compared with human. Methods: Screenings of somatic cell hybrid panels and in silico research were performed to characterize ZAR1 chromosome localization and sequences. Rapid amplification of cDNA ends was used to obtain full-length cDNAs. Spatio-temporal mRNA expression patterns were studied using Northern blot, reverse transcription coupled to polymerase chain reaction and in situ hybridization. Results: We demonstrated that ZAR1 is a single copy gene, positioned on chromosome 8 in pig and 6 in cattle, and several variants of the corresponding cDNA were cloned from oocytes. Sequence analysis of ZAR1 cDNAs evidenced numerous short inverted repeats within the coding sequences and putative Pumilio-binding and embryo-deadenylation elements within the 3'-untranslated regions, indicating potential modes of regulation. We showed that ZAR1 was expressed exclusively in oocytes in pig ovary, persisted during the first cleavages in embryos developed in vivo and declined sharply in morulae and blastocysts. ZAR1 mRNA was also detected in testis and, at a lower level, in hypothalamus and pituitary in both species. For the first time, ZAR1 was localized in testicular germ cells, notably in round spermatids. In addition, in pig, cattle and human, only shorter ZAR1 transcript variants resulting from alternative splicing were found in testis as compared to oocyte. Conclusion: Our data suggest that, in addition to its role in early embryo development highlighted by the expression pattern of the full-length transcript in oocytes and early embryos, ZAR1 could also be implicated in the regulation of meiosis and post-meiotic differentiation of male and female germ cells through expression of shorter splicing variants. Species conservation of ZAR1 expression and regulation underlines the central role of this gene in early reproductive processes. PMID:16551357
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 10^7 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run.
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
Kim, Bo-Mi; Jeong, Chang-Bum; Han, Jeonghoon; Kim, Il-Chan; Rhee, Jae-Sung; Lee, Jae-Seong
2013-09-01
To identify and characterize CHH (TJ-CHH) gene in the copepod Tigriopus japonicus, we analyzed the full-length cDNA sequence, genomic structure, and promoter region. The full-length TJ-CHH cDNA was 716 bp in length, encoding 136 amino acid residues. The deduced amino acid sequences of TJ-CHH showed a high similarity of the CHH mature domain to other crustaceans. Six conserved cysteine residues and five conserved structural motifs in the CHH mature peptide domain were also observed. The genomic structure of the TJ-CHH gene contained three exons and two introns in its open reading frame (ORF), and several transcriptional elements were detected in the promoter region of the TJ-CHH gene. To investigate transcriptional change of TJ-CHH under environmental stress, T. japonicus were exposed to heat treatment, UV-B radiation, heavy metals, and water-accommodated fractions (WAFs) of Iranian crude oil. Upon heat stress, TJ-CHH transcripts were elevated at 30 °C and 35 °C for 96 h in a time-course experiment. UV-B radiation led to a decreased pattern of the TJ-CHH transcript 48 h and more after radiation (12 kJ/m²). After exposure of a fixed dose (12 kJ/m²) in a time-course experiment, TJ-CHH transcript was down-regulated in time-dependent manner with a lowest value at 12 h. However, the TJ-CHH transcript level was increased in response to five heavy metal exposures for 96 h. Also, the level of the TJ-CHH transcript was significantly up-regulated at 20% of WAFs after exposure to WAFs for 48 h and then remarkably reduced in a dose-dependent manner. These findings suggest that the enhanced TJ-CHH transcript level is associated with a cellular stress response of the TJ-CHH gene as shown in decapod crustaceans. This study is also helpful for a better understanding of the detrimental effects of environmental changes on the CHH-triggered copepod metabolism. Copyright © 2013 Elsevier Inc. All rights reserved.
Approach to determine the diversity of Legionella species by nested PCR-DGGE in aquatic environments
Huang, Wen-Chien; Tsai, Hsin-Chi; Tao, Chi-Wei; Chen, Jung-Sheng; Shih, Yi-Jia; Kao, Po-Min; Huang, Tung-Yi; Hsu, Bing-Mu
2017-01-01
In this study, we describe a nested PCR-DGGE strategy to detect Legionella communities from river water samples. In the first step, the nearly full-length 16S rRNA gene was amplified using bacterial primers. The amplicons were then used as DNA templates in a second PCR using Legionella-specific primers. A third round of amplification was conducted to obtain PCR fragments suitable for DGGE analysis. The amplified genes were then observed as DGGE bands of products obtained with primers specific for the diversity of Legionella species. The DGGE patterns thus have potential for a high-throughput preliminary determination of aquatic environmental Legionella species before sequencing. Comparative DNA sequence analysis of excised unique DGGE bands showed the identity of the Legionella community members, including a reference profile with two pathogenic species of Legionella strains. In addition, only members of Legionella pneumophila and uncultured Legionella sp. were detected. The three-step nested PCR-DGGE approach is seen as a useful method for studying the diversity of the Legionella community. The method is rapid and provides sequence information for phylogenetic analysis. PMID:28166249
Nance, Michael E; Duan, Dongsheng
2015-12-01
Duchenne muscular dystrophy (DMD) is an X-linked, progressive childhood myopathy caused by mutations in the dystrophin gene, one of the largest genes in the genome. It is characterized by skeletal and cardiac muscle degeneration and dysfunction leading to cardiac and/or respiratory failure. Adeno-associated virus (AAV) is a highly promising gene therapy vector. AAV gene therapy has resulted in unprecedented clinical success for treating several inherited diseases. However, AAV gene therapy for DMD remains a significant challenge. Hurdles for AAV-mediated DMD gene therapy include the difficulty of packaging the full-length dystrophin coding sequence in an AAV vector, the necessity for whole-body gene delivery, the immune response to dystrophin and AAV capsid, and the species-specific barriers to translation from animal models to human patients. Capsid engineering aims at improving viral vector properties by rational design and/or forced evolution. In this review, we discuss how to use the state-of-the-art AAV capsid engineering technologies to overcome hurdles in AAV-based DMD gene therapy.
Znrg, a novel gene expressed mainly in the developing notochord of zebrafish.
Zhou, Yaping; Xu, Yan; Li, Jianzhen; Liu, Yao; Zhang, Zhe; Deng, Fengjiao
2010-06-01
The notochord, a defining characteristic of the chordate embryo, is a critical midline structure required for axial skeletal formation in vertebrates, and acts as a signaling center throughout embryonic development. We utilized the digital differential display program of the National Center for Biotechnology Information, and identified a contig of expressed sequence tags (no. Dr. 83747) from the zebrafish ovary library in GenBank. Full-length cDNA of the identified gene was cloned by 5'- and 3'-RACE, and the resulting sequence was confirmed by polymerase chain reaction and sequencing. The cDNA clone contains 2,505 base pairs and encodes a novel protein of 707 amino acids that shares no significant homology with any known proteins. This gene was expressed in mature oocytes and at the one-cell stage, and persisted until the 5th day of development, as determined by RT-PCR. Transcripts were detected by whole-mount RNA in situ hybridization from the two-cell stage to 72 h of embryonic development. This gene was uniformly distributed from the cleavage stage up to the blastula stage. During early gastrulation, it was present in the dorsal region, and became restricted to the notochord and pectoral fin at 48 and 72 h of embryonic development. Based on its abundance in the notochord, we hypothesized that the novel gene may play an important role in notochord development in zebrafish; we named this gene zebrafish notochord-related gene, or znrg.
Zeng, Q-Q; Zhong, G-H; He, K; Sun, D-D; Wan, Q-H
2016-02-01
Classical major histocompatibility complex (MHC) class I allelic polymorphism is essential for competent antigen presentation. To improve the genotyping efforts in the golden pheasant, it is necessary to differentiate more accurately between classical and nonclassical class I molecules. In our study, all MHC class I genes were isolated from one golden pheasant based on two overlapping PCR amplifications. In total, six full-length class I nucleotide sequences (A-F) were identified, and four were novel. Two (A and C) belonged to the IA1 gene, two (B and D) were alleles derived from the IA2 gene through transgene amplification, and two (E and F) comprised a third novel locus, IA3 that was excluded from the core region of the golden pheasant MHC-B. IA1 and IA2 exhibited the broad expression profiles characteristic of classical loci, while IA3 showed no expression in multiple tissues and was therefore defined as a nonclassical gene. Phylogenetic analysis indicated that the three IA genes in the golden pheasant share a much closer evolutionary relationship than the corresponding sequences in other galliform species. This observation was consistent with high sequence similarity among them, which likely arises from the homogenizing effect of recombination. Our careful distinction between the classical and nonclassical MHC class I genes in the golden pheasant lays the foundation for developing locus-specific genotyping and establishing a good molecular marker system of classical MHC I loci. © 2015 John Wiley & Sons Ltd.
Genome Sequences of Streptomyces Phages Amela and Verse
Layton, Sonya R.; Hemenway, Ryan M.; Munyoki, Christine M.; Barnes, Emory B.; Barnett, Sierra E.; Bond, Alec M.; Narvaez, Jessi M.; Sirisakd, Christie D.; Smith, Brandt R.; Swain, Justin; Syed, Orooj; Bowman, Charles A.; Russell, Daniel A.; Bhuiyan, Swapan; Donegan-Quick, Richard; Benjamin, Robert C.
2016-01-01
Amela and Verse are two Streptomyces phages isolated by enrichment on Streptomyces venezuelae (ATCC 10712) from two different soil samples. Amela has a genome length of 49,452, with 75 genes. Verse has a genome length of 49,483, with 75 genes. Both belong to the BD3 subcluster of Actinobacteriophage. PMID:26893416
Structural analysis of the RH-like blood group gene products in nonhuman primates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Salvignol, I.; Calvas, P.; Blancher, A.
1995-03-01
Rh-related transcripts present in bone marrow samples from several species of nonhuman primates (chimpanzee, gorilla, gibbon, crab-eating macaque) have been amplified by RT-polymerase chain reaction using primers deduced from the sequence of human RH genes. Nucleotide sequence analysis of the nonhuman transcripts revealed a high degree of similarity to human blood group Rh sequences, suggesting a great conservation of the RH genes throughout evolution. Full-length transcripts, potentially encoding 417 amino acid long proteins homologous to Rh polypeptides, were characterized, as well as mRNA isoforms which harbored nucleotide deletions or insertions and potentially encode truncated proteins. Proteins of 30-40,000 M_r, immunologically related to human Rh proteins, were detected by western blot analysis with antipeptide antibodies, indicating that Rh-like transcripts are translated into membrane proteins. Comparison of human and nonhuman protein sequences was pivotal in clarifying the molecular basis of the blood group C/c polymorphism, showing that only the Pro103Ser substitution was correlated with C/c polymorphism. In addition, it was shown that a proline residue at position 102 was critical in the expression of C and c epitopes, most likely by providing an appropriate conformation of Rh polypeptides. From these data a phylogenetic reconstruction of the RH locus evolution has been calculated from which an unrooted phylogenetic tree could be proposed, indicating that African ape Rh-like genes would be closer to the human RhD gene than to the human RhCE gene. 55 refs., 4 figs., 1 tab.
2010-01-01
Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. 
First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. Conclusions The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction. PMID:21062474
2012-01-01
Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating the genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
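The abstract above notes that annotation results are exchanged in GFF3 format. As a rough illustration of what working with such a file can look like, the following Python sketch parses the nine tab-separated GFF3 columns and counts feature types. The sequence name, coordinates and gene names in `example_lines` are hypothetical and are not taken from CPGAVAS output; the function names are likewise illustrative only.

```python
from collections import Counter
from typing import Iterable, Optional

# The nine columns defined by the GFF3 format, in order.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line: str) -> Optional[dict]:
    """Parse one GFF3 feature line; return None for blank lines and comments."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # Column 9 holds semicolon-separated key=value pairs.
    record["attributes"] = dict(
        item.split("=", 1) for item in fields[8].split(";") if "=" in item
    )
    return record


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count how many features of each type (gene, CDS, tRNA, ...) appear."""
    records = (parse_gff3_line(line) for line in lines)
    return Counter(r["type"] for r in records if r is not None)


# Hypothetical example records, built with explicit tabs.
example_lines = [
    "##gff-version 3",
    "\t".join(["cp_genome", "example", "gene", "100", "1160", ".", "+", ".",
               "ID=gene1;Name=psbA"]),
    "\t".join(["cp_genome", "example", "CDS", "100", "1160", ".", "+", "0",
               "ID=cds1;Parent=gene1"]),
    "\t".join(["cp_genome", "example", "tRNA", "1500", "1572", ".", "-", ".",
               "ID=trna1;Name=trnH-GUG"]),
]
feature_counts = count_feature_types(example_lines)
```

A summary such as `feature_counts` is only a small part of the summary statistics the abstract mentions; it is shown here to make the file format concrete.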
Wu, Kun; Tan, Xiao-Ying; Xu, Yi-Huan; Chen, Qi-Liang; Pan, Ya-Xiong
2016-01-15
The present study clones and characterizes the full-length cDNA sequences of members in JAK-STAT pathway, explores their mRNA tissue expression and the biological role in leptin influencing lipid metabolism in yellow catfish Pelteobagrus fulvidraco. Full-length cDNA sequences of five JAKs and seven STAT members, including some splicing variants, were obtained from yellow catfish. Compared to mammals, more members of the JAKs and STATs family were found in yellow catfish, which provided evidence that the JAK and STAT family members had arisen by the whole genome duplications during vertebrate evolution. All of these members were widely expressed across the eleven tissues (liver, white muscle, spleen, brain, gill, mesenteric fat, anterior intestine, heart, mid-kidney, testis and ovary) but at the variable levels. Intraperitoneal injection in vivo and incubation in vitro of recombinant human leptin changed triglyceride content and mRNA expression of several JAKs and STATs members, and genes involved in lipid metabolism. AG490, a specific inhibitor of JAK2-STAT pathway, partially reversed leptin-induced effects, indicating that the JAK2a/b-STAT3 pathway exerts main regulating actions of leptin on lipid metabolism at transcriptional level. Meanwhile, the different splicing variants were differentially regulated by leptin incubation. Thus, our data suggest that leptin activated the JAK/STAT pathway and increases the expression of target genes, which partially accounts for the leptin-induced changes in lipid metabolism in yellow catfish. Copyright © 2015 Elsevier Inc. All rights reserved.
Wang, Taotao; Wang, Huiyuan; Cai, Dawei; Gao, Yubang; Zhang, Hangxiao; Wang, Yongsheng; Lin, Chentao; Ma, Liuyin; Gu, Lianfeng
2017-08-01
Moso bamboo (Phyllostachys edulis) represents one of the fastest-spreading plants in the world, due in part to its well-developed rhizome system. However, the post-transcriptional mechanism for the development of the rhizome system in bamboo has not been comprehensively studied. We therefore used a combination of single-molecule long-read sequencing technology and polyadenylation site sequencing (PAS-seq) to re-annotate the bamboo genome, and identify genome-wide alternative splicing (AS) and alternative polyadenylation (APA) in the rhizome system. In total, 145 522 mapped full-length non-chimeric (FLNC) reads were analyzed, resulting in the correction of 2241 mis-annotated genes and the identification of 8091 previously unannotated loci. Notably, more than 42 280 distinct splicing isoforms were derived from 128 667 intron-containing full-length FLNC reads, including a large number of AS events associated with rhizome systems. In addition, we characterized 25 069 polyadenylation sites from 11 450 genes, 6311 of which have APA sites. Further analysis of intronic polyadenylation revealed that LTR/Gypsy and LTR/Copia were two major transposable elements within the intronic polyadenylation region. Furthermore, this study provided a quantitative atlas of poly(A) usage. Several hundred differential poly(A) sites in the rhizome-root system were identified. Taken together, these results suggest that post-transcriptional regulation may potentially have a vital role in the underground rhizome-root system. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K
2016-06-01
The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.
Chalcone synthase genes from milk thistle (Silybum marianum): isolation and expression analysis.
Sanjari, Sepideh; Shobbar, Zahra Sadat; Ebrahimi, Mohsen; Hasanloo, Tahereh; Sadat-Noori, Seyed-Ahmad; Tirnaz, Soodeh
2015-12-01
Silymarin is a flavonoid compound derived from milk thistle (Silybum marianum) seeds which has several pharmacological applications. Chalcone synthase (CHS) is a key enzyme in the biosynthesis of flavonoids; the identification of CHS-encoding genes in milk thistle is therefore of great importance. In the current research, fragments of CHS genes were amplified using degenerate primers based on the conserved parts of Asteraceae CHS genes, and then cloned and sequenced. Analysis of the resultant nucleotide and deduced amino acid sequences led to the identification of two different members of the CHS gene family, SmCHS1 and SmCHS2. The full-length cDNA of a third member (SmCHS3) was isolated by rapid amplification of cDNA ends (RACE); its open reading frame contained 1239 bp including exon 1 (190 bp) and exon 2 (1049 bp), encoding 63 and 349 amino acids, respectively. In silico analysis showed that the SmCHS3 sequence contains all the conserved CHS sites and shares high homology with CHS proteins from other plants. Real-time PCR analysis indicated that SmCHS1 and SmCHS3 had the highest transcript level in petals in the early flowering stage and in the stem of five upper leaves, followed by five upper leaves in the mid-flowering stage, and are most probably involved in anthocyanin and silymarin biosynthesis.
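The exon and amino acid figures reported for SmCHS3 can be cross-checked with simple codon arithmetic. The Python sketch below does this under two assumptions that the abstract does not state: that the 1239 bp open reading frame includes the stop codon, and that the codon split across the exon boundary is counted with exon 2.

```python
def codons(bp: int) -> tuple[int, int]:
    """Return (complete codons, leftover nucleotides) for a length in bp."""
    return divmod(bp, 3)


exon1_bp, exon2_bp = 190, 1049      # lengths reported in the abstract
orf_bp = exon1_bp + exon2_bp        # should equal the reported 1239 bp

total_codons, orf_remainder = codons(orf_bp)
# Assumption: the ORF length includes the stop codon, which encodes no residue.
protein_aa = total_codons - 1

# Exon 1 holds 63 complete codons plus one nucleotide of a split codon.
exon1_codons, exon1_leftover = codons(exon1_bp)
# Assumption: the split codon is attributed to exon 2, then the stop is removed.
exon2_codons, exon2_remainder = codons(exon2_bp + exon1_leftover)
exon2_aa = exon2_codons - 1

print(orf_bp, protein_aa, exon1_codons, exon2_aa)
```

Under these assumptions the numbers are internally consistent: 63 residues from exon 1 and 349 from exon 2 sum to a 412-residue protein.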
Okeke, Iruka N.; Borneman, Jade A.; Shin, Sooan; Mellies, Jay L.; Quinn, Laura E.; Kaper, James B.
2001-01-01
Enteropathogenic Escherichia coli (EPEC) strains that carry the EPEC adherence factor (EAF) plasmid were screened for the presence of different EAF sequences, including those of the plasmid-encoded regulator (per). Considerable variation in gene content of EAF plasmids from different strains was seen. However, bfpA, the gene encoding the structural subunit for the bundle-forming pilus, bundlin, and per genes were found in 96.8% of strains. Sequence analysis of the per operon and its promoter region from 15 representative strains revealed that it is highly conserved. Most of the variation occurs in the 5′ two-thirds of the perA gene. In contrast, the C-terminal portion of the predicted PerA protein that contains the DNA-binding helix-turn-helix motif is 100% conserved in all strains that possess a full-length gene. In a minority of strains including the O119:H2 and canine isolates and in a subset of O128:H2 and O142:H6 strains, frameshift mutations in perA leading to premature truncation and consequent inactivation of the gene were identified. Cloned perA, -B, and -C genes from these strains, unlike those from strains with a functional operon, failed to activate the LEE1 operon and bfpA transcriptional fusions or to complement a per mutant in reference strain E2348/69. Furthermore, O119, O128, and canine strains that carry inactive per operons were deficient in virulence protein expression. The context in which the perABC operon occurs on the EAF plasmid varies. The sequence upstream of the per promoter region in EPEC reference strains E2348/69 and B171-8 was present in strains belonging to most serogroups. In a subset of O119:H2, O128:H2, and O142:H6 strains and in the canine isolate, this sequence was replaced by an IS1294-homologous sequence. PMID:11500429
Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes
Salazar-Gonzalez, Jesus F.; Salazar, Maria G.; Tully, Damien C.; Ogilvie, Colin B.; Learn, Gerald H.; Allen, Todd M.; Heath, Sonya L.; Goepfert, Paul; Bar, Katharine J.
2016-01-01
Background Identification of HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. Processing, storage, and transportation of the clinical samples required to perform these virologic assays in resource-limited settings requires challenging venipuncture and cold chain logistics. Here, we validate the use of dried-blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods We performed parallel nucleic acid extraction, single genome amplification (SGA), next generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS methodologies, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples as low as 1,700 copies/ml and an estimated ∼50 viral copies per blood spot. Further, we demonstrated reproducible efficiency in gp160 env sequencing in DBS stored at ambient temperature for up to three weeks or at -20°C for up to five months. Conclusions These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings. PMID:27819061
Bacterial diversity in the oral cavity of ten healthy individuals
Bik, Elisabeth M.; Long, Clara Davis; Armitage, Gary C.; Loomer, Peter; Emerson, Joanne; Mongodin, Emmanuel F.; Nelson, Karen E.; Gill, Steven R.; Fraser-Liggett, Claire M.; Relman, David A.
2010-01-01
The composition of the oral microbiota from 10 individuals with healthy oral tissues was determined using culture-independent techniques. From each individual, 26 specimens, each from different oral sites at a single point in time, were collected and pooled. An eleventh pool was constructed using portions of the subgingival specimens from all 10 individuals. The 16S rRNA gene was amplified using broad-range bacterial primers, and clone libraries from the individual and subgingival pools were constructed. From a total of 11 368 high-quality, non-chimeric, near full-length sequences, 247 species-level phylotypes (using a 99% sequence identity threshold) and 9 bacteria phyla were identified. At least 15 bacterial genera were conserved among all 10 individuals, with significant interindividual differences at the species and strain level. Comparisons of these oral bacterial sequences to near full-length sequences found previously in the large intestines and feces of other healthy individuals suggest that the mouth and intestinal tract harbor distinct sets of bacteria. Co-occurrence analysis demonstrated significant segregation of taxa when community membership was examined at the level of genus, but not at the level of species, suggesting that ecologically-significant, competitive interactions are more apparent at a broader taxonomic level than species. This study is one of the more comprehensive, high-resolution analyses of bacterial diversity within the healthy human mouth to date, and highlights the value of tools from macroecology for enhancing our understanding of bacterial ecology in human health. PMID:20336157
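The abstract above groups sequences into species-level phylotypes using a 99% sequence identity threshold. The Python sketch below illustrates the general idea with a greedy, representative-based clustering of pre-aligned, equal-length sequences. It is a toy under stated assumptions, not the method or software used in the study, and the example sequences are synthetic.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_phylotypes(seqs: list[str], threshold: float = 0.99) -> list[list[str]]:
    """Assign each sequence to the first cluster whose representative it matches.

    The first member of each cluster acts as its representative; a sequence
    that matches no representative at >= threshold starts a new cluster.
    """
    clusters: list[list[str]] = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


# Synthetic 200-nt example: one near-identical variant and one divergent one.
base = "ACGT" * 50
one_mismatch = "T" + base[1:]          # 199/200 identical to base
divergent = "T" * 10 + base[10:]       # 192/200 identical to base
phylotypes = greedy_phylotypes([base, one_mismatch, divergent])
```

Greedy clustering depends on input order and on the alignment, so real analyses typically rely on dedicated tools; the sketch only shows how a threshold turns pairwise identity into discrete groups.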
Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng
2017-01-01
Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar ‘Fusi-3’. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1–6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism. PMID:29145430
Improved Dual-Luciferase Reporter Assays for Nuclear Receptors
Paguio, Aileen; Stecha, Pete; Wood, Keith V; Fan, Frank
2010-01-01
Nuclear receptors play important roles in many cellular functions through control of gene transcription. It is also a large target class for drug discovery. Luciferase reporter assays are frequently used to study nuclear receptor function because of their wide dynamic range, low endogenous activity, and ease of use. Recent improvements of luciferase genes and vectors have further enhanced their utilities. Here we applied these improvements to two reporter formats for studying nuclear receptors. The first assay contains a Murine Mammary Tumor Virus promoter upstream of a destabilized luciferase. The presence of response elements for nuclear hormone receptor in this promoter allows the studies of endogenous and/or exogenous full length receptors. The second assay contains a ligand binding domain (LBD) of a nuclear receptor fused to the GAL4 DNA binding domain (DBD) on one vector and multiple Gal4 Upstream Activator Sequences (UAS) upstream of luciferase reporter on another vector. We showed that codon optimization of luciferase reporter genes increased expression levels in conjunction with the incorporation of protein destabilizing sequences into luciferase led to a larger assay dynamic range in both formats. The optimum number of UAS to generate the best response was determined. The expression vector for nuclear receptor LBD/GAL4 DBD fusion also constitutively expresses a Renilla luciferase-neoR fusion protein, which provides selection capability (G418 resistance, neoR) as well as an internal control (Renilla luciferase). This dual-luciferase format allowed detecting compound cytotoxicity or off-target change in expression during drug screening, therefore improved data quality. These luciferase reporter assays provided better research and drug discovery tools for studying the functions of full length nuclear receptors and ligand binding domains. PMID:21687560
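The dual-luciferase format described above uses Renilla luciferase as an internal control. A common way to use such a control is to divide the reporter signal by the control signal and express treated wells relative to vehicle wells. The Python sketch below illustrates that calculation; the function names, the example readings and the 0.5 cutoff for flagging a drop in the control signal are all hypothetical and are not taken from the paper.

```python
def normalized_signal(firefly: float, renilla: float) -> float:
    """Reporter signal divided by the internal-control (Renilla) signal."""
    if renilla <= 0:
        raise ValueError("Renilla signal must be positive")
    return firefly / renilla


def fold_induction(treated: tuple[float, float],
                   vehicle: tuple[float, float]) -> float:
    """Normalized treated signal relative to normalized vehicle signal.

    Each argument is a (firefly, renilla) pair of luminescence readings.
    """
    return normalized_signal(*treated) / normalized_signal(*vehicle)


def control_signal_dropped(renilla_treated: float, renilla_vehicle: float,
                           min_fraction: float = 0.5) -> bool:
    """Flag wells whose control signal falls below a fraction of vehicle.

    The 0.5 default is an arbitrary illustrative cutoff; a drop may point to
    cytotoxicity or an off-target effect on expression.
    """
    return renilla_treated < min_fraction * renilla_vehicle


# Hypothetical readings in relative light units.
example_fold = fold_induction(treated=(8000.0, 400.0), vehicle=(1000.0, 500.0))
example_flag = control_signal_dropped(renilla_treated=400.0, renilla_vehicle=500.0)
```

Normalizing to the control corrects for well-to-well differences in cell number and transfection, while the separate check on the control signal itself is what lets a screen notice compounds that reduce expression generally.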
Kim, Ah Ran; Alam, Md Jobaidul; Yoon, Tae-ho; Lee, Soo Rin; Park, Hyun; Kim, Doo-Nam; An, Doo-Hae; Lee, Jae-Bong; Lee, Chung Il
2016-01-01
Adiponectin (AdipoQ) and its receptors (AdipoRs) are strongly related to growth and development of skeletal muscle, as well as glucose and lipid metabolism in vertebrates. Herein we report the identification of the first full-length cDNA encoding an AdipoR homolog (Liv-AdipoR) from the decapod crustacean Litopenaeus vannamei using a combination of next generation sequencing (NGS) technology and bioinformatics analysis. The full-length Liv-AdipoR (1,245 bp) encoded a protein that exhibited the canonical seven transmembrane domains (7TMs) and the inversed topology that characterize members of the progestin and adipoQ receptor (PAQR) family. Based on the obtained sequence information, only a single orthologous AdipoR gene appears to exist in arthropods, whereas two paralogs, AdipoR1 and AdipoR2, have evolved in vertebrates. Transcriptional analysis suggested that the single Liv-AdipoR gene serves the functions of the two mammalian AdipoRs. At 72 h after injection of 50 pmol Liv-AdipoR dsRNA (340 bp) into L. vannamei thoracic muscle and deep abdominal muscle, transcription levels of Liv-AdipoR decreased by 93% and 97%, respectively. This confirmed optimal conditions for RNAi of Liv-AdipoR. Knockdown of Liv-AdipoR resulted in significant changes in the plasma levels of ammonia, 3-methylhistidine, and ornithine, but not plasma glucose, suggesting that Liv-AdipoR is important for maintaining muscle fibers. The chronic effect of Liv-AdipoR dsRNA injection was increased mortality. Transcriptomic analysis showed that 804 contigs were upregulated and 212 contigs were downregulated by the knockdown of Liv-AdipoR in deep abdominal muscle. The significantly upregulated genes were categorized into four main functional groups: RNA-editing and transcriptional regulators, molecular chaperones, metabolic regulators, and channel proteins. PMID:27478708
USDA-ARS's Scientific Manuscript database
This paper presents the first study describing the isolation, cloning and characterization of a full length gene encoding Bowman-Birk protease inhibitor (RbTI) from rice bean (Vigna umbellata). A full-length protease inhibitor gene with complete open reading frame of 327bp encoding 109 amino acids w...
Vaughan, Sue; Wickstead, Bill; Gull, Keith; Addinall, Stephen G
2004-01-01
The FtsZ protein is a polymer-forming GTPase which drives bacterial cell division and is structurally and functionally related to eukaryotic tubulins. We have searched for FtsZ-related sequences in all freely accessible databases, then used strict criteria based on the tertiary structure of FtsZ and its well-characterized in vitro and in vivo properties to determine which sequences represent genuine homologues of FtsZ. We have identified 225 full-length FtsZ homologues, which we have used to document, phylum by phylum, the primary sequence characteristics of FtsZ homologues from the Bacteria, Archaea, and Eukaryota. We provide evidence for at least five independent ftsZ gene-duplication events in the bacterial kingdom and suggest the existence of three ancestral euryarchaeal FtsZ paralogues. In addition, we identify "FtsZ-like" sequences from Bacteria and Archaea that, while showing significant sequence similarity to FtsZs, are unlikely to bind and hydrolyze GTP.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martinsson, T.; Vujic, M.; Tomkinson, B.
1993-08-01
The authors have assigned the human tripeptidyl peptidase II (TPP2) gene to chromosome region 13q32-q33 using two different methods. First, a full-length TPP2 cDNA was used as a probe on Southern blots of DNA from a panel of human/rodent somatic cell hybrids. The TPP2 sequences were found to segregate with the human chromosome 13. Second, fluorescence in situ hybridization analysis was performed with the same probe. This analysis supported the chromosome 13 localization and further refined it to region 13q32-q33. 20 refs., 2 figs.
Baum, Elisabeth; Hue, Fong; Barbour, Alan G
2012-12-04
The rodent Peromyscus leucopus is a major natural reservoir for the Lyme disease agent Borrelia burgdorferi and a host for its vector Ixodes scapularis. At various locations in northeastern United States 10 to 15 B. burgdorferi strains coexist at different prevalences in tick populations. We asked whether representative strains of high or low prevalence differed in their infections of P. leucopus. After 5 weeks of experimental infection of groups with each of 6 isolates, distributions and burdens of bacteria in tissues were measured by quantitative PCR, and antibodies to B. burgdorferi were evaluated by immunoblotting and protein microarray. All groups of animals were infected in their joints, ears, tails, and hearts, but overall spirochete burdens were lower in animals infected with low-prevalence strains. Animals were similar regardless of the infecting isolate in their levels of antibodies to whole cells, FlaB, BmpA, and DbpB proteins, and the conserved N-terminal region of the serotype-defining OspC proteins. But there were strain-specific antibody responses to full-length OspC and to plasmid-encoded VlsE, BBK07, and BBK12 proteins. Sequencing of additional VlsE genes revealed substantial diversity within some pairs of strains but near-identical sequences within other pairs, which otherwise differed in their ospC alleles. The presence or absence of full-length bbk07 and bbk12 genes accounted for the differences in antibody responses. We propose that for B. burgdorferi, there is selection in reservoir species for (i) sequence diversity, as for OspC and VlsE, and (ii) the presence or absence of polymorphisms, as for BBK07 and BBK12. Humans are dead-end hosts for Borrelia agents of Lyme disease (LD), and, thus, irrelevant for the pathogens' maintenance. Many reports of human cases and laboratory mouse infections exist, but less is known about infection and immunity in natural reservoirs, such as the rodent Peromyscus leucopus. 
We observed that high- and low-prevalence strains of Borrelia burgdorferi were capable of infecting P. leucopus but elicited different patterns of antibody responses. Antibody reactivities to the VlsE protein were as type-specific as previously characterized reactivities to serotype-defining OspC proteins. In addition, the low-prevalence strains lacked full-length genes for two proteins that (i) are encoded by a virulence-associated plasmid in some high-prevalence strains and (ii) LD patients and field-captured rodents commonly have antibodies to. Immune selection against these genes may have led to null phenotype lineages that can infect otherwise immune hosts but at the cost of reduced fitness and lower prevalence.
Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J
2005-01-01
Background Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
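The presence/absence idea in the abstract above can be illustrated with a small sketch. It assumes a simple greedy heuristic (repeatedly pick the sub-sequence that separates the most still-confused pairs of sequences); this is not necessarily the authors' method, greedy selection is not guaranteed to find a truly minimal set, and the function names are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_distinguishing_kmers(seqs, k):
    """Greedily pick k-mers whose presence/absence pattern separates all sequences.

    Illustrative heuristic only (an assumption, not the published algorithm).
    Raises ValueError if k-mers of this length cannot separate every pair.
    """
    kmer_sets = [kmers(s, k) for s in seqs]
    candidates = sorted(set().union(*kmer_sets))  # sorted for determinism
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        best, best_split = None, set()
        for kmer in candidates:
            # Pairs in which exactly one member contains this k-mer.
            split = {(i, j) for i, j in unresolved
                     if (kmer in kmer_sets[i]) != (kmer in kmer_sets[j])}
            if len(split) > len(best_split):
                best, best_split = kmer, split
        if best is None:
            raise ValueError("sequences cannot be distinguished with this k")
        chosen.append(best)
        unresolved -= best_split
    return chosen


def signature(seq, probes):
    """Presence/absence pattern of the probes in seq, like answers in a dichotomous key."""
    return tuple(p in seq for p in probes)
```

For four toy sequences such as "AAAA", "AACC", "CCGG" and "GGTT" with k = 2, two probes are enough, because two yes/no answers can encode four distinct outcomes; this is the binary-splitting efficiency the abstract describes.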
Zhu, Yu-Cheng; Specht, Charles A; Dittmer, Neal T; Muthukrishnan, Subbaratnam; Kanost, Michael R; Kramer, Karl J
2002-11-01
Glycosyltransferases are enzymes that synthesize oligosaccharides, polysaccharides and glycoconjugates. One type of glycosyltransferase is chitin synthase, a very important enzyme in biology, which is utilized by insects, fungi, and other invertebrates to produce chitin, a polysaccharide of beta-1,4-linked N-acetylglucosamine. Chitin is an important component of the insect's exoskeletal cuticle and gut lining. To identify and characterize a chitin synthase gene of the tobacco hornworm, Manduca sexta, degenerate primers were designed from two highly conserved regions in fungal and nematode chitin synthase protein sequences and then used to amplify a similar region from Manduca cDNA. A full-length cDNA of 5152 nucleotides was assembled for the putative Manduca chitin synthase gene, MsCHS1, and sequencing of genomic DNA verified the contiguity of the sequence. The MsCHS1 cDNA has an ORF of 4692 nucleotides that encodes a transmembrane protein of 1564 amino acid residues with a mass of approximately 179 kDa (GenBank no. AY062175). It is most similar, over its entire length of protein sequence, to putative chitin synthases from other insects and nematodes, with 68% identity to enzymes from both the blow fly, Lucilia cuprina, and the fruit fly, Drosophila melanogaster. The similarity with fungal chitin synthases is restricted to the putative catalytic domain, and the MsCHS1 protein has, at equivalent positions, several amino acids that are essential for activity as revealed by mutagenesis of the fungal enzymes. A 5.3-kb transcript of MsCHS1 was identified by northern blot hybridization of RNA from larval epidermis, suggesting that the enzyme functions to make chitin deposited in the cuticle. Further examination by RT-PCR showed that MsCHS1 expression is regulated in the epidermis, with the amount of transcript increasing during phases of cuticle deposition.
Genome-wide identification and expression profiling of the SnRK2 gene family in Malus prunifolia.
Shao, Yun; Qin, Yuan; Zou, Yangjun; Ma, Fengwang
2014-11-15
Sucrose non-fermenting-1-related protein kinase 2 (SnRK2) constitutes a small plant-specific serine/threonine kinase family with essential roles in the abscisic acid (ABA) signal pathway and in responses to osmotic stress. Although a genome-wide analysis of this family has been conducted in some species, little is known about SnRK2 genes in apple (Malus domestica). We identified 14 putative sequences encoding 12 deduced SnRK2 proteins within the apple genome. Gene chromosomal location and synteny analysis of the apple SnRK2 genes indicated that tandem and segmental duplications have likely contributed to the expansion and evolution of these genes. All 12 full-length coding sequences were confirmed by cloning from Malus prunifolia. The gene structure and motif compositions of the apple SnRK2 genes were analyzed. Phylogenetic analysis showed that MpSnRK2s could be classified into four groups. Profiling of these genes presented differential patterns of expression in various tissues. Under stress conditions, transcript levels for some family members were up-regulated in the leaves in response to drought, salinity, or ABA treatments. This suggested their possible roles in plant response to abiotic stress. Our findings provide essential information about SnRK2 genes in apple and will contribute to further functional dissection of this gene family. Copyright © 2014 Elsevier B.V. All rights reserved.
Ferrás, Cristina; Oude Vrielink, Joachim AF; Verspuy, Johan WA; te Riele, Hein; Tsaalbi-Shtylik, Anastasia; de Wind, Niels
2009-01-01
A substantial fraction of sporadic and inherited colorectal and endometrial cancers in humans is deficient in DNA mismatch repair (MMR). These cancers are characterized by length alterations in ubiquitous simple sequence repeats, a phenotype called microsatellite instability. Here we have exploited this phenotype by developing a novel approach for the highly selective gene therapy of MMR-deficient tumors. To achieve this selectivity, we mutated the VP22FCU1 suicide gene by inserting an out-of-frame microsatellite within its coding region. We show that in a significant fraction of microsatellite-instable (MSI) cells carrying the mutated suicide gene, full-length protein becomes expressed within a few cell doublings, presumably resulting from a reverting frameshift within the inserted microsatellite. Treatment of these cells with the innocuous prodrug 5-fluorocytosine (5-FC) induces strong cytotoxicity and we demonstrate that this owes to multiple bystander effects conferred by the suicide gene/prodrug combination. In a mouse model, MMR-deficient tumors that contained the out-of-frame VP22FCU1 gene displayed strong remission after treatment with 5-FC, without any obvious adverse systemic effects to the mouse. By virtue of its high selectivity and potency, this conditional enzyme/prodrug combination may hold promise for the treatment or prevention of MMR-deficient cancer in humans. PMID:19471249
Expressed gene sequence of the IFN-gamma-response chemokine CXCL9 of cattle, horses, and swine
USDA-ARS's Scientific Manuscript database
This report describes the cloning and characterization of expressed gene sequences of bovine, equine, and swine CXCL9 from RNA obtained from peripheral blood mononuclear cell (PBMC) or other tissues. The bovine coding region was 378 nucleotides in length, while the equine and swine coding regions w...
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.
Herold, Keith E; Rasooly, Avraham
2003-12-01
Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
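As a rough illustration of temperature-matched tiling and of the link between melting temperature and GC content, the sketch below uses the Wallace rule (Tm = 2(A+T) + 4(G+C), a coarse approximation for short oligonucleotides). The abstract does not state which Tm model Oligo Design uses, so the rule, the function names and the parameters here are all assumptions.

```python
def wallace_tm(oligo):
    """Estimate melting temperature (deg C) with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def tile_temperature_matched(seq, target_tm, step=1, min_len=8, max_len=30):
    """Tile seq with overlapping oligos, each extended until its Tm reaches target_tm.

    Returns a list of (start, oligo, tm) tuples. Start positions that cannot
    reach target_tm within max_len, or before the sequence ends, are skipped.
    """
    tiles = []
    for start in range(0, len(seq) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            if start + length > len(seq):
                break
            oligo = seq[start:start + length]
            tm = wallace_tm(oligo)
            if tm >= target_tm:
                tiles.append((start, oligo, tm))
                break
    return tiles
```

Because each G or C contributes twice as much as an A or T under this rule, GC-rich regions reach the target temperature with shorter oligos, which is why the tiles have variable length.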
Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat
2013-07-01
Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. The complete genome sequence of CMH-S011-11 was determined and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 was grouped within the HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
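Since the abstract above relies on the contig N50 as its quality measure, a minimal sketch of the usual definition may help: N50 is the length L of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. The function name is hypothetical and the input is just a list of contig lengths.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are added from longest to shortest; the length of the contig
    that brings the running total to at least half of the assembly is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

The coverage figure in the abstract is similar back-of-envelope arithmetic: 181 Gbp of sequence at roughly 60× coverage implies a genome of about 181 / 60 ≈ 3 Gbp.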
Zhang, Songyan; Gao, Jiuxiang; Lu, Yiling; Cai, Shasha; Qiao, Xue; Wang, Yipeng; Yu, Haining
2013-08-01
Antifreeze proteins (AFPs) refer to a class of polypeptides that are produced by certain vertebrates, plants, fungi, and bacteria and which permit their survival in subzero environments. In this study, we report the molecular cloning, sequence analysis and three-dimensional structure of the axolotl antifreeze-like protein (AFLP) by homology modeling of the first caudate amphibian AFLP. We constructed a full-length spleen cDNA library of axolotl (Ambystoma mexicanum). An EST having highest similarity (∼42%) with freeze-responsive liver protein Li16 from Rana sylvatica was identified, and the full-length cDNA was subsequently obtained by RACE-PCR. The axolotl antifreeze-like protein sequence represents an open reading frame for a putative signal peptide and the mature protein composed of 93 amino acids. The calculated molecular mass and the theoretical isoelectric point (pI) of this mature protein were 10128.6 Da and 8.97, respectively. The molecular characterization of this gene and its deduced protein were further performed by detailed bioinformatics analysis. The three-dimensional structure of this AFLP was predicted by homology modeling, and the conserved residues required for functionality were identified. The homology model constructed could be of use for effective drug design. This is the first report of an antifreeze-like protein identified from a caudate amphibian.
Zheng, Weiwei; Peng, Tao; He, Wei; Zhang, Hongyu
2012-01-01
Background Tephritid fruit flies in the genus Bactrocera are of major economic significance in agriculture, causing considerable loss to the fruit and vegetable industry. Currently, there is no ideal control program. Molecular approaches are currently an effective means of pest control, but genomic or transcriptomic data for members of this genus remain limited. To facilitate molecular research into reproduction and development mechanisms, and ultimately effective control of these pests, an extensive transcriptome for the oriental fruit fly Bactrocera dorsalis was produced using the Roche 454-FLX platform. Results We obtained over 350 million bases of cDNA derived from the whole body of B. dorsalis at different developmental stages. In a single run, 747,206 sequencing reads with a mean read length of 382 bp were obtained. These reads were assembled into 28,782 contigs and 169,966 singletons. The mean contig size was 750 bp and many nearly full-length transcripts were assembled. Additionally, we identified a great number of genes that are involved in reproduction and development as well as genes that represent nearly all major conserved metazoan signal transduction pathways, such as insulin signal transduction. Furthermore, transcriptome changes during development were analyzed. A total of 2,977 differentially expressed genes (DEGs) were detected between larvae and pupae libraries, while there were 1,621 DEGs between adults and larvae, and 2,002 between adults and pupae. These DEGs were functionally annotated with KEGG pathway annotation and 9 genes were validated by qRT-PCR. Conclusion Our data represent extensive sequence resources available for B. dorsalis and provide for the first time access to the genetic architecture of reproduction and development as well as major signal transduction pathways in the Tephritid fruit fly pests, allowing us to elucidate the molecular mechanisms underlying courtship, ovipositing, development and detailed analyses of the signal transduction pathways. PMID:22570719
Mubiru, James N; Yang, Alice S; Olsen, Christian; Nayak, Sudhir; Livi, Carolina B; Dick, Edward J; Owston, Michael; Garcia-Forey, Magdalena; Shade, Robert E; Rogers, Jeffrey
2014-01-01
The function of prostate-specific antigen (PSA) is to liquefy the semen coagulum so that the released sperm can fuse with the ovum. Fifteen spliced variants of the PSA gene have been reported in humans, but little is known about alternative splicing in nonhuman primates. Positive selection has been reported in sex- and reproductive-related genes from sea urchins to Drosophila to humans; however, there are few studies of adaptive evolution of the PSA gene. Here, using polymerase chain reaction (PCR) product cloning and sequencing, we study PSA transcript variant heterogeneity in the prostates of chimpanzees (Pan troglodytes), cynomolgus monkeys (Macaca fascicularis), baboons (Papio hamadryas anubis), and African green monkeys (Chlorocebus aethiops). Six PSA variants were identified in the chimpanzee prostate, but only two variants were found in cynomolgus monkeys, baboons, and African green monkeys. In the chimpanzee the full-length transcript is expressed at the same magnitude as the transcripts that retain intron 3. We have found previously unidentified splice variants of the PSA gene, some of which might be linked to disease conditions. Selection on the PSA gene was studied in 11 primate species by computational methods using the sequences reported here for African green monkey, cynomolgus monkey, baboon, and chimpanzee and other sequences available in public databases. A codon-based analysis (dN/dS) of the PSA gene identified potential adaptive evolution at five residue sites (Arg45, Lys70, Gln144, Pro189, and Thr203).
Isolation of expressed sequences from the region commonly deleted in Velo-cardio-facial syndrome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sirotkin, H.; Morrow, B.; DasGupta, R.
Velo-cardio-facial syndrome (VCFS) is a relatively common autosomal dominant genetic disorder characterized by cleft palate, cardiac abnormalities, learning disabilities and a characteristic facial dysmorphology. Most VCFS patients have interstitial deletions of 22q11 of 1-2 Mb. In an effort to isolate the gene(s) responsible for VCFS we have utilized a hybrid selection protocol to recover expressed sequences from three non-overlapping YACs comprising almost 1 Mb of the commonly deleted region. Total yeast genomic DNA or isolated YAC DNA was immobilized on Hybond-N filters, blocked with yeast and human ribosomal and human repetitive sequences and hybridized with a mixture of random-primed short fragment cDNA libraries. Six human short fragment libraries derived from total fetus, fetal brain, adult brain, testes, thymus and spleen have been used for the selections. Short fragment cDNAs retained on the filter were passed through a second round of selection and cloned into lambda gt10. cDNAs shown to originate from the YACs and from chromosome 22 are being used to isolate full length cDNAs. Three genes known to be present on these YACs, catechol-O-methyltransferase, tuple 1 and clathrin heavy chain have been recovered. Additionally, a gene related to the murine p120 gene and a number of novel short cDNAs have been isolated. The role of these genes in VCFS is being investigated.
Indexed variation graphs for efficient and accurate resistome profiling.
Rowe, Will P M; Winn, Martyn D
2018-05-14
Antimicrobial resistance remains a major threat to global health. Profiling the collective antimicrobial resistance genes within a metagenome (the "resistome") facilitates greater understanding of antimicrobial resistance gene diversity and dynamics. In turn, this can allow for gene surveillance, individualised treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Our method combines a variation graph representation of gene sets with an LSH Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, GROOT, and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 minutes using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). will.rowe@stfc.ac.uk. Supplementary data are available at Bioinformatics online.
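The similarity-search step described above rests on locality-sensitive hashing. The sketch below shows only the underlying MinHash idea, which estimates the Jaccard similarity between two k-mer sets from short signatures; such signatures are what an LSH index is typically built over. It is a toy illustration under stated assumptions (seeded BLAKE2b hashes, hypothetical function names, no graph or forest index) and does not reproduce GROOT's Go implementation.

```python
import hashlib


def kmer_set(seq, k):
    """Return the set of all length-k sub-sequences of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(kmers, num_hashes=64):
    """For each of num_hashes seeded hash functions, keep the minimum hash value.

    kmers must be non-empty; the seeds simply salt a 64-bit BLAKE2b digest.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(salt + km.encode(), digest_size=8).digest(), "big")
            for km in kmers
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

A read and a reference window that share most of their k-mers will agree at most signature positions, so candidate matches can be found by comparing short signatures instead of aligning every read against every gene.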
Signal recognition particle RNA in dinoflagellates and the Perkinsid Perkinsus marinus.
Zhang, Huan; Campbell, David A; Sturm, Nancy R; Rosenblad, Magnus A; Dungan, Christopher F; Lin, Senjie
2013-09-01
In dinoflagellates and perkinsids, the molecular structure of the protein translocating machinery is unclear. Here, we identified several types of full-length signal recognition particle (SRP) RNA genes from Karenia brevis (dinoflagellate) and Perkinsus marinus (perkinsid). We also identified the four SRP S-domain proteins, but not the two Alu domain proteins, from P. marinus and several dinoflagellates. We mapped both ends of SRP RNA transcripts from K. brevis and P. marinus, and obtained the 3' end from four other dinoflagellates. The lengths of SRP RNA are predicted to be ∼260-300 nt in dinoflagellates and 280-285 nt in P. marinus. Although these SRP RNA sequences are substantially variable, the predicted structures are similar. The genomic organization of the SRP RNA gene differs among species. In K. brevis, this gene is located downstream of the spliced leader (SL) RNA, either as SL RNA-SRP RNA-tRNA gene tandem repeats, or within a SL RNA-SRP RNA-tRNA-U6-5S rRNA gene cluster. In other dinoflagellates, SRP RNA does not cluster with SL RNA or 5S rRNA genes. The majority of P. marinus SRP RNA genes array as tandem repeats without the above-mentioned small RNA genes. Our results capture a snapshot of a potentially complex evolutionary history of SRP RNA in alveolates. Copyright © 2013 Elsevier GmbH. All rights reserved.
Appiano, Michela; Pavan, Stefano; Catalano, Domenico; Zheng, Zheng; Bracuto, Valentina; Lotti, Concetta; Visser, Richard G F; Ricciardi, Luigi; Bai, Yuling
2015-10-01
Specific homologs of the plant Mildew Locus O (MLO) gene family act as susceptibility factors towards the powdery mildew (PM) fungal disease, causing significant economic losses in agricultural settings. Thus, in order to obtain PM resistant phenotypes, a general breeding strategy has been proposed, based on the selective inactivation of MLO susceptibility genes across cultivated species. In this study, PCR-based methodologies were used in order to isolate MLO genes from cultivated solanaceous crops that are hosts for PM fungi, namely eggplant, potato and tobacco, which were named SmMLO1, StMLO1 and NtMLO1, respectively. Based on phylogenetic analysis and sequence alignment, these genes were predicted to be orthologs of tomato SlMLO1 and pepper CaMLO2, previously shown to be required for PM pathogenesis. Full-length sequence of the tobacco homolog NtMLO1 was used for a heterologous transgenic complementation assay, resulting in its characterization as a PM susceptibility gene. The same assay showed that a single nucleotide change in a mutated NtMLO1 allele leads to complete gene loss-of-function. Results here presented, also including a complete overview of the tobacco and potato MLO gene families, are valuable to study MLO gene evolution in Solanaceae and for molecular breeding approaches aimed at introducing PM resistance using strategies of reverse genetics.
Semler, Matthew R; Wiseman, Roger W; Karl, Julie A; Graham, Michael E; Gieger, Samantha M; O'Connor, David H
2018-06-01
Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.
The evolution of subtype B HIV-1 tat in the Netherlands during 1985-2012.
van der Kuyl, Antoinette C; Vink, Monique; Zorgdrager, Fokla; Bakker, Margreet; Wymant, Chris; Hall, Matthew; Gall, Astrid; Blanquart, François; Berkhout, Ben; Fraser, Christophe; Cornelissen, Marion
2018-05-02
For the production of viral genomic RNA, HIV-1 is dependent on an early viral protein, Tat, which is required for high-level transcription. The quantity of viral RNA detectable in blood of HIV-1 infected individuals varies dramatically, and one factor involved could be the efficiency with which Tat protein variants stimulate RNA transcription. HIV-1 virulence, measured by set-point viral load, has been observed to increase over time in the Netherlands and elsewhere. Investigation of tat gene evolution in clinical isolates could reveal a role of Tat in this changing virulence. A dataset of 291 Dutch HIV-1 subtype B tat genes, derived from full-length HIV-1 genome sequences from samples obtained between 1985 and 2012, was used to analyse the evolution of Tat. Twenty-two patient-derived tat genes, and the control Tat HXB2, were analysed for their capacity to stimulate expression of an LTR-luciferase reporter gene construct in diverse cell lines, as well as for their ability to complement a tat-defective HIV-1 LAI clone. Analysis of 291 historical tat sequences from the Netherlands showed ample amino acid (aa) variation between isolates, although no specific mutations were selected for over time. Of note, however, the encoded protein varied in length over the years through the loss or gain of stop codons in the second exon. In transmission clusters, a selection against the shorter Tat86 ORF was apparent in favour of the more common Tat101 version, likely due to negative selection against Tat86 itself, although random drift, transmission bottlenecks, or linkage to other variants could also explain the observation. There was no correlation between Tat length and set-point viral load; however, the number of non-intermediate variants in our study was small. In addition, variation in the length of Tat did not significantly change its capacity to stimulate transcription.
From 1985 till 2012, variation in the length of the HIV-1 subtype B tat gene is increasingly found in the Dutch epidemic. However, as Tat proteins did not differ significantly in their capacity to stimulate transcription elongation in vitro, the increased HIV-1 virulence seen in recent years could not be linked to an evolving viral Tat protein. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
2013-01-01
Background Birds have a ZZ male: ZW female sex chromosome system and while the Z-linked DMRT1 gene is necessary for testis development, the exact mechanism of sex determination in birds remains unsolved. This is partly due to the poor annotation of the W chromosome, which is speculated to carry a female determinant. Few genes have been mapped to the W and little is known of their expression. Results We used RNA-seq to produce a comprehensive profile of gene expression in chicken blastoderms and embryonic gonads prior to sexual differentiation. We found robust sexually dimorphic gene expression in both tissues pre-dating gonadogenesis, including sex-linked and autosomal genes. This supports the hypothesis that sexual differentiation at the molecular level is at least partly cell autonomous in birds. Different sets of genes were sexually dimorphic in the two tissues, indicating that molecular sexual differentiation is tissue specific. Further analyses allowed the assembly of full-length transcripts for 26 W chromosome genes, providing a view of the W transcriptome in embryonic tissues. This is the first extensive analysis of W-linked genes and their expression profiles in early avian embryos. Conclusion Sexual differentiation at the molecular level is established in chicken early in embryogenesis, before gonadal sex differentiation. We find that the W chromosome is more transcriptionally active than previously thought, expand the number of known genes to 26 and present complete coding sequences for these W genes. This includes two novel W-linked sequences and three small RNAs reassigned to the W from the Un_Random chromosome. PMID:23531366
Gene Discovery through Transcriptome Sequencing for the Invasive Mussel Limnoperna fortunei
Uliano-Silva, Marcela; Americo, Juliana Alves; Brindeiro, Rodrigo; Dondero, Francesco; Prosdocimi, Francisco; de Freitas Rebelo, Mauro
2014-01-01
The success of the Asian bivalve Limnoperna fortunei as an invader in South America is related to its high acclimation capability. It can inhabit waters with a wide range of temperatures and salinity and handle long-term periods of air exposure. We describe the transcriptome of L. fortunei aiming to give a first insight into the phenotypic plasticity that allows non-native taxa to become established and widespread. We sequenced 95,219 reads from five main tissues of the mussel L. fortunei using Roche’s 454 and assembled them to form a set of 84,063 unigenes (contigs and singletons) representing partial or complete gene sequences. We annotated 24,816 unigenes using a BLAST sequence similarity search against a NCBI nr database. Unigenes were divided into 20 eggNOG functional categories and 292 KEGG metabolic pathways. From the total unigenes, 1,351 represented putative full-length genes of which 73.2% were functionally annotated. We described the first partial and complete gene sequences in order to start understanding bivalve invasiveness. An expansion of the hsp70 gene family, seen also in other bivalves, is present in L. fortunei and could be involved in its adaptation to extreme environments, e.g. during intertidal periods. The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. Finally, the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach. PMID:25047650
Barnes, D W
2012-04-01
Two of the most commonly used elasmobranch experimental model species are the spiny dogfish Squalus acanthias and the little skate Leucoraja erinacea. Comparative biology and genomics with these species have provided useful information in physiology, pharmacology, toxicology, immunology, evolutionary developmental biology and genetics. A wealth of information has been obtained using in vitro approaches to study isolated cells and tissues from these organisms under circumstances in which the extracellular environment can be controlled. In addition to classical work with primary cell cultures, continuously proliferating cell lines have been derived recently, representing the first cell lines from cartilaginous fishes. These lines have proved to be valuable tools with which to explore functional genomic and biological questions and to test hypotheses at the molecular level. In genomic experiments, complementary (c)DNA libraries have been constructed, and c. 8000 unique transcripts identified, with over 3000 representing previously unknown gene sequences. A sub-set of messenger (m)RNAs has been detected for which the 3' untranslated regions show elements that are remarkably well conserved evolutionarily, representing novel, potentially regulatory gene sequences. The cell culture systems provide physiologically valid tools to study functional roles of these sequences and other aspects of elasmobranch molecular cell biology and physiology. Information derived from the use of in vitro cell cultures is valuable in revealing gene diversity and information for genomic sequence assembly, as well as for identification of new genes and molecular markers, construction of gene-array probes and acquisition of full-length cDNA sequences. © 2012 The Author. Journal of Fish Biology © 2012 The Fisheries Society of the British Isles.
Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.
2015-01-01
Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1 × 10⁻⁶. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays
Boerner, Susan; McGinnis, Karen M.
2012-01-01
Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
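As a rough illustration of one step such a pipeline might contain, the sketch below flags a cDNA as a candidate noncoding transcript when its longest open reading frame is short. This is not the authors' pipeline: the function names, the restriction to the three forward frames, and the 100-codon cut-off are all assumptions made for the example.

```python
# Minimal sketch: flag transcripts whose longest ORF is short.
# Hypothetical helper names; the 100-codon threshold is an assumed convention,
# not a value taken from the abstract above.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest
    complete ORF in the three forward reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        current = None  # None means we are not inside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


def is_candidate_noncoding(seq: str, min_codons: int = 100) -> bool:
    """True when no complete forward-strand ORF reaches min_codons."""
    return longest_orf_codons(seq) < min_codons


example = "CCATGAAATTTGGGTAACC"  # ATG AAA TTT GGG TAA in frame 2
print(longest_orf_codons(example), is_candidate_noncoding(example))
```

A real classifier would also need to consider the reverse strand, ORFs lacking a stop codon, and protein homology, which this sketch deliberately ignores.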
Harrison, Nigel A; Davis, Robert E; Oropeza, Carlos; Helmick, Ericka E; Narváez, María; Eden-Green, Simon; Dollet, Michel; Dickinson, Matthew
2014-06-01
In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise similarity values based on alignment of nearly full-length 16S rRNA gene sequences (1530 bp) revealed that the Mozambique coconut phytoplasma (LYDM) shared 100% identity with a comparable sequence derived from a phytoplasma strain (LDN) responsible for Awka wilt disease of coconut in Nigeria, and shared 99.0-99.6% identity with 16S rRNA gene sequences from strains associated with Cape St Paul wilt (CSPW) disease of coconut in Ghana and Côte d'Ivoire. Similarity scores further determined that the 16S rRNA gene of the LYDM phytoplasma shared <97.5% sequence identity with all previously described members of 'Candidatus Phytoplasma'. The presence of unique regions in the 16S rRNA gene sequence distinguished the LYDM phytoplasma from all currently described members of 'Candidatus Phytoplasma', justifying its recognition as the reference strain of a novel taxon, 'Candidatus Phytoplasma palmicola'. Virtual RFLP profiles of the F2n/R2 portion (1251 bp) of the 16S rRNA gene and pattern similarity coefficients delineated coconut LYDM phytoplasma strains from Mozambique as novel members of established group 16SrXXII, subgroup A (16SrXXII-A). Similarity coefficients of 0.97 were obtained for comparisons between subgroup 16SrXXII-A strains and CSPW phytoplasmas from Ghana and Côte d'Ivoire. On this basis, the CSPW phytoplasma strains were designated members of a novel subgroup, 16SrXXII-B.
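The pairwise similarity values quoted above are percent identities computed over an alignment. A minimal sketch of that calculation follows, assuming two sequences that have already been aligned to equal length and assuming that columns containing a gap are skipped; tools differ on gap handling, so this is one convention rather than the one used in the study. The function names are hypothetical, and the 97.5% default mirrors the threshold mentioned in the abstract.

```python
# Minimal sketch: percent identity between two pre-aligned sequences.
# Assumption: gap columns ('-') are excluded from the comparison.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_threshold(identity: float, threshold: float = 97.5) -> bool:
    """True when identity falls below the given percentage threshold."""
    return identity < threshold


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, below_threshold(identity))
```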
Paiva, Anthony M; Sheardy, Richard D
2005-04-20
The formation of unusual structures during DNA replication has been invoked for gene expansion in genomes possessing triplet repeat sequences, CNG, where N = A, C, G, or T. In particular, it has been suggested that the daughter strand of the leading strand partially dissociates from the parent strand and forms a hairpin. The equilibrium between the fully duplexed parent:daughter species and the parent:hairpin species is dependent upon their relative stabilities and the rates of reannealing of the daughter strand back to the parent. These stabilities and rates are ultimately influenced by the sequence context of the DNA and its length. Previous work has demonstrated that longer strands are more stable than shorter strands and that the identity of N also influences the thermal stability [Paiva, A. M.; Sheardy, R. D. Biochemistry 2004, 43, 14218-14227]. Here, we show that the rate of duplex formation from complementary hairpins is also sequence context and length dependent. In particular, longer duplexes have higher activation energies than shorter duplexes of the same sequence context. Further, [(CCG):(GGC)] duplexes have lower activation energies than corresponding [(CAG):(GTC)] duplexes of the same length. Hence, hairpins formed from long CNG sequences are more thermodynamically stable and have slower kinetics for reannealing to their complement than shorter analogues. Gene expansion can now be explained in terms of thermodynamics and kinetics.
Detection of porcine circovirus type 2 in pigs imported from Indonesia.
Manokaran, Gayathri; Lin, Yueh-Nuo; Soh, Moi-Lien; Lim, Elizabeth Ai-Sim; Lim, Chee-Wee; Tan, Boon-Huan
2008-11-25
We have detected the presence of porcine circovirus (PCV) type 2 in Indonesian pigs imported to Singapore for food consumption. A total of three viral isolates were identified, and to characterise them genetically, their full genomes were sequenced. Each genome showed the typical organization of PCV type 2, with the three isolates sharing the same genome length of 1767 nucleotides (nt) and high nt identities of 99.8-100%, indicating that the viral isolates were quite homogeneous. Sequence analysis further revealed that the ORF2 genes contain the nt sequence CCCCGC (from nt position 262 to 267) that was previously reported to be associated with PCV type 2, group 1C. A phylogenetic tree was constructed for the ORF2 genes, and the PCV type 2 isolates were distributed into two distinct groups. The Indonesian PCV type 2 isolates clustered tightly with one isolate from China, accession number AY035820, as a sub-cluster in group 1C. The sequence and phylogenetic analyses both confirmed that the three Indonesian PCV type 2 isolates belong to group 1C, and that the three Indonesian isolates were genetically very stable, possibly owing to limited evolution.
Localization of HTLV-I tax proviral DNA in mononuclear cells.
Zucker-Franklin, Dorothea; Pancake, Bette A; Najfeld, Vesna
2003-01-01
The tax sequence of HTLV-I is demonstrable in the skin and blood mononuclear cells of patients with mycosis fungoides, as well as in the mononuclear leukocytes of some healthy blood donors, but was not demonstrable when PCR/Southern analyses were carried out on preparations of high-molecular-weight genomic DNA. Therefore, it was postulated that tax DNA may not be integrated. To investigate this possibility fluorescence in situ hybridization was carried out on cells arrested in metaphase, using a probe containing the HTLV-I tax proviral DNA full-length open reading frame coding sequence. While metaphases prepared from C91PL cells, a cell line infected with HTLV-I, showed an abundance of chromosome-associated as well as extra-chromosomal signals, metaphases prepared with blood mononuclear cells from healthy tax sequence positive donors did not reveal any tax DNA associated with chromosomes. Such signals were readily detected extra-chromosomally. Although it has been demonstrated that transactivation of genes by gene products encoded by extra-chromosomal DNA may have nosocomial implications, whether transactivation by p40 tax generated from extra-chromosomal tax sequences is responsible for the development of neoplasia remains to be investigated.
Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne
2015-02-10
Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
NASA Astrophysics Data System (ADS)
Zhao, Xiaoqing; Li, Hong; Bao, Tonglaga; Ying, Zhiqiang
2012-09-01
Considerable experimental evidence shows that intron sequence structure and intron loss/gain can influence gene expression, but the mechanisms proposed so far do not directly address the functions of post-spliced introns. We propose that post-spliced introns act on gene expression by interacting with their mRNA sequences, and that this interaction is characterized by matched segments between introns and their coding sequences (CDS). In this study, we investigated the interaction characteristics across a series of intron lengths, using improved Smith-Waterman local alignment software, for the ribosomal protein genes of C. elegans and D. melanogaster. Our results showed that the RF values of the five intron groups are significantly high in the central non-conserved region and very low in the 5'-end and 3'-end splicing regions. Interestingly, the number of optimal matched regions gradually increases with intron length, and the distributions of the optimal matched regions differ among the five intron groups. Our study revealed that longer introns have more interaction regions with their CDS than shorter introns, which suggests a positive pattern for regulating gene expression.
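For readers unfamiliar with the alignment method named above, the following is a minimal sketch of the textbook Smith-Waterman local alignment with a linear gap penalty. It is not the improved software used in the study; the scoring values (match 2, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
# Minimal sketch: textbook Smith-Waterman local alignment, linear gap penalty.
# Scoring parameters are illustrative assumptions.

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; cell values are floored at zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


result = smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG")
print(result)
```

In the setting described above, one input would be an intron sequence and the other its CDS, and the matched segment returned corresponds to a single optimal matched region.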
Kumar, Sunil; Kalra, Shikha; Singh, Baljinder; Kumar, Avneesh; Kaur, Jagdeep; Singh, Kashmir
2016-01-01
Chlorophytum borivilianum is an important species of the Liliaceae family, owing to its vital medicinal properties. Plant roots are used for aphrodisiac, adaptogen, anti-aging, health-restorative and health-promoting purposes. Saponins are considered to be the principal bioactive components responsible for the wide variety of pharmacological properties of this plant. In the present study, we performed de novo root transcriptome sequencing of C. borivilianum using the Illumina HiSeq 2000 platform to gain molecular insight into saponin biosynthesis. A total of 33,963,356 high-quality reads were obtained after quality filtration. Sequences were assembled using various programs, which generated 97,344 transcripts with a size range of 100-5,216 bp and an N50 value of 342. Data were analyzed against non-redundant protein, gene ontology (GO), and enzyme commission (EC) databases. All the genes involved in saponin biosynthesis, along with five full-length genes, namely farnesyl pyrophosphate synthase, cycloartenol synthase, β-amyrin synthase, cytochrome p450, and sterol-3-glucosyltransferase, were identified. Reads per kilobase of exon per million mapped reads (RPKM)-based comparative expression profiling was done to study the differential regulation of the genes. In silico expression analysis of seven selected genes of the saponin biosynthetic pathway was validated by qRT-PCR.
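Two of the summary statistics in this abstract, N50 and RPKM, have simple standard definitions. The sketch below shows both, assuming the common conventions: N50 is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of the total assembled bases, and RPKM is the read count scaled per kilobase of exon and per million mapped reads. Function names and the example numbers are hypothetical, not data from the study.

```python
# Minimal sketch of two standard assembly/expression statistics.
# Example inputs are made up for illustration.

def n50(lengths):
    """N50 of a collection of sequence lengths (in bp)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of all bases
            return length
    raise ValueError("lengths must not be empty")


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


print(n50([100, 200, 300, 400, 500]))  # 500 + 400 covers half of 1500 bp
print(rpkm(500, 2000, 10_000_000))
```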
Metataxonomics reveal vultures as a reservoir for Clostridium perfringens.
Meng, Xiangli; Lu, Shan; Yang, Jing; Jin, Dong; Wang, Xiaohong; Bai, Xiangning; Wen, Yumeng; Wang, Yiting; Niu, Lina; Ye, Changyun; Rosselló-Móra, Ramon; Xu, Jianguo
2017-02-22
The Old World vulture may carry and spread pathogens for emerging infections since they feed on the carcasses of dead animals and participate in the sky burials of humans, some of whom have died from communicable diseases. Therefore, we studied the precise fecal microbiome of the Old World vulture with metataxonomics, integrating the high-throughput sequencing of almost full-length small subunit ribosomal RNA (16S rRNA) gene amplicons in tandem with the operational phylogenetic unit (OPU) analysis strategy. Nine vultures of three species were sampled using rectal swabs on the Qinghai-Tibet Plateau, China. Using the Pacific Biosciences sequencing platform, we obtained 54 135 high-quality reads of 16S rRNA amplicons with an average of 1442±6.9 bp in length and 6015±1058 reads per vulture. Those sequences were classified into 314 OPUs, including 102 known species, 50 yet to be described species and 161 unknown new lineages of uncultured representatives. Forty-five species have been reported to be responsible for human outbreaks or infections, and 23 yet to be described species belong to genera that include pathogenic species. Only six species were common to all vultures. Clostridium perfringens was the most abundant and present in all vultures, accounting for 30.8% of the total reads. Therefore, using the new technology, we found that vultures are an important reservoir for C. perfringens as evidenced by the isolation of 107 strains encoding for virulence genes, representing 45 sequence types. Our study suggests that the soil-related C. perfringens and other pathogens could have a reservoir in vultures and other animals. PMID:28223683