Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen
2009-06-01
This study aimed to obtain the full-length cDNA sequence of the secoisolariciresinol dehydrogenase gene from Dysosma versipellis by RACE PCR and to characterize the gene. The full-length cDNA was obtained from D. versipellis by 3'-RACE and 5'-RACE; this is the first report of a full-length secoisolariciresinol dehydrogenase cDNA sequence from this species. The cDNA is 991 bp long and includes a 42-bp 5' untranslated region and a 112-bp 3' untranslated region with a poly(A) tail. The open reading frame (ORF) encodes 278 amino acids, with a predicted molecular weight of 29,253.3 Da and an isoelectric point of 6.328. The nucleotide sequence has been deposited in GenBank under accession number EU573789. Semi-quantitative RT-PCR analysis showed that the secoisolariciresinol dehydrogenase gene is highly expressed in the stem. Alignment of secoisolariciresinol dehydrogenase amino acid sequences suggested that there may be significant sequence differences among species. In summary, the full-length cDNA sequence of the secoisolariciresinol dehydrogenase gene was obtained from Dysosma versipellis.
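The reported lengths are internally consistent: subtracting both UTRs from the 991-bp cDNA leaves 837 bp, which is exactly 278 sense codons plus one stop codon. A minimal Python check of that arithmetic, assuming (as the wording suggests) that the 112-bp 3' UTR figure already includes the poly(A) tail; the helper name `orf_fits_cdna` is hypothetical:

```python
def orf_fits_cdna(cdna_len, utr5_len, utr3_len, n_aa):
    """Return (cds_len, consistent) for a cDNA whose CDS lies between two UTRs.

    A CDS encoding n_aa residues must span n_aa sense codons plus one stop codon.
    """
    cds_len = cdna_len - utr5_len - utr3_len
    expected = 3 * (n_aa + 1)  # +1 codon for the stop
    return cds_len, cds_len == expected

# Values reported for the D. versipellis secoisolariciresinol dehydrogenase cDNA
cds_len, ok = orf_fits_cdna(991, 42, 112, 278)
print(cds_len, ok)  # 837 True
```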
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M
2015-05-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty were compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations of the extent of HIV clustering with sequence length and with the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At a bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although this was significantly lower than for the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than of 1,000 bp. We found a strong association between sequence length and the proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of the viral sequences used, as well as with the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
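The headline metric, the proportion of sequences found in clusters at a given bootstrap threshold, reduces to simple bookkeeping once candidate clusters and their support values have been extracted from the ML tree. The sketch below assumes clusters are already available as (support, member IDs) pairs; tree parsing and the authors' exact cluster definition are not reproduced, and the function name and toy data are hypothetical:

```python
def fraction_clustered(clusters, n_sequences, threshold=0.80):
    """Fraction of sequences that fall in at least one supported cluster.

    clusters: iterable of (bootstrap_support, set_of_sequence_ids).
    A sequence nested in several supported clusters is counted only once.
    """
    clustered = set()
    for support, members in clusters:
        if support >= threshold:
            clustered.update(members)
    return len(clustered) / n_sequences

# Toy example: 10 sequences, three candidate clusters
clusters = [
    (0.95, {"s1", "s2", "s3"}),
    (0.85, {"s2", "s3"}),  # nested inside the first cluster
    (0.60, {"s7", "s8"}),  # below the 0.80 threshold, ignored
]
print(fraction_clustered(clusters, 10))  # 0.3
```

Lowering `threshold` admits weaker clusters, which is why the reported percentages are always tied to a specific bootstrap cutoff.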
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2015-01-01
To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty were compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations of the extent of HIV clustering with sequence length and with the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At a bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although this was significantly lower than for the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than of 1,000 bp. We found a strong association between sequence length and the proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of the viral sequences used, as well as with the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf
2015-10-01
Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses of the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, so a fast and efficient sequencing approach was needed. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of the Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. After excluding cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 of 200 clones), the exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation, and the nucleotide-level accuracy of the coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, confirming that it also performs well for non-human species. The entire sequencing and shotgun assembly takes less than 1 week, and consumables cost only approximately US$3 per clone, a significant advantage over previous approaches.
Ma, Yuyuan; Lv, Maomin; Xu, Shu; Wu, Jianmin; Tian, Kegong; Zhang, Jingang
2010-07-01
The presence of porcine endogenous retrovirus (PERV) hinders the use of pigs in clinical xenotransplantation to alleviate the shortage of human donor organs. Chinese miniature pigs are potential organ donors for xenotransplantation in China. However, adequate information on the molecular characteristics of PERV from Chinese miniature pigs has so far not been available. Here we describe the cloning and characterization of full-length proviral DNA of PERV from inbred Chinese Wuzhishan miniature pigs (WZSP). Full-length nucleotide sequences of PERV-WZSP and other PERVs were aligned, and a phylogenetic tree was constructed from the deduced amino acid sequences of env. The results demonstrated that the full-length proviral DNA of PERV-WZSP belongs to the gammaretroviruses and shares high similarity with other PERVs. Sequence analysis also suggested that different LTR patterns exist in the same porcine germ line and that a partial PERV-C sequence may recombine with PERV-A sequence in the LTR. (c) 2008 Elsevier Ltd. All rights reserved.
An improved and validated RNA HLA class I SBT approach for obtaining full length coding sequences.
Gerritsen, K E H; Olieslagers, T I; Groeneweg, M; Voorter, C E M; Tilanus, M G J
2014-11-01
The functional relevance of human leukocyte antigen (HLA) class I allele polymorphism beyond exons 2 and 3 is difficult to address because more than 70% of HLA class I alleles are defined by exon 2 and 3 sequences only. For routine application to clinical samples, we improved and validated an HLA sequence-based typing (SBT) approach based on RNA templates, using either a single locus-specific or two overlapping group-specific polymerase chain reaction (PCR) amplifications, with three forward and three reverse sequencing reactions for full-length sequencing. Locus-specific HLA typing by RNA SBT of a reference panel representing the major antigen groups gave results identical to DNA SBT. Alleles with exons not yet described in the IMGT/HLA database, and three samples (two carrying a Null allele and one a Low-expressed allele), were resolved by the group-specific RNA SBT approach to obtain full-length coding sequences. This RNA SBT approach has proven its value in our routine full-length definition of alleles. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Creager, Hannah M; Becker, Ericka A; Sandman, Kelly K; Karl, Julie A; Lank, Simon M; Bimber, Benjamin N; Wiseman, Roger W; Hughes, Austin L; O'Connor, Shelby L; O'Connor, David H
2011-09-01
In recent years, the use of cynomolgus macaques in biomedical research has increased greatly. However, with the exception of the Mauritian population, knowledge of the MHC class II genetics of the species remains limited. Here, using cDNA cloning and Sanger sequencing, we identified 127 full-length MHC class II alleles in a group of 12 Indonesian and 12 Vietnamese cynomolgus macaques. Forty-two of these were completely novel to cynomolgus macaques, while 61 extended the sequence of previously identified alleles from partial to full length. This more than doubles the number of full-length cynomolgus macaque MHC class II alleles available in GenBank, significantly expanding the allele library for the species and laying the groundwork for future evolutionary and functional studies.
Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai
2017-06-01
Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
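The final step, assembling overlapping peptide tags into one consensus sequence, resembles greedy overlap assembly. The sketch below illustrates that general idea only: it is not the authors' Peptide Tag Assembler, it ignores sequencing errors and the leucine/isoleucine ambiguity, and the function names and toy tags are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(tags, min_len=3):
    """Repeatedly merge the ordered pair of tags with the largest overlap."""
    tags = list(dict.fromkeys(tags))  # drop exact duplicates, keep order
    # Tags contained in longer tags add no sequence information.
    tags = [t for t in tags if not any(t != u and t in u for u in tags)]
    while len(tags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(tags):
            for j, b in enumerate(tags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining tags stay as separate contigs (gaps)
        merged = tags[best_i] + tags[best_j][best_k:]
        tags = [t for n, t in enumerate(tags) if n not in (best_i, best_j)] + [merged]
    return tags

print(greedy_assemble(["MKTAYIA", "AYIAKQR", "KQRQISF"]))  # ['MKTAYIAKQRQISF']
```

When no tag bridges two contigs, the function returns them separately, which mirrors how an unresolved stretch (such as the 23-residue gap in the heavy-chain constant region) would remain a gap.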
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses of the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, so a fast and efficient sequencing approach was needed. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of the Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by conventional primer walking using Sanger sequencers. After excluding cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 of 200 clones), the exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation, and the nucleotide-level accuracy of the coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, confirming that it also performs well for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week, and consumables cost only ∼US$3 per clone, a significant advantage over previous approaches. PMID:20479877
High-Resolution Sequence-Function Mapping of Full-Length Proteins
Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.
2015-01-01
Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet the short read lengths of current sequencers make it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify the design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. The best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
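Fitness is inferred from how each variant's frequency changes through selection. One common way to express this, assumed here rather than taken from the manuscript, is a log2 enrichment ratio normalized to wild type, which cancels differences in total sequencing depth between the pre- and post-selection samples. The pseudocount value, the function name, and the counts below are hypothetical.

```python
import math

def log2_enrichment(ref_counts, sel_counts, wt="WT", pseudocount=0.5):
    """Wild-type-normalized log2 enrichment for each variant.

    ref_counts / sel_counts map variant -> read count before / after selection.
    The pseudocount keeps variants that drop out after selection finite.
    """
    wt_ratio = (sel_counts[wt] + pseudocount) / (ref_counts[wt] + pseudocount)
    scores = {}
    for variant, ref in ref_counts.items():
        sel = sel_counts.get(variant, 0)
        ratio = (sel + pseudocount) / (ref + pseudocount)
        scores[variant] = math.log2(ratio / wt_ratio)
    return scores

# Invented counts for wild type and two hypothetical point mutants
reference = {"WT": 1000, "A12V": 100, "G45D": 100}
selected = {"WT": 2000, "A12V": 400, "G45D": 20}
for variant, score in sorted(log2_enrichment(reference, selected).items()):
    print(f"{variant}: {score:+.2f}")
```

Positive scores mark variants enriched relative to wild type by selection; negative scores mark variants depleted relative to it.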
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
2013-01-01
A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi
2018-02-12
Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
2011-01-01
Background Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. 
In addition, the library contains a large number of transcription factors and will be useful for discovering and validating drought- or abiotic stress-related genes in common bean. PMID:22118559
Oikonomopoulos, Spyros; Wang, Yu Chang; Djambazian, Haig; Badescu, Dunarel; Ragoussis, Jiannis
2016-08-24
To assess the performance of the Oxford Nanopore Technologies (ONT) MinION sequencing platform, cDNAs from the External RNA Controls Consortium (ERCC) RNA Spike-In mix were sequenced. This mix mimics mammalian mRNA species and consists of 92 polyadenylated transcripts of known concentration. cDNA libraries were generated using a template-switching protocol to facilitate direct comparison between sequencing platforms. MinION performance was assessed for its ability to sequence the cDNAs directly with good accuracy in terms of abundance and full-length coverage. The abundance of the ERCC cDNA molecules sequenced by MinION agreed with their expected concentrations. No length or GC content bias was observed. The majority of cDNAs were sequenced as full length. Additionally, a complex cDNA population derived from the human HEK-293 cell line was sequenced on the Illumina HiSeq 2500, PacBio RS II and ONT MinION platforms. Measured cDNA abundance agreed well between PacBio RS II and ONT MinION (Pearson r = 0.82 for isoforms longer than 700 bp) and between Illumina HiSeq 2500 and ONT MinION (Pearson r = 0.75). This indicates that the ONT MinION can quantitatively sequence both long and short full-length cDNA molecules.
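The cross-platform agreement is summarized as a Pearson correlation of per-isoform abundances, restricted to isoforms longer than 700 bp for the PacBio comparison. A minimal sketch of that comparison; the log10(x + 1) transform, the function names, and the dictionary inputs are assumptions, since the abstract does not say how abundances were scaled before correlating:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def compare_platforms(abund_a, abund_b, lengths, min_len=700):
    """Correlate log-scaled abundances of isoforms seen on both platforms and longer than min_len."""
    shared = [i for i in abund_a if i in abund_b and lengths[i] > min_len]
    x = [math.log10(abund_a[i] + 1) for i in shared]  # assumed log transform
    y = [math.log10(abund_b[i] + 1) for i in shared]
    return pearson(x, y)

# Invented abundances; the short isoform is excluded by the length filter
a = {"iso1": 10, "iso2": 100, "iso3": 1000, "short": 5}
b = {"iso1": 20, "iso2": 200, "iso3": 2000, "short": 9999}
lengths = {"iso1": 800, "iso2": 1500, "iso3": 3000, "short": 300}
print(round(compare_platforms(a, b, lengths), 3))
```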
Ning, ZhongHua; Hincke, Maxwell T.; Yang, Ning; Hou, ZhuoCheng
2014-01-01
Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished', and many functionally important genes are located in the gapped regions. It can be difficult to obtain full-length cDNA for genes for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. The OC-17 protein has been purified and sequenced, and its three-dimensional structure has been solved. However, researchers still cannot conduct OC-17 mRNA-related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome assembly. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus and then performed de novo transcriptome assembly with bioinformatic analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses demonstrated that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and is barely expressed at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshells, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first to report the full-length OC-17 cDNA sequence and builds a foundation for OC-17 mRNA-related studies.
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences. PMID:24676480
Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T; Yang, Ning; Hou, ZhuoCheng
2014-01-01
Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished', and many functionally important genes are located in the gapped regions. It can be difficult to obtain full-length cDNA for genes for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. The OC-17 protein has been purified and sequenced, and its three-dimensional structure has been solved. However, researchers still cannot conduct OC-17 mRNA-related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome assembly. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus and then performed de novo transcriptome assembly with bioinformatic analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses demonstrated that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and is barely expressed at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshells, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first to report the full-length OC-17 cDNA sequence and builds a foundation for OC-17 mRNA-related studies.
We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences.
Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V
2017-07-01
Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide-binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of new sequencing technologies, which often deliver sequence data covering the complete gene. As these technologies simplify the characterization of complete gene regions, it is desirable for novel alleles to be submitted to the database as full-length sequences. However, manual annotation of full-length alleles and generation of the specific formats required by the sequence repositories are error-prone and time-consuming. We have developed TypeLoader to address both of these issues. Starting from only the full-length sequence, TypeLoader performs automatic sequence annotation and then handles all steps involved in preparing the specific submission formats with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database, with a 95% reduction in the time spent on annotation and submission compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing
2010-01-01
Based on full-length genome sequences of four rabies virus isolates from Chinese ferret-badgers and dogs, we analyzed the genetic variation of rabies viruses at the molecular level, obtained information on rabies virus prevalence and variation in Zhejiang, and expanded the genome database of street rabies virus strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled; nucleotide and deduced amino acid similarities and phylogenetic relationships were then determined in comparison with strains from Chinese ferret-badger, dog, sika deer, vole, and the vaccine strain in use. All four genomes were sequenced completely and share the same genetic organization, with a length of 11,923 or 11,925 nt comprising a 58-nt leader, NP (1,353 nt), PP (894 nt), MP (609 nt), GP (1,575 nt), LP (6,386 nt), intergenic regions (IGRs) of 2, 5, and 5 nt, a 423-nt pseudogene-like sequence (ψ), and a 70-nt trailer. BLAST searches and multiple sequence alignment showed that the four genomes have the properties of the genus Lyssavirus (family Rhabdoviridae). Nucleotide and amino acid similarities were highest among Chinese strains, especially among strains from the same host species. For the four genomes, amino acid similarity was markedly higher than nucleotide similarity, indicating that most nucleotide substitutions were synonymous. Compared with reference rabies viruses, the five protein-coding regions showed no length changes and no evidence of recombination, only a few point mutations, suggesting that the five proteins are stable. The variation sites and types in the four genomes were similar to those of the reference vaccine and street strains. Multiple sequence alignment and phylogenetic analyses placed all four strains in genotype 1, with distinct regional characteristics of China.
These four rabies viruses are therefore likely to be street strains already circulating in nature.
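The inference that most substitutions were synonymous rests on translating codons and checking whether a nucleotide change alters the encoded amino acid. A minimal sketch using the standard genetic code, assuming two in-frame, gap-free coding sequences of equal length; it simply classifies differing codons and is not a site-normalized dN/dS estimate. The function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate_codon(codon):
    """Translate one DNA codon; '*' marks a stop codon."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO[16 * i + 4 * j + k]

def classify_codon_changes(cds1, cds2):
    """Count differing codons that are synonymous vs non-synonymous."""
    if len(cds1) != len(cds2) or len(cds1) % 3:
        raise ValueError("sequences must be in-frame and of equal length")
    syn = nonsyn = 0
    for pos in range(0, len(cds1), 3):
        c1, c2 = cds1[pos:pos + 3], cds2[pos:pos + 3]
        if c1 == c2:
            continue
        if translate_codon(c1) == translate_codon(c2):
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

# CTT->CTC keeps Leu (synonymous); GAA->GAT changes Glu to Asp
print(classify_codon_changes("ATGCTTGAA", "ATGCTCGAT"))  # (1, 1)
```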
Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng
2012-01-01
Normalized cDNA libraries enriched for full-length sequences are needed to provide sisal-specific expressed sequence data and to support their use in new gene discovery. Four libraries were constructed by the SMART™ method with duplex-specific nuclease (DSN) normalization, using RNA pooled from multiple Agave sisalana tissues to increase normalization efficiency and maximize the number of independent genes. This procedure preserved the proportion of full-length cDNAs in the subtracted/normalized libraries and greatly enhanced the discovery of new genes. Sequencing of 3,875 cDNA clones from the libraries revealed 3,320 unigenes with an average insert length of about 1.2 kb, indicating a library non-redundancy of about 85.7%. Unigene functions were predicted by comparing their sequences to functional domain databases and were extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes against other plant genomes identified four putative MADS-box genes and a knotted-like homeobox (knox) gene among a total of 1,162 full-length transcripts. Furthermore, real-time PCR showed that the transcript profiles of these genes depended mainly on tight expression regulation of a number of genes during leaf and flower development. Analysis of sequence data from the individual libraries indicated that the pooled-tissue approach was highly effective for discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
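The non-redundancy figure follows directly from the two counts, 3,320 unigenes out of 3,875 sequenced clones. A one-line check, assuming non-redundancy is defined as unigenes divided by clones sequenced (a definition inferred from the numbers rather than stated in the abstract; the function name is hypothetical):

```python
def non_redundancy(n_unigenes, n_clones):
    """Percentage of sequenced clones that represent distinct unigenes."""
    return 100.0 * n_unigenes / n_clones

print(f"{non_redundancy(3320, 3875):.1f}%")  # 85.7%
```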
Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng
2014-01-01
Besides complete genomes, various partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making their results difficult to compare. No partial region for HEV genotyping has been commonly agreed on. In this study, we used a statistical method to evaluate the phylogenetic performance of partial genomic sequences across the whole genome. We compared evolutionary distances computed from genomic regions with those from the full-length genomes of 101 HEV isolates, in order to identify short genomic regions that reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3' terminus of the papain-like cysteine protease domain, showed relatively high phylogenetic correlation with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates into reasonable genotypes. This analysis may help in developing a consensus, partial-sequence-based classification of HEV species.
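The abstract does not name the statistic used to compare regional and full-length distances. The minimal sketch below assumes uncorrected p-distances and a Pearson correlation; both are assumptions, not the authors' stated method. It scores a candidate region by how closely its pairwise distances track the full-length ones. The toy alignment is hypothetical.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def pairwise_distances(alignment: dict, start: int = 0, end=None) -> list:
    """p-distances for every sequence pair, restricted to columns start:end."""
    names = sorted(alignment)
    return [
        p_distance(alignment[a][start:end], alignment[b][start:end])
        for a, b in combinations(names, 2)
    ]


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def region_correlation(alignment: dict, start: int, end: int) -> float:
    """Correlation between region-based and full-length pairwise distances."""
    return pearson(pairwise_distances(alignment), pairwise_distances(alignment, start, end))


aln = {  # toy alignment of four hypothetical isolates
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "AAAAAAATTT",
    "D": "TTAAAATTTT",
}
print(round(region_correlation(aln, 5, 10), 2))  # 0.83
```

A region scoring close to 1 orders isolate pairs almost exactly as the full genome does. Real analyses would normally use model-corrected distances rather than p-distances.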
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon
2011-01-01
Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Results We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends.
Analysis of these full-length transcripts indicated that melon 5' and 3' UTRs were similar in size to those of tomato but longer than those of many other dicot plants. Codon usage of melon full-length transcripts was largely similar to that of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies, and gaining insights into gene expression patterns. PMID:21599934
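SSRs in an EST collection are usually found by scanning for short motifs repeated in tandem. The sketch below is a simple regex-based scanner. The minimum repeat counts in MIN_REPEATS are illustrative assumptions, not the thresholds used in the melon study, and imperfect or compound repeats are not handled.

```python
import re

# Assumed minimum number of tandem copies for motif lengths 1 to 6.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Return (0-based start, motif, copy number) for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already reported as mono repeats
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


est = "GG" + "AT" * 7 + "CC" + "GCT" * 5 + "TT"
print(find_ssrs(est))  # [(2, 'AT', 7), (18, 'GCT', 5)]
```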
Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N
2016-11-01
Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans, accounting for >10 per cent of encephalitis cases worldwide. Because full genome sequences are available for only a small number of isolates, the host and viral factors responsible for variable clinical outcomes are poorly understood. In this study, the genetic relationships and the extent and source of recombination of a newly isolated HSV-1 were studied using its full-length genome sequence, in comparison with isolates sampled from patients with varied clinical outcomes. The full genome sequence of HSV-1 isolated from the cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the new sequence together with 33 additional full-length genomes defined its genetic relationship with strains distributed worldwide. Bootscan and similarity plot analyses defined recombination crossovers and similarities between the new Indian HSV-1 and six Asian strains, out of a total of 34 strains isolated worldwide. Mapping 376,332 NGS reads derived from HSV-1 DNA yielded a full-length genome of 151,024 bp for the new Indian HSV-1 isolate. Phylogenetic analysis classified the worldwide strains into three major evolutionary lineages corresponding to their geographic distribution. Lineage 1 contained strains isolated from America and Europe; lineage 2 contained all strains from Asian countries along with the North American KOS and RE strains; and the South African isolates were distributed into two groups within lineage 3. Recombination analysis confirmed recombination events in the Indian HSV-1 genome resulting from mixing of different strains that evolved in Asian countries.
Our results showed that the full-length genome sequence of the Indian HSV-1 isolate was closely related to the American KOS and Chinese CR38 strains, which belong to the Asian genetic lineage. Recombination analysis of the Indian isolate revealed multiple recombination crossover points throughout the genome. This full-length genome sequence should be helpful for studying HSV evolution, the genetic basis of differential pathogenesis, host-virus interactions, and viral factors contributing to differential clinical outcomes in human infections.
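A similarity plot of the kind mentioned above computes the identity between a query genome and each reference in a window that slides along the alignment. The minimal sketch below assumes a gapped multiple alignment given as equal-length strings. The window and step sizes and the reference names are placeholders. It does not implement bootscanning, which additionally builds bootstrapped trees for each window.

```python
def window_similarity(query: str, reference: str, window: int = 200, step: int = 20) -> list:
    """(window midpoint, fraction identical) along two aligned sequences.

    Columns with a gap in either sequence are skipped; a window with no
    comparable columns is reported as 0.0.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        cols = [
            (q, r)
            for q, r in zip(query[start:start + window], reference[start:start + window])
            if q != "-" and r != "-"
        ]
        sim = sum(q == r for q, r in cols) / len(cols) if cols else 0.0
        profile.append((start + window // 2, sim))
    return profile


def closest_reference_per_window(query: str, references: dict, window: int = 200, step: int = 20) -> list:
    """For each window midpoint, name the reference most similar to the query."""
    names = sorted(references)
    profiles = {n: window_similarity(query, references[n], window, step) for n in names}
    result = []
    for i, (midpoint, _) in enumerate(profiles[names[0]]):
        best = max(names, key=lambda n: profiles[n][i][1])
        result.append((midpoint, best))
    return result


query = "A" * 10 + "C" * 10  # toy mosaic: first half like one parent, second half like the other
refs = {"lineage_x": "A" * 20, "lineage_y": "C" * 20}
print(closest_reference_per_window(query, refs, window=10, step=10))
```

A change in the closest reference between adjacent windows is only a hint of a crossover; formal recombination analysis needs statistical support.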
Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.
2016-01-01
Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans, accounting for >10 per cent of encephalitis cases worldwide. Because full genome sequences are available for only a small number of isolates, the host and viral factors responsible for variable clinical outcomes are poorly understood. In this study, the genetic relationships and the extent and source of recombination of a newly isolated HSV-1 were studied using its full-length genome sequence, in comparison with isolates sampled from patients with varied clinical outcomes. Methods: The full genome sequence of HSV-1 isolated from the cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the new sequence together with 33 additional full-length genomes defined its genetic relationship with strains distributed worldwide. Bootscan and similarity plot analyses defined recombination crossovers and similarities between the new Indian HSV-1 and six Asian strains, out of a total of 34 strains isolated worldwide. Results: Mapping 376,332 NGS reads derived from HSV-1 DNA yielded a full-length genome of 151,024 bp for the new Indian HSV-1 isolate. Phylogenetic analysis classified the worldwide strains into three major evolutionary lineages corresponding to their geographic distribution. Lineage 1 contained strains isolated from America and Europe; lineage 2 contained all strains from Asian countries along with the North American KOS and RE strains; and the South African isolates were distributed into two groups within lineage 3. Recombination analysis confirmed recombination events in the Indian HSV-1 genome resulting from mixing of different strains that evolved in Asian countries.
Interpretation & conclusions: Our results showed that the full-length genome sequence of the Indian HSV-1 isolate was closely related to the American KOS and Chinese CR38 strains, which belong to the Asian genetic lineage. Recombination analysis of the Indian isolate revealed multiple recombination crossover points throughout the genome. This full-length genome sequence should be helpful for studying HSV evolution, the genetic basis of differential pathogenesis, host-virus interactions, and viral factors contributing to differential clinical outcomes in human infections. PMID:28361829
Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou
2016-11-01
Quantifying a natural microbial community by rapidly and conveniently measuring multiple functionally significant species remains an unsolved problem. The most widely used high-throughput next-generation sequencing methods mainly provide genus-level taxonomic identification and quantification. Detecting multiple species in a complex microbial community still depends heavily on approaches based on near full-length ribosomal RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-BLAST to design species-specific primers based on whole microbial genome sequences. The primers were intended to be species-specific within the relevant microbial communities, i.e., against a defined genomic background. The primers were tested with samples of Daqu (fermentation starters) and pit mud collected from a traditional Chinese liquor production plant. Sixteen primer pairs were found to be suitable for identifying individual species, and seven of these were chosen to measure species abundance by quantitative PCR. The combination of near full-length ribosomal RNA gene library sequencing and Primer-BLAST may represent a broadly useful protocol for quantifying multiple species in complex microbial population samples with species-specific primers.
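Primer-BLAST checks primer specificity with BLAST searches that tolerate mismatches. The sketch below is only a much simpler, hypothetical exact-match check that illustrates specificity against a defined genomic background. A primer pair counts as species-specific if it yields a product from the target genome and none from the others. All names, sequences and the max_len cutoff are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_all(sub: str, seq: str) -> list:
    """All (possibly overlapping) start positions of sub in seq."""
    hits, pos = [], seq.find(sub)
    while pos != -1:
        hits.append(pos)
        pos = seq.find(sub, pos + 1)
    return hits


def predicted_products(fwd: str, rev: str, genome: str, max_len: int = 3000) -> list:
    """Lengths of products expected from exact primer matches on either strand."""
    fwd, rev_site = fwd.upper(), reverse_complement(rev)
    products = []
    for strand in (genome.upper(), reverse_complement(genome)):
        for f in find_all(fwd, strand):
            for r in find_all(rev_site, strand):
                length = r + len(rev_site) - f
                # The reverse-primer site must lie downstream of the forward primer.
                if r >= f + len(fwd) and length <= max_len:
                    products.append(length)
    return sorted(products)


def is_species_specific(fwd: str, rev: str, genomes: dict, target: str, max_len: int = 3000) -> bool:
    """True if the pair amplifies the target genome and none of the others."""
    hits = {name: predicted_products(fwd, rev, seq, max_len) for name, seq in genomes.items()}
    return bool(hits[target]) and not any(v for name, v in hits.items() if name != target)


target = "GGGG" + "ACGTACGTAC" + "T" * 10 + "GATCGATCGA" + "CCCC"
other = "GGGG" + "ACGTACGTAC" + "T" * 10 + "C" * 14
fwd, rev = "ACGTACGTAC", "TCGATCGATC"
print(predicted_products(fwd, rev, target))  # [30]
print(is_species_specific(fwd, rev, {"target": target, "other": other}, "target"))  # True
```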
Lei, Yong-Liang; Wang, Xiao-Guang; Liu, Fu-Ming; Chen, Xiu-Ying; Ye, Bi-Feng; Mei, Jian-Hua; Lan, Jin-Quan; Tang, Qing
2009-08-01
Based on sequencing the full-length genomes of rabies viruses from two Chinese ferret badgers, we analyzed the genetic variation of rabies viruses at the molecular level to obtain information on their prevalence and variation in Zhejiang, and to enrich the genome database of rabies virus street strains isolated from Chinese wildlife. Overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled to analyze nucleotide and deduced protein similarities. Phylogenetic analyses of the N genes were performed with strains from Chinese ferret badgers, sika deer, voles and dogs, and with vaccine strains. The two genomes were completely sequenced and shared the same genetic structure, 11,923 nt in length, comprising a 58-nt leader, 1353-nt NP, 894-nt PP, 609-nt MP, 1575-nt GP and 6386-nt LP, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (Psi), and a 70-nt trailer. BLAST and multiple sequence alignment showed that the two genomes were consistent with the properties of the genus Lyssavirus (family Rhabdoviridae). The nucleotide and amino acid sequences were most similar among Chinese strains, especially among strains from the same host species. In both genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, suggesting that most nucleotide mutations were synonymous. Compared with the reference rabies viruses, the lengths of the five protein-coding regions were unchanged and no recombination was found, only a few point mutations, indicating that the five proteins are stable. The variation sites and types of the two ferret badger genomes were similar to those of the reference vaccine or street strains. Multiple sequence alignment and phylogenetic analyses placed both strains in genotype 1, with distinct geographic characteristics of China.
Taken together, the evidence suggests that these two ferret badger rabies viruses are likely street viruses already circulating in wildlife.
Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José
2016-02-01
During 2007-2014, 84 of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV) by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of these (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the fusion (F) protein of fifteen isolates was sequenced and compared with those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis of the full-length F gene sequences grouped all Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One additional putative N-glycosylation site was identified in the F gene of Argentinean CDV strains relative to the vaccine strain. This report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and it contributes to knowledge of CDV molecular epidemiology, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
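Putative N-glycosylation sites are conventionally predicted from the sequon N-X-S/T, where X is any residue except proline. The minimal sketch below compares sequons between a field strain and a vaccine strain. It assumes the two F protein sequences are already aligned without indels, so positions are directly comparable. A predicted sequon does not prove that the site is glycosylated, and the toy sequences are hypothetical.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein: str) -> list:
    """1-based positions of the Asn in each candidate N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def extra_sequons(query: str, reference: str) -> list:
    """Sequon positions present in query but absent from reference (shared coordinates assumed)."""
    return sorted(set(sequon_positions(query)) - set(sequon_positions(reference)))


field_strain = "MNGSANPTQNVT"    # toy sequence
vaccine_strain = "MNGSANPTQDVT"  # toy sequence
print(sequon_positions(field_strain))              # [2, 10]
print(extra_sequons(field_strain, vaccine_strain))  # [10]
```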
Semler, Matthew R; Wiseman, Roger W; Karl, Julie A; Graham, Michael E; Gieger, Samantha M; O'Connor, David H
2018-06-01
Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their susceptibility to infection with minimally modified HIV makes them a uniquely valuable animal model for mimicking human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC), and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases, are understudied compared with rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes, each containing a group of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allowed us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers, as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to match haplotypes between donors and recipients more precisely, to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. In parallel with the explosion in the amount of accessible sequence data, the focus of classification methods has shifted. Previously, alignment-based methods were the most applicable tools. Now, methods based on counting K-mers in sliding windows are the most promising classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ in both data usage and modelling strategy. Our study was based on the well-known and widely used naïve Bayes classifier from the RDP project; four other methods were implemented and tested on two different data sets, using both full-length sequences and fragments of typical read length. The differences in classification error between the methods were small but consistent across both data sets tested. The preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences, the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets, and for both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). Because all methods approach an error plateau, improved training data are needed to improve classification further. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal and more robust training data set is crucial.
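For reference, here is a minimal sketch of a k-mer naïve Bayes classifier in the style of the RDP classifier. It uses word presence per sequence, word-specific priors P(w) = (n(w) + 0.5) / (N + 1), and genus-conditional probabilities (m(w) + P(w)) / (M + 1). These formulas follow our understanding of the published RDP method, but the bootstrap confidence step is omitted. The toy data and k = 3 are for illustration only; 8-mers are typical for real 16S data.

```python
import math
from collections import defaultdict


def kmer_set(seq: str, k: int) -> set:
    """Distinct overlapping k-mers ("words") in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class KmerNaiveBayes:
    """Toy RDP-style classifier: word presence, word-specific priors, log-likelihood sum."""

    def __init__(self, k: int = 8):
        self.k = k

    def fit(self, sequences: list, genera: list):
        n_with_word = defaultdict(int)                       # n(w) over all training sequences
        m_with_word = defaultdict(lambda: defaultdict(int))  # m(w) within each genus
        genus_size = defaultdict(int)                        # M, training sequences per genus
        for seq, genus in zip(sequences, genera):
            genus_size[genus] += 1
            for w in kmer_set(seq, self.k):
                n_with_word[w] += 1
                m_with_word[genus][w] += 1
        n_total = len(sequences)
        self.prior = {w: (c + 0.5) / (n_total + 1) for w, c in n_with_word.items()}
        self.unseen_prior = 0.5 / (n_total + 1)  # P(w) for words absent from training
        self.m_with_word = m_with_word
        self.genus_size = dict(genus_size)
        return self

    def log_likelihood(self, words: set, genus: str) -> float:
        counts, size = self.m_with_word[genus], self.genus_size[genus]
        return sum(
            math.log((counts.get(w, 0) + self.prior.get(w, self.unseen_prior)) / (size + 1))
            for w in words
        )

    def predict(self, seq: str) -> str:
        words = kmer_set(seq, self.k)
        return max(self.genus_size, key=lambda g: self.log_likelihood(words, g))


train = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
labels = ["GenusA", "GenusA", "GenusC", "GenusC"]
clf = KmerNaiveBayes(k=3).fit(train, labels)
print(clf.predict("AAAAAAAA"))  # GenusA
```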
USDA-ARS's Scientific Manuscript database
Sequence comparison between the full-length 2412 bp DNA gyrase subunit B (gyrB) gene of a novobiocin-resistant Aeromonas hydrophila AH11NOVO vaccine strain and that of its virulent parent strain AH11P revealed 10 missense mutations. Similarly, sequence comparison between the full-length 4092 bp RNA ...
Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi
2006-01-01
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternatively spliced genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing in which two distinct genes seemed to be bridged or nested, or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing relying on full-length cDNAs should lay firm groundwork for exploring in detail the diversification of protein function mediated by the fast-expanding universe of alternative splicing variants. PMID:16914452
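For two CDSs annotated on the same transcript or locus, the "multiple CDS" pattern can be detected by checking whether they overlap while their start coordinates differ by a non-multiple of three. The minimal sketch below assumes unspliced, 0-based half-open coordinates on a common strand. The CDS type and the example coordinates are hypothetical.

```python
from typing import NamedTuple


class CDS(NamedTuple):
    start: int  # 0-based, inclusive, on a shared transcript/locus coordinate system
    end: int    # 0-based, exclusive


def overlap(a: CDS, b: CDS) -> int:
    """Number of shared positions between two intervals."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def overlaps_in_different_frame(a: CDS, b: CDS) -> bool:
    """True if two CDSs overlap but do not share a reading frame."""
    return overlap(a, b) > 0 and (a.start - b.start) % 3 != 0


print(overlaps_in_different_frame(CDS(0, 300), CDS(100, 400)))  # True: offset 100 is not a multiple of 3
print(overlaps_in_different_frame(CDS(0, 300), CDS(99, 399)))   # False: same frame
```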
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...
2016-06-24
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.
A survey of the sorghum transcriptome using single-molecule long reads
Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.
2016-01-01
Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
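Calling alternative polyadenylation from long reads usually means clustering the 3' ends of full-length reads for each gene and asking whether more than one well-supported cluster exists. The sketch below illustrates that idea only; it is not the TAPIS algorithm, and max_gap and min_reads are assumed values.

```python
from collections import Counter


def cluster_polya_sites(end_positions: list, max_gap: int = 24) -> list:
    """Single-linkage clustering of read 3'-end coordinates for one gene.

    Returns (representative_position, read_count) per cluster; the
    representative is the most frequent coordinate (the smallest one on ties).
    """
    clusters = []
    for pos in sorted(end_positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [(Counter(c).most_common(1)[0][0], len(c)) for c in clusters]


def shows_apa(end_positions: list, max_gap: int = 24, min_reads: int = 2) -> bool:
    """True if at least two clusters are each supported by min_reads reads."""
    supported = [c for c in cluster_polya_sites(end_positions, max_gap) if c[1] >= min_reads]
    return len(supported) > 1


read_ends = [1000, 1002, 1002, 1005, 1500, 1501, 1501]  # toy 3'-end coordinates
print(cluster_polya_sites(read_ends))  # [(1002, 4), (1501, 3)]
print(shows_apa(read_ends))            # True
```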
Stevens, Mark; Viganó, Felicita
2007-04-01
The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the two other complete BMYV sequences (BMYV-2ITB, France, and BMYV-IPP, Germany, respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts after electroporation or PEG inoculation, as shown by subsequent serological and molecular analyses of the protoplasts. The BMYV sequence was modified by inserting DNA encoding the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. RNA transcripts of this construct were biologically active in electroporated A. thaliana protoplasts, and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2016-02-16
The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
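The claims above are framed as percent-identity thresholds relative to SEQ ID NO: 2. How identity is computed (alignment program, scoring, denominator) is defined in the patent text, which is not reproduced here. The sketch below therefore assumes an already computed pairwise alignment and divides identical positions by the ungapped reference length. The sequences are toy examples, not SEQ ID NO: 2.

```python
def identity_over_reference(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity = identical aligned residues / ungapped reference length (assumed definition)."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == r and r != "-" for q, r in zip(aligned_query, aligned_reference))
    ref_len = sum(r != "-" for r in aligned_reference)
    return 100.0 * matches / ref_len


def meets_threshold(aligned_query: str, aligned_reference: str, threshold: float) -> bool:
    """True if identity is at least threshold percent, e.g. 76.0 for a '>= 76%' claim."""
    return identity_over_reference(aligned_query, aligned_reference) >= threshold


reference = "MKTAYIAKQR"  # toy reference, not SEQ ID NO: 2
variant = "MKTAYLAKQR"    # one substitution
print(identity_over_reference(variant, reference))  # 90.0
print(meets_threshold(variant, reference, 76.0))    # True
print(meets_threshold(variant, reference, 96.0))    # False
```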
Polypeptide having beta-glucosidase activity and uses thereof
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof
Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius
2015-11-04
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius
2015-09-01
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-09-15
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof
Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof
Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius
2015-08-18
The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Near Full-Length Identification of a Novel HIV-1 CRF01_AE/B/C Recombinant in Northern Myanmar.
Zhou, Yan-Heng; Chen, Xin; Liang, Yue-Bo; Pang, Wei; Qin, Wei-Hong; Zhang, Chiyu; Zheng, Yong-Tang
2015-08-01
The Myanmar-China border appears to be a "hot spot" for HIV-1 recombination. Most previous analyses of HIV-1 recombination were based on partial genomic sequences, which cannot fully reflect the genetic diversity of HIV-1 in this area. Here, we present a near full-length characterization of a novel HIV-1 CRF01_AE/B/C recombinant isolated from a long-distance truck driver in Northern Myanmar. This is the first description of a near full-length genomic sequence from Myanmar since 2003, and it might be one of the most complicated HIV-1 chimeras ever detected in Myanmar, containing four CRF01_AE segments, six B segments, and five C segments separated by 14 breakpoints throughout its genome. The discovery and characterization of this new CRF01_AE/B/C recombinant indicate that intersubtype recombination is ongoing in Myanmar, continuously generating new forms of HIV-1. More work based on near full-length sequence analyses is urgently needed to better understand the genetic diversity of HIV-1 in these regions.
PCR Amplification Strategies towards full-length HIV-1 Genome sequencing.
Liu, Chao Chun; Ji, Hezhao
2018-06-26
The advent of next-generation sequencing has enabled greater resolution of viral diversity and improved the feasibility of full viral genome sequencing, allowing routine HIV-1 full genome sequencing in both research and diagnostic settings. Regardless of the sequencing platform selected, successful PCR amplification of the HIV-1 genome is essential for sequencing template preparation. As such, full HIV-1 genome amplification is a crucial step in determining whether downstream sequencing is successful and reliable. Here we review existing PCR protocols for HIV-1 full genome sequencing. In addition to discussing basic considerations in PCR design, we review the advantages and pitfalls of published protocols. Copyright © Bentham Science Publishers.
Minimap2: pairwise alignment for nucleotide sequences.
Li, Heng
2018-05-10
Recent advances in sequencing technologies promise ultra-long reads averaging ∼100 kilobases (kb), full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale, or do so inefficiently, which calls for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 performs split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs), and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
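A concave gap cost can be approximated by taking the minimum of two affine functions, so that short gaps are cheap to open but costly to extend while long INDELs are costly to open but cheap to extend. The sketch below illustrates this idea under stated assumptions. The function name is hypothetical, and the default values (q=4, e=2, q2=24, e2=1) are modelled on minimap2's documented `-O 4,24 -E 2,1` defaults. These values may differ between presets or versions, so this is an illustration of the cost shape, not minimap2's code.

```python
def two_piece_gap_cost(length: int, q: int = 4, e: int = 2,
                       q2: int = 24, e2: int = 1) -> int:
    """Cost of a gap of `length` bases under a two-piece affine model.

    Hypothetical helper; the defaults are assumed values. The first piece
    (small open, large extension) wins for short gaps. The second piece
    (large open, small extension) wins for long gaps. The minimum of the two
    lines gives a concave-shaped cost curve.
    """
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


# With these defaults the two pieces cross at a gap length of 20, where
# both give 44. A 100-bp deletion costs 124 instead of 204 under the
# first piece alone.
```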
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq
Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru
2015-01-01
Human immunodeficiency virus type 1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in response to multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly, followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence to below 1% at each sequence position and discarded only <1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
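The final step of minority-variant detection is a per-position frequency threshold. The following minimal sketch illustrates that step, assuming reads have already been mapped to the consensus and padded to its length, with `-` marking uncovered positions. The function name and input layout are hypothetical. The sketch does not reproduce the authors' de novo assembly, iterative mapping, or error-correction pipeline.

```python
from collections import Counter


def minority_variants(reads, consensus, min_freq=0.01):
    """Report non-consensus bases at or above `min_freq` at each position.

    Hypothetical helper. `reads` are assumed to be pre-aligned to
    `consensus` and the same length, with '-' for uncovered positions.
    Returns (0-based position, base, frequency) tuples.
    """
    calls = []
    for pos, ref_base in enumerate(consensus):
        # count bases observed at this position, ignoring uncovered reads
        counts = Counter(r[pos] for r in reads if r[pos] != "-")
        depth = sum(counts.values())
        if depth == 0:
            continue
        for base, n in sorted(counts.items()):
            freq = n / depth
            if base != ref_base and freq >= min_freq:
                calls.append((pos, base, freq))
    return calls


# Example: 98 reads match the consensus and 2 carry G->T at position 2,
# so a 2% minority variant is reported at that position.
```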
Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G
2013-03-01
We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and a complete genome for one novel variant not yet assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. The genomes were 9579-9660 nt long, and each contained a single ORF encoding 3020-3025 aa. The isolates were obtained from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. With the data from this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned variant. Our findings should add insights to HCV evolutionary studies and clinical applications.
Kikuchi, Shoshi
2009-02-01
Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) the parameter settings for gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation of rice gene annotation using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), across evolution from bacteria to higher animals and plants reveals how gene numbers have increased through adaptation to their environments. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of gene expression data under many kinds of physiological conditions are being accumulated using rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that share promoter cis-elements, have similar expression profiles, or encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.
Laird Smith, Melissa; Murrell, Ben; Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E; Kosakovsky Pond, Sergei L; Smith, Davey M
2016-07-01
The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences' Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data.
Hamid, Nur Athirah Abd; Ismail, Ismanizan
2013-11-01
Polygonum minus, locally known as Kesum, is an aromatic herb with a high secondary metabolite content. Alcohol dehydrogenase (ADH) is an important enzyme that catalyzes the reversible interconversion of alcohols and aldehydes, using NAD(P)(H) as a cofactor. The main focus of this research was to identify ADH genes. Total RNA was extracted from leaves of P. minus treated with 150 μM jasmonic acid. The full-length cDNA sequence of ADH was isolated via rapid amplification of cDNA ends (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence, and PCR was performed on genomic DNA to determine the exon-intron organization. Two ADH sequences, designated PmADH1 and PmADH2, were successfully isolated. Both sequences have an ORF of 801 bp encoding 266 aa residues. Nucleotide sequence comparison indicated that PmADH1 and PmADH2 are highly similar in the ORF region but divergent in the 3' untranslated region (UTR). The deduced amino acid sequences differ at residue 107: PmADH1 contains Gly (G), whereas PmADH2 contains Cys (C). The exon-intron organization of the two sequences is also the same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain the "classical" short-chain alcohol dehydrogenase/reductase ((c) SDR) conserved domain. The results suggest that both sequences are members of the short-chain alcohol dehydrogenase family.
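The figures above are consistent if the 801-bp ORF includes the stop codon. An 801-bp ORF contains 801 / 3 = 267 codons, and excluding the stop leaves 266 residues. A single-residue difference such as the one at position 107 can be located by a position-wise comparison of the deduced proteins. The following sketch uses hypothetical helper names and toy sequences, not the PmADH sequences.

```python
def residues_encoded(orf_length_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length.

    Assumes the ORF length counts whole codons; the stop codon encodes no
    residue.
    """
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    return codons - 1 if includes_stop else codons


def differing_residues(seq1: str, seq2: str):
    """1-based positions (and residues) where two equal-length proteins differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]


# residues_encoded(801) gives 266. differing_residues("MAGK", "MACK") reports
# a Gly/Cys difference at position 3.
```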
Carbohydrate degrading polypeptide and uses thereof
Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter
2015-10-20
The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
A retrotransposable element from the mosquito Anopheles gambiae.
Besansky, N J
1990-01-01
A family of middle repetitive elements from the African malaria vector Anopheles gambiae is described. Approximately 100 copies of the element, designated T1Ag, are dispersed in the genome. Full-length elements are 4.6 kilobase pairs in length, but truncation of the 5' end is common. Nucleotide sequences of one full-length, two 5'-truncated, and two 5' ends of T1Ag elements were determined and aligned to define a consensus sequence. Sequence analysis revealed two long, overlapping open reading frames followed by a polyadenylation signal, AATAAA, and a tail consisting of tandem repetitions of the motif TGAAA. No direct or inverted long terminal repeats (LTRs) were detected. The first open reading frame, 442 amino acids in length, includes a domain resembling that of nucleic acid-binding proteins. The second open reading frame, 975 amino acids long, resembles the reverse transcriptases of a category of retrotransposable elements without LTRs, variously termed class II retrotransposons, class III elements or non-LTR retrotransposons. Similarity at the sequence and structural levels places T1Ag in this category. PMID:1689457
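The two sequence features named above, the AATAAA polyadenylation signal and a tail of tandem TGAAA repeats, can be located with simple pattern matching. The sketch below uses hypothetical helper names, and its test sequence is invented rather than taken from T1Ag. A lookahead regex catches overlapping signal occurrences. The tandem-run search relies on TGAAA having no prefix that is also a suffix, so non-overlapping greedy matches cannot miss a longer run.

```python
import re


def find_polya_signals(seq: str, signal: str = "AATAAA"):
    """0-based start positions of every (possibly overlapping) signal match."""
    return [m.start() for m in re.finditer(f"(?={re.escape(signal)})", seq.upper())]


def longest_tandem_run(seq: str, motif: str = "TGAAA") -> int:
    """Largest number of back-to-back copies of `motif` anywhere in `seq`."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(r) // len(motif) for r in runs), default=0)


# For the invented sequence "CCAATAAAGGTGAAATGAAATGAAACC", the signal
# starts at position 2 and the tail holds three tandem TGAAA copies.
```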
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples
Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E.; Kosakovsky Pond, Sergei L.
2016-01-01
The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences’ Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data. PMID:29492273
Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao
2005-01-01
We have developed an in silico method for selecting human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combining the oligo-capping method with ATGpr, software for predicting the translation start point and coding potential. Then, using 5'-end single-pass sequences, cDNAs having a signal sequence were selected by PSORT ('signal sequence trap'). We also applied a 'secretion or membrane protein-related keyword trap', based on the results of a BLAST search against the SWISS-PROT database, to the cDNAs that could not be selected by PSORT. Using the above procedures, 789 cDNAs were initially selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165 (80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' retained the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including those encoding transporters, receptors, and ligands involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.
Ahmed, Md Atique; Fauzi, Muh; Han, Eun-Taek
2018-03-14
Human infections due to the monkey malaria parasite Plasmodium knowlesi are on the rise in most Southeast Asian countries, particularly Malaysia. The C-terminal 19 kDa domain of PvMSP1P is a potential vaccine candidate; however, no study has been conducted on the orthologous gene of P. knowlesi. This study investigates the level of polymorphism, haplotypes, and natural selection of full-length pkmsp1p in clinical samples from Malaysia. A total of 36 full-length pkmsp1p sequences, along with the reference H-strain, and 40 C-terminal pkmsp1p sequences from clinical isolates from Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotypes, and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using a haplotype network tree in NETWORK software v5.0. The population genetic differentiation index (FST) and population structure of the parasite were determined using Arlequin v3.5 and STRUCTURE v2.3.4 software. Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). Nucleotide diversity across the full-length gene was low compared with its ortholog pvmsp1p. Nucleotide diversity was higher in the N-terminal domains (pkmsp1p-83 and 30) than in the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low diversity, with 16 polymorphisms leading to 18 haplotypes. In total, there were 10 synonymous and 6 non-synonymous substitutions, and 12 cysteine residues within the two EGF domains were intact. Evidence of strong purifying selection was observed within the full-length sequences as well as in all the domains. Shared haplotypes of the 40 pkmsp1p-19 sequences were identified within Malaysian Borneo haplotypes.
This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection were detected in all the domains of pkmsp1p of P. knowlesi, indicating functional constraints. Shared haplotypes were identified within pkmsp1p-19, warranting further evaluation using a larger number of clinical samples from Malaysia.
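Nucleotide diversity (π) is the average number of pairwise differences per site among aligned sequences. The haplotype count is the number of distinct sequences. The following minimal sketch computes both, assuming gap-free aligned sequences of equal length. Programs such as DnaSP also handle gaps, missing data, and many further statistics, so this is an illustration of the definitions, not a replacement. The helper names and toy sequences are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) for aligned, gap-free sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # total mismatches summed over all unordered pairs of sequences
    diffs = sum(sum(a != b for a, b in zip(s1, s2))
                for s1, s2 in combinations(seqs, 2))
    pairs = n * (n - 1) // 2
    return diffs / (pairs * length)


def haplotype_count(seqs):
    """Number of distinct haplotypes (identical sequences collapse into one)."""
    return len(set(seqs))


# For "AAAA", "AAAT", "AATT": the pairs differ at 1, 2 and 1 sites, so
# pi = 4 / (3 pairs * 4 sites) = 1/3.
```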
Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke
2010-03-30
The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking for redundant, coding, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants, with the exception of rice. Functional classification of the proteins predicted from the coding sequences demonstrated that the nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies.
The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)
Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn
2009-01-01
Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to helping distinguish between duplicate genome regions and determine correct gene structures, FLIcs are an important resource for functional genomic studies and for investigating regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental- and tissue-specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences for which the full length of the cDNA insert has been determined has been small. Results High-quality full-length insert sequences from 560 pre-smolt white muscle tissue-specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms, and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, to characterize salmon Kozak motifs and polyadenylation signal variation, and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, some of which were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue- and developmental stage-specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS).
This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
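Finding 7-mers conserved across a set of 3'UTRs amounts to counting, for each k-mer, how many UTRs contain it at least once. The k-mers above a chosen threshold are then reported. The sketch below uses hypothetical helper names and invented UTR sequences. The `min_fraction` threshold is an assumed parameter, not a value taken from the study.

```python
from collections import Counter


def kmers(seq: str, k: int = 7) -> set:
    """Set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmers(utrs, k=7, min_fraction=1.0):
    """k-mers occurring in at least `min_fraction` of the UTR sequences.

    Each UTR contributes at most one count per k-mer, so the tally is the
    number of UTRs containing that k-mer.
    """
    counts = Counter()
    for utr in utrs:
        counts.update(kmers(utr.upper(), k))
    threshold = min_fraction * len(utrs)
    return {kmer for kmer, n in counts.items() if n >= threshold}


# In three invented UTRs that share the block GCACTTT, that 7-mer is the
# only one present in all of them.
```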
Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A
2009-01-01
Background Interpretation of the ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill the gap between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes, enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, full-length cDNA sequences can improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Redundancy ranged from 8% to 33%, and the proportion of full-length clones from 67% to 85%, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in public databases. To facilitate functional analysis, cDNAs were introduced into a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in library construction, the sequence analysis of clones, and the overexpression of CitrSEP, a citrus homolog of the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis.
Conclusion The new EST collection represents an important step towards the identification of all genes in the citrus genome. Furthermore, the public availability of the cDNA clones generated in this study, and not only their sequences, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the overexpression of the Arabidopsis SEPALLATA genes. Our findings suggest that members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386
Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin
2015-01-01
Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functional CopA using conserved metal-binding motifs inferred by evolutionary trace analysis of CopA sequences from known Cu-resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, of which 70 were found to have high potential to function in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that the topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that copA diversity in the tailings may have developed mainly through vertical descent, with few lateral gene transfer events. The method established here can be used to explore the diversity of copA (and potentially other metal resistance genes) in any metagenome and has the potential to recover full-length gene sequences exhaustively for downstream analyses. PMID:26286020
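The motif-based screening step can be pictured as a regex filter that keeps only protein sequences containing every required motif. In the sketch below, the two motifs (a CxxC metal-binding motif and a CPC motif) are assumed illustrative choices commonly associated with copper-transporting P-type ATPases. They are not the motifs inferred by evolutionary trace analysis in this study. The helper name and protein sequences are hypothetical.

```python
import re

# ASSUMPTION: illustrative motifs only, not the study's inferred motifs.
ASSUMED_MOTIFS = {
    "CxxC": re.compile(r"C..C"),  # metal-binding cysteine pair
    "CPC": re.compile(r"CPC"),    # transmembrane cysteine-proline-cysteine
}


def screen_copa(proteins, motifs=ASSUMED_MOTIFS):
    """Return IDs of protein sequences that contain every motif in `motifs`.

    `proteins` maps a sequence ID to its amino acid sequence (hypothetical
    input format).
    """
    return [pid for pid, seq in proteins.items()
            if all(pattern.search(seq) for pattern in motifs.values())]


# With invented sequences, only "a" carries both motifs: "b" lacks CPC, and
# "c" lacks a CxxC pair.
```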
Hirayama, Junichi; Tazumi, Akihiro; Hayashi, Kyohei; Tasaki, Erina; Kuribayashi, Takashi; Moore, John E; Millar, Beverley C; Matsuda, Motoo
2011-06-01
In the present study, the reliability of full-length gene sequence information for several genes, including 16S rRNA, was examined for discriminating the two representative Campylobacter lari taxa, namely urease-negative (UN) C. lari and urease-positive thermophilic Campylobacter (UPTC). As previously described, 16S rRNA gene sequences are not reliable for the molecular discrimination of UN C. lari from UPTC organisms using either the unweighted pair group method with arithmetic mean (UPGMA) or the neighbor-joining (NJ) method. In addition, three composite full-length gene sequences (ciaB, flaC and vacJ), out of the seven gene loci examined, were reliable for discrimination in dendrograms constructed by the UPGMA method. However, none of the NJ phylogenetic trees constructed from the nine genes' sequence information was reliable for the discrimination. Three composite full-length gene sequences (ciaB, flaC and vacJ) were reliable for the molecular discrimination between UN C. lari and UPTC organisms using the UPGMA method, as well as among four thermophilic Campylobacter species. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
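UPGMA builds a dendrogram by repeatedly merging the two closest clusters. The distance from the merged cluster to every other cluster is set to the size-weighted average of the two old distances. The minimal sketch below works on a precomputed distance matrix and returns only the topology as nested tuples, omitting branch lengths. The helper name and distances are hypothetical, and ties are broken by dictionary order.

```python
def upgma(dist, labels):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    dist   -- symmetric matrix (list of lists) of pairwise distances
    labels -- taxon names, in the same order as the rows of dist
    """
    # cluster id -> (subtree, number of taxa in it)
    clusters = {i: (name, 1) for i, name in enumerate(labels)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j]
         for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # merge the closest pair of clusters
        (i, j), _ = min(d.items(), key=lambda item: item[1])
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        del d[(i, j)]
        # distance to the merged cluster = size-weighted mean of old distances
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree


# With A-B = 2, A/B-C = 6 and everything-D = 10, A and B join first, then C,
# then D: ("D", ("C", ("A", "B"))).
```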
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on the integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows the use of multiple probes in a single hybridization event, thus significantly increasing throughput when screening for rare transcripts. The presented strategy offers an efficient method for converting incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, into FL coding sequences.
Matsuda, M; Tazumi, A; Kagawa, S; Sekizuka, T; Murayama, O; Moore, JE; Millar, BC
2006-01-01
Background At present, six publicly accessible 16S rDNA sequences from Taylorella equigenitalis (T. equigenitalis) are available, and they differ at only a few nucleotide positions. It is therefore important to determine these sequences from additional strains in other countries, if possible, in order to clarify any anomalies regarding 16S rDNA sequence heterogeneity. Here, we clone and sequence the approximately full-length 16S rDNA from additional strains of T. equigenitalis isolated in Japan, Australia and France and compare these sequences with the existing published sequences. Results Clarification of any anomalies regarding 16S rDNA sequence heterogeneity of T. equigenitalis was carried out. Cloning, sequencing and comparison of the approximately full-length 16S rDNA from 17 strains of T. equigenitalis isolated in Japan, Australia and France revealed nucleotide sequence differences at six loci within the 1,469-nucleotide sequence. Moreover, 12 polymorphic sites occurred among 23 sequences of the 16S rDNA, including the six reference sequences. Conclusion High sequence similarity (99.5% or more) was observed throughout, except between nucleotide positions 138 and 501, where substitutions and deletions were noted. PMID:16398935
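Counting polymorphic sites and pairwise similarity both operate on alignment columns. A column is polymorphic if the sequences do not all share one character there, so a deletion (`-`) in one sequence also makes a column polymorphic. Percent identity is the share of columns with identical characters. The minimal sketch below assumes an existing multiple alignment. The helper names and sequences are hypothetical.

```python
def polymorphic_sites(aligned):
    """1-based alignment columns where not all sequences share one character."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in aligned}) > 1]


def percent_identity(s1, s2):
    """Percentage of aligned columns with identical characters.

    A gap opposite a base counts as a mismatch.
    """
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned")
    same = sum(a == b for a, b in zip(s1, s2))
    return 100.0 * same / len(s1)


# In the toy alignment "ACGTA" / "ACGTA" / "ATGT-", columns 2 (substitution)
# and 5 (deletion) are polymorphic, and the first and third sequences are
# 60% identical.
```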
Keniya, Mikhail V; Holmes, Ann R; Niimi, Masakazu; Lamping, Erwin; Gillet, Jean-Pierre; Gottesman, Michael M; Cannon, Richard D
2014-10-06
ABCB5, an ATP-binding cassette (ABC) transporter, is highly expressed in melanoma cells and may contribute to the extreme resistance of melanomas to chemotherapy by efflux of anti-cancer drugs. Our goal was to determine whether we could functionally express human ABCB5 in the model yeast Saccharomyces cerevisiae, in order to demonstrate an efflux function for ABCB5 in the absence of background pump activity from other human transporters. Heterologous expression would also facilitate drug discovery for this important target. DNAs encoding ABCB5 sequences were cloned into the chromosomal PDR5 locus of an S. cerevisiae strain in which seven endogenous ABC transporters have been deleted. Protein expression in the yeast cells was monitored by immunodetection using both a specific anti-ABCB5 antibody and a cross-reactive anti-ABCB1 antibody. ABCB5 function in recombinant yeast cells was measured by determining whether the cells possessed increased resistance to known pump substrates, compared with the host yeast strain, in assays of yeast growth. Three ABCB5 constructs were made in yeast. One was derived from the ABCB5-β mRNA, which is highly expressed in human tissues but is a truncation of a canonical full-size ABC transporter. Two constructs contained full-length ABCB5 sequences: either a native sequence from cDNA or a synthetic sequence codon-harmonized for S. cerevisiae. Expression of all three constructs in yeast was confirmed by immunodetection. Expression of the codon-harmonized full-length ABCB5 DNA conferred increased resistance, relative to the host yeast strain, to the putative substrates rhodamine 123, daunorubicin, tetramethylrhodamine, FK506, or clorgyline. We conclude that full-length ABCB5 can be functionally expressed in S. cerevisiae and confers drug resistance.
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.
Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio
2017-10-06
Single cell RNA sequencing (scRNA-seq) has great potential for measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq has allowed the characterisation of transcript sequence diversity in functionally relevant T cell subsets, and the identification of the full-length T cell receptor (TCRαβ), which defines specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification and TCRαβ reconstruction, using 1,305 single cells from 8 publicly available scRNA-seq datasets, together with simulation-based analyses. Short read lengths (<50 bp) yielded a larger number of unique genes, but these profiles showed higher technical variability than profiles from longer reads. TCRαβ reconstruction succeeded for 6 datasets (81% to 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
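Simulation-based analyses of depth and read length are commonly done by subsampling reads and trimming them to a shorter length. The sketch below shows that idea for single-end reads only; the function name and the `(read_id, sequence)` input shape are hypothetical, and paired-end data would additionally need mates subsampled together. It is not the authors' simulation code.

```python
import random

def simulate_depth_and_length(reads, depth, read_length, seed=0):
    """Randomly subsample `depth` reads without replacement, then trim each to `read_length`.

    reads: list of (read_id, sequence) tuples, e.g. parsed from a single-end FASTQ file.
    Reads already shorter than `read_length` are kept unchanged by the slice.
    """
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)          # seeded so a simulation can be reproduced
    subset = rng.sample(reads, depth)  # lower depth
    return [(rid, seq[:read_length]) for rid, seq in subset]  # shorter reads

reads = [(f"r{i}", "ACGT" * 25) for i in range(100)]  # hypothetical 100 bp reads
short = simulate_depth_and_length(reads, depth=10, read_length=30)
```

Repeating this over a grid of `depth` and `read_length` values, and rerunning expression quantification and TCR reconstruction on each output, gives the kind of depth/length sweep the abstract describes.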
Cheng, Bing; Furtado, Agnelo
2017-01-01
Abstract Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by PacBio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of the assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs and Rfam left 1213 sequences without hits, which are potential novel genes in coffee. Longer UTRs were captured, especially 5′ UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kb) were poorly annotated. LRS technology reveals the limitations of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. PMID:29048540
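Capturing longer 5′ UTRs matters because upstream open reading frames (uORFs) can only be called when the UTR sequence is present. A minimal scan for ATG-initiated uORFs that terminate inside the UTR might look like the sketch below. The function name and the `min_codons` parameter are hypothetical; uORFs that overlap the main start codon, and non-ATG starts, are deliberately ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_uorfs(five_prime_utr, min_codons=1):
    """Return (start, end) of ATG-initiated ORFs that also stop within the 5' UTR.

    Coordinates are 0-based and end-exclusive; `end` includes the stop codon.
    `min_codons` counts sense codons (including the ATG) before the stop.
    """
    seq = five_prime_utr.upper()
    uorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in frame from the ATG until the first stop codon, if any.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    uorfs.append((start, pos + 3))
                break
    return uorfs

print(find_uorfs("GGATGAAATAGCC"))  # [(2, 11)]
```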
Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang
2005-01-01
In this study, full-length cDNAs of the GH (growth hormone) gene were isolated from six economically important fishes: Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. With the exception of E. coioides GH, these GH sequences are cloned here for the first time. The lengths of the cDNAs are, respectively, 953 bp, 1 023 bp, 825 bp, 1 082 bp, 1 154 bp and 1 180 bp. Each sequence includes an ORF of about 600 bp that encodes a protein of about 200 amino acids: the S. kneri, E. coioides and M. albus GHs have 204 amino acids, S. asotus GH has 200, and the M. anguillicaudatus and C. auratus gibelio GHs have 210. Detailed sequence analysis of the six GHs was then performed together with many other fish sequences. All six sequences showed high homology to other sequences, especially to those within the same order, and many conserved residues were identified, most of them located in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the six new ones), with Amia calva as the outgroup, were generally resolved and largely congruent with the morphology-based tree, although some incongruities were observed, suggesting that the GH ORF deserves more attention in teleostean phylogeny.
Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts.
Hirozane-Kishikawa, Tomoko; Shiraki, Toshiyuki; Waki, Kazunori; Nakamura, Mari; Arakawa, Takahiro; Kawai, Jun; Fagiolini, Michela; Hensch, Takao K; Hayashizaki, Yoshihide; Carninci, Piero
2003-09-01
The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.
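The abstract does not spell out its in silico randomization procedure. One common way to express library diversity and gene discovery rate is a rarefaction curve: shuffle the order of sequenced clones many times and record how many distinct genes have been seen after k clones. The sketch below assumes that interpretation; the function name and inputs are hypothetical.

```python
import random

def rarefaction_curve(clone_genes, n_permutations=100, seed=0):
    """Mean number of distinct genes seen after sequencing k clones, for k = 1..N.

    clone_genes: gene identifier assigned to each sequenced clone.
    Shuffling the clone order removes any dependence on the order of sequencing.
    """
    rng = random.Random(seed)
    n = len(clone_genes)
    totals = [0] * n
    for _ in range(n_permutations):
        order = list(clone_genes)
        rng.shuffle(order)
        seen = set()
        for k, gene in enumerate(order):
            seen.add(gene)
            totals[k] += len(seen)
    return [t / n_permutations for t in totals]

curve = rarefaction_curve(["geneA", "geneA", "geneB", "geneC"], n_permutations=50)
```

A curve that is still rising steeply at the last clone indicates a diverse library with a high gene discovery rate; a curve that plateaus early indicates redundancy.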
[cDNA library construction from panicle meristem of finger millet].
Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B
2014-01-01
The protocol for producing full-size cDNA with the SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested, and a high-quality cDNA library was created from meristematic tissue of the finger millet (Eleusine coracana (L.) Gaertn) panicle. The average titer of the cDNA library was 3.01 × 10⁵ CFU/ml. The average cDNA insert length was about 1070 base pairs, and the efficiency of cDNA fragment insertion was 99.5%. Selected cDNA clones from the library were sequenced and identified by BLAST search. The library analysis and selective sequencing demonstrate good functionality and the full-length character of the inserted cDNA clones. The cDNA library from meristematic tissue of the finger millet panicle is a valuable source for isolating and identifying key genes that regulate metabolism and meristem development, and for mining new molecular markers for high-quality genetic studies and molecular breeding.
Anthony Johnson, A M; Borah, B K; Sai Gopal, D V R; Dasgupta, I
2012-12-01
Citrus yellow mosaic badna virus (CMBV), a member of the family Caulimoviridae, genus Badnavirus, is the causative agent of mosaic disease among Citrus species in southern India. Despite its reported prevalence in several citrus species, functional information and full-length genome sequences for all the CMBV isolates infecting citrus species are not available in publicly accessible databases. CMBV isolates from Rough Lemon and Sweet Orange collected from a nursery were cloned and sequenced. The analysis revealed high sequence homology of the two CMBV isolates with previously reported CMBV sequences, implying that they represent new variants. Based on computational analysis of the predicted secondary structures, the possible functions of some CMBV proteins were analyzed.
Yu, Haining; Gao, Jiuxiang; Lu, Yiling; Guang, Huijuan; Cai, Shasha; Zhang, Songyan; Wang, Yipeng
2013-11-01
Lysozymes are key proteins that play important roles in innate immune defense in many animal phyla by breaking down bacterial cell walls. In this study, we report the molecular cloning, sequence analysis and phylogeny of the first caudate amphibian g-lysozyme, identified from a full-length spleen cDNA library of the axolotl (Ambystoma mexicanum). A goose-type (g-lysozyme) EST was identified, and the full-length cDNA was obtained using RACE-PCR. The axolotl g-lysozyme sequence contains an open reading frame encoding a putative signal peptide and a mature protein of 184 amino acids. The calculated molecular mass and theoretical isoelectric point (pI) of the mature protein are 21523.0 Da and 4.37, respectively. g-Lysozyme mRNA is expressed predominantly in skin, with lower levels in spleen, liver, muscle and lung. Phylogenetic analysis revealed that caudate amphibian g-lysozyme has a distinct evolutionary pattern, being juxtaposed not only with anuran amphibians but also with fish, birds and mammals. Although this study reports the first complete cDNA sequence of a caudate amphibian g-lysozyme, clones in the full-length cDNA library that encode other functional immune molecules of the axolotl will need to be sequenced to gain insight into the fundamental antibacterial mechanisms of caudates.
Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin
2003-06-01
A single clinical isolate of mumps virus, designated 88-1961, was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based on the nucleotide sequence of its SH gene, 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps sequences showed that the missense-to-silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences of the NP, M and L proteins were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B and G. No identifying molecular patterns were found in the non-coding (intergenic) or coding regions of 88-1961 when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
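A missense-to-silent ratio for a gene is built from per-codon comparisons between aligned, in-frame coding sequences. The sketch below tallies differing codons as silent (same amino acid under the standard genetic code) or missense (different amino acid, with changes to or from a stop codon also counted as missense here). It is a simple per-codon tally, not a dN/dS estimate, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order with the first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def missense_silent_counts(cds_a, cds_b):
    """Count differing codons between two aligned, in-frame coding sequences.

    Returns (missense, silent). Codons containing gaps or ambiguity codes are skipped,
    and a codon with several nucleotide differences is counted once.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, equal length and a multiple of 3")
    missense = silent = 0
    for i in range(0, len(cds_a), 3):
        ca, cb = cds_a[i:i + 3].upper(), cds_b[i:i + 3].upper()
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            silent += 1
        else:
            missense += 1
    return missense, silent

# TTT->TTC is silent (Phe->Phe); CTT->ATT is missense (Leu->Ile).
print(missense_silent_counts("ATGTTTCTT", "ATGTTCATT"))  # (1, 1)
```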
Designing robust watermark barcodes for multiplex long-read sequencing.
Ezpeleta, Joaquín; Krsticevic, Flavia J; Bulacio, Pilar; Tapia, Elizabeth
2017-03-15
To attain acceptable sample misassignment rates, current approaches to multiplex single-molecule real-time sequencing require upstream quality improvement, which is obtained from multiple passes over the sequenced insert and significantly reduces the effective read length. To fully exploit the raw read length in multiplex applications, robust barcodes capable of dealing with the full single-pass error rates are needed. We present a method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing. The manuscript focuses on the design of barcodes for full-length single-pass reads, impaired by challenging error rates on the order of 11%. The proposed barcodes can multiplex hundreds or thousands of samples while achieving sample misassignment probabilities as low as 10⁻⁷ under the above conditions, and are designed to be compatible with chemical constraints imposed by the sequencing process. Software tools for constructing watermark barcode sets and demultiplexing barcoded reads, together with example sets of barcodes and synthetic barcoded reads, are freely available at www.cifasis-conicet.gov.ar/ezpeleta/NS-watermark . ezpeleta@cifasis-conicet.gov.ar. © The Author 2016. Published by Oxford University Press.
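To see why indel-tolerant barcodes matter, consider the simplest demultiplexing rule for reads with insertion, deletion and substitution errors: assign each read to the barcode at the smallest Levenshtein distance, and leave it unassigned if that distance is too large or tied. This is a generic nearest-barcode sketch swapped in for illustration, not the authors' watermark decoder; the function names and `max_distance` parameter are hypothetical, and it assumes the barcode region has already been extracted from the read.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution or match
        prev = curr
    return prev[-1]

def assign_barcode(read_barcode_region, barcodes, max_distance):
    """Return the unique closest barcode within max_distance, else None (unassigned)."""
    scored = sorted((edit_distance(read_barcode_region, bc), bc) for bc in barcodes)
    best_d, best_bc = scored[0]
    if best_d > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == best_d:
        return None  # ambiguous: two barcodes are equally close
    return best_bc

barcodes = ["ACGTACGT", "TTGGCCAA"]
print(assign_barcode("ACGACGT", barcodes, max_distance=2))  # ACGTACGT (one deletion)
```

Nearest-barcode assignment only keeps misassignment low if every pair of barcodes is far apart in edit distance, which is the property a barcode design method has to guarantee.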
Prakash, Celine; Haeseler, Arndt Von
2017-03-01
RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
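The enumeration idea can be made concrete with a brute-force sketch. It assumes that a "maximal placement" means non-overlapping fragments of a single length F placed on a transcript of length T so that no further fragment fits (every uncovered gap is shorter than F), and that all such configurations are weighted equally. The function names are hypothetical, the enumeration grows exponentially with T, and this is not the authors' implementation.

```python
def maximal_placements(T, F, pos=0):
    """Yield lists of start positions of non-overlapping length-F fragments on [pos, T)
    such that no additional fragment fits (every uncovered gap is shorter than F)."""
    if T - pos < F:
        yield []  # remaining tail is too short for another fragment
        return
    # The next fragment must start before a gap of length F opens up.
    for start in range(pos, min(pos + F, T - F + 1)):
        for rest in maximal_placements(T, F, start + F):
            yield [start] + rest

def expected_coverage(T, F):
    """Per-position coverage averaged over all maximal placements, each weighted equally."""
    if not 1 <= F <= T:
        raise ValueError("need 1 <= F <= T")
    configs = list(maximal_placements(T, F))
    coverage = [0] * T
    for starts in configs:
        for s in starts:
            for p in range(s, s + F):
                coverage[p] += 1
    return [c / len(configs) for c in coverage]

print(expected_coverage(4, 2))  # [0.5, 1.0, 1.0, 0.5]
```

Even this toy case reproduces the abstract's qualitative finding: the expected profile is not uniform, and positions near the transcript ends are covered less often than interior positions.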
Haeseler, Arndt Von
2017-01-01
Abstract RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment. PMID:27661099
USDA-ARS's Scientific Manuscript database
Next generation sequencing technologies have vastly changed the approach to sequencing the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Tateishi-Karimata, Hisae; Isono, Noburu; Sugimoto, Naoki
2014-01-01
The thermal stability and topology of non-canonical structures of G-quadruplexes and hairpins in template DNA were investigated, and the effect of non-canonical structures on transcription fidelity was evaluated quantitatively. We designed ten template DNAs: a linear sequence that does not have significant higher-order structure, three sequences that form hairpin structures, and six sequences that form G-quadruplex structures with different stabilities. Templates with non-canonical structures induced the production of an arrested, a slipped, and a full-length transcript, whereas the linear sequence produced only a full-length transcript. The efficiency of production of run-off transcripts (full-length and slipped transcripts) from templates that formed non-canonical structures was lower than that from the linear template. G-quadruplex structures were more effective inhibitors of full-length product formation than were hairpin structures, even when the stability of the G-quadruplex in aqueous solution was the same as that of the hairpin. We reasoned that intra-polymerase conditions may differentially affect the stability of non-canonical structures. The transcription efficiencies of run-off or arrested transcripts were correlated with the stabilities of non-canonical structures under the intra-polymerase condition mimicked by 20 wt% polyethylene glycol (PEG). Transcriptional arrest was induced when the stability of the G-quadruplex structure (−ΔG°37) in the presence of 20 wt% PEG was more than 8.2 kcal mol⁻¹. Thus, values of stability in the presence of 20 wt% PEG are an important indicator of transcription perturbation. Our results further our understanding of the impact of template structure on the transcription process and may guide logical design of transcription-regulating drugs.
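To get a feel for the 8.2 kcal mol⁻¹ threshold, the standard relation ΔG° = −RT ln K converts a stability value into an equilibrium constant. The sketch below assumes a two-state, unimolecular folding equilibrium at 37 °C, which is a simplification (G-quadruplex folding also depends on cations and, here, on the PEG crowding condition); the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def folding_equilibrium(neg_delta_g_kcal, temp_c=37.0):
    """Return (K_fold, fraction_unfolded) for a two-state unimolecular fold.

    neg_delta_g_kcal is the stability -ΔG° in kcal/mol; uses ΔG° = -RT ln K.
    """
    T = temp_c + 273.15
    K = math.exp(neg_delta_g_kcal / (R_KCAL * T))
    return K, 1.0 / (1.0 + K)

K, f_unfolded = folding_equilibrium(8.2)
print(f"K = {K:.2e}, unfolded fraction = {f_unfolded:.1e}")
```

Under these assumptions, a structure at the threshold is folded roughly 6 × 10⁵ times more often than it is unfolded, so a polymerase encountering it almost always meets the folded form.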
Tateishi-Karimata, Hisae; Isono, Noburu; Sugimoto, Naoki
2014-01-01
The thermal stability and topology of non-canonical structures of G-quadruplexes and hairpins in template DNA were investigated, and the effect of non-canonical structures on transcription fidelity was evaluated quantitatively. We designed ten template DNAs: a linear sequence that does not have significant higher-order structure, three sequences that form hairpin structures, and six sequences that form G-quadruplex structures with different stabilities. Templates with non-canonical structures induced the production of an arrested, a slipped, and a full-length transcript, whereas the linear sequence produced only a full-length transcript. The efficiency of production of run-off transcripts (full-length and slipped transcripts) from templates that formed non-canonical structures was lower than that from the linear template. G-quadruplex structures were more effective inhibitors of full-length product formation than were hairpin structures, even when the stability of the G-quadruplex in aqueous solution was the same as that of the hairpin. We reasoned that intra-polymerase conditions may differentially affect the stability of non-canonical structures. The transcription efficiencies of run-off or arrested transcripts were correlated with the stabilities of non-canonical structures under the intra-polymerase condition mimicked by 20 wt% polyethylene glycol (PEG). Transcriptional arrest was induced when the stability of the G-quadruplex structure (−ΔG°37) in the presence of 20 wt% PEG was more than 8.2 kcal mol⁻¹. Thus, values of stability in the presence of 20 wt% PEG are an important indicator of transcription perturbation. Our results further our understanding of the impact of template structure on the transcription process and may guide logical design of transcription-regulating drugs. PMID:24594642
VKCDB: voltage-gated K+ channel database updated and upgraded.
Gallin, Warren J; Boutet, Patrick A
2011-01-01
The Voltage-gated K(+) Channel DataBase (VKCDB) (http://vkcdb.biology.ualberta.ca) makes a comprehensive set of sequence data readily available for phylogenetic and comparative analysis. The current update contains 2063 entries for full-length or nearly full-length unique channel sequences from Bacteria (477), Archaea (18) and Eukaryotes (1568), an increase from 346 solely eukaryotic entries in the original release. In addition to protein sequences for channels, the nucleotide sequences of the open reading frames encoding those amino acid sequences are now available and can be extracted in parallel with sets of protein sequences. Channels are categorized into subfamilies by phylogenetic analysis and by hidden Markov model analyses. Although the raw database contains a number of fragmentary, duplicated, obsolete and non-channel sequences that were collected in early steps of data collection, the web interface only returns entries that have been validated as likely K(+) channels. The retrieval function of the web interface allows retrieval of entries that contain a substantial fraction of the core structural elements of VKCs, fragmentary entries, or both. The full database can be downloaded from the web site as either a MySQL dump or an XML dump. We have now implemented automated updates at quarterly intervals.
Vergani, Stefano; Korsunsky, Ilya; Mazzarello, Andrea Nicola; Ferrer, Gerardo; Chiorazzi, Nicholas; Bagnara, Davide
2017-01-01
Efficient and accurate high-throughput DNA sequencing of the adaptive immune receptor repertoire (AIRR) is necessary to study immune diversity in healthy subjects and disease-related conditions. The high complexity and diversity of the AIRR, coupled with the limited amount of starting material, which can compromise identification of the full biological diversity, make such sequencing particularly challenging. AIRR sequencing protocols often fail to fully capture the sampled AIRR diversity, especially for samples containing restricted numbers of B lymphocytes. Here, we describe a library preparation method for immunoglobulin sequencing that yields an exhaustive full-length repertoire in which virtually every sampled B cell is sequenced. This maximizes the likelihood of identifying and quantifying the entire IGHV-D-J repertoire of a sample, including the detection of rearrangements present in only one cell in the starting population. The methodology establishes the importance of circumventing genetic material dilution in the preamplification phases and incorporates several previously described concepts: (1) balancing the amount of starting material and the depth of sequencing, (2) avoiding IGHV gene-specific amplification, and (3) using unique molecular identifiers (UMIs). Together, these features make the methodology highly efficient, in particular for detecting rare rearrangements in the sampled population and when only a limited amount of starting material is available.
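Unique molecular identifiers let reads that derive from the same original molecule be grouped and collapsed, so PCR duplicates are counted once and minority errors are voted out. A minimal sketch of that collapsing step follows; it assumes reads within a UMI group are already trimmed or aligned to the same length, and it does not correct errors in the UMIs themselves. The function name is hypothetical and this is not the authors' pipeline.

```python
from collections import Counter, defaultdict

def collapse_by_umi(reads):
    """Group reads by UMI and return one majority-rule consensus sequence per UMI.

    reads: iterable of (umi, sequence) pairs; sequences sharing a UMI are assumed
    to be the same length (zip truncates to the shortest otherwise).
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    consensus = {}
    for umi, seqs in groups.items():
        # Column-wise majority vote removes errors present in only a minority of copies.
        consensus[umi] = "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))
    return consensus

reads = [("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("CCC", "GGGG")]
print(collapse_by_umi(reads))  # {'AAA': 'ACGT', 'CCC': 'GGGG'}
```

The number of keys in the result estimates the number of distinct starting molecules, which is what allows rare rearrangements to be quantified rather than inflated or lost by amplification.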
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome
Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.
2001-01-01
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central, protein-coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used the subset of the data corresponding to a set of 15,095 full-length mRNAs to assess the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated derive from transcripts of an estimated 70% of all genes expressed in that tissue, with equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy for both gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components by reverse transcription–PCR is a direct route to transcript finishing that may be a useful alternative to full-length cDNA cloning. PMID:11593022
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.
Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M
2001-10-09
Open reading frame expressed sequence tags (ORESTES) differ from conventional ESTs by providing sequence data from the central, protein-coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used the subset of the data corresponding to a set of 15,095 full-length mRNAs to assess the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated derive from transcripts of an estimated 70% of all genes expressed in that tissue, with equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy for both gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components by reverse transcription-PCR is a direct route to transcript finishing that may be a useful alternative to full-length cDNA cloning.
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-07-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
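Sequence verification of an ORF clone ultimately means comparing the assembled clone sequence with the expected ORF and listing any substitutions, insertions or deletions. The sketch below does that comparison with Python's standard-library `difflib.SequenceMatcher`, which is a general-purpose diff rather than a biological aligner; it assumes the clone has already been assembled into a single contig, and the function name is hypothetical. It is not how WebPrInSeS works internally.

```python
from difflib import SequenceMatcher

def clone_differences(expected_orf, assembled_contig):
    """List differences between an expected ORF and an assembled clone sequence.

    Returns (tag, expected_segment, observed_segment, expected_start) tuples for every
    non-matching block ('replace', 'delete' or 'insert') reported by difflib.
    An empty list means the clone matches the expected sequence exactly.
    """
    matcher = SequenceMatcher(None, expected_orf, assembled_contig, autojunk=False)
    return [
        (tag, expected_orf[i1:i2], assembled_contig[j1:j2], i1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

print(clone_differences("ATGAAACCCTAA", "ATGAAGCCCTAA"))  # [('replace', 'A', 'G', 5)]
```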
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-01-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users. PMID:20501601
Species identification of mutans streptococci by groESL gene sequence.
Hung, Wei-Chung; Tsai, Jui-Chang; Hsueh, Po-Ren; Chia, Jean-San; Teng, Lee-Jene
2005-09-01
The near full-length sequences of the groESL genes were determined and analysed among eight reference strains (serotypes a to h) representing five species of mutans group streptococci. The groES sequences from these reference strains revealed that there are two lengths (285 and 288 bp) in the five species. The intergenic spacer between groES and groEL appears to be a unique marker for species, with a variable size (ranging from 111 to 310 bp) and sequence. Phylogenetic analysis of groES and groEL separated the eight serotypes into two major clusters. Strains of serotypes b, c, e and f were highly related and had groES gene sequences of the same length, 288 bp, while strains of serotypes a, d, g and h were also closely related and their groES gene sequence lengths were 285 bp. The groESL sequences in clinical isolates of three serotypes of S. mutans were analysed for intraspecies polymorphism. The results showed that the groESL sequences could provide information for differentiation among species, but were unable to distinguish serotypes of the same species. Based on the determined sequences, a PCR assay was developed that could differentiate members of the mutans streptococci by amplicon size and provide an alternative way for distinguishing mutans streptococci from other viridans streptococci.
Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu
2007-01-01
Background Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and postharvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories, showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061
HUNT: launch of a full-length cDNA database from the Helix Research Institute.
Yudate, H T; Suwa, M; Irie, R; Matsui, H; Nishikawa, T; Nakamura, Y; Yamaguchi, D; Peng, Z Z; Yamamoto, T; Nagai, K; Hayashi, K; Otsuki, T; Sugiyama, T; Ota, T; Suzuki, Y; Sugano, S; Isogai, T; Masuho, Y
2001-01-01
The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution toward understanding gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function, and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT.
De Franceschi, Paolo; Bianco, Luca; Cestaro, Alessandro; Dondini, Luca; Velasco, Riccardo
2018-06-01
Data obtained from Illumina resequencing of 63 apple cultivars were used to obtain full-length S-RNase sequences using a strategy based on both alignment and de novo assembly of reads. The reproductive biology of apple is regulated by the S-RNase-based gametophytic self-incompatibility system, which is genetically controlled by the single, multi-genic and multi-allelic S locus. Resequencing of apple cultivars provided a huge amount of genetic data that can be aligned to the reference genome in order to characterize variation at a genome-wide level. However, this approach is not immediately adaptable to the S locus, due to some peculiar features such as the high degree of polymorphism, lack of colinearity between haplotypes and extensive presence of repetitive elements. In this study we describe a dedicated procedure aimed at characterizing S-RNase alleles from resequenced cultivars. The S-genotype of 63 apple accessions is reported; the full-length coding sequence was determined for the 25 S-RNase alleles present in the 63 resequenced cultivars; these included 10 previously incomplete sequences (S5, S6a, S6b, S8, S11, S23, S39, S46, S50 and S58). Moreover, sequence divergence clearly suggests that alleles S6a and S6b, proposed to be neutral variants of the same allele, should instead be considered different specificities. The promoter sequences have also been analyzed, highlighting regions of homology conserved among all the alleles.
2010-01-01
Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate. PMID:20433749
High resolution identity testing of inactivated poliovirus vaccines
Mee, Edward T.; Minor, Philip D.; Martin, Javier
2015-01-01
Background Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003
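The 0.1% detection floor reported above amounts to a read-share threshold per strain. The Python below is a minimal sketch, assuming each read has already been assigned to a strain label by some upstream mapping step; the function `strains_above_threshold` and its input format are hypothetical and do not reproduce the authors' pipeline.

```python
from collections import Counter


def strains_above_threshold(read_assignments, min_fraction=0.001):
    """Return {strain: fraction of reads} for strains at or above min_fraction.

    read_assignments: iterable of strain labels, one per read. This input is a
    hypothetical output of a prior read-mapping step, not part of the study.
    """
    counts = Counter(read_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Keep every strain whose share of all assigned reads reaches the floor.
    return {
        strain: n / total
        for strain, n in counts.items()
        if n / total >= min_fraction
    }
```

For example, 20 Mahoney reads among 10,000 (0.2%) would be reported, while 5 reads of another strain (0.05%) would fall below the 0.1% floor.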
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.
Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B
2004-12-15
cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5-5 × 10^5 primary transformants, starting with 5 µg of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
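The complexity measure quoted above (the share of sequenced clones whose transcript occurs only once in the library) can be computed directly from clone-to-transcript assignments. This is a minimal sketch, assuming each clone has already been matched to a transcript identifier; `singleton_clone_fraction` is a hypothetical name, not a tool from the study.

```python
from collections import Counter


def singleton_clone_fraction(transcript_ids):
    """Fraction of sequenced clones whose transcript is represented only once.

    transcript_ids: one transcript identifier per sequenced clone (assumed
    to come from a hypothetical upstream matching step).
    """
    counts = Counter(transcript_ids)
    total = sum(counts.values())
    # Each transcript seen exactly once contributes exactly one clone.
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / total if total else 0.0
```

With six clones matching Actb, Actb, Pomc, Gh, Prl and Gh, only Pomc and Prl are singletons, so the fraction is 2/6.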
Full-length genomic characterization and molecular evolution of canine parvovirus in China.
Zhou, Ling; Tang, Qinghai; Shi, Lijun; Kong, Miaomiao; Liang, Lin; Mao, Qianqian; Bu, Bin; Yao, Lunguang; Zhao, Kai; Cui, Shangjin; Leal, Élcio
2016-06-01
Canine parvovirus type 2 (CPV-2) can cause acute haemorrhagic enteritis in dogs and myocarditis in puppies. This disease has become one of the most serious infectious diseases of dogs. During 2014 in China, there were many cases of acute infectious diarrhoea in dogs. Some faecal samples were negative for the CPV-2 antigen based on a colloidal gold test strip but were positive based on PCR, and a viral strain was isolated from one such sample. The cytopathic effect on susceptible cells and the results of the immunoperoxidase monolayer assay, PCR, and sequencing indicated that the pathogen was CPV-2. The strain was named CPV-NY-14, and the full-length genome was sequenced and analysed. A maximum likelihood tree was constructed using the full-length genome and all available CPV-2 genomes. New strains have replaced the original strain in Taiwan and Italy, although the CPV-2a strain is still predominant there. However, CPV-2a still causes many cases of acute infectious diarrhoea in dogs in China.
High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.
Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory
2017-12-01
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Full-genome sequences of hepatitis B virus subgenotype D3 isolates from the Brazilian Amazon Region.
Spitz, Natália; Mello, Francisco C A; Araujo, Natalia Motta
2015-02-01
The Brazilian Amazon Region is a highly endemic area for hepatitis B virus (HBV). However, little is known regarding the genetic variability of the strains circulating in this geographical region. Here, we describe the first full-length genomes of HBV isolated in the Brazilian Amazon Region; these genomes are also the first complete HBV subgenotype D3 genomes reported for Brazil. The genomes of the five Brazilian isolates were all 3,182 base pairs in length and the isolates were classified as belonging to subgenotype D3, subtypes ayw2 (n = 3) and ayw3 (n = 2). Phylogenetic analysis suggested that the Brazilian sequences are not likely to be closely related to European D3 sequences. Such results will contribute to further epidemiological and evolutionary studies of HBV.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stapleton, Mark; Liao, Guochun; Brokstein, Peter
2002-08-12
Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5′ expressed sequence tags (ESTs) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to approximately 40 percent of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remaining genes, we have generated an additional 157,835 5′ ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22 hr embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70 percent of the predicted genes in Drosophila.
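Clustering ESTs by their genome alignments, as described above, can be approximated by merging alignments whose genomic intervals overlap. This is a minimal sketch, assuming alignments are supplied as hypothetical `(est_id, chrom, start, end)` tuples with 0-based, half-open coordinates and ignoring strand; the actual DGC clustering procedure may differ.

```python
def cluster_alignments(alignments):
    """Group EST alignments into clusters of overlapping genomic intervals.

    alignments: list of (est_id, chrom, start, end) tuples, 0-based half-open.
    Strand is ignored in this simplified sketch.
    Returns a list of clusters, each a list of est_ids.
    """
    clusters = []
    current = []
    cur_chrom, cur_end = None, None
    # Sorting by chromosome then start lets a single pass merge overlaps.
    for est_id, chrom, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        if chrom == cur_chrom and start < cur_end:
            # Overlaps the running cluster: extend it.
            current.append(est_id)
            cur_end = max(cur_end, end)
        else:
            # New chromosome or a gap: close the previous cluster, start a new one.
            if current:
                clusters.append(current)
            current = [est_id]
            cur_chrom, cur_end = chrom, end
    if current:
        clusters.append(current)
    return clusters
```

Each resulting cluster could then be compared against annotated gene coordinates to decide whether it represents a gene not yet covered by a clone.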
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius
Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.
2010-01-01
Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete genome of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF >300 bp and ∼40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. An accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
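The "ORF >300 bp" criterion can be illustrated with a simple forward-strand scan for ATG...stop open reading frames. This is a sketch under stated assumptions: only the forward strand and the standard-code stop codons (TAA, TAG, TGA) are considered, each ORF begins at the first in-frame ATG after the previous stop, and the reported length includes the stop codon. `longest_orf` is a hypothetical helper, not the tool used in the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stops


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    Returns None if no complete ORF (start and stop) is present.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if best is None or length > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # close the ORF and look for the next ATG
    return best
```

Under these assumptions, a contig would pass the cutoff when `orf is not None and orf[1] - orf[0] > 300`; a fuller analysis would also scan the reverse complement.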
The Bulgarian vaccine Crimean-Congo haemorrhagic fever virus strain.
Papa, Anna; Papadimitriou, Evangelia; Christova, Iva
2011-03-01
The Crimean-Congo haemorrhagic fever virus (CCHFV) is a 3-segmented RNA virus, which causes disease with a high fatality rate in humans. An inactivated suckling mouse brain-derived vaccine is used in Bulgaria for protection against CCHF. Strain V42/81 is currently used for the vaccine preparation. As the M RNA segment plays a major role in the immune response, the full-length M segment sequence of the V42/81 strain was characterized. Great genetic diversity was observed among CCHFV strains. To gain insight into the position of the strain in CCHFV phylogenetic trees, the full-length S and partial L segments were additionally sequenced and analyzed.
Donald, L. J.; Chernushevich, I. V.; Zhou, J.; Verentchikov, A.; Poppe-Schriemer, N.; Hosfield, D. J.; Westmore, J. B.; Ens, W.; Duckworth, H. W.; Standing, K. G.
1996-01-01
IclR protein, the repressor of the aceBAK operon of Escherichia coli, has been examined by time-of-flight mass spectrometry, with ionization by matrix-assisted laser desorption or by electrospray. The purified protein was found to have a smaller mass than that predicted from the base sequence of the cloned iclR gene. Additional measurements were made on mixtures of peptides derived from IclR by treatment with trypsin and cyanogen bromide. They showed that the amino acid sequence is that predicted from the gene sequence, except that the protein has suffered truncation by removal of the N-terminal eight or, in some cases, nine amino acid residues. The peptide bond whose hydrolysis would remove eight residues is a typical target for the E. coli protease OmpT. We find that, by taking precautions to minimize OmpT proteolysis, or by eliminating it through mutation of the host strain, we can isolate full-length IclR protein (lacking only the N-terminal methionine residue). Full-length IclR is a much better DNA-binding protein than the truncated versions: it binds the aceBAK operator sequence 44-fold more tightly, presumably because of additional contacts that the N-terminal residues make with the DNA. Our experience thus demonstrates the advantages of using mass spectrometry to characterize newly purified proteins produced from cloned genes, especially where proteolysis or other covalent modification is a concern. This technique gives mass spectra from complex peptide mixtures that can be analyzed completely, without any fractionation of the mixtures, by reference to the amino acid sequence inferred from the base sequence of the cloned gene. PMID:8844850
Duquesne, Véronique; Delcont, Aurélie; Huleux, Anthéa; Beven, Véronique; Touzain, Fabrice; Ribière-Chabert, Magali
2017-11-02
We report here the full mitochondrial genome sequence of Aethina tumida, a beetle of the family Nitidulidae that is a pest of beehives. The obtained sequence is 16,576 bp in length and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes. Copyright © 2017 Duquesne et al.
2011-01-01
Transmission from pet rats and cats to humans as well as severe infection in felids and other animal species have recently drawn increasing attention to cowpox virus (CPXV). We report the cloning of the entire genome of cowpox virus strain Brighton Red (BR) as a bacterial artificial chromosome (BAC) in Escherichia coli and the recovery of infectious virus from cloned DNA. Generation of a full-length CPXV DNA clone was achieved by first introducing a mini-F vector, which allows maintenance of large circular DNA in E. coli, into the thymidine kinase locus of CPXV by homologous recombination. Circular replication intermediates were then electroporated into E. coli DH10B cells. Upon successful establishment of the infectious BR clone, we modified the full-length clone such that recombination-mediated excision of bacterial sequences can occur upon transfection in eukaryotic cells. This self-excision of the bacterial replicon is made possible by a sequence duplication within mini-F sequences and allows recovery of recombinant virus progeny without remaining marker or vector sequences. The in vitro growth properties of viruses derived from both BAC clones were determined and found to be virtually indistinguishable from those of parental, wild-type BR. Finally, the complete genomic sequence of the infectious clone was determined and the cloned viral genome was shown to be identical to that of the parental virus. In summary, the generated infectious clone will greatly facilitate studies on individual genes and pathogenesis of CPXV. Moreover, the vector potential of CPXV can now be more systematically explored using this newly generated tool. PMID:21314965
Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui
2015-01-01
Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
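A CDR3 length distribution like the one described above is a relative-frequency table of sequence lengths. This is a minimal sketch, assuming CDR3 amino acid sequences have already been extracted from the TCRβ reads; the function name and the default of counting each unique CDR3 once are assumptions, not details taken from the study.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_aa_seqs, unique=True):
    """Relative frequency of CDR3 lengths (in amino acids).

    unique=True counts each distinct CDR3 once; unique=False weights each
    length by how many times sequences of that length were observed.
    """
    seqs = set(cdr3_aa_seqs) if unique else list(cdr3_aa_seqs)
    counts = Counter(len(s) for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}
```

Whether to count unique clonotypes or all reads changes the shape of the distribution, since highly expanded clones dominate read-weighted counts.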
Dunn-Coleman, Nigel; Goedegebuur, Frits; Ward, Michael; Yiao, Jian
2014-03-18
The present invention provides a novel endoglucanase nucleic acid sequence, designated egl6 (SEQ ID NO:1 encodes the full length endoglucanase; SEQ ID NO:4 encodes the mature form), and the corresponding endoglucanase VI amino acid sequence ("EGVI"; SEQ ID NO:3 is the signal sequence; SEQ ID NO:2 is the mature sequence). The invention also provides expression vectors and host cells comprising a nucleic acid sequence encoding EGVI, recombinant EGVI proteins and methods for producing the same.
Unrelated sequences at the 5' end of mouse LINE-1 repeated elements define two distinct subfamilies.
Wincker, P; Jubier-Maurin, V; Roizès, G
1987-01-01
Some full-length members of the mouse long interspersed repeated DNA family L1Md have been shown to be associated at their 5' end with a variable number of tandem repetitions, the A repeats, that have been suggested to be transcription controlling elements. We report that the other type of repeat, named F, found at the 5' end of a few L1 elements is also an integral part of full-length L1 copies. Sequencing shows that the F repeats are GC-rich and organized in tandem. The L1 copies associated with either A or F repeats can be correlated with two different subsets of L1 sequences distinguished by a series of variant nucleotides specific to each and by unassociated but frequent restriction sites. These findings suggest that sequence replacement has occurred at least once at the 5' end of L1Md, and is related to the generation of specific subfamilies. PMID:3684566
Shotgun Protein Sequencing with Meta-contig Assembly
Guthals, Adrian; Clauser, Karl R.; Bandeira, Nuno
2012-01-01
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings. PMID:22798278
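Meta-SPS assembles contigs from spectral alignments, but the underlying idea of extending sequences by overlapping shorter contigs can be illustrated with a greedy suffix-prefix merge of plain amino acid strings. The sketch below is a generic greedy overlap assembler, not the Meta-SPS algorithm; the helper names and the `min_len` threshold are hypothetical, and contigs lying entirely inside another contig are not handled.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b, if >= min_len, else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_merge(contigs, min_len=3):
    """Repeatedly merge the ordered pair of contigs with the largest overlap."""
    contigs = list(contigs)
    while True:
        best = (0, None, None)  # (overlap length, left index, right index)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no remaining overlap of at least min_len
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (i, j)]
        contigs.append(merged)
```

Greedy merging is simple but can join contigs incorrectly when short overlaps occur by chance, which is one reason real assemblers score overlaps against the underlying evidence rather than string identity alone.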
Shotgun protein sequencing with meta-contig assembly.
Guthals, Adrian; Clauser, Karl R; Bandeira, Nuno
2012-10-01
Full-length de novo sequencing from tandem mass (MS/MS) spectra of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains a challenging open problem. Conventional algorithms designed to individually sequence each MS/MS spectrum are limited by incomplete peptide fragmentation or low signal to noise ratios and tend to result in short de novo sequences at low sequencing accuracy. Our shotgun protein sequencing (SPS) approach was developed to ameliorate these limitations by first finding groups of unidentified spectra from the same peptides (contigs) and then deriving a consensus de novo sequence for each assembled set of spectra (contig sequences). But whereas SPS enables much more accurate reconstruction of de novo sequences longer than can be recovered from individual MS/MS spectra, it still requires error-tolerant matching to homologous proteins to group smaller contig sequences into full-length protein sequences, thus limiting its effectiveness on sequences from poorly annotated proteins. Using low and high resolution CID and high resolution HCD MS/MS spectra, we address this limitation with a Meta-SPS algorithm designed to overlap and further assemble SPS contigs into Meta-SPS de novo contig sequences extending as long as 100 amino acids at over 97% accuracy without requiring any knowledge of homologous protein sequences. We demonstrate Meta-SPS using distinct MS/MS data sets obtained with separate enzymatic digestions and discuss how the remaining de novo sequencing limitations relate to MS/MS acquisition settings.
Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice; Grasland, Béatrice; Frossard, Jean-Pierre; Dastjerdi, Akbar; Hulst, Marcel; Hanke, Dennis; Pohlmann, Anne; Blome, Sandra; van der Poel, Wim H. M.; Steinbach, Falko; Blanchard, Yannick; Lavazza, Antonio; Bøtner, Anette
2018-01-01
Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28 kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the full-genome sequences was distinct from the others and from the reference sequence (Accession number AF353511), but they are >99% identical. Unique and shared differences between sequences were identified. The coding region for the surface-exposed spike protein showed the highest proportion of variability, including both point mutations and small deletions. The predicted expression of the ORF3 gene product was more dramatically affected in three different variants of this virus through either loss of the initiation codon or gain of a premature termination codon. The genome of one isolate had a substantially rearranged 5′-terminal sequence. This rearrangement was validated through the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies. PMID:29494671
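The ">99% identical" comparison corresponds to a percent-identity calculation over aligned genomes. This is a minimal sketch, assuming the two sequences are already aligned to equal length with `-` as the gap symbol; definitions of identity differ in how gaps are counted, and the convention used here (skip columns where both are gaps, count base-versus-gap as a mismatch) is an assumption.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns gapped in both sequences are skipped; a gap opposite a base
    counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0
```

On a genome of roughly 28 kb, staying above 99% identity still leaves room for up to about 280 differing columns, so point mutations and small deletions like those described can coexist with very high overall identity.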
Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik
2011-10-01
The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. Identifying the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa (http://microbiology.se/software/metaxa/), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Structure of the highly repeated, long interspersed DNA family (LINE or L1Rn) of the rat.
D'Ambrosio, E; Waitzkin, S D; Witney, F R; Salemme, A; Furano, A V
1986-01-01
We present the DNA sequence of a 6.7-kilobase member of the rat long interspersed repeated DNA family (LINE or L1Rn). This member (LINE 3) is flanked by a perfect 14-base-pair (bp) direct repeat and is a full-length, or close-to-full-length, member of this family. LINE 3 contains an approximately 100-bp A-rich right end, a number of long (more than 400-bp) open reading frames, and a ca. 200-bp G + C-rich (ca. 60%) cluster near each terminus. Comparison of the LINE 3 sequence with the sequence of about one-half of another member, which we also present, as well as restriction enzyme analysis of the genomic copies of this family, indicates that in length and overall structure LINE 3 is quite typical of the 40,000 or so other genomic members of this family, which would account for as much as 10% of the rat genome. Therefore, the rat LINE family is relatively homogeneous, in contrast to the heterogeneous LINE families in primates and mice. Transcripts corresponding to the entire LINE sequence are abundant in the nuclear RNA of rat liver. The characteristics of the rat LINE family are discussed with respect to the possible function and evolution of this family of DNA sequences. PMID:3023845
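The LINE 3 description mentions a ca. 200-bp G + C-rich (ca. 60%) cluster near each terminus. One simple way to flag such clusters in a sequence is a sliding-window G+C scan. The Python sketch below is illustrative only: the function names, the 200-bp window and the 0.6 threshold are assumptions borrowed from the figures quoted above, not the authors' procedure.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(seq, window=200, threshold=0.6, step=1):
    """Return (start, gc) for every window whose G+C fraction meets the threshold.

    Windows are half-open [start, start + window) in 0-based coordinates.
    The window size and threshold are hypothetical defaults.
    """
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc >= threshold:
            hits.append((start, gc))
    return hits


# Toy sequence: a 200-bp GC block embedded in AT-rich flanks.
toy = "AT" * 200 + "GC" * 100 + "AT" * 200
hits = gc_rich_windows(toy)
print(hits[0][0], hits[-1][0], len(hits))
```

In practice, overlapping hits would be merged into a single cluster before their boundaries are reported.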
Churion, Kelly A; Rogers, Robert E; Bayless, Kayla J; Bondos, Sarah E
2016-12-01
Separating full-length protein from proteolytic products is challenging, because the properties used to isolate the protein can also be present in the proteolytic products. Many separation techniques risk non-specific protein adhesion and/or are time-consuming, allowing proteolysis and aggregation to continue after lysis. We show that, for two different proteins, the proteolytic products aggregate. As a result, full-length protein can be rapidly separated from these fragments by filter flow-through purification, which substantially enhances protein purity. This rapid approach is likely to be useful for intrinsically disordered proteins, whose repetitive sequence composition and flexible nature can facilitate aggregation. Copyright © 2016 Elsevier Inc. All rights reserved.
Loss of GATA-1 Full Length as a Cause of Diamond–Blackfan Anemia Phenotype
Parrella, Sara; Aspesi, Anna; Quarello, Paola; Garelli, Emanuela; Pavesi, Elisa; Carando, Adriana; Nardi, Margherita; Ellis, Steven R.; Ramenghi, Ugo; Dianzani, Irma
2015-01-01
Mutations in the hematopoietic transcription factor GATA-1 alter the proliferation/differentiation of hematopoietic progenitors. Mutations in exon 2 interfere with the synthesis of the full-length isoform of GATA-1 and lead to the production of a shortened isoform, GATA-1s. These mutations have been found in patients with Diamond–Blackfan anemia (DBA), a congenital erythroid aplasia typically caused by mutations in genes encoding ribosomal proteins. We sequenced GATA-1 in 23 patients who were negative for mutations in the most frequently mutated DBA genes. One patient showed a c.2T>C mutation in the initiation codon, leading to the loss of the full-length GATA-1 isoform. PMID:24453067
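The c.2T>C variant changes the second base of the ATG start codon, so the canonical initiation codon is lost. The Python sketch below applies such a substitution to a coding sequence and then finds the next in-frame ATG. Re-initiating at that codon is a simplified assumption, offered here only to show how an N-terminally shortened product such as GATA-1s can arise. The toy sequence and function names are hypothetical; this is not the real GATA1 coding sequence.

```python
def apply_substitution(cds, c_pos, ref, alt):
    """Apply an HGVS-style coding substitution c.<c_pos><ref>><alt>.

    c. numbering is 1-based and starts at the A of the ATG start codon,
    so c.2 is the middle base of the initiation codon.
    """
    idx = c_pos - 1
    if cds[idx].upper() != ref:
        raise ValueError(f"c.{c_pos}: expected {ref}, found {cds[idx]}")
    return cds[:idx] + alt + cds[idx + 1:]


def next_in_frame_atg(cds, after_codon=0):
    """0-based codon index of the first in-frame ATG after codon `after_codon`, or None."""
    for i in range(3 * (after_codon + 1), len(cds) - 2, 3):
        if cds[i:i + 3].upper() == "ATG":
            return i // 3
    return None


# Hypothetical toy coding sequence: ATG GAC CTG ATG CCC TAA
cds = "ATGGACCTGATGCCCTAA"
mutant = apply_substitution(cds, 2, "T", "C")   # c.2T>C
print(mutant[:3])                  # the start codon now reads ACG
print(next_in_frame_atg(mutant))   # next in-frame ATG, as a 0-based codon index
```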
Analysis for complete genomic sequence of HLA-B and HLA-C alleles in the Chinese Han population.
Zhu, F; He, Y; Zhang, W; He, J; He, J; Xu, X; Lv, H; Yan, L
2011-08-01
In the present study, we determined the complete genomic sequences and analysed the intron polymorphism of a subset of HLA-B and HLA-C alleles in the Chinese Han population. DNA fragments of over 3.0 kb spanning the HLA-B and HLA-C loci, from part of the 5' untranslated region to the 3' noncoding region, were amplified by polymerase chain reaction, and the amplified products were then sequenced. Full-length nucleotide sequences of 14 HLA-B alleles and 10 HLA-C alleles were obtained and submitted to GenBank and the IMGT/HLA database. Two novel alleles, HLA-B*52:01:01:02 and HLA-B*59:01:01:02, were identified, and the complete genomic sequence of HLA-B*52:01:01:01 was reported for the first time. In total, 157 and 167 polymorphic positions were found in the full-length genomic sequences of the HLA-B and HLA-C loci, respectively. Our results suggest that many single nucleotide polymorphisms exist in the exon and intron regions, and the data can provide useful information for understanding the evolution of HLA-B and HLA-C alleles. © 2011 Blackwell Publishing Ltd.
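The totals of 157 and 167 polymorphic positions come from comparing aligned allele sequences column by column. The Python sketch below shows one way to count such positions. Ignoring gap ('-') and 'N' characters is an assumption, since the abstract does not say how they were treated, and the toy alignment is hypothetical.

```python
def polymorphic_positions(aligned):
    """Return 1-based alignment columns where more than one base is observed.

    Gap characters ('-') and ambiguous 'N' calls are ignored when deciding
    whether a column is variable (an illustrative convention).
    """
    lengths = {len(s) for s in aligned}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    positions = []
    for col in range(lengths.pop()):
        bases = {s[col].upper() for s in aligned} - {"-", "N"}
        if len(bases) > 1:
            positions.append(col + 1)
    return positions


# Hypothetical three-allele alignment.
print(polymorphic_positions(["ACGTA", "ACCTA", "ATGTA"]))
```

Applied to a full-length alignment of the alleles for one locus, the length of the returned list corresponds to the kind of count reported above.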
High resolution identity testing of inactivated poliovirus vaccines.
Mee, Edward T; Minor, Philip D; Martin, Javier
2015-07-09
Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. We employed random RT-PCR and high-throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. All expected strains were detected in previously characterised products, and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Random RT-PCR and shotgun sequencing provided high-resolution identification of vaccine components. In addition to recovering full-length genome sequences, the method could easily be adapted to characterise minor variant frequencies and to distinguish closely related products on the basis of distinguishing consensus and low-frequency polymorphisms. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
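The abstract reports that Mahoney and Sabin 1 were told apart at specific variant positions, and that strains making up as little as 0.1% of reads were detected. The Python sketch below illustrates that idea with a majority vote over diagnostic positions. The positions, alleles, strain labels and the ungapped placement of reads at known offsets are hypothetical simplifications. A real pipeline would work from proper read alignments, and this sketch is not the authors' method.

```python
from collections import Counter


def classify_read(read, start, diagnostic):
    """Assign a read to a strain by majority vote over diagnostic positions.

    read: bases placed without indels starting at 0-based reference `start`.
    diagnostic: {ref_position: {strain_name: expected_base}} (hypothetical).
    Returns the strain name, or None if no informative site is matched
    or the vote is tied.
    """
    votes = Counter()
    for pos, alleles in diagnostic.items():
        offset = pos - start
        if 0 <= offset < len(read):
            for strain, base in alleles.items():
                if read[offset] == base:
                    votes[strain] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def strain_fractions(reads, diagnostic):
    """Fraction of classified reads assigned to each strain."""
    calls = Counter()
    for read, start in reads:
        strain = classify_read(read, start, diagnostic)
        if strain is not None:
            calls[strain] += 1
    total = sum(calls.values())
    return {s: n / total for s, n in calls.items()} if total else {}


# Hypothetical diagnostic sites distinguishing two strains.
diagnostic = {2: {"Mahoney": "A", "Sabin1": "G"}, 5: {"Mahoney": "C", "Sabin1": "T"}}
reads = [("TTATTC", 0)] + [("TTGTTT", 0)] * 999
print(strain_fractions(reads, diagnostic))  # the minor strain is about 0.1% of reads
```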
Wang, L; Eriksson, S
2000-01-01
The subcellular localization of mitochondrial thymidine kinase (TK2) has been questioned, since no mitochondrial targeting sequences have been found in cloned human TK2 cDNAs. Here we report the cloning of mouse TK2 cDNA from a mouse full-length enriched cDNA library. The mouse TK2 cDNA codes for a protein of 270 amino acids, with a 40-amino-acid presumed N-terminal mitochondrial targeting signal. In vitro translation and translocation experiments with purified rat mitochondria confirmed that the N-terminal sequence directed import of the precursor TK2 into the mitochondrial matrix. A single 2.4 kb mRNA transcript was detected in most tissues examined, except in liver, where an additional shorter (1.0 kb) transcript was also observed. There was no correlation between the tissue distribution of TK2 activity and the expression of TK2 mRNA. Full-length mouse TK2 protein and two N-terminally truncated forms, one corresponding to the mitochondrial form of TK2 and a shorter one corresponding to the previously characterized recombinant human TK2, were expressed in Escherichia coli and affinity purified. All three forms of TK2 phosphorylated thymidine, deoxycytidine and 2'-deoxyuridine, but with different kinetic efficiencies. A number of cytostatic pyrimidine nucleoside analogues were also tested and shown to be good substrates for the various forms of TK2. The active form of full-length mouse TK2 was a dimer, as judged by Superdex 200 chromatography. These results enhance our understanding of the structure and function of TK2, and may help to explain the mitochondrial disorder mitochondrial neurogastrointestinal encephalomyopathy. PMID:11023833
Sampathkumar, Raghavan; Sivaraman, Karthi; U. K. J., Anto Jesuraj; Dhar, Chirag; D. Souza, George; Berry, Neil
2017-01-01
India has the third-largest number of HIV-1-infected individuals, approximately 2.1 million people, with a predominance of circulating subtype C strains and a low prevalence of subtype A and of A1C and BC recombinant forms identified over the past two decades. Recovery of near full-length HIV-1 genomes from plasma, coupled with advances in next-generation sequencing (NGS) technologies and the development of universal methods for amplifying whole genomes of HIV-1 circulating in a target geography or population, provides the opportunity for a detailed analysis of HIV-1 strain identification, evolution and dynamics. Here we describe the development and implementation of approaches for HIV-1 NGS analysis in a southern Indian cohort. Plasma samples (n = 20) were obtained from HIV-1-confirmed individuals living in and around the city of Bengaluru. Near full-length genomes were recovered for 9 Indian HIV-1 patients, and full-length gag and env genes were recovered for 10 and 2 additional subjects, respectively. Phylogenetic analyses indicate that most sequences are subtype C viruses branching within a monophyletic clade that comprises viruses from India, Nepal, Myanmar and China and is closely related to a southern African cluster; the A1C recombinant form is also present at low prevalence. Development of algorithms for bespoke recovery and analysis at a local level will further aid the clinical management of HIV-1-infected Indian subjects and help delineate the progress of the HIV-1 pandemic in this and other geographical regions. PMID:29220350
Molecular Architecture of Full-length TRF1 Favors Its Interaction with DNA.
Boskovic, Jasminka; Martinez-Gago, Jaime; Mendez-Pertuz, Marinela; Buscato, Alberto; Martinez-Torrecuadrada, Jorge Luis; Blasco, Maria A
2016-10-07
Telomeres are specific DNA-protein structures found at both ends of eukaryotic chromosomes that protect the genome from degradation and from being recognized as double-stranded breaks. In vertebrates, telomeres are composed of tandem repeats of the TTAGGG sequence that are bound by a six-subunit complex called shelterin. The molecular mechanisms of telomere function remain unknown in large part because of the lack of structural data on shelterin components, the shelterin complex, and its interaction with the telomeric DNA repeats. TRF1 is one of the best-studied shelterin components; however, the molecular architecture of the full-length protein remains unknown. We have used single-particle electron microscopy to elucidate the structure of TRF1 and its interaction with telomeric DNA sequence. Our results demonstrate that full-length TRF1 presents a molecular architecture that assists its interaction with telomeric DNA and at the same time makes its TRFH domains accessible to other TRF1 binding partners. Furthermore, our studies suggest hypothetical models of how other proteins, such as TIN2 and tankyrase, contribute to regulating TRF1 function. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc. PMID:27563064
Correlations between oxygen affinity and sequence classifications of plant hemoglobins
USDA-ARS's Scientific Manuscript database
Plants express three phylogenetic classes of hemoglobins (Hb) based on sequence analyses. Class 1 and 2 Hbs are full-length globins with the classical eight-helix Mb-like fold, whereas Class 3 plant Hbs resemble the truncated globins found in bacteria. With the exception of the specialized leghemoglobin...
California mild CTV strains that break resistance in Trifoliate Orange
USDA-ARS's Scientific Manuscript database
This is the final report of a project to characterize California isolates of Citrus tristeza virus (CTV) that replicate in Poncirus trifoliata (trifoliate orange). Next Generation Sequencing (NGS) of viral small interfering RNAs (siRNAs) and assembly of full-length sequences of mild California CTV i...
USDA-ARS's Scientific Manuscript database
The full-length sequence of a new isolate of Apple chlorotic leaf spot virus (ACLSV) from Korea was divergent, but most closely related to the Japanese isolate A4, at 84% nucleotide identity. The full-length cDNA of the Korean isolate of ACLSV was cloned into a binary vector downstream of the bacter...
Gao, Ruimin; Niu, Shengniao; Dai, Weifang; Kitajima, Elliot; Wong, Sek-Man
2016-10-01
A Brazilian isolate of Hibiscus latent Fort Pierce virus (HLFPV-BR) was found for the first time in a hibiscus plant in Limeira, SP, Brazil. RACE PCR was carried out to obtain the full-length sequence of HLFPV-BR, which is 6453 nucleotides long and shares more than 99.15% nucleotide sequence identity across the complete genomic RNA with the HLFPV Japanese isolate. The genomic structure of HLFPV-BR is similar to that of other tobamoviruses. It includes a 5' untranslated region (UTR), followed by open reading frames encoding a 128-kDa protein and a 188-kDa readthrough protein, a 38-kDa movement protein and an 18-kDa coat protein, and a 3' UTR. Interestingly, the distinctive poly(A) tract is also present within its 3' UTR. Furthermore, from the total RNA extracted from the local lesions of HLFPV-BR-infected Chenopodium quinoa leaves, a biologically active, full-length cDNA clone encompassing the genome of HLFPV-BR was amplified and placed adjacent to a T7 RNA polymerase promoter. The capped in vitro transcripts from the cloned cDNA were infectious when mechanically inoculated into C. quinoa and Nicotiana benthamiana plants. This is the first report of the presence of an isolate of HLFPV in Brazil and of the successful synthesis of a biologically active HLFPV-BR full-length cDNA clone.
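Nucleotide identity figures such as the >99.15% quoted above are computed from a pairwise alignment, but conventions differ on how gaps are counted and the abstract does not specify one. The Python sketch below computes percent identity from two already-aligned strings and excludes gapped columns. That convention and the toy sequences are assumptions for illustration.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the
    denominator (one common convention; others count gaps as mismatches).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 5 ungapped columns, 4 identical.
print(percent_identity("ACGT-A", "ACGTTC"))
```

Counting gaps as mismatches instead would give a lower value for the same pair, so the convention should be stated alongside any reported identity.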
An improved model for whole genome phylogenetic analysis by Fourier transform.
Yin, Changchuan; Yau, Stephen S-T
2015-10-07
DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignment (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity when comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information content of DNA sequences by Discrete Fourier Transform (DFT), but it still had some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determining factor, and the mapping reduces nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences of different lengths, we propose an improved even scaling algorithm that extends shorter DFT power spectra to the longest length among the sequences being compared. After the DFT power spectra are evenly scaled, they have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metric. The improved DFT method, whose computational performance is increased by the 2D numerical representation, is applicable to DNA sequences of any length range. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences, including simulated and real datasets.
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure for DNA sequences. Owing to its high efficiency and accuracy, the proposed DFT similarity measure has been successfully applied to phylogenetic analysis of individual genes and of large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
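The pipeline described above has three steps: map a sequence to 2D numbers, take the DFT power spectrum, and evenly scale spectra to a common length before computing a Euclidean distance. The Python sketch below illustrates those steps under stated assumptions. The complex-plane mapping (A=1+i, C=-1+i, G=-1-i, T=1-i), the linear-interpolation form of even scaling and the division of each spectrum by sequence length are all hypothetical choices and may differ from the paper. A naive O(n²) DFT keeps the sketch dependency-free; real genomes would need an FFT.

```python
import cmath
import math

# Hypothetical 2D (complex-plane) mapping; the paper's actual mapping may differ.
NUCLEOTIDE_POINTS = {
    "A": complex(1, 1),
    "C": complex(-1, 1),
    "G": complex(-1, -1),
    "T": complex(1, -1),
}


def power_spectrum(seq):
    """|X_k|^2 for k = 0..n-1, using a naive O(n^2) DFT of the mapped sequence."""
    x = [NUCLEOTIDE_POINTS[b] for b in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def even_scale(p, m):
    """Stretch a length-n spectrum to length m >= n by linear interpolation (assumed form)."""
    n = len(p)
    if m == n:
        return list(p)
    scaled = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)  # position of output index k on the input grid
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        scaled.append(p[lo] * (1 - frac) + p[hi] * frac)
    return scaled


def dft_distance(s1, s2):
    """Euclidean distance between length-normalised, evenly scaled power spectra."""
    p1 = [v / len(s1) for v in power_spectrum(s1)]
    p2 = [v / len(s2) for v in power_spectrum(s2)]
    m = max(len(p1), len(p2))
    p1, p2 = even_scale(p1, m), even_scale(p2, m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


print(dft_distance("ACGTT", "CGTTA"))  # circular rotations give (near-)zero distance
```

The power spectrum discards phase, so it is unchanged by circular rotation of a sequence. The sketch therefore reports rotated sequences as essentially identical, which is worth keeping in mind when interpreting alignment-free distances of this kind.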
Macaca specific exon creation event generates a novel ZKSCAN5 transcript.
Kim, Young-Hyun; Choe, Se-Hee; Song, Bong-Seok; Park, Sang-Je; Kim, Myung-Jin; Park, Young-Ho; Yoon, Seung-Bin; Lee, Youngjeon; Jin, Yeung Bae; Sim, Bo-Woong; Kim, Ji-Su; Jeong, Kang-Jin; Kim, Sun-Uk; Lee, Sang-Rae; Park, Young-Il; Huh, Jae-Won; Chang, Kyu-Tae
2016-02-15
ZKSCAN5 (also known as ZFP95) is a zinc-finger protein belonging to the Krüppel family. ZKSCAN5 contains a SCAN box and a KRAB A domain and is proposed to play a distinct role during spermatogenesis. In humans, alternatively spliced ZKSCAN5 transcripts with different 5'-untranslated regions (UTRs) have been identified. However, investigation of our Macaca UniGene Database revealed novel alternative ZKSCAN5 transcripts that arose due to an exon creation event. Therefore, in this study, we identified the full-length sequences of ZKSCAN5 and its alternative transcripts in Macaca spp. Additionally, we investigated different nonhuman primate sequences to elucidate the molecular mechanism underlying the exon creation event. We analyzed the evolutionary features of the ZKSCAN5 transcripts by reverse transcription polymerase chain reaction (RT-PCR) and genomic PCR, and by sequencing various nonhuman primate DNA and RNA samples. The exon-created transcript was only detected in the Macaca lineage (crab-eating monkey and rhesus monkey). Full-length sequence analysis by rapid amplification of cDNA ends (RACE) identified ten full-length transcripts and four functional isoforms of ZKSCAN5. Protein sequence analyses revealed the presence of two groups of isoforms that arose because of differences in start-codon usage. Together, our results demonstrate that there has been specific selection for a discrete set of ZKSCAN5 variants in the Macaca lineage. Furthermore, study of this locus (and perhaps others) in Macaca spp. might facilitate our understanding of the evolutionary pressures that have shaped the mechanism of exon creation in primates. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Characterization and chromosomal mapping of the human TFG gene involved in thyroid carcinoma
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mencinger, M.; Panagopoulos, I.; Andreasson, P.
1997-05-01
Homology searches in the Expressed Sequence Tag Database were performed using SPYGQ-rich regions as query sequences to find genes encoding protein regions similar to the N-terminal parts of the sarcoma-associated EWS and FUS proteins. Clone 22911 (T74973), encoding a SPYGQ-rich region in its 5′ end, and several other clones that overlapped 22911 were selected. The combined data made it possible to assemble a full-length cDNA sequence. This cDNA sequence is 1677 bp long and contains an ATG initiation codon, an open reading frame of 400 amino acids, a poly(A) signal, and a poly(A) tail. We found 100% identity between the 5′ part of the consensus sequence and the 598-bp-long sequence named TFG. The TFG sequence is fused to the 3′ end of NTRK1, generating the TRK-T3 fusion transcript found in papillary thyroid carcinoma. The cDNA therefore represents the full-length transcript of the TFG gene. TFG was localized to 3q11-q12 by fluorescence in situ hybridization. The 3′ and the 5′ ends of the TFG cDNA probe hybridized to a 2.2-kb band on Northern blot filters in all tissues examined. 28 refs., 5 figs., 1 tab.
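Checking an assembled cDNA for an ATG-initiated open reading frame and a poly(A) signal, as described for the 1677-bp TFG cDNA, can be sketched in a few lines of Python. The example below scans only the forward strand, reports the longest ATG-to-stop ORF, and searches for the canonical AATAAA hexamer. The function names and toy sequence are hypothetical, and the sketch does not handle variant poly(A) signals or nested ORFs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG-initiated ORF on the forward strand.

    Returns (start, end) as 0-based half-open coordinates including the
    stop codon, or None if no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


def poly_a_signal_positions(seq, signal="AATAAA"):
    """0-based positions of the canonical polyadenylation signal hexamer."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(signal) + 1) if seq.startswith(signal, i)]


# Hypothetical toy cDNA: ORF ATG AAA TTT TAG, then an AATAAA signal.
toy = "CCATGAAATTTTAGCCAATAAAGG"
start, end = longest_orf(toy)
print(start, end, (end - start) // 3 - 1)  # protein length excludes the stop codon
print(poly_a_signal_positions(toy))
```

On this convention, a 400-amino-acid ORF spans 1,203 bp including the stop codon.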
3G vector-primer plasmid for constructing full-length-enriched cDNA libraries.
Zheng, Dong; Zhou, Yanna; Zhang, Zidong; Li, Zaiyu; Liu, Xuedong
2008-09-01
We designed a 3G vector-primer plasmid for the generation of full-length-enriched complementary DNA (cDNA) libraries. By employing the terminal transferase activity of reverse transcriptase and a modified strand replacement method, this plasmid (assembled with a poly(dT) end and a deoxyguanosine [dG] end) combines priming of full-length cDNA strand synthesis with directional cDNA cloning. This reduces the number of steps involved in cDNA library preparation and simplifies downstream gene manipulation, sequencing, and subcloning. The 3G vector-primer plasmid method yields fully represented plasmid-primed libraries that are equivalent to those made by the SMART (switching mechanism at 5' end of RNA transcript) approach.
Hehle, Verena K.; Paul, Matthew J.; Roberts, Victoria A.; van Dolleweerd, Craig J.; Ma, Julian K.-C.
2016-01-01
This study examined the degradation pattern of a murine IgG1κ monoclonal antibody expressed in and extracted from transformed Nicotiana tabacum. Gel electrophoresis of leaf extracts revealed a consistent pattern of recombinant immunoglobulin bands, including intact and full-length antibody, as well as smaller antibody fragments. N-terminal sequencing revealed these smaller fragments to be proteolytic cleavage products and identified a limited number of protease-sensitive sites in the antibody light and heavy chain sequences. No strictly conserved target sequence was evident, although the peptide bonds that were susceptible to proteolysis were predominantly and consistently located within or near to the interdomain or solvent-exposed regions in the antibody structure. Amino acids surrounding identified cleavage sites were mutated in an attempt to increase resistance. Different Guy’s 13 antibody heavy and light chain mutant combinations were expressed transiently in N. tabacum and demonstrated intensity shifts in the fragmentation pattern, resulting in alterations to the full-length antibody-to-fragment ratio. The work strengthens the understanding of proteolytic cleavage of antibodies expressed in plants and presents a novel approach to stabilize full-length antibody by site-directed mutagenesis.—Hehle, V. K., Paul, M. J., Roberts, V. A., van Dolleweerd, C. J., Ma, J. K.-C. Site-targeted mutagenesis for stabilization of recombinant monoclonal antibody expressed in tobacco (Nicotiana tabacum) plants. PMID:26712217
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia
Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A.; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A.; Fagiolini, Michela; Hensch, Takao K.; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide
2003-01-01
We report the construction of the mouse full-length cDNA encyclopedia, the most extensive view of a complex transcriptome, on the basis of preparing and sequencing 246 libraries. Before cloning, cDNAs were enriched in full-length by Cap-Trapper, and in most cases, aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′ end reads, which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU), which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC), which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project, which also include non-protein-coding RNAs, and the lower gene number estimation of genome annotations. Altogether, 5′-end clusters identify regions that are potential promoters for 8637 known genes and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete. PMID:12819125
Singh, B N; Mudgil, Yashwanti; Sopory, S K; Reddy, M K
2003-07-01
We have successfully expressed enzymatically active plant topoisomerase II in Escherichia coli for the first time, which has enabled its biochemical characterization. Using a PCR-based strategy, we obtained a full-length cDNA and the corresponding genomic clone of tobacco topoisomerase II. The genomic clone has 18 exons interrupted by 17 introns. Most of the 5' and 3' splice junctions follow the typical canonical consensus dinucleotide sequence GU-AG present in other plant introns. The positions of the introns and their phasing with respect to the primary amino acid sequence are highly conserved between tobacco TopII and Arabidopsis TopII, suggesting that the two genes evolved from a common ancestral type II topoisomerase gene. The cDNA encodes a polypeptide of 1482 amino acids. The primary amino acid sequence shows striking similarity to other eukaryotic type II topoisomerases, preserving all the structural domains conserved among them in an identical spatial order. We have expressed the full-length polypeptide in E. coli and purified the recombinant protein to homogeneity. The full-length polypeptide relaxed supercoiled DNA and decatenated catenated DNA in a Mg(2+)- and ATP-dependent manner, and this activity was inhibited by 4'-(9-acridinylamino)-3'-methoxymethanesulfonanilide (m-AMSA). Immunofluorescence and confocal microscopy studies, with antibodies raised against the N-terminal region of recombinant tobacco topoisomerase II, established the nuclear localization of topoisomerase II in tobacco BY2 cells. Regulated expression of the tobacco topoisomerase II gene under the GAL1 promoter functionally complemented a temperature-sensitive TopII(ts) yeast mutant.
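The GU-AG rule above refers to the first two and last two bases of each intron. In genomic DNA these appear as GT and AG. Given exon coordinates, the introns between consecutive exons can be extracted and their boundary dinucleotides checked, as in the Python sketch below. The coordinate format and the toy gene are hypothetical, and the sketch assumes introns of at least 4 bp on the forward strand.

```python
def intron_boundaries(genomic, exons):
    """Classify each intron between consecutive exons by its terminal dinucleotides.

    genomic: DNA string (so the RNA GU-AG rule appears as GT-AG).
    exons: sorted list of (start, end) 0-based half-open exon coordinates.
    Returns one (donor, acceptor, is_canonical) tuple per intron;
    n exons give n - 1 introns.
    """
    genomic = genomic.upper()
    results = []
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:]):
        intron = genomic[prev_end:next_start]
        donor, acceptor = intron[:2], intron[-2:]
        results.append((donor, acceptor, donor == "GT" and acceptor == "AG"))
    return results


# Hypothetical three-exon toy gene: one GT-AG intron and one non-canonical GC-AG intron.
genomic = "AAAA" + "GTCCCAG" + "TTTT" + "GCAAAAG" + "CC"
exons = [(0, 4), (11, 15), (22, 24)]
for donor, acceptor, canonical in intron_boundaries(genomic, exons):
    print(donor, acceptor, canonical)
```

For an 18-exon gene such as the tobacco TopII clone, the function would return 17 entries, one per intron.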
USDA-ARS's Scientific Manuscript database
Channel catfish, Ictalurus punctatus, T cell receptors (TCR) gamma and delta were identified by mining of expressed sequence tag databases, and full-length sequences were obtained by 5'-RACE and RT-PCR protocols. cDNAs for each of these TCR chains encode typical variable (V), (diversity; D), joining ...
Identification of SHIP-1 and SHIP-2 homologs in channel catfish, Ictalurus punctatus
USDA-ARS's Scientific Manuscript database
Src homology domain 2 (SH2) domain-containing inositol 5’-phosphatases (SHIP) proteins have diverse roles in signal transduction. SHIP-1 and SHIP-2 homologs were identified in channel catfish, Ictalurus punctatus, based on sequence homology to murine and human SHIP sequences. Full-length cDNAs for ...
Rodríguez-Martín, Carlos; Cidre, Florencia; Fernández-Teijeiro, Ana; Gómez-Mariano, Gema; de la Vega, Leticia; Ramos, Patricia; Zaballos, Ángel; Monzón, Sara; Alonso, Javier
2016-05-01
Retinoblastoma (RB, MIM 180200) is the paradigm of hereditary cancer. Individuals harboring a constitutional mutation in one allele of the RB1 gene have a high predisposition to develop RB. Here, we present the first case of familial RB caused by a de novo insertion of a full-length long interspersed element-1 (LINE-1) into intron 14 of the RB1 gene, which caused a highly heterogeneous splicing pattern of RB1 mRNA. The LINE-1 insertion was inferred from mRNA studies and sequenced in full by massive parallel sequencing. Some of the aberrant mRNAs were produced by noncanonical acceptor splice sites, a new finding that, to date, has not been described as a consequence of LINE-1 retrotransposition. Our results clearly show that RNA-based strategies have the potential to detect disease-causing transposon insertions. They also confirm that incorporating new genetic approaches, such as massive parallel sequencing, helps to characterize these unique and exceptional genetic alterations at the sequence level.
Xia, Xichao; Liu, Rongzhi; Li, Yi; Xue, Shipeng; Liu, Qingchun; Jiang, Xiao; Zhang, Wenjuan; Ding, Ke
2014-09-01
Hyaluronidase is a common component of scorpion venom and has been considered a "spreading factor" that promotes fast penetration of the venom in the anaphylactic reaction. In the current study, a novel full-length hyaluronidase, BmHYI, and three noncoding isoforms, BmHYII, BmHYIII and BmHYIV, were cloned using a combined strategy based on peptide sequencing and Rapid Amplification of cDNA Ends (RACE). BmHYI has 410 amino acid residues, including the catalytic and positional residues and five potential N-glycosylation sites. The deduced protein sequence of BmHYI shares significant identity with venom hyaluronidases from bees and snakes. Phylogenetic analysis showed early divergence and independent evolution of BmHYI from other hyaluronidases. An extraordinarily high level of sequence similarity was detected among the four sequences. However, BmHYII, BmHYIII and BmHYIV lack a stop codon in the open reading frame and a poly(A) signal at the 3' end. Copyright © 2014 Elsevier B.V. All rights reserved.
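"Potential N-glycosylation sites" such as the five reported for BmHYI are usually located by scanning the protein for the N-X-S/T sequon, where X is any residue except proline. The Python sketch below does this with a regular expression. A lookahead is used so that overlapping sequons are all reported. The function name and toy peptide are illustrative. Sequon matching only predicts candidate sites and does not show that they are actually glycosylated.

```python
import re

# N-X-S/T with X != P; the lookahead lets overlapping sequons be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def n_glycosylation_sites(protein):
    """1-based positions of Asn residues in N-X-S/T (X != Pro) sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide: NGT and NNS/NST qualify, NPS does not.
print(n_glycosylation_sites("MNGTANPSNNSTQ"))
```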
Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong
2008-06-01
A putative isopentenyltransferase (IPT)-encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3, which corresponded closely with the accumulation of isopentenyladenine and isopentenyladenosine in leaves.
Harrison, Robert A; Ibison, Frances; Wilbraham, Davina; Wagstaff, Simon C
2007-05-01
The immobilisation of prey by snakes is most efficiently achieved by the rapid dissemination of venom from its site of injection into the bloodstream. Hyaluronidase is a common component of snake venoms and has been termed the "venom spreading factor". In the absence of nucleotide or protein sequence data to confirm the functional identity of this venom component, we interrogated a venom gland EST database for the saw-scaled viper, Echis ocellatus (Nigeria), using the gene ontology (GO) term "carbohydrate metabolism". A single sequence matching hyalurononglucosaminidase activity (EOC00242) was found and used to design PCR primers to acquire the full-length cDNA sequence. Although very different from the bee venom and mammalian hyaluronidase sequences, the E. ocellatus sequence retained all the catalytic, positional and structural residues that characterise this class of carbohydrate-metabolising hydrolases. An extraordinarily high level of sequence identity (>95%) was observed in analogous venom gland cDNA sequences isolated (by PCR) from another saw-scaled viper species, E. pyramidum leakeyi (Kenya), and from the Sahara horned viper, Cerastes cerastes cerastes (Egypt), and the puff adder, Bitis arietans (Nigeria). Smaller amplicons, lacking hyaluronidase catalytic residues because of 768-bp or 855-bp central deletions, appear either to encode truncated peptides without hyaluronidase activity or to be non-translated transcripts lacking consensus translation initiation motifs.
Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi
2016-11-01
The nucleotide sequences of the nucleocapsid protein (N), phosphoprotein (P), matrix protein (M), hemagglutinin-neuraminidase (HN), and large polymerase protein (L) genes, the 3'-end leader, the 5'-end trailer, and the intergenic regions of avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with the previously reported fusion protein (F) gene sequence [46], these data complete the genome sequence of APMV/Shimane67. The genome of APMV/Shimane67 is 16,146 nucleotides long and contains six genes in the order 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., the nucleotide length of the whole genome and of each of the six genes, and the predicted amino acid length of each of the six gene products) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between viruses of the same serotype. These results show that the genome sequence of APMV/Shimane67 has specific characteristics and is distinguishable from other types of APMV.
Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia
2014-01-01
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products, including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allows the rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of solution hybrid selection (SHS) for the efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 (GH11) gene family encoding endo-xylanases, for which we designed 35 exploratory 31-mer capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 of 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences revealed hundreds of phylogenetically diverse sequences not yet described in public databases. This protocol offers the possibility of exhaustively exploring eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
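The abstract does not describe how the 31-mer probes were chosen. The sketch below only illustrates the mechanical step of tiling fixed-length candidate probes along a target sequence and filtering them; the GC window, step size, and function names are hypothetical choices, not the study's design criteria.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tile_probes(target, k=31, step=1, gc_min=0.4, gc_max=0.6):
    """Return (start, probe) pairs for k-mers whose GC fraction is in range.

    The GC window is a hypothetical filter used only for illustration.
    """
    target = target.upper()
    probes = []
    for start in range(0, len(target) - k + 1, step):
        probe = target[start:start + k]
        if "N" in probe:
            continue  # skip windows containing ambiguous bases
        if gc_min <= gc_fraction(probe) <= gc_max:
            probes.append((start, probe))
    return probes


candidates = tile_probes("ACGT" * 10)
print(len(candidates))  # 10
```

Real capture-probe design would also weigh conservation across known family members and cross-hybridisation, which this sketch ignores.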
Patnaik, Bharat Bhusan; Kim, Dong Hyun; Oh, Seung Han; Song, Yong-Su; Chanh, Nguyen Dang Minh; Kim, Jong Sun; Jung, Woo-jin; Saha, Atul Kumar; Bindroo, Bharat Bhushan; Han, Yeon Soo
2012-01-01
Background Silkworm fecal matter is considered one of the richest sources of antimicrobial and antiviral proteins, and such economically feasible, eco-friendly proteins acting as secondary metabolites from the insect system can be explored for their practical utility in conferring broad-spectrum disease resistance against pathogenic microbes. Methodology/Principal Findings Silkworm fecal matter extracts prepared in 0.02 M phosphate-buffered saline (pH 7.4) at 60°C were subjected to 40% saturated ammonium sulphate precipitation and purified by gel-filtration chromatography (GFC). SDS-PAGE under denaturing conditions showed a single band at about 21.5 kDa. The peak fraction obtained by GFC was tested for homogeneity using C18 reverse-phase high performance liquid chromatography (HPLC). The activity of the purified protein was tested against selected Gram-positive and Gram-negative bacteria and phytopathogenic Fusarium species, showing concentration-dependent inhibition. The purified bioactive protein was subjected to matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS) and N-terminal sequencing by Edman degradation for its identification. The first 18 N-terminal amino acids following the predicted signal peptide showed homology to plant germin-like proteins (Glps). To characterize the full-length gene sequence in detail, a partial cDNA was cloned and sequenced using degenerate primers, followed by 5′- and 3′-rapid amplification of cDNA ends (RACE-PCR). The full-length cDNA sequence consisted of 630 bp encoding 209 amino acids and corresponded to germin-like proteins involved in plant development and defense. Conclusions/Significance The study reports the characterization of a novel Glp belonging to subfamily 3 from M. alba, through purification of the mature active protein from silkworm fecal matter.
The N-terminal amino acid sequence of the purified protein was found to be similar to the deduced amino acid sequence (without the transit peptide sequence) of the full-length cDNA from M. alba. PMID:23284650
Characterization of AFLAV, a Tf1/Sushi retrotransposon from Aspergillus flavus.
Hua, Sui-Sheng T; Tarun, Alice S; Pandey, Sonal N; Chang, Leo; Chang, Perng-Kuang
2007-02-01
The plasmid pAF28, a genomic clone from Aspergillus flavus NRRL 6541, has been used as a hybridization probe to fingerprint A. flavus strains isolated in corn and peanut fields. The insert of pAF28 contains a 4.5 kb region that encodes a truncated retrotransposon (AfRTL-1). In search of a full-length, intact copy of the retrotransposon, we used a novel PCR cloning strategy to amplify a 3.4 kb region from the genomic DNA of A. flavus NRRL 6541. The fragment was cloned into pCR 4-TOPO. Sequence analysis confirmed that this region encoded putative domains of partial reverse transcriptase, RNase H, and integrase of the predicted retrotransposon. The two flanking long terminal repeats (LTRs) and the sequence between them comprise a putative full-length LTR retrotransposon 7,799 bp in length. This intact retrotransposon sequence is named AFLAV (A. flavus Retrotransposon). The order of the predicted catalytic domains in the polyprotein (Pol) placed AFLAV in the Tf1/sushi subgroup of the Ty3/gypsy retrotransposon family. Primers derived from the AFLAV sequence were used to screen other strains of A. flavus for this retrotransposon. More than fifty strains of A. flavus isolated from different geographical origins were surveyed, and the results show that many strains have extensive deletions in the regions encoding the capsid (Gag) and Pol.
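The classification above rests on the order of catalytic domains in Pol. A minimal sketch, assuming domain calls (PR, RT, RH, IN) have already been made by some annotation tool: Ty1/copia elements place integrase upstream of reverse transcriptase, whereas Ty3/gypsy elements (the family to which the Tf1/sushi group belongs) place it downstream of RT and RNase H. The function name and labels are illustrative.

```python
def classify_ltr_pol(domain_order):
    """Classify an LTR retrotransposon Pol polyprotein by catalytic domain order.

    domain_order: list of domain labels in N- to C-terminal order,
    e.g. ["PR", "RT", "RH", "IN"].
    """
    if "RT" not in domain_order or "IN" not in domain_order:
        return "unclassified"  # e.g. a truncated copy missing a key domain
    rt_pos = domain_order.index("RT")
    in_pos = domain_order.index("IN")
    if in_pos < rt_pos:
        return "Ty1/copia"   # integrase precedes reverse transcriptase
    if "RH" in domain_order and rt_pos < domain_order.index("RH") < in_pos:
        return "Ty3/gypsy"   # RT, then RNase H, then integrase
    return "unclassified"


print(classify_ltr_pol(["PR", "RT", "RH", "IN"]))  # Ty3/gypsy
```

Copies with deletions in Pol, like many of the surveyed strains, would fall through to "unclassified" here.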
Deutscher, Ania T; Burke, Catherine M; Darling, Aaron E; Riegler, Markus; Reynolds, Olivia L; Chapman, Toni A
2018-05-05
Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, and behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts of laboratory adaptation (domestication) and mass-rearing on the tephritid larval gut microbiome. The read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome analyses, and diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize the gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets). Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low overall and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected in adult tephritid species, was detected in larvae of both wild and domesticated populations (55 of 56 larval gut samples). Larvae from the same peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that wild flies from different locations carried different Asaia strains. Variation in the gut bacterial communities of B. tryoni larvae depends on diet, domestication, and horizontal acquisition. Bacterial variation in wild larvae suggests that more than one bacterial species can perform the same functional role; however, Asaia could be an important gut bacterium in larvae and warrants further study. A greater understanding of the functions of the bacteria detected in larvae could lead to increased fly quality and performance as part of the SIT.
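OTU clustering at a fixed similarity threshold, as used above at 99% (and at 100% for Asaia), can be illustrated with a greedy centroid scheme. This is a simplified sketch, assuming the sequences are already aligned to equal length and using a gap-ignoring identity; the study's actual clustering software and parameters are not given here, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else 0.0


def greedy_otus(seqs, threshold=0.99):
    """Assign each sequence to the first OTU centroid it matches at >= threshold.

    Returns a list of OTUs, each a list of input indices.
    """
    centroids = []  # representative (first-seen) sequence of each OTU
    members = []    # indices of the sequences assigned to each OTU
    for i, s in enumerate(seqs):
        for j, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                members[j].append(i)
                break
        else:
            # No existing centroid is close enough: start a new OTU.
            centroids.append(s)
            members.append([i])
    return members
```

Because centroids are fixed in input order, the result depends on that order; tools commonly sort by abundance first for this reason.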
Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.
Nakano, Kazuma; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Ohki, Shun; Shinzato, Misuzu; Minami, Maiko; Nakanishi, Tetsuhiro; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi
2017-07-01
PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence single DNA molecules in real time without amplification. Its sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared with other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and the simultaneous capability of epigenetic characterization. These advantages help overcome the difficulty of sequencing genomic regions with high or low G+C content, tandem repeats, and interspersed repeats. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetic characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is an effective tool not only in the basic biological sciences but also in the medical/clinical setting.
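To illustrate why repeated observations of one molecule can raise consensus accuracy, the sketch below takes a column-wise majority vote over several reads of the same template. It assumes the reads are already aligned to equal length with '-' for gaps; it is a toy illustration and not PacBio's own consensus algorithm. The function name is hypothetical.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over pre-aligned, equal-length reads.

    Columns whose most common symbol is a gap ('-') are dropped.
    Ties are broken by first occurrence in the column.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to equal length")
    consensus = []
    for column in zip(*passes):
        base, _ = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)


# Three noisy passes, each with an independent error at a different position.
print(majority_consensus(["ACGTAC", "ACGAAC", "TCGTAC"]))  # ACGTAC
```

Random errors rarely coincide at the same position across passes, so a vote recovers the template even when every single pass contains a mistake.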
Sequence of Spider Aciniform and Piriform Silks
2001-09-19
Funding number: DAAD19-01-1-0569. ...aciniform glands from Argiope trifasciata were used to construct a cDNA library. The library was probed with various DNA probes based on known spider silk ...sequence in a number of other spider silks. The 5′ end of the clone still appears to be repetitive sequence and thus it is unlikely to be a full-length
Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun
2013-01-01
Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from RNA pooled across leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, comprising 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which showed that these genes were differentially regulated during senescence. qRT-PCR for three GhYLSs revealed that these genes are preferentially expressed in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate gene roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton.
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features
Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna
2005-01-01
Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full-length cDNA was sequenced, and an archetypal secretion signal was predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...
Seafood delicacy makes great adhesive
Idaho National Laboratory - Frank Roberto, Heather Silverman
2017-12-09
Technology from Mother Nature is often hard to beat, so Idaho National Laboratory scientists genetically analyzed the adhesive proteins produced by blue mussels, a seafood delicacy. After obtaining full-length DNA sequences encoding these proteins, reprod
Nandakumar, Subhiksha; Bae, Eunhae H; Khan, Arifa S
2017-08-17
The full-length genome sequence of a simian foamy virus (SFVmmu_K3T), isolated from a rhesus macaque (Macaca mulatta), was obtained using high-throughput sequencing. SFVmmu_K3T consisted of 12,983 bp and had a genomic organization similar to that of other SFVs, with long terminal repeats (LTRs) and open reading frames for Gag, Pol, Env, Tas, and Bet.
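Annotating a genome such as SFVmmu_K3T starts with locating open reading frames. A minimal forward-strand sketch follows, assuming the standard genetic code and ATG starts; real annotation also scans the reverse strand and must handle spliced products, which a scan like this cannot detect. The `min_codons` cutoff and the function name are arbitrary illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=100):
    """Return sorted (start, end) 0-based half-open coordinates of forward-strand ORFs.

    Each ORF runs from the first ATG in a frame to the next in-frame stop
    codon, with the stop codon included in the interval.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and look for the next ATG
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))  # [(2, 14)]
```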
Molecular cloning and nucleotide sequence of CYP6BF1 from the diamondback moth, Plutella xylostella
Li, Hongshan; Dai, Huaguo; Wei, Hui
2005-01-01
A novel cDNA clone encoding a cytochrome P450 was screened from the insecticide-susceptible strain of Plutella xylostella (L.) (Lepidoptera: Yponomeutidae). The nucleotide sequence of the clone, designated CYP6BF1, was determined. This is the first full-length sequence of the CYP6 family from Plutella xylostella (L.). The cDNA is 1,661 bp in length and contains an open reading frame from nucleotide 26 to 1570, encoding a protein of 514 amino acid residues. It is similar to other insect P450s in family 6, including CYP6AE1 from Depressaria pastinacella (46%). The GenBank accession number is AY971374. PMID:17119627
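The ORF coordinates and protein length above are mutually consistent if the coordinates are 1-based, inclusive, and include the stop codon: 1570 - 26 + 1 = 1545 bp, or 515 codons, one of which is the stop. A small helper makes the arithmetic explicit; the function name is hypothetical.

```python
def protein_length_from_orf(start_bp, end_bp, includes_stop=True):
    """Amino acids encoded by an ORF given 1-based, inclusive coordinates."""
    length_bp = end_bp - start_bp + 1
    if length_bp % 3 != 0:
        raise ValueError(f"ORF length {length_bp} bp is not a multiple of 3")
    codons = length_bp // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons


print(protein_length_from_orf(26, 1570))  # 514
```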
Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants
2014-01-01
Background The source inoculum of gastrointestinal tract (GIT) microbes is largely influenced by delivery mode in full-term infants, but these influences may be decoupled in very low birth weight (VLBW, <1,500 g) neonates via conventional broad-spectrum antibiotic treatment. We hypothesized that the built environment (BE), specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the gut of premature VLBW infants. Here, we present the first matched fecal-BE time series analysis of two preterm VLBW neonates housed in a neonatal intensive care unit (NICU) over the first month of life. Results Fresh fecal samples were collected every 3 days, and their metagenomes were sequenced on an Illumina HiSeq2000 device. For each fecal sample, approximately 33 swabs were collected from each NICU room from 6 specified areas: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, and nurse station electronics (keyboard, mouse, and cell phone). Swabs were processed using a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon pipeline, in which full-length 16S rRNA amplicons were sheared and sequenced on an Illumina platform, and the short reads were reassembled into full-length genes. Over 24,000 full-length 16S rRNA sequences were produced, generating an average of approximately 12,000 operational taxonomic units (OTUs) (clustered at 97% nucleotide identity) per room-infant pair. Dominant gut taxa, including Staphylococcus epidermidis, Klebsiella pneumoniae, Bacteroides fragilis, and Escherichia coli, were widely distributed throughout the room environment, with many gut colonizers detected in more than half of the samples.
Reconstructed genomes from infant gut colonizers revealed a suite of genes that confer resistance to antibiotics (for example, tetracycline, fluoroquinolone, and aminoglycoside) and sterilizing agents, which likely offer a competitive advantage in the NICU environment. Conclusions We have developed a high-throughput culture-independent approach that integrates room surveys based on full-length 16S rRNA gene sequences with metagenomic analysis of fecal samples collected from infants in the room. The approach enabled identification of discrete ICU reservoirs of microbes that also colonized the infant gut and provided evidence for the presence of certain organisms in the room prior to their detection in the gut. PMID:24468033
Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R
2007-04-01
We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968, and the EBLV-2 sequence was derived from a virus isolated from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The complete viral genomes of EBLV-1 and EBLV-2 were 11,966 and 11,930 base pairs long, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences shows varying degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix protein, and glycoprotein, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may help explain differences in pathogenicity between lyssavirus strains, since the TTP site used determines the length of the untranslated region, which might affect transcriptional activity and RNA stability.
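Locating a TTP motif relative to a gene start, as described above, reduces to a substring search upstream of the start codon. The sketch below assumes 0-based coordinates; the motif string in the example is a placeholder, since the abstract does not give the TTP sequence, and the function name and window size are hypothetical.

```python
def motif_positions_upstream(seq, start_codon_pos, motif, window=200):
    """Distances (nt) from each motif occurrence to a downstream start codon.

    start_codon_pos is the 0-based index of the first base of the start codon.
    Only occurrences that begin within `window` nt upstream and end before
    the start codon are reported; the distance is measured from the motif's
    first base to the start codon's first base.
    """
    seq, motif = seq.upper(), motif.upper()
    lo = max(0, start_codon_pos - window)
    hits = []
    i = seq.find(motif, lo)
    while i != -1 and i + len(motif) <= start_codon_pos:
        hits.append(start_codon_pos - i)
        i = seq.find(motif, i + 1)  # allow overlapping occurrences
    return hits


# Toy sequence with a placeholder motif; the ATG starts at index 16.
print(motif_positions_upstream("GGTTTT" + "C" * 10 + "ATG", 16, "TTTT"))  # [14]
```

When two candidate sites exist, as in the lyssavirus case, the returned list contains both distances, which makes the UTR-length difference between them directly visible.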
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process in which structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). The carbon-alpha force field (CALF) builds sequence-specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts, and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function for nonlocal interactions, 11 full-length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
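The 2.6 Å criterion above is an RMSD between simulated cluster centers and native Cα positions. A minimal sketch, assuming the two coordinate sets are already optimally superposed; in practice a fitting step such as the Kabsch algorithm is applied first, and that step is omitted here. The function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the structures are already optimally superposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                  for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # 5.0
```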
Lv, Daoyuan; Song, Ping; Chen, Yungui; Gong, Wuming; Mo, Saijun
2005-04-08
Using the digital differential display program of the National Center for Biotechnology Information, we identified a contig of expressed sequence tags (ESTs) (Accession No. BM316936) derived from zebrafish ovary and testis libraries. The full-length cDNA of this transcript was cloned and further confirmed by polymerase chain reaction and sequencing. The full-length cDNA of the novel gene is 807 bp and encodes a novel protein of 187 amino acids, which shares no significant homology with any other known protein. Characterization of the genomic sequence revealed that the gene spans 6 kb on linkage group 3 and is composed of five exons and four introns. RT-PCR analysis showed that it was expressed in mature oocytes and at the one-cell stage, and that expression persisted until 24 h of development. RT-PCR also revealed that it is expressed in gonad and kidney, with the highest level of expression in the testis. In situ hybridization further localized its expression in the adult gonad to oogonia and growing oocytes in the ovary, and to spermatogonia and spermatocytes, but not spermatids, in the testis. Based on its abundance in testis and in germline stem cells (spermatogonia and oogonia), we hypothesize that it may function as a testicular development and gametogenesis-related gene that plays important roles in spermatogenesis, and we named it Zsrg (zebrafish testis spermatogenesis-related gene).
Mouw, M; Pintel, D J
1998-11-10
GST-NS1 purified from Escherichia coli and insect cells binds double-strand DNA in an (ACCA)2-3-dependent fashion under similar ionic conditions, independent of the presence of anti-NS1 antisera or exogenously supplied ATP, and interacts with single-strand DNA and RNA in a sequence-independent manner. An amino-terminal domain (amino acids 1-275) of NS1 [GST-NS1(1-275)], representing 41% of the full-length NS1 molecule, includes a domain that binds double-strand DNA in a sequence-specific manner at levels comparable to full-length GST-NS1, as well as single-strand DNA and RNA in a sequence-independent manner. The deletion of 15 additional amino-terminal amino acids yielded a molecule [GST-NS1(16-275)] that maintained (ACCA)2-3-specific double-strand DNA binding; however, this molecule was more sensitive to increasing ionic strength than full-length GST-NS1 and GST-NS1(1-275) and could not be demonstrated to bind single-strand nucleic acids. A quantitative filter binding assay showed that E. coli- and baculovirus-expressed GST-NS1 and E. coli GST-NS1(1-275) specifically bound double-strand DNA with similar affinity [as measured by their apparent equilibrium DNA binding constants (KD)], whereas GST-NS1(16-275) bound 4- to 8-fold less well. Copyright 1998 Academic Press.
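The comparison above is framed in terms of apparent equilibrium constants. As an illustration only, assuming simple 1:1 binding with protein in excess over DNA (so free protein is approximately total protein), the fraction of DNA bound follows a hyperbola in KD; if "less well" corresponds to a 4- to 8-fold higher KD, binding at a fixed protein concentration drops accordingly. The binding model, function name, and concentrations are assumptions, not taken from the study.

```python
def fraction_bound(protein_conc, kd):
    """Fraction of DNA bound for 1:1 binding with protein in large excess.

    protein_conc and kd must be in the same units (e.g. nM).
    """
    if protein_conc < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and kd positive")
    return protein_conc / (kd + protein_conc)


# Hypothetical: 1 nM protein, KD of 1 nM vs. a 4-fold weaker KD of 4 nM.
print(fraction_bound(1.0, 1.0))  # 0.5
print(fraction_bound(1.0, 4.0))  # 0.2
```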
Mini-DNA barcode in identification of the ornamental fish: A case study from Northeast India.
Dhar, Bishal; Ghosh, Sankar Kumar
2017-09-05
Ornamental fishes are exported under trade or generic names, which creates problems in species identification. DNA barcoding can effectively establish the actual species status. However, problems arise when a specimen is taxonomically disputed or mislabelled under a trade or generic name. Barcoding archival museum specimens would help address such issues by creating a firm, error-free reference database for rapid identification of any species. This can be achieved only by generating short sequences, because DNA from chemically preserved specimens is mostly degraded. Here we aimed to identify a short stretch of informative sites within the full-length barcode segment capable of delineating a diverse group of ornamental fish species commonly traded from NE India. We analyzed 287 full-length barcode sequences from the major fish orders and compared the interspecific K2P distances with nucleotide substitution patterns, finding a strong correlation between interspecific distance and transversions (0.95, p<0.001). We therefore proposed a short, transversion-rich 171 bp segment as a mini-barcode. The proposed segment was compared with the full-length barcodes and found to delineate the species effectively. Successful PCR amplification and sequencing of the 171 bp segment using primers designed for different orders validated it as a mini-barcode for ornamental fishes. Thus, our findings should help strengthen the global database with sequences from archived fish specimens and provide an effective, less time-consuming, cost-effective, field-based tool for identifying traded ornamental fish species. Copyright © 2017 Elsevier B.V. All rights reserved.
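The Kimura two-parameter (K2P) distance named above corrects the raw proportion of differences separately for transitions (P, purine to purine or pyrimidine to pyrimidine) and transversions (Q): d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal sketch, assuming the two sequences are already aligned; gaps and ambiguity codes are skipped, and the function name is illustrative. The same function could be applied to a full barcode or to a 171 bp sub-segment of it.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Only columns where both sequences have an unambiguous base are compared.
    """
    valid = PURINES | PYRIMIDINES
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```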
Cloning, sequencing, and expression of cDNA for human β-glucuronidase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oshima, A.; Kyle, J.W.; Miller, R.D.
1987-02-01
The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides, as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.
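Whether a deletion inside a coding sequence preserves the reading frame is simple arithmetic: 153 = 3 × 51, so a 153-bp deletion within the coding region removes 51 codons' worth of sequence without shifting the downstream frame. That is consistent with the deleted clone still yielding an immunoprecipitable protein. A small helper makes the check explicit; the function name is hypothetical.

```python
def deletion_effect(deletion_bp):
    """Describe how a deletion inside a coding sequence affects the reading frame."""
    codons, remainder = divmod(deletion_bp, 3)
    if remainder == 0:
        return (f"in-frame: {codons} codons' worth of sequence removed; "
                "downstream frame preserved")
    return f"frameshift: shifts the downstream frame by {remainder} nt"


print(deletion_effect(153))
```

If the deletion does not start on a codon boundary, the residue at the junction can also change, but the downstream frame is still preserved.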
Use of designed sequences in protein structure recognition.
Kumar, Gayatri; Mudgal, Richa; Srinivasan, Narayanaswamy; Sandhya, Sankaran
2018-05-09
Knowledge of protein structure is a prerequisite for improved understanding of molecular function. The gap between sequence space and structure space has widened in the post-genomic era. Grouping related protein sequences into families can help narrow this gap. In the Pfam database, a structural description covering part or all of the protein length is available for 7726 families. For the remaining 52% of families, information on 3-D structure is not yet available. We use computationally designed sequences that are intermediately related to two protein domain families already known to share the same fold. These strategically designed sequences enable the detection of distant relationships, and here we have employed them for structure recognition of protein families of unknown structure. We first measured the success rate of our approach using a dataset of protein families of known fold and achieved a success rate of 88%. Next, for 1392 families of unknown structure, we made structural assignments for part or all of the protein length. Fold associations for 423 domains of unknown function (DUFs) are provided as a step towards functional annotation. The results indicate that knowledge-based filling of gaps in protein sequence space is a promising approach for structure recognition. Such sequences assist in traversing protein sequence space and effectively function as 'linkers' where natural linkers between distant proteins are unavailable. This article was reviewed by Oliviero Carugo, Christine Orengo and Srikrishna Subramanian.
Guo, Yan; Zhang, Jinliang; Yan, Yongfeng; Wu, Jian; Zhu, Nengwu; Deng, Changyan
2015-01-01
Polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) analysis, followed by sub-cloning and sequencing, was used in this study to analyze the molecular phylogenetic diversity and spatial distribution of bacterial communities at different locations in composted swine manure during the cooling stage. Total microbial DNA was extracted, and bacterial near full-length 16S rRNA genes were amplified, cloned, RFLP-screened, and sequenced. A total of 420 positive clones were classified by RFLP and near full-length 16S rDNA sequences. Approximately 48 operational taxonomic units (OTUs) were found among 139 positive clones from the superstratum sample, 26 among 149 clones from the middle-level sample, and 35 among 132 clones from the substrate sample. Thermobifida fusca was common in the superstratum layer of the pile, some Bacillus spp. were prominent in the middle-level layer, and Clostridium sp. was dominant in the substrate layer. Of the 109 OTUs, 99 displayed homology with sequences in the GenBank database, while ten OTUs were not closely related to any known species. The superstratum sample had the highest microbial diversity, and distinct bacterial communities were detected in the three layers. This study demonstrated the spatial characteristics of the microbial community distribution in the cooling stage of swine manure compost. PMID:25925066
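The abstract compares diversity across layers without naming an index. Two common measures are sketched below, assuming each clone has already been assigned an OTU label (for example by its RFLP pattern): the Shannon index H' = -Σ p_i ln p_i and Good's coverage, 1 - (singleton OTUs / clones). The choice of indices, the function names, and the example labels are illustrative.

```python
import math
from collections import Counter


def shannon_index(otu_labels):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTU relative abundances."""
    counts = Counter(otu_labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no clones")
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def goods_coverage(otu_labels):
    """Good's coverage: 1 - (number of OTUs seen exactly once / number of clones)."""
    counts = Counter(otu_labels)
    singletons = sum(1 for c in counts.values() if c == 1)
    return 1 - singletons / len(otu_labels)


# Hypothetical OTU assignments for six clones from one layer.
clones = ["otu1", "otu1", "otu2", "otu3", "otu3", "otu4"]
print(round(goods_coverage(clones), 3))  # 0.667
```

A low coverage value, as with many singleton OTUs, suggests that further clones would still reveal new OTUs.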
USDA-ARS's Scientific Manuscript database
In this paper, we report the full-length coding sequence of bovine ATGL cDNA and analyze its expression in bovine tissues. Similar to human, mouse, and pig ATGL sequences, bovine ATGL has a highly conserved patatin domain that is necessary for lipolytic function in mice and humans. Thi...
USDA-ARS's Scientific Manuscript database
The cDNA of an NADH dehydrogenase (ubiquinone) Fe-S protein 8 subunit (NDUFS8) gene from Aedes (Ochlerotatus) taeniorhynchus Wiedemann has been cloned and sequenced. The full-length mRNA sequence (824 bp) of AetNDUFS8 contains an open reading frame of 651 bp (i.e., 217 amino acids). To detect whether ...
Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing
2015-02-01
The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals, Scandentia. It has been suggested that the tree shrew could serve as an animal model for studying human diseases; however, the tree shrew genome sequence remains largely uncharacterized. In the present study, we report the full-length cDNA sequence of the housekeeping gene β-actin in the tree shrew. The amino acid sequence of tree shrew β-actin was compared with those of humans and other species, and a simple phylogenetic relationship was identified. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, a generally conserved housekeeping gene, in the tree shrew were similar to those in humans, although expression levels varied among tissue types in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further support the potential use of the tree shrew as an animal model for studying human disorders.
NASA Astrophysics Data System (ADS)
Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid
2017-02-01
Structure prediction can provide information about the function and active sites of a protein, which helps in designing new functional proteins. H+-pyrophosphatase is a transmembrane protein involved in establishing the proton motive force for active transport of Na+ across membranes by Na+/H+ antiporters. A novel full-length H+-pyrophosphatase gene was isolated from the halophytic grass Leptochloa fusca using RT-PCR and RACE. The full-length LfVP1 gene sequence of 2292 nucleotides encodes a protein of 764 amino acids. The DNA and protein sequences were characterized using bioinformatics tools. Various important potential sites were predicted by the PROSITE web server. Primary structure analysis indicated that LfVP1 is a stable protein, and its grand average of hydropathy (GRAVY) indicated that the LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that the LfVP1 protein sequence contains a significant proportion of alpha helix and random coil. Membrane topology prediction suggested the presence of 14 transmembrane domains, with a catalytic domain in TM3. The three-dimensional structure predicted from the LfVP1 protein sequence also indicated 14 transmembrane domains, and a hydrophobicity surface model depicted amino acid hydrophobicity. A Ramachandran plot showed that 98% of amino acid residues were predicted to lie in the favored region.
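GRAVY is the mean Kyte-Doolittle hydropathy value over all residues of a protein. Negative values indicate an overall hydrophilic sequence, and positive values indicate a hydrophobic one. The sketch below shows that calculation under one assumption: unknown residue codes are skipped rather than rejected. The abstract does not name the tool the authors actually used.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    # Assumption: non-standard symbols (X, *, gaps) are ignored.
    residues = [aa for aa in protein.upper() if aa in KD]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(KD[aa] for aa in residues) / len(residues)


print(gravy("IIII"))  # a purely hydrophobic toy peptide
print(gravy("AR"))    # mean of 1.8 and -4.5
```

Because GRAVY averages over the whole chain, a protein with many transmembrane helices can still get a moderate value if its loops are charged. The single number says little about local topology.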
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.
2010-01-22
The subcellular location of a protein is useful information for determining its function, screening drug candidates, designing vaccines, annotating gene products, and selecting relevant proteins for further study. Computational prediction of subcellular localization aims to predict the location of a protein from its amino acid sequence. To be more accurate, a computational localization prediction method should exploit all relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used together with multiple physicochemical properties, amino acid composition, three-part amino acid composition, and sequence similarity to predict the subcellular location of the protein. Support Vector Machines are designed for four modules, and the final prediction is made by a weighted voting system. Our system achieves prediction accuracies of 100%, 82.47%, and 88.81% in the self-consistency test, jackknife test, and independent data test, respectively. Our results provide evidence that prediction based on biological features derived from the full-length amino acid sequence is more accurate than prediction based on features derived from the N-terminus alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and enhances prediction accuracy.
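The abstract says the outputs of four SVM modules are combined by weighted voting, but it does not give the weighting scheme. The sketch below shows one simple form, in which each module's predicted label receives that module's weight and the label with the largest total wins. The module names and weights are hypothetical placeholders, not the authors' values.

```python
from collections import defaultdict


def weighted_vote(predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Return the label with the highest summed weight across modules."""
    scores: dict[str, float] = defaultdict(float)
    for module, label in predictions.items():
        scores[label] += weights[module]
    # Ties resolve to the label that first reached the maximum in insertion order.
    return max(scores, key=scores.get)


# Hypothetical module outputs and weights (e.g. weights could be set from
# each module's validation accuracy; that choice is an assumption here).
predictions = {
    "atomic_composition": "nucleus",
    "physicochemical": "cytoplasm",
    "aa_composition": "nucleus",
    "similarity": "cytoplasm",
}
weights = {
    "atomic_composition": 0.3,
    "physicochemical": 0.2,
    "aa_composition": 0.2,
    "similarity": 0.25,
}
print(weighted_vote(predictions, weights))
```

A plain majority vote would tie 2-2 in this toy case. The weights are what break the tie, which is the usual motivation for weighting modules of unequal reliability.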
The protein structure prediction problem could be solved using the current PDB library
Zhang, Yang; Skolnick, Jeffrey
2005-01-01
For single-domain proteins, we examine the completeness of the structures in the current Protein Data Bank (PDB) library for use in full-length model construction of unknown sequences. To address this issue, we employ a comprehensive benchmark set of 1,489 medium-size proteins that cover the PDB at the level of 35% sequence identity and identify templates by structure alignment. With homologous proteins excluded, we can always find folds similar to the native structure, with an average root-mean-square deviation (RMSD) from native of 2.5 Å at ≈82% alignment coverage. These template structures often contain a significant number of insertions/deletions. The TASSER algorithm was applied to build full-length models, in which continuous fragments are excised from the top-scoring templates and reassembled under the guidance of an optimized force field that includes consensus restraints taken from the templates and knowledge-based statistical potentials. For almost all targets (all but 2 of the 1,489), the resulting full-length models have an RMSD to native below 6 Å (97% of them below 4 Å). On average, the RMSD of the full-length models is 2.25 Å, with the aligned regions improved from 2.5 Å to 1.88 Å, comparable to the accuracy of low-resolution experimental structures. Furthermore, starting from state-of-the-art structural alignments, we demonstrate a methodology that can consistently bring template-based alignments closer to native. These results strongly suggest that the protein-folding problem can in principle be solved based on the current PDB library by developing efficient fold recognition algorithms that can recover such initial alignments. PMID:15653774
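The RMSD figures quoted above compare a model with the native structure over equivalent atoms. The sketch below shows only the final formula, RMSD = sqrt(mean squared distance between paired atoms). It assumes the two coordinate lists are already paired (e.g. one Cα per aligned residue) and already optimally superimposed. Real comparisons first compute that superposition, for example with the Kabsch algorithm, which is omitted here.

```python
import math

Coord = tuple[float, float, float]


def rmsd(a: list[Coord], b: list[Coord]) -> float:
    """RMSD between two equally long, pre-superimposed lists of 3-D points."""
    if len(a) != len(b) or not a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(squared / len(a))


model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # every atom displaced by 1 Å
print(rmsd(model, native))
```

Note that this toy pair differs only by a translation. An optimal superposition would bring its RMSD to 0, which is why the superposition step matters in practice.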
RNA circularization reveals terminal sequence heterogeneity in a double-stranded RNA virus.
Widmer, G
1993-03-01
Double-stranded RNA (dsRNA) viruses, termed LRV1, have been found in several strains of the protozoan parasite Leishmania. With the aim of constructing a full-length cDNA copy of the viral genome, including its terminal sequences, a protocol based on PCR amplification across the 3'-5' junction of circularized RNA was developed. This method proved to be applicable to dsRNA. It provided a relatively simple alternative to one-sided PCR, without the loss of specificity inherent in the use of generic primers. LRV1 terminal nucleotide sequences obtained by this method showed considerable variation in length, particularly at the 5' end of the positive strand, as well as the potential to form 3' overhangs. The opposite genomic end terminates in 0, 1, or 2 TCA trinucleotide repeats. These results are compared with terminal sequences derived from one-sided PCR experiments.
High-resolution phylogenetic microbial community profiling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, Esther; Coleman-Derr, Devin; Bowman, Brett
2014-03-17
The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is partly due to low read accuracy, potential chimeric sequences produced during PCR amplification, and the low resolution of short amplicons. To improve the phylogenetic classification of novel species and advance our knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full-length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large number of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. The resolving power and community representation accuracy of SMRT sequencing data appeared to be independent of the GC content of microbial genomes and were higher than those of Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences; for example, full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits for characterizing uncharted microbial communities.
Ribosomal RNA Genes Contribute to the Formation of Pseudogenes and Junk DNA in the Human Genome.
Robicheau, Brent M; Susko, Edward; Harrigan, Amye M; Snyder, Marlene
2017-02-01
Approximately 35% of the human genome can be identified as sequence devoid of a selected-effect function and not derived from transposable elements or repeated sequences. We provide evidence supporting a known origin for a fraction of this sequence. We show that: 1) highly degraded but near-full-length ribosomal DNA (rDNA) units, including both 45S and the Intergenic Spacer (IGS), can be found at multiple sites in the human genome on chromosomes without rDNA arrays; 2) these rDNA sequences have a propensity to be centromere-proximal; and 3) sequence at all human functional rDNA array ends is divergent from canonical rDNA to the point that it is pseudogenic. We also show that small sequence strings of rDNA (from 45S + IGS) are distributed throughout the genome and are identifiable as an "rDNA-like signal", representing 0.26% of the q-arm of HSA21 and ∼2% of the total sequence of other regions tested. The sizes of sequence strings in the rDNA-like signal intergrade with the sizes of sequence strings that make up the full-length degrading rDNA units scattered throughout the genome. We conclude that the displaced and degrading rDNA sequences are likely of similar origin but represent different stages in their evolution towards random sequence. Collectively, our data suggest that over vast evolutionary time, rDNA arrays contribute to the production of junk DNA. The production of rDNA pseudogenes as a by-product of concerted evolution represents a previously under-appreciated process, and we demonstrate its importance here. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.
2006-01-01
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full-ORF-verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each composed of an allotetraploidization-derived X. laevis gene pair and its shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
SILVA tree viewer: interactive web browsing of the SILVA phylogenetic guide trees.
Beccati, Alan; Gerken, Jan; Quast, Christian; Yilmaz, Pelin; Glöckner, Frank Oliver
2017-09-30
Phylogenetic trees are an important tool for studying the evolutionary relationships among organisms. The huge number of available taxa makes their interactive visualization difficult. This hampers user interaction and, with it, the collection of feedback for further improving the taxonomic framework. The SILVA Tree Viewer is a web application designed for visualizing large phylogenetic trees without requiring the download of any software tool or data files. The SILVA Tree Viewer is based on Web Geographic Information Systems (Web-GIS) technology with a PostgreSQL backend. It enables zoom and pan functionality similar to Google Maps. The SILVA Tree Viewer provides access to two phylogenetic (guide) trees provided by the SILVA database: the SSU Ref NR99, inferred from high-quality, full-length small subunit sequences clustered at 99% sequence identity, and the LSU Ref, inferred from high-quality, full-length large subunit sequences. The Tree Viewer provides tree navigation, search and browse tools, as well as an interactive feedback system to collect any kind of request, ranging from taxonomy to data curation and improvements to the tool itself.
Ma, Junguo; Bu, Yanzhen; Li, Yao; Niu, Daichun; Li, Xiaoyu
2014-06-01
The full-length sequence of a cytochrome P450 3A138 (CYP3A138) cDNA from common carp was cloned and sequenced. Transcript levels and microsomal enzyme activity of CYP3A138 in fish liver after rifampicin exposure were also determined in this study. The results showed that the full-length CYP3A138 cDNA is 1912 base pairs (bp) long and contains an open reading frame of 1551 bp encoding a protein of 517 amino acids. Sequence analysis revealed that CYP3A138 is highly conserved in fish. Furthermore, quantitative real-time PCR revealed that CYP3A138 is constitutively expressed in all tissues of common carp, but mainly in the liver and intestine. Additionally, rifampicin exposure increased both CYP3A138 expression at the transcriptional level and the activity of the protein, suggesting that CYP3A138 is a member of the CYP3A subfamily. © 2014 Wiley Periodicals, Inc.
Isoform Sequencing Provides a More Comprehensive View of the Panax ginseng Transcriptome.
Jo, Ick-Hyun; Lee, Jinsu; Hong, Chi Eun; Lee, Dong Jin; Bae, Wonsil; Park, Sin-Gi; Ahn, Yong Ju; Kim, Young Chang; Kim, Jang Uk; Lee, Jung Woo; Hyun, Dong Yun; Rhee, Sung-Keun; Hong, Chang Pyo; Bang, Kyong Hwan; Ryu, Hojin
2017-09-15
Korean ginseng (Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offer a more comprehensive view of functional genomics in P. ginseng, we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated, with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and in plant hormonal signaling pathways, including those of auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings confirmed the rapid upregulation of negative feedback loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the canonical auxin and cytokinin signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana. Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng. In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable transcriptomic reference resources for P. ginseng and can be used for comparative analyses in closely related medicinal plants.
Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose
2012-02-01
The continuous emergence of new strains of canine parvovirus (CPV), poorly protected against by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Understanding the genetic variation in emerging CPV strains is therefore crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding the main capsid protein (VP2) of 11 representative Argentine field strains of canine parvovirus type 2 (CPV-2), selected from a total of 75 positive samples studied in our laboratory over the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at the nucleotide and amino acid levels, respectively). However, the appearance of a new substitution at position 440 (T440A) in four CPV-2c Argentine strains obtained after 2009 supports the variability reported at this position, which lies within the three-fold spike of VP2. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America, and it shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.
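Percent identity values such as 99.6-100% are computed from an alignment, and definitions differ in how gaps are treated. The sketch below uses one common convention: identical positions divided by the number of columns where neither sequence has a gap. Whether the study used this denominator is an assumption. The alignment itself is taken as given, for example from a multiple-alignment tool.

```python
def percent_identity(aln1: str, aln2: str) -> float:
    """Percent of identical positions over gap-free columns of two aligned sequences."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue  # one convention: ignore columns containing a gap
        compared += 1
        identical += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


print(percent_identity("ACGT", "ACGA"))  # one mismatch in four columns
print(percent_identity("AC-T", "ACGT"))  # gap column excluded
```

The same function applies to translated VP2 sequences, which is how nucleotide-level and amino acid-level identities can differ for the same pair of strains.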
Structure-based inhibitors of tau aggregation
NASA Astrophysics Data System (ADS)
Seidler, P. M.; Boyer, D. R.; Rodriguez, J. A.; Sawaya, M. R.; Cascio, D.; Murray, K.; Gonen, T.; Eisenberg, D. S.
2018-02-01
Aggregated tau protein is associated with over 20 neurological disorders, including Alzheimer's disease. Previous work has shown that the tau sequence segments VQIINK and VQIVYK drive its aggregation, but inhibitors based on the structure of the VQIVYK segment only partially inhibit full-length tau aggregation and are ineffective at inhibiting seeding by full-length fibrils. Here we show that the VQIINK segment is the more powerful driver of tau aggregation. Two structures of this segment, determined by micro-electron diffraction (MicroED), a cryo-electron microscopy method, explain its dominant influence on tau aggregation. Of practical significance, the structures lead to the design of inhibitors that not only inhibit tau aggregation but also inhibit the ability of exogenous full-length tau fibrils to seed intracellular tau in HEK293 biosensor cells into amyloid. We also raise the possibility that the two VQIINK structures represent amyloid polymorphs of tau that may account for a subset of prion-like strains of tau.
Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe; Avarre, Jean-Christophe
2016-01-01
Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus are relatively abundant in the literature, little is known about its genomic diversity or about the molecular mechanisms that lead to such high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched for CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using the KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated with the estimated number of viral copies in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2 × 10⁷. The average sequencing depth was >200 for all samples, allowing variants to be searched for with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% sequence identity). By enabling full-genome comparisons directly from infected fish tissues, this new method will be valuable for tracing outbreaks rapidly and at reasonable cost, and in turn for understanding the transmission routes of CyHV-3.
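The abstract reports two summary numbers: reference coverage (>99.9%) and average sequencing depth (>200). The sketch below shows how both could be derived from per-position read depths against a reference such as KHV-J (AP008984). It assumes a plain list of depths, one per reference position, and treats a position as "covered" at depth ≥ 1. Both the input format and the threshold are illustrative assumptions, not the authors' pipeline.

```python
def coverage_stats(depths: list[int], min_depth: int = 1) -> tuple[float, float]:
    """Return (breadth of coverage in %, mean depth) from per-position depths."""
    if not depths:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = 100.0 * covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy 4-position "reference": one position has no reads at all.
breadth, mean_depth = coverage_stats([0, 100, 200, 300])
print(breadth, mean_depth)
```

Raising `min_depth` (for example, to the depth needed for confident variant calls) gives a stricter breadth figure than simple presence of reads.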
Hammoumi, Saliha; Vallaeys, Tatiana; Santika, Ayi; Leleux, Philippe; Borzym, Ewa; Klopp, Christophe
2016-01-01
Koi herpesvirus disease (KHVD) is an emerging disease that causes mass mortality in koi and common carp, Cyprinus carpio L. Its causative agent is Cyprinid herpesvirus 3 (CyHV-3), also known as koi herpesvirus (KHV). Although data on the pathogenesis of this deadly virus are relatively abundant in the literature, little is known about its genomic diversity or about the molecular mechanisms that lead to such high virulence. In this context, we developed a new strategy for sequencing full-length CyHV-3 genomes directly from infected fish tissues. Total genomic DNA extracted from carp gill tissue was specifically enriched for CyHV-3 sequences through hybridization to a set of nearly 2 million overlapping probes designed to cover the entire genome length, using the KHV-J sequence (GenBank accession number AP008984) as reference. Applied to 7 CyHV-3 specimens from Poland and Indonesia, this targeted genomic enrichment enabled recovery of the full genomes with >99.9% reference coverage. The enrichment rate was directly correlated with the estimated number of viral copies in the DNA extracts used for library preparation, which varied between ∼5000 and ∼2 × 10⁷. The average sequencing depth was >200 for all samples, allowing variants to be searched for with high confidence. Sequence analyses highlighted a significant proportion of intra-specimen sequence heterogeneity, suggesting the presence of mixed infections in all investigated fish. They also showed that inter-specimen genetic diversity at the genome scale was very low (>99.95% sequence identity). By enabling full-genome comparisons directly from infected fish tissues, this new method will be valuable for tracing outbreaks rapidly and at reasonable cost, and in turn for understanding the transmission routes of CyHV-3. PMID:27703859
Einer-Jensen, Katja; Winton, James R.; Lorenzen, Niels
2005-01-01
The aim of this study was to develop a standardized molecular assay that uses limited resources and equipment for routine genotyping of isolates of the fish rhabdovirus viral haemorrhagic septicaemia virus (VHSV). Computer-generated restriction maps, based on 62 unique full-length (1524 nt) sequences of the VHSV glycoprotein (G) gene, were used to predict restriction fragment length polymorphism (RFLP) patterns, which were then grouped and compared with a phylogenetic analysis of the G-gene sequences of the same set of isolates. Digestion of PCR amplicons of the full-length G-gene with a set of three restriction enzymes was predicted to accurately assign the VHSV isolates to the four major genotypes discovered to date. Further sub-typing of the isolates into the recently described sub-lineages of genotype I was possible by applying three additional enzymes. Experimental evaluation of the method consisted of three steps: (i) RT-PCR amplification of the G-gene of VHSV isolates using purified viral RNA as template, (ii) digestion of the PCR products with a panel of restriction endonucleases and (iii) interpretation of the resulting RFLP profiles. The RFLP analysis was shown to approximate the level of genetic discrimination obtained by other, more labour-intensive molecular techniques such as the ribonuclease protection assay or sequence analysis. In addition, 37 previously uncharacterised isolates from diverse sources were assigned to specific genotypes. While the assay was able to distinguish between marine and continental isolates of VHSV, the differences did not correlate with the pathogenicity of the isolates.
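The "computer-generated restriction maps" step can be sketched as follows: find each enzyme's recognition site in the amplicon, convert site positions to cut positions, and report the resulting fragment lengths. This minimal version assumes a linear amplicon, palindromic recognition sites (so searching one strand is enough), and top-strand cut offsets. EcoRI (G^AATTC) and HaeIII (GG^CC) are used only as familiar examples. The abstract does not name the six enzymes in the actual VHSV panel.

```python
def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Top-strand cut positions for every (possibly overlapping) site occurrence."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)  # step by 1 to catch overlapping sites
    return positions


def rflp_fragments(seq: str, enzymes: dict[str, tuple[str, int]]) -> list[int]:
    """Fragment lengths (largest first) of a linear amplicon digested by all enzymes."""
    cuts = sorted({
        p
        for site, offset in enzymes.values()
        for p in cut_positions(seq, site, offset)
        if 0 < p < len(seq)  # ignore cuts at the very ends
    })
    bounds = [0, *cuts, len(seq)]
    return sorted((end - start for start, end in zip(bounds, bounds[1:])), reverse=True)


# Example enzymes: name -> (recognition site, cut offset within the site).
enzymes = {"EcoRI": ("GAATTC", 1), "HaeIII": ("GGCC", 2)}
print(rflp_fragments("AAAGAATTCAAA", {"EcoRI": enzymes["EcoRI"]}))
print(rflp_fragments("AAGGCCAAGAATTCAA", enzymes))
```

Grouping isolates by their predicted fragment-length patterns, as the study did before wet-lab testing, then reduces to comparing these lists. Very small fragments that would not resolve on a gel could be filtered out before that comparison.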
Clonal expansion of genome-intact HIV-1 in functionally polarized Th1 CD4+ T cells
Orlova-Fink, Nina; Einkauf, Kevin; Chowdhury, Fatema Z.; Sun, Xiaoming; Harrington, Sean; Kuo, Hsiao-Hsuan; Hua, Stephane; Chen, Hsiao-Rong; Ouyang, Zhengyu; Reddy, Kavidha; Dong, Krista; Ndung’u, Thumbi; Walker, Bruce D.; Rosenberg, Eric S.; Yu, Xu G.
2017-01-01
HIV-1 causes a chronic, incurable disease due to its persistence in CD4+ T cells that contain replication-competent provirus but exhibit little or no active viral gene expression and effectively resist combination antiretroviral therapy (cART). These latently infected T cells represent an extremely small proportion of all circulating CD4+ T cells but possess remarkable long-term stability and typically persist throughout life, for reasons that are not fully understood. Here we performed massive single-genome, near-full-length next-generation sequencing of HIV-1 DNA derived from unfractionated peripheral blood mononuclear cells, ex vivo-isolated CD4+ T cells, and subsets of functionally polarized memory CD4+ T cells. This approach identified multiple sets of independent, near-full-length proviral sequences from cART-treated individuals that were completely identical, consistent with clonal expansion of CD4+ T cells harboring intact HIV-1. Intact, near-full-genome HIV-1 DNA sequences derived from such clonally expanded CD4+ T cells constituted 62% of all analyzed genome-intact sequences in memory CD4+ T cells, were preferentially observed in Th1-polarized cells, were longitudinally detected over a period of up to 5 years, and were fully replication- and infection-competent. Together, these data suggest that clonal proliferation of Th1-polarized CD4+ T cells harboring intact HIV-1 is a driving force in stabilizing the pool of latently infected CD4+ T cells. PMID:28628034
Clonal expansion of genome-intact HIV-1 in functionally polarized Th1 CD4+ T cells.
Lee, Guinevere Q; Orlova-Fink, Nina; Einkauf, Kevin; Chowdhury, Fatema Z; Sun, Xiaoming; Harrington, Sean; Kuo, Hsiao-Hsuan; Hua, Stephane; Chen, Hsiao-Rong; Ouyang, Zhengyu; Reddy, Kavidha; Dong, Krista; Ndung'u, Thumbi; Walker, Bruce D; Rosenberg, Eric S; Yu, Xu G; Lichterfeld, Mathias
2017-06-30
HIV-1 causes a chronic, incurable disease due to its persistence in CD4+ T cells that contain replication-competent provirus but exhibit little or no active viral gene expression and effectively resist combination antiretroviral therapy (cART). These latently infected T cells represent an extremely small proportion of all circulating CD4+ T cells but possess remarkable long-term stability and typically persist throughout life, for reasons that are not fully understood. Here we performed massive single-genome, near-full-length next-generation sequencing of HIV-1 DNA derived from unfractionated peripheral blood mononuclear cells, ex vivo-isolated CD4+ T cells, and subsets of functionally polarized memory CD4+ T cells. This approach identified multiple sets of independent, near-full-length proviral sequences from cART-treated individuals that were completely identical, consistent with clonal expansion of CD4+ T cells harboring intact HIV-1. Intact, near-full-genome HIV-1 DNA sequences derived from such clonally expanded CD4+ T cells constituted 62% of all analyzed genome-intact sequences in memory CD4+ T cells, were preferentially observed in Th1-polarized cells, were longitudinally detected over a period of up to 5 years, and were fully replication- and infection-competent. Together, these data suggest that clonal proliferation of Th1-polarized CD4+ T cells harboring intact HIV-1 is a driving force in stabilizing the pool of latently infected CD4+ T cells.
Discovery of a novel iflavirus sequence in the eastern paralysis tick Ixodes holocyclus.
O'Brien, Caitlin A; Hall-Mendelin, Sonja; Hobson-Peters, Jody; Deliyannis, Georgia; Allen, Andy; Lew-Tabor, Ala; Rodriguez-Valle, Manuel; Barker, Dayana; Barker, Stephen C; Hall, Roy A
2018-05-11
Ixodes holocyclus, the eastern paralysis tick, is a significant parasite in Australia in terms of animal and human health. However, very little is known about its virome. In this study, next-generation sequencing of I. holocyclus salivary glands yielded a full-length genome sequence that groups phylogenetically with viruses classified in the family Iflaviridae and shares 45% amino acid similarity with its closest relative, Bole Hyalomma asiaticum virus 1. The sequence of this virus, provisionally named Ixodes holocyclus iflavirus (IhIV), has been identified in tick populations from northern New South Wales and Queensland, Australia, and represents the first virus sequence reported from I. holocyclus.
Full Genome Sequence of Egg Drop Syndrome Virus Strain FJ12025 Isolated from Muscovy Duckling.
Fu, Guanghua; Chen, Hongmei; Huang, Yu; Cheng, Longfei; Fu, Qiuling; Shi, Shaohua; Wan, Chunhe; Chen, Cuiteng; Lin, Jiansheng
2013-08-22
Egg drop syndrome virus (EDSV) strain FJ12025 was isolated from a 9-day-old Muscovy duckling. Sequencing showed that the genome of strain FJ12025 is 33,213 bp long, with a G+C content of 43.03%. Comparing the genome sequence of strain FJ12025 with that of the laying-duck original strain AV-127, we found 50 single-nucleotide polymorphisms (SNPs) between the two viral genome sequences. A genomic sequence comparison of FJ12025 and AV-127 will help in understanding the phenotypic differences between the two viruses.
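The two quantities reported here, G+C content and a SNP count between two genomes, are simple to compute once the sequences are in hand. The sketch assumes the two genomes have already been aligned to equal length. It counts a SNP only where both aligned bases are unambiguous and differ, so gaps and ambiguity codes such as N are ignored. That convention is an assumption, since the abstract does not describe the comparison method.

```python
VALID = set("ACGT")


def gc_content(seq: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in VALID]
    if not bases:
        raise ValueError("no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


def count_snps(aln1: str, aln2: str) -> int:
    """Count columns where both aligned bases are A/C/G/T and differ."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned genomes must have equal length")
    return sum(
        1
        for a, b in zip(aln1.upper(), aln2.upper())
        if a != b and a in VALID and b in VALID
    )


print(gc_content("GGCCAATT"))          # 4 of 8 bases are G or C
print(count_snps("ACGT-A", "ACGTTG"))  # the gap column is not counted
```

Indels would need separate handling. Counting them as SNPs would inflate the figure when two genomes of slightly different length are compared.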
Tuo, Decai; Shen, Wentao; Yan, Pu; Li, Xiaoying; Zhou, Peng
2015-01-01
Papaya leaf distortion mosaic virus (PLDMV) is an emerging threat to papaya, including transgenic papaya resistant to the related pathogen papaya ringspot virus (PRSV). Generating infectious viral clones is an essential step for reverse-genetics studies of viral gene function and cross-protection. In this study, a sequence- and ligation-independent cloning system, the In-Fusion® Cloning Kit (Clontech, Mountain View, CA, USA), was used to construct intron-less and intron-containing full-length cDNA clones of the isolate PLDMV-DF, with multiple viral and intron fragments assembled scarlessly into a plasmid vector in a single reaction. The intron-containing full-length cDNA clone of PLDMV-DF was stably propagated in Escherichia coli. After mechanical inoculation, in vitro transcripts of the intron-containing clone were processed and spliced into biologically active intron-less transcripts and initiated systemic infections in Carica papaya L. seedlings, which developed symptoms similar to those caused by the wild-type virus. In contrast, no infectivity was detected when plants were inoculated with RNA transcripts from the intron-less construct, because that viral cDNA clone was unstable in bacterial cells, which introduced nonsense or deletion mutations into the PLDMV-DF genomic sequence. To our knowledge, this is the first report of an infectious full-length cDNA clone of PLDMV and of the splicing of intron-containing transcripts following mechanical inoculation. In-Fusion cloning shortens the construction time from months to days, making it a faster, more flexible, and more efficient method than the traditional multistep restriction enzyme-mediated subcloning procedure. PMID:26633465
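The single-reaction, multi-fragment assembly described for PLDMV-DF relies on each PCR fragment carrying short end homology to its neighbours. For In-Fusion, the manufacturer's primer-design guidance calls for 15-bp extensions. The sketch below shows that bookkeeping for one fragment. It assumes you already have the insert-specific annealing sequences and the top-strand sequences that will flank the insert in the final construct. The function names and example sequences are hypothetical, since the abstract does not give the PLDMV-DF primers.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def add_homology_arms(fwd_anneal, rev_anneal, upstream, downstream, overlap=15):
    """Prepend end-homology arms to a pair of insert-specific primers.

    fwd_anneal, rev_anneal: 5'->3' primer parts that anneal to the insert.
    upstream, downstream: top-strand sequence that will sit immediately 5'
    and 3' of the insert in the assembled construct (vector or the
    neighbouring fragment).
    """
    if len(upstream) < overlap or len(downstream) < overlap:
        raise ValueError("flanking sequence shorter than the overlap")
    # Forward primer: last `overlap` nt of the upstream neighbour, then the annealing part.
    fwd = upstream[-overlap:] + fwd_anneal
    # The reverse primer sits on the bottom strand, so its 5' tail is the reverse
    # complement of the first `overlap` nt of the downstream neighbour.
    rev = revcomp(downstream[:overlap]) + rev_anneal
    return fwd, rev


fwd, rev = add_homology_arms("ATGGCC", "TTAGGC", "GGGGGAAAAACCCCCTTTTT", "ACGTACGTACGTACGTTT")
print(fwd)  # AAAAACCCCCTTTTTATGGCC
print(rev)  # CGTACGTACGTACGTTTAGGC
```

In a multi-fragment design, each fragment's arms come from whichever pieces (vector, viral, or intron fragments) flank it in the intended final sequence.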
Identification of a new hepatitis B virus recombinant D2/D3 in the city of São Paulo, Brazil.
Santana, Luiz Claudio; Mantovani, Nathalia Pena; Ferreira, Maira Cicero; Arnold, Rafael; Duro, Rodrigo Lopes Sanz; Ferreira, Paulo Roberto Abrão; Hunter, James Richard; Leal, Élcio; Diaz, Ricardo Sobhie; Komninakis, Shirley Vasconcelos
2017-02-01
Two hundred forty million people are chronically infected with hepatitis B virus (HBV) worldwide. The rise of globalization has facilitated the emergence of novel HBV recombinants and genotypes. We evaluated HBV genotypes and recombinants, mutations associated with resistance to antivirals (AVs), progression of hepatic illness, and inefficient hepatitis B vaccination responses in chronically infected individuals in the city of São Paulo, Brazil. Forty-five full-length and 24 partial-length sequences were obtained. The genotype distribution was as follows: A (66.7%), D (15.9%), F (11.6%) and C (4.3%). We describe a new recombinant (D2/D3), confirmed through next-generation sequencing (NGS) and in silico reconstruction of the quasispecies sequences. Primary resistance and major vaccine escape mutations were not found. We did, however, find mutations in the S region that may be related to changes in HBV antigenicity, as well as Pre-S deletions. The precore/core mutations A1762T + G1764A (40.9%) were found mostly in genotypes A and D, and G1896A (29.55%) was more frequent in genotype D than in genotype A. The genotypic distribution reflects the history of Brazilian immigration. This is the first description of recombination between genotypes D2 and D3 in Brazil. It is also the first confirmation of such a recombinant through NGS and in silico reconstruction of the quasispecies. However, little is known about how recombinants respond to treatment. This demonstrates the need for molecular epidemiology studies involving the analysis of full-length HBV sequences.
USDA-ARS's Scientific Manuscript database
Two different alleles of an ethylene receptor gene (CaETR-1) of chickpea (Cicer arietinum) were isolated and characterized through synteny analysis with genome sequences of Medicago truncatula. The full length of CaETR-1 in cultivar FLIP84-92C (CaETR-1a) is 4,428 bp including the polyadenylation sig...
USDA-ARS's Scientific Manuscript database
The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...
Liu, Wei-long; Yang, Gui-lin; Wei, Qing; Zhang, Ming-xia; Chen, Xin-chun; Liu, Ying-xia; Gao, Yang; Zhou, Bo-ping
2011-02-01
To investigate the molecular epidemiology and molecular evolution of five enterovirus 71 (EV71) strains from five patients in Shenzhen with hand, foot and mouth disease associated with EV71 infection. Five EV71 strains were isolated and their full-length genomes were sequenced, and nucleotide and amino acid homology were compared, using bioinformatics software, with EV71 strains from other regions and countries as well as with earlier strains from around the world. Based on the nucleotide sequences of VP1 and VP4, all five EV71 strains belonged to subgenotype C4. The five strains differed little from one another, with nucleotide homology of 93% and amino acid homology of 98%. Phylogenetic analysis indicated that the 2008 Shenzhen epidemic strains were most closely related to the 2004 Shenzhen circulating strains, and also closely related to the 1998 Shenzhen epidemic strains and the 2008 Fuyang (Anhui) strains. The strain from the fatal case was very close to the 2008 Fuyang (Anhui) epidemic strains. We speculate that these epidemic EV71 strains probably originated from a common ancestral strain, possibly the 1998 Shenzhen strain.
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).
Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E
2005-12-02
cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Markussen, Turhan; Jonassen, Christine Monceyron; Numanovic, Sanela
2008-05-10
Infectious salmon anaemia virus (ISAV) is an orthomyxovirus causing a multisystemic, emerging disease in Atlantic salmon. Here we present, for the first time, detailed sequence analyses of the full-genome sequence of a presumed avirulent isolate with a full-length hemagglutinin-esterase (HE) gene (HPR0), and compare it with full-genome sequences of 11 Norwegian ISAV isolates from clinically diseased fish. These analyses revealed a virulence marker immediately upstream of the putative cleavage site R267 in the fusion (F) protein, suggesting that a Q266 → L266 substitution is a prerequisite for virulence. Isolates lacking this substitution appear to require a sequence insertion near the cleavage site to become virulent. This strongly suggests that a protease recognition pattern at the cleavage site of the fusion protein is a determinant of virulence, as seen in highly pathogenic influenza A virus H5 or H7 and in the paramyxovirus Newcastle disease virus.
Loperfido, Mariana; Jarmin, Susan; Dastidar, Sumitava; Di Matteo, Mario; Perini, Ilaria; Moore, Marc; Nair, Nisha; Samara-Kuko, Ermira; Athanasopoulos, Takis; Tedesco, Francesco Saverio; Dickson, George; Sampaolesi, Maurilio; VandenDriessche, Thierry; Chuah, Marinee K.
2016-01-01
Duchenne muscular dystrophy (DMD) is a genetic neuromuscular disorder caused by the absence of dystrophin. We developed a novel gene therapy approach based on the piggyBac (PB) transposon system to deliver the coding DNA sequence (CDS) of either full-length human dystrophin (DYS: 11.1 kb) or truncated microdystrophins (MD1: 3.6 kb; MD2: 4 kb). PB transposons encoding microdystrophins were transfected into C2C12 myoblasts, yielding 65±2% MD1 and 66±2% MD2 expression in differentiated multinucleated myotubes. A hyperactive PB (hyPB) transposase was then deployed to enable transposition of the large (17 kb) PB transposon encoding full-length DYS and green fluorescent protein (GFP). Stable GFP expression reaching 78±3% was achieved in C2C12 myoblasts that had undergone transposition. Western blot analysis demonstrated expression of full-length human DYS protein in myotubes. Subsequently, dystrophic mesoangioblasts from a Golden Retriever muscular dystrophy dog were transfected with the large PB transposon, resulting in 50±5% GFP-expressing cells after stable transposition. This was consistent with correction of the differentiated dystrophic mesoangioblasts following expression of full-length human DYS. These results pave the way toward a novel non-viral gene therapy approach for DMD using PB transposons, underscoring their potential to deliver large therapeutic genes. PMID:26682797
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deymier, Martin J., E-mail: mdeymie@emory.edu; Claiborne, Daniel T., E-mail: dclaibo@emory.edu; Ende, Zachary, E-mail: zende@emory.edu
The high genetic diversity of HIV-1 impedes high-throughput, large-scale sequencing and full-length genome cloning by common restriction enzyme-based methods. Applying novel methods that employ a high-fidelity polymerase for amplification and an unbiased fusion-based cloning strategy, we have generated several HIV-1 full-length genome infectious molecular clones from an epidemiologically linked transmission pair. These clones represent the transmitted/founder virus and phylogenetically diverse non-transmitted variants from the chronically infected individual's diverse quasispecies near the time of transmission. We demonstrate that, using this approach, PCR-induced mutations in full-length clones derived from their cognate single-genome amplicons are rare. Furthermore, all eight non-transmitted genomes tested produced functional virus with a range of infectivities, belying the previous assumption that most circulating viruses in chronic HIV-1 infection are defective. Thus, these methods provide important tools for updating molecular biology protocols that can be universally applied to the study of human viral pathogens.
Highlights:
• Our novel methodology demonstrates accurate amplification and cloning of full-length HIV-1 genomes.
• A majority of plasma-derived HIV variants from a chronically infected individual are infectious.
• The transmitted/founder virus was more infectious than the majority of the variants from the chronically infected donor.
Muangkram, Yuttamol; Wajjwalku, Worawidh; Kaolim, Nongnid; Buddhakosai, Waradee; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Dongsaard, Khwanruean; Maikaew, Umaporn; Sanannu, Saowaphang
2016-01-01
The Asian tapir (Tapirus indicus) is categorized as Endangered on the 2008 IUCN Red List. The first full-length mitochondrial DNA (mtDNA) sequence of the Asian tapir is 16,717 bp long. Its base composition is 34.6% A, 27.2% T, 25.8% C and 12.3% G. As is typical for many species, the control region is the most polymorphic region.
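Base composition, as reported for the tapir mitogenome, is the percentage of each nucleotide over the whole sequence. The following is a minimal Python sketch. It assumes the sequence is available as a plain string (for example, read from a FASTA file), and the function name is hypothetical.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percent A, T, C and G, each rounded to one decimal place.

    Only unambiguous bases count toward the denominator; N and other IUPAC
    codes are ignored (an assumption; published figures may treat them differently).
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: round(100 * counts[b] / total, 1) for b in "ATCG"}


print(base_composition("AATCGA"))  # {'A': 50.0, 'T': 16.7, 'C': 16.7, 'G': 16.7}
```

Because each value is rounded independently, the four percentages need not sum to exactly 100. The published figures (34.6 + 27.2 + 25.8 + 12.3) sum to 99.9, and they imply an A+T content of 61.8%.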
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-01-01
In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, the most well-known wild animal. Total RNA was extracted from Bengal tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.28 × 10⁶ pfu/mL and 1.56 × 10⁹ pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2%, and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs ranging in size from 356 to 1,108 bp were then analyzed. BLASTX scores classified 48.1% of the sequences as strong matches, 45.3% as nominal and 6.6% as weak matches. Among the ESTs with known putative function, 26.4% were related to various metabolic processes, 19.3% to information storage and processing, 11.3% to posttranslational modification, protein turnover and chaperones, 11.3% to transport, 9.9% to signal transduction/cell communication, 9.0% to structural proteins, and 3.8% to the cell cycle; only 6.6% were classified as novel genes. EST sequencing identified a full-length gene encoding ferritin, which was then characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed, encoding a TAT-Ferritin fusion protein with 6× His tags at both the N- and C-termini. By BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrate that the reliability and representativeness of the cDNA library meet the requirements of a standard cDNA library. This library provides a useful platform for functional genomic and transcriptomic research on Bengal tigers. PMID:23708105
NASA Astrophysics Data System (ADS)
Qi, Fei; Guo, Huarong; Wang, Jian
2008-02-01
Reversible protein phosphorylation, catalyzed by protein kinases and phosphatases, is an important and versatile mechanism by which eukaryotic cells regulate almost all signaling processes. Protein phosphatase 1 (PP1) is the first identified and a well-characterized member of the protein serine/threonine phosphatase family. In the present study, a full-length cDNA encoding the beta isoform of the catalytic subunit of protein phosphatase 1 (PP1cb), designated SmPP1cb, was isolated and sequenced for the first time from the skin tissue of the flatfish turbot Scophthalmus maximus using the rapid amplification of cDNA ends (RACE) technique. The SmPP1cb cDNA contains a 984 bp open reading frame (ORF) flanked by a complete 39 bp 5' untranslated region and a 462 bp 3' untranslated region. The ORF encodes a putative 327-amino-acid protein whose N-terminal segment is highly acidic (Met-Ala-Glu-Gly-Glu-Leu-Asp-Val-Asp), a common feature of PP1 catalytic subunits that is absent in protein phosphatase 2B (PP2B). Its calculated molecular mass is 37,193 Da and its pI is 5.8. Sequence analysis indicated that SmPP1cb is extremely conserved at both the amino acid and nucleotide levels compared with the PP1cb of other vertebrates and invertebrates. Its Kozak motif around the ATG start codon in the 5'UTR is GXXAXXGXX ATGG, which differs from the mammalian motif at two positions, A-6 and G-3, suggesting that translation initiation may differ in turbot. In addition, the SmPP1cb 3'UTR is highly divergent in sequence and length compared with those of other animals, especially zebrafish. The cloning and sequencing of the SmPP1cb gene lays a foundation for future work on the biological functions of PP1 in the flatfish turbot.
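The SmPP1cb lengths can be cross-checked. A 984-bp ORF is 328 codons, and if the stop codon is included in that count, the encoded protein has 327 residues, matching the abstract. The sketch below makes this bookkeeping explicit. It assumes the ORF length includes the stop codon and that the UTR lengths exclude it. The function name is hypothetical, and the total cDNA length it returns is derived rather than reported; the abstract does not say whether the 3' UTR figure includes a poly(A) tail.

```python
from typing import Tuple


def cdna_layout(utr5_bp: int, orf_bp: int, utr3_bp: int) -> Tuple[int, int]:
    """Return (total cDNA length in bp, protein length in amino acids).

    Assumes orf_bp runs from the A of ATG through the last base of the stop
    codon, so the protein has orf_bp / 3 - 1 residues.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    total_bp = utr5_bp + orf_bp + utr3_bp
    protein_aa = orf_bp // 3 - 1  # one codon is the stop, which encodes no residue
    return total_bp, protein_aa


print(cdna_layout(39, 984, 462))  # (1485, 327)
```

The same check flags any reported ORF whose length is not a multiple of three.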
Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed
2014-01-01
Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 patients with HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 patients with adult T-cell leukemia/lymphoma (ATLL), using an Illumina paired-end protocol. Blood samples were collected from the 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG as two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against an HTLV-1 reference genome of sufficient genetic similarity, and then used for phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument yielded an average of 3,210-fold coverage for the partial (n = 14) and 5,200-fold coverage for the FLG (n = 76) HTLV-1 data. Phylogenetic trees of consensus sequences from the partial and full-length genomes revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtype of the cosmopolitan subtype (aA) and 4 (4.5%) with the Japanese sub-subtype (aB). A comparison of FLG nucleotide and amino acid sequences across the three clinical groups found no correlation between the sequenced genotype and clinical outcome. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequences, and the results are consistent with the hypothesis that the transcontinental subtype was introduced into Brazil multiple times. This study increased the number of subtype aA full-length genomes from 8 to 81 and of subtype aB full-length genomes from 2 to 5. Overall, the data confirm that the cosmopolitan transcontinental sub-subtype is the most prevalent in the Brazilian population. We hope that these genomic data will add to the current understanding of the evolutionary history of this medically important virus.
Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming
2017-06-01
HIV-1 transmission among intravenous drug users (IDUs) has been characterized by high genetic multiplicity, which poses a greater challenge for blocking HIV-1 infection. We analyzed a total of 749 full-length gp160 gene sequences obtained by single genome sequencing (SGS) from 22 IDUs with early HIV-1 infection in Xinjiang province, northwest China, and generated a transmitted/founder (T/F) virus consensus sequence (IDU.CON). The T/F virus was classified as CRF07_BC and predicted to be CCR5-tropic. The variable regions (V1, V2, and V4 loops) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and the homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were identified in IDU.CON, fewer than in MSM.CON and HSX.CON. Characterizing T/F viruses from IDUs highlights the genetic makeup and complexity of the virus near the moment of transmission or in early infection, before systemic dissemination, and is important for developing effective HIV-1 prevention methods, including vaccines.
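Counts of potential N-linked glycosylation sites, such as the 26 reported for IDU.CON, are usually based on the N-X-S/T sequon, where X is any residue except proline. The abstract does not say which tool was used, and dedicated tools such as the LANL N-GlycoSite may apply their own conventions. The following is only a minimal Python sketch of the basic motif scan, and the function name is hypothetical.

```python
import re

# N, then any residue except P, then S or T. The zero-width lookahead lets
# overlapping sequons (e.g. "NNTS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon.

    Alignment gaps ('-') are removed first, so positions refer to the
    ungapped sequence.
    """
    seq = protein.upper().replace("-", "")
    return [m.start() + 1 for m in SEQUON.finditer(seq)]


print(n_glyc_sites("MKNVSANPTQNNTS"))  # [3, 11, 12]
```

In the example, the Asn at position 7 is skipped because it is followed by proline, and positions 11 and 12 are both reported because their sequons overlap.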
Molecular characterization of an ependymin precursor from goldfish brain.
Königstorfer, A; Sterrer, S; Eckerskorn, C; Lottspeich, F; Schmidt, R; Hoffmann, W
1989-01-01
Ependymins are thought to be implicated in fundamental processes involved in plasticity of the goldfish CNS. Gas-phase sequencing of purified ependymins beta and gamma revealed that they share the same N-terminal sequence. Each sequence displays microheterogeneities at several positions. Based on the protein sequences obtained, we constructed synthetic oligonucleotides and used them as hybridization probes for screening cDNA libraries of goldfish brain. In this article we describe the full-length sequence of an mRNA encoding a precursor of ependymins. A cleavable signal sequence characteristic of secretory proteins is located at the N-terminal end, followed directly by the ependymin sequence. Two potential N-glycosylation sites were also detected. A computer search revealed that ependymins form a novel family of unique proteins.
Accurate typing of short tandem repeats from genome-wide sequencing data and its applications.
Fungtammasan, Arkarachai; Ananda, Guruprasad; Hile, Suzanne E; Su, Marcia Shu-Wei; Sun, Chen; Harris, Robert; Medvedev, Paul; Eckert, Kristin; Makova, Kateryna D
2015-05-01
Short tandem repeats (STRs) are implicated in dozens of human genetic diseases and contribute significantly to genome variation and instability. Yet profiling STRs from short-read sequencing data is challenging because of their high sequencing error rates. Here, we developed STR-FM (short tandem repeat profiling using flank-based mapping), a computational pipeline that can detect the full spectrum of STR alleles from short-read data, can adapt to emerging read-mapping algorithms, and can be applied to heterogeneous genetic samples (e.g., tumors, viruses, and organelle genomes). We used STR-FM to study STR error rates and patterns in publicly available human sequencing data sets and in ultradeep plasmid sequencing data sets generated in-house. We found that STRs sequenced with a PCR-free protocol have up to ninefold fewer errors than those sequenced with a PCR-containing protocol. We constructed an error correction model for genotyping STRs that can distinguish heterozygous alleles containing STRs with consecutive repeat numbers. Applying our model and pipeline to Illumina sequencing data with 100-bp reads, we could confidently genotype several disease-related long trinucleotide STRs. Using this pipeline, we determined for the first time the genome-wide STR germline mutation rate from a deeply sequenced human pedigree. Additionally, we built a tool that recommends the minimal sequencing depth for accurate STR genotyping, depending on repeat length and sequencing read length. The required read depth increases with STR length and is lower for a PCR-free protocol. This suite of tools addresses pressing challenges in STR genotyping and is thus of wide interest to researchers investigating disease-related STRs and STR evolution. © 2015 Fungtammasan et al.; Published by Cold Spring Harbor Laboratory Press.
Lion (Panthera leo) and cheetah (Acinonyx jubatus) IFN-gamma sequences.
Maas, Miriam; Van Rhijn, Ildiko; Allsopp, Maria T E P; Rutten, Victor P M G
2010-04-15
Cloning and sequencing the full-length lion and cheetah interferon-gamma (IFN-gamma) transcripts will enable expression of the recombinant cytokines, to be used for producing monoclonal antibodies and setting up lion- and cheetah-specific IFN-gamma ELISAs. These are relevant to blood-based diagnosis of bovine tuberculosis, an important threat to lions in the Kruger National Park. Alignment of the lion and cheetah nucleotide and amino acid sequences with those of the domestic cat showed 97-100% homology. Copyright 2009 Elsevier B.V. All rights reserved.
Classification of Cowpox Viruses into Several Distinct Clades and Identification of a Novel Lineage
Franke, Annika; Pfaff, Florian; Jenckel, Maria; Hoffmann, Bernd; Höper, Dirk; Antwerpen, Markus; Meyer, Hermann; Beer, Martin; Hoffmann, Donata
2017-01-01
Cowpox virus (CPXV) was considered a uniform species within the genus Orthopoxvirus (OPV). Previous phylogenetic analyses indicated that CPXV is polyphyletic and that isolates may cluster into different clades, two of which show genetic similarities to either variola virus (VARV) or vaccinia virus (VACV). Further analyses were initiated to assess both the genetic diversity and the evolutionary background of circulating CPXVs. Here we report the full-length sequences of 20 CPXV strains isolated from different animal species and humans in Germany. A phylogenetic analysis of 83 full-length OPV genomes in total confirmed the polyphyletic character of the species CPXV and suggested at least four different clades. The German isolates from this study mainly clustered into two CPXV-like clades, and no VARV- or VACV-like strains were observed. A single strain, isolated from a cotton-top tamarin, clustered distantly from all other CPXVs and might represent a novel and unique evolutionary lineage. The classification of CPXV strains into clades roughly followed their geographic origin, with the highest clade diversity observed so far for Germany. Furthermore, we found evidence for recombination between OPV clades without significant disruption of the observed clustering. In conclusion, this analysis markedly expands the number of available full-length CPXV sequences, confirms the co-circulation of several CPXV clades in Germany, and provides the first data on a new evolutionary CPXV lineage. PMID:28604604
Arthur, A K; Höss, A; Fanning, E
1988-01-01
The genomic coding sequence of the large T antigen of simian virus 40 (SV40) was cloned into an Escherichia coli expression vector by joining new restriction sites, BglII and BamHI, introduced at the intron boundaries of the gene. Full-length large T antigen, as well as deletion and amino acid substitution mutants, were inducibly expressed from the lac promoter of pUC9, albeit with different efficiencies and protein stabilities. Specific interaction with SV40 origin DNA was detected for full-length T antigen and certain mutants. Deletion mutants lacking T-antigen residues 1 to 130 and 260 to 708 retained specific origin-binding activity, demonstrating that the region between residues 131 and 259 must carry the essential binding domain for DNA-binding sites I and II. A sequence between residues 302 and 320 homologous to a metal-binding "finger" motif is therefore not required for origin-specific binding. However, substitution of serine for either of two cysteine residues in this motif caused a dramatic decrease in origin DNA-binding activity. This region, as well as other regions of the full-length protein, may thus be involved in stabilizing the DNA-binding domain and altering its preference for binding to site I or site II DNA. PMID:2835505
Modahl, Cassandra M.; Mackessy, Stephen P.
2016-01-01
Envenomation of humans by snakes is a complex and continuously evolving medical emergency, and treatment is made all the more difficult by the diverse biochemical composition of many venoms. Venomous snakes and their venoms also provide models for the study of molecular evolutionary processes leading to adaptation and genotype-phenotype relationships. To compare venom complexity and protein sequences, venom gland transcriptomes are assembled, which usually requires the sacrifice of snakes for tissue. However, toxin transcripts are also present in venoms, offering the possibility of obtaining cDNA sequences directly from venom. This study provides evidence that unknown full-length venom protein transcripts can be obtained from the venoms of multiple species from all major venomous snake families. These unknown venom protein cDNAs were obtained using primers designed from conserved signal peptide sequences within each venom protein superfamily. This technique was used to assemble a partial venom gland transcriptome for the Middle American Rattlesnake (Crotalus simus tzabcan) by amplifying sequences for phospholipases A2, serine proteases, C-lectins, and metalloproteinases from within venom. Phospholipase A2 sequences were also recovered from the venoms of several rattlesnakes and an elapid snake (Pseudechis porphyriacus), and three-finger toxin sequences were recovered from multiple rear-fanged snake species, demonstrating that the three major clades of advanced snakes (Elapidae, Viperidae, Colubridae) have stable mRNA present in their venoms. These cDNA sequences from venom were then used to explore potential activities derived from protein sequence similarities and evolutionary histories within these large multigene superfamilies. Venom-derived sequences can also help characterize venoms that lack proteomic profiles and identify sequence characteristics indicating specific envenomation profiles. This approach, requiring only venom, provides access to cDNA sequences in the absence of living specimens, even from commercial venom sources, to evaluate important regional differences in venom composition and to study snake venom protein evolution. PMID:27280639
A Hybrid Approach for the Automated Finishing of Bacterial Genomes
Robins, William P.; Chin, Chen-Shan; Webster, Dale; Paxinos, Ellen; Hsu, David; Ashby, Meredith; Wang, Susana; Peluso, Paul; Sebra, Robert; Sorenson, Jon; Bullard, James; Yen, Jackie; Valdovino, Marie; Mollova, Emilia; Luong, Khai; Lin, Steven; LaMay, Brianna; Joshi, Amruta; Rowe, Lori; Frace, Michael; Tarr, Cheryl L.; Turnsek, Maryann; Davis, Brigid M; Kasarskis, Andrew; Mekalanos, John J.; Waldor, Matthew K.; Schadt, Eric E.
2013-01-01
Dramatic improvements in DNA sequencing technology have revolutionized our ability to characterize most genomic diversity. However, accurate resolution of large structural events has remained challenging due to the comparatively shorter read lengths of second-generation technologies. Emerging third-generation sequencing technologies, which yield markedly increased read length on rapid time scales and for low cost, have the potential to address assembly limitations. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at > 99.9% accuracy. Complex regions with clinically significant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 reference we obtain 14 and 8 scaffolds greater than 1kb, respectively, correcting several errors in the underlying source data. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly. PMID:22750883
Characterization of Toll-like receptor 3 gene in large yellow croaker, Pseudosciaena crocea.
Huang, Xue-Na; Wang, Zhi-Yong; Yao, Cui-Luan
2011-07-01
Toll-like receptor 3 (TLR3) plays an important role in innate immune responses. In this report, the full-length cDNA sequence and genomic structure of Pseudosciaena crocea TLR3 (PcTLR3) were identified and characterized. The full-length cDNA of PcTLR3 was 3384 bp long, including a 5'-terminal untranslated region (UTR) of 65 bp, a 3'-terminal UTR of 589 bp and an open reading frame (ORF) of 2730 bp encoding a polypeptide of 909 amino acid residues. The full-length genomic sequence of PcTLR3 was composed of 5721 nucleotides, comprising five exons and four introns. The putative PcTLR3 protein contained a signal peptide sequence, 16 leucine-rich repeat (LRR) motifs, a transmembrane region and a Toll/interleukin-1 receptor (TIR) domain. Quantitative real-time reverse transcription PCR analysis revealed broad expression of PcTLR3 in most tissues, with the highest expression in liver, followed by intestine, and the weakest expression in blood cells. The expression of PcTLR3 after injection with polyinosinic:polycytidylic acid (poly I:C) and Vibrio parahemolyticus was tested in spleen, blood cells and liver. The results indicated that PcTLR3 transcripts could be induced in all three tissues by injection with poly I:C. The highest induction was in blood cells, where expression at 6 h was 43.5-fold higher than in the control (p<0.05). In addition, after V. parahemolyticus challenge, PcTLR3 was moderately up-regulated in blood cells and down-regulated in liver. Our results suggested that PcTLR3 might play an important role in fish defense against both viral and bacterial infection. Copyright © 2011 Elsevier Ltd. All rights reserved.
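As a quick consistency check on the reported lengths, the 5' UTR, ORF and 3' UTR should sum to the full cDNA length, and an ORF that includes its stop codon encodes one residue fewer than its codon count. The sketch below assumes the 2730-bp ORF includes the stop codon, which is consistent with 2730 / 3 = 910 codons and 909 residues; the helper name `orf_to_residues` is hypothetical.

```python
# Lengths (bp) reported in the PcTLR3 abstract above.
UTR5_BP = 65
UTR3_BP = 589
ORF_BP = 2730
CDNA_BP = 3384


def orf_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # A terminal stop codon is translated into no residue.
    return codons - 1 if includes_stop else codons


total = UTR5_BP + ORF_BP + UTR3_BP
residues = orf_to_residues(ORF_BP)
print(total, residues)  # 3384 909
```

If the ORF were reported without its stop codon, `includes_stop=False` would give 910 residues instead, so the stated 909 is what pins down the convention.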
Laassri, Majid; Dragunsky, Eugenia; Enterline, Joan; Eremeeva, Tatiana; Ivanova, Olga; Lottenbach, Kathleen; Belshe, Robert; Chumakov, Konstantin
2005-01-01
Sabin strains of poliovirus used in the manufacture of oral poliovirus vaccine (OPV) are prone to genetic variations that occur during growth in cell cultures and the organisms of vaccine recipients. Such derivative viruses often have increased neurovirulence and transmissibility, and in some cases they can reestablish chains of transmission in human populations. Monitoring for vaccine-derived polioviruses is an important part of the worldwide campaign to eradicate poliomyelitis. Analysis of vaccine-derived polioviruses requires, as a first step, their isolation in cell cultures, which takes significant time and may yield viral stocks that are not fully representative of the strains present in the original sample. Here we demonstrate that full-length viral cDNA can be PCR amplified directly from stool samples and immediately subjected to genomic analysis by oligonucleotide microarray hybridization and nucleotide sequencing. Most fecal samples from healthy children who received OPV were found to contain variants of Sabin vaccine viruses. Sequence changes in the 5′ untranslated region were common, as were changes in the VP1-coding region, including changes in a major antigenic site. Analysis of stool samples taken from cases of acute flaccid paralysis revealed the presence of mixtures of recombinant polioviruses, in addition to the emergence of new sequence variants. Avoiding the need for cell culture isolation dramatically shortened the time needed for identification and analysis of vaccine-derived polioviruses and could be useful for preliminary screening of clinical samples. The amplified full-length viral cDNA can be archived and used to recover live virus for further virological studies. PMID:15956413
Yebra, Gonzalo; Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R Bridget; Waters, Laura; Tong, C Y William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J
2018-01-01
The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML, and transmission networks were identified with Cluster Picker. The pipeline generated sequences at least 1 kb in length (median = 7.46 kb, IQR = 4.01 kb) for 375 of the 420 samples (89%), 174 (46.4%) of which were NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the downstream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%), followed by the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) involving recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, so they would have gone undetected had only that region been sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset.
The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters.
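The cluster and percentage figures in the ICONIC abstract are internally consistent, which a few lines of arithmetic confirm: 25 pairs, 4 triplets and 2 quadruplets give 31 clusters containing 70 sequences. This sketch only recomputes reported numbers from reported counts; the helper `pct` and the variable names are hypothetical.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of part in whole, rounded for reporting."""
    return round(100 * part / whole, digits)


# Cluster composition for the full dataset: cluster size -> number of clusters.
cluster_sizes = {2: 25, 3: 4, 4: 2}
n_unique_subjects = 365

n_clusters = sum(cluster_sizes.values())
n_clustered = sum(size * count for size, count in cluster_sizes.items())

print(n_clusters, n_clustered, pct(n_clustered, n_unique_subjects))  # 31 70 19.2
print(pct(375, 420, 0), pct(174, 375), pct(12, 32))                  # 89.0 46.4 37.5
```

The same helper reproduces the subtype shares, e.g. `pct(149, 365)` for subtype B.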
Ebola Virus Epidemiology and Evolution in Nigeria
2016-10-04
...cases, and full-length Ebola virus (EBOV) genome sequences for 12 of the 20. The detailed contact data permits nearly complete reconstruction of...two methods highlights the strengths of each, and the importance of both contact tracing and genomic sequencing during an outbreak.
K.D. Jermstad; L.A. Sheppard; B.B. Kinloch; A. Delfino-Mix; E.S. Ersoz; K.V. Krutovsky; D.B Neale
2006-01-01
The nucleotide-binding-site and leucine-rich-repeat (NBS-LRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBS-LRR class resistance genes catalogued in GenBank. The polymerase chain reaction products...
Yamaguchi, S; Saito, T; Abe, H; Yamane, H; Murofushi, N; Kamiya, Y
1996-08-01
The first committed step in the formation of diterpenoids leading to gibberellin (GA) biosynthesis is the conversion of geranylgeranyl diphosphate (GGDP) to ent-kaurene. ent-Kaurene synthase A (KSA) catalyzes the conversion of GGDP to copalyl diphosphate (CDP), which is subsequently converted to ent-kaurene by ent-kaurene synthase B (KSB). A full-length KSB cDNA was isolated from developing cotyledons in immature seeds of pumpkin (Cucurbita maxima L.). Degenerate oligonucleotide primers were designed from the amino acid sequences obtained from the purified protein to amplify a cDNA fragment, which was used for library screening. The isolated full-length cDNA was expressed in Escherichia coli as a fusion protein, which showed KSB activity by cyclizing [3H]CDP to [3H]ent-kaurene. The KSB transcript was most abundant in growing tissues, but was detected in every organ in pumpkin seedlings. The deduced amino acid sequence shares significant homology with other terpene cyclases, including the conserved DDXXD motif, a putative divalent metal ion-diphosphate complex binding site. A putative transit peptide sequence that may target the translated product into the plastids is present in the N-terminal region.
Molecular cloning of pepsinogens A and C from adult newt (Cynops pyrrhogaster) stomach.
Inokuchi, Tomofumi; Ikuzawa, Masayuki; Yamazaki, Shin; Watanabe, Yukari; Shiota, Koushiro; Katoh, Takuma; Kobayashi, Ken-Ichiro
2013-08-01
The full-length cDNAs of three pepsinogens (Pgs) were cloned from the stomach of the newt Cynops pyrrhogaster, and their nucleotide sequences were determined. Molecular phylogenetic analysis showed that two Pgs, named PgC1 and PgC2, belong to the pepsinogen C group, and one, named PgA, belongs to the pepsinogen A group. The sequences contain open reading frames (ORFs) encoding 385 amino acid residues for PgC1, 383 for PgC2 and 377 for PgA. In addition, all three deduced amino acid sequences conserve characteristic features such as six cysteine residues and two putative active-site aspartic acid residues. All of the pepsinogen mRNAs were detected in the stomach by RT-PCR but not in other organs. Although the onset of expression differed slightly among the three pepsinogen genes, all of them were expressed in the larval stage after hatching. This is the first report on cloning of pepsinogens from urodele stomach. Copyright © 2013 Elsevier Inc. All rights reserved.
Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa
2006-03-01
The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. Eleven truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD) using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R_free = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.
Molzan, Manuela; Ottmann, Christian
2013-03-01
Myeloid leukemia factor 1 (MLF1) is associated with the development of leukemic diseases such as acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). However, information on the physiological function of MLF1 is limited and mostly derived from studies identifying MLF1 interaction partners like CSN3, MLF1IP, MADM, Manp and the 14-3-3 proteins. The 14-3-3-binding site surrounding S34 is one of the only known functional features of the MLF1 sequence, along with one nuclear export sequence (NES) and two nuclear localization sequences (NLS). It was recently shown that the subcellular localization of mouse MLF1 is dependent on 14-3-3 proteins. Based on these findings, we investigated whether the subcellular localization of human MLF1 was also directly 14-3-3-dependent. Live cell imaging with GFP-fused human MLF1 was used to study the effects of mutations and deletions on its subcellular localization. Surprisingly, we found that the subcellular localization of full-length human MLF1 is 14-3-3-independent, and is probably regulated by other as-yet-unknown proteins.
Amplification and chromosomal dispersion of human endogenous retroviral sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steele, P.E.; Martin, M.A.; Rabson, A.B.
1986-09-01
Endogenous retroviral sequences have undergone amplification events involving both viral and flanking cellular sequences. The authors cloned members of an amplified family of full-length endogenous retroviral sequences. Genomic blotting, employing a flanking cellular DNA probe derived from a member of this family, revealed a similar array of reactive bands in both humans and chimpanzees, indicating that an amplification event involving retroviral and associated cellular DNA sequences occurred before the evolutionary separation of these two primates. Southern analyses of restricted somatic cell hybrid DNA preparations suggested that endogenous retroviral segments are widely dispersed in the human genome and that amplification and dispersion events may be linked.
PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.
Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng
2018-05-01
Single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) on the Pacific Biosciences (PacBio) platform has received increasing attention for its ability to capture full-length isoforms. Comprehensive tools for Iso-Seq bioinformatics analysis are therefore extremely useful. Here, we present PRAPI, a one-stop solution for Iso-Seq analysis that comprehensively analyzes alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs). PRAPI can combine Iso-Seq full-length isoforms with short-read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq), for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. Contact: lfgu@fafu.edu.cn.
Characterization of a New HIV-1 CRF01_AE/ CRF07_BC recombinant virus in Tianjin, China.
Zhou, Zhehua; Ma, Ping; Feng, Yi; Ou, Weidong; Qian, Jing; Gao, Liying; Zhang, Defa; Shao, Yiming; Wei, Min
2018-05-04
Human immunodeficiency virus (HIV) is notorious for its rapid evolution since its transmission from monkeys to humans. Currently, HIV comprises multiple subtypes, circulating recombinant forms (CRFs) and unique recombinant forms (URFs). Here, in samples from an HIV-positive mother and her child in Tianjin, China, we identified a novel HIV-1 second-generation recombinant virus (TJ20170316 and TJ20170317) derived from CRF01_AE and CRF07_BC. Near full-length genomes were obtained from both samples, and their sequences were very close, differing only by some point mutations. Phylogenetic analyses of the near full-length genomes showed that they consist of a CRF01_AE backbone with CRF07_BC segments. The Recombinant Identification Program (RIP) and SimPlot identified four breakpoints in TJ20170316, in the gag, pol, vif and tat genes, a pattern distinct from all other reported CRFs and URFs. The emergence of this URF in Tianjin, China, highlights the complexity of the HIV-1 epidemic, and further measures should be taken to prevent HIV transmission.
Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P
2010-11-01
Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
Pervasive sequence patents cover the entire human genome.
Rosenfeld, Jeffrey A; Mason, Christopher E
2013-01-01
The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association for Molecular Pathology v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as short as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
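The 15mer argument rests on a simple criterion: two genes "match" if they share at least one exact substring of length k. A minimal sketch of such a shared k-mer index follows, assuming exact matches on the forward strand only (no reverse complements, no mismatches); the study's actual search procedure is not described here, and the function names, gene names and sequences are hypothetical toy data.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_partners(genes: dict[str, str], k: int = 15) -> dict[str, set[str]]:
    """For each gene, the set of other genes sharing at least one exact k-mer."""
    # Map every k-mer to the genes that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in genes.items():
        for km in kmers(seq, k):
            index[km].add(name)

    partners: dict[str, set[str]] = {name: set() for name in genes}
    for names in index.values():
        if len(names) > 1:  # k-mer present in more than one gene
            for name in names:
                partners[name] |= names - {name}
    return partners


# Hypothetical toy sequences; k=5 keeps the example readable.
toy_genes = {
    "geneA": "ACGTACGTTTGACCA",
    "geneB": "TTTTACGTTTGAGGG",
    "geneC": "GGGGGGGGGGCCCCC",
}
partners = shared_kmer_partners(toy_genes, k=5)
print({g: sorted(p) for g, p in partners.items()})
# {'geneA': ['geneB'], 'geneB': ['geneA'], 'geneC': []}
```

Shorter k makes chance matches more likely, which is the intuition behind the finding that 15mer claims are highly non-specific; with k=15 these toy genes share nothing.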
Dynamic Energy Landscapes of Riboswitches Help Interpret Conformational Rearrangements and Function
Quarta, Giulio; Sin, Ken; Schlick, Tamar
2012-01-01
Riboswitches are RNAs that modulate gene expression by ligand-induced conformational changes. However, the way in which sequence dictates alternative folding pathways of gene regulation remains unclear. In this study, we compute energy landscapes, which describe the accessible secondary structures for a range of sequence lengths, to analyze the transcriptional process as a given sequence elongates to full length. In line with experimental evidence, we find that most riboswitch landscapes can be characterized by three broad classes as a function of sequence length in terms of the distribution and barrier type of the conformational clusters: low-barrier landscape with an ensemble of different conformations in equilibrium before encountering a substrate; barrier-free landscape in which a direct, dominant “downhill” pathway to the minimum free energy structure is apparent; and a barrier-dominated landscape with two isolated conformational states, each associated with a different biological function. Sharing concepts with the “new view” of protein folding energy landscapes, we term the three sequence ranges above as the sensing, downhill folding, and functional windows, respectively. We find that these energy landscape patterns are conserved in various riboswitch classes, though the order of the windows may vary. In fact, the order of the three windows suggests either kinetic or thermodynamic control of ligand binding. These findings help understand riboswitch structure/function relationships and open new avenues to riboswitch design. PMID:22359488
Alvarado-Mora, Mónica Viviana; Santana, Rúbia Anita Ferraz; Sitnik, Roberta; Ferreira, Paulo Roberto Abrão; Mangueira, Cristovão Luís Pitangueira; Carrilho, Flair José; Pinho, João Renato Rebello
2011-06-01
The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequencing analysis. The sequence of this Brazilian isolate was grouped with four other sequences described in China. The sequence of this patient is the first complete genome of HBV/C2 reported in Brazil.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.
2000-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers
Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.
2001-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Characterization of a novel ADAM protease expressed by Pneumocystis carinii.
Kennedy, Cassie C; Kottom, Theodore J; Limper, Andrew H
2009-08-01
Pneumocystis species are opportunistic fungal pathogens that cause severe pneumonia in immunocompromised hosts. Recent evidence has suggested that unidentified proteases are involved in Pneumocystis life cycle regulation. Proteolytically active ADAM (named for "a disintegrin and metalloprotease") family molecules have been identified in some fungal organisms, such as Aspergillus fumigatus and Schizosaccharomyces pombe, and some have been shown to participate in life cycle regulation. Accordingly, we sought to characterize ADAM-like molecules in the fungal opportunistic pathogen Pneumocystis carinii (PcADAM). After an in silico search of the P. carinii genomic sequencing project identified a 329-bp partial sequence with homology to known ADAM proteins, the full-length PcADAM sequence was obtained by PCR extension cloning, yielding a final coding sequence of 1,650 bp. Sequence analysis detected the presence of a typical ADAM catalytic active site (HEXXHXXGXXHD). Expression of PcADAM over the Pneumocystis life cycle was analyzed by Northern blot. Southern and contour-clamped homogeneous electric field (CHEF) blot analyses demonstrated its presence in the P. carinii genome. Expression of PcADAM was observed to be increased in Pneumocystis cysts compared to trophic forms. The full-length gene was subsequently cloned and heterologously expressed in Saccharomyces cerevisiae. Purified PcADAMp protein was proteolytically active in casein zymography, requiring divalent zinc. Furthermore, native PcADAMp extracted directly from freshly isolated Pneumocystis organisms also exhibited protease activity. This is the first report of protease activity attributable to a specific, characterized protein in the clinically important opportunistic fungal pathogen Pneumocystis.
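Consensus motifs such as the ADAM active site HEXXHXXGXXHD, or the terpene-cyclase DDXXD motif mentioned in the KSB abstract, use X for "any residue", which maps directly onto a regular expression. A minimal sketch, assuming only this single-letter-plus-X shorthand (not full PROSITE pattern syntax); the function names are hypothetical and the protein strings are toy sequences, not PcADAM or KSB.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate an X-shorthand motif (e.g. 'DDXXD') into a regex string.

    'X' matches any single residue; every other letter must match exactly.
    """
    return "".join("." if c == "X" else re.escape(c) for c in motif.upper())


def find_motif(protein: str, motif: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) motif match."""
    pattern = motif_to_regex(motif)
    # A zero-width lookahead lets re.finditer report overlapping hits.
    return [m.start() for m in re.finditer(f"(?={pattern})", protein.upper())]


toy_protein = "MAHEVGHSLGAEHDKK"  # hypothetical sequence, not PcADAM
print(find_motif(toy_protein, "HEXXHXXGXXHD"))  # [2]
print(find_motif("AADDLLDA", "DDXXD"))          # [2]
```

Real motif scans usually allow residue alternatives (e.g. [ST]) and variable gaps, which this shorthand cannot express.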
Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin
2017-12-01
Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since the early 2000s, further dramatic reductions in barcoding costs are unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost have not only led to unbalanced barcoding efforts around the globe, but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial links to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that did not show clear bands on the electrophoresis gel. Moreover, sequencing results from the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.
Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth; Welch, Kenneth C; Timp, Winston
2018-03-01
Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as at night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, the liver is the principal site of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and the whole organism moderate energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism.
Identification of full-length dentin matrix protein 1 in dentin and bone.
Huang, Bingzhen; Maciejewska, Izabela; Sun, Yao; Peng, Tao; Qin, Disheng; Lu, Yongbo; Bonewald, Lynda; Butler, William T; Feng, Jian; Qin, Chunlin
2008-05-01
Dentin matrix protein 1 (DMP1) has been identified in the extracellular matrix (ECM) of dentin and bone as the processed NH(2)-terminal and COOH-terminal fragment. However, the full-length form of DMP1 has not been identified in these tissues. The focus of this investigation was to search for the intact full-length DMP1 in dentin and bone. We used two types of anti-DMP1 antibodies to identify DMP1: one type specifically recognizes the NH(2)-terminal region and the other type is only reactive to the COOH-terminal region of the DMP1 amino acid sequence. An approximately 105-kDa protein, extracted from the ECM of rat dentin and bone, was recognized by both types of antibodies; and the migration rate of this protein was identical to the recombinant mouse full-length DMP1 made in eukaryotic cells. We concluded that this approximately 105-kDa protein is the full-length form of DMP1, which is considerably less abundant than its processed fragments in the ECM of dentin and bone. We also detected the full-length form of DMP1 and its processed fragments in the extract of dental pulp/odontoblast complex dissected from rat teeth. In addition, immunofluorescence analysis showed that in MC3T3-E1 cells the NH(2)-terminal and COOH-terminal fragments of DMP1 are distributed differently. Our findings indicate that the majority of DMP1 must be cleaved within the cells that synthesize it and that minor amounts of uncleaved DMP1 molecules are secreted into the ECM of dentin and bone.
Wen, Yangming; Lan, Kaijian; Wang, Junjie; Yu, Jingyi; Qu, Yarong; Zhao, Wei; Zhang, Fuchun; Tan, Wanlong; Cao, Hong; Zhou, Chen
2013-06-01
To construct dengue virus-specific full-length fully human antibody libraries using a mammalian cell surface display technique. Total RNA was extracted from peripheral blood mononuclear cells (PBMCs) of convalescent patients with dengue fever. The repertoires of the light chain and heavy chain variable regions (LCκ and VH) of the antibody genes were amplified by RT-PCR and inserted separately into the vector pDGB-HC-TM to construct the light chain and heavy chain libraries. The library DNAs were transfected into CHO cells, and the expression of full-length fully human antibodies on the surface of CHO cells was analyzed by flow cytometry. Using 1.2 µg of the total RNA isolated from the PBMCs as the template, the LCκ and VH were amplified and the full-length fully human antibody mammalian display libraries were constructed. The kappa light chain gene library had a size of 1.45×10(4) and the heavy chain gene library had a size of 1.8×10(5). Sequence analysis showed that 8 of the 10 light chain clones and 7 of the 10 heavy chain clones randomly picked from the constructed libraries contained correct open reading frames. FACS analysis demonstrated that all 15 clones with correct open reading frames expressed full-length antibodies, which could be detected on CHO cell surfaces. After co-transfection of the heavy chain and light chain gene libraries into CHO cells, the expression of full-length antibodies on CHO cell surfaces could be detected by FACS analysis, with an expressible diversity of the antibody library reaching 1.46×10(9) [(1.45×10(4)×80%)×(1.8×10(5)×70%)]. Using 1.2 µg of total RNA as template, the LCκ and VH full-length fully human antibody libraries against dengue virus were successfully constructed with an expressible diversity of 10(9).
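The expressible-diversity figure is the product of the two library sizes, each scaled by the in-frame fraction estimated from the ten randomly picked clones. A minimal Python sketch of that arithmetic, assuming (as the bracketed formula does) that every functional light chain can pair with every functional heavy chain; the function name is illustrative:

```python
def expressible_diversity(lc_size, lc_in_frame, hc_size, hc_in_frame):
    """Functional light-chain clones multiplied by functional heavy-chain clones."""
    functional_lc = lc_size * lc_in_frame  # e.g. 1.45e4 x 8/10
    functional_hc = hc_size * hc_in_frame  # e.g. 1.8e5 x 7/10
    return functional_lc * functional_hc


# Library sizes and in-frame fractions reported in the abstract above
diversity = expressible_diversity(1.45e4, 8 / 10, 1.8e5, 7 / 10)
print(f"{diversity:.3g}")  # 1.46e+09
```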
Tao, Junjie; Feng, Chao; Ai, Bin; Kang, Ming
2016-01-01
Background and Aims Limestone karst areas possess high floral diversity and endemism. The genus Primulina, which contributes to the unique calcicole flora, has high species richness and exhibits specific soil-based habitat associations, being mainly distributed on calcareous karst soils. The adaptive molecular evolutionary mechanism of the genus to karst calcium-rich environments is still not well understood. In this study, the Ca2+-permeable channel TPC1 was used to test whether its gene is involved in the local adaptation of Primulina to karst high-calcium soil environments. Methods Specific amplification and sequencing primers were designed and used to amplify the full-length coding sequences of TPC1 from cDNA of 76 Primulina species. The recombination-free sequence alignment and the corresponding reconstructed phylogenetic tree were used in molecular evolutionary analyses at the nucleic acid level and amino acid level, respectively. Finally, the identified sites under positive selection were labelled on the predicted secondary structure of TPC1. Key Results Seventy-six full-length coding sequences of Primulina TPC1 were obtained. The length of the sequences varied between 2220 and 2286 bp, and the insertion/deletion was located at the 5′ end of the sequences. No signal of substitution saturation was detected in the sequences, whereas significant recombination breakpoints were detected. The molecular evolutionary analyses showed that TPC1 was dominated by purifying selection and that selective pressures did not differ significantly among species lineages. However, significant signals of positive selection were detected at both the codon and amino acid levels of TPC1, and five sites under positive selective pressure were identified by at least three different methods. Conclusions The Ca2+-permeable channel TPC1 may be involved in the local adaptation of Primulina to karst Ca2+-rich environments. Different species lineages experienced similar calcium-associated selective pressures in karst environments, and episodic diversifying selection at a few sites may play a major role in the molecular evolution of Primulina TPC1. PMID:27582362
Unit-length line-1 transcripts in human teratocarcinoma cells.
Skowronski, J; Fanning, T G; Singer, M F
1988-01-01
We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and the full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells, because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide. The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons. PMID:2454389
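Locating ORFs and checking whether two of them share a reading frame can be sketched in Python as below. This is an illustration on a made-up toy sequence, not the authors' procedure, and it assumes an ORF is any stop-free run of codons in a forward frame (stop to stop); the function name and the minimum length are hypothetical choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_frames(seq, min_codons=2):
    """Return (frame, start, end) for stop-free codon runs in the three forward frames.

    Coordinates are 0-based; `end` is exclusive and stops before the terminating stop codon.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    hits.append((frame, start, i))
                start = i + 3  # the next run begins after the stop codon
        end = frame + ((len(seq) - frame) // 3) * 3  # end of the last complete codon
        if (end - start) // 3 >= min_codons:
            hits.append((frame, start, end))  # run reaching the end without a stop
    return hits


# Toy sequence: two frame-0 ORFs separated by a short spacer bracketed by in-frame stops
toy = "ATGAAACCC" + "TAAGGGTAG" + "ATGTTTGGGCCC" + "TGA"
print([hit for hit in open_frames(toy) if hit[0] == 0])  # [(0, 0, 9), (0, 18, 30)]
```

Reporting the frame with each hit is what makes the "same frame" observation checkable: two hits with equal `frame` values are in phase with each other.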
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies
2014-01-01
Background The size and complexity of conifer genomes have, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next-generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly were used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrate a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
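N50, the statistic used above to summarize scaffold lengths, is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A small Python sketch; the input list is a toy example, not the pine assembly:

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed the halfway point of the assembly
            return length
    raise ValueError("n50() needs at least one sequence")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the toy list the three longest scaffolds (10 + 9 + 8 = 27) already hold half of the 54 bases, so the N50 is 8.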
Knutzon, D S; Lardizabal, K D; Nelsen, J S; Bleibaum, J L; Davies, H M; Metz, J G
1995-01-01
Immature coconut (Cocos nucifera) endosperm contains a 1-acyl-sn-glycerol-3-phosphate acyltransferase (LPAAT) activity that shows a preference for medium-chain-length fatty acyl-coenzyme A substrates (H.M. Davies, D.J. Hawkins, J.S. Nelsen [1995] Phytochemistry 39:989-996). Beginning with solubilized membrane preparations, we have used chromatographic separations to identify a polypeptide with an apparent molecular mass of 29 kD, whose presence in various column fractions correlates with the acyltransferase activity detected in those same fractions. Amino acid sequence data obtained from several peptides generated from this protein were used to isolate a full-length clone from a coconut endosperm cDNA library. Clone pCGN5503 contains a 1325-bp cDNA insert with an open reading frame encoding a 308-amino acid protein with a calculated molecular mass of 34.8 kD. Comparison of the deduced amino acid sequence of pCGN5503 to sequences in the data banks revealed significant homology to other putative LPAAT sequences. Expression of the coconut cDNA in Escherichia coli conferred upon those cells a novel LPAAT activity whose substrate activity profile matched that of the coconut enzyme. PMID:8552723
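A calculated molecular mass such as the 34.8 kD quoted for the pCGN5503 product is typically obtained by summing average amino acid residue masses and adding one water molecule for the free termini; the 29-kD apparent mass comes instead from gel behavior. A Python sketch using standard average residue masses; it ignores post-translational modifications, and the short peptides are toy inputs, not the coconut sequence:

```python
# Average residue masses in daltons (free amino acid minus one H2O)
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the N- and C-termini


def protein_mass(seq):
    """Average molecular mass (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


print(round(protein_mass("GG"), 2))  # 132.12 (glycylglycine)
```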
Loperfido, Mariana; Jarmin, Susan; Dastidar, Sumitava; Di Matteo, Mario; Perini, Ilaria; Moore, Marc; Nair, Nisha; Samara-Kuko, Ermira; Athanasopoulos, Takis; Tedesco, Francesco Saverio; Dickson, George; Sampaolesi, Maurilio; VandenDriessche, Thierry; Chuah, Marinee K
2016-01-29
Duchenne muscular dystrophy (DMD) is a genetic neuromuscular disorder caused by the absence of dystrophin. We developed a novel gene therapy approach based on the use of the piggyBac (PB) transposon system to deliver the coding DNA sequence (CDS) of either full-length human dystrophin (DYS: 11.1 kb) or truncated microdystrophins (MD1: 3.6 kb; MD2: 4 kb). PB transposons encoding microdystrophins were transfected into C2C12 myoblasts, yielding 65±2% MD1 and 66±2% MD2 expression in differentiated multinucleated myotubes. A hyperactive PB (hyPB) transposase was then deployed to enable transposition of the large-size PB transposon (17 kb) encoding the full-length DYS and green fluorescent protein (GFP). Stable GFP expression attaining 78±3% could be achieved in the C2C12 myoblasts that had undergone transposition. Western blot analysis demonstrated expression of the full-length human DYS protein in myotubes. Subsequently, dystrophic mesoangioblasts from a Golden Retriever muscular dystrophy dog were transfected with the large-size PB transposon, resulting in 50±5% GFP-expressing cells after stable transposition. This was consistent with correction of the differentiated dystrophic mesoangioblasts following expression of full-length human DYS. These results pave the way toward a novel non-viral gene therapy approach for DMD using PB transposons, underscoring their potential to deliver large therapeutic genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Khurana, Simran; Chakraborty, Sharmistha; Zhao, Xuan; Liu, Yu; Guan, Dongyin; Lam, Minh; Huang, Wei; Yang, Sichun; Kao, Hung-Ying
2012-01-01
α-Actinins (ACTNs) are a family of proteins cross-linking actin filaments that maintain cytoskeletal organization and cell motility. Recently, it has also become clear that ACTN4 can function in the nucleus. In this report, we found that ACTN4 (full length) and its spliced isoform ACTN4 (Iso) possess an unusual LXXLL nuclear receptor interacting motif. Both ACTN4 (full length) and ACTN4 (Iso) potentiate basal transcription activity and directly interact with estrogen receptor α, although ACTN4 (Iso) binds ERα more strongly. We have also found that both ACTN4 (full length) and ACTN4 (Iso) interact with the ligand-independent and the ligand-dependent activation domains of estrogen receptor α. Although ACTN4 (Iso) interacts efficiently with transcriptional co-activators such as p300/CBP-associated factor (PCAF) and steroid receptor co-activator 1 (SRC-1), the full length ACTN4 protein either does not or does so weakly. More importantly, the flanking sequences of the LXXLL motif are important not only for interacting with nuclear receptors but also for the association with co-activators. Taken together, we have identified a novel extended LXXLL motif that is critical for interactions with both receptors and co-activators. This motif functions more efficiently in a spliced isoform of ACTN4 than it does in the full-length protein. PMID:22908231
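In the LXXLL notation, L is leucine and X is any amino acid. A minimal Python scan for the core motif, assuming a plain one-letter protein string; it uses a lookahead so overlapping matches are reported, and it does not model the flanking residues the abstract identifies as important. The function name and test peptide are hypothetical:

```python
import re


def find_lxxll(protein):
    """0-based start positions of every L-x-x-L-L match, overlaps included."""
    # The zero-width lookahead lets matches overlap (e.g. in poly-leucine runs)
    return [m.start() for m in re.finditer(r"(?=L..LL)", protein.upper())]


print(find_lxxll("MSLKELLQRLEALLK"))  # [2, 9]
```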
Rapid PCR Assays That Specifically Identify Anthrax and Anthrax Surrogate Chromosomal Signatures
2002-08-30
The genetic variation among a set of 175 full-length sspE DNA sequences obtained from representative members of the B. anthracis clade have been...examined. Thirty-six sspE genotypes and seventeen protein phylotypes were identified among the B. cereus, B. thuringiensis, B. anthracis and B. mycoides...the sspE DNA sequence data sets suggests that the B. anthracis clade is more phylogenetically complex than has been inferred by traditional taxonomic methods.
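Counting genotypes versus protein phylotypes amounts to collapsing identical DNA sequences and then collapsing identical translations, since synonymous differences disappear on translation. A Python sketch, assuming in-frame coding sequences and the standard genetic code, and taking "phylotype" to mean a distinct translated sequence (an assumption about the report's usage); the sequences are toy inputs:

```python
from itertools import product

# Standard genetic code, codons enumerated with the first base varying slowest (T, C, A, G)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA sequence; '*' marks stop codons."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_types(dna_seqs):
    """Return (distinct DNA genotypes, distinct protein sequences)."""
    genotypes = {s.upper() for s in dna_seqs}
    phylotypes = {translate(s) for s in genotypes}
    return len(genotypes), len(phylotypes)


print(count_types(["ATGGCT", "ATGGCC", "ATGGCT", "ATGGAA"]))  # (3, 2)
```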
Cingulin Contains Globular and Coiled-Coil Domains and Interacts with Zo-1, Zo-2, Zo-3, and Myosin
Cordenonsi, Michelangelo; D'Atri, Fabio; Hammar, Eva; Parry, David A.D.; Kendrick-Jones, John; Shore, David; Citi, Sandra
1999-01-01
We characterized the sequence and protein interactions of cingulin, an Mr 140–160-kD phosphoprotein localized on the cytoplasmic surface of epithelial tight junctions (TJ). The derived amino acid sequence of a full-length Xenopus laevis cingulin cDNA shows globular head (residues 1–439) and tail (1,326–1,368) domains and a central α-helical rod domain (440–1,325). Sequence analysis, electron microscopy, and pull-down assays indicate that the cingulin rod is responsible for the formation of coiled-coil parallel dimers, which can further aggregate through intermolecular interactions. Pull-down assays from epithelial, insect cell, and reticulocyte lysates show that an NH2-terminal fragment of cingulin (1–378) interacts in vitro with ZO-1 (Kd ∼5 nM), ZO-2, ZO-3, myosin, and AF-6, but not with symplekin, and a COOH-terminal fragment (377–1,368) interacts with myosin and ZO-3. ZO-1 and ZO-2 immunoprecipitates contain cingulin, suggesting in vivo interactions. Full-length cingulin, but not NH2-terminal and COOH-terminal fragments, colocalizes with endogenous cingulin in transfected MDCK cells, indicating that sequences within both head and rod domains are required for TJ localization. We propose that cingulin is a functionally important component of TJ, linking the submembrane plaque domain of TJ to the actomyosin cytoskeleton. PMID:10613913
Huang, Xiao-Yan; Li, Ming-Li; Xu, Juan; Gao, Yue-Dong; Wang, Wen-Guang; Yin, An-Guo; Li, Xiao-Fei; Sun, Xiao-Mei; Xia, Xue-Shan; Dai, Jie-Jie
2013-04-01
Although the tree shrew (Tupaia belangeri chinensis) is an excellent animal model for studying the mechanisms of human diseases, few studies have examined interleukin-2 (IL-2), an important immune factor in disease model evaluation. In this study, the 465-bp full-length IL-2 coding sequence was cloned from the RNA of tree shrew spleen lymphocytes that had been cultured and stimulated with concanavalin A (ConA). Clustal W 2.0 was used to compare and analyze the sequence and molecular characteristics and to establish the similarity of the overall structure of IL-2 between tree shrews and other mammals. The homology of the IL-2 nucleotide sequence between tree shrews and humans was 93%, and the amino acid homology was 80%. The phylogenetic tree, derived by the Neighbour-Joining method using MEGA5.0, indicated a close genetic relationship between tree shrews, Homo sapiens, and Macaca mulatta. Three-dimensional structure analysis showed that the surface charges in most regions of IL-2 were similar between tree shrews and humans; however, the N-glycosylation sites and local structures were different, which may affect antibody binding. These results provide a fundamental basis for the future study of IL-2 monoclonal antibodies in tree shrews, thereby improving their utility as a model.
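Percentages such as the 93% nucleotide and 80% amino acid figures above are usually computed from a pairwise or multiple alignment. A Python sketch of one common convention (columns gapped in both sequences are skipped, and a gap against a residue counts as a mismatch); other tools normalize differently, so results can differ slightly, and the aligned strings are toy inputs:

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical columns between two equal-length aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        matches += x == y  # a gap opposite a residue never matches
    if compared == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / compared


print(round(percent_identity("ACGT-A", "ACGTTA"), 1))  # 83.3
```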
Hu, Lan; Zhang, Yong; Hong, Mei; Zhu, Shuangli; Yan, Dongmei; Wang, Dongyan; Li, Xiaolei; Zhu, Zhen; Tsewang; Xu, Wenbo
2014-01-01
Enterovirus B81 (EV-B81) is a newly identified serotype within the species enterovirus B (EV-B). To date, only eight nucleotide sequences of EV-B81 have been published and only one full-length genome sequence (the prototype strain) has been made available in the GenBank database. Here, we report the full-length genome sequences of two EV-B81 strains isolated in the Tibet Autonomous Region of China during acute flaccid paralysis surveillance activities, and we also conducted an antibody seroprevalence study in two prefectures of Tibet. The sequence comparison and phylogenetic dendrogram analysis revealed high variability among the global EV-B81 strains and frequent intertypic recombination in the non-structural protein region of EV-B serotypes, suggesting high genetic diversity of EV-B81. However, low positive rates and low titers of neutralizing antibodies against EV-B81 were detected. Nearly 68% of children under the age of five had no neutralizing antibodies against EV-B81. Hence, the extent of transmission and the exposure of the population to this EV type are very limited. Although little is known about the biological and pathogenic properties of EV-B81, because few studies have been conducted owing to the limited number of isolates, our study provides basic information for further studies of EV-B81. PMID:25112835
Pelsy, F.; Merdinoglu, D.
2002-09-01
A chromosome-walking strategy was used to sequence and characterize retrotransposons in the grapevine genome. The reconstitution of a family of retroelements, named Tvv1, was achieved in six successive steps. These elements share a single, highly conserved open reading frame, 4,153 nucleotides long, putatively encoding the gag, pro, int, rt and rh proteins. Comparison of the coding potential of the Tvv1 open reading frame with those of Drosophila copia and tobacco Tnt1 revealed that Tvv1 is closely related to Ty1-copia-like retrotransposons. A highly variable untranslated leader region, upstream of the open reading frame, allowed us to differentiate Tvv1 variants, which represent a family of at least 28 copies of varying sizes. This internal region is flanked by two long terminal repeats in direct orientation, sized between 149 and 157 bp. Among elements theoretically sized from 4,970 to 5,550 bp, we describe the full-length sequence of a reference element, Tvv1-1, which is 5,343 nucleotides long. The full-length sequence of Tvv1-1 shows 53.3% identity to that of pea PDR1. In addition, both elements contain long terminal repeats of nearly the same size in which the U5 region could be entirely absent. Therefore, we propose that Tvv1 and PDR1 could constitute a particular class of short-LTR retroelements.
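Because the two LTRs of an element are identical when it inserts and then diverge independently, comparing the 5′ and 3′ LTR copies is a routine check on a candidate element. A simplified Python sketch that assumes both LTRs have the same length and need no gapped alignment (the Tvv1 LTRs range from 149 to 157 bp, so an aligner would be needed in practice); the function name and toy element are made up:

```python
def ltr_identity(element, ltr_len):
    """Percent identity between the first and last `ltr_len` bases of an element."""
    if len(element) < 2 * ltr_len:
        raise ValueError("element is shorter than two LTRs")
    five_prime = element[:ltr_len].upper()
    three_prime = element[-ltr_len:].upper()
    matches = sum(a == b for a, b in zip(five_prime, three_prime))
    return 100.0 * matches / ltr_len


ltr = "TGTTGGAATA"
print(ltr_identity(ltr + "ACGTACGTAC" + ltr, len(ltr)))           # 100.0
print(ltr_identity(ltr + "ACGTACGTAC" + "TGTTGGAATT", len(ltr)))  # 90.0
```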
Virtual Northern analysis of the human genome.
Hurowitz, Evan H; Drori, Iddo; Stodden, Victoria C; Donoho, David L; Brown, Patrick O
2007-05-23
We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.
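A linear relationship between ORF and mRNA length can be summarized by an ordinary least-squares line, whose slope and intercept describe how total UTR length (mRNA minus ORF) scales with ORF length. A Python sketch with made-up numbers; the study's fitted coefficients are not given here:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy ORF and mRNA lengths in nucleotides (not data from the study)
orf_lengths = [300, 600, 900, 1200]
mrna_lengths = [700, 1000, 1300, 1600]
print(fit_line(orf_lengths, mrna_lengths))  # (1.0, 400.0)
```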
Genome-wide analysis of esterase-like genes in the striped rice stem borer, Chilo suppressalis.
Wang, Baoju; Wang, Ying; Zhang, Yang; Han, Ping; Li, Fei; Han, Zhaojun
2015-06-01
The striped rice stem borer, Chilo suppressalis, a destructive pest of rice, has developed high levels of resistance to certain insecticides. Esterases have been reported to be involved in insecticide resistance in several insects. Therefore, this study systematically analyzed esterase-like genes in C. suppressalis. Fifty-one esterase-like genes were identified in the draft genomic sequences of the species, and 20 cDNA sequences encoding full-length or nearly full-length proteins were derived. The putative esterase proteins derived from these full-length genes are highly diverse overall. However, functionally important key residues, including the active-site serine, are conserved in 18 of the 20 proteins. Phylogenetic analysis revealed that most of these genes have homologues in other lepidopteran insects. Genes CsuEst6, CsuEst10, CsuEst11, and CsuEst51 were induced by the insecticide triazophos, and genes CsuEst9, CsuEst11, CsuEst14, and CsuEst51 were induced by the insecticide chlorantraniliprole. Our results provide a foundation for future studies of insecticide resistance in C. suppressalis and for comparative research with esterase genes from other insect species.
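Checking whether the active-site serine is conserved reduces to reading one column of a multiple sequence alignment. A Python sketch, assuming the alignment is already computed and the 0-based column of the catalytic serine is known; the sequence names, fragments, and column index are hypothetical:

```python
def missing_residue(alignment, column, residue="S"):
    """Names of aligned sequences that lack `residue` at 0-based `column`."""
    return sorted(
        name
        for name, seq in alignment.items()
        if column >= len(seq) or seq[column].upper() != residue
    )


# Hypothetical aligned fragments around a catalytic serine at column 2
toy_alignment = {"Est_a": "GESAG", "Est_b": "GESAG", "Est_c": "GEAAG"}
print(missing_residue(toy_alignment, 2))  # ['Est_c']
```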
Characterization and mapping of cDNA encoding aspartate aminotransferase in rice, Oryza sativa L.
Song, J; Yamamoto, K; Shomura, A; Yano, M; Minobe, Y; Sasaki, T
1996-10-31
Fifteen cDNA clones, putatively identified as encoding aspartate aminotransferase (AST, EC 2.6.1.1), were isolated and partially sequenced. Together with six previously isolated clones putatively identified as encoding ASTs (Sasaki, et al. 1994, Plant Journal 6, 615-624), their sequences were characterized and classified into 4 cDNA species. Two of the isolated clones, C60213 and C2079, were full-length cDNAs, and their complete nucleotide sequences were determined. C60213 was 1612 bp long and its deduced amino acid sequence showed 88% homology with that of Panicum miliaceum L. mitochondrial AST. The C60213-encoded protein had an N-terminal amino acid sequence characteristic of a mitochondrial transit peptide. On the other hand, C2079 was 1546 bp long and had 91% amino acid sequence homology with P. miliaceum L. cytosolic AST but lacked the transit peptide sequence. The nucleotide and deduced amino acid sequence homologies between C2079 and C60213 were 54% and 52%, respectively. C2079 and C60213 were mapped on chromosomes 1 and 6, respectively, by restriction fragment length polymorphism linkage analysis. Northern blot analysis using C2079 as a probe revealed much higher transcript levels in callus and root than in green and etiolated shoots, suggesting tissue-specific variations of AST gene expression.
Cotesia vestalis parasitization suppresses expression of a Plutella xylostella thioredoxin
USDA-ARS's Scientific Manuscript database
Thioredoxins (Trxs) are a family of small, highly conserved and ubiquitous proteins involved in protecting organisms against toxic reactive oxygen species (ROS). In this study, a typical thioredoxin gene, PxTrx, was isolated from Plutella xylostella. The full-length cDNA sequence is composed of 959 ...
Wenke, Torsten; Döbel, Thomas; Sörensen, Thomas Rosleff; Junghans, Holger; Weisshaar, Bernd; Schmidt, Thomas
2011-01-01
Short interspersed nuclear elements (SINEs) are non-long terminal repeat retrotransposons that are highly abundant, heterogeneous, and mostly not annotated in eukaryotic genomes. We developed a tool designated SINE-Finder for the targeted discovery of tRNA-derived SINEs. We analyzed sequence data of 16 plant genomes, including 13 angiosperms and three gymnosperms, and identified 17,829 full-length and truncated SINEs falling into 31 families, showing the widespread occurrence of SINEs in higher plants. The investigation focused on potato (Solanum tuberosum), resulting in the detection of seven different SolS SINE families consisting of 1489 full-length and 870 5′ truncated copies. Consensus sequences of full-length members range in size from 106 to 244 bp depending on the SINE family. SolS SINEs populated related species and evolved separately, which led to some distinct subfamilies. Solanaceae SINEs are dispersed along chromosomes and distributed without clustering but with preferred integration into short A-rich motifs. They emerged more than 23 million years ago and were amplified in a species-specific manner during the radiation of potato, tomato (Solanum lycopersicum), and tobacco (Nicotiana tabacum). We show that tobacco TS retrotransposons are composite SINEs consisting of the 3′ end of a long interspersed nuclear element integrated downstream of a nonhomologous SINE family, followed by successful colonization of the genome. We propose an evolutionary scenario for the formation of TS as a spontaneous event, which could be typical for the emergence of SINE families. PMID:21908723
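Separating full-length from 5′-truncated copies can be done by asking which part of the family consensus each genomic hit covers. A Python sketch; the tolerance and the coordinate convention (1-based, inclusive positions on the consensus) are assumptions, and this is an illustration rather than a description of SINE-Finder's internals:

```python
def classify_hit(hit_start, hit_end, consensus_len, tolerance=5):
    """Classify a copy by the consensus positions its alignment covers (1-based, inclusive)."""
    starts_at_5prime = hit_start <= 1 + tolerance
    reaches_3prime = hit_end >= consensus_len - tolerance
    if starts_at_5prime and reaches_3prime:
        return "full-length"
    if reaches_3prime:
        return "5'-truncated"  # 3' end intact, 5' portion missing
    return "other"


print(classify_hit(1, 200, 200))   # full-length
print(classify_hit(60, 200, 200))  # 5'-truncated
```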
Wenke, Torsten; Döbel, Thomas; Sörensen, Thomas Rosleff; Junghans, Holger; Weisshaar, Bernd; Schmidt, Thomas
2011-09-01
Short interspersed nuclear elements (SINEs) are non-long terminal repeat retrotransposons that are highly abundant, heterogeneous, and mostly not annotated in eukaryotic genomes. We developed a tool designated SINE-Finder for the targeted discovery of tRNA-derived SINEs. We analyzed sequence data of 16 plant genomes, including 13 angiosperms and three gymnosperms, and identified 17,829 full-length and truncated SINEs falling into 31 families, showing the widespread occurrence of SINEs in higher plants. The investigation focused on potato (Solanum tuberosum), resulting in the detection of seven different SolS SINE families consisting of 1489 full-length and 870 5' truncated copies. Consensus sequences of full-length members range in size from 106 to 244 bp depending on the SINE family. SolS SINEs populated related species and evolved separately, which led to some distinct subfamilies. Solanaceae SINEs are dispersed along chromosomes and distributed without clustering but with preferred integration into short A-rich motifs. They emerged more than 23 million years ago and were amplified in a species-specific manner during the radiation of potato, tomato (Solanum lycopersicum), and tobacco (Nicotiana tabacum). We show that tobacco TS retrotransposons are composite SINEs consisting of the 3' end of a long interspersed nuclear element integrated downstream of a nonhomologous SINE family, followed by successful colonization of the genome. We propose an evolutionary scenario for the formation of TS as a spontaneous event, which could be typical for the emergence of SINE families.
Replication of a chronic hepatitis B virus genotype F1b construct.
Hernández, Sergio; Jiménez, Gustavo; Alarcón, Valentina; Prieto, Cristian; Muñoz, Francisca; Riquelme, Constanza; Venegas, Mauricio; Brahm, Javier; Loyola, Alejandra; Villanueva, Rodrigo A
2016-03-01
Genotype F is one of the less-studied genotypes of human hepatitis B virus, although it is widely distributed in Central and South America. Our previous studies have shown that HBV genotype F is prevalent in Chile, and phylogenetic analysis of its full-length sequence amplified from the sera of chronically infected patients identified it as HBV subgenotype F1b. We have previously reported the full-length sequence of an HBV molecular clone obtained from a patient chronically infected with genotype F1b. In this report, we established a system to study HBV replication based on hepatoma cell lines transfected with full-length monomers of the HBV genome. Culture supernatants were analyzed after transfection and found to contain both HBsAg and HBeAg viral antigens. Consistently, fractionated cell extracts revealed viral replication, with both cytoplasmic and nuclear DNA intermediates. Analysis of HBV-transfected cells by indirect immunofluorescence or immunoelectron microscopy revealed the expression of viral antigens and cytoplasmic viral particles, respectively. To further test the functionality of the ongoing viral replication at the level of chromatinized cccDNA, transfected cells were treated with a histone deacetylase inhibitor, which resulted in increased viral replication. This correlated with changes in posttranslational modifications of histones at viral promoters. Thus, the development of this viral replication system for HBV genotype F will facilitate studies on the regulation of viral replication and the identification of new antiviral drugs.
What can we learn about lyssavirus genomes using 454 sequencing?
Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin
2012-01-01
The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high-quality complete genome sequences of lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, in terms of both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses, and will be the basis for the design of novel nucleic acid-based diagnostics. The first results presented here indicate that not only can high-quality full-length lyssavirus genome sequences be generated, but efficient analysis of the viral population also becomes feasible.
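One simple way to quantify within-sample (quasi-species) diversity from deep sequencing is the Shannon entropy of the base distribution at each alignment position. A Python sketch, assuming reads are already aligned to common coordinates with no indels and ignoring sequencing error (a real 454 analysis would also have to handle homopolymer errors); the reads are toy inputs:

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (bits) of the A/C/G/T distribution in one alignment column."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def per_site_entropy(reads):
    """Entropy at each position covered by every read."""
    length = min(len(read) for read in reads)
    return [site_entropy("".join(read[i] for read in reads)) for i in range(length)]


print(per_site_entropy(["ACGT", "ACGA", "ACTA", "ACTT"]))  # [0.0, 0.0, 1.0, 1.0]
```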
DeWitt, D L; Smith, W L
1988-01-01
Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. PMID:3125548
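Potential N-glycosylation sites like the three mentioned above are conventionally found by scanning for the N-X-S/T sequon, where X is any residue except proline; a sequon is necessary but not sufficient for glycosylation. A Python sketch that reports 1-based positions, matching the numbering style used for serine 530; the function name and peptide are toy inputs:

```python
import re


def n_glyc_sequons(protein):
    """1-based positions of Asn residues in N-X-S/T sequons (X != Pro), overlaps allowed."""
    return [m.start() + 1 for m in re.finditer(r"(?=N[^P][ST])", protein.upper())]


print(n_glyc_sequons("MNASNPSQNGTK"))  # [2, 9]
```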
Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong
2015-03-01
The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool (BLAST), and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductases of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, the three-dimensional structure of ChrT showed 85.6% similarity to that of Escherichia coli ChrR, with four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate-reducing ability of the gene product. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
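The ORF and protein lengths above are linked by codon arithmetic: a 567 bp ORF that includes its stop codon holds 189 codons and therefore encodes 188 residues. The sketch below is a simplified stand-in for what a tool such as ORF Finder does, under stated assumptions: forward strand only, ATG as the only start codon, and the three standard stop codons. The function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def longest_orf(seq: str) -> str:
    """Longest ATG-initiated ORF (stop codon included) on the forward strand."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start opens the ORF
            elif start is not None and codon in STOPS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None  # look for the next ORF in this frame
    return best


def protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1
```

For example, `protein_length(567)` gives 188, matching the ChrT figures. A full ORF search would also scan the reverse complement and may allow alternative start codons.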
Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong
2015-01-01
The ChrT gene encodes a chromate reductase enzyme that catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), based on the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high-efficiency TAIL-PCR, and the full-length ChrT gene was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool (BLAST), and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high similarity to the FMN reductases of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, the three-dimensional structure of ChrT showed 85.6% similarity to that of Escherichia coli ChrR, with four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the chromate-reducing ability of the gene product. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
Gapped Spectral Dictionaries and Their Applications for Database Searches of Tandem Mass Spectra
Jeong, Kyowon; Kim, Sangtae; Bandeira, Nuno; Pevzner, Pavel A.
2011-01-01
Generating all plausible de novo interpretations of a peptide tandem mass (MS/MS) spectrum (a Spectral Dictionary) and quickly matching them against the database represents a recently emerged alternative approach to peptide identification. However, the size of a Spectral Dictionary grows quickly with peptide length, making its generation impractical for long peptides. We introduce Gapped Spectral Dictionaries (all plausible de novo interpretations with gaps), which can be easily generated for any peptide length, thus addressing this limitation of the Spectral Dictionary approach. We show that Gapped Spectral Dictionaries are small, thus opening the possibility of using them to speed up MS/MS searches. Our MS-GappedDictionary algorithm (based on Gapped Spectral Dictionaries) enables proteogenomics applications (such as searches in the six-frame translation of the human genome) that are prohibitively time-consuming with existing approaches. MS-GappedDictionary generates gapped peptides that occupy a niche between accurate but short peptide sequence tags and long but inaccurate full-length peptide reconstructions. We show that, contrary to conventional wisdom, some high-quality spectra do not have good peptide sequence tags, and we introduce gapped tags that have advantages over conventional peptide sequence tags in MS/MS database searches. PMID:21444829
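A gapped peptide replaces runs of residues that the spectrum cannot resolve with a single mass. The sketch below only illustrates that representation for a known peptide; it is not MS-GappedDictionary, which works from spectra and enumerates all plausible gapped interpretations. Assumptions: integer nominal residue masses, and a hypothetical `breaks` set listing the prefix lengths whose fragment ions are taken as observed. It also shows why gapped dictionaries stay small: different peptides (here `PEPTIDE` and `PPETIDE`) can collapse to the same gapped peptide.

```python
from __future__ import annotations

# Nominal (integer) residue masses in Da; a simplification for readability.
# Real searches use accurate masses with an error tolerance.
NOMINAL_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def gapped_peptide(peptide: str, breaks: set[int]) -> list[int]:
    """Collapse a peptide into a gapped peptide (a list of masses).

    `breaks` holds prefix lengths (1..len-1) assumed to be supported by
    observed fragment ions; residues between consecutive breaks are merged
    into one mass gap.
    """
    cuts = [0] + sorted(b for b in breaks if 0 < b < len(peptide)) + [len(peptide)]
    return [
        sum(NOMINAL_MASS[aa] for aa in peptide[start:end])
        for start, end in zip(cuts, cuts[1:])
    ]
```

With breaks after residues 1, 3, and 5, `PEPTIDE` becomes `[97, 226, 214, 244]`, and so does `PPETIDE`, because `EP` and `PE` have the same mass.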
Automated sample-preparation technologies in genome sequencing projects.
Hilbert, H; Lauber, J; Lubenow, H; Düsterhöft, A
2000-01-01
A robotic workstation system (BioRobot 9600, QIAGEN) and a 96-well UV spectrophotometer (SpectraMax 250, Molecular Devices) were integrated into the process of high-throughput automated sequencing of double-stranded plasmid DNA templates. An automated 96-well miniprep kit protocol (QIAprep Turbo, QIAGEN) provided high-quality plasmid DNA from shotgun clones. The DNA prepared by this procedure was used to generate more than two megabases of final sequence data for two genomic projects (Arabidopsis thaliana and Schizosaccharomyces pombe), three thousand expressed sequence tags (ESTs) plus half a megabase of human full-length cDNA clones, and approximately 53,000 single reads for a whole genome shotgun project (Pseudomonas putida).
Mouillon, Jean-Marie; Gustafsson, Petter; Harryson, Pia
2006-01-01
Dehydrins constitute a class of intrinsically disordered proteins that are expressed under conditions of water-related stress. Characteristic of the dehydrins are some highly conserved stretches of seven to 17 residues that are repetitively scattered in their sequences: the K-, S-, Y-, and Lys-rich segments. In this study, we investigate the putative role of these segments in promoting structure. The study is based on a comparative analysis of four full-length dehydrins from Arabidopsis (Arabidopsis thaliana; Cor47, Lti29, Lti30, and Rab18) and isolated peptide mimics of the K-, Y-, and Lys-rich segments. In physiological buffer, the circular dichroism spectra of the full-length dehydrins reveal overall disordered structures with a variable content of poly-Pro helices, a type of elongated secondary structure relying on bridging water molecules. Similar disordered structures are observed for the isolated peptides of the conserved segments. Interestingly, neither the full-length dehydrins nor their conserved segments are able to adopt specific structure in response to altered temperature, one of the factors that regulate their expression in vivo. There is also no structural response to the addition of metal ions, increased protein concentration, or the protein-stabilizing salt Na2SO4. Taken together, these observations indicate that the dehydrins are not in equilibrium with high-energy folded structures. The result suggests that the dehydrins are highly evolved proteins, selected to maintain high configurational flexibility and to resist unspecific collapse and aggregation. The role of the conserved segments is thus not to promote tertiary structure, but to exert their biological function more locally upon interaction with specific biological targets, for example by acting as beads on a string for specific recognition, interaction with membranes, or intermolecular scaffolding.
In this context, it is notable that the Lys-rich segment in Cor47 and Lti29 shows sequence similarity to the animal chaperone HSP90. PMID:16565295
Ma, Kaifeng; Sun, Lidan; Cheng, Tangren; Pan, Huitang; Wang, Jia; Zhang, Qixiang
2018-01-01
Increasing evidence shows that epigenetics plays an important role in phenotypic variance. However, little is known about epigenetic variation in the important ornamental tree Prunus mume. We used amplified fragment length polymorphism (AFLP) and methylation-sensitive amplified polymorphism (MSAP) techniques, together with association analysis and sequencing, to investigate epigenetic variation and its relationships with genetic variance, environmental factors, and traits. In leaf samples from 96 accessions of P. mume, the relative total methylation level was 29.80%, and the relative hemi-methylation level (15.77%) was higher than the relative full methylation level (14.03%). The epigenetic diversity (I∗ = 0.575, h∗ = 0.393) was higher than the genetic diversity (I = 0.484, h = 0.319). The cultivated population displayed greater epigenetic diversity than the wild populations in both southwest and southeast China. We found that epigenetic variance, genetic variance, and environmental factors showed cooperative structures. In particular, leaf length, width, and area were positively correlated with the relative full methylation level and the total methylation level, indicating that the DNA methylation level played a role in trait variation. In total, 203 AFLP and 423 MSAP associated markers were detected, and 68 of them were sequenced. Homology analysis and functional prediction suggested that the candidate marker-linked genes were essential for leaf morphology development and metabolism, implying that these markers play critical roles in the establishment of leaf length, width, area, and length-to-width ratio. PMID:29441078
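The methylation figures above are additive: in MSAP scoring, the relative total methylation level is the sum of the hemi- and full-methylation levels, and the reported values satisfy this exactly. The diversity values are presumably the usual Shannon information index I and Nei's gene diversity h averaged over L loci; the abstract does not state its estimators, so the forms below are an assumption, with p_{li} the estimated frequency of band state i (presence or absence) at locus l.

```latex
\begin{aligned}
\text{total} &= \text{hemi} + \text{full} = 15.77\% + 14.03\% = 29.80\%\\
I &= -\frac{1}{L}\sum_{l=1}^{L}\sum_{i} p_{li}\,\ln p_{li}\\
h &= \frac{1}{L}\sum_{l=1}^{L}\Bigl(1-\sum_{i} p_{li}^{2}\Bigr)
\end{aligned}
```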
LoRTE: Detecting transposon-induced genomic variants using low coverage PacBio long read sequences.
Disdero, Eric; Filée, Jonathan
2017-01-01
Population genomic analysis of transposable elements (TEs) has greatly benefited from recent advances in sequencing technologies. However, the short read length and the propensity of transposable elements to nest in highly repeated regions of genomes limit the efficiency of bioinformatic tools when Illumina or 454 technologies are used. Fortunately, long-read sequencing technologies generating reads that may span the entire length of full transposons are now available. However, existing TE population genomics software was not designed to handle long reads, and new dedicated tools are needed. LoRTE is the first tool able to use PacBio long-read sequences to identify transposon deletions and insertions between a reference genome and genomes of different strains or populations. Tested against simulated and genuine Drosophila melanogaster PacBio datasets, LoRTE appears to be a reliable and broadly applicable tool for studying the dynamics and evolutionary impact of transposable elements using low-coverage, long-read sequences. LoRTE is an efficient and accurate tool to identify structural genomic variants caused by TE insertion or deletion. LoRTE is available for download at http://www.egce.cnrs-gif.fr/?p=6422.
Kornthong, Napamanee; Cummins, Scott F; Chotwiwatthanakun, Charoonroj; Khornchatri, Kanjana; Engsusophon, Attakorn; Hanna, Peter J; Sobhon, Prasert
2014-01-01
The central nervous system (CNS) is often intimately involved in reproduction control and is therefore a target organ for transcriptomic investigations to identify reproduction-associated genes. In this study, 454 transcriptome sequencing was performed on pooled brain and ventral nerve cord of the female mud crab (Scylla olivacea) following serotonin injection (5 µg/g BW). A total of 197,468 sequence reads were obtained, with an average length of 828 bp. Approximately 38.7% of 2,183 isotigs matched with significant similarity (E value < 1e-4) to sequences within the GenBank non-redundant (nr) database, with most significant matches being to crustacean and insect sequences. Approximately 32 putative neuropeptide genes were identified from non-matching BLAST sequences. In addition, we identified full-length transcripts for crustacean reproduction-related genes, namely farnesoic acid O-methyltransferase (FAMeT), estrogen sulfotransferase (ESULT), and prostaglandin F synthase (PGFS). Following serotonin injection, which would normally initiate reproductive processes, we found up-regulation of FAMeT, ESULT, and PGFS expression in the female CNS and ovary. Our data provide an invaluable new resource for understanding the molecular role of the CNS in reproduction in S. olivacea.
Prasad, B. C. Narasimha; Kumar, Vinod; Gururaj, H. B.; Parimalan, R.; Giridhar, P.; Ravishankar, G. A.
2006-01-01
Capsaicin is a unique alkaloid of the plant kingdom, restricted to the genus Capsicum. Capsaicin is the pungency factor, a bioactive molecule of food and medicinal importance. Capsaicin is useful as a counterirritant, antiarthritic, analgesic, antioxidant, and anticancer agent. Capsaicin biosynthesis involves condensation of vanillylamine and 8-methyl nonenoic acid, brought about by capsaicin synthase (CS). We found that CS activity correlated with genotype-specific capsaicin levels. We purified and characterized CS (≈35 kDa). Immunolocalization studies confirmed that CS is specifically localized to the placental tissues of Capsicum fruits. Western blot analysis revealed concomitant enhancement of CS levels and capsaicin accumulation during fruit development. We determined the N-terminal amino acid sequence of purified CS, cloned the CS gene (csy1), and sequenced its full-length cDNA (981 bp). The molecular mass of CS deduced from the full-length cDNA was 38 kDa. Functionality of csy1 was also demonstrated through heterologous expression in recombinant Escherichia coli. Here we report the gene responsible for capsaicin biosynthesis, which is unique to Capsicum spp. With this information on the CS gene, speculation about the gene for pungency is unequivocally resolved. Our findings have implications for the regulation of capsaicin levels in Capsicum genotypes. PMID:16938870
Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.
2011-01-01
Heat shock proteins are ubiquitous, are induced under a number of environmental and metabolic stresses, and have highly conserved DNA sequences among mammalian species. Camelus dromedarius (the Arabian camel), domesticated in semi-desert environments, is well adapted to tolerate and survive severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of a full-length cDNA encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from the Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in the pET-b expression vector. Sequence analysis of the HSPA6 gene showed a 1932 bp-long open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GenBank (accession number HQ214118.1). BLAST analysis indicated that the C. dromedarius HSPA6 gene nucleotide sequence shared high similarity (77–91%) with heat shock gene nucleotide sequences of other mammals. The deduced 643-amino-acid sequence (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. Comparative analyses of camel HSPA6 protein sequences with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). Protein 3D structural analysis showed that the predicted camel HSPA6 protein structure has high similarity to human and mouse HSPs. Taken together, this study indicates that the cDNA sequence of the HSPA6 gene and its amino acid sequence and protein structure in the Arabian camel are highly conserved and similar to those of other mammalian species. PMID:21845074
Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth
2018-01-01
Background: Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, the liver is the principal location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and the whole organism moderate energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. Findings: We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6 Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed the data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. Conclusions: We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism. PMID:29618047
Identification and genomic characterization of a novel porcine parvovirus (PPV6) in China.
Ni, Jianqiang; Qiao, Caixia; Han, Xue; Han, Tao; Kang, Wenhua; Zi, Zhanchao; Cao, Zhen; Zhai, Xinyan; Cai, Xuepeng
2014-12-02
Parvoviruses are classified into two subfamilies based on their host range: the Parvovirinae, which infect vertebrates, and the Densovirinae, which mainly infect insects and other arthropods. In recent years, a number of novel parvoviruses belonging to the subfamily Parvovirinae have been identified from various animal species and humans, including human parvovirus 4 (PARV4), porcine hokovirus, ovine partetravirus, porcine parvovirus 4 (PPV4), and porcine parvovirus 5 (PPV5). Using sequence-independent single primer amplification (SISPA), a novel parvovirus within the subfamily Parvovirinae that was distinct from any known parvovirus was identified, and five full-length genome sequences were determined and analyzed. The novel porcine parvovirus, provisionally named PPV6, was initially identified from aborted pig fetuses in China. Retrospective studies revealed that the prevalence of PPV6 in aborted pig fetuses and piglets (50% and 75%, respectively) was notably higher than that in finishing pigs and sows (15.6% and 3.8%, respectively). Furthermore, the prevalence of PPV6 in finishing pigs was similar in affected and unaffected farms (16.7% vs. 13.6%–21.7%). This finding indicates that animal age, perhaps due to increased innate immune resistance, strongly influences the level of PPV6 viremia. Complete genome sequencing and multiple alignments showed that the nearly full-length genome sequences were approximately 6,100 nucleotides in length and shared 20.5%–42.6% DNA sequence identity with other members of the Parvovirinae subfamily. Phylogenetic analysis showed that PPV6 was significantly distinct from other known parvoviruses and was most closely related to PPV4. Our findings and review of published parvovirus sequences suggest that a novel porcine parvovirus is currently circulating in China and might be classified into the novel genus Copiparvovirus within the subfamily Parvovirinae. 
However, the clinical manifestations of PPV6 are still unknown, because the prevalence of PPV6 was similar between healthy and sick pigs in a retrospective epidemiological study. The identification of PPV6 within the subfamily Parvovirinae provides further insight into the viral and genetic diversity of parvoviruses.
Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.
2018-01-01
Background & methods: The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. Results: The pipeline generated sequences of at least 1 kb in length (median = 7.46 kb, IQR = 4.01 kb) for 375 of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the downstream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that region had been sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia, and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets, and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. 
Conclusions: The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It also provided a more reliable description of CRFs (which would otherwise be misclassified) and of transmission clusters. PMID:29389981
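Cluster Picker accepts a subtree as a cluster when its node support reaches a threshold and the maximum pairwise genetic distance among its tips stays below a distance threshold. The sketch below shows that acceptance test for one candidate subtree, plus the tally behind the reported figures (25 pairs, 4 triplets, and 2 quadruplets give 31 clusters and 70 of 365 sequences, about 19.2%). The threshold defaults and function names are illustrative assumptions, not the study's settings, and tree traversal is omitted.

```python
from itertools import combinations


def is_cluster(tips, support, dist, min_support=0.9, max_distance=0.045):
    """Cluster Picker-style acceptance test for one candidate subtree.

    tips: sequence IDs under the subtree's root node
    support: support value of that node, scaled to [0, 1]
    dist: dict mapping frozenset({a, b}) -> pairwise genetic distance
    Thresholds here are illustrative only.
    """
    if support < min_support:
        return False
    return all(dist[frozenset(pair)] <= max_distance
               for pair in combinations(tips, 2))


def clustered_fraction(cluster_sizes, n_sequences):
    """Number of sequences in clusters and their share of the dataset."""
    clustered = sum(cluster_sizes)
    return clustered, clustered / n_sequences


sizes = [2] * 25 + [3] * 4 + [4] * 2  # pairs, triplets, quadruplets
n_clustered, share = clustered_fraction(sizes, 365)
```

Here `n_clustered` is 70 and `share` rounds to 19.2%, consistent with the reported values.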
Identification of a nuclear localization sequence in the polyomavirus capsid protein VP2
NASA Technical Reports Server (NTRS)
Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)
1992-01-01
A nuclear localization signal (NLS) has been identified in the C-terminal (Glu307-Glu-Asp-Gly-Pro-Gln-Lys-Lys-Lys-Arg-Arg-Leu318) amino acid sequence of the polyomavirus minor capsid protein VP2. The importance of this amino acid sequence for nuclear transport of newly synthesized VP2 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP2 (expressing full-length VP2) and pSG5 delta 3VP2 (expressing truncated VP2, lacking amino acids Glu307-Leu318). These constructs were transfected into COS-7 cells, and the intracellular localization of the VP2 protein was determined by indirect immunofluorescence. These studies revealed that the full-length VP2 was localized in the nucleus, while the truncated VP2 protein was localized in the cytoplasm and not transported to the nucleus. A biochemical "additive" approach was also used to determine whether this sequence could target nonnuclear proteins to the nucleus. A synthetic peptide identical to VP2 amino acids Glu307-Leu318 was cross-linked to the nonnuclear proteins bovine serum albumin (BSA) or immunoglobulin G (IgG). The conjugates were then labeled with fluorescein isothiocyanate and microinjected into the cytoplasm of NIH 3T6 cells. Both conjugates localized in the nucleus of the microinjected cells, whereas unconjugated BSA and IgG remained in the cytoplasm. Taken together, these genetic subtractive and biochemical additive approaches have identified the C-terminal sequence of polyomavirus VP2 (containing amino acids Glu307-Leu318) as the NLS of this protein.
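The VP2 segment identified experimentally above contains a run of basic residues that fits the classical monopartite NLS consensus K-(K/R)-X-(K/R). The sketch below scans a sequence for that consensus and reports matches in the protein's own residue numbering. It is a generic sequence heuristic, not the authors' method; the function and constant names are hypothetical, and the segment string is transcribed from the Glu307 to Leu318 sequence given in the abstract.

```python
import re

# Classical monopartite NLS consensus K-(K/R)-X-(K/R); a general heuristic.
MONOPARTITE_NLS = re.compile(r"K[KR].[KR]")

VP2_C_TERMINUS = "EEDGPQKKKRRL"  # Glu307..Leu318, one-letter code
FIRST_RESIDUE = 307              # residue number of the first character


def find_nls(segment: str, first_residue: int):
    """Return (start, end, motif) for each non-overlapping consensus match.

    start and end are inclusive residue numbers in the full protein.
    """
    return [
        (first_residue + m.start(), first_residue + m.end() - 1, m.group())
        for m in MONOPARTITE_NLS.finditer(segment)
    ]
```

`find_nls(VP2_C_TERMINUS, FIRST_RESIDUE)` reports the Lys313 to Arg316 stretch `KKKR`. A consensus match alone does not establish nuclear import, which is why the subtractive and additive experiments described above are needed.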
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharma, Manisha; Jamieson, Cara; Lui, Christina
β-catenin is a key mediator of Wnt signaling, and its deregulated nuclear accumulation can drive cancer progression. While the central armadillo (Arm) repeats of β-catenin stimulate nuclear entry, the N- and C-terminal “tail” sequences are thought to regulate turnover and transactivation. We show here that the N- and C-tails are also potent transport sequences. The unstructured tails of β-catenin, when individually fused to a GFP reporter, could enter and exit the nucleus rapidly in live cells. Proximity ligation assays and pull-down assays identified a weak interaction between the tail sequences and the FG-repeats of nucleoporins, consistent with a possible direct translocation of β-catenin through the nuclear pore complex. Extensive alanine mutagenesis of the tail sequences revealed that nuclear translocation of β-catenin was dependent on specific, uniformly distributed patches of hydrophobic residues, whereas mutagenesis of acidic amino acids had no effect. Moreover, mutation of hydrophobic patches within the N-tail and C-tail of full-length β-catenin reduced its nuclear transport rate and diminished its ability to activate transcription. We propose that the tail sequences can contribute to β-catenin transport and suggest a possible similar role for hydrophobic unstructured regions in other proteins. Highlights: • We show that the N- and C-tails of beta-catenin possess nuclear transport activity. • Nuclear transport of the N- or C-tails requires specific hydrophobic amino acids. • Mutagenesis of the N-terminus diminished nuclear entry of full-length beta-catenin. • We propose the N-tail contributes to beta-catenin nuclear entry and transactivation.
Zhou, Chengran
2017-01-01
Over the past decade, biodiversity researchers have dedicated tremendous effort to constructing DNA reference barcodes for rapid species registration and identification. Although the analytical cost of standard DNA barcoding has been significantly reduced since the early 2000s, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost have not only led to unbalanced barcoding efforts around the globe, but have also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated from individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that differed from each other by only a few nucleotides. Additionally, the new pipeline was much more sensitive in recovering low-quantity amplicons. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that did not show clear bands on the electrophoresis gel. Moreover, sequencing results from the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna and pointing to a feasible direction for DNA barcoding of global biomes. PMID:29077841
Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
2015-11-18
RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read lengths of >150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. 
The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
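A completeness assessment of this kind scores an assembly by how many genes of a fixed reference set (here, the 233 CVG genes) it recovers. As a minimal sketch only, assuming each reference gene has already been matched to its best assembled transcript and summarized as the fraction of the gene covered (a hypothetical `hit_coverage` input), the final bookkeeping resembles BUSCO-style complete/fragmented/missing categories. The 0.9 cutoff is an illustrative assumption, not the criterion used by CEGMA or BUSCO.

```python
from typing import Dict, Iterable


def completeness(
    reference_genes: Iterable[str],
    hit_coverage: Dict[str, float],
    complete_cutoff: float = 0.9,  # illustrative threshold, not a CEGMA/BUSCO default
) -> Dict[str, float]:
    """Percent of reference genes classed as complete, fragmented, or missing.

    hit_coverage (hypothetical input) maps a reference gene ID to the fraction
    of that gene covered by its best-matching assembled transcript.
    """
    genes = list(reference_genes)
    if not genes:
        raise ValueError("reference gene set is empty")
    complete = fragmented = 0
    for gene in genes:
        cov = hit_coverage.get(gene, 0.0)  # genes with no hit count as missing
        if cov >= complete_cutoff:
            complete += 1
        elif cov > 0.0:
            fragmented += 1
    n = len(genes)
    missing = n - complete - fragmented
    return {
        "complete": 100.0 * complete / n,
        "fragmented": 100.0 * fragmented / n,
        "missing": 100.0 * missing / n,
    }


if __name__ == "__main__":
    refs = ["g1", "g2", "g3", "g4"]
    print(completeness(refs, {"g1": 1.0, "g2": 0.5, "g3": 0.95}))
```

With the full CVG set, `reference_genes` would hold 233 IDs; the percentages then make assemblies built from different read lengths or insert sizes directly comparable.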
Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule
2017-06-26
IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain, and domain. Rearranged nucleotide sequences of IG and TR variable (V) domains are analyzed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next-generation sequencing (NGS), by IMGT/HighV-QUEST, the high-throughput version of IMGT/V-QUEST (portal launched in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv), which mimic the in vivo natural diversity of adaptive immune responses, are extensively screened to discover novel antigen-binding specificities. However, the analysis of NGS full-length scFv (~850 bp) is challenging, because these sequences contain two V domains connected by a linker and no tool existed for the analysis of two V domains in a single chain. The functionality "Analysis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST to analyze the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, search for a second V-REGION, full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, the positions of the 5'V-DOMAIN, linker, and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, and characterization of mutations and amino acid changes).
The functionality is generic and can analyse any IG or TR single chain nucleotide sequence containing two V domains, provided that the corresponding species IMGT reference directory is available. The "Analysis of single chain Fragment variable (scFv)" implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST provides the identification and full characterization of the two V domains of full-length scFv (~850 bp) nucleotide sequences from combinatorial libraries. The analysis can also be performed on concatenated paired chains of expressed antigen receptor IG or TR repertoires.
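Once the first four steps have located both V-(D)-J-REGIONs on a read, the last step, linker delimitation, can be pictured as taking the stretch between them. The sketch below is a hypothetical illustration of that idea, not IMGT code: `ScFvLayout` and `delimit_linker` are invented names, coordinates are assumed to be 1-based, inclusive, and already expressed in the V-oriented sense, and the example positions are made up.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # 1-based, inclusive (start, end) positions on the read


@dataclass(frozen=True)
class ScFvLayout:
    v_domain_5p: Span
    linker: Span
    v_domain_3p: Span


def delimit_linker(first_vdj: Span, second_vdj: Span) -> ScFvLayout:
    """Place the linker between two already characterized V-(D)-J-REGIONs.

    Both spans are assumed to be on the same (V-oriented) strand of the read.
    """
    s1, e1 = first_vdj
    s2, e2 = second_vdj
    # Require ordered, non-overlapping domains with at least one base between them.
    if not (s1 <= e1 and e1 + 1 < s2 <= e2):
        raise ValueError("expected two non-overlapping domains separated by a gap")
    return ScFvLayout(first_vdj, (e1 + 1, s2 - 1), second_vdj)


if __name__ == "__main__":
    layout = delimit_linker((1, 360), (406, 750))
    print(layout.linker)  # (361, 405)
```

A real implementation would also have to handle partial alignments at the domain ends; here those boundaries are simply taken as given.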
Candidate Genes Expressed in Tolerant Common Wheat With Resistant to English Grain Aphid.
Luo, Kun; Zhang, Gaisheng; Wang, Chunping; Ouellet, Thérèse; Wu, Jingjing; Zhu, Qidi; Zhao, Huiyan
2014-10-01
The English grain aphid, Sitobion avenae (F.) (Hemiptera: Aphididae), is a common pest of wheat (Triticum aestivum L.) worldwide. Planting improved resistant cultivars is the most effective and environmentally friendly way for farmers to control this aphid in the field. The winter wheat genotypes 98-10-35 and Amigo are resistant to S. avenae. To identify genes responsible for resistance to S. avenae in these genotypes, the current study used differential-display reverse transcription-polymerase chain reaction to identify the corresponding differentially expressed sequences. Two backcross progenies were obtained by crossing the two resistant genotypes with the susceptible genotype 1376. Six candidate differentially expressed bands were sequenced, yielding expressed sequence tags (ESTs) of 128 to 532 bp. These expressed sequences are likely associated with S. avenae resistance; one EST, located on chromosome 7DL, may be related to the ability to maintain photosynthesis in wheat, which could be one way in which resistant common wheat tolerates S. avenae. Cloning the full-length sequences would help clarify the mechanism of wheat resistance to S. avenae and would be valuable for breeding cultivars with S. avenae resistance. © 2014 Entomological Society of America.
Virtual Northern Analysis of the Human Genome
Hurowitz, Evan H.; Drori, Iddo; Stodden, Victoria C.; Donoho, David L.; Brown, Patrick O.
2007-01-01
Background: We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. Methodology/Principal Findings: We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths, representing at least 6,238 genes, at high (>90%) confidence. By comparing these transcript lengths with the RefSeq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparing length measurements obtained by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of proteins are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in human and yeast. Conclusions/Significance: Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes. PMID:17520019
Hingston, Patricia; Chen, Jessica; Dhillon, Bhavjinder K.; Laing, Chad; Bertelli, Claire; Gannon, Victor; Tasara, Taurai; Allen, Kevin; Brinkman, Fiona S. L.; Truelstrup Hansen, Lisbeth; Wang, Siyun
2017-01-01
The human pathogen Listeria monocytogenes is a major concern in the food industry, where its continued detection in food products has caused a string of recalls in North America and Europe. Best known for its ability to grow in foods during refrigerated storage, L. monocytogenes can also tolerate several other food-related stresses, with some strains being more tolerant than others. The objective of this study was to use a combination of phenotypic analyses and whole-genome sequencing to elucidate potential relationships between L. monocytogenes genotypes and food-related stress tolerance phenotypes. To accomplish this, 166 L. monocytogenes isolates were sequenced and evaluated for their ability to grow under cold (4°C), salt (6% NaCl, 25°C), and acid (pH 5, 25°C) stress conditions and to survive desiccation (33% RH, 20°C). The results revealed that the stress tolerance of L. monocytogenes is associated with serotype, clonal complex (CC), full-length inlA profiles, and the presence of a plasmid, which was identified in 55% of isolates. Isolates with full-length inlA exhibited significantly (p < 0.001) enhanced cold tolerance relative to those harboring a premature stop codon (PMSC) in this gene. Similarly, isolates possessing a plasmid demonstrated significantly (p = 0.013) enhanced acid tolerance. We also identified nine new L. monocytogenes sequence types, a new inlA PMSC, and several connections between CCs and the presence/absence or variations of specific genetic elements. A whole-genome single-nucleotide-variant phylogeny revealed a sporadic distribution of tolerant isolates as well as closely related sensitive and tolerant isolates, highlighting that minor genetic differences can influence the stress tolerance of L. monocytogenes. Specifically, a number of cold- and desiccation-sensitive isolates contained PMSCs in σB regulator genes (rsbS, rsbU, rsbV).
Collectively, the results suggest that knowing the sequence type of an isolate, in addition to screening for the presence of full-length inlA and a plasmid, could help food processors and food agency investigators determine why certain isolates might persist in a food processing environment. Additionally, increased sequencing of L. monocytogenes isolates in combination with stress tolerance profiling will enhance the ability to identify genetic elements associated with higher-risk strains. PMID:28337186
Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.
2015-01-01
Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Although cabbage (Brassica oleracea) and other Brassica species are important crop plants, little is known about AQP gene and protein function in them. The recent release of the genome sequences of B. oleracea and Brassica rapa allows comparative genomic studies of these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes belonging to four plant AQP subfamilies were identified, and their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome extended the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, and B. rapa allowed us to follow AQP evolution in these closely related species and to systematically classify and (re)name the isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis, and their expression was analyzed in different organs. The two selectivity filters, gene structure, and coding sequences were highly conserved within each AQP subfamily, whereas sequence variations in some introns and untranslated regions were frequent. These data suggest that Brassica AQPs have substrate selectivity and functions similar to those of their Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
Nyaga, Martin M; Tan, Yi; Seheri, Mapaseka L; Halpin, Rebecca A; Akopov, Asmik; Stucker, Karla M; Fedorova, Nadia B; Shrivastava, Susmita; Duncan Steele, A; Mwenda, Jason M; Pickett, Brett E; Das, Suman R; Jeffrey Mphahlele, M
2018-05-18
Rotavirus A (RVA) exhibits wide genotype diversity globally, but little is known about the genetic composition of genotype P[6] in Africa. This study investigated possible evolutionary mechanisms leading to the genetic diversity of genotype P[6] VP4 sequences. Phylogenetic analyses were conducted on 167 full-length P[6] VP4 sequences, including six of porcine origin. Of the 167 sequences, 57 were newly acquired through whole-genome sequencing as part of this study; the other 110 were all the publicly available global full-length P[6] VP4 sequences downloaded from GenBank. The strength of association between phenotypic features and the phylogeny was also determined. A number of reassortment events and mixed infections involving RVA genotype P[6] strains were observed in this study. Phylogenetic analyses demonstrated the extensive genetic diversity among human P[6] strains, porcine-like strains, and their associated clades and subclades, and estimated a relatively high substitution rate for the P[6] VP4 gene, with a mean of 1.05E-3 substitutions/site/year. The phylogenetic analyses further indicated that genotype P[6] strains are endemic in Africa, characterized by extensive genetic diversity and long-term local evolution of the viruses. This was also supported by phylogeographic clustering and G-genotype clustering of the P[6] strains under Bayesian Tip-association Significance testing (BaTS), which indicated that the viruses evolved locally in Africa rather than through spatial mixing among different regions. Overall, the results demonstrated that multiple mechanisms, such as reassortment events, various mutations, and possibly interspecies transmission, account for the enormous diversity of genotype P[6] strains in Africa. These findings highlight the need for continued global surveillance of rotavirus diversity. Copyright © 2018 Elsevier B.V. All rights reserved.
Frost Bite: A Dramatic Tale of Research in Aesthetic Education
ERIC Educational Resources Information Center
Hirsch, Miriam
2008-01-01
This article follows the author's research on the integration of an aesthetic arts initiative in a private elementary school with an established traditional arts program. The narrative describes the sequence of events, interpersonal interactions, and learning experiences in the format of a full-length dramatic performance. Informed by Ben Peretz's…
Oba, Mami; Tsuchiaka, Shinobu; Omatsu, Tsutomu; Katayama, Yukie; Otomaru, Konosuke; Hirata, Teppei; Aoki, Hiroshi; Murata, Yoshiteru; Makino, Shinji; Nagai, Makoto; Mizutani, Tetsuya
2018-01-08
We tested the usefulness of SureSelect, a target enrichment system for comprehensive viral nucleic acid detection, for rapid identification of viral pathogens in fecal samples from cattle, pigs, and goats. This system enriches nucleic acids of target viruses in clinical or field samples using a library of biotinylated RNAs with sequences complementary to the target viruses. The enriched nucleic acids are amplified by PCR and subjected to next-generation sequencing to identify the target viruses. In many samples, SureSelect target enrichment increased the efficiency of detecting the viruses represented in the biotinylated RNA library. Furthermore, this method enabled us to determine the nearly full-length genome sequence of porcine parainfluenza virus 1 and greatly increased Breadth, a value indicating the ratio of the mapped consensus length to the reference genome length, in pig samples. Our data showed the usefulness of the SureSelect target enrichment system for comprehensive analysis of genomic information from various viruses in field samples. Copyright © 2017 Elsevier Inc. All rights reserved.
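Breadth, as described above, is the share of the reference genome recovered in the mapping consensus. A minimal sketch, assuming the consensus length can be approximated by counting reference positions whose read depth reaches a chosen minimum (the study's exact consensus-calling rule is not given here, and `breadth` and its `depth` input are hypothetical names):

```python
from typing import Sequence


def breadth(depth: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads.

    depth[i] is the read depth at reference position i (hypothetical input,
    for example exported from a per-base coverage report).
    """
    if not depth:
        raise ValueError("empty reference")
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


if __name__ == "__main__":
    print(breadth([0, 1, 3, 0, 2]))  # 0.6
```

Raising `min_depth` makes the metric stricter; enrichment that lifts many low-depth positions above the threshold shows up directly as a higher Breadth.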
A novel totivirus-like virus isolated from bat guano.
Yang, Xinglou; Zhang, Yunzhi; Ge, Xingyi; Yuan, Junfa; Shi, Zhengli
2012-06-01
A previous metagenomic analysis indicated that numerous insect viruses exist in bat guano. In this study, we isolated from bat feces a novel double-stranded RNA virus, a tentative member of the family Totiviridae, designated Tianjin totivirus (ToV-TJ). The virus is an icosahedral particle 40-43 nm in diameter, and it causes cytopathic effects in Sf9, Hz, and C6/36 cell lines. Full-length genomic sequence analysis showed that ToV-TJ shares high similarity with the totivirus OMRV-AK4, which was recently isolated from mosquitoes in Japan. The full-length ToV-TJ genome is 7611 bp and contains two predicted non-overlapping open reading frames (ORFs): ORF1, encoding the capsid protein (CP), and ORF2, encoding an RNA-dependent RNA polymerase. Feeding bioassays with larvae of Spodoptera exigua and Helicoverpa armigera (Hubner) suggest that the virus is not infectious to the larvae of these two species in vivo. Sequences similar to that of ToV-TJ have been detected in bat feces sampled in Yunnan and Hainan Provinces, suggesting that this virus is widely distributed.
NASA Astrophysics Data System (ADS)
Omar, Aimi Farehah; Ismail, Ismanizan
2016-11-01
Sesquiterpene synthase (SS) catalyzes the formation of sesquiterpenes from farnesyl diphosphate (FDP) via carbocation intermediates. In this study, the promoter region of a sesquiterpene synthase gene was isolated from Persicaria minor to identify possible cis-acting elements in the promoter. The full-length PmSS promoter of P. minor is 1824 bp long. Sequence analysis identified several putative cis-acting regulatory elements, three of which were selected for deletion analysis: the cis-acting element involved in wound responsiveness (WUN), the cis-acting element involved in defense and stress responsiveness (TC), and the cis-acting element involved in ABA responsiveness (ABRE). A series of deletions was made to assess promoter activity, producing three truncated promoter fragments: Prom 2 (1606 bp), Prom 3 (1144 bp), and Prom 4 (921 bp). The full-length promoter and its deletion series were cloned into the pBGWFS7 vector, which contains the β-glucuronidase (GUS) and green fluorescent protein (GFP) reporter genes. PCR screening of BASTA-resistant plants confirmed that all constructs were successfully transformed into Arabidopsis thaliana.
Construction of Infectious cDNA Clone of a Chrysanthemum stunt viroid Korean Isolate
Yoon, Ju-Yeon; Cho, In-Sook; Choi, Gug-Seoun; Choi, Seung-Kook
2014-01-01
Chrysanthemum stunt viroid (CSVd), a noncoding infectious RNA molecule, causes serious economic losses in chrysanthemum for 3 or 4 years after initial infection. Monomeric cDNA clones of CSVd isolate SK1 (CSVd-SK1) were constructed in the plasmid vectors pGEM-T Easy and pUC19. Linear positive-sense transcripts synthesized in vitro from the full-length monomeric cDNA clones of CSVd-SK1 could systemically infect tomato seedlings and chrysanthemum plants, suggesting that linear CSVd RNA transcribed from the cDNA clones could replicate as efficiently as circular CSVd in host species. However, direct inoculation of plasmids containing full-length monomeric CSVd-SK1 cDNA failed to infect tomato and chrysanthemum, and linear negative-sense transcripts from the plasmid DNAs were not infectious in either plant species. The cDNA sequences of progeny viroids in systemically infected tomato and chrysanthemum showed a few substitutions at a specific nucleotide position, but there were no deletions or insertions in the sequences of the CSVd progeny from tomato and chrysanthemum plants. PMID:25288987
Ahuka-Mundeke, Steve; Liegeois, Florian; Ayouba, Ahidjo; Foupouapouognini, Yacouba; Nerrienet, Eric; Delaporte, Eric; Peeters, Martine
2010-01-01
Simian immunodeficiency viruses (SIVs) are lentiviruses that infect an extensive number of wild African primate species. Here we describe for the first time SIV infection in a captive agile mangabey (Cercocebus agilis) from Cameroon. Phylogenetic analysis of the full-length genome sequence of SIVagi-00CM312 showed that this novel virus fell into the SIVrcm lineage and was most closely related to a newly characterized SIVrcm strain (SIVrcm-02CM8081) from a wild-caught red-capped mangabey (Cercocebus torquatus) from Cameroon. In contrast to red-capped mangabeys, no 24 bp deletion in CCR5 has been observed in the agile mangabey. Further studies on wild agile mangabeys are needed to determine whether agile and red-capped mangabeys are naturally infected with the same SIV lineage, or whether this agile mangabey became infected with an SIVrcm strain in captivity. However, our study shows that agile mangabeys are susceptible to SIV infection. PMID:20797968
Ahuka-Mundeke, Steve; Liegeois, Florian; Ayouba, Ahidjo; Foupouapouognini, Yacouba; Nerrienet, Eric; Delaporte, Eric; Peeters, Martine
2010-12-01
Simian immunodeficiency viruses (SIVs) are lentiviruses that infect an extensive number of wild African primate species. Here we describe for the first time SIV infection in a captive agile mangabey (Cercocebus agilis) from Cameroon. Phylogenetic analysis of the full-length genome sequence of SIVagi-00CM312 showed that this novel virus fell into the SIVrcm lineage and was most closely related to a newly characterized SIVrcm strain (SIVrcm-02CM8081) from a wild-caught red-capped mangabey (Cercocebus torquatus) from Cameroon. In contrast to red-capped mangabeys, no 24 bp deletion in CCR5 has been observed in the agile mangabey. Further studies on wild agile mangabeys are needed to determine whether agile and red-capped mangabeys are naturally infected with the same SIV lineage, or whether this agile mangabey became infected with an SIVrcm strain in captivity. However, our study shows that agile mangabeys are susceptible to SIV infection.
Palanga, Essowè; Martin, Darren P; Galzi, Serge; Zabré, Jean; Bouda, Zakaria; Neya, James Bouma; Sawadogo, Mahamadou; Traore, Oumar; Peterschmitt, Michel; Roumagnac, Philippe; Filloux, Denis
2017-07-01
The full-length genome sequences of two novel poleroviruses found infecting cowpea plants, cowpea polerovirus 1 (CPPV1) and cowpea polerovirus 2 (CPPV2), were determined using overlapping RT-PCR and RACE-PCR. Whereas the 5845-nt CPPV1 genome was most similar to chickpea chlorotic stunt virus (73% identity), the 5945-nt CPPV2 genome was most similar to phasey bean mild yellow virus (86% identity). The CPPV1 and CPPV2 genomes both have a typical polerovirus genome organization. Phylogenetic analysis of the inferred P1-P2 and P3 amino acid sequences confirmed that CPPV1 and CPPV2 are indeed poleroviruses. Four apparently unique recombination events were detected within a dataset of 12 full polerovirus genome sequences, including two events in the CPPV2 genome. Based on the current species demarcation criteria for the family Luteoviridae, we tentatively propose that CPPV1 and CPPV2 should be considered members of novel polerovirus species.
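Identity figures such as the 73% and 86% above depend on how the two genomes are aligned and on which alignment columns go into the denominator. As a minimal sketch under one common convention (an assumption, not necessarily the one used in the study), the function below takes two already aligned sequences, ignores columns where both have a gap, and counts a gap opposite a residue as a mismatch.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Columns where both sequences have a gap are ignored; every other column
    counts toward the denominator, so a gap opposite a residue is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap: uninformative column
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * identical / compared


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```

Excluding gapped columns entirely, or dividing by the shorter ungapped length, would give somewhat different numbers for the same alignment, which is why reported identities are only comparable under the same convention.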
Goller, Katja V; Gabriel, Claudia; Dimna, Mireille Le; Le Potier, Marie-Frédérique; Rossi, Sophie; Staubach, Christoph; Merboth, Matthias; Beer, Martin; Blome, Sandra
2016-03-01
Classical swine fever is a viral disease of pigs with tremendous socio-economic impact. In outbreak situations, genetic typing is carried out for molecular epidemiology in both domestic pigs and wild boar. These analyses are usually based on harmonized partial sequences; however, for high-resolution analyses aimed at understanding genetic variability and virus evolution, full-genome sequences are more appropriate. In this study, we investigated a unique set of representative virus strains collected during an outbreak in French free-ranging wild boar in the Vosges-du-Nord mountains between 2003 and 2007. Comparative sequence and evolutionary analyses of the nearly full-length sequences showed only slow evolution of classical swine fever virus strains over the years and no impact of vaccination on mutation rates. However, substitution rates varied among protein genes, and a spatial and temporal pattern was observed in which two separate clusters formed that coincided with physical barriers.
Malouli, Daniel; Howell, Grant L; Legasse, Alfred W; Kahl, Christoph; Axthelm, Michael K; Hansen, Scott G; Früh, Klaus
2014-09-01
Multiple novel simian adenoviruses have been isolated in recent years, and their potential to cross the species barrier and infect the human population is an ever-present threat. Here we describe the isolation and full-genome sequencing of a novel simian adenovirus (SAdV) isolated from the urine of two independent, never co-housed, late-stage simian immunodeficiency virus (SIV)-infected rhesus macaques. The viral genome sequences revealed a novel type with a unique genome length, GC content, E3 region, and DNA polymerase amino acid sequence that is sufficiently distinct from all currently known human and simian adenovirus species to warrant classifying these isolates as a novel species of simian adenovirus. This new species, termed Simian mastadenovirus D (SAdV-D), displays the standard genome organization of the genus Mastadenovirus and contains only one copy of the fiber gene, which sets it apart from the Old World monkey adenovirus species HAdV-G, SAdV-B, and SAdV-C.
Isolation and cloning of a metalloproteinase from king cobra snake venom.
Guo, Xiao-Xi; Zeng, Lin; Lee, Wen-Hui; Zhang, Yun; Jin, Yang
2007-06-01
A 50 kDa fibrinogenolytic protease, ohagin, was isolated from the venom of Ophiophagus hannah by a combination of gel filtration, ion-exchange, and heparin affinity chromatography. Ohagin specifically degraded the alpha-chain of human fibrinogen, and its proteolytic activity was completely abolished by EDTA but not by PMSF, suggesting that it is a metalloproteinase. It inhibited platelet aggregation induced by ADP, TMVA, and stejnulxin in a dose-dependent manner. The full sequence of ohagin was deduced by cDNA cloning and confirmed by protein sequencing and peptide mass fingerprinting. The full-length ohagin cDNA contains an open reading frame encoding 611 amino acids that includes a signal peptide, a proprotein, and a mature protein comprising metalloproteinase, disintegrin-like, and cysteine-rich domains, suggesting that it belongs to the P-III class of snake venom metalloproteinases (SVMPs). In addition, P-III class metalloproteinases from the venom glands of Naja atra, Bungarus multicinctus, and Bungarus fasciatus were also cloned in this study. Sequence and phylogenetic analyses indicated that metalloproteinases from elapid snake venoms form a new subgroup of P-III SVMPs.
Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes
Salazar-Gonzalez, Jesus F.; Salazar, Maria G.; Tully, Damien C.; Ogilvie, Colin B.; Learn, Gerald H.; Allen, Todd M.; Heath, Sonya L.; Goepfert, Paul; Bar, Katharine J.
2016-01-01
Background: Identification of the HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. In resource-limited settings, processing, storing, and transporting the clinical samples required for these virologic assays requires venipuncture and cold chain logistics, both of which are challenging. Here, we validate the use of dried blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods: We performed parallel nucleic acid extraction, single genome amplification (SGA), next-generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results: We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples with as few as 1,700 copies/ml and an estimated ∼50 viral copies per blood spot. Further, we demonstrated reproducible efficiency of gp160 env sequencing from DBS stored at ambient temperature for up to three weeks or at -20°C for up to five months. Conclusions: These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings. PMID:27819061
Bacterial diversity in the oral cavity of ten healthy individuals
Bik, Elisabeth M.; Long, Clara Davis; Armitage, Gary C.; Loomer, Peter; Emerson, Joanne; Mongodin, Emmanuel F.; Nelson, Karen E.; Gill, Steven R.; Fraser-Liggett, Claire M.; Relman, David A.
2010-01-01
The composition of the oral microbiota of 10 individuals with healthy oral tissues was determined using culture-independent techniques. From each individual, 26 specimens, each from a different oral site, were collected at a single point in time and pooled. An eleventh pool was constructed using portions of the subgingival specimens from all 10 individuals. The 16S rRNA gene was amplified using broad-range bacterial primers, and clone libraries were constructed from the individual and subgingival pools. From a total of 11,368 high-quality, non-chimeric, near full-length sequences, 247 species-level phylotypes (using a 99% sequence identity threshold) and 9 bacterial phyla were identified. At least 15 bacterial genera were conserved among all 10 individuals, with significant interindividual differences at the species and strain levels. Comparisons of these oral bacterial sequences with near full-length sequences previously found in the large intestine and feces of other healthy individuals suggest that the mouth and intestinal tract harbor distinct sets of bacteria. Co-occurrence analysis demonstrated significant segregation of taxa when community membership was examined at the genus level but not at the species level, suggesting that ecologically significant competitive interactions are more apparent at a broader taxonomic level than species. This study is one of the more comprehensive, high-resolution analyses to date of bacterial diversity within the healthy human mouth, and it highlights the value of tools from macroecology for enhancing our understanding of bacterial ecology in human health. PMID:20336157
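Species-level phylotypes defined by a 99% identity threshold are typically produced by clustering sequences around representative seeds. The abstract does not say which clustering tool was used, so the sketch below is only an illustration of the general idea, using simple greedy seed-based clustering (similar in spirit to tools such as CD-HIT or UCLUST, but not their implementation). It assumes the sequences are already aligned to equal length; `identity` and `greedy_cluster` are hypothetical helpers.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 0.99) -> List[List[int]]:
    """Assign each sequence to the first cluster whose seed it matches at >= threshold.

    Returns clusters as lists of input indices; the first member of each is the seed.
    """
    clusters: List[List[int]] = []
    for i, s in enumerate(seqs):
        for members in clusters:
            if identity(seqs[members[0]], s) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no seed is close enough: start a new phylotype
    return clusters


if __name__ == "__main__":
    reads = ["A" * 100, "A" * 99 + "C", "C" * 100]
    print(greedy_cluster(reads))  # [[0, 1], [2]]
```

Because each sequence is compared only with cluster seeds, the result depends on input order; production tools usually sort sequences (for example by length or abundance) before clustering for that reason.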
Cloning a Chymotrypsin-Like 1 (CTRL-1) Protease cDNA from the Jellyfish Nemopilema nomurai
Heo, Yunwi; Kwon, Young Chul; Bae, Seong Kyeong; Hwang, Duhyeon; Yang, Hye Ryeon; Choudhary, Indu; Lee, Hyunkyoung; Yum, Seungshic; Shin, Kyoungsoon; Yoon, Won Duk; Kang, Changkeun; Kim, Euikyung
2016-01-01
An enzyme in a nematocyst extract of the jellyfish Nemopilema nomurai, caught off the coast of the Republic of Korea, catalyzed the cleavage of a chymotrypsin substrate in an amidolytic kinetic assay, and this activity was inhibited by the serine protease inhibitor phenylmethanesulfonyl fluoride. We isolated the full-length cDNA sequence of this enzyme, which contains 850 nucleotides with an 801-nucleotide open reading frame encoding 266 amino acids. A BLAST analysis of the deduced amino acid sequence showed 41% identity with human chymotrypsin-like (CTRL) protease and the CTRL-1 precursor; we therefore designated this enzyme N. nomurai CTRL-1. The primary structure of N. nomurai CTRL-1 includes a leader peptide and a highly conserved catalytic triad of His69, Asp117, and Ser216. The disulfide bonds of chymotrypsin and the substrate-binding sites are highly conserved compared with the CTRLs of other species, including mammals. Nemopilema nomurai CTRL-1 is evolutionarily more closely related to Actinopterygii than to Scyphozoa (Aurelia aurita) or Hydrozoa (Hydra vulgaris). The N. nomurai CTRL1 gene was amplified from genomic DNA by PCR with specific primers designed from the full-length cDNA, and then sequenced. The N. nomurai CTRL1 gene contains 2434 nucleotides and four distinct exons. The 5′ donor (GT) and 3′ acceptor (AG) splice sequences are wholly conserved. This is the first report of the CTRL1 gene and cDNA structures in the jellyfish N. nomurai. PMID:27399771
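The ORF and protein lengths reported above are consistent if, as assumed here, the 801-nucleotide ORF includes the stop codon: 801 / 3 = 267 codons, and dropping the terminal stop leaves 266 amino acids. The minimal forward-strand ORF finder below, using the standard genetic code, illustrates that relationship. It is a sketch, not the tool used in the study; it ignores the reverse strand and alternative start codons, and `longest_orf` and `translate` are hypothetical helper names.

```python
from typing import Tuple

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[k:k + 3], "X") for k in range(0, len(cds) - 2, 3))


def longest_orf(seq: str) -> Tuple[int, int, str]:
    """Longest ATG-to-stop ORF on the forward strand.

    Returns (start, end, protein) with 0-based start, exclusive end that
    includes the stop codon, and the protein without the stop '*'.
    """
    seq = seq.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, translate(seq[start:i]))
                start = None  # close the ORF and look for the next ATG
    return best


if __name__ == "__main__":
    start, end, protein = longest_orf("CC" + "ATGGCTGCTGCTTAA" + "GG")
    print(start, end, protein)  # 2 17 MAAA
    print((end - start) // 3 - 1 == len(protein))  # True: codons minus the stop codon
```

Applied to the reported numbers, `801 // 3 - 1` gives 266, matching the stated protein length.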
Cloning a Chymotrypsin-Like 1 (CTRL-1) Protease cDNA from the Jellyfish Nemopilema nomurai.
Heo, Yunwi; Kwon, Young Chul; Bae, Seong Kyeong; Hwang, Duhyeon; Yang, Hye Ryeon; Choudhary, Indu; Lee, Hyunkyoung; Yum, Seungshic; Shin, Kyoungsoon; Yoon, Won Duk; Kang, Changkeun; Kim, Euikyung
2016-07-05
An enzyme in a nematocyst extract of the jellyfish Nemopilema nomurai, caught off the coast of the Republic of Korea, catalyzed the cleavage of a chymotrypsin substrate in an amidolytic kinetic assay, and this activity was inhibited by the serine protease inhibitor phenylmethanesulfonyl fluoride. We isolated the full-length cDNA sequence of this enzyme, which contains 850 nucleotides with an 801-nucleotide open reading frame encoding 266 amino acids. A BLAST analysis of the deduced amino acid sequence showed 41% identity with human chymotrypsin-like (CTRL) protease and the CTRL-1 precursor; we therefore designated this enzyme N. nomurai CTRL-1. The primary structure of N. nomurai CTRL-1 includes a leader peptide and a highly conserved catalytic triad of His69, Asp117, and Ser216. The disulfide bonds of chymotrypsin and the substrate-binding sites are highly conserved compared with the CTRLs of other species, including mammals. Nemopilema nomurai CTRL-1 is evolutionarily more closely related to Actinopterygii than to Scyphozoa (Aurelia aurita) or Hydrozoa (Hydra vulgaris). The N. nomurai CTRL1 gene was amplified from genomic DNA by PCR with specific primers designed from the full-length cDNA, and then sequenced. The N. nomurai CTRL1 gene contains 2434 nucleotides and four distinct exons. The 5' donor (GT) and 3' acceptor (AG) splice sequences are wholly conserved. This is the first report of the CTRL1 gene and cDNA structures in the jellyfish N. nomurai.
Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.
2002-01-01
A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes that comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
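A toy Python model of the repeated denature/anneal/extend cycles may help show how full-length chimeric genes can arise. It assumes equal-length, aligned parental templates and a single product growing from the 5' end, which simplifies the random-priming step described above; the function names, the extension range and the mutation rate are hypothetical, and the sketch is not the disclosed protocol.

```python
import random

BASES = "ACGT"


def copy_with_errors(template: str, start: int, stop: int,
                     mut_rate: float, rng: random.Random) -> str:
    """Copy template[start:stop], substituting each base with probability mut_rate."""
    out = []
    for base in template[start:stop]:
        if rng.random() < mut_rate:
            base = rng.choice([b for b in BASES if b != base])  # point mutation
        out.append(base)
    return "".join(out)


def reassemble(parents: list[str], max_ext: int = 40,
               mut_rate: float = 0.002, seed: int = 0) -> str:
    """Grow one product by repeated denature/anneal/extend cycles.

    In each cycle the growing strand anneals to a randomly chosen parent
    (template switching) and is extended by 1..max_ext bases until it
    reaches full length.
    """
    rng = random.Random(seed)
    length = len(parents[0])
    product = ""
    while len(product) < length:
        template = rng.choice(parents)  # anneal to a random parental template
        stop = min(length, len(product) + rng.randint(1, max_ext))
        product += copy_with_errors(template, len(product), stop, mut_rate, rng)
    return product


child = reassemble(["A" * 200, "C" * 200], mut_rate=0.0, seed=1)
print(len(child), sorted(set(child)))
```

Using two deliberately distinct parents (all A versus all C) makes template switches visible as blocks of each base in the product; with a nonzero `mut_rate`, occasional substitutions stand in for the low level of point mutations.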
Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.
Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W
2018-02-01
Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). In most organisms, DSBs do not occur at uniform frequency throughout the genome but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often binding sites for transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for the activity of one such motif, identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs being capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.
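The 9.5-bp figure is consistent with reading "effective sequence length" as the motif's information content in bits divided by two: each fully specified base contributes 1, and a two-fold degenerate position such as R (A or G) contributes 0.5, so GGTCTRGACC gives 9 + 0.5 = 9.5. That definition is an assumption here, not quoted from the study. The Python sketch below computes it and scans a DNA string on both strands for matches to an IUPAC motif; the function names are hypothetical.

```python
import math
import re

# IUPAC nucleotide codes and the bases each one allows
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "N": "ACGT"}
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def effective_length(motif: str) -> float:
    """Information content in bits / 2, i.e. equivalent number of fully specified bases."""
    return sum((2 - math.log2(len(IUPAC[c]))) / 2 for c in motif)


def motif_regex(motif: str) -> str:
    """Translate an IUPAC motif into a character-class regex."""
    return "".join(f"[{IUPAC[c]}]" for c in motif)


def find_motif(seq: str, motif: str) -> list[int]:
    """0-based start positions of matches on either strand (forward coordinates)."""
    rc = motif.translate(COMPLEMENT)[::-1]  # reverse complement of the motif
    hits = set()
    for m in {motif, rc}:
        pattern = re.compile(f"(?={motif_regex(m)})")  # lookahead allows overlapping hits
        hits.update(x.start() for x in pattern.finditer(seq.upper()))
    return sorted(hits)


print(effective_length("GGTCTRGACC"))                # 9.5
print(find_motif("AAGGTCTAGACCTT", "GGTCTRGACC"))    # [2]
```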
Chien, Wenwen; O'Kelly, James; Lu, Daning; Leiter, Amanda; Sohn, Julia; Yin, Dong; Karlan, Beth; Vadgama, Jay; Lyons, Karen M; Koeffler, H Phillip
2011-06-01
Connective tissue growth factor (CTGF/CCN2) belongs to the CCN family of matricellular proteins, comprising Cyr61, CTGF, NovH and WISP1-3. The CCN proteins contain an N-terminal signal peptide followed by four conserved domains sharing sequence similarities with the insulin-like growth factor binding proteins, von Willebrand factor type C repeat, thrombospondin type 1 repeat, and a C-terminal growth factor cysteine knot domain. To investigate the role of CCN2 in breast cancer, we transfected MCF-7 cells with full-length CCN2 and with four mutant constructs in each of which one of the domains had been deleted. Compared with control cells, MCF-7 cells stably expressing full-length CCN2 showed reduced cell proliferation and increased migration in Boyden chamber assays, and promoted angiogenesis in chorioallantoic membrane assays. Deletion of the C-terminal cysteine knot domain, but not of any other domain, abolished the activities mediated by full-length CCN2. We have dissected the role of CCN2 in breast tumorigenesis on a structural basis.
Qin, Shaomin; Yang, Heng; Zhang, Yixuan; Li, Zhanhong; Lin, Jun; Gao, Lin; Liao, Defang; Cao, Yingying; Ren, Pengfei; Li, Huachun; Wu, Jianmin
2018-05-01
Bluetongue (BT) is one of the most important insect-borne, non-contagious viral diseases of ruminants and can cause severe disease and death in sheep. Its pathogen, bluetongue virus (BTV), has a double-stranded RNA genome consisting of 10 segments, which allows field and vaccine strains of different serotypes to reassort when they simultaneously infect the same animal. For the first time, we report the full-length genome sequence of a BTV strain of serotype 21 (5149E) isolated from sentinel cattle in Guangxi Province, China, in 2015. Sequence analysis suggested that isolate 5149E had undergone a reassortment event and acquired seg-6 from a BTV-16 isolate that originated in Japan. This study aims to improve understanding of the origin and epidemiology of BTV.
NASA Astrophysics Data System (ADS)
Zhao, Liyuan; Mi, Tiezhu; Zhen, Yu; Yu, Zhigang
2012-05-01
Mitochondrial cytochrome b (Cytb), one of the few proteins encoded by mitochondrial DNA, plays an important role in electron transfer. As a mitochondrial gene, it has been widely used for phylogenetic analysis. Previously, a 949-bp fragment of the coding gene and its mRNA editing were characterized in Prorocentrum donghaiense, which might prove useful for resolving P. donghaiense from closely related species. However, the full-length coding region had not been characterized. In this study, we used rapid amplification of cDNA ends (RACE) to obtain the full-length, 1,124-bp cDNA. The Cytb transcript contained a standard initiation codon, ATG, but did not have a recognizable stop codon. Homology comparison showed that P. donghaiense Cytb had high sequence identity to Cytb sequences from other dinoflagellate species. Phylogenetic analysis placed Cytb from P. donghaiense in the dinoflagellate clade, where it clustered strongly with that from P. minimum. Based on the full-length sequence, we inferred 32 editing events at different positions, accounting for 2.93% of the Cytb gene. Of these changes, 34.4% (11) were A to G, 25% (8) were T to C, and 25% (8) were C to U, with smaller proportions of G to C and G to A edits (9.4% (3) and 6.2% (2), respectively). The expression level of the Cytb transcript was quantified by real-time PCR with a TaqMan probe at different times across the whole growth phase. Cytb transcripts were present at an average of 39.27±7.46 copies of cDNA per cell during the whole growth cycle, and Cytb expression was relatively stable over the different phases. These results deepen our understanding of the structure and characteristics of Cytb in P. donghaiense, and confirm that Cytb in P. donghaiense is a candidate reference gene for studying the expression of other genes.
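The editing-type percentages follow directly from the counts given in parentheses. This short Python check confirms they are consistent with a total of 32 editing events; the labels are copied from the abstract as written.

```python
# Counts of inferred editing events by type, as reported above
edit_counts = {"A->G": 11, "T->C": 8, "C->U": 8, "G->C": 3, "G->A": 2}

total = sum(edit_counts.values())
# Percentage of all events for each type, rounded to one decimal place
shares = {kind: round(100 * n / total, 1) for kind, n in edit_counts.items()}

print(total)  # 32
print(shares)
```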
Terrat, Yves; Biass, Daniel; Dutertre, Sébastien; Favreau, Philippe; Remm, Maido; Stöcklin, Reto; Piquemal, David; Ducancel, Frédéric
2012-01-01
Although cone snail venoms have been intensively investigated in the past few decades, little is known about the whole conopeptide and protein content of venom ducts, especially at the transcriptomic level. While most previous studies, focusing on a limited number of sequences, have contributed to a better understanding of conopeptide superfamilies, they did not provide a complete panorama of a whole venom duct. Additionally, rare transcripts were usually not identified because of sampling effects. This work presents the data and analysis of a large number of sequences obtained with high-throughput 454 sequencing technology from the venom ducts of Conus consors, a piscivorous cone snail living in the Indo-Pacific. A total of 213,561 Expressed Sequence Tags (ESTs) with an average read length of 218 base pairs (bp) were obtained. These reads were assembled into 65,536 contiguous DNA sequences (contigs) and then into 5039 clusters. The data revealed 11 conopeptide superfamilies representing a total of 53 new isoforms (full-length or nearly full-length sequences). Considerable isoform diversity and major differences in transcription level were noted between superfamilies. The A, O and M superfamilies are the most diverse. The A family isoforms account for more than 70% of the conopeptide cocktail (considering all ESTs before the clustering step). In addition to traditional superfamilies and families, minor transcripts including both cysteine-free and cysteine-rich peptides could be detected, some of them representing new clades of conopeptides. Finally, several sets of transcripts corresponding to proteins commonly recruited in venom function were identified for the first time in a cone snail venom duct. This work provides one of the first large-scale EST projects for a cone snail venom duct using next-generation sequencing, allowing a detailed overview of the venom duct transcripts.
This leads to an expanded definition of the overall cone snail venom duct transcriptomic activity, which goes beyond the cysteine-rich conopeptides. For instance, this study enabled the detection of proteins involved in common post-translational maturation and folding, and revealed compounds classically involved in hemolysis and in mechanical penetration of the venom into the prey. Further comparison with proteomic and genomic data will lead to a better understanding of conopeptide diversity and the underlying mechanisms involved in conopeptide evolution. Copyright © 2011 Elsevier Ltd. All rights reserved.
Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; de Oca, Adrián Nieto-Montes; Stamatakis, Alexandros
2015-01-01
Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. 
Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths. PMID:26227865
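To make the branch-length issue concrete, the Python sketch below uses the simplest possible case: two aligned sequences under the Jukes-Cantor (JC69) model, a choice made here purely for illustration. It is not the conditional likelihood or reconstituted DNA implementation evaluated in the study. For two taxa every SNP-only site is a difference, so the observed proportion of differing sites is 1 and the distance estimate diverges. Adding back a user-specified count of unsampled invariant sites, the idea behind the reconstituted DNA approach, yields a finite estimate. The function names and counts are hypothetical.

```python
import math


def jc69_distance(n_diff: int, n_sites: int) -> float:
    """Maximum-likelihood JC69 distance between two aligned sequences (closed form)."""
    p = n_diff / n_sites  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf  # saturated: the JC69 formula is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def reconstituted_distance(n_snps: int, n_invariant: int) -> float:
    """Distance after adding back a user-specified number of unsampled invariant sites."""
    return jc69_distance(n_snps, n_snps + n_invariant)


snps = 30
print(jc69_distance(snps, snps))                    # SNPs alone: inf
print(round(reconstituted_distance(snps, 970), 4))  # 0.0306
```

The result depends directly on the invariant-site count supplied, which illustrates why the reconstituted approach requires that number to be known before the analysis.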
Quiroz Velasquez, Paula F.; Abiff, Sumayyah K.; Fins, Katrina C.; Conway, Quincy B.; Salazar, Norma C.; Delgado, Ana Paula; Dawes, Jhanelle K.; Douma, Lauren G.
2014-01-01
A combination of 454 pyrosequencing and Sanger sequencing was used to sample and characterize the transcriptome of the entomopathogenic oomycete Lagenidium giganteum. More than 50,000 high-throughput reads were annotated through homology searches. Several selected reads served as seeds for the amplification and sequencing of full-length transcripts. Phylogenetic analyses inferred from full-length cellulose synthase alignments revealed that L. giganteum is nested within the peronosporalean galaxy and as such appears to have evolved from a phytopathogenic ancestor. In agreement with the phylogeny reconstructions, full-length L. giganteum oomycete effector orthologs, corresponding to the cellulose-binding elicitor lectin (CBEL), crinkler (CRN), and elicitin proteins, were characterized by domain organizations similar to those of pathogenicity factors of plant-pathogenic oomycetes. Importantly, the L. giganteum effectors provide a basis for detailing the roles of canonical CRN, CBEL, and elicitin proteins in the infectious process of an oomycete known principally as an animal pathogen. Finally, phylogenetic analyses and genome mining identified members of glycoside hydrolase family 5 subfamily 27 (GH5_27) as putative virulence factors active on the host insect cuticle, based in part on the fact that GH5_27 genes are shared by entomopathogenic oomycetes and fungi but are underrepresented in nonentomopathogenic genomes. The genomic resources gathered from the L. giganteum transcriptome analysis strongly suggest that filamentous entomopathogens (oomycetes and fungi) exhibit convergent evolution: they have evolved independently from plant-associated microbes, have retained genes indicative of plant associations, and may share similar cores of virulence factors, such as GH5_27 enzymes, that are absent from the genomes of their plant-pathogenic relatives. PMID:25107973
Comparative Analysis of Type IV Pilin in Desulfuromonadales
Shu, Chuanjun; Xiao, Ke; Yan, Qin; Sun, Xiao
2016-01-01
During anaerobic respiration, the bacterium Geobacter sulfurreducens can transfer electrons to extracellular electron acceptors through its pilus. G. sulfurreducens pili have been reported to have metallic-like conductivity similar to that of doped organic semiconductors. To study the characteristics and origin of the conductive pilin proteins found in the pilus structure, their genetic, structural, and phylogenetic properties were analyzed. The genetic relationships and the conserved structures and sequences obtained were used to predict the evolution of the pilins. Homologous genes that encode conductive pilin were found using PilFind and Cluster. Sequence characteristics and protein tertiary structures were analyzed with MAFFT and QUARK, respectively. The origin of conductive pilins was explored by building a phylogenetic tree. Truncation is a characteristic of conductive pilin. The structures of truncated pilins and their accompanying proteins were found to be similar to the N-terminal and C-terminal ends of full-length pilins, respectively. The emergence of the truncated pilins can probably be ascribed to the evolutionary pressure of their extracellular electron-transporting function. Genes encoding truncated pilins and proteins similar to the C-terminus of full-length pilins, which contain a group of consecutive anti-parallel beta-sheets, are adjacent in bacterial genomes. Based on the genetic, structural, and phylogenetic analyses performed in this study, we inferred that the truncated pilins and their accompanying proteins probably evolved from full-length pilins by gene fission through duplication, degeneration, and separation. These findings provide new insights into the molecular mechanisms involved in long-range electron transport along the conductive pili of Geobacter species. PMID:28066394
Length-independent structural similarities enrich the antibody CDR canonical class model.
Nowak, Jaroslaw; Baker, Terry; Georges, Guy; Kelm, Sebastian; Klostermann, Stefan; Shi, Jiye; Sridharan, Sudharsan; Deane, Charlotte M
2016-01-01
Complementarity-determining regions (CDRs) are antibody loops that make up the antigen binding site. Here, we show that all CDR types have structurally similar loops of different lengths. Based on these findings, we created length-independent canonical classes for the non-H3 CDRs. Our length-variable structural clusters show strong sequence patterns, suggesting either that they evolved from the same original structure or that they result from some form of convergence. We find that our length-independent method not only clusters a larger number of CDRs, but also predicts canonical class from sequence better than the standard length-dependent approach. To demonstrate the usefulness of our findings, we predicted the cluster membership of CDR-L3 sequences from 3 next-generation sequencing datasets of the antibody repertoire (over 1,000,000 sequences). Using the length-independent clusters, we can structurally classify an additional 135,000 sequences, which represents a ∼20% improvement over the standard approach. This suggests that our length-independent canonical classes might be a highly prevalent feature of antibody space, and could substantially improve our ability to accurately predict the structure of novel CDRs identified by next-generation sequencing.
Gene length as a biological timer to establish temporal transcriptional regulation
Kirkconnell, Killeen S.; Magnuson, Brian; Paulsen, Michelle T.; Lu, Brian; Bedi, Karan; Ljungman, Mats
2017-01-01
Transcriptional timing is inherently influenced by gene length, thus providing a mechanism for temporal regulation of gene expression. While gene size has been shown to be important for the expression timing of specific genes during early development, whether it plays a role in the timing of other global gene expression programs has not been extensively explored. Here, we investigate the role of gene length during the early transcriptional response of human fibroblasts to serum stimulation. Using the nascent sequencing techniques Bru-seq and BruUV-seq, we identified immediate genome-wide transcriptional changes following serum stimulation that were linked to rapid activation of enhancer elements. We identified 873 significantly induced and 209 significantly repressed genes. Variations in gene size allowed for a large group of genes to be simultaneously activated but produce full-length RNAs at different times. The median length of the group of serum-induced genes was significantly larger than the median length of all expressed genes, housekeeping genes, and serum-repressed genes. These gene length relationships were also observed in corresponding mouse orthologs, suggesting that relative gene size is evolutionarily conserved. The sizes of transcription factor and microRNA genes immediately induced after serum stimulation varied dramatically, setting up a cascade mechanism for temporal expression arising from a single activation event. The retention and expansion of large intronic sequences during evolution have likely played important roles in fine-tuning the temporal expression of target genes in various cellular response programs. PMID:28055303
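The gene-length timer can be illustrated with simple arithmetic: if RNA polymerase II elongates at a roughly constant rate, the delay before a full-length transcript appears scales with gene length. The Python sketch below assumes a hypothetical uniform rate of 3 kb/min (measured rates vary between genes and studies) and uses made-up gene names and lengths.

```python
def minutes_to_full_length(gene_length_bp: int, elongation_kb_per_min: float = 3.0) -> float:
    """Time for polymerase to traverse a gene at a constant, assumed elongation rate."""
    return gene_length_bp / (elongation_kb_per_min * 1000.0)


# Hypothetical genes, all activated at t = 0 by the same stimulus
genes = {"short_gene": 6_000, "medium_gene": 60_000, "long_gene": 240_000}
for name, length in sorted(genes.items(), key=lambda kv: kv[1]):
    print(f"{name}: full-length RNA after ~{minutes_to_full_length(length):.0f} min")
```

Under these assumptions the three genes, activated simultaneously, would finish their first full-length transcripts at about 2, 20 and 80 minutes, which is the staggering effect the abstract describes.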
2011-01-01
Background: Jatropha curcas L. is an important non-edible oilseed crop with a promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well-established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results: From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. The average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways, including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts, which include 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion: The large number of transcripts reported in the current study, together with existing ESTs and transcript sequences, will serve as an invaluable genetic resource for crop improvement in jatropha.
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition. PMID:21492485
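The summary figures quoted for the assembly (mean transcript length, fraction longer than 1,000 bases, longest transcript) are straightforward to compute from a list of contig lengths. The Python sketch below does so for hypothetical input and also reports N50, a common assembly statistic that the abstract does not give; the function name and cutoff parameter are illustrative, and a non-empty list is assumed.

```python
def assembly_stats(lengths: list[int], long_cutoff: int = 1000) -> dict:
    """Summary statistics for a non-empty set of assembled transcript (contig) lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running >= total / 2:  # N50: length at which half of all assembled bases are covered
            n50 = length
            break
    return {
        "count": len(ordered),
        "mean": total / len(ordered),
        "frac_longer_than_cutoff": sum(n > long_cutoff for n in ordered) / len(ordered),
        "longest": ordered[0],
        "n50": n50,
    }


print(assembly_stats([1500, 800, 700, 200]))
```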
Natarajan, Purushothaman; Parani, Madasamy
2011-04-15
Jatropha curcas L. is an important non-edible oilseed crop with a promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well-established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. The average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways, including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts, which include 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study, together with existing ESTs and transcript sequences, will serve as an invaluable genetic resource for crop improvement in jatropha.
Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition.
Conservation of the Human Integrin-Type Beta-Propeller Domain in Bacteria
Chouhan, Bhanupratap; Denesyuk, Alexander; Heino, Jyrki; Johnson, Mark S.; Denessiouk, Konstantin
2011-01-01
Integrins are heterodimeric cell-surface receptors with key functions in cell-cell and cell-matrix adhesion. Integrin α and β subunits are present throughout the metazoans, but it is unclear whether the subunits predate the origin of multicellular organisms. Several component domains have been detected in bacteria, one of which, a specific 7-bladed β-propeller domain, is a unique feature of the integrin α subunits. Here, we describe a structure-derived motif that incorporates key features of each blade from the X-ray structures of human αIIbβ3 and αVβ3, includes elements of the FG-GAP/Cage and Ca2+-binding motifs, and is specific to the metazoan integrin domains. Separately, we searched all available sequences from bacteria and unicellular eukaryotic organisms for metazoan integrin-type β-propeller domains, requiring each candidate to contain seven repeats, corresponding to the seven blades of the β-propeller domain, with the newly found structure-derived motif present in every repeat. As a result, among 47 available genomes of unicellular eukaryotes we could not find a single instance of seven repeats with the motif. Several sequences contained three repeats, a predicted transmembrane segment, and a short cytoplasmic motif associated with some integrins, but otherwise differed from the metazoan integrin α subunits. Among the available bacterial sequences, we found five examples containing seven sequential metazoan integrin-specific motifs within the seven repeats. The motifs differ in having one Ca2+-binding site per repeat, whereas metazoan integrins have three or four sites. The bacterial sequences are more conserved in terms of motif conservation and loop length, suggesting that their structure is more regular and compact than that of the example structures from human integrins. Although the bacterial examples are not full-length integrins, full-length metazoan-type 7-bladed β-propeller domains are present, and sometimes two tandem copies are found.
PMID:22022374
Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F.; Zhang, Qiuheng
2016-01-01
Background: Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods interrogate only the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using a long-range PCR and next-generation sequencing (NGS) approach on buccal swab DNA. Methods: Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3’ UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software, which combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing, and exon novelties were confirmed by Sanger sequencing. Results: Using this automated workflow, over 10,063 NMDP registry donors were successfully typed at high resolution by NGS. Despite the known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP.
In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Conclusion: Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation. PMID:27798706
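Read against 10,063 donors, the reported rates do not add up (23/10,063 is about 0.23%, not 0.023%). They do match a per-allele denominator of two alleles at each of the five typed loci per donor. That denominator is an inference, not something the abstract states; the Python sketch below only checks the arithmetic, and also multiplies the two sequential success rates to give an approximate overall yield.

```python
donors = 10_063
loci = 5                      # HLA-A, -B, -C, -DRB1, -DQB1, as listed in the Methods
alleles = donors * loci * 2   # assumption: two alleles per locus per donor

for label, count in [("null alleles", 23), ("rare alleles", 92), ("exon novelties", 42)]:
    print(f"{label}: {100 * count / alleles:.3f}% of {alleles} alleles")

# Approximate overall yield if the two reported success rates apply in sequence
print(f"amplified and reported: {0.978 * 0.982:.1%}")
```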
Yin, Yuxin; Lan, James H; Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F; Zhang, Qiuheng
2016-01-01
Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods interrogate only the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using a long-range PCR and next-generation sequencing (NGS) approach on buccal swab DNA. Multiplex long-range PCR primers amplified the full length of HLA class I genes (A, B, C) from promoter to 3' UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software, which combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing, and exon novelties were confirmed by Sanger sequencing. Using this automated workflow, over 10,063 NMDP registry donors were successfully typed at high resolution by NGS. Despite the known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NMDP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%).
Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation.
Structure, organization and expression of common carp (Cyprinus carpio L.) SLP-76 gene.
Huang, Rong; Sun, Xiao-Feng; Hu, Wei; Wang, Ya-Ping; Guo, Qiong-Lin
2008-05-01
SLP-76 is an important member of the SLP-76 family of adapters, and it plays a key role in TCR signaling and T cell function. A partial cDNA sequence of SLP-76 from common carp (Cyprinus carpio L.) was isolated from a thymus cDNA library by suppression subtractive hybridization (SSH). The full-length cDNA of carp SLP-76 was then obtained by 3' RACE and 5' RACE. The full-length cDNA of carp SLP-76 was 2007 bp, consisting of a 5'-terminal untranslated region (UTR) of 285 bp, a 3'-terminal UTR of 240 bp, and an open reading frame of 1482 bp. Sequence comparison showed that the deduced amino acid sequence of carp SLP-76 had an overall similarity of 34-73% to homologues from other species, and that it was composed of an NH2-terminal domain, a central proline-rich domain, and a C-terminal SH2 domain. Amino acid sequence analysis indicated the presence of a Gads binding site (R-X-X-K), a 10-aa-long sequence that binds to the SH3 domain of LCK in vitro, and three conserved tyrosine-containing sequences in the NH2-terminal domain. We then used PCR to obtain a genomic DNA sequence covering the entire coding region of carp SLP-76. In the 9.2-kb genomic sequence, twenty-one exons and twenty introns were identified. RT-PCR results showed that carp SLP-76 was expressed predominantly in hematopoietic tissues and was upregulated in the thymus of four-month-old carp compared with one-year-old carp. RT-PCR and virtual northern hybridization results showed that carp SLP-76 was also upregulated in the thymus of four-month-old GH-transgenic carp. These results suggest that the expression level of the SLP-76 gene may be related to thymocyte development in teleosts.
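The reported cDNA layout can be checked arithmetically: 285 + 1482 + 240 = 2007 bp, and 1482 is divisible by three. The Python sketch below derives the ORF coordinates and the deduced protein length; the protein length is not stated in the abstract and depends on the assumption that the 1482-bp ORF includes the stop codon. The function name is hypothetical.

```python
def transcript_layout(utr5: int, orf: int, utr3: int) -> dict:
    """1-based coordinates of the ORF within a cDNA, plus derived lengths.

    Assumes the ORF length includes the stop codon.
    """
    return {
        "total_bp": utr5 + orf + utr3,
        "orf_start": utr5 + 1,      # first base of the start codon
        "orf_end": utr5 + orf,      # last base of the stop codon
        "protein_aa": orf // 3 - 1,  # codons minus the stop codon
        "in_frame": orf % 3 == 0,
    }


print(transcript_layout(utr5=285, orf=1482, utr3=240))
```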
USDA-ARS's Scientific Manuscript database
The present work characterized a second endogenous cellulase (endo-β-1,4-glucanase) gene, CfEG4, uncovered in the transcriptome of Formosan subterranean termite (Coptotermes formosanus). The full-length gene was cloned and sequenced. It is similar to the CfEG3a described earlier (Zhang et al. 2009) ...
USDA-ARS's Scientific Manuscript database
The Rift Valley fever virus (RVFV) encodes the structural proteins nucleoprotein (N), N-terminal glycoprotein (Gn), C-terminal glycoprotein (Gc) and L protein, as well as a 78-kDa protein and the non-structural proteins NSm and NSs. Using the baculovirus system we expressed the full-length coding sequence of N, NSs, NSm, Gc an...
USDA-ARS's Scientific Manuscript database
Cinnamoyl-CoA reductase (CCR) is an important enzyme for lignin biosynthesis as it catalyzes the first specific committed step in monolignol biosynthesis. We have cloned a full length coding sequence of CCR from kenaf (Hibiscus cannabinus L.), which contains a 1,020-bp open reading frame (ORF), enco...
Sitthithaworn, W; Kojima, N; Viroonchatapan, E; Suh, D Y; Iwanami, N; Hayashi, T; Noji, M; Saito, K; Niwa, Y; Sankawa, U
2001-02-01
cDNAs encoding geranylgeranyl diphosphate synthase (GGPPS) from two diterpene-producing plants, Scoparia dulcis and Croton sublyratus, have been isolated using a homology-based polymerase chain reaction (PCR) method. Both clones contained highly conserved aspartate-rich motifs (DDXX(XX)D), and their N-terminal residues exhibited the characteristics of a chloroplast targeting sequence. When expressed in Escherichia coli, both the full-length proteins and truncated proteins lacking the putative targeting sequence catalyzed the condensation of farnesyl diphosphate and isopentenyl diphosphate to produce geranylgeranyl diphosphate (GGPP). The structural factors determining product length in plant GGPPSs were investigated by constructing S. dulcis GGPPS mutants on the basis of sequence comparison with the first aspartate-rich motif (FARM) of plant farnesyl diphosphate synthase. The results indicated that in plant GGPPSs the small amino acids Met and Ser at the fourth and fifth positions before the FARM, and the Pro and Cys insertion in the FARM, play essential roles in determining product length. Further, when a chimeric gene composed of the putative transit peptide of the S. dulcis GGPPS gene and green fluorescent protein was introduced into Arabidopsis leaves by particle gun bombardment, the chimeric protein was localized in chloroplasts, indicating that the cloned S. dulcis GGPPS is a chloroplast protein.
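The aspartate-rich motif notation DDXX(XX)D means two Asp residues, then two to four arbitrary residues, then Asp. A minimal regular-expression sketch for locating such motifs in a protein sequence is shown below; the function name and the toy sequence are hypothetical, and the sketch does not distinguish the first (FARM) from the second motif.

```python
import re

# DDXX(XX)D: two Asp, then two to four arbitrary residues, then Asp.
# The greedy {2,4} inside a lookahead reports the longest match at each start position.
ASP_RICH = re.compile(r"(?=(DD.{2,4}D))")


def aspartate_rich_motifs(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each DDXX(XX)D occurrence."""
    return [(m.start() + 1, m.group(1)) for m in ASP_RICH.finditer(protein.upper())]


# Hypothetical toy sequence, not an actual GGPPS.
print(aspartate_rich_motifs("GSLVDDIMDNSKT"))  # [(5, 'DDIMD')]
```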
Shitara, M; Tsuboi, Y; Sekizuka, T; Tazumi, A; Moore, J E; Millar, B C; Taneike, I; Matsuda, M
2008-01-01
Nucleotide sequences of approximately 3.1 kbp, consisting of the full-length open reading frame (ORF) for grpE, a non-coding (NC) region and a putative ORF for the full-length dnaK gene (1860 bp), were identified from a urease-positive thermophilic Campylobacter (UPTC) CF89-12 isolate. Then, following the construction of a new degenerate polymerase chain reaction (PCR) primer pair for amplification of the dnaK structural gene, including the transcription terminator region, of C. lari isolates, the dnaK region was successfully amplified, TA-cloned and sequenced in nine C. lari isolates. The dnaK gene sequences commenced with ATG and terminated with TAA in all 10 isolates, including CF89-12. In addition, the putative ORFs for the dnaK gene locus from seven UPTC isolates consisted of 1860 bases, and those from the four urease-negative (UN) C. lari isolates, including the C. lari RM2100 reference strain, consisted of 1866 bases. Interestingly, the probable ribosome binding sites and hypothetical intrinsic ρ-independent terminator structures differed between the seven UPTC and the four UN C. lari isolates. Moreover, 20 of the 28 polymorphic sites in the amino acid sequences of the dnaK ORF from 11 C. lari isolates were either UPTC-specific or UN C. lari-specific. In the neighbour-joining tree based on the nucleotide sequences of the dnaK gene, C. lari forms two major distinct clusters, consisting of UPTC and UN C. lari isolates, respectively, with UN C. lari being more closely related to other thermophilic campylobacters than to UPTC.
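The statement that every dnaK ORF begins with ATG and ends with TAA can be checked mechanically. The sketch below tests whether a sequence is a complete ORF in frame 1: length a multiple of three, ATG start, a stop codon at the end, and no internal in-frame stop. It accepts only ATG as a start codon, matching what the abstract reports; bacteria can also use alternative starts such as GTG and TTG, which this sketch ignores. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # same stop codons in the standard and bacterial tables


def is_complete_orf(seq: str) -> bool:
    """True if seq starts with ATG, ends with a stop codon, and has no internal in-frame stop."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(c in STOP_CODONS for c in codons[1:-1])


print(is_complete_orf("ATGAAACCCTAA"))  # True
print(is_complete_orf("ATGTAACCCTAA"))  # False: internal in-frame TAA
print(1860 // 3 - 1)  # 619 amino acids, if the 1,860 bases include the stop codon
```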
Akins, R A; Grant, D M; Stohl, L L; Bottorff, D A; Nargang, F E; Lambowitz, A M
1988-11-05
The Mauriceville and Varkud mitochondrial plasmids of Neurospora are closely related, closed circular DNAs (3.6 and 3.7 kb, respectively; 1 kb = 10^3 bases or base-pairs), whose characteristics suggest relationships to mitochondrial DNA introns and retrotransposons. Here, we characterized the structure of the Varkud plasmid, determined its complete nucleotide sequence and mapped its major transcripts. The Mauriceville and Varkud plasmids have more than 97% positional identity. Both plasmids contain a 710 amino acid open reading frame that encodes a reverse transcriptase-like protein. The amino acid sequence of this open reading frame is strongly conserved between the two plasmids (701/710 amino acids), as expected for a functionally important protein. Both plasmids have a 0.4 kb region that contains five PstI palindromes and a direct repeat of approximately 160 base-pairs. Comparison of sequences in this region suggests that the Varkud plasmid has diverged less from a common ancestor than has the Mauriceville plasmid. Two major transcripts of the Varkud plasmid were detected by Northern hybridization experiments: a full-length linear RNA of 3.7 kb and an additional prominent transcript of 4.9 kb, 1.2 kb longer than the monomer plasmid. Remarkably, we find that the 4.9 kb transcript is a hybrid RNA consisting of the full-length 3.7 kb Varkud plasmid transcript plus a 5' leader of 1.2 kb that is derived from the 5' end of the mitochondrial small rRNA. This and other findings suggest that the Varkud plasmid, like certain RNA viruses, has a mechanism for joining heterologous RNAs to the 5' end of its major transcript, and that, under some circumstances, nucleotide sequences in mitochondria may be recombined at the RNA level.
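Positional identity, as used above, is the fraction of aligned positions at which two sequences carry the same residue. The authors' exact computation is not described, so the sketch below assumes one common convention: sequences are already aligned, gap columns count as mismatches, and the denominator is the full alignment length. The function name is hypothetical.

```python
def positional_identity(a: str, b: str) -> float:
    """Percent of alignment columns where two pre-aligned sequences carry the same residue.

    Columns containing a gap ('-') in either sequence count as mismatches, and the
    denominator is the full alignment length (one of several conventions in use).
    """
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty sequences of equal (aligned) length")
    same = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * same / len(a)


print(round(positional_identity("ACGT-A", "ACGTTA"), 2))  # 83.33 (5 of 6 columns)
print(round(100 * 701 / 710, 1))  # 98.7, the ORF amino acid identity reported above
```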
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts
Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre
2013-01-01
The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
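The BLAST-based linking step described above amounts to transferring annotations from model-species genes to wheat sequences via homology hits. The dbWFA thresholds and rules are not given here, so the sketch below assumes a simple best-hit rule: for each query, keep the highest-bitscore subject whose E-value passes a hypothetical cutoff of 1e-5, then copy that subject's annotation. It parses the default 12-column BLAST+ tabular output (`-outfmt 6`); all identifiers and annotations in the example are hypothetical.

```python
import csv
import io

# Column order of BLAST+ tabular output with the default -outfmt 6 fields.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular: str, max_evalue: float = 1e-5) -> dict[str, str]:
    """Map each query to its highest-bitscore subject among hits passing the E-value cutoff."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row:
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return {q: subject for q, (subject, _) in best.items()}


def transfer_annotations(hits: dict[str, str], annotation: dict[str, str]) -> dict[str, str]:
    """Copy the annotation of each best-hit subject onto its query."""
    return {q: annotation.get(s, "no annotation") for q, s in hits.items()}


# Hypothetical identifiers and annotations, for illustration only.
blast_out = (
    "wheat_1\tmodel_A\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150\n"
    "wheat_1\tmodel_B\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t90\n"
    "wheat_2\tmodel_B\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.5\t30\n"
)
hits = best_hits(blast_out)
print(transfer_annotations(hits, {"model_A": "function A", "model_B": "function B"}))
# {'wheat_1': 'function A'}
```

A best-hit rule is the simplest transfer policy; real pipelines often also require minimum coverage or identity, which would be extra filters in `best_hits`.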
Molecular cloning and characterization of SoxB2 gene from Zhikong scallop Chlamys farreri
NASA Astrophysics Data System (ADS)
He, Yan; Bao, Zhenmin; Guo, Huihui; Zhang, Yueyue; Zhang, Lingling; Wang, Shi; Hu, Jingjie; Hu, Xiaoli
2013-11-01
The Sox proteins play critical roles during the development of animals, including sex determination and central nervous system development. In this study, the SoxB2 gene was cloned from a mollusk, the Zhikong scallop (Chlamys farreri), and characterized with respect to phylogeny and tissue distribution. The full-length cDNA and genomic DNA sequences of C. farreri SoxB2 (CfSoxB2) were obtained by rapid amplification of cDNA ends and genome walking, respectively, using a partial cDNA fragment from the highly conserved DNA-binding domain, i.e., the High Mobility Group (HMG) box. The full-length cDNA sequence of CfSoxB2 was 2,048 bp and encoded a protein of 268 amino acids. The genomic sequence was 5,551 bp in length with only one exon. Several conserved elements, such as the TATA-box, GC-box, CAAT-box, GATA-box, and Sox/sry-sex/testis-determining and related HMG box factors, were found in the promoter region. Furthermore, real-time quantitative reverse transcription PCR assays were carried out to assess the mRNA expression of CfSoxB2 in different tissues. SoxB2 was highly expressed in the mantle, moderately in the digestive gland and gill, and weakly in the gonad, kidney and adductor muscle. In male and female gonads at different developmental stages of reproduction, the expression levels of CfSoxB2 were similar. Considering the specific expression and roles of SoxB2 in other animals, in particular vertebrates, and the fact that there are many pallial nerves in the mantle, cerebral ganglia in the digestive gland and gill nerves in the gill, we propose a possible essential role for SoxB2 in nervous tissue function in C. farreri.
Bacterial Polysaccharide Co-Polymerases Share a Common Framework for Control of Polymer Length
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tocilj, A.; Munger, C.; Proteau, A.
2008-01-01
The chain length distribution of complex polysaccharides present on the bacterial surface is determined by polysaccharide co-polymerases (PCPs) anchored in the inner membrane. We report crystal structures of the periplasmic domains of three PCPs that impart substantially different chain length distributions to surface polysaccharides. Despite very low sequence similarity, they have a common protomer structure with a long central alpha-helix extending 100 Å into the periplasm. The protomers self-assemble into bell-shaped oligomers of variable sizes, with a large internal cavity. Electron microscopy shows that one of the full-length PCPs has an organization similar to that observed in the crystal for its periplasmic domain alone. Functional studies suggest that the top of the PCP oligomers is an important region for determining polysaccharide modal length. These structures provide a detailed view of components of the bacterial polysaccharide assembly machinery.
Himuro, Yasuyo; Tanaka, Hidenori; Hashiguchi, Masatsugu; Ichikawa, Takanari; Nakazawa, Miki; Seki, Motoaki; Fujita, Miki; Shinozaki, Kazuo; Matsui, Minami; Akashi, Ryo; Hoffmann, Franz
2011-01-15
Using the full-length cDNA overexpressor (FOX) gene-hunting system, we have generated 130 Arabidopsis FOX-superroot lines in bird's-foot trefoil (Lotus corniculatus) for the systematic functional analysis of genes expressed in roots and for the selection of induced mutants with interesting root growth characteristics. We used the Arabidopsis-FOX Agrobacterium library (constructed by ligating pBIG2113SF) for the Agrobacterium-mediated transformation of superroots (SR) and the subsequent selection of gain-of-function mutants with ectopically expressed Arabidopsis genes. The original superroot culture of L. corniculatus is a unique host system displaying fast root growth in vitro, allowing continuous root cloning, direct somatic embryogenesis and mass regeneration of plants under entirely hormone-free culture conditions. Several of the Arabidopsis FOX-superroot lines show interesting deviations from the normal root growth and morphology of SR-plants, such as differences in pigmentation, growth rate, length or diameter. Some of these mutations are of potential agricultural interest. Genomic PCR analysis revealed that 100 (76.9%) of the 130 transgenic lines showed amplification of a single fragment. Sequence analysis of the PCR fragments from these 100 lines identified full-length cDNAs in 74 of them, and 43 of these 74 full-length cDNAs carried known genes. The Arabidopsis FOX-superroot lines of L. corniculatus produced in this study expand the FOX hunting system and provide a new tool for the genetic analysis and control of root growth in a leguminous forage plant. Copyright © 2010 Elsevier GmbH. All rights reserved.
Zhang, Songyan; Gao, Jiuxiang; Lu, Yiling; Cai, Shasha; Qiao, Xue; Wang, Yipeng; Yu, Haining
2013-08-01
Antifreeze proteins (AFPs) are a class of polypeptides produced by certain vertebrates, plants, fungi, and bacteria that permit these organisms to survive in subzero environments. In this study, we report the molecular cloning, sequence analysis and homology-modeled three-dimensional structure of an antifreeze-like protein (AFLP) from the axolotl, a caudate amphibian. We constructed a full-length spleen cDNA library of axolotl (Ambystoma mexicanum). An EST with the highest similarity (∼42%) to the freeze-responsive liver protein Li16 from Rana sylvatica was identified, and the full-length cDNA was subsequently obtained by RACE-PCR. The axolotl AFLP sequence contains an open reading frame encoding a putative signal peptide and a mature protein of 93 amino acids. The calculated molecular mass and theoretical isoelectric point (pI) of this mature protein were 10128.6 Da and 8.97, respectively. The gene and its deduced protein were further characterized by detailed bioinformatics analysis. The three-dimensional structure of this AFLP was predicted by homology modeling, and the conserved residues required for functionality were identified. The homology model constructed could be of use for effective drug design. This is the first report of an antifreeze-like protein identified from a caudate amphibian.
Koning-Boucoiran, Carole F S; Esselink, G Danny; Vukosavljev, Mirjana; van 't Westende, Wendy P C; Gitonga, Virginia W; Krens, Frans A; Voorrips, Roeland E; van de Weg, W Eric; Schulz, Dietmar; Debener, Thomas; Maliepaard, Chris; Arens, Paul; Smulders, Marinus J M
2015-01-01
In order to develop a versatile and large SNP array for rose, we set out to mine ESTs from diverse sets of rose germplasm. For this purpose, RNA-Seq libraries containing about 700 million reads were generated from tetraploid cut and garden roses using Illumina paired-end sequencing, and from diploid Rosa multiflora using 454 sequencing. Separate de novo assemblies were performed in order to identify single nucleotide polymorphisms (SNPs) within and between rose varieties. SNPs among tetraploid roses were selected for constructing a genotyping array that can be employed for genetic mapping and marker-trait association discovery in breeding programs based on tetraploid germplasm, from both cut roses and garden roses. In total, 68,893 SNPs were included on the WagRhSNP Axiom array. Next, an orthology-guided assembly was performed to construct a non-redundant rose transcriptome database. A total of 21,740 transcripts had significant hits with orthologous genes in the strawberry (Fragaria vesca L.) genome. Of these, 13,390 appeared to contain the full-length coding regions. This newly established transcriptome resource adds considerably to the currently available sequence resources for the Rosaceae family in general and the genus Rosa in particular.
Liégeois, Florian; Butel, Christelle; Mouinga-Ondéme, Augustin; Verrier, Delphine; Motsch, Peggy; Gonzalez, Jean-Paul; Peeters, Martine; Rouet, François; Onanga, Richard
2011-11-01
Since the first characterization of SIVsun (L14 strain) from a sun-tailed monkey (Cercopithecus solatus) in Gabon in 1999, no further information exists about the evolutionary history and geographic distribution of this lentivirus. Here, we report the full-length molecular characterization of a second SIVsun virus (SIVsunK08) naturally infecting a wild-caught sun-tailed monkey. The SIVsunK08 strain was most closely related to SIVsunL14 and clustered with members of the SIVmnd-1/SIVlhoest group. SIVsunK08 shared identical functional motifs in the LTR, Gag and Env proteins with SIVsunL14. Our data indicate that C. solatus is naturally infected with a monophyletic SIVsun strain.
Alternative polyadenylation of the gene transcripts encoding a rat DNA polymerase beta.
Konopiński, R; Nowak, R; Siedlecki, J A
1996-10-17
Rat cells produce two different transcripts of DNA polymerase beta (beta-Pol). The low-molecular-weight transcript (1.4 kb) had already been sequenced. We report here the cloning and sequencing of the full-length cDNA corresponding to the high-molecular-weight (HMW) transcript (4.0 kb) of beta-Pol. Sequence data strongly suggest that both transcripts are produced from a single gene by alternative polyadenylation. The HMW transcript contains the entire 1.4-kb transcript sequence plus an additional 2.2 kb at the 3' end. The 3' UTR of the HMW transcript contains some regulatory sequences that are absent from the 1.4-kb transcript. The A + U-rich fragment and the (GU)21 sequence are believed to influence mRNA stability. The functional significance of the A-rich region, which locally destabilizes double-stranded secondary structure, remains unknown.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was determined, and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction, and the genome has a nucleotide composition high in A and T: the A + T content of the complete mt genome is 71.4%. The R. tetragona mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
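The 71.4% A + T figure is a simple base-composition ratio over the genome sequence. A minimal sketch of that calculation follows; counting only unambiguous bases (A, C, G, T) in the denominator is an assumption, since the abstract does not say how ambiguous positions were handled, and the function name is hypothetical.

```python
def at_content(seq: str) -> float:
    """Percent A + T among unambiguous bases (A, C, G, T); other IUPAC codes are ignored."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


print(round(at_content("AATTGCN"), 1))  # 66.7 (the N is not counted)
```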
Reverse transcription polymerase chain reaction protocols for cloning small circular RNAs.
Navarro, B; Daròs, J A; Flores, R
1998-07-01
A protocol of general applicability is described for cloning small circular RNAs of unknown sequence that requires only minimal amounts of template (approximately 50 ng). Both cDNA strands are synthesized, in two consecutive reactions catalyzed by reverse transcriptase and DNA polymerase, respectively, with a 26-mer primer whose six 3'-terminal positions are totally degenerate. The cDNAs are then PCR-amplified using a 20-mer primer with the non-degenerate sequence of the previous primer, cloned and sequenced. This information permits the synthesis of one or more pairs of specific, adjacent primers for obtaining full-length cDNA clones by a protocol that is also described.
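Six totally degenerate 3' positions (N, any of A, C, G, T) mean the 26-mer primer is really a pool of 4^6 = 4096 distinct sequences sharing one 20-nt fixed part. The sketch below computes the degeneracy of a primer written in IUPAC codes and can enumerate the pool. The 20-nt fixed sequence used here is hypothetical; the published primer is not reproduced.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for code in primer.upper():
        n *= len(IUPAC[code])
    return n


def expand(primer: str) -> list[str]:
    """Enumerate every concrete sequence a degenerate primer stands for."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in primer.upper()))]


# Hypothetical 20-nt fixed part; this is also the non-degenerate 20-mer used for PCR.
fixed = "GCTAGCATCGATGCTAGCAT"
primer = fixed + "NNNNNN"  # 26-mer with six fully degenerate 3' positions
print(len(primer), degeneracy(primer))  # 26 4096
print(expand("ACR"))  # ['ACA', 'ACG']
```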
Reilly, Kevin J.; Spencer, Kristie A.
2013-01-01
The current study investigated the processes responsible for the selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absence of length effects in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involve processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-06-24
In this study we constructed a full-length cDNA library from the Chinese oak silkworm, Antheraea pernyi, the best-known wild silkworm, used for silk production and as insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml, and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs, consisting of 24 contigs and 151 singlets, were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes, only five of which were from A. pernyi; 37 (21.2%) were known ESTs without functional annotation; and 41 (23.4%) were novel ESTs. By EST sequencing, a gene encoding a KK-42-binding protein in A. pernyi (named ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP is not a membrane protein but an extracellular protein with a signal peptide at positions 1-18, and that it contains two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting that it may be a member of the lipase superfamily. Expression analysis based on EST counts showed that ApKK42-BP was abundantly expressed at the diapause stage, suggesting that it may also be involved in pupal diapause termination.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-01-01
In this study we constructed a full-length cDNA library from the Chinese oak silkworm, Antheraea pernyi, the best-known wild silkworm, used for silk production and as insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml, and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs, consisting of 24 contigs and 151 singlets, were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes, only five of which were from A. pernyi; 37 (21.2%) were known ESTs without functional annotation; and 41 (23.4%) were novel ESTs. By EST sequencing, a gene encoding a KK-42-binding protein in A. pernyi (named ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP is not a membrane protein but an extracellular protein with a signal peptide at positions 1-18, and that it contains two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting that it may be a member of the lipase superfamily. Expression analysis based on EST counts showed that ApKK42-BP was abundantly expressed at the diapause stage, suggesting that it may also be involved in pupal diapause termination. PMID:19564928
Constable, Fiona E.; Nancarrow, Narelle; Rodoni, Brendan
2018-01-01
Apple mosaic virus (ApMV) and prune dwarf virus (PDV) are amongst the most common viruses infecting Prunus species worldwide, but their incidence and genetic diversity in Australia are not known. In a survey of 127 Prunus tree samples collected from five states in Australia, ApMV and PDV occurred in 4 (3%) and 13 (10%) of the trees, respectively. High-throughput sequencing (HTS) of amplicons from partial conserved regions of RNA1, RNA2, and RNA3 of ApMV and PDV, encoding the methyltransferase (MT), RNA-dependent RNA polymerase (RdRp), and coat protein (CP) genes, respectively, was used to determine the genetic diversity of the Australian isolates of each virus. Phylogenetic comparison of Australian ApMV and PDV amplicon HTS variants and full-length genomes of both viruses with isolates occurring in other countries identified the genetic strains of each virus occurring in Australia. A single Australian Prunus-infecting ApMV genetic strain was identified, as all sequence variants of the ApMV isolates formed a single phylogenetic group in each of RNA1, RNA2, and RNA3. Two Australian PDV genetic strains were identified based on the combination of observed phylogenetic groups in each of RNA1, RNA2, and RNA3, and one Prunus tree had both strains. The accuracy of the phylogenetic analysis of amplicon sequence variants based on segments of each viral RNA was confirmed by phylogenetic analysis of full-length genome sequences of Australian ApMV and PDV isolates and all published ApMV and PDV genomes from other countries. PMID:29562672
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N [San Leandro, CA]; Mariella, Jr., Raymond P.; Christian, Allen T [Tracy, CA]; Young, Jennifer A [Berkeley, CA]; Clague, David S [Livermore, CA]
2011-01-18
A method of fabricating a DNA molecule of user-defined sequence. The method comprises the steps of preselecting a multiplicity of DNA sequence segments that will comprise the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme wherein the multiplicity of DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an even or odd integer. In one embodiment the length of desired hybridizing overlap is specified by the user and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In one embodiment sequence segments are combined from multiple reading frames to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths. In one embodiment starting sequence fragments are of different lengths, n, n+1, n+2, etc.
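The segment-design idea above, a user-defined target built from segments of length n that hybridize over a user-specified overlap, can be illustrated with simple string bookkeeping. The sketch below tiles a target into segments of length n whose neighbours share `overlap` bases, then rebuilds the target while checking every junction. It is a hypothetical illustration only: it ignores strand complementarity, hybridization thermodynamics, temporal separation of the segments, multiple reading frames and the computational predictions the method relies on, and all names and the target sequence are invented.

```python
def overlapping_segments(target: str, n: int, overlap: int) -> list[str]:
    """Tile target with segments of length n, consecutive segments sharing `overlap` bases.

    The final segment may be shorter than n, but it is always longer than `overlap`.
    """
    if not 0 <= overlap < n:
        raise ValueError("need 0 <= overlap < n")
    step = n - overlap
    segments, start = [], 0
    while True:
        segments.append(target[start:start + n])
        if start + n >= len(target):
            return segments
        start += step


def join_segments(segments: list[str], overlap: int) -> str:
    """Rebuild the target, checking that each junction overlaps exactly as designed."""
    joined = segments[0]
    for seg in segments[1:]:
        if joined[len(joined) - overlap:] != seg[:overlap]:
            raise ValueError("adjacent segments do not share the expected overlap")
        joined += seg[overlap:]
    return joined


target = "ATGCGTACGTTAGCCA"  # hypothetical user-defined sequence
parts = overlapping_segments(target, n=6, overlap=3)
print(parts)  # ['ATGCGT', 'CGTACG', 'ACGTTA', 'TTAGCC', 'GCCA']
print(join_segments(parts, overlap=3) == target)  # True
```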
Pyrin gene and mutants thereof, which cause familial Mediterranean fever
Kastner, Daniel L [Bethesda, MD]; Aksentijevich, Ivona [Bethesda, MD]; Centola, Michael [Takoma Park, MD]; Deng, Zuoming [Gaithersburg, MD]; Sood, Ramen [Rockville, MD]; Collins, Francis S [Rockville, MD]; Blake, Trevor [Laytonsville, MD]; Liu, P Paul [Ellicott City, MD]; Fischel-Ghodsian, Nathan [Los Angeles, CA]; Gumucio, Deborah L [Ann Arbor, MI]; Richards, Robert I [North Adelaide, AU]; Ricke, Darrell O [San Diego, CA]; Doggett, Norman A [Santa Cruz, NM]; Pras, Mordechai [Tel-Hashomer, IL]
2003-09-30
The invention provides the nucleic acid sequence encoding the protein associated with familial Mediterranean fever (FMF). The cDNA sequence is designated MEFV. The invention is also directed towards fragments of the DNA sequence, as well as the corresponding sequence for the RNA transcript and fragments thereof. Another aspect of the invention provides the amino acid sequence for a protein (pyrin) associated with FMF. The invention is directed towards the full-length amino acid sequence, fusion proteins containing the amino acid sequence, and fragments thereof. The invention is also directed towards mutants of the nucleic acid and amino acid sequences associated with FMF. In particular, the invention discloses three missense mutations, clustered within about 40 to 50 amino acids, in the highly conserved rfp (B30.2) domain at the C-terminus of the protein. These mutants include M680I, M694V, K695R, and V726A. Additionally, the invention includes methods for diagnosing a patient at risk for having FMF and kits therefor.
Novel methodologies for spectral classification of exon and intron sequences
NASA Astrophysics Data System (ADS)
Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.
2012-12-01
Digital processing of a nucleotide sequence requires mapping it to a numerical sequence, and the choice of nucleotide-to-numeric mapping affects how well its biological properties are preserved in the numerical domain. Digital spectral analysis of nucleotide sequences reveals a period-3 power spectral value that is more prominent in exon sequences than in intron sequences. The success of period-3-based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them with existing codes to determine an appropriate representation, and to introduce novel thresholding methods for more accurate period-3-based exon and intron classification of an unknown sequence. The main findings of this study are as follows. Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers attractive performance. A windowed 1-sequence numerical representation (with window lengths of 9, 15, and 24 bases) offers a possible speed gain over the non-windowed 4-sequence Voss representation, and this gain grows with sequence length. A winner threshold value (chosen as the best among two defined threshold values and one other threshold value) offers top precision for classifying an unknown sequence of specified fixed length. An interpolated winner threshold value applicable to an unknown sequence of arbitrary length can be estimated from the winner threshold values of fixed-length sequences with comparable performance. In general, precision increases with sequence length. The study contributes an effective spectral analysis of nucleotide sequences that better reveals embedded properties, and has potential applications in improved genome annotation.
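The period-3 measure above can be illustrated with the 4-sequence Voss representation that the article uses as its baseline: each base gets a binary indicator sequence, and the period-3 power is the sum of the squared DFT magnitudes of the four indicators at frequency 2π/3. The sketch below computes that quantity and applies a per-base threshold. The threshold is a placeholder to be tuned; the article's K-Quaternary codes and winner thresholds are not reproduced, and the function names are hypothetical.

```python
import cmath


def period3_power(seq: str) -> float:
    """Period-3 power of a DNA sequence under the 4-sequence Voss representation.

    Each base gets a binary indicator sequence (1 where the base occurs, else 0); the
    power is the sum over A, C, G, T of |DFT at frequency 2*pi/3|**2, which equals the
    DFT bin k = N/3 when the length N is a multiple of 3.
    """
    w = cmath.exp(-2j * cmath.pi / 3)
    total = 0.0
    for base in "ACGT":
        coeff = sum(w ** n for n, b in enumerate(seq.upper()) if b == base)
        total += abs(coeff) ** 2
    return total


def looks_coding(seq: str, threshold: float) -> bool:
    """Classify by period-3 power per base; the threshold is a placeholder to be tuned."""
    return period3_power(seq) / len(seq) > threshold


print(round(period3_power("ATG" * 30)))  # 2700: strong 3-base periodicity
print(round(period3_power("A" * 90)))    # 0: no periodicity at 2*pi/3
```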
Molecular basis of length polymorphism in the human zeta-globin gene complex.
Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J
1983-01-01
The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
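The mechanism described above, allele length varying with the copy number of a tandem repeat, implies that two alleles differ in length by a whole multiple of the repeat unit (36 bp for the zeta-globin repeat). The sketch below counts the longest run of exact, back-to-back copies of a unit. A 4-bp unit stands in for the 36-bp repeat, the alleles are hypothetical, and real repeats can be imperfect, which exact matching would miss.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of exact, back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        while seq.startswith(unit, pos):
            copies += 1
            pos += len(unit)
        best = max(best, copies)
    return best


# Hypothetical alleles; a 4-bp unit stands in for the 36-bp repeat.
allele_1 = "TTCA" + "GATC" * 3 + "AAG"
allele_2 = "TTCA" + "GATC" * 5 + "AAG"
print(max_tandem_copies(allele_1, "GATC"), max_tandem_copies(allele_2, "GATC"))  # 3 5
print(len(allele_2) - len(allele_1))  # 8 = (5 - 3) copies x 4 bp
```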
NASA Technical Reports Server (NTRS)
Wallace, G. R.; Weathers, G. D.; Graf, E. R.
1973-01-01
The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
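The hybrid-sum construction can be sketched in Python as follows. The two primitive polynomials, the seed, and the rectangular (boxcar) filter are illustrative assumptions rather than the filter analyzed in the report, and the statistical quality factor itself is not computed.

```python
def lfsr_sequence(degree, taps, length, seed=1):
    """Binary maximum-length (m-)sequence from a Fibonacci LFSR.

    The recurrence is a[k + degree] = XOR of a[k + t] for t in taps, which
    corresponds to the characteristic polynomial x^degree + sum(x^t).
    The polynomial must be primitive for the output to be an m-sequence.
    """
    state = [(seed >> i) & 1 for i in range(degree)]  # oldest bit first
    if not any(state):
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(length):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]
    return out


def hybrid_sum(*sequences):
    """Modulo-two (XOR) sum of several equal-length binary sequences."""
    return [sum(bits) % 2 for bits in zip(*sequences)]


def boxcar_filter(bits, width):
    """Map bits to +/-1 and take a moving sum over `width` samples, i.e. a
    convolution with a rectangular pulse (a difference of two unit steps)."""
    x = [1 if b else -1 for b in bits]
    return [sum(x[i:i + width]) for i in range(len(x) - width + 1)]


# x^3 + x + 1 (period 7) and x^4 + x + 1 (period 15) are both primitive.
a = lfsr_sequence(3, (0, 1), 210)
b = lfsr_sequence(4, (0, 1), 210)
h = hybrid_sum(a, b)  # periodic with period lcm(7, 15) = 105
analog = boxcar_filter(h, 8)
```

Because the component periods are coprime, the hybrid-sum sequence repeats only after their product, which is one reason such generators are attractive compared with a single short m-sequence.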
Tests of two convection theories for red giant and red supergiant envelopes
NASA Technical Reports Server (NTRS)
Stothers, Richard B.; Chin, Chao-Wen
1995-01-01
Two theories of stellar envelope convection are considered here in the context of red giants and red supergiants of intermediate to high mass: Boehm-Vitense's standard mixing-length theory (MLT) and Canuto & Mazzitelli's new theory incorporating the full spectrum of turbulence (FST). Both theories assume incompressible convection. Two formulations of the convective mixing length are also evaluated: l proportional to the local pressure scale height (H_P) and l proportional to the distance from the upper boundary of the convection zone (z). Applications to test both theories are made by calculating stellar evolutionary sequences into the red phase of core helium burning. Since the theoretically predicted effective temperatures for cool stars are known to be sensitive to the assigned value of the mixing length, this quantity has been individually calibrated for each evolutionary sequence. The calibration is done in a composite Hertzsprung-Russell diagram for the red giant and red supergiant members of well-observed Galactic open clusters. The MLT model requires the constant of proportionality for the convective mixing length to vary by a small but statistically significant amount with stellar mass, whereas the FST model succeeds in all cases with the mixing length simply set equal to z. The structure of the deep stellar interior, however, remains very nearly unaffected by the choices of convection theory and mixing length. Inside the convective envelope itself, a density inversion always occurs, but it is somewhat smaller for the convectively more efficient MLT model. On physical grounds the FST model is preferable, and it seems to alleviate the problem of finding the proper mixing length.
Zhang, Peipei; Liu, Yan; Liu, Wenwen; Cao, Mengji; Massart, Sebastien; Wang, Xifeng
2017-01-01
To identify the pathogens responsible for leaf yellowing symptoms on wheat samples collected from Jinan, China, we used RT-PCR to test for the presence of the three most likely pathogens, the known barley/wheat yellow dwarf viruses BYDV-GAV, BYDV-PAV and WYDV-GPV. A sample that tested negative for all three viruses was selected for small RNA sequencing. Twenty-five million sequences were generated, of which 5% were of viral origin. A novel polerovirus was discovered and tentatively named wheat leaf yellowing-associated virus (WLYaV). The full genome of WLYaV is 5,772 nucleotides (nt) long, with six AUG-initiated open reading frames, one non-AUG-initiated open reading frame, and three untranslated regions, showing typical features of the family Luteoviridae. Sequence comparison and phylogenetic analyses suggested that WLYaV is most closely related to sugarcane yellow leaf virus (ScYLV), but the full-genome nucleotide identity and the deduced coat protein (CP) amino acid identity were 64.9% and 86.2%, respectively, below the species demarcation threshold (90%) in the family Luteoviridae. Furthermore, agroinoculation of Nicotiana benthamiana leaves with a cDNA clone of WLYaV caused yellowing symptoms on the plant. Our study adds a new polerovirus associated with wheat leaf yellowing disease, which should help in identifying and controlling pathogens of wheat. PMID:28932215
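The comparison against the 90% demarcation threshold rests on pairwise identity computed over an alignment. A minimal Python sketch, assuming the two sequences are already aligned and that gapped columns are ignored (tools differ on this convention, so treat it as an assumption):

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Columns where either sequence has a gap are skipped; this is one of
    several conventions and is an assumption here.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def below_demarcation(identity, threshold=90.0):
    """True if an identity value falls below a species demarcation threshold."""
    return identity < threshold


print(below_demarcation(64.9))  # prints True
```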
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feyereisen-Koener, J.M.
Double-stranded cDNA was prepared from infectious hematopoietic necrosis virus mRNA and cloned into the plasmid vector pUC8. A clone encoding the glycoprotein (G protein) of infectious hematopoietic necrosis virus was selected by hybridization to a ³²P-labeled probe. The restriction map and nucleotide sequence of the mRNA encoding the glycoprotein of infectious hematopoietic necrosis virus were determined using this full-length cDNA clone.
Identification and expression analysis of duck interleukin-17D in Riemerella anatipestifer infection
USDA-ARS's Scientific Manuscript database
Interleukin (IL)-17D is a proinflammatory cytokine with limited information on its biological functions. Here we provide the description of the sequence, bioactivity, and mRNA expression profile of duck IL-17D homologue. A full-length duck IL-17D (duIL-17D) cDNA with a 624-bp coding region was ident...
USDA-ARS's Scientific Manuscript database
In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise sequence similarity values based on alignment of near full-length 16S rRNA genes (1530 bp) reve...
USDA-ARS's Scientific Manuscript database
Leafy spurge is an invasive perennial weed infesting range and recreational lands of North America. Previous research and omics projects with leafy spurge have helped develop it as a model for studying numerous aspects of perennial plant development and response to abiotic stress. However, the lack ...
USDA-ARS's Scientific Manuscript database
Bovine rhinitis viruses (BRV) cause mild respiratory disease of cattle. In this study, a near full length genome sequence of a virus named RS3X, formerly classified as bovine rhinovirus type 1, isolated from infected cattle from the United Kingdom in the 1960s, was obtained and analyzed. Phylogeneti...
Ficarelli, A; Tassi, F; Restivo, F M
1999-03-01
We have isolated two full length cDNA clones encoding Nicotiana plumbaginifolia NADH-glutamate dehydrogenase. Both clones share amino acid boxes of homology corresponding to conserved GDH catalytic domains and putative mitochondrial targeting sequence. One clone shows a putative EF-hand loop. The level of the two transcripts is affected differently by carbon source.
Genome-wide analysis of the WRKY transcription factors in Aegilops tauschii.
Ma, Jianhui; Zhang, Daijing; Shao, Yun; Liu, Pei; Jiang, Lina; Li, Chunxi
2014-01-01
The WRKY transcription factors (TFs) play important roles in responses to abiotic and biotic stress in plants. However, because the wheat genome sequence is unfinished, relatively few WRKY TFs with full-length coding sequences (CDSs) have been identified in wheat. The genome of Aegilops tauschii, the D-genome progenitor of hexaploid wheat, provides an important alternative resource for the discovery of new genes. In this study, we performed a bioinformatics analysis to identify WRKY TFs with full-length CDSs from the A. tauschii genome. A detailed evolutionary analysis of all these TFs was conducted, and quantitative real-time PCR was carried out to investigate the expression patterns of the abiotic stress-related WRKY TFs under different abiotic stress conditions in A. tauschii seedlings. A total of 93 WRKY TFs were identified from A. tauschii, 79 of which were newly discovered compared with wheat. Gene phylogeny, gene structure and chromosome location of the 93 WRKY TFs were fully analyzed. These studies provide a global view of the WRKY TFs from A. tauschii and a firm foundation for further investigations in both A. tauschii and wheat. © 2015 S. Karger AG, Basel.
Trejo, Sebastián A; López, Laura M I; Caffini, Néstor O; Natalucci, Claudia L; Canals, Francesc; Avilés, Francesc X
2009-07-01
Asclepain f is a papain-like protease previously isolated and characterized from latex of Asclepias fruticosa. This enzyme is a member of the C1 family of cysteine proteases that are synthesized as preproenzymes. The enzyme belongs to the alpha + beta class of proteins, with two disulfide bridges (Cys22-Cys63 and Cys56-Cys95) in the alpha domain, and another one (Cys150-Cys201) in the beta domain, as was determined by molecular modeling. A full-length 1,152 bp cDNA was cloned by RT-RACE-PCR from latex mRNA. The sequence was predicted as an open reading frame of 340 amino acid residues, of which 16 residues belong to the signal peptide, 113 to the propeptide and 211 to the mature enzyme. The full-length cDNA was ligated to pPICZalpha vector and expressed in Pichia pastoris. Recombinant asclepain f showed endopeptidase activity on pGlu-Phe-Leu-p-nitroanilide and was identified by PMF-MALDI-TOF MS. Asclepain f is the first peptidase cloned and expressed from mRNA isolated from plant latex, confirming the presence of the preprocysteine peptidase in the latex.
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.
Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew
2012-12-20
The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
Medici, Maria Cristina; Tummolo, Fabio; Martella, Vito; Arcangeletti, Maria Cristina; De Conto, Flora; Chezzi, Carlo; Fehér, Enikő; Marton, Szilvia; Calderaro, Adriana; Bányai, Krisztián
2016-08-01
Group C rotaviruses (RVC) are enteric pathogens of humans and animals. Whole-genome sequences are available for only a few RVCs, leaving gaps in our knowledge of their genetic diversity. We determined the full-length genome sequences of two human RVCs (PR2593/2004 and PR713/2012), detected in Italy through hospital-based surveillance for rotavirus infection in 2004 and 2012. Across the 11 RNA genome segments, the two Italian RVCs segregated into separate intra-genotypic lineages, with nucleotide variation ranging from 1.9% (VP6) to 15.9% (VP3). Comprehensive analysis of human RVC sequences available in the databases revealed at least two major genome configurations, defined as type I and type II. Human RVCs of type I were all associated with the M3 VP3 genotype, including the Italian strain PR2593/2004. Conversely, human RVCs of type II were all associated with the M2 VP3 genotype, including the Italian strain PR713/2012. Reassortant RVC strains between these major genome configurations were identified. Although only a few full-genome sequences of human RVCs, mostly of Asian origin, are available, analysis of human RVC sequences retrieved from the databases indicates that at least two intra-genotypic RVC lineages circulate in European countries. More sequence data are needed to develop a standardized genotype and intra-genotypic lineage classification system that is useful for epidemiological investigations and avoids confusion in the literature.
E2FM: an encrypted and compressed full-text index for collections of genomic sequences.
Montecuollo, Ferdinando; Schmid, Giovannni; Tagliaferri, Roberto
2017-09-15
Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are driving exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets, such as those related to personalized medicine, require compliance with regulations on the storage and processing of sensitive data. We have designed and carefully engineered the E2FM-index, a new full-text index in minute space optimized for compressing and encrypting nucleotide sequence collections in FASTA format and for performing fast pattern-search queries. The E2FM-index can build self-indexes that occupy as little as 1/20 of the storage required by the input FASTA file, saving about 95% of storage when indexing collections of highly similar sequences; moreover, it can search the built indexes for exact pattern matches in times ranging from a few milliseconds to a few hundred milliseconds, depending on pattern length. Source code is available at https://github.com/montecuollo/E2FM. Contact: ferdinando.montecuollo@unicampania.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
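The E2FM-index builds on the FM-index (Full-text index in Minute space) and its backward search. The Python sketch below shows only that underlying search on a plain index; it omits the compression, sampling and encryption layers that E2FM adds, and its naive suffix sort is suitable only for short illustrative inputs.

```python
def build_fm_index(text):
    """Build a plain (uncompressed, unencrypted) FM-index for `text`."""
    text += "$"  # unique terminator that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)  # Burrows-Wheeler transform
    alphabet = sorted(set(bwt))
    # c_table[ch]: number of characters in text that sort before ch
    c_table, running = {}, 0
    for ch in alphabet:
        c_table[ch] = running
        running += bwt.count(ch)
    # occ[ch][i]: occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, x in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (x == ch)
    return sa, c_table, occ


def locate(index, pattern):
    """Backward search: return the sorted start positions of `pattern`."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = build_fm_index("GATTACAGATTACA")
print(locate(index, "ATTA"))  # prints [1, 8]
```

Each pattern character narrows the suffix-array interval using only the `c_table` and `occ` lookups, so query time depends on pattern length rather than text length, consistent with the length-dependent query times reported above.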
Hepp, Gary R; Kennamer, Robert A
2018-01-01
Incubation starts during egg laying for many bird species and causes developmental asynchrony within clutches. Faster development of late-laid eggs can help reduce developmental differences and synchronize hatching, which is important for precocial species whose young must leave the nest soon after hatching. In this study, we examined the effect of egg laying sequence on length of the incubation period in Wood Ducks (Aix sponsa). Because incubation temperature strongly influences embryonic development rates, we tested the interactive effects of laying sequence and incubation temperature on the ability of late-laid eggs to accelerate development and synchronize hatching. We also examined the potential cost of faster development on duckling body condition. Fresh eggs were collected and incubated at three biologically relevant temperatures (Low: 34.9°C, Medium: 35.8°C, and High: 37.6°C), and egg laying sequences from 1 to 12 were used. Length of the incubation period declined linearly as laying sequence advanced, but the relationship was strongest at medium temperatures followed by low temperatures and high temperatures. There was little support for including fresh egg mass in models of incubation period. Estimated differences in length of the incubation period between eggs 1 and 12 were 2.7 d, 1.2 d, and 0.7 d at medium, low and high temperatures, respectively. Only at intermediate incubation temperatures did development rates of late-laid eggs increase sufficiently to completely compensate for natural levels of developmental asynchrony that have been reported in Wood Duck clutches at the start of full incubation. Body condition of ducklings was strongly affected by fresh egg mass and incubation temperature but declined only slightly as laying sequence progressed. Our findings show that laying sequence and incubation temperature play important roles in helping to shape embryo development and hatching synchrony in a precocial bird. PMID:29373593
Poirier, John T; Reddy, P Seshidhar; Idamakanti, Neeraja; Li, Shawn S; Stump, Kristine L; Burroughs, Kevin D; Hallenbeck, Paul L; Rudin, Charles M
2012-12-01
Seneca Valley virus (SVV-001) is an oncolytic picornavirus with selective tropism for a subset of human cancers with neuroendocrine differentiation. To characterize further the specificity of SVV-001 and its patterns and kinetics of intratumoral spread, bacterial plasmids encoding a cDNA clone of the full-length wild-type virus and a derivative virus expressing GFP were generated. The full-length cDNA of the SVV-001 RNA genome was cloned into a bacterial plasmid under the control of the T7 core promoter sequence to create an infectious cDNA clone, pNTX-09. A GFP reporter virus cDNA clone, pNTX-11, was then generated by cloning a fusion protein of GFP and the 2A protein from foot-and-mouth disease virus immediately following the native SVV-001 2A sequence. Recombinant GFP-expressing reporter virus, SVV-GFP, was rescued from cells transfected with in vitro RNA transcripts from pNTX-11 and propagated in cell culture. The proliferation kinetics of SVV-001 and SVV-GFP were indistinguishable. The SVV-GFP reporter virus was used to determine that a subpopulation of permissive cells is present in small-cell lung cancer cell lines previously thought to lack permissivity to SVV-001. Finally, it was shown that SVV-GFP administered to tumour-bearing animals homes to and infects tumours while having no detectable tropism for normal mouse tissues at 1×10^11 viral particles/kg, a dose equivalent to that administered in ongoing clinical trials. These infectious clones will be of substantial value in further characterizing the biology of this virus and as a backbone for the generation of additional oncolytic derivatives.
Xu, Dongxue; Sun, Lina; Liu, Shilin; Zhang, Libin; Yang, Hongsheng
2016-08-01
The heat shock response (HSR) is characterized by elevated synthesis of heat shock proteins (HSPs) under heat stress, which is mediated primarily by heat shock factor 1 (HSF1). Heat shock factor binding protein 1 (HSBP1) and feedback control by heat shock protein 70 (HSP70) are major regulators of HSF1 activity. We obtained the full-length cDNAs of hsf1 and hsbp1 from the sea cucumber Apostichopus japonicus; these are the second such sequences available for an echinoderm (after Strongylocentrotus purpuratus) and the first for a holothurian. The full-length hsf1 cDNA was 2208 bp, containing a 1326 bp open reading frame encoding 441 amino acids. The full-length hsbp1 cDNA was 2850 bp, containing a 225 bp open reading frame encoding 74 amino acids. A. japonicus HSF1 shares low similarity with homologs from other species, whereas A. japonicus HSBP1 shows much higher identity with its homologs. Phylogenetic trees showed that A. japonicus HSF1 and HSBP1 clustered with sequences from S. purpuratus and fell into clades distinct from those of mollusks, arthropods and vertebrates. Real-time PCR showed that hsf1 and hsbp1 mRNAs were expressed constitutively in all tissues examined. The expression of hsf1, hsbp1 and hsp70 in the intestine at 26°C was time-dependent. These results may provide new insights into the regulation of the heat shock response in this species. Copyright © 2016. Published by Elsevier Inc.
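The ORF sizes above follow the usual bookkeeping: an ORF of N bp that includes the stop codon encodes (N - 3) / 3 amino acids, so 1326 bp gives 441 residues and 225 bp gives 74. A minimal Python sketch of that arithmetic, together with a naive forward-strand longest-ORF scan (an illustration only, not the authors' pipeline; the standard stop codons are assumed):

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def protein_length(orf_bp):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0 or orf_bp < 3:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3 - 1


def longest_orf(seq):
    """Longest ATG...stop ORF on the forward strand (naive scan of 3 frames)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


print(protein_length(1326), protein_length(225))  # prints 441 74
```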
Pei, D; Neel, B G; Walsh, C T
1993-01-01
A protein-tyrosine-phosphatase (PTPase; EC 3.1.3.48) containing two Src homology 2 (SH2) domains, SHPTP1, was previously identified in hematopoietic and epithelial cells. By placing the coding sequence of the PTPase behind a bacteriophage T7 promoter, we have overexpressed both the full-length enzyme and a truncated PTPase domain in Escherichia coli. In each case, the soluble enzyme was expressed at levels of 3-4% of total soluble E. coli protein. The recombinant proteins had molecular weights of 63,000 and 45,000 for the full-length protein and the truncated PTPase domain, respectively, as determined by SDS/PAGE. The recombinant enzymes dephosphorylated p-nitrophenyl phosphate, phosphotyrosine, and phosphotyrosyl peptides but not phosphoserine, phosphothreonine, or phosphoseryl peptides. The enzymes showed a strong dependence on pH and ionic strength for their activity, with pH optima of 5.5 and 6.3 for the full-length enzyme and the catalytic domain, respectively, and an optimal NaCl concentration of 250-300 mM. The recombinant PTPases had high Km values for p-nitrophenyl phosphate and exhibited non-Michaelis-Menten kinetics for phosphotyrosyl peptides. PMID:8430079
Paz, Rosalía Cristina; Kozaczek, Melisa Eliana; Rosli, Hernán Guillermo; Andino, Natalia Pilar; Sanchez-Puerta, Maria Virginia
2017-10-01
Transposable elements are the most abundant components of plant genomes and can dramatically induce genetic changes and impact genome evolution. In the recently sequenced genome of tomato (Solanum lycopersicum), the estimated fraction of elements corresponding to retrotransposons is nearly 62%. Given that tomato is one of the most important vegetable crops cultivated and consumed worldwide, understanding retrotransposon dynamics can provide insight into its evolution and domestication. In this study, we performed a genome-wide in silico search for full-length LTR retroelements in the tomato nuclear genome and annotated 736 full-length Gypsy and Copia retroelements. The dispersion level across the 12 chromosomes, the diversity and the tissue-specific expression of these elements were estimated. Phylogenetic analysis based on the retrotranscriptase region revealed 12 major lineages of LTR retroelements in the tomato genome. We identified 97 families, of which 77 and 20 belong to the superfamilies Copia and Gypsy, respectively. Each retroelement family was characterized according to its element size, relative frequency and insertion time. These analyses represent a valuable resource for comparative genomics within the Solanaceae, for transposon tagging, and for the design of cultivar-specific molecular markers in tomato.
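Insertion times of full-length LTR retroelements are commonly estimated from the divergence K between an element's 5' and 3' LTRs, which are identical at insertion, using T = K / (2r). The abstract does not state which method or substitution rate was used, so the Python sketch below is only an illustration: the Jukes-Cantor correction and the placeholder rate r = 1.3e-8 substitutions per site per year are both assumptions.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned LTRs (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("LTRs must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected number of substitutions per site."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time(ltr5, ltr3, rate=1.3e-8):
    """Estimated years since insertion, T = K / (2r).

    `rate` (substitutions per site per year) is a placeholder assumption;
    it is not given in the abstract.
    """
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2 * rate)


# Hypothetical LTR pair differing at 1 of 10 sites (p = 0.1): roughly 4.1 million years.
print(insertion_time("A" * 10, "A" * 9 + "C"))
```

The factor of 2 reflects that both LTR copies accumulate substitutions independently after insertion.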
Chen, X L; Lui, E Y; Ip, Y Kwong; Lam, S H
2018-06-21
To obtain transcriptomic insights into branchial responses to salinity challenge in Anabas testudineus, this study employed RNA sequencing (RNA-Seq) to analyse the gill transcriptome of A. testudineus exposed to seawater (SW) for 6 days compared with the freshwater (FW) control group. A combined FW and SW gill transcriptome was assembled de novo from 169.9 million 101 bp paired-end reads. In silico validation using 17 A. testudineus full-length coding sequences obtained by Sanger sequencing showed that 15 of the 17 had more than 80% of their sequence aligned to the de novo assembled contigs; 5 of 17 aligned over their full length (100%) and 9 of 17 had more than 90% of their sequence aligned. The combined FW and SW gill transcriptome was mapped to 13780 unique human identifiers at E-value < 1.0E-20, and, using a 1.5-fold cutoff, 952 and 886 identifiers were up- and down-regulated, respectively, in the gills of A. testudineus in SW compared with FW. These genes were associated with at least 23 biological processes. A larger proportion of genes encoding enzymes and transporters associated with molecular transport, energy production and metabolism were up-regulated, while a larger proportion of genes encoding transmembrane receptors, G-protein coupled receptors, kinases and transcription regulators associated with cell cycle, growth, development, signalling, morphology and gene expression were expressed at relatively lower levels in the gills of A. testudineus in SW than in FW. A high correlation (R = 0.99) was observed between RNA-Seq data and real-time quantitative PCR validation for 13 selected genes. The transcriptomic sequence information will facilitate the development of molecular resources and tools, and the findings will provide insights for future studies of branchial iono-osmoregulation and related cellular processes in A. testudineus. This article is protected by copyright. All rights reserved.
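The 1.5-fold cutoff can be applied as sketched below. Whether the study used a symmetric cutoff on raw ratios, log2 ratios, or additional statistical filters is not stated, so the rule here is an assumption, and the gene names and values are hypothetical.

```python
def classify_fold_change(sw_expr, fw_expr, threshold=1.5):
    """Label a gene up/down-regulated in SW relative to FW using a
    symmetric fold-change cutoff (>= threshold up, <= 1/threshold down)."""
    if sw_expr <= 0 or fw_expr <= 0:
        raise ValueError("expression values must be positive")
    ratio = sw_expr / fw_expr
    if ratio >= threshold:
        return "up"
    if ratio <= 1 / threshold:
        return "down"
    return "unchanged"


def tally(expression):
    """Count labels over a mapping gene -> (sw_expr, fw_expr)."""
    counts = {"up": 0, "down": 0, "unchanged": 0}
    for sw, fw in expression.values():
        counts[classify_fold_change(sw, fw)] += 1
    return counts


# Hypothetical normalized expression values, not data from the study.
demo = {"gene_a": (30.0, 10.0), "gene_b": (4.0, 10.0), "gene_c": (11.0, 10.0)}
print(tally(demo))  # prints {'up': 1, 'down': 1, 'unchanged': 1}
```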
Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa
2014-02-03
Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
Sequencing and analysis of 10967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morin, R D; Chang, E; Petrescu, A
2005-10-31
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection initiative. Here we present an analysis of 10967 clones (8049 from X. laevis and 2918 from X. tropicalis). The clone set contains 2013 orthologs between X. laevis and X. tropicalis as well as 1795 paralog pairs within X. laevis. Of these, 1199 are in-paralogs, believed to have resulted from an allotetraploidization event approximately 30 million years ago, and the remaining 546 are likely out-paralogs that resulted from more ancient gene duplications, prior to the divergence between the two species. We do not detect any evidence for positive selection using the Yang and Nielsen maximum likelihood method for estimating dN/dS. However, dN/dS for X. laevis in-paralogs is elevated relative to X. tropicalis orthologs. This difference is highly significant and indicates an overall relaxation of selective pressure on duplicated gene pairs. Within both groups of paralogs, we found evidence of subfunctionalization, manifested as differential expression of paralogous genes among tissues, as measured by EST information from public resources. As expected, we observed a higher incidence of subfunctionalization in out-paralogs than in in-paralogs.
Methods for detection of methyl-CpG dinucleotides
Dunn, John J
2013-11-26
The invention provides methods for enriching methyl-CpG sequences from a DNA sample. The method uses conversion of cytosine residues to uracil under conditions in which methyl-cytosine residues are preserved. Additional methods of the invention enable preservation of the context of me-CpG dinucleotides. The invention also provides a recombinant, full-length and substantially pure McrA protein (rMcrA) for binding and isolating DNA fragments containing the sequence 5'-C^MeCpGG-3'. Methods for making and using the rMcrA protein, and derivatives thereof, are provided.
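The conversion step described above can be illustrated with an idealized Python model in which conversion of unmethylated cytosines is complete and methylated cytosines are fully protected. Real bisulfite chemistry is incomplete, the McrA-based enrichment in the claims is not modeled, and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated):
    """Idealized bisulfite treatment: every unmethylated C becomes U,
    while C positions listed in `methylated` (0-based) are protected."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def methylated_cpg_sites(original, converted):
    """0-based positions of CpG cytosines that survived conversion."""
    original = original.upper()
    return [
        i for i in range(len(original) - 1)
        if original[i] == "C" and original[i + 1] == "G" and converted[i] == "C"
    ]


dna = "ACGTTCGACG"
converted = bisulfite_convert(dna, methylated={1, 8})
print(converted)                             # prints ACGTTUGACG
print(methylated_cpg_sites(dna, converted))  # prints [1, 8]
```

Comparing the converted read with the reference in this way is what preserves the CpG context: a C that remains a C at a CpG site marks that site as methylated.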
Methods for detection of methyl-CpG dinucleotides
Dunn, John J.
2013-01-29
The invention provides methods for enriching methyl-CpG sequences from a DNA sample. The method makes use of conversion of cytosine residues to uracil under conditions in which methyl-cytosine residues are preserved. Additional methods of the invention enable the context of me-CpG dinucleotides to be preserved. The invention also provides a recombinant, full-length and substantially pure McrA protein (rMcrA) for binding and isolation of DNA fragments containing the sequence 5'-C^{Me}CpGG-3'. Methods for making and using the rMcrA protein, and derivatives thereof, are provided.
Methods for detection of methyl-CpG dinucleotides
Dunn, John J.
2012-09-11
The invention provides methods for enriching methyl-CpG sequences from a DNA sample. The method makes use of conversion of cytosine residues to uracil under conditions in which methyl-cytosine residues are preserved. Additional methods of the invention enable the context of me-CpG dinucleotides to be preserved. The invention also provides a recombinant, full-length and substantially pure McrA protein (rMcrA) for binding and isolation of DNA fragments containing the sequence 5'-C^{Me}CpGG-3'. Methods for making and using the rMcrA protein, and derivatives thereof, are provided.
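The chemistry described above, in which unmethylated cytosines become uracil while methyl-cytosines survive, is the same principle that bisulfite sequencing relies on. The sketch below simulates that conversion in silico on a single strand and then reads back which CpG cytosines were protected. It assumes the caller already knows which positions carry 5-methylcytosine; the function names and example sequence are hypothetical, and the sketch does not model the patented enrichment or rMcrA binding steps.

```python
def convert(seq: str, methylated: set[int], after_pcr: bool = False) -> str:
    """Simulate C->U conversion on one strand, sparing methylated cytosines.

    seq: uppercase DNA sequence, 5'->3'.
    methylated: 0-based indices of 5-methylcytosines (hypothetical input).
    after_pcr: report U as T, as it would read after amplification.
    """
    converted_base = "T" if after_pcr else "U"
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append(converted_base)  # unmethylated C is converted
        else:
            out.append(base)  # methyl-C and all other bases are unchanged
    return "".join(out)


def methylated_cpg_sites(original: str, converted: str) -> list[int]:
    """0-based positions of CpG cytosines that survived conversion."""
    return [i for i in range(len(original) - 1)
            if original[i:i + 2] == "CG" and converted[i] == "C"]
```

For example, `convert("ACGTCCGGACGA", {5})` gives `"AUGTUCGGAUGA"`: only the methylated cytosine inside the CCGG motif is left as C, so `methylated_cpg_sites` on the PCR-style read reports position 5 alone.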
Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T
2013-06-01
Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses, with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.
Phylogenetic tree of 16S rRNA sequences from sulfate-reducing bacteria in a sandy marine sediment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devereux, R.; Mundfrom, G.W.
1994-01-01
Phylogenetic divergence among sulfate-reducing bacteria in an estuarine sediment sample was investigated by PCR amplification and comparison of partial 16S rDNA sequences. Twenty unique 16S rDNA sequences were found, 12 of which were affiliated with delta-subclass bacteria based on overall sequence similarity (82-91%). Two successive PCR amplifications were used to obtain and clone the 16S rDNA. The first reaction used templates derived from phosphate-buffered saline-washed sediment, with primers designed to amplify nearly full-length bacterial-domain 16S rDNA. A product from the first reaction was used as the template in a second reaction with primers designed to selectively amplify a region of the 16S rDNA genes of sulfate-reducing bacteria. A phylogenetic tree incorporating the cloned sequences suggests the presence of yet-to-be-cultivated lineages of sulfate-reducing bacteria within the sediment sample.
Hawkins, Charlene
2014-01-01
The Est1 (ever shorter telomeres 1) protein is an essential component of yeast telomerase, a ribonucleoprotein complex that restores the repetitive sequences at chromosome ends (telomeres) that would otherwise be lost during DNA replication. Previous work has shown that the telomerase RNA component (TLC1) transits through the cytoplasm during telomerase biogenesis, but mechanisms of protein import have not been addressed. Here we identify three nuclear localization sequences (NLSs) in Est1p. Mutation of the most N-terminal NLS in the context of full-length Est1p reduces Est1p nuclear localization and causes telomere shortening—phenotypes that are rescued by fusion with the NLS from the simian virus 40 (SV40) large-T antigen. In contrast to that of the TLC1 RNA, Est1p nuclear import is facilitated by Srp1p, the yeast homolog of importin α. The reduction in telomere length observed at the semipermissive temperature in a srp1 mutant strain is rescued by increased Est1p expression, consistent with a defect in Est1p nuclear import. These studies suggest that at least two nuclear import pathways are required to achieve normal telomere length homeostasis in yeast. PMID:24906415
Shirvani-Dastgerdi, E; Amini-Bavil-Olyaee, S; Alavian, S Moayed; Trautwein, C; Tacke, F
2015-05-01
Delta hepatitis, caused by co-infection or super-infection of hepatitis D virus (HDV) in hepatitis B virus (HBV)-infected patients, is the most severe form of chronic hepatitis, often progressing to liver cirrhosis and liver failure. Although 15 million individuals are affected worldwide, molecular data on the HDV genome and its proteins, the small and large delta antigens (S-/L-HDAg), are limited. We therefore conducted a nationwide study in HBV-HDV-infected patients from Iran and successfully amplified 38 full HDV genomes and 44 L-HDAg sequences from 34 individuals. Phylogenetic analyses of full-length HDV and L-HDAg isolates revealed that all strains clustered with genotype 1 and showed high genotypic distances to HDV genotypes 2 to 8, with the greatest distance to genotype 3. Longitudinal analyses in individual patients indicated a reverse evolutionary trend over time, especially in L-HDAg amino acid composition. Besides multiple sequence variations in the hypervariable region of HDV, nucleotide substitutions preferentially occurred in the stabilizing P4 domain of the HDV ribozyme. A high rate of single amino acid changes was detected in structural parts of L-HDAg, whereas its post-translational modification sites were highly conserved. Interestingly, several positively selected non-synonymous mutations affected immunogenic epitopes of L-HDAg targeted by CD8 T-cell- and B-cell-driven immune responses. Hence, our comprehensive molecular analysis of a nationwide cohort revealed phylogenetic relationships and provided insight into viral evolution within individual hosts. Moreover, this study identified preferential areas of frequent mutation in the HDV ribozyme and antigen protein. Copyright © 2014 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Birla, Bhagyashree S; Chou, Hui-Hsien
2015-01-01
Gene synthesis is frequently used in modern molecular biology research, either to create novel genes or to obtain natural genes when synthesis is more flexible and reliable than cloning. Chemical DNA synthesis is limited in both length and yield, so full-length genes have to be constructed hierarchically from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods for assembling multiple double-stranded DNA (dsDNA) fragments. According to its vendor, up to 12 dsDNA fragments can currently be assembled at once with Gibson Assembly; in practice, the number of dsDNA fragments that can be assembled in a single reaction is much lower. We have developed a rational design method for gene construction that allows large numbers of dsDNA fragments to be assembled into full-length genes in a single reaction. Using this design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky, which identifies all unique junctions in a gene at which consecutive DNA fragments are made to connect specifically to each other. Our method is generally applicable to most gene sequences and can improve both the efficiency and the cost of gene assembly.
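Picky performs a thermodynamic analysis; as a much cruder stand-in, the sketch below only checks that each junction overlap occurs exactly once in the gene when both strands are searched, which is one necessary condition for consecutive fragments to anneal only to each other. The cut positions, the fixed overlap length k and all helper names are hypothetical choices for illustration, not Picky's interface.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlapping hits."""
    return sum(1 for i in range(len(text) - len(pattern) + 1)
               if text.startswith(pattern, i))


def fragments(gene: str, cuts: list[int], k: int) -> list[str]:
    """Split gene so that consecutive fragments share the k nt starting at each cut."""
    starts = [0] + list(cuts)
    ends = [c + k for c in cuts] + [len(gene)]
    return [gene[s:e] for s, e in zip(starts, ends)]


def nonunique_junctions(gene: str, cuts: list[int], k: int) -> list[int]:
    """Return the cut positions whose k-nt overlap is not unique on both strands."""
    flagged = []
    for c in cuts:
        overlap = gene[c:c + k]
        if len(overlap) < k:
            raise ValueError(f"overlap at {c} runs past the end of the gene")
        hits = count_occurrences(gene, overlap) + count_occurrences(gene, revcomp(overlap))
        # Exactly one hit means the overlap appears only at its own junction.
        # A palindromic overlap is counted twice and flagged, since it can self-anneal.
        if hits != 1:
            flagged.append(c)
    return flagged
```

In the toy sequence `ACAGTCAGGA` with k = 3, a cut at position 1 would use the overlap `CAG`, which also occurs at position 5, so that junction is flagged; a cut at position 3 (overlap `GTC`) passes.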
Sequence-Dependent Persistence Length of Long DNA
NASA Astrophysics Data System (ADS)
Chuang, Hui-Min; Reifenberger, Jeffrey G.; Cao, Han; Dorfman, Kevin D.
2017-12-01
Using a high-throughput genome-mapping approach, we obtained circa 50 million measurements of the extension of internal human DNA segments in a 41 nm × 41 nm nanochannel. The underlying DNA sequences, obtained by mapping to the reference human genome, are 2.5-393 kilobase pairs long and have GC contents between 32.5% and 60%. Using Odijk's theory for a channel-confined wormlike chain, these data reveal that the DNA persistence length increases by almost 20% as the GC content increases. The increased persistence length is rationalized by a model, containing no adjustable parameters, that treats the DNA as a statistical terpolymer with a sequence-dependent intrinsic persistence length and a sequence-independent electrostatic persistence length.
Maestre, Juan P; Rovira, Roger; Gamisans, Xavier; Kinney, Kerry A; Kirisits, Mary Jo; Lafuente, Javier; Gabriel, David
2009-01-01
The diversity and spatial distribution of bacteria in a lab-scale biotrickling filter treating high loads of hydrogen sulfide (H2S) were investigated. Diversity and community structure were studied by terminal-restriction fragment length polymorphism (T-RFLP), and a 16S rRNA gene clone library was established. Near-full-length 16S rRNA gene sequences were obtained, and the clones were clustered into 24 operational taxonomic units (OTUs). Nearly 74% and 26% of the clones were affiliated with the phyla Proteobacteria and Bacteroidetes, respectively. Beta-, epsilon- and gammaproteobacteria accounted for 15%, 9% and 48%, respectively. Around 45% of the retrieved sequences were affiliated with bacteria of the sulfur cycle, including Thiothrix spp., Thiobacillus spp. and Sulfurimonas denitrificans. Sequences related to Thiothrix lacustris accounted for 38%. The rarefaction curve indicated that the constructed clone library was sufficient to describe the vast majority of the bacterial diversity of this reactor operating under strict conditions (2,000 ppmv of H2S). T-RFLP also revealed a spatial distribution of bacteria along the length of the reactor. Although aerobic species predominated along the reactor, facultative anaerobes had a higher relative abundance in the inlet part of the reactor, where the sulfide-to-oxygen ratio is higher.
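A rarefaction curve plots the expected number of OTUs observed against the number of clones sampled; once it flattens, further sequencing is unlikely to reveal many new OTUs. The abstract does not say how its curve was computed, so the sketch below assumes Hurlbert's analytical rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the number of clones in OTU i and N is the library size. Any OTU counts passed to it here are invented.

```python
from math import comb


def expected_otus(otu_counts: list[int], n: int) -> float:
    """Expected number of OTUs in a random subsample of n clones (Hurlbert's rarefaction)."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the number of clones")
    all_subsamples = comb(total, n)
    # An OTU with c clones is missed only if all n draws come from the other total - c clones.
    return sum(1 - comb(total - c, n) / all_subsamples for c in otu_counts if c > 0)


def rarefaction_curve(otu_counts: list[int], step: int = 1) -> list[tuple[int, float]]:
    """(sample size, expected OTUs) pairs from 0 up to the full library."""
    total = sum(otu_counts)
    return [(n, expected_otus(otu_counts, n)) for n in range(0, total + 1, step)]
```

By construction the last point of the curve equals the observed OTU count (24 in this study); what matters is whether the curve has already levelled off well before that point.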
The arbuscular mycorrhizal fungal protein glomalin is a putative homolog of heat shock protein 60.
Gadkar, Vijay; Rillig, Matthias C
2006-10-01
Work on glomalin-related soil protein produced by arbuscular mycorrhizal (AM) fungi (AMF) has been limited because the identity of the protein is unknown. A protein band from the AM fungus Glomus intraradices that cross-reacts with the glomalin-specific antibody MAb32B11 was partially sequenced using tandem liquid chromatography-mass spectrometry. A 17-amino-acid sequence showing similarity to heat shock protein 60 (hsp 60) was obtained. Using degenerate PCR, a 1773-bp full-length cDNA encoding the hsp 60 gene was isolated from a G. intraradices cDNA library. The ORF was predicted to encode a protein of 590 amino acids. The protein sequence had three N-terminal glycosylation sites and a string of GGM motifs at the C-terminal end. The GiHsp 60 coding sequence was interrupted by three introns of 67, 76 and 131 bp. The GiHsp 60 protein was expressed using an in vitro translation system and purified using the 6xHis-tag system. A dot-blot assay showed that the purified protein was highly cross-reactive with the glomalin-specific antibody MAb32B11. The present work provides the first evidence for the identity of the glomalin protein in the model AMF G. intraradices, thus facilitating further characterization of this protein, which is of great interest in soil ecology.
2013-01-01
Background: Hybridization-based assays and capture systems depend on the specificity of hybridization between a probe and its intended target. A common guideline in the construction of DNA microarrays, for instance, is that avoiding complementary stretches of more than 15 nucleotides in a 50- or 60-mer probe will eliminate sequence-specific cross-hybridization reactions. Here we present a study of the behavior of partially matched oligonucleotide pairs with complementary stretches starting well below this threshold complementarity length, in silico, in solution, and at the microarray surface. The modeled behavior of pairs of oligonucleotide probes and their targets suggests that even a complementary stretch of 12 nt would give rise to specific cross-hybridization. We designed a set of binding partners to a 50-mer oligonucleotide containing complementary stretches from 6 nt to 21 nt in length. Results: Solution melting experiments demonstrate that stable partial duplexes can form when only 12 bp of complementary sequence are present; surface hybridization experiments confirm that a signal close in magnitude to the full-strength signal can be obtained from hybridization of a 12-bp duplex within a 50-mer oligonucleotide. Conclusions: Microarray and other molecular capture strategies that rely on a 15-nt lower complementarity bound for eliminating specific cross-hybridization may not be sufficiently conservative. PMID:23445545
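The design guideline above is phrased in terms of the longest complementary stretch between a probe and another sequence. Antiparallel Watson-Crick pairing between two strands is equivalent to an exact match between one strand and the reverse complement of the other, so that length can be computed as a longest common substring. The sketch below does this with dynamic programming; the function names are hypothetical, and it counts only exact contiguous pairing, ignoring mismatches, bulges and duplex stability, which the study addresses through modeling and melting experiments.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(probe: str, partner: str) -> int:
    """Length of the longest contiguous stretch over which partner can pair with probe.

    Computed as the longest common substring of probe and revcomp(partner).
    """
    rc = revcomp(partner.upper())
    probe = probe.upper()
    best = 0
    # prev[j] holds the length of the common run ending at the previous probe base and rc[j-1].
    prev = [0] * (len(rc) + 1)
    for a in probe:
        cur = [0] * (len(rc) + 1)
        for j, b in enumerate(rc, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best
```

Screening probe sets with a check like this against a 12-nt cut-off, rather than 15 nt, would be one way to act on the conclusion above.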
Tian, Wenlan; Paudel, Dev
2017-01-01
Jatropha (Jatropha curcas L.) is an economically important species with great potential for biodiesel production. To enrich the jatropha genomic databases and resources for microgravity studies, we sequenced and annotated the jatropha transcriptome and developed SSR and SNP markers from the transcriptome sequences. In total, 1,714,433 raw reads with an average length of 441.2 nucleotides were generated. De novo assembly and clustering resulted in 115,611 uniquely assembled sequences (UASs), including 21,418 full-length cDNAs and 23,264 new jatropha transcript sequences. The whole UAS set was annotated: 59,903 (51.81%) UASs were assigned gene ontology (GO) terms, 12,584 (10.88%) had orthologs in Eukaryotic Orthologous Groups (KOG), and 8,822 (7.63%) were mapped to 317 pathways in six different categories of the Kyoto Encyclopedia of Genes and Genomes (KEGG) database; the set also contained 3,588 putative transcription factors. From the UASs, 9,798 SSRs were discovered, with AG/CT as the most frequent (45.8%) SSR motif type. Furthermore, 38,693 SNPs were detected, of which 7,584 remained after filtering. This UAS set has enriched the current jatropha genomic databases and provided a large number of genetic markers, which can facilitate jatropha genetic improvement and many other genetic and biological studies. PMID:28154822
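SSR surveys conventionally pool a motif with its rotations and its reverse complement, so AG, GA, CT and TC repeats are all counted under the single class AG/CT reported above. The sketch below finds perfect dinucleotide repeats with a regular expression and tallies them by that class. The minimum of six repeats is an assumed threshold, not the one used in the study, and dedicated SSR finders (such as MISA) also handle longer motifs and compound repeats.

```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def motif_class(motif: str) -> str:
    """Group a motif with its rotations and reverse complement, e.g. 'GA' -> 'AG/CT'."""
    def canonical_rotation(m: str) -> str:
        return min(m[i:] + m[:i] for i in range(len(m)))

    rc = motif.translate(COMPLEMENT)[::-1]
    return "/".join(sorted([canonical_rotation(motif), canonical_rotation(rc)]))


def find_dinucleotide_ssrs(seq: str, min_repeats: int = 6) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect dinucleotide repeats."""
    # A dinucleotide followed by at least min_repeats - 1 further copies of itself.
    pattern = re.compile(rf"([ACGT]{{2}})\1{{{min_repeats - 1},}}")
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        if motif[0] == motif[1]:  # homopolymer run such as AAAA..., not a dinucleotide SSR
            continue
        hits.append((m.start(), motif, len(m.group(0)) // 2))
    return hits


def motif_frequencies(seq: str, min_repeats: int = 6) -> Counter:
    """Count SSRs per motif class, e.g. Counter({'AG/CT': 2})."""
    return Counter(motif_class(motif) for _, motif, _ in find_dinucleotide_ssrs(seq, min_repeats))
```

Because the regular expression reports the leftmost match, the same repeat can be read as GA rather than AG depending on flanking bases; pooling by class makes the tally independent of that phase.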
Zulfiqar, Awais; Zhang, Jie; Cui, Xiaofeng; Qian, Yajuan; Zhou, Xueping; Xie, Yan
2012-01-01
A begomovirus disease complex associated with Vernonia cinerea showing yellow vein symptoms was studied. The full-length genomic DNA comprised 2739 nucleotides (nt) and had the typical genome organization of begomoviruses. Sequence comparison showed that it shared the highest nucleotide sequence identity (78.9%) with the recently characterized Vernonia yellow vein virus (VeYVV) from India. Of the associated satellites, the betasatellite showed the highest nucleotide sequence identity (52.1%) with Vernonia yellow vein virus betasatellite (VeYVVB), and the alphasatellite shared the highest sequence identity (70.7%) with Gossypium mustelinium symptomless alphasatellite (GMusSLA). The virus is a member of a distinct species with cognate alpha- and betasatellites, for which the name Vernonia yellow vein Fujian virus (VeYVFjV) is proposed.
Lampe, David J; Witherspoon, David J; Soto-Adames, Felipe N; Robertson, Hugh M
2003-04-01
We report the isolation and sequencing of genomic copies of mariner transposons involved in recent horizontal transfers into the genomes of the European earwig, Forficula auricularia; the European honey bee, Apis mellifera; the Mediterranean fruit fly, Ceratitis capitata; and a blister beetle, Epicauta funebris, insects from four different orders. These elements are in the mellifera subfamily and are the second documented example of full-length mariner elements involved in this kind of phenomenon. We applied maximum likelihood methods to the coding sequences and determined that the copies in each genome were evolving neutrally, whereas reconstructed ancestral coding sequences appeared to be under selection, which strengthens our previous hypothesis that the primary selective constraint on mariner sequence evolution is the act of horizontal transfer between genomes.
Kiesler, Kevin M; Coble, Michael D; Hall, Thomas A; Vallone, Peter M
2014-01-01
A set of 711 samples from four U.S. population groups was analyzed using a novel mass-spectrometry-based method for mitochondrial DNA (mtDNA) base composition profiling. Comparison of the mass spectrometry results with Sanger sequencing data yielded a concordance rate of 99.97%. Length heteroplasmy was identified in 46% of samples and point heteroplasmy was observed in 6.6% of samples in the combined mass spectral and Sanger data set. Using discrimination capacity as a metric, Sanger sequencing of the full control region had the highest discriminatory power, followed by the mass spectrometry base composition method, which was more discriminating than Sanger sequencing of just the hypervariable regions. This trend is in agreement with the number of nucleotides covered by each of the three assays. Published by Elsevier Ireland Ltd.
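In broad terms, base-composition profiling records how many of each nucleotide an amplicon contains, inferred from its measured mass, but not their order, so different sequences can share a profile. The sketch below illustrates why such profiles can be less discriminating than full sequence, taking discrimination capacity here as the number of distinct profiles divided by the number of samples. The function names and toy sequences are hypothetical stand-ins for real amplicons.

```python
from collections import Counter


def base_composition(seq: str) -> tuple[int, int, int, int]:
    """Counts of A, C, G and T; base order is discarded, as in a composition profile."""
    counts = Counter(seq.upper())
    return (counts["A"], counts["C"], counts["G"], counts["T"])


def discrimination_capacity(profiles: list) -> float:
    """Number of distinct profiles divided by the number of samples."""
    if not profiles:
        raise ValueError("need at least one sample")
    return len(set(profiles)) / len(profiles)
```

With the three toy sequences `ACGT`, `TGCA` and `AACT`, full sequences are all distinct (capacity 1.0), but the first two share the composition A1 C1 G1 T1, so the composition profiles reach only 2/3.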
Zhang, Xiao-Yan; Xiang, Hai-Ying; Zhou, Cui-Ji; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui
2014-08-01
For brassica yellows virus (BrYV), proposed to be a member of a new polerovirus species, two clearly distinct genotypes (BrYV-A and BrYV-B) have been described. In this study, the complete nucleotide sequences of two BrYV isolates from radish and Chinese cabbage were determined. Sequence analysis suggested that these isolates represent a new genotype, referred to here as BrYV-C. The full-length sequences of the two BrYV-C isolates shared 93.4-94.8 % identity with BrYV-A and BrYV-B. Further phylogenetic analysis showed that the BrYV-C isolates formed a subgroup that was distinct from the BrYV-A and BrYV-B isolates based on all of the proteins except P5.
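Percent identity figures such as the 93.4-94.8% above depend on how gaps and alignment length are treated. The sketch below uses one common convention, assumed here rather than taken from the study: identical aligned positions divided by the number of alignment columns, skipping columns that are gaps in both sequences. It expects sequences that have already been aligned, with `-` as the gap character.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions / alignment columns, ignoring columns that are gaps in both."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no usable columns")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)
```

Other conventions divide by the length of the shorter sequence or exclude all gapped columns, which can shift identities by a few percent on divergent isolates.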
Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried
2015-01-01
A major challenge in gene library generation is to guarantee a large functional size and diversity, which significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of amino acids at each position. Here we describe the generation of a high-diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvents the notoriously troublesome long-term archiving and repeated proliferation of high-diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analyzing library diversity. Finally, the practical utility of the library was demonstrated in principle by assessing the conformational stability of library members and by isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acid synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
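Two back-of-the-envelope quantities help when planning libraries like this one. The theoretical diversity is the product of the number of codon choices at each randomized position, which is also why joining sub-libraries multiplies their diversities. The expected coverage is the fraction of those variants that a library of a given size actually contains. The sketch below assumes every variant is equally likely, which real trinucleotide mixtures only approximate; the function names and per-position choices are hypothetical.

```python
import math


def theoretical_diversity(choices_per_position: list[int]) -> int:
    """Distinct variants when each randomized position has the given number of codon choices."""
    return math.prod(choices_per_position)


def expected_coverage(library_size: int, diversity: int) -> float:
    """Expected fraction of all variants present at least once among library_size clones.

    Assumes equiprobable variants, so each is missed with probability
    (1 - 1/diversity) ** library_size; log1p/expm1 keep this accurate for huge diversities.
    """
    if diversity < 1 or library_size < 0:
        raise ValueError("diversity must be >= 1 and library_size >= 0")
    if diversity == 1:
        return 1.0 if library_size > 0 else 0.0
    return -math.expm1(library_size * math.log1p(-1.0 / diversity))
```

For instance, if each of 26 positions allowed 4 codons, the theoretical diversity would be 4^26, about 4.5 × 10^15, and a library with as many clones as there are variants would be expected to contain only about 63% of them (1 − 1/e).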
Large-scale collection of full-length cDNA and transcriptome analysis in Hevea brasiliensis
Makita, Yuko; Ng, Kiaw Kiaw; Veera Singham, G.; Kawashima, Mika; Hirakawa, Hideki; Sato, Shusei
2017-01-01
Natural rubber has unique physical properties that cannot be replaced by products from other latex-producing plants or by petrochemically produced synthetic rubbers. Rubber from Hevea brasiliensis is the main commercial source of this natural rubber, which has a cis-polyisoprene configuration. For sustainable production of enough rubber to meet demand, elucidation of the molecular mechanisms involved in latex production is vital. To this end, we first constructed full-length cDNA libraries of the rubber cultivar RRIM 600 and sequenced around 20,000 clones by the Sanger method and over 15,000 contigs with an Illumina sequencer. With these data, we updated around 5,500 gene structures and newly annotated around 9,500 transcription start sites. Second, to elucidate the rubber biosynthetic pathways and their transcriptional regulation, we carried out tissue- and cultivar-specific RNA-Seq analysis. Using our recently published genome sequence, we confirmed the expression patterns of the rubber biosynthetic genes. Our data suggest that the cytoplasmic mevalonate (MVA) pathway is the main route for isoprenoid biosynthesis in latex production. In addition to the well-studied polymerization factors, we suggest that rubber elongation factor 8 (REF8) is a candidate factor in cis-polyisoprene biosynthesis. We have also identified 39 transcription factors that may be key regulators of latex production. Expression profile analysis of two additional cultivars, RRIM 901 and PB 350, by RNA-Seq revealed possible expression differences between a high latex-yielding cultivar and a disease-resistant cultivar. PMID:28431015
Wang, Bu-Yong; Wen, Rong-Rong; Ma, Ling
2017-09-26
Aphelenchoides besseyi, the nematode agent of white tip disease of rice, causes huge economic losses in almost all rice-growing regions of the world. Glutathione peroxidase (GPx), a protein secreted by the esophageal glands, plays important roles in the parasitism, immune evasion, reproduction and pathogenesis of many plant-parasitic nematodes (PPNs). GPx is therefore a promising target for controlling A. besseyi. Here, the full-length sequence of the GPx gene from A. besseyi (AbGPx1) was cloned using the rapid amplification of cDNA ends method. The 944-bp full-length AbGPx1 sequence contains a 678-bp open reading frame encoding a 225-amino-acid protein. The deduced amino acid sequence of AbGPx1 is highly homologous to those of other nematode GPxs and showed the closest evolutionary relationship with DrGPx. In situ hybridization showed that AbGPx1 was constitutively expressed in the esophageal glands of A. besseyi, suggesting potential roles in parasitism and reproduction. RNA interference (RNAi) was used to assess the function of the AbGPx1 gene, and quantitative real-time PCR was used to monitor the RNAi effects. After 12 h of dsRNA treatment, AbGPx1 expression levels and nematode reproduction decreased compared with the control group; thus, the AbGPx1 gene is likely to be associated with the development, reproduction and infection ability of A. besseyi. These findings may open new avenues towards nematode control.
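The ORF and protein lengths reported here (678 bp, 225 amino acids) follow from the genetic code: 678 / 3 = 226 codons, one of which is the stop codon. The same arithmetic fits the 1773-bp cDNA and 590-amino-acid protein reported for GiHsp 60 (591 codons including the stop), whereas some other entries in this collection appear to give ORF lengths without the stop codon, so the convention is worth checking before comparing figures. The sketch below, with hypothetical function names, scans the three forward frames for the longest ATG-to-stop ORF and converts an ORF length to a protein length; real ORF finders also scan the reverse strand and handle alternative start codons.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Optional[tuple[int, int]]:
    """(start, end) of the longest ATG-to-stop ORF in the three forward frames.

    end is exclusive and includes the stop codon; returns None if no complete ORF exists.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG after this stop
    return best


def protein_length(orf_nt: int) -> int:
    """Amino acids encoded by an ORF whose length counts the stop codon."""
    if orf_nt % 3 != 0 or orf_nt < 6:
        raise ValueError("ORF length must be a multiple of 3 covering start and stop codons")
    return orf_nt // 3 - 1
```

For example, in `CCATGAAATTTTAGGG` the ORF runs from position 2 to 14 (ATG AAA TTT TAG), giving a 3-residue peptide.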
Heat Recovery at Army Materiel Command (AMC) Facilities
1988-06-01
industrial complexes and somewhat smaller commercial/HVAC systems, a portion of this waste heat can be recovered, improving energy efficiency. Heat...devices are used in sequence. Other shell-and-tube applications include heat transfer from process liquids, condensates, and cooling water. Two...pipe consists of a sealed element involving an annular capillary wick contained inside the full length of the tube, with an appropriate entrained
USDA-ARS's Scientific Manuscript database
Polygalacturonase-inhibiting proteins (PGIPs) are leucine-rich repeat (LRR) proteins involved in plant defense. Sugar beet (Beta vulgaris L.) PGIP genes, BvPGIP1, BvPGIP2 and BvPGIP3, were isolated from two breeding lines, F1016 and F1010. Full-length cDNA sequences of the three BvPGIP genes encod...
Liu, Tong; Pan, Luqing; Cai, Yuefeng; Miao, Jingjing
2015-01-25
HSP70 and HSP90 are the most important heat shock proteins (HSPs); they play key roles in the cell as molecular chaperones and may be involved in metabolic detoxification. In the present study, we obtained full-length cDNAs of the HSP70 and HSP90 genes from the clam Ruditapes philippinarum and studied the transcriptional responses of the two genes to benzo(a)pyrene (BaP) exposure. The full-length RpHSP70 cDNA was 2336 bp, containing a 51-bp 5' untranslated region (UTR), a 335-bp 3' UTR and a 1950-bp open reading frame (ORF) encoding 650 amino acid residues. The full-length RpHSP90 cDNA was 2839 bp, containing a 107-bp 5' UTR, a 554-bp 3' UTR and a 2178-bp ORF encoding 726 amino acid residues. The deduced amino acid sequences of RpHSP70 and RpHSP90 shared the highest identity with those of Paphia undulata, and the phylogenetic trees showed that the evolution of RpHSP70 and RpHSP90 was largely consistent with the evolution of the species. RpHSP70 and RpHSP90 mRNA was detected in all tested tissues of adult clams (digestive gland, gill, adductor muscle and mantle), with the highest expression level in the digestive gland. Quantitative real-time RT-PCR analysis revealed that the mRNA expression levels of RpHSP70, RpHSP90 and other xenobiotic metabolizing enzymes (XMEs) (AhR, DD, GST, GPx) in the digestive gland of R. philippinarum were induced by BaP, and the absolute expression levels of these genes showed time- and dose-dependent responses. The results suggest that RpHSP70 and RpHSP90 are involved in the metabolic detoxification of BaP in R. philippinarum. Copyright © 2014 Elsevier B.V. All rights reserved.
Sequential addition of short DNA oligos in DNA-polymerase-based synthesis reactions
Gardner, Shea N; Mariella, Jr., Raymond P; Christian, Allen T; Young, Jennifer A; Clague, David S
2013-06-25
A method is provided for preselecting a multiplicity of DNA sequence segments that will make up the DNA molecule of user-defined sequence, separating the DNA sequence segments temporally, and combining the multiplicity of DNA sequence segments with at least one polymerase enzyme, whereby the DNA sequence segments join to produce the DNA molecule of user-defined sequence. Sequence segments may be of length n, where n is an odd integer. In one embodiment, the length of the desired hybridizing overlap is specified by the user, and the sequences and the protocol for combining them are guided by computational (bioinformatics) predictions. In another embodiment, sequence segments from multiple reading frames are combined to span the same region of a sequence, so that multiple desired hybridizations may occur with different overlap lengths.
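The claim leaves open exactly how segments and overlaps are chosen. A minimal sketch, assuming a single reading frame, a fixed odd segment length n and a fixed user-specified overlap, tiles a target sequence into consecutive segments and checks that their overlaps reconstruct it. The function names are hypothetical, and the sketch models neither the temporal separation of segments nor the polymerase steps.

```python
def tile(seq: str, n: int, overlap: int) -> list[str]:
    """Cut seq into segments of length n (the last may be shorter) sharing `overlap` nt."""
    if n % 2 == 0:
        raise ValueError("this sketch expects an odd segment length n")
    if not 0 < overlap < n:
        raise ValueError("overlap must be between 1 and n - 1")
    step = n - overlap
    segments = []
    start = 0
    while True:
        segments.append(seq[start:start + n])
        if start + n >= len(seq):
            break  # this segment already reaches the end of seq
        start += step
    return segments


def reassemble(segments: list[str], overlap: int) -> str:
    """Join segments by their shared overlaps, checking that each overlap really matches."""
    out = segments[0]
    for seg in segments[1:]:
        if out[-overlap:] != seg[:overlap]:
            raise ValueError("adjacent segments do not share the expected overlap")
        out += seg[overlap:]
    return out
```

For instance, tiling `ACGTACGTTGCA` with n = 5 and a 2-nt overlap yields `ACGTA`, `TACGT`, `GTTGC` and `GCA`, which `reassemble` joins back into the original sequence.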
NASA Astrophysics Data System (ADS)
Zhao, Chunling; Ju, Jiyu
2015-06-01
The full-length cDNA of a protease gene from the marine annelid Arenicola cristata was amplified by the rapid amplification of cDNA ends technique and sequenced. The cDNA was 936 bp long and included an open reading frame encoding a polypeptide of 270 amino acid residues. The deduced amino acid sequence consisted of pro- and mature sequences. The protease belonged to the serine protease family because it contained the highly conserved sequence GDSGGP. This protease was novel, as it showed low amino acid sequence similarity (< 40%) to other serine proteases. The gene encoding the active form of A. cristata serine protease was cloned and expressed in E. coli. Purified recombinant protease in a supernatant dissolved an artificial fibrin plate containing plasminogen-rich fibrin, whereas plasminogen-free fibrin showed no clear zone of hydrolysis. This result suggests that the recombinant protease has indirect fibrinolytic activity and is probably a plasminogen activator. A rat model of venous thrombosis was established to demonstrate that the recombinant protease could also hydrolyze blood clots in vivo. Therefore, this recombinant protease may be useful as a thrombolytic agent for thrombosis treatment. To our knowledge, this study is the first to report a fibrinolytic serine protease gene in A. cristata.
Caught in the act: the lifetime of synaptic intermediates during the search for homology on DNA
Mani, Adam; Braslavsky, Ido; Arbel-Goren, Rinat; Stavans, Joel
2010-01-01
Homologous recombination plays pivotal roles in DNA repair and in the generation of genetic diversity. To locate homologous target sequences at which strand exchange can occur within the timescale that a cell's biology demands, a single-stranded DNA-recombinase complex must search among a large number of sequences on a genome by forming synapses with chromosomal segments of DNA. A key element in the search is the time it takes for the two DNA sequences to be compared, i.e. the synapse lifetime. Here, we visualize for the first time individual fluorescently tagged synapses formed by RecA, a prokaryotic recombinase, and measure their lifetime as a function of synapse length and of sequence differences between the participating DNAs. Surprisingly, lifetimes can be ∼10 s long when the DNAs are fully heterologous, and much longer for partial homology, consistent with ensemble FRET measurements. Synapse lifetime increases rapidly as the length of a region of full homology at either the 3′ or the 5′ end of the invading single-stranded DNA increases above 30 bases. A few mismatches can dramatically reduce the lifetime of synapses formed with nearly homologous DNAs. These results suggest the need for facilitated homology search mechanisms to locate homology successfully within the timescales observed in vivo. PMID:20044347
NASA Technical Reports Server (NTRS)
Chang, D.; Haynes, J. I. 2nd; Brady, J. N.; Consigli, R. A.; Spooner, B. S. (Principal Investigator)
1992-01-01
A nuclear localization signal (NLS) has been identified in the N-terminal (Ala1-Pro-Lys-Arg-Lys-Ser-Gly-Val-Ser-Lys-Cys11) amino acid sequence of the polyomavirus major capsid protein VP1. The importance of this sequence for nuclear transport of VP1 was demonstrated by a genetic "subtractive" study using the constructs pSG5VP1 (full-length VP1) and pSG5 delta 5'VP1 (truncated VP1 lacking amino acids Ala1-Cys11). These constructs were used to transfect COS-7 cells, and the expression and intracellular localization of the VP1 protein were visualized by indirect immunofluorescence. These studies revealed that full-length VP1 was expressed and localized in the nucleus, whereas the truncated VP1 protein remained in the cytoplasm and was not transported to the nucleus. These findings were substantiated by an "additive" approach using FITC-labeled conjugates of synthetic peptides homologous to the VP1 NLS, cross-linked to bovine serum albumin or immunoglobulin G. Both conjugates localized in the nucleus after microinjection into the cytoplasm of 3T6 cells. The importance of individual amino acids in the basic sequence (Lys3-Arg-Lys5) of the NLS was also investigated by synthesizing three additional peptides in which lysine-3, arginine-4, or lysine-5 was individually substituted with threonine. Lysine-3 was found to be crucial for nuclear transport, since substituting it with threonine prevented nuclear localization of the microinjected FITC-labeled conjugate.
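The three threonine-substitution peptides follow mechanically from the 11-residue NLS listed above. The minimal sketch below derives them. The helper `substitute` is a hypothetical name, and the sketch covers only the peptide sequences, not the conjugation chemistry.

```python
# N-terminal NLS of polyomavirus VP1 as given in the abstract (Ala1 ... Cys11).
NLS = "APKRKSGVSKC"

def substitute(peptide: str, position: int, new_residue: str) -> str:
    """Return peptide with the residue at 1-based position replaced by new_residue."""
    if not 1 <= position <= len(peptide):
        raise ValueError("position out of range")
    return peptide[:position - 1] + new_residue + peptide[position:]

# The three single-threonine variants described: K3T, R4T and K5T.
variants = {f"{NLS[p - 1]}{p}T": substitute(NLS, p, "T") for p in (3, 4, 5)}
for name, seq in variants.items():
    print(name, seq)
```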
Li, Xiaoyu; Ma, Junguo; Lei, Wenlong; Li, Jie; Zhang, Yaning; Li, Yuanlong
2013-08-01
Cytochrome P450 (CYP) enzymes, especially CYP 3A, are responsible for metabolizing various endogenous and exogenous compounds in animals. In the present study, a full-length CYP 3A137 cDNA from silver carp was cloned and sequenced, and a phylogenetic tree of CYP 3A was constructed. Additionally, the acute toxicity of the ionic liquid 1-octyl-3-methylimidazolium bromide ([C8mim]Br) to silver carp was determined, as were the transcription and microsomal enzyme activity of CYP 3A137 in silver carp liver after exposure to rifampicin or [C8mim]Br. The results show that the full-length CYP 3A137 cDNA is 1810 base pairs (bp) long and contains an open reading frame of 1539 bp encoding a protein of 513 amino acids. Sequence analysis reveals that CYP 3A137 is highly conserved in fish. Moreover, quantitative real-time polymerase chain reaction shows that CYP 3A137 is constitutively expressed in all silver carp tissues examined, with expression levels in the order liver > intestine > kidney > spleen > brain > heart > muscle. Finally, the acute toxicity tests indicate that both rifampicin and [C8mim]Br significantly up-regulate CYP 3A137 expression at the mRNA level and increase CYP 3A137 enzyme activity in fish liver, suggesting that CYP 3A137 may be involved in the metabolism of [C8mim]Br in silver carp. Copyright © 2013 Elsevier Ltd. All rights reserved.
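The tissue ranking above comes from qPCR data. The abstract does not say how relative expression was quantified. One widely used option is the 2^-ΔΔCt method, which assumes near-100% amplification efficiency for the target and reference genes. The minimal sketch below uses hypothetical Ct values (not data from this study) and a hypothetical calibrator tissue.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    """Fold change of a target gene versus a calibrator sample by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)

# Hypothetical (target Ct, reference-gene Ct) per tissue; muscle is the calibrator.
ct = {"liver": (20.0, 15.0), "intestine": (22.0, 15.0), "muscle": (28.0, 15.0)}
cal = ct["muscle"]
fold = {tissue: relative_expression(*values, *cal) for tissue, values in ct.items()}
ranking = sorted(fold, key=fold.get, reverse=True)
print(ranking)  # tissues ordered from highest to lowest relative expression
```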
Huang, Shengbing; Song, Wei; Lin, Qishui
2005-08-01
A membrane-bound protein was purified from rat liver mitochondria. Digestion with V8 protease yielded two peptides containing an identical 14-amino-acid sequence. Using a DNA sequence derived from this 14-residue peptide as a gene-specific primer, the 5'- and 3'-terminal cDNA sequences of the corresponding gene were obtained by the RACE technique. The full-length cDNA, encoding a protein of 616 amino acids that includes the above peptide sequence, was thus cloned. The full-length cDNA was highly homologous to that of human ETF-QO, indicating that it may be the cDNA of rat ETF-QO. ETF-QO is an iron-sulfur protein located in the mitochondrial inner membrane that contains two kinds of redox center: FAD and a [4Fe-4S] center. Comparing the sequence deduced from the cDNA of the 616-amino-acid protein with that of the mature protein from rat liver mitochondria showed that the N-terminal 32 amino acid residues were absent from the mature protein, indicating that the cDNA was that of ETF-QOp. When the cDNA was expressed in Saccharomyces cerevisiae from inducible vectors, the protein product was enriched in the mitochondrial fraction and exhibited the electron transfer activity (NBT reductase activity) of ETF-QO. These results demonstrated that the 32-amino-acid peptide was a mitochondrial targeting peptide, and that both FAD and the iron-sulfur cluster were properly inserted into the expressed ETF-QO. ETF-QO was highly expressed in rat heart, liver and kidney. A GFP-ETF-QO fusion protein co-localized with mitochondria in COS-7 cells.
van der Walt, Elizna M; Smuts, Izelle; Taylor, Robert W; Elson, Joanna L; Turnbull, Douglass M; Louw, Roan; van der Westhuizen, Francois H
2012-06-01
Mitochondrial disease can be attributed to both mitochondrial and nuclear gene mutations. It has a heterogeneous clinical and biochemical profile, which is compounded by the diversity of the genetic background. Disease-based epidemiological information has expanded significantly in recent decades, but little is known about the aetiology of these disorders in African patients. The aim of this study was to investigate mitochondrial DNA variation and pathogenic mutations in the muscle of diagnosed paediatric patients from South Africa. A cohort of 71 South African paediatric patients was included, and a high-throughput nucleotide sequencing approach was used to sequence full-length muscle mtDNA. The average coverage of the mtDNA genome was 81±26 per position. After haplogroups were assigned, it was determined that although the nature of non-haplogroup-defining variants was similar in patients with African and non-African haplogroups, the number of substitutions was significantly higher in African patients. We describe previously reported disease-associated variants and novel variants in this cohort. We observed a general lack of commonly reported syndrome-associated mutations, which supports clinical observations and confirms general observations in African patients when single-mutation screening strategies based on (predominantly non-African) mtDNA disease information are used. We conclude that this first extensive report on muscle mtDNA sequences in African paediatric patients highlights the need for a full-length mtDNA sequencing strategy, which applies to all populations in which specific mutations are not present. This, together with evaluation of nuclear DNA gene mutations and their pathogenicity, will be required to better unravel the aetiology of these disorders in African patients.
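A coverage figure such as 81±26 per position summarizes read depth along the mtDNA. The abstract does not define the ± term. Assuming it is the mean and sample standard deviation of per-position depth, it could be computed as shown below. The depth values are hypothetical.

```python
import statistics

def coverage_summary(depths: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of per-position read depth."""
    return statistics.mean(depths), statistics.stdev(depths)

# Hypothetical per-position depths over a tiny stretch of mtDNA.
mean, sd = coverage_summary([60, 80, 100, 84])
print(f"{mean:.0f}±{sd:.0f} per position")
```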
Vaughan, Sue; Wickstead, Bill; Gull, Keith; Addinall, Stephen G
2004-01-01
The FtsZ protein is a polymer-forming GTPase which drives bacterial cell division and is structurally and functionally related to eukaryotic tubulins. We have searched for FtsZ-related sequences in all freely accessible databases, then used strict criteria based on the tertiary structure of FtsZ and its well-characterized in vitro and in vivo properties to determine which sequences represent genuine homologues of FtsZ. We have identified 225 full-length FtsZ homologues, which we have used to document, phylum by phylum, the primary sequence characteristics of FtsZ homologues from the Bacteria, Archaea, and Eukaryota. We provide evidence for at least five independent ftsZ gene-duplication events in the bacterial kingdom and suggest the existence of three ancestral euryarchaeal FtsZ paralogues. In addition, we identify "FtsZ-like" sequences from Bacteria and Archaea that, while showing significant sequence similarity to FtsZs, are unlikely to bind and hydrolyze GTP.
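A crude first-pass screen, much weaker than the structural and functional criteria used above, is to check candidate sequences for the tubulin/FtsZ GTP-binding signature, commonly written G-G-G-T-G-[S/T]-G. The minimal sketch below does this. The regex encoding and the example sequences are assumptions for illustration. A missing motif is only a hint, not evidence that a protein cannot bind GTP.

```python
import re

# Tubulin/FtsZ GTP-binding signature, encoded here as G-G-G-T-G-[ST]-G.
TUBULIN_SIGNATURE = re.compile(r"GGGTG[ST]G")

def has_gtp_signature(protein: str) -> bool:
    """True if the one-letter protein sequence contains the signature motif."""
    return TUBULIN_SIGNATURE.search(protein.upper()) is not None

for seq in ("MAEIGGGTGTGAAPV", "MAEIGGGAGTGAAPV"):  # hypothetical fragments
    print(seq, has_gtp_signature(seq))
```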
Open-pNovo: De Novo Peptide Sequencing with Thousands of Protein Modifications.
Yang, Hao; Chi, Hao; Zhou, Wen-Jing; Zeng, Wen-Feng; He, Kun; Liu, Chao; Sun, Rui-Xiang; He, Si-Min
2017-02-03
De novo peptide sequencing has improved remarkably, but sequencing full-length peptides with unexpected modifications remains challenging. Here we present an open de novo sequencing tool, Open-pNovo, for de novo sequencing of peptides with arbitrary types of modifications. Although the search space increases by ∼300 times, Open-pNovo is comparable in speed to, or up to ∼10 times faster than, three other proposed algorithms. Furthermore, considering top-1 candidates on three MS/MS data sets, Open-pNovo can recall over 90% of the results obtained by any one traditional algorithm and report 5-87% more peptides, including 14-250% more modified peptides. On a high-quality simulated data set, ∼85% of peptides with arbitrary modifications can be recalled by Open-pNovo, whereas the other algorithms recall hardly any. In summary, Open-pNovo is an excellent tool for open de novo sequencing and has great potential for discovering unexpected modifications in real biological applications.
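The recall and "more peptides" figures above are ratios between two tools' result sets. The minimal sketch below assumes results are compared as sets of peptide strings. A real comparison would normally pair identifications spectrum by spectrum. The peptide sets and helper names are hypothetical.

```python
def recall(found: set[str], reference: set[str]) -> float:
    """Fraction of the reference tool's identifications also reported by the new tool."""
    return len(found & reference) / len(reference) if reference else 0.0

def percent_more(new_count: int, old_count: int) -> float:
    """Extra identifications reported by the new tool, as a percentage of the old count."""
    return 100.0 * (new_count - old_count) / old_count

# Hypothetical top-1 identifications from a traditional tool and a new tool.
reference = {"PEPTIDEK", "ACDEFGHIK", "LMNPQR", "STVWYK", "QQHHR"}
found = {"PEPTIDEK", "ACDEFGHIK", "LMNPQR", "STVWYK", "GGSAMK", "VVLLR", "MMKR"}
print(f"recall={recall(found, reference):.0%}, more={percent_more(len(found), len(reference)):.0f}%")
```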
Novel antigenic shift in HA sequences of H1N1 viruses detected by big data analysis.
Zhang, Ruiying; Xu, Chongfeng; Duan, Ziyuan
2017-07-01
The influenza virus H1N1 has been prevalent worldwide for nearly a century. Many studies of its evolutionary history, substitution rate and antigenicity-associated sites have been based on small datasets. To obtain a complete view, we analysed 3171 full-length HA sequences from human H1N1 viruses sampled from 1918 to 2016, and found that a new clade has formed from sequences isolated in Iran. Based on genetic distance calculations, we revealed an uneven evolutionary rate among sequences isolated in different years. We also found that the HA1 fragment of the new clade resembles that of viruses that existed in the 1930s, whereas the HA2 fragment is closely related to strains isolated after the 2009 pandemic. This new, "mixed" HA sequence indicates that a cryptic antigenic shift event occurred, and the new clade identified from sequences from Iran deserves more attention. Copyright © 2017. Published by Elsevier B.V.
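One simple way to look for an uneven evolutionary rate is to plot each sequence's genetic distance from an early reference against its sampling year and fit a straight line. The slope approximates substitutions per site per year, and sequences far off the line are evolving faster or slower than the trend. This is a crude proxy for root-to-tip regression and not necessarily the calculation the authors used. The sequences and years below are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gapped columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical aligned HA fragments with sampling years; the earliest acts as reference.
samples = [(1918, "ACGTACGTAC"), (1950, "ACGTACGTTC"), (1980, "ACGAACGTTC"), (2010, "TCGAACGTTC")]
ref = samples[0][1]
years = [year for year, _ in samples]
dists = [p_distance(ref, seq) for _, seq in samples]
print(dists, slope(years, dists))
```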
Saeng-Chuto, Kepalee; Stott, Christopher James; Wegner, Matthew; Kaewprommal, Pavita; Piriyapongsa, Jittima; Nilubol, Dachrit
2018-06-08
Senecavirus A (SVA) is a novel picornavirus that causes porcine idiopathic vesicular disease, characterized by lameness, coronary band hyperemia, and vesicles on the snout and coronary bands. An increase in the detection rate of SVA in several countries suggests that the disease has become a widespread problem. Herein, we report the detection of SVA in Thailand and the characterization of the full-length genomic sequences of six Thai SVA isolates. Phylogenetic, genetic, recombination, and evolutionary analyses were performed. Excluding the poly(A) tail, the full-length genome of the Thai SVA isolates was 7282 nucleotides long, with a genomic organization resembling that of previously reported SVA isolates. Phylogenetic and genetic analyses based on the full-length genome showed that the Thai SVA isolates grouped in a novel cluster, separate from SVA isolates from other countries. Although the Thai SVA isolates were closely related (97.9-98.2%) to 11-55910-3, the first SVA isolate from Canada, they were distinct from it. Evolutionary and recombination analyses suggested that the Thai SVA isolates shared a common ancestor with the 11-55910-3 isolate. Positive selection in the VP4 and 3D genes suggests that the virus was not externally introduced, but rather evolved continuously in the population prior to its first detection. In addition, the presence of SVA could have been overlooked because other pathogens cause similar clinical disease. This study warrants further investigation into the molecular epidemiology and genetic evolution of SVA in Thailand. Copyright © 2017. Published by Elsevier B.V.
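Pairwise figures such as the 97.9-98.2% above are typically percent identities computed over an alignment of the genomes, although the abstract does not state the exact measure. Gap-handling conventions differ between tools. The minimal sketch below assumes two already aligned sequences and simply skips columns where either sequence has a gap.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in cols) / len(cols)

print(percent_identity("ACGTTGCA-A", "ACGTAGCATA"))
```

For this toy pair, 8 of the 9 ungapped columns match, so the result is about 88.9%.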
First report of an HIV-1 triple recombinant of subtypes B, C and F in Buenos Aires, Argentina.
Pando, María A; Eyzaguirre, Lindsay M; Segura, Marcela; Bautista, Christian T; Marone, Rubén; Ceballos, Ana; Montano, Silvia M; Sánchez, José L; Weissenbacher, Mercedes; Avila, María M; Carr, Jean K
2006-09-07
We describe the genetic diversity of currently transmitted strains of HIV-1 among men who have sex with men (MSM) in Buenos Aires, Argentina, between 2000 and 2004. Nearly full-length sequence analysis of 10 samples showed that 6 were subtype B, 3 were BF recombinants and 1 was a triple recombinant of subtypes B, C and F. The 3 BF recombinants represented 3 different unique recombinant forms. Full-genome analysis showed that one strain classified as subtype F on the basis of its pol sequence was in fact a triple recombinant: gag and pol were predominantly subtype F, gp120 was subtype B, and regions of subtype C were interspersed throughout. The young man infected with this strain reported multiple sexual partners and seroconverted between May and November 2004. This study reports, for the first time, the full-genome analysis of a triple recombinant of subtypes B, C and F, which combines in one virus the three most common subtypes in South America.
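Detecting that a strain classified as subtype F in pol is a B/C/F mosaic over the full genome amounts to asking, window by window, which subtype reference the query most resembles. The sketch below is a much simplified, hypothetical stand-in for dedicated recombination tools such as bootscanning. It assumes the query and references are already aligned, assigns each window by plain identity, and uses toy sequences.

```python
def identity(a: str, b: str) -> float:
    """Fraction of ungapped aligned columns at which a and b agree."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0

def window_subtypes(query: str, refs: dict[str, str], window: int, step: int) -> list[tuple[int, str]]:
    """Assign each window of an aligned query to its most similar reference subtype."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = max(refs, key=lambda name: identity(segment, refs[name][start:start + window]))
        calls.append((start, best))
    return calls

# Toy aligned references and a query whose two halves resemble different subtypes.
refs = {"B": "AAAAAAAACCCCCCCC", "C": "GGGGGGGGGGGGGGGG", "F": "TTTTTTTTTTTTTTTT"}
query = "TTTTTTTTCCCCCCCC"
print(window_subtypes(query, refs, window=8, step=8))
```

Adjacent windows with different best matches suggest a possible breakpoint. Real analyses add phylogenetic support values and statistical tests before calling a strain recombinant.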
Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo
2003-01-01
To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones from 26 standard cDNA libraries generated from different sugarcane tissues were partially sequenced. After sequence processing, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% had no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putatively identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed 22% redundancy in the first assembly. This suggested that possibly 33,620 unique genes had been identified and that >90% of the sugarcane expressed genes had been tagged. PMID:14613979
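The percentages in this abstract can be cross-checked against the reported counts. The minimal sketch below uses only the numbers stated above.

```python
assembled = 43_141         # putative transcripts from the first assembly
with_full_length = 14_409  # assemblies containing at least one full-length clone
unique_genes = 33_620      # estimated unique genes after reassembly

full_length_fraction = with_full_length / assembled
implied_redundancy = 1 - unique_genes / assembled
print(f"full-length fraction: {full_length_fraction:.1%}")
print(f"implied redundancy: {implied_redundancy:.1%}")
```

With these counts the script gives roughly 33.4% and 22.1%, in line with the rounded 33% and 22% reported.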
Dang, Que; Hu, Wei-Shau
2001-01-01
Homology between the two repeat (R) regions in the retroviral genome mediates minus-strand DNA transfer during reverse transcription. We sought to define the effects of R homology length on minus-strand DNA transfer. We generated five murine leukemia virus (MLV)-based vectors that contained identical sequences but different lengths of the 3′ R (3, 6, 12, 24 and 69 nucleotides [nt]); 69 nt is the full-length MLV R. After one round of replication, viral titers from the vector with a full-length downstream R were compared with those from the other four vectors with reduced R lengths. Titers from vectors with R lengths reduced to about one-third (24 nt) or one-sixth (12 nt) of the wild type were not significantly affected; however, titers from vectors with only 3- or 6-nt homology in the R region were significantly lower. Because expression and packaging of the RNA were similar among all the vectors, the differences in viral titers most likely reflected the impact of homology length on the efficiency of minus-strand DNA transfer. The molecular nature of minus-strand DNA transfer was characterized in 63 proviruses. Precise R-to-R transfer was observed in most proviruses generated from vectors with 12-, 24-, or 69-nt homology in R, whereas most proviruses from vectors with 3- or 6-nt homology arose from aberrant transfers. Most of the aberrant transfers resulted from reverse transcription of RNA transcribed from an upstream promoter (termed read-in RNA transcripts). These data demonstrate that minus-strand DNA transfer is homology driven and that a minimum homology length is required for accurate and efficient minus-strand DNA transfer. PMID:11134294
Mapping the Space of Genomic Signatures
Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.
2015-01-01
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and shows that the sequence most different from it in this dataset belongs to a cucumber. PMID:26000734
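The Chaos Game Representation and its k-mer frequency matrix can be sketched in a few lines. This is illustrative only. It builds a frequency CGR (FCGR) grid of k-mer counts using the corner assignment A=(0,0), C=(0,1), G=(1,1), T=(1,0), which is one common convention and is assumed here. It then compares two matrices with a plain Euclidean distance as a stand-in. It does not implement DSSIM or the multi-dimensional scaling step. The names `fcgr` and `euclidean` are hypothetical.

```python
import math

# Corner coordinates (x, y) assumed for each base; conventions vary between papers.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq: str, k: int) -> list[list[float]]:
    """2^k x 2^k grid of normalized k-mer counts (k-mers with non-ACGT bases are skipped)."""
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    seq = seq.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        # In the chaos game the last base sets the coarsest quadrant, so it gets the highest bit.
        for i, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x |= cx << i
            y |= cy << i
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid

def euclidean(a: list[list[float]], b: list[list[float]]) -> float:
    """Euclidean distance between two FCGR matrices (a simple stand-in, not DSSIM)."""
    return math.sqrt(sum((p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)))

print(euclidean(fcgr("ACGTACGTTTGA", 2), fcgr("ACGTACGTTTGC", 2)))
```

With the paper's setting (k = 9), each matrix would be 512 x 512.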
Large-Scale Concatenation cDNA Sequencing
Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.
1997-01-01
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb was sequenced by concatenated cDNA sequencing (CCS), in which multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences, with an efficiency similar to that of shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to known human genes or genes of other species. Of these 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences matched known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
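The match categories above are thresholds on percent identity. The small sketch below bins hits accordingly. It assumes that values between 97% and 98%, which the abstract's ranges do not cover, count as nonexact. The helper name and example identities are hypothetical.

```python
from collections import Counter
from typing import Optional

def classify_match(identity_pct: Optional[float]) -> str:
    """Bin a database hit by percent identity, following the abstract's thresholds."""
    if identity_pct is None:
        return "no match"
    if identity_pct >= 98:
        return "exact"
    if identity_pct >= 57:  # values in the 97-98% gap are treated as nonexact here
        return "nonexact"
    return "below reported range"

# Hypothetical best-hit identities for a handful of clones (None = no database hit).
hits = [99.5, None, 82.0, 98.0, None, 61.3]
print(Counter(classify_match(h) for h in hits))
```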
Goyal, K; Browne, J A; Burnell, A M; Tunnacliffe, A
2005-06-01
Accumulation of the non-reducing disaccharide trehalose is associated with desiccation tolerance during anhydrobiosis in a number of invertebrates, but there is little information on trehalose biosynthetic genes in these organisms. We have identified two trehalose-6-phosphate synthase (tps) genes in the anhydrobiotic nematode Aphelenchus avenae and determined full-length cDNA sequences for both; for comparison, full-length tps cDNAs from the model nematode, Caenorhabditis elegans, have also been obtained. The A. avenae genes encode very similar proteins containing the catalytic domain characteristic of the GT-20 family of glycosyltransferases and are most similar to tps-2 of C. elegans; no evidence was found for a gene in A. avenae corresponding to Ce-tps-1. Analysis of A. avenae tps cDNAs revealed several features of interest, including alternative trans-splicing of spliced leader sequences in Aav-tps-1, and four different, novel SL1-related trans-spliced leaders, which differ from the canonical SL1 sequence found in all other nematodes studied. The latter observation suggests that A. avenae does not comply with the strict evolutionary conservation of SL1 sequences observed in other species. Unusual features were also noted in predicted nematode TPS proteins, which distinguish them from homologues in other higher eukaryotes (plants and insects) and in micro-organisms. Phylogenetic analysis confirmed their membership of the GT-20 glycosyltransferase family, but indicated an accelerated rate of molecular evolution. Furthermore, nematode TPS proteins possess N- and C-terminal domains, which are unrelated to those of other eukaryotes: nematode C-terminal domains, for example, do not contain trehalose-6-phosphate phosphatase-like sequences, as seen in plant and insect homologues. During the onset of anhydrobiosis, both tps genes in A. avenae are upregulated, but exposure to cold or increased osmolarity also results in gene induction, although to a lesser extent.
Trehalose therefore seems likely to play a role in a number of stress responses in nematodes.
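Spliced leaders like those above are recognized as extra sequence at the extreme 5' end of cDNAs. The minimal sketch below assumes the commonly cited 22-nt C. elegans SL1 sequence and a cDNA whose 5' end starts exactly at the leader. Real analyses align partial leaders and allow indels. `classify_leader` and its mismatch cut-off are hypothetical.

```python
# Canonical SL1 spliced leader as commonly reported for C. elegans (22 nt, DNA alphabet).
SL1 = "GGTTTAATTACCCAAGTTTGAG"

def leader_mismatches(cdna: str, leader: str = SL1) -> int:
    """Hamming distance between the first len(leader) bases of a cDNA and the leader."""
    if len(cdna) < len(leader):
        raise ValueError("cDNA is shorter than the leader")
    return sum(x != y for x, y in zip(cdna[:len(leader)].upper(), leader))

def classify_leader(cdna: str, max_mismatches: int = 3) -> str:
    """Label a cDNA 5' end as canonical SL1, SL1-related, or neither (cut-off is hypothetical)."""
    n = leader_mismatches(cdna)
    if n == 0:
        return "canonical SL1"
    if n <= max_mismatches:
        return f"SL1-related ({n} mismatches)"
    return "no SL1-like leader"

print(classify_leader(SL1 + "ATGGCC"))
```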
Tulman, E. R.; Delhon, G.; Afonso, C. L.; Lu, Z.; Zsak, L.; Sandybaev, N. T.; Kerembekova, U. Z.; Zaitsev, V. L.; Kutish, G. F.; Rock, D. L.
2006-01-01
Here we present the genomic sequence of horsepox virus (HSPV) isolate MNR-76, an orthopoxvirus (OPV) isolated in 1976 from diseased Mongolian horses. The 212-kbp genome contained 7.5-kbp inverted terminal repeats and lacked extensive terminal tandem repetition. HSPV contained 236 open reading frames (ORFs) with similarity to those in other OPVs, with those in the central 100-kbp region most conserved relative to other OPVs. Phylogenetic analysis of the conserved region indicated that HSPV is closely related to sequenced isolates of vaccinia virus (VACV) and rabbitpox virus, clearly grouping together these VACV-like viruses. Fifty-four HSPV ORFs likely represented fragments of 25 orthologous OPV genes, including in the central region the only known fragmented form of an OPV ribonucleotide reductase large subunit gene. In terminal genomic regions, HSPV lacked full-length homologues of genes variably fragmented in other VACV-like viruses, but uniquely carried a fragmented homologue of VACV strain Copenhagen B6R, a gene intact in other known VACV-like viruses. Notably, HSPV contained in terminal genomic regions 17 kbp of OPV-like sequence absent in known VACV-like viruses, including fragments of genes intact in other OPVs and approximately 1.4 kb of sequence present only in cowpox virus (CPXV). HSPV also contained seven full-length genes fragmented or missing in other VACV-like viruses, including intact homologues of the CPXV strain GRI-90 D2L/I4R CrmB and D13L CD30-like tumor necrosis factor receptors, D3L/I3R and C1L ankyrin repeat proteins, B19R kelch-like protein, D7L BTB/POZ domain protein, and B22R variola virus B22R-like protein. These results indicated that HSPV contains unique genomic features likely contributing to a unique virulence/host range phenotype. They also indicated that while closely related to known VACV-like viruses, HSPV contains additional, potentially ancestral sequences absent in other VACV-like viruses. PMID:16940536
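Counting ORFs in a genome, such as the 236 reported here, starts with a scan for ATG-to-stop reading frames on both strands. The minimal sketch below assumes the standard genetic code and a hypothetical minimum length. Annotation pipelines use their own cut-offs and also handle overlapping and nested ORFs, which this sketch does not.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 60) -> list[tuple[str, int, int]]:
    """ATG-to-stop ORFs in all six frames.

    Returns (strand, start, end): 0-based, end-exclusive coordinates on that strand's
    own 5'->3' sequence, stop codon included. Only the first ATG after the previous stop
    is used, so each ORF is the longest one ending at its stop. min_codons counts the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs

print(find_orfs("CCATGAAATAGGG", min_codons=3))
```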
Roy Choudhury, Swarup; Roy, Sujit; Nag, Anish; Singh, Sanjay Kumar; Sengupta, Dibyendu N.
2012-01-01
The MADS-box family of genes has been shown to play a significant role in the development of reproductive organs, including dry and fleshy fruits. In this study, the molecular properties of an AGAMOUS-like MADS-box transcription factor in the banana cultivar Giant governor (Musa sp, AAA group, subgroup Cavendish) have been elucidated. We detected a CArG-box sequence-binding AGAMOUS MADS-box protein in banana flower and fruit nuclear extracts in DNA-protein interaction assays. The protein fraction in the DNA-protein complex was analyzed by mass spectrometry, and using this information we obtained the full-length cDNA of the corresponding protein. The deduced protein sequence showed ∼95% amino acid sequence homology with MA-MADS5, a MADS-box protein described previously from banana. We characterized the domains of the identified AGAMOUS MADS-box protein involved in DNA binding and homodimer formation in vitro using full-length and truncated versions of affinity-purified recombinant proteins. Furthermore, to gain insight into how DNA bending is achieved by this MADS-box factor, we performed circular permutation and phasing analysis using the wild-type recombinant protein. The AGAMOUS MADS-box protein identified in this study was found to accumulate predominantly in the climacteric fruit pulp and also in the female flower ovary. In vivo and in vitro assays revealed specific binding of the identified AGAMOUS MADS-box protein to the CArG-box sequence in the promoters of major ripening genes in banana fruit. Overall, the expression pattern of this MADS-box protein in the banana female flower ovary and during various phases of fruit ripening, together with its binding to the CArG-box sequence in the promoters of major ripening genes, suggests that this AGAMOUS MADS-box factor may be involved in banana fruit ripening and floral reproductive organ development. PMID:22984496
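Candidate CArG boxes in ripening-gene promoters can be located with a simple pattern scan. The abstract does not give the exact sites bound. The minimal sketch below assumes the widely cited CC(A/T)6GG consensus. That pattern is its own reverse complement, so scanning one strand covers both. Plant MADS proteins also bind variant CArG forms that a strict consensus scan will miss. The function name and example sequence are hypothetical.

```python
import re

# CArG-box consensus CC(A/T)6GG; the lookahead lets overlapping sites be reported.
CARG = re.compile(r"(?=(CC[AT]{6}GG))")

def find_carg_boxes(promoter: str) -> list[tuple[int, str]]:
    """Return (0-based position, site) for each CArG-box match in a promoter sequence."""
    return [(m.start(), m.group(1)) for m in CARG.finditer(promoter.upper())]

print(find_carg_boxes("ttCCATATATGGaa"))  # hypothetical promoter fragment
```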
NASA Technical Reports Server (NTRS)
Hwang, I.; Harper, J. F.; Liang, F.; Sze, H.
2000-01-01
To investigate how calmodulin regulates a unique subfamily of Ca(2+) pumps found in plants, we examined the kinetic properties of isoform ACA2 identified in Arabidopsis. A recombinant ACA2 was expressed in a yeast K616 mutant deficient in two endogenous Ca(2+) pumps. Orthovanadate-sensitive (45)Ca(2+) transport into vesicles isolated from transformants demonstrated that ACA2 is a Ca(2+) pump. Ca(2+) pumping by the full-length protein (ACA2-1) was 4- to 10-fold lower than that of the N-terminal truncated ACA2-2 (Δ2-80), indicating that the N-terminal domain normally acts to inhibit the pump. An inhibitory sequence (IC(50) = 4 microM) was localized to a region within valine-20 to leucine-44, because a peptide corresponding to this sequence lowered the V(max) and increased the K(m) for Ca(2+) of the constitutively active ACA2-2 to values comparable to the full-length pump. The peptide also blocked the activity (IC(50) = 7 microM) of a Ca(2+) pump (AtECA1) belonging to a second family of Ca(2+) pumps. This inhibitory sequence appears to overlap with a calmodulin-binding site in ACA2, previously mapped between aspartate-19 and arginine-36 (J.F. Harper, B. Hong, I. Hwang, H.Q. Guo, R. Stoddard, J.F. Huang, M.G. Palmgren, H. Sze [1998] J Biol Chem 273: 1099-1106). These results support a model in which the pump is kept "unactivated" by an intramolecular interaction between an autoinhibitory sequence located between residues 20 and 44 and a site in the Ca(2+) pump core that is highly conserved between different Ca(2+) pump families. Results further support a model in which activation occurs as a result of Ca(2+)-induced binding of calmodulin to a site overlapping or immediately adjacent to the autoinhibitory sequence.
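The effect of the autoinhibitory peptide can be understood through the Michaelis-Menten equation, v = Vmax[S]/(Km + [S]). Lowering Vmax and raising Km both reduce the pumping rate at a given Ca(2+) concentration. The minimal sketch below uses hypothetical parameter values in arbitrary units, not the measured constants.

```python
def mm_rate(s: float, vmax: float, km: float) -> float:
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

# Hypothetical parameters (arbitrary units): active truncated pump vs. pump plus peptide.
active = {"vmax": 10.0, "km": 1.0}
inhibited = {"vmax": 4.0, "km": 3.0}
for ca in (0.5, 1.0, 5.0):
    print(ca, mm_rate(ca, **active), mm_rate(ca, **inhibited))
```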
Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C
2013-01-16
The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the African buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified and cloned, and the resulting recombinants were sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation may explain why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine
2002-06-15
To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40-year-old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full-length cDNA clones. Approximately 2000 reads from the unamplified library yield 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced, and a similar number of reads from this library (fs) yield 1455 unique groups, of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens-specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full-length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflects the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.
High diversity of picornaviruses in rats from different continents revealed by deep sequencing.
Hansen, Thomas Arn; Mollerup, Sarah; Nguyen, Nam-Phuong; White, Nicole E; Coghlan, Megan; Alquezar-Planas, David E; Joshi, Tejal; Jensen, Randi Holm; Fridholm, Helena; Kjartansdóttir, Kristín Rós; Mourier, Tobias; Warnow, Tandy; Belsham, Graham J; Bunce, Michael; Willerslev, Eske; Nielsen, Lars Peter; Vinner, Lasse; Hansen, Anders Johannes
2016-08-17
Outbreaks of zoonotic diseases in humans and livestock are not uncommon, and an important component in containment of such emerging viral diseases is rapid and reliable diagnostics. Such methods are often PCR-based and hence require the availability of sequence data from the pathogen. Rattus norvegicus (R. norvegicus) is a known reservoir for important zoonotic pathogens. Transmission may be direct via contact with the animal, for example, through exposure to its faecal matter, or indirectly mediated by arthropod vectors. Here we investigated the viral content in rat faecal matter (n=29) collected from two continents by analyzing 2.2 billion next-generation sequencing reads derived from both DNA and RNA. Among other virus families, we found sequences from members of the Picornaviridae to be abundant in the microbiome of all the samples. Here we describe the diversity of the picornavirus-like contigs including near-full-length genomes closely related to the Boone cardiovirus and Theiler's encephalomyelitis virus. From this study, we conclude that picornaviruses within R. norvegicus are more diverse than previously recognized. The virome of R. norvegicus should be investigated further to assess the full potential for zoonotic virus transmission.
Postel, Alexander; Jha, Vijay C; Schmeiser, Stefanie; Becher, Paul
2013-01-01
Classical swine fever (CSF) is a major constraint to pig production worldwide, and in many developing countries, the epidemiological status is unknown. Here, for the first time, molecular identification and characterization of CSFV isolates from two recent outbreaks in Nepal are presented. Analysis of full-length E2-encoding sequences revealed that these isolates belonged to CSFV subgenotype 2.2 and had highest genetic similarity to isolates from India. Hence, for CSFV, Nepal and India should be regarded as one epidemiological unit. Both Nepalese isolates exhibited significant sequence differences, excluding a direct epidemiological connection and suggesting that CSFV is endemic in that country.
Metatranscriptomics of Soil Eukaryotic Communities.
Yadav, Rajiv K; Bragalini, Claudia; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia
2016-01-01
Functions expressed by eukaryotic organisms in soil can be specifically studied by analyzing the pool of eukaryotic-specific polyadenylated mRNA directly extracted from environmental samples. In this chapter, we describe two alternative protocols for the extraction of high-quality RNA from soil samples. Total soil RNA or mRNA can be converted to cDNA for direct high-throughput sequencing. Polyadenylated mRNA-derived full-length cDNAs can also be cloned in expression plasmid vectors to constitute soil cDNA libraries, which can be subsequently screened for functional gene categories. Alternatively, the diversity of specific gene families can also be explored following cDNA sequence capture using exploratory oligonucleotide probes.
Horse cDNA clones encoding two MHC class I genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barbis, D.P.; Maher, J.K.; Stanek, J.
1994-12-31
Two full-length clones encoding MHC class I genes were isolated by screening a horse cDNA library, using a probe encoding the human HLA-A2.2Y allele. The library was made in the pcDNA1 vector (Invitrogen, San Diego, CA), using mRNA from peripheral blood lymphocytes obtained from a Thoroughbred stallion (No. 0834) homozygous for a common horse MHC haplotype (ELA-A2, -B2, -D2; Antczak et al. 1984; Donaldson et al. 1988). The clones were sequenced using SP6 and T7 universal primers and horse-specific oligonucleotides designed to extend previously determined sequences.
USDA-ARS's Scientific Manuscript database
Seed-transmitted viruses have caused significant damage to watermelon crops in Korea in recent years, with Cucumber green mottle mosaic virus (CGMMV) infection widespread as a result of infected seed lots. To determine the likely origin of CGMMV infection, we collected CGMMV isolates from watermelon...
Identification and characterization of a novel serine-threonine kinase gene from the Xp22 region.
Montini, E; Andolfi, G; Caruso, A; Buchner, G; Walpole, S M; Mariani, M; Consalez, G; Trump, D; Ballabio, A; Franco, B
1998-08-01
Eukaryotic protein kinases are part of a large and expanding family of proteins. Through our transcriptional mapping effort in the Xp22 region, we have isolated and sequenced the full-length transcript of STK9, a novel cDNA highly homologous to serine-threonine kinases. A number of human genetic disorders have been mapped to the region where STK9 has been localized including Nance-Horan (NH) syndrome, oral-facial-digital syndrome type 1 (OFD1), and a novel locus for nonsyndromic sensorineural deafness (DFN6). To evaluate the possible involvement of STK9 in any of the above-mentioned disorders, a 2416-bp full-length cDNA was assembled. The entire genomic structure of the gene, which is composed of 20 coding exons, was determined. Northern analysis revealed a transcript larger than 9.5 kb in several tissues including brain, lung, and kidney. The mouse homologue (Stk9) was identified and mapped in the mouse in the region syntenic to human Xp. This location is compatible with the location of the Xcat mutant, which shows congenital cataracts very similar to those observed in NH patients. Sequence homologies, expression pattern, and mapping information in both human and mouse make STK9 a candidate gene for the above-mentioned disorders. Copyright 1998 Academic Press.
[Construction and expression of recombinant lentiviral vectors of AKT2, PDK1 and BAD].
Zhu, Jing; Chen, Bo-Jiang; Huang, Na; Li, Wei-Min
2014-03-01
To construct lentiviral expression vectors for human protein kinase B (AKT2), phosphoinositide-dependent kinase 1 (PDK1) and bcl-2-associated death protein (BAD), and to determine their expression in 293T cells. Total RNA was extracted from lung cancer tissues. The full-length coding regions of human AKT2, BAD and PDK1 cDNA were amplified via RT-PCR using specific primers, subcloned into pGEM-T Easy and then sequenced for confirmation. The full-length coding sequences were excised by restriction enzyme digestion and subcloned into pCDF1-MCS2-EF1-copGFP. The plasmids were transfected into 293T cells using the calcium phosphate method. Overexpression of AKT2, BAD and PDK1 was detected by Western blot. AKT2, PDK1 and BAD were subcloned into pCDF1-MCS2-EF1-copGFP, with transfection efficiencies of 100%, 95% and 90%, respectively. The virus titers in the supernatant were 6.7 x 10(6) PFU/mL. After infection, AKT2, PDK1 and BAD proteins were detected by Western blot. The lentiviral vectors pCDF1-MCS2-EF1-copGFP containing AKT2, BAD and PDK1 were successfully constructed and expressed in 293T cells.
Molecular cloning of allelopathy related genes and their relation to HHO in Eupatorium adenophorum.
Guo, Huiming; Pei, Xixiang; Wan, Fanghao; Cheng, Hongmei
2011-10-01
In this study, conserved sequence regions of HMGR, DXR, and CHS (encoding 3-hydroxy-3-methylglutaryl-CoA reductase, 1-deoxy-D-xylulose-5-phosphate reductoisomerase and chalcone synthase, respectively) were amplified by reverse transcriptase (RT)-PCR from Eupatorium adenophorum. Quantitative real-time PCR showed that the expression of CHS was related to the level of HHO, an allelochemical isolated from E. adenophorum. Semi-quantitative RT-PCR showed that, except for CHS, the expression of these genes did not differ significantly among three different tissues. Southern blotting indicated that at least three CHS genes are present in the E. adenophorum genome. A full-length cDNA of one CHS gene (named EaCHS1, GenBank ID: FJ913888) was cloned. The 1,455 bp cDNA contained an open reading frame (1,206 bp) encoding a protein of 401 amino acids. Preliminary bioinformatics analysis revealed that EaCHS1 is a member of the CHS family, and subcellular localization prediction indicated that EaCHS1 is a cytoplasmic protein. To the best of our knowledge, this is the first report of conserved sequences of these genes and of a full-length EaCHS1 gene in E. adenophorum. The results indicate that the CHS gene is related to allelopathy in E. adenophorum.
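As a quick consistency check on the reported figures: an ORF of n nucleotides contains n/3 codons, one of which is the stop codon if it is included in the count. The sketch below assumes the 1,206 bp ORF includes the stop codon; the helper name protein_length_from_orf is hypothetical.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    # The stop codon is counted in the ORF length but encodes no amino acid.
    return codons - 1 if includes_stop else codons

print(protein_length_from_orf(1206))  # 401, matching the reported EaCHS1 protein
```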
Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Venermo, Maria S Söderlund; Young, Neal S.; Brown, Kevin E.
2008-01-01
Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showed no significant differences in ITR or NS regions. In the capsid region, there was a nucleotide sequence difference conferring an amino acid substitution (E176K) in the phospholipase A2-like motif of the VP1-unique (VP1u) region. The recombinant VP1u with the E176K mutation had no catalytic activity as compared with the wild-type. When this mutation was introduced into pB19-M20, infectivity was significantly attenuated, confirming the critical role of this motif. Investigation of the original serum from which pB19-FL was cloned confirmed that the phospholipase mutation was present in the native B19 virus. PMID:18252260
Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S; Brown, Kevin E
2008-05-10
Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showed no significant differences in ITR or NS regions. In the capsid region, there was a nucleotide sequence difference conferring an amino acid substitution (E176K) in the phospholipase A2-like motif of the VP1-unique (VP1u) region. The recombinant VP1u with the E176K mutation had no catalytic activity as compared with the wild-type. When this mutation was introduced into pB19-M20, infectivity was significantly attenuated, confirming the critical role of this motif. Investigation of the original serum from which pB19-FL was cloned confirmed that the phospholipase mutation was present in the native B19 virus.
Recombination in Avian Gamma-Coronavirus Infectious Bronchitis Virus
Thor, Sharmi W.; Hilt, Deborah A.; Kissinger, Jessica C.; Paterson, Andrew H.; Jackwood, Mark W.
2011-01-01
Recombination in the family Coronaviridae has been well documented and is thought to be a contributing factor in the emergence and evolution of different coronaviral genotypes as well as different species of coronavirus. However, there are limited data available on the frequency and extent of recombination in coronaviruses in nature, particularly for the avian gamma-coronaviruses, where only recently the emergence of a turkey coronavirus has been attributed solely to recombination. In this study, the full-length genomes of eight avian gamma-coronavirus infectious bronchitis virus (IBV) isolates were sequenced and, along with other full-length IBV genomes available from GenBank, were analyzed for recombination. Evidence of recombination was found in every sequence analyzed and was distributed throughout the entire genome. Areas that have the highest occurrence of recombination are located in regions of the genome that code for nonstructural proteins 2, 3 and 16, and the structural spike glycoprotein. The extent of the recombination observed suggests that this may be one of the principal mechanisms for generating genetic and antigenic diversity within IBV. These data indicate that reticulate evolutionary change due to recombination in IBV likely plays a major role in the origin and adaptation of the virus, leading to new genetic types and strains of the virus. PMID:21994806
APC functions at the centrosome to stimulate microtubule growth.
Lui, Christina; Ashton, Cahora; Sharma, Manisha; Brocardo, Mariana G; Henderson, Beric R
2016-01-01
The adenomatous polyposis coli (APC) tumor suppressor is multi-functional. APC is known to localize at the centrosome, and in mitotic cells contributes to formation of the mitotic spindle. To test whether APC contributes to nascent microtubule (MT) growth at interphase centrosomes, we employed MT regrowth assays in U2OS cells to measure MT assembly before and after nocodazole treatment and release. We showed that siRNA knockdown of full-length APC delayed both initial MT aster formation and MT elongation/regrowth. In contrast, APC-mutant SW480 cancer cells displayed a defect in MT regrowth that was unaffected by APC knockdown, but which was rescued by reconstitution of full-length APC. Our findings identify APC as a positive regulator of centrosome MT initial assembly and suggest that this process is disrupted by cancer mutations. We confirmed that full-length APC associates with the MT-nucleation factor γ-tubulin, and found that the APC cancer-truncated form (1-1309) also bound to γ-tubulin through APC amino acids 1-453. While binding to γ-tubulin may help target APC to the site of MT nucleation complexes, additional C-terminal sequences of APC are required to stimulate and stabilize MT growth. Copyright © 2015 Elsevier Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
S Menon; S Wang
The PhoP protein from Mycobacterium tuberculosis is a response regulator of the OmpR/PhoB subfamily, whose structure consists of an N-terminal receiver domain and a C-terminal DNA-binding domain. How the DNA-binding activities are regulated by phosphorylation of the receiver domain remains unclear due to a lack of structural information on the full-length proteins. Here we report the crystal structure of the full-length PhoP of M. tuberculosis. Unlike other known structures of full-length proteins of the same subfamily, PhoP forms a dimer through its receiver domain with the dimer interface involving α4-β5-α5, a common interface for activated receiver domain dimers. However, the switch residues, Thr99 and Tyr118, are in a conformation resembling those of nonactivated receiver domains. The Tyr118 side chain is involved in the dimer interface interactions. The receiver domain is tethered to the DNA-binding domain through a flexible linker and does not impose structural constraints on the DNA-binding domain. This structure suggests that phosphorylation likely facilitates/stabilizes receiver domain dimerization, bringing the DNA-binding domains into close proximity, thereby increasing their binding affinity for direct repeat DNA sequences.
Xu, Guanlong; Zhang, Xuxiao; Sun, Yipeng; Liu, Qinfang; Sun, Honglei; Xiong, Xin; Jiang, Ming; He, Qiming; Wang, Yu; Pu, Juan; Guo, Xin; Yang, Hanchun; Liu, Jinhua
2016-02-25
The PA-X protein is a fusion protein incorporating the N-terminal 191 amino acids of the PA protein with a short C-terminal sequence encoded by an overlapping ORF (X-ORF) in segment 3 that is accessed by +1 ribosomal frameshifting; this X-ORF exists either in full length or in a truncated form (of 61 or 41 codons). Genetic evolution analysis indicates that all swine influenza viruses (SIVs) possessed full-length PA-X prior to 1985, but since then SIVs with truncated PA-X have gradually increased and become dominant, implying that truncation of this protein may contribute to the adaptation of influenza virus in pigs. To verify this hypothesis, we constructed PA-X extended viruses in the background of a "triple-reassortment" H1N2 SIV with truncated PA-X, and evaluated their biological characteristics in vitro and in vivo. Compared with SIV carrying full-length PA-X, SIV with truncated PA-X showed increased viral replication in porcine cells and swine respiratory tissues, along with enhanced pathogenicity, replication and transmissibility in pigs. Furthermore, we found that truncation of PA-X enhanced the inhibition of IFN-I mRNA expression. Thus, our results imply that truncation of PA-X may contribute to the adaptation of SIV in pigs.
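The +1 ribosomal frameshift that gives rise to PA-X can be illustrated by how the reading frame changes: triplets are read in the original frame up to a given codon, then the ribosome skips one nucleotide and continues in the +1 frame. The toy sequence and the helper codons_with_plus1_shift below are hypothetical; for PA-X the shift would follow codon 191 of PA, and the real slippage site and its mechanism are not modelled here.

```python
def codons_with_plus1_shift(seq: str, shift_after_codon: int) -> list[str]:
    """Read seq in frame 0 for `shift_after_codon` codons, then skip one
    nucleotide (a +1 frameshift) and keep reading triplets in the new frame."""
    head_end = 3 * shift_after_codon
    head = [seq[i:i + 3] for i in range(0, head_end, 3)]
    tail_start = head_end + 1  # the ribosome slips forward by one nucleotide
    tail = [seq[i:i + 3] for i in range(tail_start, len(seq) - 2, 3)]  # full triplets only
    return head + tail

toy = "ATGGCTTCCAAAGGG"                 # made-up 15-nt sequence
print(codons_with_plus1_shift(toy, 2))  # ['ATG', 'GCT', 'CCA', 'AAG']
print([toy[i:i + 3] for i in range(0, len(toy), 3)])  # unshifted: ['ATG', 'GCT', 'TCC', 'AAA', 'GGG']
```

Every codon downstream of the shift point changes, which is why the short X-ORF encodes a C-terminal sequence unrelated to the rest of PA.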
2004-01-01
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O
2004-01-01
Nucleotide sequencing, after TA cloning, of the amplicon of the almost full-length recA gene from three strains of UPTC (A1, A2 and A3) isolated from seagulls in Northern Ireland, whose phenotypic and genotypic characteristics had been shown to be indistinguishable, revealed nucleotide differences at three positions among the three strains. In conclusion, the nucleotide sequences of the recA gene discriminated among the three UPTC strains A1, A2 and A3, which are otherwise indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data from the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group that are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
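Finding the positions at which otherwise indistinguishable strains differ amounts to scanning an alignment column by column. A minimal Python sketch, assuming the sequences are already aligned to equal length; the strain sequences are invented and much shorter than the recA amplicon.

```python
def variable_sites(aligned: dict[str, str]) -> list[int]:
    """Return 1-based positions where the aligned sequences do not all share the same base."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]

# Hypothetical toy alignment (not the real recA data)
strains = {
    "A1": "ATGGCTAAGCTT",
    "A2": "ATGGCTGAGCTT",
    "A3": "ATGACTAAGCTC",
}
print(variable_sites(strains))  # [4, 7, 12]
```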
Detection of Human Papillomavirus Type 2 Related Sequence in Oral Papilloma
Yamaguchi, Taihei; Shindoh, Masanobu; Amemiya, Akira; Inoue, Nobuo; Kawamura, Masaaki; Sakaoka, Hiroshi; Inoue, Masakazu; Fujinaga, Kei
1998-01-01
Oral papilloma is a benign tumourous lesion, some cases of which are associated with human papillomavirus (HPV) infection. We analysed genetic and histopathological evidence for HPV type 2 infection in three oral papillomas. Southern blot hybridization showed HPV 2a sequence in one lesion. Cells of the positive specimen appeared to contain high copy numbers of the viral DNA in an episomal state. In situ staining demonstrated virus capsid antigen in koilocytotic cells and surrounding cells in the hyperplastic epithelial layer. In the other two specimens, no HPV sequences were detected with labeled full-length linear HPV 2a, 6b, 11, 16, 18, 31 and 33 DNA probes under low-stringency hybridization conditions. These results suggest that HPV 2 may play a role in oral papilloma. PMID:9699941
Genome-Scale Phylogeny of the Alphavirus Genus Suggests a Marine Origin
Palacios, G.; Tesh, R. B.; Savji, N.; Guzman, H.; Sherman, M.; Weaver, S. C.; Lipkin, W. I.
2012-01-01
The genus Alphavirus comprises a diverse group of viruses, including some that cause severe disease. Using full-length sequences of all known alphaviruses, we produced a robust and comprehensive phylogeny of the Alphavirus genus, presenting a more complete evolutionary history of these viruses compared to previous studies based on partial sequences. Our phylogeny suggests that the alphaviruses originated in the southern oceans and spread equally through the Old and New Worlds. Since lice appear to be involved in aquatic alphavirus transmission, it is possible that we are missing a louse-borne branch of the alphaviruses. Complete genome sequencing of all members of the genus also revealed conserved residues forming the structural basis of the E1 and E2 protein dimers. PMID:22190718
A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences
Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.
2017-01-01
An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed nearly quantitatively. DNA sequences shorter than full length are first washed away from the capture support; the solid-phase-purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; Gopinath, Gopal R.; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, because the transcripts of a single gene can vary greatly. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also demonstrated the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes.
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Waltari, Eric; Jia, Manxue; Jiang, Caroline S; Lu, Hong; Huang, Jing; Fernandez, Cristina; Finzi, Andrés; Kaufmann, Daniel E; Markowitz, Martin; Tsuji, Moriya; Wu, Xueling
2018-01-01
Using 5' rapid amplification of cDNA ends, Illumina MiSeq, and basic flow cytometry, we systematically analyzed the expressed B cell receptor (BCR) repertoire in PBMCs from 14 healthy adults and 5 HIV-1+ adults, in 5 cord blood samples, and in 3 HIS-CD4/B mice, examining the full-length variable region of μ, γ, α, κ, and λ chains for V-gene usage, somatic hypermutation (SHM), and CDR3 length. Adding to the known repertoire of healthy adults, Illumina MiSeq consistently detected small fractions of reads with high mutation frequencies, including hypermutated μ reads, and reads with long CDR3s. Additionally, the less-studied IgA repertoire displayed characteristics similar to those of IgG. Compared to healthy adults, the five chronically HIV-1-infected adults displayed elevated mutation frequencies for all μ, γ, α, κ, and λ chains examined and slightly longer CDR3 lengths for γ, α, and λ. To evaluate the reconstituted human BCR sequences in a humanized mouse model, we analyzed cord blood and HIS-CD4/B mice, all of which lacked the typical SHM seen in the adult reference. Furthermore, MiSeq revealed identical unmutated IgM sequences derived from separate cell aliquots, thus for the first time demonstrating rare clonal members of unmutated IgM B cells by sequencing.
Li, Chunhua; Lu, Ling; Wu, Xianghong; Wang, Chuanxi; Bennett, Phil; Lu, Teng; Murphy, Donald
2009-08-01
In this study, we characterized the full-length genomic sequences of 13 distinct hepatitis C virus (HCV) genotype 4 isolates/subtypes: QC264/4b, QC381/4c, QC382/4d, QC193/4g, QC383/4k, QC274/4l, QC249/4m, QC97/4n, QC93/4o, QC139/4p, QC262/4q, QC384/4r and QC155/4t. These were amplified by RT-PCR from the sera of patients now residing in Canada, 11 of whom were African immigrants. The resulting genomes varied between 9421 and 9475 nt in length, and each contained a single ORF of 9018-9069 nt. The sequences showed nucleotide similarities of 77.3-84.3 % in comparison with subtypes 4a (GenBank accession no. Y11604) and 4f (EF589160) and 70.6-72.8 % in comparison with genotype 1 (M62321/1a, M58335/1b, D14853/1c, and 1?/AJ851228) reference sequences. These similarities were often higher than those currently defined by HCV classification criteria for subtype (75.0-80.0 %) and genotype (67.0-70.0 %) division, respectively. Further analyses of the complete and partial E1 and partial NS5B sequences confirmed these 13 'provisionally assigned subtypes'.
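The nucleotide similarity percentages above are pairwise identities between aligned sequences. A minimal Python sketch, assuming pre-aligned input and excluding gap columns (one of several conventions, and not necessarily the one the authors used); the sequences and the helper name percent_identity are illustrative only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions with identical bases.

    Assumption: columns where either sequence has a gap ('-') are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)

# Toy sequences: two mismatches over ten compared positions
print(percent_identity("ACGTACGTAC", "ACGTTCGAAC"))  # 80.0
```

A value computed this way over whole genomes could then be compared against the subtype and genotype similarity ranges quoted above.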
Zhou, H; Miller, A W; Sosic, Z; Buchholz, B; Barron, A E; Kotler, L; Karger, B L
2000-03-01
This paper presents results on ultralong read DNA sequencing with relatively short separation times using capillary electrophoresis with replaceable polymer matrixes. In previous work, the effectiveness of mixed replaceable solutions of linear polyacrylamide (LPA) was demonstrated, and 1000 bases were routinely obtained in less than 1 h. Substantially longer read lengths have now been achieved by a combination of improved formulation of LPA mixtures, optimization of temperature and electric field, adjustment of the sequencing reaction, and refinement of the base-caller. The average molar masses of LPA used as DNA separation matrixes were measured by gel permeation chromatography and multiangle laser light scattering. Newly formulated matrixes comprising 0.5% (w/w) 270 kDa and 2% (w/w) 10 or 17 MDa LPA raised the optimum column temperature from 60 to 70 degrees C, increasing the selectivity for large DNA fragments, while maintaining high selectivity for small fragments as well. This improved resolution was further enhanced by reducing the electric field strength from 200 to 125 V/cm. In addition, because sequencing accuracy beyond 1000 bases was diminished by the low signal from G-terminated fragments when the standard reaction protocol for a commercial dye primer kit was used, the amount of these fragments was doubled. Augmenting the base-calling expert system with rules specific for low peak resolution also had a significant effect, contributing slightly less than half of the total increase in read length. With full optimization, this read length reached up to 1300 bases (average 1250) with 98.5% accuracy in 2 h for a single-stranded M13 template.
Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren
2015-01-01
There is continuing interest in analyzing gene architecture and gene expression to determine what relationship may exist between them. Advances in high-quality sequencing technologies and large-scale resource datasets have improved understanding of these relationships and enabled cross-referencing of expression data with large genomic datasets. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, the literature contains some conflicting results concerning the impacts of different gene regions, and the underlying reason is not well understood. This research applies quantile regression techniques to the statistical analysis of coding and noncoding sequence length and gene expression data in the plant Arabidopsis thaliana and the fruit fly Drosophila melanogaster, to determine whether a relationship exists and whether there are differences or similarities between these species. The quantile regression analysis found that the correlations between coding sequence length and gene expression varied, whereas noncoding sequence length (5′ and 3′ UTRs) showed similarities between the animal and plant species. In conclusion, the information described in this study provides a basis for further exploration of gene regulation with regard to coding and noncoding sequence length. PMID:26114098
Flanking sequence determination and specific PCR identification of transgenic wheat B102-1-2.
Cao, Jijuan; Xu, Junyi; Zhao, Tongtong; Cao, Dongmei; Huang, Xin; Zhang, Piqiao; Luan, Fengxia
2014-01-01
The exogenous fragment sequence and the flanking sequence between the exogenous fragment and the recombinant chromosome of transgenic wheat B102-1-2 were successfully acquired using genome walking technology. The newly acquired exogenous fragment contained the full-length sequences of the transformed genes together with sequences from the transformation plasmids and corresponding functional genes, including ubi, vector pBANF-bar, vector pUbiGUSPlus, vector HSP, reporter vector pUbiGUSPlus, promoter ubiquitin, and E. coli DH1. A specific polymerase chain reaction (PCR) identification method for transgenic wheat B102-1-2 was established using primers designed from the flanking sequence. This specific PCR strategy was validated using transgenic wheat, transgenic corn, transgenic soybean, transgenic rice, and non-transgenic wheat. A specifically amplified target band was observed only in transgenic wheat B102-1-2. Therefore, this method offers high specificity, high reproducibility, rapid identification, and excellent accuracy for identifying transgenic wheat B102-1-2.
Muangkram, Yuttamol; Amano, Akira; Wajjwalku, Worawidh; Pinyopummintr, Tanu; Thongtip, Nikorn; Kaolim, Nongnid; Sukmak, Manakorn; Kamolnorranath, Sumate; Siriaroonrat, Boripat; Tipkantha, Wanlaya; Maikaew, Umaporn; Thomas, Warisara; Polsrila, Kanda; Dongsaard, Kwanreaun; Sanannu, Saowaphang; Wattananorrasate, Anuwat
2017-07-01
The Asian tapir (Tapirus indicus) has been classified as Endangered on the IUCN Red List of Threatened Species (2008). Genetic diversity data provide important information for the management of captive breeding and the conservation of this species. We analyzed mitochondrial control region (CR) sequences from 37 captive Asian tapirs in Thailand. Multiple alignment of the full-length CR sequences (1268 bp) showed three domains, as described in other mammal species. Analysis of 16 parsimony-informative variable sites revealed 11 haplotypes. Furthermore, phylogenetic analysis using a median-joining network clearly showed three clades, consistent with our earlier cytochrome b gene study of this endangered species. The repetitive motif is located between the first and second conserved sequence blocks, similar to that of the Brazilian tapir. The most polymorphic site was located in the extended termination-associated sequence domain. The results could inform future genetic management of captive and wild populations aimed at maintaining stable populations.
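The abstract reports 16 parsimony-informative sites and 11 haplotypes. A minimal sketch of how both quantities can be counted from an alignment is shown below, using the usual definition (a column is parsimony-informative when at least two character states each occur in at least two sequences). Ignoring gap characters and treating each distinct full sequence as a haplotype are simplifying assumptions of this example; the toy alignment is hypothetical.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Return 0-based column indices that are parsimony-informative.

    A column counts if at least two different character states each occur
    in at least two sequences. Gap characters ('-') are ignored here.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    sites = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i].upper() for seq in alignment if seq[i] != "-")
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            sites.append(i)
    return sites


def count_haplotypes(alignment):
    """Number of distinct aligned sequences (haplotypes)."""
    return len({seq.upper() for seq in alignment})


# Hypothetical toy alignment of four short CR fragments.
toy = ["ACGTA", "ACGAA", "TCGAA", "TCGTA"]
print(parsimony_informative_sites(toy))  # [0, 3]
print(count_haplotypes(toy))             # 4
```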
Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes
An, Dong; Li, Changsheng; Humbeck, Klaus
2018-01-01
Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than the second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview and a comprehensive guidance to Iso-Seq, and thus to promote its applications in plant research. PMID:29346292
Large-scale collection of full-length cDNA and transcriptome analysis in Hevea brasiliensis.
Makita, Yuko; Ng, Kiaw Kiaw; Veera Singham, G; Kawashima, Mika; Hirakawa, Hideki; Sato, Shusei; Othman, Ahmad Sofiman; Matsui, Minami
2017-04-01
Natural rubber has unique physical properties that cannot be replaced by products from other latex-producing plants or by petrochemically produced synthetic rubbers. Hevea brasiliensis is the main commercial source of this natural rubber, which has a cis-polyisoprene configuration. For sustainable production of enough rubber to meet demand, elucidation of the molecular mechanisms involved in latex production is vital. To this end, we first constructed full-length cDNA libraries of the rubber cultivar RRIM 600, sequenced around 20,000 clones by the Sanger method, and obtained over 15,000 contigs with an Illumina sequencer. With these data, we updated around 5,500 gene structures and newly annotated around 9,500 transcription start sites. Second, to elucidate the rubber biosynthetic pathways and their transcriptional regulation, we carried out tissue- and cultivar-specific RNA-Seq analysis. Using our recently published genome sequence, we confirmed the expression patterns of the rubber biosynthetic genes. Our data suggest that the cytoplasmic mevalonate (MVA) pathway is the main route for isoprenoid biosynthesis in latex production. In addition to the well-studied polymerization factors, we suggest that rubber elongation factor 8 (REF8) is a candidate factor in cis-polyisoprene biosynthesis. We have also identified 39 transcription factors that may be key regulators of latex production. Expression profile analysis of two additional cultivars, RRIM 901 and PB 350, by RNA-Seq revealed possible expression differences between a high latex-yielding cultivar and a disease-resistant cultivar. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Molecular characterization of a novel gammaretrovirus in killer whales (Orcinus orca).
Lamere, Sarah A; St Leger, Judy A; Schrenzel, Mark D; Anthony, Simon J; Rideout, Bruce A; Salomon, Daniel R
2009-12-01
There are currently no published data documenting the presence of retroviruses in cetaceans, though the occurrences of cancers and immunodeficiency states suggest the potential. We examined tissues from adult killer whales and detected a novel gammaretrovirus by degenerate PCR. Reverse transcription-PCR also demonstrated tissue and serum expression of retroviral mRNA. The full-length sequence of the provirus was obtained by PCR, and a TaqMan-based copy number assay did not demonstrate evidence of productive infection. PCR on blood samples from 11 healthy captive killer whales and tissues from 3 free-ranging animals detected the proviral DNA in all tissues examined from all animals. A survey of multiple cetacean species by PCR for gag, pol, and env sequences showed homologs of this virus in the DNA of eight species of delphinids, pygmy and dwarf sperm whales, and harbor porpoises, but not in beluga or fin whales. Analysis of the bottlenose dolphin genome revealed two full-length proviral sequences with 97.4% and 96.9% nucleotide identity to the killer whale gammaretrovirus. The results of single-cell PCR on killer whale sperm and Southern blotting are also consistent with the conclusion that the provirus is endogenous. We suggest that this gammaretrovirus entered the delphinoid ancestor's genome before the divergence of modern dolphins or that an exogenous variant existed following divergence that was ultimately endogenized. However, the transcriptional activity demonstrated in tissues and the nearly intact viral genome suggest a more recent integration into the killer whale genome, favoring the latter hypothesis. The proposed name for this retrovirus is killer whale endogenous retrovirus.
Tian, Xue; Meng, Xiaolin; Wang, Liangyan; Song, Yunfei; Zhang, Danli; Ji, Yuankai; Li, Xuejun; Dong, Changsheng
2015-01-25
Slc7a11, encoding solute carrier family 7 member 11 (anionic amino acid transporter light chain, xCT), has been identified as a critical genetic regulator of pheomelanin synthesis in hair and melanocytes. To better understand the molecular characteristics of Slc7a11 and its expression patterns in the skin of white versus brown alpaca (Lama pacos), we cloned the full-length coding sequence (CDS) of the alpaca Slc7a11 gene and analyzed its expression using real-time PCR, Western blotting and immunohistochemistry. The 1512-bp full-length CDS encodes a 503-amino-acid polypeptide. Sequence analysis showed that alpaca xCT contains 12 transmembrane regions, consistent with the highly conserved amino acid permease (AA_permease_2) domain found in other vertebrates. Sequence alignment and phylogenetic analysis revealed that alpaca xCT had the highest identity with, and shared the same branch as, Camelus ferus. Real-time PCR and Western blotting suggested that xCT was expressed at significantly higher levels in brown than in white alpaca skin, and that transcripts and protein followed the same expression pattern in white and brown alpaca skin. Additionally, immunohistochemical analysis demonstrated that xCT staining was markedly stronger in the matrix and root sheath of brown alpaca skin than in white. These results suggest that Slc7a11 functions in alpaca coat color regulation and provide essential information for further exploration of the role of Slc7a11 in melanogenesis. Copyright © 2014 Elsevier B.V. All rights reserved.
Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop
2012-01-01
Humulus lupulus, commonly known as hops, is a member of the family Moraceae. Many ongoing projects have led to the accumulation of large volumes of genomic and expressed sequence tag (EST) sequences in public databases. Genetic characterization based on these databases remains limited because reliable molecular markers are not available. A large amount of EST data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled into full-length sequences. Mononucleotide SSR motifs were the most frequent (60.44% in contigs and 62.16% in singletons), whereas the least frequent were hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most common trinucleotide motif was GAA, which codes for glutamic acid, while AT/TA were the most frequent dinucleotide repeats. Flanking primer pairs were designed in silico for the SSR-containing sequences. SSR-containing sequences were functionally categorized using Gene Ontology terms for biological process, cellular component and molecular function. PMID:22368382
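The abstract does not say which tool was used to find SSRs in the ESTs. As a hedged illustration of the general technique, the sketch below scans a sequence with one regular expression per motif length (1 to 6 nt). The minimum repeat numbers in `MIN_REPEATS` are hypothetical thresholds chosen for this example, and motifs that are themselves repeats of a shorter unit (such as "AA" or "ATAT") are skipped so that each SSR is reported at its shortest motif length.

```python
import re

# Hypothetical minimum repeat counts per motif length (assumed, not from the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_unit(motif):
    """True if motif is a repeat of a shorter unit, e.g. 'AA' or 'ATAT'."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-nt unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_unit(motif):
                continue  # reported under its shorter motif length instead
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


print(find_ssrs("TT" + "GAA" * 5 + "TT"))  # [(2, 'GAA', 5)]
```

A production pipeline would also handle imperfect and compound repeats and report flanking regions for primer design; this sketch only finds perfect tandem repeats.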
Deep RNA-Seq to unlock the gene bank of floral development in Sinapis arvensis.
Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong
2014-01-01
Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that reduces agricultural crop yield, it is a valuable alien germplasm resource. It can be used to broaden the genetic background of Brassica crops with desirable agricultural traits such as resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotiorum) and pod shatter (caused by the FRUITFULL gene). However, few genetic studies of S. arvensis have been reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high-quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops. PMID:25192023
Comparison of next generation sequencing technologies for transcriptome characterization
2009-01-01
Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for the most complete and cost-effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the 134,791 total reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool that allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals.
In terms of sequence coverage alone, NG sequencing is a dramatic advance over capillary-based sequencing, but it also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
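The read-mapping breakdown for the Arabidopsis reads is simple arithmetic on the reported counts. The sketch below recomputes each class as a share of the 134,791 total reads, using only the numbers stated in the abstract. Recomputed this way, the boundary-spanning class comes out at 8.5% rather than the stated 8.6%, and the four counts sum to 434 more than the total, so the classes presumably overlap or were counted differently in the source; the sketch does not attempt to resolve that.

```python
# Counts reported in the abstract for the Arabidopsis read mappings.
TOTAL_READS = 134_791
MAPPING_COUNTS = {
    "mapped exactly to known exons": 119_518,
    "mapped to introns": 1_117,
    "spanned intron/exon boundaries": 11_524,
    "extended beyond annotated UTRs": 3_066,
}


def percent_of_total(count, total=TOTAL_READS):
    """Share of all reads as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in MAPPING_COUNTS.items():
    print(f"{label}: {percent_of_total(count)}%")

# The classes are not forced to be disjoint, so their sum can exceed the total.
print(sum(MAPPING_COUNTS.values()) - TOTAL_READS)
```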
Noh, Ju Young; Patnaik, Bharat Bhusan; Tindwa, Hamisi; Seo, Gi Won; Kim, Dong Hyun; Patnaik, Hongray Howrelia; Jo, Yong Hun; Lee, Yong Seok; Lee, Bok Luel; Kim, Nam Jung; Han, Yeon Soo
2014-01-25
Apolipophorin III (apoLp-III) is a well-known hemolymph protein with functional roles in lipid transport and the immune response of insects. We cloned a full-length cDNA encoding a putative apoLp-III from larvae of the coleopteran beetle Tenebrio molitor (TmapoLp-III) by identifying clones corresponding to a partial TmapoLp-III sequence and then sequencing them in full by a clone-by-clone primer walking method. The complete cDNA consists of 890 nucleotides, including an ORF encoding 196 amino acid residues. Excluding a putative signal peptide of the first 20 amino acid residues, the 176-residue mature apoLp-III has a calculated molecular mass of 19,146 Da. Comparison of the genomic sequence with the cDNA showed that TmapoLp-III is organized into four exons interrupted by three introns. Several immune-related transcription factor binding sites were found in the putative 5'-flanking region. BLAST and phylogenetic analyses reveal that TmapoLp-III has high sequence identity (88%) with Tribolium castaneum apoLp-III but low sequence identity (<26%) with other apoLp-IIIs. Homology modeling of TmapoLp-III shows a bundle of five amphipathic alpha helices, including a short helix 3'. The 'helix-short helix-helix' motif is predicted to be involved in lipid binding interactions through reversible conformational changes that accommodate the hydrophobic residues at the exterior for stability. The highest level of TmapoLp-III mRNA was detected at late pupal stages, although it is also expressed at lower levels in the larval and adult stages. Tissue-specific analysis showed significantly higher transcript levels in larval fat body and adult integument. In addition, TmapoLp-III mRNA was highly upregulated in the late stages of L. monocytogenes or E. coli challenge. These results indicate that TmapoLp-III may play an important role in innate immune responses against bacterial pathogens in T. molitor.
Copyright © 2013 Elsevier B.V. All rights reserved.
2014-01-01
Background Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). Results We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. Conclusions This study aimed at the characterization of the GST gene family in C. sinensis. 
Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar. PMID:24490620
Intensity inhomogeneity correction for magnetic resonance imaging of human brain at 7T.
Uwano, Ikuko; Kudo, Kohsuke; Yamashita, Fumio; Goodwin, Jonathan; Higuchi, Satomi; Ito, Kenji; Harada, Taisuke; Ogawa, Akira; Sasaki, Makoto
2014-02-01
To evaluate the performance and efficacy of intensity inhomogeneity correction for various sequences of human brain 7T MRI using the extended version of the unified segmentation algorithm. Ten healthy volunteers were scanned with four different sequences (2D spin echo [SE], 3D fast SE, 2D fast spoiled gradient echo, and 3D time-of-flight) on a 7T MRI system. Intensity inhomogeneity correction was performed using the "New Segment" module in SPM8 with four different values (120, 90, 60, and 30 mm) of full width at half maximum (FWHM) for Gaussian smoothness. Signal uniformity across the entire white matter was evaluated using the coefficient of variation (CV); mean signal intensities in subcortical and deep white matter were compared, and the contrast between subcortical white matter and gray matter was measured. The length of the lenticulostriate arteries (LSA) was measured on maximum intensity projection (MIP) images generated from the original and corrected images. In all sequences, the CV decreased as the FWHM value decreased. The differences in mean signal intensity between subcortical and deep white matter also decreased with smaller FWHM values. The contrast between white and gray matter was maintained at all FWHM values. LSA length was significantly greater in corrected MIP images than in the original MIP images. Intensity inhomogeneity in 7T MRI can be successfully corrected using SPM8 for various scan sequences.
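White-matter uniformity in this study is summarized by the coefficient of variation, CV = standard deviation / mean of the signal intensities, where a lower CV means a more uniform signal. The sketch below shows that calculation on hypothetical intensity values; using the population standard deviation is an assumption of this example (the sample standard deviation is an equally common choice), and it is not the SPM8 implementation.

```python
import statistics


def coefficient_of_variation(intensities):
    """CV = standard deviation / mean of a set of signal intensities.

    Uses the population standard deviation (an assumption of this sketch).
    """
    values = list(intensities)
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return statistics.pstdev(values) / mean


# Hypothetical white-matter intensities before and after bias correction.
before = [80.0, 100.0, 120.0, 140.0]
after = [105.0, 110.0, 110.0, 115.0]
print(round(coefficient_of_variation(before), 3))
print(round(coefficient_of_variation(after), 3))
```

With these made-up values the corrected set has the smaller CV, which is the direction of change the abstract reports as the FWHM value decreases.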
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.
1987-06-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments with a total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.
Length-Two Representations of Quantum Affine Superalgebras and Baxter Operators
NASA Astrophysics Data System (ADS)
Zhang, Huafeng
2018-03-01
Associated to quantum affine general linear Lie superalgebras are two families of short exact sequences of representations whose first and third terms are irreducible: the Baxter TQ relations involving infinite-dimensional representations; the extended T-systems of Kirillov-Reshetikhin modules. We make use of these representations over the full quantum affine superalgebra to define Baxter operators as transfer matrices for the quantum integrable model and to deduce Bethe Ansatz Equations, under genericity conditions.
USDA-ARS's Scientific Manuscript database
The genetically engineered plum 'HoneySweet' (aka C5) has proven to be highly resistant to Plum pox virus (PPV) for over 10 years in field trials. The original vector used for transformation to develop 'HoneySweet' carried a single sense sequence of the full length PPV coat protein (ppv-cp) gene, y...
A Rapid and Improved Method to Generate Recombinant Dengue Virus Vaccine Candidates
Govindarajan, Dhanasekaran; Guan, Liming; Meschino, Steven; Fridman, Arthur; Bagchi, Ansu; Pak, Irene; ter Meulen, Jan; Casimiro, Danilo R.; Bett, Andrew J.
2016-01-01
Dengue is one of the most important mosquito-borne infections, accounting for severe morbidity and mortality worldwide. Recently, the tetravalent chimeric live attenuated dengue vaccine Dengvaxia® was approved for use in several dengue-endemic countries. In general, live attenuated vaccines (LAV) are very efficacious and offer long-lasting immunity against virus-induced disease. Rationally designed LAVs can be generated through reverse genetics technology, a method of generating infectious recombinant viruses from full-length cDNA contained in bacterial plasmids. In vitro transcribed (IVT) viral RNA from these infectious clones is transfected into susceptible cells to generate recombinant virus. However, the generation of full-length dengue virus cDNA clones can be difficult due to the genetic instability of viral sequences in bacterial plasmids. To circumvent the need for a single plasmid containing a full-length cDNA, in vitro ligation of two or three cDNA fragments contained in separate plasmids can be used to generate a full-length dengue viral cDNA template. However, in vitro ligation of multiple fragments often yields low-quality template for IVT reactions, resulting in inconsistent, low-yield RNA. These technical difficulties make recombinant virus recovery less efficient. In this study, we describe a simple, rapid and efficient method of using LONG-PCR to recover recombinant chimeric yellow fever dengue (CYD) viruses as potential dengue vaccine candidates. Using this method, we were able to efficiently generate several viable recombinant viruses without introducing any artificial mutations into the viral genomes. We believe that the techniques reported here will enable rapid and efficient recovery of recombinant flaviviruses for evaluation as vaccine candidates and will be applicable to the recovery of other RNA viruses. PMID:27008550
Wang, Wei-Ming; Lee, A-Young; Chiang, Cheng-Ming
2008-01-01
The AP-1 transcription factor is a dimeric protein complex formed primarily between Jun (c-Jun, JunB, JunD) and Fos (c-Fos, FosB, Fra-1, Fra-2) family members. These distinct AP-1 complexes are expressed in many cell types and modulate target gene expression implicated in cell proliferation, differentiation, and stress responses. Although the importance of AP-1 has long been recognized, its biochemical characterization remains limited, in part because of the difficulty of purifying full-length, reconstituted dimers with active DNA-binding and transcriptional activity. Using a combination of bacterial coexpression and epitope-tagging methods, we successfully purified all 12 heterodimers (3 Jun × 4 Fos) of full-length human AP-1 complexes, as well as c-Jun/c-Jun, JunD/JunD, and c-Jun/JunD dimers, from bacterial inclusion bodies using one-step nickel-NTA affinity tag purification following denaturation and renaturation of coexpressed AP-1 subunits. Coexpression of the two constituent subunits of a dimeric AP-1 complex helps stabilize the proteins compared with expression of the individual proteins in bacteria. Purified dimeric AP-1 complexes are functional in sequence-specific DNA binding, as illustrated by electrophoretic mobility shift assays and DNase I footprinting, and are also active in transcription with in vitro-reconstituted human papillomavirus (HPV) chromatin containing AP-1-binding sites in the native configuration of HPV nucleosomes. The availability of these recombinant full-length human AP-1 complexes has greatly facilitated mechanistic studies of AP-1-regulated gene transcription in many biological systems. PMID:18329890
Coiled-coil length: Size does matter.
Surkont, Jaroslaw; Diekmann, Yoan; Ryder, Pearl V; Pereira-Leal, Jose B
2015-12-01
Protein evolution is governed by processes that alter not only the primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life, that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein, including globular domains, change. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints. © 2015 Wiley Periodicals, Inc.
Lawrence, James L M; Tong, Mei; Alfulaij, Naghum; Sherrin, Tessi; Contarino, Mark; White, Michael M; Bellinger, Frederick P; Todorovic, Cedomir; Nichols, Robert A
2014-10-22
Soluble β-amyloid has been shown to regulate presynaptic Ca(2+) and synaptic plasticity. In particular, picomolar β-amyloid was found to have an agonist-like action on presynaptic nicotinic receptors and to augment long-term potentiation (LTP) in a manner dependent upon nicotinic receptors. Here, we report that a functional N-terminal domain exists within β-amyloid for its agonist-like activity. This sequence corresponds to an N-terminal fragment generated by the combined action of α- and β-secretases and a resident carboxypeptidase. The N-terminal β-amyloid fragment is present in the brains and CSF of healthy adults as well as of Alzheimer's patients. Unlike full-length β-amyloid, the N-terminal β-amyloid fragment is monomeric and nontoxic. In Ca(2+) imaging studies using a model reconstituted rodent neuroblastoma cell line and isolated mouse nerve terminals, the N-terminal β-amyloid fragment proved to be highly potent and more effective than full-length β-amyloid in its agonist-like action on nicotinic receptors. In addition, the N-terminal β-amyloid fragment augmented theta burst-induced post-tetanic potentiation and LTP in mouse hippocampal slices. The N-terminal fragment also rescued LTP inhibited by elevated levels of full-length β-amyloid. Contextual fear conditioning was also strongly augmented following bilateral injection of the N-terminal β-amyloid fragment into the dorsal hippocampi of intact mice. The fragment-induced augmentation of fear conditioning was attenuated by coadministration of a nicotinic antagonist. The activity of the N-terminal β-amyloid fragment appears to reside largely in a sequence surrounding a putative metal binding site, YEVHHQ. These findings suggest that the N-terminal β-amyloid fragment may serve as a potent and effective endogenous neuromodulator. Copyright © 2014 the authors.
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single-cell suspensions with cell densities on the order of 10⁷ per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data are typically used only for gene expression analysis, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells) and mutational profile (H1650 versus H1975 cells), in a single sequencing run. 
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
NASA Astrophysics Data System (ADS)
Pickman, Yishai; Dunn-Walters, Deborah; Mehr, Ramit
2013-10-01
Complementarity-determining region 3 (CDR3) is the most hyper-variable region in B cell receptor (BCR) and T cell receptor (TCR) genes, and the most critical structure in antigen recognition and thereby in determining the fates of developing and responding lymphocytes. There are millions of different TCR Vβ chain or BCR heavy chain CDR3 sequences in human blood. Even now that high-throughput sequencing has become widely used, CDR3 length distributions (also called spectratypes) remain a much quicker and cheaper method of assessing repertoire diversity. However, the complexity of these distributions and the large amount of information per sample (e.g. 32 distributions for the TCRα chain and 24 for TCRβ) call for the use of machine learning tools for full exploration. We have examined the ability of supervised machine learning, which uses computational models to find hidden patterns in predefined biological groups, to analyze CDR3 length distributions from various sources and to distinguish between experimental groups. We found that (a) splenic BCR CDR3 length distributions are characterized by low standard deviations and few local maxima, compared to peripheral blood distributions; (b) healthy elderly people's BCR CDR3 length distributions can be distinguished from those of the young; and (c) a machine learning model based on TCR CDR3 distribution features can detect myelodysplastic syndrome with approximately 93% accuracy. Overall, we demonstrate that supervised machine learning methods can contribute to our understanding of lymphocyte repertoire diversity.
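The two distribution features named above (standard deviation and number of local maxima) are simple to compute from a spectratype. A minimal sketch follows, assuming each distribution is supplied as a list of counts per CDR3 length bin; the function name and input format are hypothetical, and this is not the authors' feature-extraction or classification pipeline.

```python
def spectratype_features(counts):
    """Summarise one CDR3 length distribution (spectratype).

    counts: counts indexed by CDR3 length bin (hypothetical input format;
    bins are assumed contiguous and equally spaced).
    Returns the count-weighted standard deviation of length (in bins) and
    the number of strict local maxima.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty distribution")
    lengths = range(len(counts))
    mean = sum(l * c for l, c in zip(lengths, counts)) / total
    var = sum(c * (l - mean) ** 2 for l, c in zip(lengths, counts)) / total
    # A bin counts as a local maximum if it is strictly higher than both
    # neighbours; the edges are compared against an implicit 0, and flat
    # plateaus are not counted.
    padded = [0] + list(counts) + [0]
    peaks = sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )
    return {"sd": var ** 0.5, "local_maxima": peaks}


if __name__ == "__main__":
    print(spectratype_features([0, 1, 4, 1, 0]))  # one peak, narrow spread
    print(spectratype_features([3, 1, 3, 1, 3]))  # three peaks
```

Feature vectors like this could then be passed to any supervised classifier; which features and models the study actually used is not specified in the abstract.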
Genomic Diversity and Evolution of the Lyssaviruses
Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé
2008-01-01
Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239
De novo peptide sequencing using CID and HCD spectra pairs.
Yan, Yan; Kusalik, Anthony J; Wu, Fang-Xiang
2016-10-01
In tandem mass spectrometry (MS/MS), several different fragmentation techniques are possible, including collision-induced dissociation (CID), higher-energy collisional dissociation (HCD), electron-capture dissociation (ECD), and electron-transfer dissociation (ETD). When using pairs of spectra for de novo peptide sequencing, the most popular methods are designed for CID (or HCD) and ECD (or ETD) spectra because of the complementarity between them. Less attention has been paid to the use of CID and HCD spectra pairs. In this study, a new de novo peptide sequencing method is proposed for these spectra pairs. This method includes a CID and HCD spectra merging criterion and a parent mass correction step, along with improvements to our previously proposed algorithm for sequencing merged spectra. Three pairs of spectral datasets were used to evaluate the proposed method and compare its performance with existing methods designed for single-spectrum (HCD or CID) sequencing. Experimental results showed that full-length peptide sequencing accuracy was increased significantly by using spectra pairs in the proposed method, with the highest accuracy reaching 81.31%. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
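The abstract does not spell out the CID/HCD merging criterion or the parent mass correction. As a generic illustration only, the sketch below merges two peak lists by pairing peaks from different spectra that fall within an assumed m/z tolerance; `merge_spectra` and the 0.02 Da default are hypothetical choices, not the published method.

```python
def merge_spectra(cid, hcd, tol=0.02):
    """Merge two peak lists [(mz, intensity), ...] into one (hypothetical rule).

    Adjacent peaks (in m/z order) that come from different spectra and differ
    by at most `tol` are combined: intensities are summed and m/z becomes the
    intensity-weighted mean. Unpaired peaks are kept unchanged.
    Intensities are assumed to be positive.
    """
    peaks = sorted(
        [(mz, inten, "cid") for mz, inten in cid]
        + [(mz, inten, "hcd") for mz, inten in hcd]
    )
    merged = []
    i = 0
    while i < len(peaks):
        mz, inten, src = peaks[i]
        if i + 1 < len(peaks):
            mz2, inten2, src2 = peaks[i + 1]
            # pair only across spectra and only within the tolerance
            if src2 != src and mz2 - mz <= tol:
                total = inten + inten2
                merged.append(((mz * inten + mz2 * inten2) / total, total))
                i += 2
                continue
        merged.append((mz, inten))
        i += 1
    return merged


if __name__ == "__main__":
    cid = [(100.0, 10.0), (200.0, 5.0)]
    hcd = [(100.01, 30.0), (300.0, 2.0)]
    print(merge_spectra(cid, hcd))
```

A greedy pass like this is easy to reason about, but a real merging criterion would also need to handle intensity scaling between fragmentation modes and ion types that appear in only one of them.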
Nilsson, R. Henrik; Tedersoo, Leho; Ryberg, Martin; Kristiansson, Erik; Hartmann, Martin; Unterseher, Martin; Porter, Teresita M.; Bengtsson-Palme, Johan; Walker, Donald M.; de Sousa, Filipe; Gamper, Hannes Andres; Larsson, Ellen; Larsson, Karl-Henrik; Kõljalg, Urmas; Edgar, Robert C.; Abarenkov, Kessy
2015-01-01
The nuclear ribosomal internal transcribed spacer (ITS) region is the most commonly chosen genetic marker for the molecular identification of fungi in environmental sequencing and molecular ecology studies. Several analytical issues complicate such efforts, one of which is the formation of chimeric—artificially joined—DNA sequences during PCR amplification or sequence assembly. Several software tools are currently available for chimera detection, but rely to various degrees on the presence of a chimera-free reference dataset for optimal performance. However, no such dataset is available for use with the fungal ITS region. This study introduces a comprehensive, automatically updated reference dataset for fungal ITS sequences based on the UNITE database for the molecular identification of fungi. This dataset supports chimera detection throughout the fungal kingdom and for full-length ITS sequences as well as partial (ITS1 or ITS2 only) datasets. The performance of the dataset on a large set of artificial chimeras was above 99.5%, and we subsequently used the dataset to remove nearly 1,000 compromised fungal ITS sequences from public circulation. The dataset is available at http://unite.ut.ee/repository.php and is subject to web-based third-party curation. PMID:25786896
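Reference-based chimera detection, in its simplest form, asks whether the two halves of a query are best explained by different reference sequences. The toy sketch below scores each half by shared k-mers against a chimera-free reference set; the k-mer scoring, the midpoint split and the `min_gain` threshold are illustrative assumptions and do not reproduce UCHIME or any tool used with the UNITE reference dataset.

```python
def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(fragment, refs, k):
    """Return (name, score) of the reference sharing the largest fraction
    of the fragment's k-mers."""
    frag = kmers(fragment, k)
    if not frag:
        return None, 0.0
    scores = {name: len(frag & kmers(seq, k)) / len(frag) for name, seq in refs.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


def looks_chimeric(query, refs, k=8, min_gain=0.2):
    """Flag a query whose halves prefer different references and whose
    two-parent explanation beats the best single reference by min_gain."""
    mid = len(query) // 2
    _, whole = best_hit(query, refs, k)
    l_name, l_score = best_hit(query[:mid], refs, k)
    r_name, r_score = best_hit(query[mid:], refs, k)
    return l_name != r_name and (l_score + r_score) / 2 - whole >= min_gain


if __name__ == "__main__":
    refs = {"A": "A" * 40, "B": "C" * 40}  # deliberately trivial references
    print(looks_chimeric("A" * 20 + "C" * 20, refs))  # joined halves
    print(looks_chimeric("A" * 40, refs))             # single parent
```

Real detectors search for the best breakpoint rather than a fixed midpoint and use alignments instead of k-mer sets, which is why a curated, chimera-free reference set matters so much for their accuracy.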
Influence of time and length size feature selections for human activity sequences recognition.
Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran
2014-01-01
In this paper, a Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensor events. Alternative selections of time feature values for sensor events and of activity length size feature values are tested separately, and the resulting activity sequence recognition performance of the Viterbi algorithm is evaluated. The results show that selecting larger time feature values for sensor events and/or smaller activity length size feature values yields relatively better activity sequence recognition performance. © 2013 ISA. Published by ISA. All rights reserved.
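Viterbi decoding finds the single most probable hidden state path given a sequence of observations under a hidden Markov model. A minimal log-space sketch follows; the activity states, sensor-event labels and probabilities in the example are hypothetical, and the time and activity-length features studied in the paper are not modelled.

```python
import math


def _log(p):
    # log that tolerates zero probabilities
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state path for `obs`, computed in log space."""
    # V[t][s]: best log-probability of any path ending in state s at time t
    V = [{s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, V[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda x: x[1],
            )
            V[t][s] = score + _log(emit_p[s][obs[t]])
            back[t][s] = prev
    # trace back from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


if __name__ == "__main__":
    states = ("sleep", "cook")  # hypothetical activities
    start = {"sleep": 0.6, "cook": 0.4}
    trans = {"sleep": {"sleep": 0.8, "cook": 0.2}, "cook": {"sleep": 0.3, "cook": 0.7}}
    emit = {"sleep": {"bed": 0.9, "stove": 0.1}, "cook": {"bed": 0.2, "stove": 0.8}}
    print(viterbi(["bed", "bed", "stove", "stove"], states, start, trans, emit))
    # ['sleep', 'sleep', 'cook', 'cook']
```

Working in log space avoids numerical underflow on long sensor-event sequences, which a direct product of probabilities would hit quickly.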
Barman, Lalita Rani; Nooruzzaman, Mohammed; Sarker, Rahul Deb; Rahman, Md Tazinur; Saife, Md Rajib Bin; Giasuddin, Mohammad; Das, Bidhan Chandra; Das, Priya Mohan; Chowdhury, Emdadul Haque; Islam, Mohammad Rafiqul
2017-10-01
A total of 23 Newcastle disease virus (NDV) isolates collected in Bangladesh between 2010 and 2012 were characterized on the basis of partial F gene sequences. All the isolates belonged to genotype XIII of class II NDV but segregated into three sub-clusters. One sub-cluster with 17 isolates aligned with sub-genotype XIIIc. The other two sub-clusters were phylogenetically distinct from the previously described sub-genotypes XIIIa, XIIIb and XIIIc and could be candidates for new sub-genotypes; however, this needs to be validated with full-length F gene sequence data. The results of the present study suggest that genotype XIII NDVs are under continuing evolution in Bangladesh.
Kang, Hae Ji; Bennett, Shannon N.; Dizney, Laurie; Sumibcay, Laarni; Arai, Satoru; Ruedas, Luis A.; Song, Jin-Won; Yanagihara, Richard
2009-01-01
A genetically distinct hantavirus, designated Oxbow virus (OXBV), was detected in tissues of an American shrew mole (Neurotrichus gibbsii), captured in Gresham, Oregon, in September 2003. Pairwise analysis of full-length S- and M- and partial L-segment nucleotide and amino acid sequences of OXBV indicated low sequence similarity with rodent-borne hantaviruses. Phylogenetic analyses using maximum-likelihood and Bayesian methods, and host-parasite evolutionary comparisons, showed that OXBV and Asama virus, a hantavirus recently identified from the Japanese shrew mole (Urotrichus talpoides), were related to soricine shrew-borne hantaviruses from North America and Eurasia, respectively, suggesting parallel evolution associated with cross-species transmission. PMID:19394994
A Rapid Method for Engineering Recombinant Polioviruses or Other Enteroviruses.
Bessaud, Maël; Pelletier, Isabelle; Blondel, Bruno; Delpeyroux, Francis
2016-01-01
The cloning of large enterovirus RNA sequences is labor-intensive because plasmid vectors containing the corresponding cDNAs are frequently unstable in bacteria. To circumvent this issue, we have developed a PCR-based method that allows the generation of highly modified or chimeric full-length enterovirus genomes. This method relies on fusion PCR, which enables the concatenation of several overlapping cDNA amplicons produced separately. A T7 promoter sequence added upstream of the fusion PCR product allows its transcription into infectious genomic RNA directly in transfected cells that constitutively express the phage T7 RNA polymerase. This method permits the rapid recovery of modified viruses that can subsequently be amplified in suitable cell lines.
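The expected fusion PCR product can be modelled in silico by joining amplicons at their shared terminal overlaps and prepending a T7 promoter, which is a quick way to check that designed fragments really overlap. In the sketch below, the minimum overlap length and the promoter string (a commonly used minimal T7 promoter) are assumptions rather than details taken from the study, and the function names are hypothetical.

```python
# Commonly used minimal T7 promoter; the final G is the +1 transcription start.
# Assumed here for illustration; the exact construct in the study is not given.
T7_PROMOTER = "TAATACGACTCACTATAG"


def fuse(a, b, min_overlap=15):
    """Join two amplicons whose 3' end (a) matches the 5' end (b).

    The longest exact overlap of at least `min_overlap` nt is used.
    """
    for n in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-n:] == b[:n]:
            return a + b[n:]
    raise ValueError("no overlap of at least %d nt" % min_overlap)


def assemble(amplicons, min_overlap=15):
    """Concatenate overlapping amplicons in order and add the T7 promoter."""
    genome = amplicons[0]
    for amp in amplicons[1:]:
        genome = fuse(genome, amp, min_overlap)
    return T7_PROMOTER + genome


if __name__ == "__main__":
    a = "AAAACCCCGGGGTTTT"
    b = "GGGGTTTTACGTACGT"  # shares the 8-nt overlap GGGGTTTT with a
    print(assemble([a, b], min_overlap=8))
```

Exact-match overlaps are a simplification; primer-introduced changes would need a tolerant comparison before a check like this could be applied to real designs.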
Beyond sequencing: optical mapping of DNA in the age of nanotechnology and nanoscopy.
Levy-Sakin, Michal; Ebenstein, Yuval
2013-08-01
Next generation sequencing (NGS) is revolutionizing all fields of biological research but it fails to extract the full range of information associated with genetic material. Optical mapping of DNA grants access to genetic and epigenetic information on individual DNA molecules up to ∼1 Mbp in length. Fluorescent labeling of specific sequence motifs, epigenetic marks and other genomic information on individual DNA molecules generates a high content optical barcode along the DNA. By stretching the DNA to a linear configuration this barcode may be directly visualized by fluorescence microscopy. We discuss the advances of these methods in light of recent developments in nano-fabrication and super-resolution optical imaging (nanoscopy) and review the latest achievements of optical mapping in the context of genomic analysis. Copyright © 2013 Elsevier Ltd. All rights reserved.
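An in silico optical barcode can be predicted by locating a labelled sequence motif on both strands and binning the sites at a resolution much coarser than single base pairs. In this sketch the motif and the 1 kb bin size are arbitrary assumptions chosen for illustration, and the function names are hypothetical; real optical maps depend on the labelling chemistry, DNA stretching and imaging resolution.

```python
def label_positions(seq, motif):
    """0-based start positions of `motif` on the top strand and of its
    reverse complement (i.e. the motif on the bottom strand); palindromic
    motifs are counted once."""
    comp = str.maketrans("ACGT", "TGCA")
    rc = motif.translate(comp)[::-1]
    sites = set()
    for m in {motif, rc}:
        start = seq.find(m)
        while start != -1:
            sites.add(start)
            start = seq.find(m, start + 1)
    return sorted(sites)


def barcode(seq, motif, bin_bp=1000):
    """Count labels per fixed-size window to mimic limited optical resolution."""
    n_bins = -(-len(seq) // bin_bp)  # ceiling division
    bars = [0] * n_bins
    for pos in label_positions(seq, motif):
        bars[pos // bin_bp] += 1
    return bars


if __name__ == "__main__":
    # toy molecule with one top-strand and one bottom-strand site
    seq = "GCTCTTC" + "A" * 993 + "GAAGAGC" + "A" * 993
    print(label_positions(seq, "GCTCTTC"), barcode(seq, "GCTCTTC"))
```

Comparing such predicted barcodes against observed label patterns is the basic idea behind aligning optical maps to a reference.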
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% identity by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
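The redundancy-reduction step (clustering at 98% identity and keeping one representative per cluster) follows the greedy incremental scheme used by CD-HIT: sequences are processed longest first, and each either joins an existing representative or founds a new cluster. The sketch below is a simplified stand-in; `difflib` matching replaces CD-HIT's own alignment and word-filtering heuristics, so the identities it computes will not match CD-HIT-EST output exactly, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude identity: matched characters divided by the shorter length."""
    m = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in m.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.98):
    """Greedy incremental clustering, CD-HIT style (simplified).

    seqs: {name: sequence}. Returns {representative: [members]}.
    """
    reps, clusters = [], {}
    for name, seq in sorted(seqs.items(), key=lambda kv: -len(kv[1])):
        for rep in reps:
            if identity(seq, seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # no representative is close enough: start a new cluster
            reps.append(name)
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    seqs = {
        "v1": "ACGT" * 25,
        "v2": "ACGT" * 24 + "ACGA",  # one mismatch relative to v1
        "v3": "TTTTGGGGCC" * 10,
    }
    print(greedy_cluster(seqs))
```

Processing longest sequences first means representatives tend to be the most complete entries, which fits a database that mixes full-length and partial sequences.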
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koraber, Bette; Perelson, Alan; Hraber, Peter
2009-01-01
Recently, we developed a novel approach to the identification of transmitted or early founder HIV-1 genomes in acutely infected humans based on single genome amplification and sequencing. Here we tested this approach in 18 acutely infected Indian rhesus macaques to determine the molecular features of SIV transmission. Animals were inoculated intrarectally (IR) or intravenously (IV) with stocks of SIVmac251 or SIVsmE660 that exhibited sequence diversity typical of early-chronic HIV-1 infection. In total, 987 full-length SIV env sequences (median of 48 per animal) were determined from plasma virion RNA one to five weeks after infection. IR inoculation was followed by productive infection by one or a few viruses (median 1; range 1-5) that diversified randomly with near star-like phylogeny and a Poisson distribution of mutations. Consensus viral sequences from ramp-up and peak viremia were identical to viruses found in the inocula or differed from them by only one or a few nucleotides, providing direct evidence that early plasma viral sequences coalesce to transmitted/founder virus(es). IV infection was approximately 10,000-fold more efficient than IR infection, and viruses transmitted by either route represented the full genetic spectra of the inocula. These findings identify key similarities in mucosal transmission and early diversification between SIV and HIV-1.
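Under a single-founder model, mutations accumulate independently in each lineage, so per-sequence mutation counts relative to the consensus should be roughly Poisson distributed. The sketch below computes those counts and the Poisson expectation with the same mean; it assumes pre-aligned sequences of equal length, the function names are hypothetical, and a formal assessment would add a goodness-of-fit test and phylogenetic checks not shown here.

```python
import math
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs):
    """Majority base at each alignment column."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def poisson_check(seqs):
    """Observed per-sequence mutation counts vs. a Poisson with equal mean.

    Returns (lambda, observed Counter, expected counts per class).
    """
    cons = consensus(seqs)
    counts = [hamming(s, cons) for s in seqs]
    lam = sum(counts) / len(counts)
    observed = Counter(counts)
    expected = {
        k: len(seqs) * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(max(counts) + 1)
    }
    return lam, observed, expected


if __name__ == "__main__":
    lam, obs, exp_ = poisson_check(["AAAA", "AAAA", "AAAT", "AACA"])
    print(lam, dict(obs), {k: round(v, 3) for k, v in exp_.items()})
```

With many sequences, a clear excess of high mutation counts over the Poisson expectation would point towards more than one transmitted/founder virus.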
Bashir, Ali; Bansal, Vikas; Bafna, Vineet
2010-06-18
Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with its own characteristics, poses a number of design challenges regarding the technology to be used and the depth of sequencing required for a particular application. Here we describe a number of analytical and empirical results that address design questions for two applications: detection of structural variations from paired-end sequencing and estimation of mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect lowly expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. 
We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.
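The depth question for rare transcripts has a simple core: if reads are sampled independently and a transcript makes up a fraction p of the library, the chance of seeing it at least once in N reads is 1 - (1 - p)^N. The sketch below solves this for the smallest N that reaches a target probability; it assumes p is already known and ignores the pilot-study corrections described in the abstract, and `reads_needed` is a hypothetical name.

```python
import math


def reads_needed(p, confidence=0.95):
    """Smallest N with 1 - (1 - p)**N >= confidence.

    p: relative abundance of the transcript among all reads (assumed known).
    """
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    # (1 - p)**N <= 1 - confidence  =>  N >= log(1 - confidence) / log(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))


if __name__ == "__main__":
    for p in (1e-4, 1e-5, 1e-6):
        print(p, reads_needed(p))
```

Requiring more than one read per transcript (for example, to call it expressed) would replace this closed form with a binomial or Poisson tail, and the required depth grows accordingly.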
Thermoelectric effect and its dependence on molecular length and sequence in single DNA molecules.
Li, Yueqi; Xiang, Limin; Palma, Julio L; Asai, Yoshihiro; Tao, Nongjian
2016-04-15
Studying the thermoelectric effect in DNA is important for unravelling charge transport mechanisms and for developing relevant applications of DNA molecules. Here we report a study of the thermoelectric effect in single DNA molecules. By varying the molecular length and sequence, we tune the charge transport in DNA to either a hopping- or a tunnelling-dominated regime. The thermoelectric effect is small and insensitive to the molecular length in the hopping regime. In contrast, the thermoelectric effect is large and sensitive to the length in the tunnelling regime. These findings indicate that one may control the thermoelectric effect in DNA by varying its sequence and length. We describe the experimental results in terms of hopping and tunnelling charge transport models. PMID:27079152
Xu, Guanlong; Zhang, Xuxiao; Sun, Yipeng; Liu, Qinfang; Sun, Honglei; Xiong, Xin; Jiang, Ming; He, Qiming; Wang, Yu; Pu, Juan; Guo, Xin; Yang, Hanchun; Liu, Jinhua
2016-01-01
The PA-X protein is a fusion protein incorporating the N-terminal 191 amino acids of the PA protein with a short C-terminal sequence encoded by an overlapping ORF (X-ORF) in segment 3 that is accessed by +1 ribosomal frameshifting, and this X-ORF exists in either a full-length or a truncated form (61 or 41 codons, respectively). Genetic evolution analysis indicates that all swine influenza viruses (SIVs) possessed full-length PA-X prior to 1985, but since then SIVs with truncated PA-X have gradually increased and become dominant, implying that truncation of this protein may contribute to the adaptation of influenza virus in pigs. To verify this hypothesis, we constructed PA-X extended viruses in the background of a "triple-reassortment" H1N2 SIV with truncated PA-X, and evaluated their biological characteristics in vitro and in vivo. Compared with full-length PA-X, SIV with truncated PA-X had increased viral replication in porcine cells and swine respiratory tissues, along with enhanced pathogenicity, replication and transmissibility in pigs. Furthermore, we found that truncation of PA-X improved the inhibition of IFN-I mRNA expression. Thus, our results imply that truncation of PA-X may contribute to the adaptation of SIV in pigs. PMID:26912401
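The PA-X reading scheme (frame 0 for the first 191 codons, then the +1 frame for the X-ORF) can be illustrated by splitting a coding sequence into codons around a shift point. This schematic sketch skips exactly one nucleotide at the shift and ignores the actual slippery site and stop codons; `frameshift_codons` is a hypothetical helper, not a tool from the study.

```python
def frameshift_codons(nt, shift_after_codon):
    """Read `shift_after_codon` codons in frame 0, then continue in the +1 frame.

    Trailing nucleotides that do not form a complete codon are dropped.
    """
    cut = 3 * shift_after_codon
    head = [nt[i:i + 3] for i in range(0, cut, 3)]
    tail_start = cut + 1  # a +1 ribosomal frameshift skips one nucleotide
    tail = [nt[i:i + 3] for i in range(tail_start, len(nt) - 2, 3)]
    return head + tail


if __name__ == "__main__":
    print(frameshift_codons("AAACCCGGGTTTAAA", 2))  # toy sequence, shift after codon 2
    # PA-X lengths implied by the abstract: 191 shared codons plus the X-ORF
    print(191 + 61, 191 + 41)
```

With 191 codons read before the shift, a 61-codon X-ORF gives a 252-residue PA-X and a 41-codon X-ORF gives a 232-residue PA-X.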
Development and characterization of a eukaryotic expression system for human type II procollagen.
Wieczorek, Andrew; Rezaei, Naghmeh; Chan, Clara K; Xu, Chuan; Panwar, Preety; Brömme, Dieter; Merschrod S, Erika F; Forde, Nancy R
2015-12-15
Triple-helical collagens are the most abundant structural proteins in vertebrates and are widely used as biomaterials for a variety of applications, including drug delivery and cellular and tissue engineering. In these applications, the mechanics of this hierarchically structured protein play a key role, as does its chemical composition. To facilitate investigation into how collagen gene mutations lead to disease, as well as the rational development of tunable mechanical and chemical properties of this full-length protein, production of recombinantly expressed protein is required. Here, we present a human type II procollagen expression system that produces full-length procollagen, using a previously characterized human fibrosarcoma cell line for production. The system exploits a non-covalently linked fluorescence readout for gene expression to facilitate screening of cell lines. Biochemical and biophysical characterization of the secreted, purified protein is used to demonstrate the proper formation and function of the protein. Assays to demonstrate fidelity include proteolytic digestion, mass spectrometric sequence and posttranslational composition analysis, circular dichroism spectroscopy, single-molecule stretching with optical tweezers, atomic-force microscopy imaging of fibril assembly, and transmission electron microscopy imaging of self-assembled fibrils. Using a mammalian expression system, we produced full-length recombinant human type II procollagen. The integrity of the collagen preparation was verified by various structural and degradation assays. This system provides a platform from which to explore new directions in collagen manipulation.
Cheun-Arom, Thaniwan; Temeeyasen, Gun; Tripipat, Thitima; Kaewprommal, Pavita; Piriyapongsa, Jittima; Sukrong, Suchada; Chongcharoen, Wanchai; Tantituvanont, Angkana; Nilubol, Dachrit
2016-10-01
Porcine epidemic diarrhea virus (PEDV) has continued to cause sporadic outbreaks in Thailand since 2007, and a pandemic variant containing an insertion and a deletion in the spike gene was responsible for these outbreaks. In 2014, further outbreaks of the disease occurred within four months of each other. In this study, the full-length genome sequences of two genetically distinct PEDV isolates from these outbreaks were characterized. The two PEDV isolates, CBR1/2014 and EAS1/2014, were 28,039 and 28,033 nucleotides in length and showed 96.2% and 93.6% similarity at the nucleotide and amino acid levels, respectively. In total, we observed 1048 nucleotide substitutions throughout the genome. Compared to EAS1/2014, CBR1/2014 has two insertions, of four amino acids (56GENQ59) at positions 56-59 and of one amino acid (140N) at position 140, and two deletions, of two amino acids (160DG161) at positions 160-161 and of one amino acid (1199Y) at position 1199. Phylogenetic analysis based on the full-length genome grouped the CBR1/2014 isolate with the pandemic variants. In contrast, the EAS1/2014 isolate grouped with CV777, LZC and SM98, which are classical variants. Our findings demonstrate the emergence of EAS1/2014, a classical variant that is novel to Thailand and genetically distinct from the currently circulating endemic variants. This study warrants further investigation into the molecular epidemiology and genetic evolution of PEDV in Thailand. Copyright © 2016 Elsevier B.V. All rights reserved.
Pasricha, Gunisha; Mishra, Akhilesh C; Chakrabarti, Alok K
2013-07-01
PB1F2 is the 11th protein of influenza A virus, translated from the +1 alternate reading frame of the PB1 gene. Since its discovery, PB1F2 proteins of varying sizes and functions have been reported in influenza A viruses. The selection of the PB1 gene segment in pandemics, together with the variable size and pleiotropic effects of PB1F2, prompted us to analyze the amino acid sequences of this protein in various influenza A viruses. Amino acid sequences of the PB1F2 protein of influenza A H5N1, H1N1, H2N2, and H3N2 subtypes were obtained from the Influenza Research Database. Multiple sequence alignments of the PB1F2 protein sequences of these subtypes were used to determine protein size and variable and conserved domains, and to perform mutational analysis. Analysis showed that 96.4% of the H5N1 influenza viruses harbored full-length PB1F2 protein. All subtypes of the 20th-century pandemic influenza viruses contained full-length PB1F2 protein, whereas the 2009 pandemic H1N1 virus did not. Over the years, the PB1F2 protein of the H1N1 and H3N2 viruses has undergone considerable variation. PB1F2 protein sequences of H5N1 viruses showed both human- and avian host-specific conserved domains. The global database of PB1F2 protein sequences revealed that the N66S mutation was present in only 3.8% of the H5N1 strains. We found a novel mutation, N84S, in the PB1F2 protein of 9.35% of the highly pathogenic avian influenza H5N1 viruses. PB1F2 proteins of varying sizes and with various mutations were identified in different influenza A virus subtypes with pandemic potential. There was genetic divergence of the protein in various hosts, which highlights the host-specific evolution of the virus. However, further studies are required to correlate this sequence variability with virulence and pathogenicity. © 2012 John Wiley & Sons Ltd.
Mammalian cDNA Library from the NIH Mammalian Gene Collection (MGC) | Office of Cancer Genomics
The MGC provides the research community with full-length clones for most of the human and mouse genes defined as of 2006, along with selected clones of cow and rat genes. Clones were designed to allow easy transfer of the ORF sequences into nearly any type of expression vector. MGC provides protein ‘expression-ready’ clones for each of the included human genes. MGC is part of the ORFeome Collaboration (OC).
Digital transcriptome profiling using selective hexamer priming for cDNA synthesis.
Armour, Christopher D; Castle, John C; Chen, Ronghua; Babak, Tomas; Loerch, Patrick; Jackson, Stuart; Shah, Jyoti K; Dey, John; Rohl, Carol A; Johnson, Jason M; Raymond, Christopher K
2009-09-01
We developed a procedure for the preparation of whole transcriptome cDNA libraries depleted of ribosomal RNA from only 1 µg of total RNA. The method relies on a collection of short, computationally selected oligonucleotides, called 'not-so-random' (NSR) primers, to obtain full-length, strand-specific representation of nonribosomal RNA transcripts. In this study we validated the technique by profiling human whole brain and universal human reference RNA using ultra-high-throughput sequencing.
Pecon-Slattery, Jill; Troyer, Jennifer L; Johnson, Warren E; O'Brien, Stephen J
2008-05-15
Genetic analyses of feline immunodeficiency viruses provide significant insights on the worldwide distribution and evolutionary history of this emerging pathogen. Large-scale screening of over 3000 samples from all species of Felidae indicates that at least some individuals from most species possess antibodies that cross-react with FIV. Phylogenetic analyses of genetic variation in the pol-RT gene demonstrate that FIV lineages are species-specific and suggest that there has been a prolonged period of viral-host co-evolution. The clinical effects of FIV specific to species other than domestic cat are controversial. Comparative genomic analyses of all full-length FIV genomes confirmed that FIV is host-specific. The recently sequenced lion subtype E is marginally more similar to Pallas cat FIV, though its env is more similar to that of domestic cat FIV, indicating possible recombination between two divergent strains in the wild. Here we review global patterns of FIV seroprevalence and endemicity, assess genetic differences within and between species-specific FIV strains, and interpret these with patterns of felid speciation to propose an ancestral origin of FIV in Africa followed by interspecies transmission and global dissemination to Eurasia and the Americas. Continued comparative genomic analyses of full-length FIV from all seropositive animals, along with whole genome sequence of host species, will greatly advance our understanding of the role of recombination, selection and adaptation in retroviral emergence.
Pecon-Slattery, Jill; Troyer, Jennifer L.; Johnson, Warren E.; O’Brien, Stephen J.
2008-01-01
Genetic analyses of feline immunodeficiency viruses provide significant insights on the worldwide distribution and evolutionary history of this emerging pathogen. Large-scale screening of over 3000 samples from all species of Felidae indicates that at least some individuals from most species possess antibodies that cross-react with FIV. Phylogenetic analyses of genetic variation in the pol-RT gene demonstrate that FIV lineages are species-specific and suggest that there has been a prolonged period of viral-host co-evolution. The clinical effects of FIV specific to species other than domestic cat are controversial. Comparative genomic analyses of all full-length FIV genomes confirmed that FIV is host-specific. The recently sequenced lion subtype E is marginally more similar to Pallas cat FIV, though its env is more similar to that of domestic cat FIV, indicating possible recombination between two divergent strains in the wild. Here we review global patterns of FIV seroprevalence and endemicity, assess genetic differences within and between species-specific FIV strains, and interpret these with patterns of felid speciation to propose an ancestral origin of FIV in Africa followed by interspecies transmission and global dissemination to Eurasia and the Americas. Continued comparative genomic analyses of full-length FIV from all seropositive animals, along with whole genome sequence of host species, will greatly advance our understanding of the role of recombination, selection and adaptation in retroviral emergence. PMID:18359092
Hout, David R; Schweitzer, Brock L; Lawrence, Kasey; Morris, Stephan W; Tucker, Tracy; Mazzola, Rosetta; Skelton, Rachel; McMahon, Frank; Handshoe, John; Lesperance, Mary; Karsan, Aly; Saltman, David L
2017-08-01
Patients with lung cancers harboring an activating anaplastic lymphoma kinase (ALK) rearrangement respond favorably to ALK inhibitor therapy. Fluorescence in situ hybridization (FISH) and immunohistochemistry (IHC) are validated and widely used screening tests for ALK rearrangements, but both methods have limitations. The ALK RGQ RT-PCR Kit (RT-PCR) is a single-tube quantitative real-time PCR assay for high-throughput, automated interpretation of ALK expression. In this study, we directly compared all three ALK detection methods on formalin-fixed paraffin-embedded (FFPE) lung cancer specimens. The RT-PCR test (diagnostic cut-off ΔCt ≤ 8) was highly sensitive (100%) when compared to FISH and IHC. In discordant cases, in which ALK expression was detected by the ALK RT-PCR test but FISH and IHC were negative, sequencing of RNA detected full-length ALK transcripts or EML4-ALK and KIF5B-ALK fusion variants. The overall specificity of the RT-PCR test for the detection of ALK in cases without full-length ALK expression was 94% in comparison to FISH and sequencing. These data support the ALK RT-PCR test as a highly efficient and reliable diagnostic screening approach to identify patients with non-small cell lung cancer whose tumors are driven by oncogenic ALK.
Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang
2017-03-09
In order to characterize the female and male transcriptomes of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 protein-coding genes were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 genes with at least two isoforms were identified genome-wide from the female and male transcriptome databases, respectively. We found that the number of isoforms and their alternative splicing patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome databases and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone.
Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang
2017-01-01
In order to characterize the female and male transcriptomes of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 protein-coding genes were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 genes with at least two isoforms were identified genome-wide from the female and male transcriptome databases, respectively. We found that the number of isoforms and their alternative splicing patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome databases and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone. PMID:28282934
Hout, David R.; Lawrence, Kasey; Morris, Stephan W.; Tucker, Tracy; Mazzola, Rosetta; Skelton, Rachel; McMahon, Frank; Handshoe, John; Lesperance, Mary; Karsan, Aly
2017-01-01
Patients with lung cancers harboring an activating anaplastic lymphoma kinase (ALK) rearrangement respond favorably to ALK inhibitor therapy. Fluorescence in situ hybridization (FISH) and immunohistochemistry (IHC) are validated and widely used screening tests for ALK rearrangements, but both methods have limitations. The ALK RGQ RT-PCR Kit (RT-PCR) is a single-tube quantitative real-time PCR assay for high-throughput, automated interpretation of ALK expression. In this study, we directly compared all three ALK detection methods on formalin-fixed paraffin-embedded (FFPE) lung cancer specimens. The RT-PCR test (diagnostic cut-off ΔCt ≤ 8) was highly sensitive (100%) when compared to FISH and IHC. In discordant cases, in which ALK expression was detected by the ALK RT-PCR test but FISH and IHC were negative, sequencing of RNA detected full-length ALK transcripts or EML4-ALK and KIF5B-ALK fusion variants. The overall specificity of the RT-PCR test for the detection of ALK in cases without full-length ALK expression was 94% in comparison to FISH and sequencing. These data support the ALK RT-PCR test as a highly efficient and reliable diagnostic screening approach to identify patients with non-small cell lung cancer whose tumors are driven by oncogenic ALK. PMID:28763012
Yang, Xian-Xian; Zhang, Mei; Yan, Zhao-Wen; Zhang, Ru-Hong; Mu, Xiong-Zheng
2008-01-01
To construct a highly efficient eukaryotic expression plasmid, pcDNA3.1-MSX-2, encoding the Sprague-Dawley (SD) rat MSX-2 gene for further study of MSX-2 gene function. The full-length SD rat MSX-2 gene was amplified by PCR and inserted into the pMD18-T vector. It was then excised by restriction enzyme digestion with BamHI and XhoI and ligated into the cloning site of the pcDNA3.1 expression plasmid. The positive recombinant was identified by PCR analysis, restriction endonuclease analysis and sequence analysis. Expression of RNA and protein was detected by RT-PCR and Western blot analysis in HEK293 cells transfected with pcDNA3.1-MSX-2. Sequence analysis and restriction endonuclease analysis of pcDNA3.1-MSX-2 demonstrated that the position and size of the MSX-2 cDNA insert were consistent with the design. RT-PCR and Western blot analysis showed specific expression of MSX-2 mRNA and protein in the transfected HEK293 cells. The highly efficient eukaryotic expression plasmid pcDNA3.1-MSX-2, encoding the Sprague-Dawley rat MSX-2 gene, which is related to craniofacial development, was successfully constructed. It may serve as a basis for further study of MSX-2 gene function.
Lorsirigool, Athip; Saeng-Chuto, Kepalee; Madapong, Adthakorn; Temeeyasen, Gun; Tripipat, Thitima; Kaewprommal, Pavita; Tantituvanont, Angkana; Piriyapongsa, Jittima; Nilubol, Dachrit
2017-04-01
Porcine deltacoronavirus (PDCoV) was identified in intestinal samples collected from piglets with diarrhea in Thailand in 2015. Two Thai PDCoV isolates, P23_15_TT_1115 and P24_15_NT1_1215, were isolated and identified. The full-length genome sequences of the P23_15_TT_1115 and P24_15_NT1_1215 isolates were 25,404 and 25,407 nucleotides in length, respectively, which is shorter than those of US and Chinese PDCoV. Phylogenetic analysis based on the full-length genome demonstrated that the Thai PDCoV isolates form a new cluster separate from US and Chinese PDCoV, but are more closely related to Chinese PDCoV than to US isolates. Genetic analyses demonstrated that Thai PDCoVs share 97.0-97.8% and 92.2-94.0% similarity with Chinese PDCoV at the nucleotide and amino acid levels, respectively, and 97.1-97.3% and 92.5-93.0% similarity with US PDCoV at the nucleotide and amino acid levels, respectively. Thai PDCoV possesses two discontinuous deletions of five amino acids in the ORF1a/b region. One additional single-amino-acid deletion was identified in P23_15_TT_1115. Variation analyses demonstrated that six regions (nt 1317-1436, 2997-3096, 19,737-19,836, 20,277-20,376, 21,177-21,276, and 22,371-22,416) in the ORF1a/b and spike genes exhibit high sequence variation between Thai and other PDCoV. Analyses of amino acid changes suggested that they could potentially be from different lineages.
Zhang, L J; Dong, W X; Guo, S M; Wang, Y X; Wang, A D; Lu, X J
2015-11-19
This study aims to explore the roles of somatic embryogenesis receptor-like kinase (SERK) in Malus hupehensis (Pingyi Tiancha). The full-length sequences of SERK1 in triploid Pingyi Tiancha (3n) and a tetraploid hybrid strain 33# (4n) were cloned, sequenced, and designated MhSERK1 and MhdSERK1, respectively. Multiple alignments of amino acid sequences were conducted to assess the similarity of MhSERK1 and MhdSERK1 to SERK sequences in other species, and a neighbor-joining phylogenetic tree was constructed to elucidate their phylogenetic relationships. Expression levels of MhSERK1 and MhdSERK1 in different tissues and developmental stages were investigated using quantitative real-time PCR. The coding sequence lengths of MhSERK1 and MhdSERK1 were 1899 bp (encoding 632 amino acids) and 1881 bp (encoding 626 amino acids), respectively. Sequence analysis demonstrated that MhSERK1 and MhdSERK1 display high similarity to SERKs in other species, with a conserved intron/exon structure that is unique to members of the SERK family. Additionally, the phylogenetic tree showed that MhSERK1 and MhdSERK1 clustered with orange CitSERK (93%). Furthermore, MhSERK1 and MhdSERK1 were mainly expressed in the reproductive organs, in particular the ovary. Their expression levels were highest in young flowers and varied among tissues and organs. Our results suggest that MhSERK1 and MhdSERK1 are related to plant reproduction, and that MhSERK1 is related to apomixis in triploid Pingyi Tiancha.
Russo, Alice G; Eden, John-Sebastian; Enosi Tuipulotu, Daniel; Shi, Mang; Selechnik, Daniel; Shine, Richard; Rollins, Lee Ann; Holmes, Edward C; White, Peter A
2018-06-13
Cane toads are a notorious invasive species, inhabiting over 1.2 million km² of Australia and threatening native biodiversity. Release of pathogenic cane toad viruses is one possible biocontrol strategy, yet it is currently hindered by the poorly described cane toad virome. Metatranscriptomic analysis of 16 cane toad livers revealed the presence of a novel, full-length picornavirus, Rhimavirus A (RhiV-A), a member of a reptile- and amphibian-specific cluster of the Picornaviridae basal to the Kobuvirus-like group. In the combined liver transcriptome, we also identified a complete genome sequence of a distinct epsilonretrovirus, R. marina endogenous retrovirus (RMERV). The recently sequenced cane toad genome contains eight complete RMERV proviruses, as well as 21 additional truncated insertions. The oldest full-length RMERV provirus was estimated to have inserted 1.9 MYA. To screen for these viral sequences in additional toads, we analysed publicly available transcriptomes from six diverse Australian locations. RhiV-A transcripts were identified in toads sampled from three locations across 1,000 km of Australia, stretching to the current Western Australia (WA) invasion front, whilst RMERV transcripts were observed at all six sites. Lastly, we scanned the cane toad genome for non-retroviral endogenous viral elements, finding three sequences related to small DNA viruses in the family Circoviridae. This shows ancestral circoviral infection with subsequent genomic integration. The identification of these current and past viral infections enriches our knowledge of the cane toad virome, an understanding of which will facilitate future work on infection and disease in this important invasive species. Importance Cane toads are poisonous amphibians which were introduced to Australia in 1935 for insect control. Since then, their population has increased dramatically, and they now threaten many native Australian species.
One potential method to control the population is to release a cane toad virus with high mortality, yet few cane toad viruses have been characterised. This study samples cane toads from different Australian locations and uses an RNA sequencing and computational approach to find new viruses. We report novel complete picornavirus and retrovirus sequences which were genetically similar to viruses infecting frogs, reptiles and fish. Using data generated in other studies, we show that these viral sequences are present in cane toads from distinct Australian locations. Three sequences related to circoviruses were also found in the toad genome. The identification of new viral sequences will aid future studies which investigate their prevalence and potential as agents for biocontrol. Copyright © 2018 American Society for Microbiology.
Weiss, Eric R; Lamers, Susanna L; Henderson, Jennifer L; Melnikov, Alexandre; Somasundaran, Mohan; Garber, Manuel; Selin, Liisa; Nusbaum, Chad; Luzuriaga, Katherine
2018-01-15
Over 90% of the world's population is persistently infected with Epstein-Barr virus. While EBV does not cause disease in most individuals, it is the common cause of acute infectious mononucleosis (AIM) and has been associated with several cancers and autoimmune diseases, highlighting a need for a preventive vaccine. At present, very few primary, circulating EBV genomes have been sequenced directly from infected individuals. While low levels of diversity and low viral evolution rates have been predicted for double-stranded DNA (dsDNA) viruses, recent studies have demonstrated appreciable diversity in common dsDNA pathogens (e.g., cytomegalovirus). Here, we report 40 full-length EBV genome sequences obtained from matched oral wash and B cell fractions from a cohort of 10 AIM patients. Both intra- and interpatient diversity were observed across the length of the entire viral genome. Diversity was most pronounced in viral genes required for establishing latent infection and persistence, with appreciable levels of diversity also detected in structural genes, including envelope glycoproteins. Interestingly, intrapatient diversity declined significantly over time (P < 0.01), and this was particularly evident on comparison of viral genomes sequenced from B cell fractions in early primary infection and convalescence (P < 0.001). B cell-associated viral genomes were observed to converge, becoming nearly identical to the B95.8 reference genome over time (Spearman rank-order correlation test; r = -0.5589, P = 0.0264). The reduction in diversity was most marked in the EBV latency genes. In summary, our data suggest independent convergence of diverse viral genome sequences toward a reference-like strain within a relatively short period following primary EBV infection. IMPORTANCE Identification of viral proteins with low variability and high immunogenicity is important for the development of a protective vaccine.
Knowledge of genome diversity within circulating viral populations is a key step in this process, as is the expansion of intrahost genomic variation during infection. We report full-length EBV genomes sequenced from the blood and oral wash of 10 individuals early in primary infection and during convalescence. Our data demonstrate considerable diversity within the pool of circulating EBV strains, as well as within individual patients. Overall viral diversity decreased from early to persistent infection, particularly in latently infected B cells, which serve as the viral reservoir. Reduction in B cell-associated viral genome diversity coincided with a convergence toward a reference-like EBV genotype. Greater convergence positively correlated with time after infection, suggesting that the reference-like genome is the result of selection. Copyright © 2018 American Society for Microbiology.
Yun, Ki Wook; Choi, Eun Hwa; Lee, Hoan Jong
2017-01-01
Pneumococcal surface protein A (PspA) is an important virulence factor of pneumococci and has been investigated as a primary component of a capsular serotype-independent pneumococcal vaccine. Thus, we sought to determine the genetic diversity of PspA to explore its potential as a vaccine candidate. Among the 190 invasive pneumococcal isolates collected from Korean children between 1991 and 2016, two (1.1%) isolates were found to have no pspA by multiple polymerase chain reactions. The full-length pspA genes from 185 pneumococcal isolates were sequenced. The length of pspA varied, ranging from 1,719 to 2,301 base pairs with 55.7-100% nucleotide identity. Based on the sequences of the clade-defining regions, 68.7% and 49.7% were in PspA family 2 and clade 3/family 2, respectively. PspA clade types were correlated with genotypes determined by multilocus sequence typing and divided into several subclades based on diversity analysis of the N-terminal α-helical regions, which showed nucleotide sequence identities of 45.7-100% and amino acid sequence identities of 23.1-100%. Putative antigenicity plots were also diverse among individual clades and subclades. The differences in antigenicity patterns were concentrated within the N-terminal 120 amino acids. In conclusion, the N-terminal α-helical domain, which is known to be the major immunogenic portion of PspA, is genetically variable and should be further evaluated for antigenic differences and cross-reactivity between various PspA types from pneumococcal isolates.
Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.
1996-01-01
The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrinchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences, each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
Molecular cloning and characterization of Hymenolepis diminuta alpha-tubulin gene.
Mohajer-Maghari, Behrokh; Amini-Bavil-Olyaee, Samad; Webb, Rodney A; Coe, Imogen R
2007-02-01
To isolate a full-length alpha-tubulin cDNA from a eucestode, Hymenolepis diminuta, a lambda phage cDNA library was constructed. The alpha-tubulin gene was cloned, sequenced and characterized. The H. diminuta alpha-tubulin consisted of 450 amino acids. This protein contained putative sites for all posttranslational modifications, such as detyrosination/tyrosination at the carboxyl terminus of the protein, phosphorylation at residues R79 and K336, glycylation/glutamylation at residue G445 and acetylation at residue K40. Comparison of H. diminuta alpha-tubulin with all full-length alpha-tubulin proteins revealed that H. diminuta alpha-tubulin possesses 10 distinctive residues that are not found in any other alpha-tubulin. Phylogenetic analysis placed H. diminuta alpha-tubulin in a separate branch adjacent to the eucestode and trematode branch, with a 92% bootstrap value (1000 replicates). In conclusion, this is the first report of H. diminuta cDNA library construction and of the cloning and characterization of the H. diminuta alpha-tubulin gene.
Experimental annotation of the human genome using microarray technology.
Shoemaker, D D; Schadt, E E; Armour, C D; He, Y D; Garrett-Engele, P; McDonagh, P D; Loerch, P M; Leonardson, A; Lum, P Y; Cavet, G; Wu, L F; Altschuler, S J; Edwards, S; King, J; Tsang, J S; Schimmack, G; Schelter, J M; Koch, J; Ziman, M; Marton, M J; Li, B; Cundiff, P; Ward, T; Castle, J; Krolewski, M; Meyer, M R; Mao, M; Burchard, J; Kidd, M J; Dai, H; Phillips, J W; Linsley, P S; Stoughton, R; Scherer, S; Boguski, M S
2001-02-15
The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using 'exon' and 'tiling' arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu
2011-09-07
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
New Powerful Statistics for Alignment-free Sequence Comparison Under a Pattern Transfer Model
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S.; Sun, Fengzhu
2011-01-01
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2∗ and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2∗ and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. PMID:21723298
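For reference, the base statistic D2 counts matching k-word pairs: with X_w and Y_w the numbers of occurrences of word w in the two sequences, D2 = Σ_w X_w·Y_w. The sketch below computes only this global D2; the standardized variants and the local-pair statistics proposed in the paper are not reproduced, and the word length k is a free parameter chosen by the user.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-word in `seq`."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k):
    """D2 = sum over k-words w of X_w * Y_w (number of matching k-word pairs)."""
    xa, yb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for absent words, so unshared words contribute nothing.
    return sum(n * yb[w] for w, n in xa.items())


print(d2("ACGTACGT", "TACG", 3))  # 3
```

Here "ACGTACGT" contains ACG twice and TAC once, and "TACG" contains each once, giving 2·1 + 1·1 = 3.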
Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach
NASA Astrophysics Data System (ADS)
Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.
2012-10-01
In spite of the medical advances of recent years, the world still needs new resources to address certain health issues, and Ribosome Inactivating Proteins (RIPs) have been found to be one of them. To provide easy access to information on RIPs, RIPs need to be analysed with a view to constructing a dedicated database. In addition, multiple sequence alignment was performed to screen RIPs from easily available sources for homologues of significant RIPs from rare sources, in terms of sequence similarity. Protein sequences were retrieved from SWISS-PROT and analysed using pairwise and multiple sequence alignment. The analysis shows that 151 RIPs have been characterized to date. Among them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. Sequence-length information for the various RIPs, including whether full or partial sequences are available, was also compiled. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicated the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of selected RIPs in two groups, Group I and Group II, were carried out, and the consensus levels were found to be 98%, 98% and 90%, respectively.
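Counts such as "20 conserved residues" can be read off an existing alignment. A minimal sketch, assuming the alignment is already computed (gaps written as "-") and that "conserved" means a column where every sequence carries the same non-gap residue; the identity helper ignores any column with a gap, which is one common convention among several. These rules are assumptions and need not match how Multalin reports consensus levels.

```python
def conserved_columns(alignment):
    """0-based indices of columns where all sequences share one non-gap residue."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i for i in range(length)
        if alignment[0][i] != "-" and all(seq[i] == alignment[0][i] for seq in alignment)
    ]


def percent_identity(a, b):
    """Identity over columns where neither sequence has a gap (one common convention)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


msa = ["MKV-L", "MRV-L", "MKVAL"]  # hypothetical toy alignment
print(conserved_columns(msa))              # [0, 2, 4]
print(percent_identity("MKV-L", "MRVAL"))  # 75.0
```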
probeBase—an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016
Greuter, Daniel; Loy, Alexander; Horn, Matthias; Rattei, Thomas
2016-01-01
probeBase http://www.probebase.net is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Here we present a major update of probeBase, which was last featured in the NAR Database Issue 2007. This update describes a complete remodeling of the database architecture and environment to accommodate computationally efficient access. Improved search functions, sequence match tools and data output now extend the opportunities for finding suitable hierarchical probe sets that target an organism or taxon at different taxonomic levels. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase. PMID:26586809
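In silico hybridization starts from a simple fact: a probe written 5'→3' pairs with the reverse complement of itself on the rRNA. A minimal sketch that scans a target sequence for such sites while allowing a set number of mismatches. The example uses the widely used bacterial probe EUB338 against a short synthetic fragment containing its target site, not a real rRNA record. probeBase's own evaluation tools weigh mismatches in more detail than this raw count; the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_dna(seq):
    """Upper-case and write uracil as thymine so RNA and DNA compare directly."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    return to_dna(seq).translate(_COMPLEMENT)[::-1]


def probe_target_sites(probe, target, max_mismatches=0):
    """Return (0-based start, mismatches) for every site the probe could pair with."""
    site = reverse_complement(probe)  # the target-site sequence, 5'->3'
    t = to_dna(target)
    hits = []
    for start in range(len(t) - len(site) + 1):
        mismatches = sum(p != q for p, q in zip(site, t[start:start + len(site)]))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


eub338 = "GCTGCCTCCCGTAGGAGT"
rrna_fragment = "GGG" + "ACUCCUACGGGAGGCAGC" + "UUU"  # synthetic test fragment
print(probe_target_sites(eub338, rrna_fragment))  # [(3, 0)]
```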
Chakraborty, Mohua; Dhar, Bishal; Ghosh, Sankar Kumar
2017-11-01
DNA barcodes are generally interpreted using distance-based or character-based methods. The former cluster comparable groups based on relative genetic distance, while the latter rely on the presence or absence of discrete nucleotide substitutions. The distance-based approach is limited in defining a universal species boundary across taxa, because the rate of mtDNA evolution is not constant across taxa. The character-based approach defines this boundary more accurately using a unique set of nucleotide characters. However, character-based analysis of the full-length barcode has some inherent limitations, such as the need to sequence the full-length barcode, the use of a sparse data matrix and the lack of a uniform diagnostic position for each group. A short, continuous fragment can be used to resolve these limitations. Here, we observe that a 154-bp fragment from the transversion-rich domain of 1367 COI barcode sequences can successfully delimit species in the three most diverse orders of freshwater fishes. This fragment is used to design species-specific barcode motifs for 109 species by the character-based method, and these motifs successfully identify the correct species using a pattern-matching program. The motifs also correctly identify a geographically isolated population of the Cypriniformes species. Further, this region is validated as a species-specific mini-barcode for freshwater fishes by successful PCR amplification and sequencing of the motif (154 bp) using the designed primers. We anticipate that the use of such motifs will enhance the diagnostic power of DNA barcodes, and that the mini-barcode approach will greatly benefit field-based rapid species identification. © 2017 John Wiley & Sons Ltd.
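Character-based identification reduces to checking a query against each species' diagnostic characters. A minimal sketch, assuming query fragments are already aligned to the same length as the motifs and that non-diagnostic positions in a motif are written as "N". The motif format, the 8-bp toy motifs and the function names are hypothetical; this is not the authors' pattern-matching program.

```python
def matches_motif(fragment, motif, wildcard="N"):
    """True if the aligned fragment agrees with the motif at every diagnostic position."""
    if len(fragment) != len(motif):
        return False
    return all(m == wildcard or m == f for f, m in zip(fragment.upper(), motif.upper()))


def identify_species(fragment, motifs):
    """Return every species whose motif matches (ideally exactly one)."""
    return [species for species, motif in motifs.items() if matches_motif(fragment, motif)]


# Hypothetical 8-bp motifs; real motifs would span the 154-bp fragment.
motifs = {"species_1": "ANNTGNNC", "species_2": "GNNTGNNA"}
print(identify_species("ACGTGTAC", motifs))  # ['species_1']
```

Returning a list rather than a single name makes ambiguous matches (two species) and failures (none) visible instead of silently picking one.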
Grossmann, Sebastian; Nowak, Piotr; Neogi, Ujjwal
2015-01-01
HIV-1 near full-length genome (HIV-NFLG) sequencing from plasma is an attractive multidimensional tool for large-scale population-based molecular epidemiological studies. It also enables genotypic resistance testing (GRT) for all drug target sites, allowing effective intervention strategies for control and prevention in high-risk population groups. Thus, the main objective of this study was to develop a simplified, subtype-independent, cost- and labour-efficient HIV-NFLG protocol that can be used in clinical management as well as in molecular epidemiological studies. Plasma samples (n=30) were obtained from HIV-1B (n=10), HIV-1C (n=10), CRF01_AE (n=5) and CRF01_AG (n=5) infected individuals with a minimum viral load of >1120 copies/ml. Amplification was performed with two large amplicons of 5.5 kb and 3.7 kb, which were sequenced with 17 primers to obtain the HIV-NFLG. GRT was validated against the ViroSeq™ HIV-1 Genotyping System. After excluding four plasma samples with low-quality RNA, a total of 26 samples were attempted. Among them, NFLG was obtained from 24 (92%) samples, the lowest viral load being 3000 copies/ml. High (>99%) concordance was observed between HIV-NFLG and ViroSeq™ in determining drug resistance mutations (DRMs). The N384I connection mutation was additionally detected by NFLG in two samples. Our high-efficiency, subtype-independent HIV-NFLG protocol is a simple and promising approach for large-scale molecular epidemiological studies. It will facilitate understanding of the population dynamics of the HIV-1 pandemic and help outline effective intervention strategies. Furthermore, it could potentially be applied in the clinical management of drug resistance by evaluating DRMs against all available antiretrovirals in a single assay.
Inada, Mari; Kihara, Keisuke; Kono, Tomoya; Sudhakaran, Raja; Mekata, Tohru; Sakai, Masahiro; Yoshida, Terutoyo; Itami, Toshiaki
2013-02-01
In many physiological processes, including the innate immune system, free radicals such as nitric oxide (NO) and reactive oxygen species (ROS) play significant roles. In humans, two dual oxidase (Duox) homologs generate hydrogen peroxide (H2O2), a type of ROS. Here, we report the identification and characterization of a Duox from kuruma shrimp, Marsupenaeus japonicus. The full-length cDNA sequence of the M. japonicus dual oxidase (MjDuox) gene contains 4695 bp and was obtained using reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE). The open reading frame of MjDuox encodes a protein of 1498 amino acids with an estimated mass of 173 kDa. In a homology analysis using amino acid sequences, MjDuox exhibited 69.3% sequence homology with the Duox of the red flour beetle, Tribolium castaneum. Transcriptional analysis revealed that MjDuox mRNA is highly expressed in the gills of healthy kuruma shrimp. In the gills, MjDuox expression peaked 60 h after injection with white spot syndrome virus (WSSV) and returned to its normal level at 72 h. In knockdown experiments targeting free radical-generating enzymes, survival rates decreased during the early stages of WSSV infection following knockdown of the NADPH oxidase (MjNox) or MjDuox gene. In the present study, the identification, cloning and gene knockdown of the kuruma shrimp MjDuox are reported. Duoxes have been identified in vertebrates and some insects; however, few reports have investigated Duoxes in crustaceans. This study is the first to identify and clone a dual oxidase from a crustacean species. Copyright © 2012 Elsevier Ltd. All rights reserved.
Augustin, A; Muller-Steffner, H; Schuber, F
2000-01-01
Bovine spleen ecto-NAD(+) glycohydrolase, an archetypal member of the mammalian membrane-associated NAD(P)(+) glycohydrolase enzyme family (EC 3.2.2.6), displays catalytic features similar to those of CD38, i.e. a protein originally described as a lymphocyte differentiation marker involved in the metabolism of cyclic ADP-ribose and signal transduction. Using amino acid sequence information obtained from NAD(+) glycohydrolase and from a truncated and hydrosoluble form of the enzyme (hNADase) purified to homogeneity, a full-length cDNA clone was obtained. The deduced sequence indicates a protein of 278 residues with a molecular mass of 31.5 kDa. It predicts that bovine ecto-NAD(+) glycohydrolase is a type II transmembrane protein, with a very short intracellular tail. The bulk of the enzyme, which is extracellular and contains two potential N-glycosylation sites, yields the fully catalytically active hNADase which is truncated by 71 residues. Transfection of HeLa cells with the full-length cDNA resulted in the expression of the expected NAD(+) glycohydrolase, ADP-ribosyl cyclase and GDP-ribosyl cyclase activities at the surface of the cells. The bovine enzyme, which is the first 'classical' NAD(P)(+) glycohydrolase whose structure has been established, presents a particularly high sequence identity with CD38, including the presence of 10 strictly conserved cysteine residues in the ectodomain and putative catalytic residues. However, it lacks two otherwise conserved cysteine residues near its C-terminus. Thus hNADase, the truncated protein of 207 amino acids, represents the smallest functional domain endowed with all the catalytic activities of CD38/NAD(+) glycohydrolases so far identified. Altogether, our data strongly suggest that the cloned bovine spleen ecto-NAD(+) glycohydrolase is the bovine equivalent of CD38. PMID:10600637
Trémeaux, P; Caporossi, A; Ramière, C; Santoni, E; Tarbouriech, N; Thélu, M-A; Fusillier, K; Geneletti, L; François, O; Leroy, V; Burmeister, W P; André, P; Morand, P; Larrat, S
2016-05-01
Directly acting antiviral drugs have contributed considerable progress to hepatitis C virus (HCV) treatment, but they show variable activity depending on virus genotypes and subtypes. Therefore, accurate genotyping including recombinant form detection is still of major importance, as is the detection of resistance-associated mutations in case of therapeutic failure. To meet these goals, an approach to amplify the HCV near-complete genome with a single long-range PCR and sequence it with Roche GS Junior was developed. After optimization, the overall amplification success rate was 73% for usual genotypes (i.e. HCV 1a, 1b, 3a and 4a, 16/22) and 45% for recombinant forms RF_2k/1b (5/11). After pyrosequencing and subsequent de novo assembly, a near-full-length genomic consensus sequence was obtained for 19 of 21 samples. The genotype and subtype were confirmed by phylogenetic analysis for every sample, including the suspected recombinant forms. Resistance-associated mutations were detected in seven of 13 samples at baseline, in the NS3 (n = 3) or NS5A (n = 4) region. Of these samples, the treatment of one patient included daclatasvir, and that patient experienced a relapse. Virus sequences from pre- and posttreatment samples of four patients who experienced relapse after sofosbuvir-based therapy were compared: the selected variants seem too far from the NS5B catalytic site to be held responsible. Although tested on a limited set of samples and with technical improvements still necessary, this assay has proven to be successful for both genotyping and resistance-associated variant detection on several HCV types. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing
2015-01-01
Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, that regulate trafficking of full-length PC1 to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2) and on the formation of a PC1–PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence, Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations that truncate its C-terminal tail affect PC1 ciliary trafficking, whereas mutations that change the VxP sequence to AxA or impair the channel pore (yielding a dead channel) do not. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and further propose that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. PMID:26430213
Su, Xuefeng; Wu, Maoqing; Yao, Gang; El-Jouni, Wassim; Luo, Chong; Tabari, Azadeh; Zhou, Jing
2015-11-15
Failure to localize membrane proteins to the primary cilium causes a group of diseases collectively named ciliopathies. Polycystin-1 (PC1, also known as PKD1) is a large ciliary membrane protein defective in autosomal dominant polycystic kidney disease (ADPKD). Here, we developed a large set of PC1 expression constructs and identified multiple sequences, including a coiled-coil motif in the C-terminal tail of PC1, that regulate trafficking of full-length PC1 to the primary cilium. Ciliary trafficking of wild-type and mutant PC1 depends on the dose of polycystin-2 (PC2, also known as PKD2) and on the formation of a PC1-PC2 complex. Modulation of the ciliary trafficking module mediated by the VxP ciliary-targeting sequence, Arf4 and Asap1 does not affect the ciliary localization of full-length PC1. PC1 also promotes PC2 ciliary trafficking. PC2 mutations that truncate its C-terminal tail affect PC1 ciliary trafficking, whereas mutations that change the VxP sequence to AxA or impair the channel pore (yielding a dead channel) do not. Cleavage at the GPCR proteolytic site (GPS) of PC1 is not required for PC1 trafficking to cilia. We propose a mutually dependent model for the ciliary trafficking of PC1 and PC2, and further propose that PC1 ciliary trafficking is regulated by multiple cis-acting elements. As all pathogenic PC1 mutations tested here are defective in ciliary trafficking, ciliary trafficking might serve as a functional read-out for ADPKD. © 2015. Published by The Company of Biologists Ltd.
Ogi, Miki; Yano, Yoshihiko; Chikahira, Masatsugu; Takai, Denshi; Oshibe, Tomohiro; Arashiro, Takeshi; Hanaoka, Nozomu; Fujimoto, Tsuguto; Hayashi, Yoshitake
2017-08-01
Coxsackievirus A6 (CV-A6) is an enterovirus known to cause herpangina. However, since 2009 it has frequently been isolated from children with hand, foot, and mouth disease (HFMD). In Japan, CV-A6 has been linked to HFMD outbreaks in 2011 and 2013. In this study, full-length genome sequences of CV-A6 strains were analyzed to identify associations with clinical manifestations. A total of 5612 children (0-17 years old) with suspected enterovirus infection between 1999 and 2013 in Hyogo Prefecture, Japan, were enrolled. Enterovirus infection was confirmed by reverse transcriptase-PCR in 753 children (791 samples), 127 of whom (133 samples) were positive for CV-A6 based on direct sequencing of the VP4 region. The complete genomes of CV-A6 from 22 positive patients with different clinical manifestations were investigated. Phylogenetic analysis based on the VP1 region divided these 22 strains into two clusters: cluster I contained strains collected in 1999-2009, mostly related to herpangina, and cluster II contained strains collected in 2011-2013, related to the HFMD outbreaks. Based on full-length polyprotein analysis, the amino acid similarity between the strains in clusters I and II was 97.7 ± 0.28%. Amino acid differences were detected at 17 positions within the polyprotein. Strains collected in 1999-2009 and those collected in 2011-2013 also clustered separately in phylogenetic analyses based on the 5'UTR and 3Dpol regions, as well as the VP1 region. In conclusion, HFMD outbreaks caused by CV-A6 have recently been frequent in Japan, and the accumulation of genomic changes might be associated with the clinical course. © 2017 Wiley Periodicals, Inc.
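The abstract reports a between-cluster figure of 97.7 ± 0.28% from full-length polyprotein comparison, plus 17 differing positions. One way such figures can be computed, assuming the polyproteins are already aligned to equal length and using percent identity over all cross-cluster pairs (mean ± sample SD). "Cluster-specific position" is defined here, as an assumption, as a position where the two clusters share no residue at all; the paper may define its 17 positions differently, and the toy sequences are hypothetical.

```python
from itertools import product
from statistics import mean, stdev


def identity(a, b):
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def between_cluster_identity(cluster1, cluster2):
    """Mean and sample SD of identity over all cross-cluster sequence pairs."""
    values = [identity(a, b) for a, b in product(cluster1, cluster2)]
    sd = stdev(values) if len(values) > 1 else 0.0
    return mean(values), sd


def cluster_specific_positions(cluster1, cluster2):
    """1-based positions where the two clusters share no residue at all."""
    return [
        i + 1
        for i in range(len(cluster1[0]))
        if not {s[i] for s in cluster1} & {s[i] for s in cluster2}
    ]


c1 = ["MKVL", "MKVL"]  # hypothetical toy polyproteins
c2 = ["MRVL", "MRVI"]
print(between_cluster_identity(c1, c2))    # (62.5, 14.43...)
print(cluster_specific_positions(c1, c2))  # [2]
```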
Memory for tonal pitches: a music-length effect hypothesis.
Akiva-Kabiri, Lilach; Vecchi, Tomaso; Granot, Roni; Basso, Demis; Schön, Daniele
2009-07-01
One of the most studied effects of verbal working memory (WM) is the influence of the length of the words that compose the list to be remembered. This work aims to investigate the nature of musical WM by replicating the word length effect in the musical domain. Length and rate of presentation were manipulated in a recognition task of tone sequences. Results showed significant effects for both factors (length and presentation rate) as well as their interaction, suggesting the existence of different strategies (e.g., chunking and rehearsal) for the immediate memory of musical information, depending upon the length of the sequences.
On the normalization of the minimum free energy of RNAs by sequence length.
Trotta, Edoardo
2014-01-01
The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size. PMID:25405875
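Why dividing by length can introduce a bias: if MFE grows roughly linearly with length L as MFE ≈ a + b·L with a nonzero intercept a, then the simple index MFE/L = b + a/L changes hyperbolically with L even when the per-nucleotide slope b is constant. The sketch below fits such a line to synthetic data and contrasts the naive index with an intercept-subtracted one, (MFE − a)/L. The intercept subtraction is an illustrative assumption, not necessarily the corrected formula proposed in the paper, and the MFE values are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def naive_density(mfe, length):
    """Simple index: MFE divided by the number of nucleotides."""
    return mfe / length


def intercept_corrected_density(mfe, length, intercept):
    """Illustrative correction: remove the fitted intercept before dividing."""
    return (mfe - intercept) / length


# Synthetic MFEs (kcal/mol) that follow MFE = 5 - 0.3 * L exactly.
lengths = [50, 100, 200, 400]
mfes = [5 - 0.3 * L for L in lengths]
a, b = fit_line(lengths, mfes)
print([round(naive_density(m, L), 4) for m, L in zip(mfes, lengths)])
# [-0.2, -0.25, -0.275, -0.2875]  (drifts with length)
print([round(intercept_corrected_density(m, L, a), 4) for m, L in zip(mfes, lengths)])
# [-0.3, -0.3, -0.3, -0.3]  (length-independent)
```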
A note on chaotic unimodal maps and applications.
Zhou, C T; He, X T; Yu, M Y; Chew, L Y; Wang, X G
2006-09-01
Based on the word-lift technique of the symbolic dynamics of one-dimensional unimodal maps, we investigate the relation between chaotic kneading sequences and linear maximum-length shift-register sequences. We give theoretical and numerical evidence that the set of maximum-length shift-register sequences is a subset of the set of universal sequences of one-dimensional chaotic unimodal maps. By stabilizing unstable periodic orbits on superstable periodic orbits, we also develop techniques to control the generation of long binary sequences.
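For readers unfamiliar with the second object in this comparison: a linear maximum-length shift-register sequence (m-sequence) is produced by an n-stage linear feedback shift register over GF(2) whose feedback polynomial is primitive, giving period 2^n − 1. A minimal sketch using the textbook primitive polynomial x^4 + x^3 + 1 (period 15); the polynomial and seed are arbitrary choices, and the word-lift and kneading-sequence machinery of the paper is not implemented.

```python
def lfsr_bits(taps, state, n):
    """Fibonacci LFSR over GF(2).

    `state[0]` is stage 1 and `state[-1]` is the output stage; `taps` are 1-based
    stages XORed into the feedback, e.g. (4, 3) for x^4 + x^3 + 1.
    """
    reg = list(state)
    out = []
    for _ in range(n):
        out.append(reg[-1])
        feedback = 0
        for t in taps:
            feedback ^= reg[t - 1]
        reg = [feedback] + reg[:-1]  # shift right, feedback enters stage 1
    return out


bits = lfsr_bits((4, 3), [1, 0, 0, 0], 30)
print(bits[:15])  # [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1]
```

One period contains 2^(n−1) = 8 ones and 7 zeros, a standard balance property of m-sequences.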
What is a melody? On the relationship between pitch and brightness of timbre.
Cousineau, Marion; Carcagno, Samuele; Demany, Laurent; Pressnitzer, Daniel
2013-01-01
Previous studies showed that the perceptual processing of sound sequences is more efficient when the sounds vary in pitch than when they vary in loudness. We show here that sequences of sounds varying in brightness of timbre are processed with the same efficiency as pitch sequences. The sounds used consisted of two simultaneous pure tones one octave apart, and the listeners' task was to make same/different judgments on pairs of sequences varying in length (one, two, or four sounds). In one condition, brightness of timbre was varied within the sequences by changing the relative level of the two pure tones. In other conditions, pitch was varied by changing fundamental frequency, or loudness was varied by changing the overall level. In all conditions, only two possible sounds could be used in a given sequence, and these two sounds were equally discriminable. When sequence length increased from one to four, discrimination performance decreased substantially for loudness sequences, but to a smaller extent for brightness sequences and pitch sequences. In the latter two conditions, sequence length had a similar effect on performance. These results suggest that the processes dedicated to pitch and brightness analysis, when probed with a sequence-discrimination task, share unexpected similarities.
Paraskevis, D; Magiorkinis, M; Vandamme, A M; Kostrikis, L G; Hatzakis, A
2001-03-01
Human immunodeficiency virus type 1 (HIV-1) has been classified into three main groups and 11 distinct subtypes. Moreover, several circulating recombinant forms (CRFs) of HIV-1 have been recently documented to have spread widely causing extensive HIV-1 epidemics. A subtype, initially designated I (CRF04_cpx), was documented in Cyprus and Greece and was found to comprise regions of sequence derived from subtypes A and G as well as regions of unclassified sequence. Re-analysis of the three full-length CRF04_cpx sequences that were available revealed a mosaic genomic organization of unique complexity comprising regions of sequence from at least five distinct subtypes, A, G, H, K and unclassified regions. These strains account for approximately 2% of the total HIV-1-infected population in Greece, thus providing evidence of the great capability of HIV-1 to recombine and produce highly divergent strains which can be spread successfully through different infection routes.
Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee
2015-09-21
Interpreting epistatic interactions is crucial for understanding the evolutionary dynamics of complex genetic systems and for unveiling the structure and function of genetic pathways. Although high-resolution mapping of en masse variant libraries enables molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable for assessing the functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries, combining a codon-based molecular barcoding strategy with de novo assembly of short-read data. We first validated JigsawSeq on small sub-pools and observed high precision and recall under various experimental settings. Using extensive simulations, we then applied JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offers the opportunity to explore the evolutionary trajectories of protein variants.
Pelnena, Dita; Burnyte, Birute; Jankevics, Eriks; Lace, Baiba; Dagyte, Evelina; Grigalioniene, Kristina; Utkus, Algirdas; Krumina, Zita; Rozentale, Jolanta; Adomaitiene, Irina; Stavusis, Janis; Pliss, Liana; Inashkina, Inna
2017-12-12
The most common mitochondrial disorder in children is Leigh syndrome, a progressive and genetically heterogeneous neurodegenerative disorder caused by mutations in nuclear genes or mitochondrial DNA (mtDNA). In the present study, a novel and robust method of complete mtDNA sequencing that allows amplification of the whole mitochondrial genome was tested. Complete mtDNA sequencing was performed in a cohort of patients with suspected mitochondrial mutations. Patients from Latvia and Lithuania (n = 92 and n = 57, respectively) referred by clinical geneticists were included. The de novo point mutations m.9185T>C and m.13513G>A, respectively, were detected in two patients with lactic acidosis and neurodegenerative lesions. In one patient with neurodegenerative lesions, the mutation m.9185T>C was identified. These mutations are associated with Leigh syndrome. The present data support recommending full-length mtDNA sequencing as a supplement to nuclear gene testing and enzymatic assays to enhance the diagnosis of mitochondrial disease.
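Variants such as m.9185T>C follow HGVS-style notation: the "m." prefix marks a mitochondrial reference, followed by the 1-based position, the reference base and the alternate base. A minimal sketch that parses simple substitutions and checks the stated reference base against a reference sequence the user supplies (for example the revised Cambridge Reference Sequence; none is bundled here). Only single-base substitutions are handled, and the function names are hypothetical.

```python
import re

# m.<position><ref>><alt>, e.g. m.9185T>C
_MT_SUBSTITUTION = re.compile(r"m\.(\d+)([ACGT])>([ACGT])")


def parse_mt_substitution(variant):
    """Return (1-based position, reference base, alternate base)."""
    match = _MT_SUBSTITUTION.fullmatch(variant)
    if match is None:
        raise ValueError(f"not a simple mtDNA substitution: {variant!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def reference_agrees(variant, reference):
    """Check that the reference base in the notation matches the reference sequence."""
    position, ref, _ = parse_mt_substitution(variant)
    return reference[position - 1].upper() == ref


print(parse_mt_substitution("m.9185T>C"))   # (9185, 'T', 'C')
print(parse_mt_substitution("m.13513G>A"))  # (13513, 'G', 'A')
```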
Bedon, Frank; Grima-Pettenati, Jacqueline; Mackay, John
2007-01-01
Background: Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYB genes; however, only a few members of this family have been discovered in gymnosperms. Results: We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RT-PCR of MYB gene transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion: Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, which has been shown to regulate lignin biosynthesis in transgenic tobacco. PMID:17397551
Poly A tail length analysis of in vitro transcribed mRNA by LC-MS.
Beverly, Michael; Hagen, Caitlin; Slack, Olga
2018-02-01
The 3'-polyadenosine (poly A) tail of in vitro transcribed (IVT) mRNA was studied using liquid chromatography coupled to mass spectrometry (LC-MS). Poly A tails were cleaved from the mRNA using ribonuclease T1 and then isolated with dT magnetic beads. Extracted tails were analyzed by LC-MS, which provided tail length information at single-nucleotide resolution. A 2100-nt mRNA with plasmid-encoded poly A tail lengths of 27, 64, 100, or 117 nucleotides was used for these studies, because enzymatically added poly A tails showed significant length heterogeneity. The number of As observed in the tails closely matched Sanger sequencing results of the DNA template, and even minor plasmid populations with sequence variations were detected. When the plasmid sequence contained a discrete number of As in the tail, analysis revealed a distribution that included tails longer than the encoded tail lengths. These observations were consistent with transcriptional slippage of T7 RNAP within a poly A sequence. The type of RNAP did not alter the observed tail distribution: T3, T7, and SP6 RNAPs all produced equivalent tail length distributions. The addition of a sequence at the 3' end of the poly A tail did, however, produce narrower tail length distributions, which supports a previously described model of slippage in which the 3' end can be locked in place by a G or C following the homopolymer region. Graphical abstract: Determination of mRNA poly A tail length using magnetic beads and LC-MS.
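Once each LC-MS peak has been assigned a tail length (the mass-to-length assignment itself is not shown), the slippage effect described above shows up as mass in lengths beyond the encoded one. A minimal summary sketch, assuming a mapping from tail length to observed abundance; the abundances below are hypothetical, not data from the study, and the function name is illustrative.

```python
def tail_length_summary(counts, encoded_length):
    """Summarise a poly A tail-length distribution.

    `counts` maps tail length (nt) -> observed abundance (any consistent unit).
    """
    total = sum(counts.values())
    mean_length = sum(length * n for length, n in counts.items()) / total
    modal_length = max(counts, key=counts.get)
    # Share of the population longer than the plasmid-encoded tail.
    fraction_longer = sum(n for length, n in counts.items() if length > encoded_length) / total
    return {"mean": mean_length, "mode": modal_length, "fraction_longer": fraction_longer}


# Hypothetical abundances for a template encoding a 64-nt tail.
summary = tail_length_summary({63: 10, 64: 50, 65: 25, 66: 15}, encoded_length=64)
print(summary)  # {'mean': 64.45, 'mode': 64, 'fraction_longer': 0.4}
```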
Characterisation of single-domain ATP-binding cassette protein homologues of Theileria parva.
Kibe, M K; Macklin, M; Gobright, E; Bishop, R; Urakawa, T; ole-MoiYoi, O K
2001-09-01
Two distinct genes encoding single-domain ATP-binding cassette transport protein homologues of Theileria parva were cloned and sequenced. Neither of the genes is tandemly duplicated. One gene, TpABC1, encodes a predicted protein of 593 amino acids with an N-terminal hydrophobic domain containing six potential membrane-spanning segments. A single discontinuous ATP-binding element was located in the C-terminal region of TpABC1. The second gene, TpABC2, also contains a single C-terminal ATP-binding motif. Copies of TpABC2 were present at four loci in the T. parva genome on three different chromosomes. TpABC1 exhibited allelic polymorphism between stocks of the parasite. Comparison of cDNA and genomic sequences revealed that TpABC1 contained seven short introns, between 29 and 84 bp in length. The full-length TpABC1 protein was expressed in insect cells using the baculovirus system. Antibodies raised against the recombinant antigen detected an 85 kDa protein on western blots of T. parva piroplasm lysates from this life-cycle stage.
Improved maize reference genome with single-molecule technologies.
Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen
2017-06-22
Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.
The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.
Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L
2003-11-01
Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We used reverse transcription-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) to synthesize several HbII cDNAs. An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the partial HbII cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis showed that the size of the HbII mRNA agrees with the size estimated from the cDNA data. The coding region of the full-length HbII cDNA encodes 151 amino acids. The calculated molecular weight of HbII, including the heme group and the acetylated N-terminal residue, is 17,654.07 Da.
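The reported molecular weight adds two components to the polypeptide: the heme group and the N-terminal acetyl group. A minimal sketch of that bookkeeping follows, assuming average masses computed from IUPAC 2001 standard atomic weights, neutral heme b (C34H32FeN4O4) and a net +C2H2O for acetylation. The HbII sequence is not given here, so the functions `polypeptide_mass` and `holoprotein_mass` are hypothetical helpers, not the authors' calculation.

```python
# Average residue masses (Da) of amino acids inside a chain (IUPAC 2001 weights).
RESIDUE_AVG_DA = {
    "G": 57.0513, "A": 71.0779, "S": 87.0773, "P": 97.1152, "V": 99.1311,
    "T": 101.1039, "C": 103.1429, "L": 113.1576, "I": 113.1576, "N": 114.1026,
    "D": 115.0874, "Q": 128.1292, "K": 128.1723, "E": 129.1140, "M": 131.1961,
    "H": 137.1393, "F": 147.1739, "R": 156.1857, "Y": 163.1733, "W": 186.2099,
}
WATER_DA = 18.0153    # one H2O for the free N- and C-termini of the chain
HEME_B_DA = 616.4873  # assumed neutral heme b, C34H32FeN4O4
ACETYL_DA = 42.0367   # net gain from N-terminal acetylation (+C2H2O)


def polypeptide_mass(sequence: str) -> float:
    """Average mass of an unmodified polypeptide from its one-letter sequence."""
    seq = sequence.upper()
    unknown = set(seq) - RESIDUE_AVG_DA.keys()
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_AVG_DA[aa] for aa in seq) + WATER_DA


def holoprotein_mass(sequence: str, heme: bool = True, n_acetyl: bool = True) -> float:
    """Polypeptide mass plus optional heme group and N-terminal acetylation."""
    mass = polypeptide_mass(sequence)
    if heme:
        mass += HEME_B_DA
    if n_acetyl:
        mass += ACETYL_DA
    return mass
```

Applying `holoprotein_mass` to the 151-residue HbII sequence would be the analogous calculation; because atomic-weight tables and heme conventions vary, the result may differ slightly from 17,654.07 Da.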
Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang
2013-08-01
To find and clone new antigen genes from a lambda-ZAP cDNA expression library of adult Clonorchis sinensis and to determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened with pooled sera from clonorchiasis patients. The sequences of the positive phage clones were compared with sequences in the EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing two and three tandem repeat units were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeats was cloned into the prokaryotic expression vector pET28a(+) and transformed into E. coli Rosetta (DE3) cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was assessed by ELISA. To evaluate the diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis (Paragonimus westermani) patients, and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, pooled sera from clonorchiasis patients and healthy individuals were tested for specific antibodies by ELISA with the recombinant proteins rCs22M-2r and rCs22M-3r, which contain the tandem repeat sequences. The full-length sequence of the Cs22 antigen gene of C. sinensis was obtained. It contained 13 tandem repeats of the sequence EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that Cs22 belongs to the GPI-anchored protein family. The recombinant proteins rCs22-2r and rCs22-3r showed some immunogenicity.
In ELISA with plates coated with purified rCs22-2r or rCs22-3r, the positive rate was 45.7% (16/35) with either antigen for sera from clonorchiasis patients and 3.2% (1/31) for sera from healthy individuals. There was no cross-reaction with sera from schistosomiasis or cysticercosis patients, and cross-reaction with sera from paragonimiasis patients was 1/15. The recombinant proteins rCs22M-2r and rCs22M-3r, which contain only tandem repeats, were specifically recognized by pooled sera from clonorchiasis patients. The Cs22 antigen gene of C. sinensis was obtained, and the recombinant proteins have some diagnostic value. The antigenic determinants are located in the tandem repeat sequences.
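The reported positive rates are simple proportions of seropositive samples in each serum group. A minimal sketch reproducing that arithmetic from the counts given above follows. The ELISA cutoff used to call a serum positive is not stated here, so the code starts from the counts. `positive_rate` and `COUNTS` are hypothetical names, and the 0/15 and 0/13 entries encode the statement that no cross-reaction was seen in the schistosomiasis and cysticercosis groups.

```python
# Seropositive counts (positive, total) per serum group for the rCs22 ELISA.
COUNTS = {
    "clonorchiasis": (16, 35),
    "healthy": (1, 31),
    "schistosomiasis": (0, 15),
    "paragonimiasis": (1, 15),
    "cysticercosis": (0, 13),
}


def positive_rate(positive: int, total: int) -> float:
    """Percentage of sera scoring positive, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive <= total:
        raise ValueError("positive must lie between 0 and total")
    return round(100 * positive / total, 1)


if __name__ == "__main__":
    for group, (pos, n) in COUNTS.items():
        print(f"{group}: {pos}/{n} = {positive_rate(pos, n)}%")
```

The same helper gives the paragonimiasis cross-reaction as a percentage (1/15, about 6.7%), a figure the abstract reports only as a count.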