full gene sequencing: Topics by Science.gov

Sample records for full gene sequencing

[Cloning and sequence analysis of full-length cDNA of secoisolariciresinol dehydrogenase of Dysosma versipellis].

PubMed

Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen

2009-06-01

To obtain the full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis by RACE PCR,then investigate the character of Secoisolariciresinol Dehydrogenase gene. The full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene was obtained by 3'-RACE and 5'-RACE from Dysosma versipellis. We first reported the full cDNA sequences of Secoisolariciresinol Dehydrogenase in Dysosma versipellis. The acquired gene was 991bp in full length, including 5' untranslated region of 42bp, 3' untranslated region of 112bp with Poly (A). The open reading frame (ORF) encoding 278 amino acid with molecular weight 29253.3 Daltons and isolectric point 6.328. The gene accession nucleotide sequence number in GeneBank was EU573789. Semi-quantitative RT-PCR analysis revealed that the Secoisolariciresinol Dehydrogenase gene was highly expressed in stem. Alignment of the amino acid sequence of Secoisolariciresinol Dehydrogenase indicated there may be some significant amino acid sequence difference among different species. Obtain the full-length cDNA sequence of Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis.
Evaluation of full S1 gene sequencing of classical and variant infectious bronchitis viruses extracted from allantoic fluid and FTA cards.

PubMed

Manswr, Basim; Ball, Christopher; Forrester, Anne; Chantrey, Julian; Ganapathy, Kannan

2018-08-01

Sequence variability in the S1 gene determines the genotype of infectious bronchitis virus (IBV) strains. A single RT-PCR assay was developed to amplify and sequence the full S1 gene for six classical and variant IBVs (M41, D274, 793B, IS/885/00, IS/1494/06 and Q1) enriched in allantoic fluid (AF) or the same AF inoculated onto Flinders Technology Association (FTA) cards. Representative strains from each genotype were grown in specific-pathogen-free eggs and RNA was extracted from AF. Full S1 gene amplification was achieved using primer A and primer 22.51. Products were sequenced using primers A, 1050+, 1380+ and SX3+ to obtain short sequences covering the full gene. Following serial dilutions of AF, detection limits of the partial assay were higher than those of the full S1 gene. Partial S1 sequences exhibited higher-than-average nucleotide similarity percentages (79%; 352 bp) compared to full S1 sequences (77%; 1756 bp), suggesting that full S1 analysis allows greater strain differentiation. For IBV detection from AF-inoculated FTA cards, four serotypes were incubated for up to 21 days at three temperatures, 4°C, room temperature (approximately 24°C) and 40°C. RNA was extracted and tested with partial and full S1 protocols. Through partial sequencing, all IBVs were successfully detected at all sampling points and storage temperatures. In contrast, using full S1 sequencing it was not possible to amplify the gene beyond 14 days or when stored at 40°C. Data presented show that for full S1 sequencing, a substantial amount of RNA is needed. Field samples collected onto FTA cards are unlikely to yield such quantity or quality. AF: allantoic fluid; CD50: ciliostatic dose 50; FTA: Flinders Technology Association; IB: infectious bronchitis; IBV: infectious bronchitis virus.
Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

PubMed Central

Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

2012-01-01

To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

PubMed Central

Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

2013-01-01

A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698
Construction and EST sequencing of full-length, drought stress cDNA libraries for common beans (Phaseolus vulgaris L.)

PubMed Central

2011-01-01

Background Common bean is an important legume crop with only a moderate number of short expressed sequence tags (ESTs) made with traditional methods. The goal of this research was to use full-length cDNA technology to develop ESTs that would overlap with the beginning of open reading frames and therefore be useful for gene annotation of genomic sequences. The library was also constructed to represent genes expressed under drought, low soil phosphorus and high soil aluminum toxicity. We also undertook comparisons of the full-length cDNA library to two previous non-full clone EST sets for common bean. Results Two full-length cDNA libraries were constructed: one for the drought tolerant Mesoamerican genotype BAT477 and the other one for the acid-soil tolerant Andean genotype G19833 which has been selected for genome sequencing. Plants were grown in three soil types using deep rooting cylinders subjected to drought and non-drought stress and tissues were collected from both roots and above ground parts. A total of 20,000 clones were selected robotically, half from each library. Then, nearly 10,000 clones from the G19833 library were sequenced with an average read length of 850 nucleotides. A total of 4,219 unigenes were identified consisting of 2,981 contigs and 1,238 singletons. These were functionally annotated with gene ontology terms and placed into KEGG pathways. Compared to other EST sequencing efforts in common bean, about half of the sequences were novel or represented the 5' ends of known genes. Conclusions The present full-length cDNA libraries add to the technological toolbox available for common bean and our sequencing of these clones substantially increases the number of unique EST sequences available for the common bean genome. All of this should be useful for both functional gene annotation, analysis of splice site variants and intron/exon boundary determination by comparison to soybean genes or with common bean whole-genome sequences. In addition the library has a large number of transcription factors and will be interesting for discovery and validation of drought or abiotic stress related genes in common bean. PMID:22118559
Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

PubMed Central

Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi

2006-01-01

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses:. Keys to Rice Transcriptome Analysis

NASA Astrophysics Data System (ADS)

Kikuchi, Shoshi

2009-02-01

Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
A simplified Sanger sequencing method for full genome sequencing of multiple subtypes of human influenza A viruses.

PubMed

Deng, Yi-Mo; Spirason, Natalie; Iannello, Pina; Jelley, Lauren; Lau, Hilda; Barr, Ian G

2015-07-01

Full genome sequencing of influenza A viruses (IAV), including those that arise from annual influenza epidemics, is undertaken to determine if reassorting has occurred or if other pathogenic traits are present. Traditionally IAV sequencing has been biased toward the major surface glycoproteins haemagglutinin and neuraminidase, while the internal genes are often ignored. Despite the development of next generation sequencing (NGS), many laboratories are still reliant on conventional Sanger sequencing to sequence IAV. To develop a minimal and robust set of primers for Sanger sequencing of the full genome of IAV currently circulating in humans. A set of 13 primer pairs was designed that enabled amplification of the six internal genes of multiple human IAV subtypes including the recent avian influenza A(H7N9) virus from China. Specific primers were designed to amplify the HA and NA genes of each IAV subtype of interest. Each of the primers also incorporated a binding site at its 5'-end for either a forward or reverse M13 primer, such that only two M13 primers were required for all subsequent sequencing reactions. This minimal set of primers was suitable for sequencing the six internal genes of all currently circulating human seasonal influenza A subtypes as well as the avian A(H7N9) viruses that have infected humans in China. This streamlined Sanger sequencing protocol could be used to generate full genome sequence data more rapidly and easily than existing influenza genome sequencing protocols. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
A survey of the sorghum transcriptome using single-molecule long reads

DOE PAGES

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

2016-06-24

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novelmore » splice isoforms. Additionally, we uncover APA ofB11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.« less
A survey of the sorghum transcriptome using single-molecule long reads

PubMed Central

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

2016-01-01

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
The full genome sequences of 8 equine herpesvirus type 4 isolates from horses in Japan.

PubMed

Izume, Satoko; Kirisawa, Rikio; Ohya, Kenji; Ohnuma, Aiko; Kimura, Takashi; Omatsu, Tsutomu; Katayama, Yukie; Mizutani, Tetsuya; Fukushi, Hideto

2017-01-24

Equine herpesvirus type 4 (EHV-4) is one of the most important pathogens in horses. To clarify the key genes of the EHV-4 genome that cause abortion in female horses, we determined the whole genome sequences of a laboratory strain and 7 Japanese EHV-4 isolates that were isolated from 2 aborted fetuses and nasal swabs of 5 horses with respiratory disease. The full genome sequences and predicted amino acid sequences of each gene of these isolates were compared with of the reference EHV-4 strain NS80567 and Australian isolates that were reported in 2015. The EHV-4 isolates clustered in 2 groups which did not reflect their pathogenicity. A comparison of the predicted amino acid sequences of the genes did not reveal any genes that were associated with EHV-4-induced abortion.
A phylogenetic comparison of urease-positive thermophilic Campylobacter (UPTC) and urease-negative (UN) C. lari.

PubMed

Hirayama, Junichi; Tazumi, Akihiro; Hayashi, Kyohei; Tasaki, Erina; Kuribayashi, Takashi; Moore, John E; Millar, Beverley C; Matsuda, Motoo

2011-06-01

In the present study, the reliability of full-length gene sequence information for several genes including 16S rRNA was examined, for the discrimination of the two representative Campylobacter lari taxa, namely urease-negative (UN) C. lari and urease-positive thermophilic Campylobacter (UPTC). As previously described, 16S rRNA gene sequence are not reliable for the molecular discrimination of UN C. lari from UPTC organisms employing both the unweighted pair group method using arithmetic means analysis (UPGMA) and neighbor joining (NJ) methods. In addition, three composite full-length gene sequences (ciaB, flaC and vacJ) out of seven gene loci examined were reliable for discrimination employing dendrograms constructed by the UPGMA method. In addition, all the dendrograms of the NJ phylogenetic trees constructed based on the nine gene information were not reliable for the discrimination. Three composite full-length gene sequences (ciaB, flaC and vacJ) were reliable for the molecular discrimination between UN C. lari and UPTC organisms employing the UPGMA method, as well as among four thermophilic Campylobacter species. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

PubMed Central

Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

2009-01-01

Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species. PMID:19747386
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Study on the Evolution of Genes Mutation Related With Gastrointestinal Stromal Tumors

ClinicalTrials.gov

2012-01-05

Full Gene Sequences of c-KIT、PDGFRA and DOG1 Are Analyzed With the Screening-sequencing Approach; Investigate the Characteristics and Variations Associated With the Different Gene Mutations of c-KIT、PDGFRA and DOG1 in GIST Patients
Employment of Near Full-Length Ribosome Gene TA-Cloning and Primer-Blast to Detect Multiple Species in a Natural Complex Microbial Community Using Species-Specific Primers Designed with Their Genome Sequences.

PubMed

Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou

2016-11-01

It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

PubMed Central

2011-01-01

Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
UniGene Tabulator: a full parser for the UniGene format.

PubMed

Lenzi, Luca; Frabetti, Flavia; Facchin, Federica; Casadei, Raffaella; Vitale, Lorenza; Canaider, Silvia; Carinci, Paolo; Zannotti, Maria; Strippoli, Pierluigi

2006-10-15

UniGene Tabulator 1.0 provides a solution for full parsing of UniGene flat file format; it implements a structured graphical representation of each data field present in UniGene following import into a common database managing system usable in a personal computer. This database includes related tables for sequence, protein similarity, sequence-tagged site (STS) and transcript map interval (TXMAP) data, plus a summary table where each record represents a UniGene cluster. UniGene Tabulator enables full local management of UniGene data, allowing parsing, querying, indexing, retrieving, exporting and analysis of UniGene data in a relational database form, usable on Macintosh (OS X 10.3.9 or later) and Windows (2000, with service pack 4, XP, with service pack 2 or later) operating systems-based computers. The current release, including both the FileMaker runtime applications, is freely available at http://apollo11.isto.unibo.it/software/
High-Resolution Sequence-Function Mapping of Full-Length Proteins

PubMed Central

Kowalsky, Caitlin A.; Klesmith, Justin R.; Stapleton, James A.; Kelly, Vince; Reichkitzer, Nolan; Whitehead, Timothy A.

2015-01-01

Comprehensive sequence-function mapping involves detailing the fitness contribution of every possible single mutation to a gene by comparing the abundance of each library variant before and after selection for the phenotype of interest. Deep sequencing of library DNA allows frequency reconstruction for tens of thousands of variants in a single experiment, yet short read lengths of current sequencers makes it challenging to probe genes encoding full-length proteins. Here we extend the scope of sequence-function maps to entire protein sequences with a modular, universal sequence tiling method. We demonstrate the approach with both growth-based selections and FACS screening, offer parameters and best practices that simplify design of experiments, and present analytical solutions to normalize data across independent selections. Using this protocol, sequence-function maps covering full sequences can be obtained in four to six weeks. Best practices introduced in this manuscript are fully compatible with, and complementary to, other recently published sequence-function mapping protocols. PMID:25790064
Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response

PubMed Central

Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu

2007-01-01

Background Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion The cassava full-length cDNA library here presented contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061

Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

PubMed Central

Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

2015-01-01

Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020
Integrating De Novo Transcriptome Assembly and Cloning to Obtain Chicken Ovocleidin-17 Full-Length cDNA

PubMed Central

Ning, ZhongHua; Hincke, Maxwell T.; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not ‘finished’. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences. PMID:24676480
Integrating de novo transcriptome assembly and cloning to obtain chicken Ovocleidin-17 full-length cDNA.

PubMed

Zhang, Quan; Liu, Long; Zhu, Feng; Ning, ZhongHua; Hincke, Maxwell T; Yang, Ning; Hou, ZhuoCheng

2014-01-01

Efficiently obtaining full-length cDNA for a target gene is the key step for functional studies and probing genetic variations. However, almost all sequenced domestic animal genomes are not 'finished'. Many functionally important genes are located in these gapped regions. It can be difficult to obtain full-length cDNA for which only partial amino acid/EST sequences exist. In this study we report a general pipeline to obtain full-length cDNA, and illustrate this approach for one important gene (Ovocleidin-17, OC-17) that is associated with chicken eggshell biomineralization. Chicken OC-17 is one of the best candidates to control and regulate the deposition of calcium carbonate in the calcified eggshell layer. OC-17 protein has been purified, sequenced, and has had its three-dimensional structure solved. However, researchers still cannot conduct OC-17 mRNA related studies because the mRNA sequence is unknown and the gene is absent from the current chicken genome. We used RNA-Seq to obtain the entire transcriptome of the adult hen uterus, and then conducted de novo transcriptome assembling with bioinformatics analysis to obtain candidate OC-17 transcripts. Based on this sequence, we used RACE and PCR cloning methods to successfully obtain the full-length OC-17 cDNA. Temporal and spatial OC-17 mRNA expression analyses were also performed to demonstrate that OC-17 is predominantly expressed in the adult hen uterus during the laying cycle and barely at immature developmental stages. Differential uterine expression of OC-17 was observed in hens laying eggs with weak versus strong eggshell, confirming its important role in the regulation of eggshell mineralization and providing a new tool for genetic selection for eggshell quality parameters. This study is the first one to report the full-length OC-17 cDNA sequence, and builds a foundation for OC-17 mRNA related studies. We provide a general method for biologists experiencing difficulty in obtaining candidate gene full-length cDNA sequences.
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

PubMed

Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

2015-03-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted onto GenBank under the accession number, KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumonia and Raoultella ornithinolytica , which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have a 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

PubMed Central

DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

2015-01-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted onto GenBank under the accession number, KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumonia and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have a 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
Solution Hybrid Selection Capture for the Recovery of Functional Full-Length Eukaryotic cDNAs From Complex Environmental Samples

PubMed Central

Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia

2014-01-01

Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allow for a rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of the solution hybrid selection (SHS) for an efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 gene family encoding endo-xylanases for which we designed 35 explorative 31-mers capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 among 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences highlighted hundreds of phylogenetically diverse sequences that were not yet described, in public databases. This protocol offers the possibility of performing exhaustive exploration of eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
RT-PCR and sequence analysis of the full-length fusion protein of Canine Distemper Virus from domestic dogs.

PubMed

Romanutti, Carina; Gallo Calderón, Marina; Keller, Leticia; Mattion, Nora; La Torre, José

2016-02-01

During 2007-2014, 84 out of 236 (35.6%) samples from domestic dogs submitted to our laboratory for diagnostic purposes were positive for Canine Distemper Virus (CDV), as analyzed by RT-PCR amplification of a fragment of the nucleoprotein gene. Fifty-nine of them (70.2%) were from dogs that had been vaccinated against CDV. The full-length gene encoding the Fusion (F) protein of fifteen isolates was sequenced and compared with that of those of other CDVs, including wild-type and vaccine strains. Phylogenetic analysis using the F gene full-length sequences grouped all the Argentinean CDV strains in the SA2 clade. Sequence identity with the Onderstepoort vaccine strain was 89.0-90.6%, and the highest divergence was found in the 135 amino acids corresponding to the F protein signal-peptide, Fsp (64.4-66.7% identity). In contrast, this region was highly conserved among the local strains (94.1-100% identity). One extra putative N-glycosylation site was identified in the F gene of CDV Argentinean strains with respect to the vaccine strain. The present report is the first to analyze full-length F protein sequences of CDV strains circulating in Argentina, and contributes to the knowledge of molecular epidemiology of CDV, which may help in understanding future disease outbreaks. Copyright © 2015 Elsevier B.V. All rights reserved.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2016-02-16

The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well asmore » the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.« less
Polypeptide having swollenin activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

2015-11-04

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

2015-09-01

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-09-15

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

2015-08-18

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Identification of gyrB and rpoB gene mutations and differentially expressed proteins between a novobiocin-resistant Aeromonas hydrophila catfish vaccine strain and its virulent parent strain

USDA-ARS?s Scientific Manuscript database

Sequence comparison between the full-length 2412 bp DNA gyrase subunit B (gyrB) gene of a novobiocin resistant Aeromonas hydrophila AH11NOVO vaccine strain and that of its virulent parent strain AH11P revealed 10 missense mutations. Similarly, sequence comparison between the full-length 4092 bp RNA ...
Complete Mitochondrial Genome Sequence of Aethina tumida (Coleoptera: Nitidulidae), a Beekeeping Pest.

PubMed

Duquesne, Véronique; Delcont, Aurélie; Huleux, Anthéa; Beven, Véronique; Touzain, Fabrice; Ribière-Chabert, Magali

2017-11-02

We report here the full mitochondrial genome sequence of Aethina tumida , a Nitidulidae species beetle, that is a pest of bee hives. The obtained sequence is 16,576 bp in length and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNAs. Copyright © 2017 Duquesne et al.
Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

PubMed

Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

2010-11-01

Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
TypeLoader: A fast and efficient automated workflow for the annotation and submission of novel full-length HLA alleles.

PubMed

Surendranath, V; Albrecht, V; Hayhurst, J D; Schöne, B; Robinson, J; Marsh, S G E; Schmidt, A H; Lange, V

2017-07-01

Recent years have seen a rapid increase in the discovery of novel allelic variants of the human leukocyte antigen (HLA) genes. Commonly, only the exons encoding the peptide binding domains of novel HLA alleles are submitted. As a result, the IPD-IMGT/HLA Database lacks sequence information outside those regions for the majority of known alleles. This has implications for the application of the new sequencing technologies, which deliver sequence data often covering the complete gene. As these technologies simplify the characterization of the complete gene regions, it is desirable for novel alleles to be submitted as full-length sequences to the database. However, the manual annotation of full-length alleles and the generation of specific formats required by the sequence repositories is prone to error and time consuming. We have developed TypeLoader to address both these facets. With only the full-length sequence as a starting point, Typeloader performs automatic sequence annotation and subsequently handles all steps involved in preparing the specific formats for submission with very little manual intervention. TypeLoader is routinely used at the DKMS Life Science Lab and has aided in the successful submission of more than 900 novel HLA alleles as full-length sequences to the European Nucleotide Archive repository and the IPD-IMGT/HLA Database with a 95% reduction in the time spent on annotation and submission when compared with handling these processes manually. TypeLoader is implemented as a web application and can be easily installed and used on a standalone Linux desktop system or within a Linux client/server architecture. TypeLoader is downloadable from http://www.github.com/DKMS-LSL/typeloader. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Long-read sequencing of the coffee bean transcriptome reveals the diversity of full-length transcripts

PubMed Central

Cheng, Bing; Furtado, Agnelo

2017-01-01

Abstract Polyploidization contributes to the complexity of gene expression, resulting in numerous related but different transcripts. This study explored the transcriptome diversity and complexity of the tetraploid Arabica coffee (Coffea arabica) bean. Long-read sequencing (LRS) by Pacbio Isoform sequencing (Iso-seq) was used to obtain full-length transcripts without the difficulty and uncertainty of assembly required for reads from short-read technologies. The tetraploid transcriptome was annotated and compared with data from the sub-genome progenitors. Caffeine and sucrose genes were targeted for case analysis. An isoform-level tetraploid coffee bean reference transcriptome with 95 995 distinct transcripts (average 3236 bp) was obtained. A total of 88 715 sequences (92.42%) were annotated with BLASTx against NCBI non-redundant plant proteins, including 34 719 high-quality annotations. Further BLASTn analysis against NCBI non-redundant nucleotide sequences, Coffea canephora coding sequences with UTR, C. arabica ESTs, and Rfam resulted in 1213 sequences without hits, were potential novel genes in coffee. Longer UTRs were captured, especially in the 5΄UTRs, facilitating the identification of upstream open reading frames. The LRS also revealed more and longer transcript variants in key caffeine and sucrose metabolism genes from this polyploid genome. Long sequences (>10 kilo base) were poorly annotated. LRS technology shows the limitation of previous studies. It provides an important tool to produce a reference transcriptome including more of the diversity of full-length transcripts to help understand the biology and support the genetic improvement of polyploid species such as coffee. PMID:29048540
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome

PubMed Central

Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.

2001-01-01

Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022

The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.

PubMed

Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M

2001-10-09

Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
Molecular cloning of a putative gene encoding isopentenyltransferase from pingyitiancha (Malus hupehensis) and characterization of its response to nitrate.

PubMed

Peng, Jing; Peng, Futian; Zhu, Chunfu; Wei, Shaochong

2008-06-01

A putative isopentenyltransferase (IPT) encoding gene was identified from a pingyitiancha (Malus hupehensis Rehd.) expressed sequence tag database, and the full-length gene was cloned by RACE. Based on expression profile and sequence alignment, the nucleotide sequence of the clone, named MhIPT3, was most similar to AtIPT3, an IPT gene in Arabidopsis. The full-length cDNA contained a 963-bp open reading frame encoding a protein of 321 amino acids with a molecular mass of 37.3 kDa. Sequence analysis of genomic DNA revealed the absence of introns in the frame. Quantitative real-time PCR analysis demonstrated that the gene was expressed in roots, stems and leaves. Application of nitrate to roots of nitrogen-deprived seedlings strongly induced expression of MhIPT3 and was accompanied by the accumulation of cytokinins, whereas MhIPT3 expression was little affected by ammonium application to roots of nitrogen-deprived seedlings. Application of nitrate to leaves also up-regulated the expression of MhIPT3 and corresponded closely with the accumulation of isopentyladenine and isopentyladenosine in leaves.
Pervasive sequence patents cover the entire human genome.

PubMed

Rosenfeld, Jeffrey A; Mason, Christopher E

2013-01-01

The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
A 20 bp cis-acting element is both necessary and sufficient to mediate elicitor response of a maize PRms gene.

PubMed

Raventós, D; Jensen, A B; Rask, M B; Casacuberta, J M; Mundy, J; San Segundo, B

1995-01-01

Transient gene expression assays in barley aleurone protoplasts were used to identify a cis-regulatory element involved in the elicitor-responsive expression of the maize PRms gene. Analysis of transcriptional fusions between PRms 5' upstream sequences and a chloramphenicol acetyltransferase reporter gene, as well as chimeric promoters containing PRms promoter fragments or repeated oligonucleotides fused to a minimal promoter, delineated a 20 bp sequence which functioned as an elicitor-response element (ERE). This sequence contains a motif (-246 AATTGACC) similar to sequences found in promoters of other pathogen-responsive genes. The analysis also indicated that an enhancing sequence(s) between -397 and -296 is required for full PRms activation by elicitors. The protein kinase inhibitor staurosporine was found to completely block the transcriptional activation induced by elicitors. These data indicate that protein phosphorylation is involved in the signal transduction pathway leading to PRms expression.
Sequencing and analysis of 10,967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis reveals post-tetraploidization transcriptome remodeling

PubMed Central

Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.

2006-01-01

Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full ORF verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. Because the genome of X. laevis, but not X. tropicalis, has undergone allotetraploidization, comparison of coding sequences from these two clawed (pipid) frogs provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprised of an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS, comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs that have been lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
Carbohydrate degrading polypeptide and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).

PubMed

Liang, Jian-Ying; Lin, Rui-Qing

2016-11-01

In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced and its gene contents and genome organizations was compared with that of other tapeworm. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding region. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The contents of A + T of the complete mt genome are 71.4% for R. tetragona. The R. tetragona mt genome sequence provides novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
High-resolution phylogenetic microbial community profiling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Singer, Esther; Coleman-Derr, Devin; Bowman, Brett

2014-03-17

The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance ourmore » knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large age of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of GC content of microbial genomes and was higher when compared to Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences, e.g. full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.« less
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).

PubMed

Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E

2005-12-02

cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.

PubMed

Laird Smith, Melissa; Murrell, Ben; Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E; Kosakovsky Pond, Sergei L; Smith, Davey M

2016-07-01

The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences' Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data.
The Drosophila gene collection: Identification of putative full-length cDNAs for 70 percent of D. melanogaster genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stapleton, Mark; Liao, Guochun; Brokstein, Peter

2002-08-12

Collections of full-length nonredundant cDNA clones are critical reagents for functional genomics. The first step toward these resources is the generation and single-pass sequencing of cDNA libraries that contain a high proportion of full-length clones. The first release of the Drosophila Gene Collection Release 1 (DGCr1) was produced from six libraries representing various tissues, developmental stages, and the cultured S2 cell line. Nearly 80,000 random 5prime expressed sequence tags (EST) from these libraries were collapsed into a nonredundant set of 5849 cDNAs, corresponding to {approx}40 percent of the 13,474 predicted genes in Drosophila. To obtain cDNA clones representing the remainingmore » genes, we have generated an additional 157,835 5prime ESTs from two previously existing and three new libraries. One new library is derived from adult testis, a tissue we previously did not exploit for gene discovery; two new cap-trapped normalized libraries are derived from 0-22hr embryos and adult heads. Taking advantage of the annotated D. melanogaster genome sequence, we clustered the ESTs by aligning them to the genome. Clusters that overlap genes not already represented by cDNA clones in the DGCr1 were analyzed further, and putative full-length clones were selected for inclusion in the new DGC. This second release of the DGC (DGCr2) contains 5061 additional clones, extending the collection to 10,910 cDNAs representing >70 percent of the predicted genes in Drosophila.« less
Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

PubMed Central

Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

2015-01-01

Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea, B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representative 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes express express preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
Characterization of full-length sequenced cDNA inserts (FLIcs) from Atlantic salmon (Salmo salar)

PubMed Central

Andreassen, Rune; Lunner, Sigbjørn; Høyheim, Bjørn

2009-01-01

Background Sequencing of the Atlantic salmon genome is now being planned by an international research consortium. Full-length sequenced inserts from cDNAs (FLIcs) are an important tool for correct annotation and clustering of the genomic sequence in any species. The large amount of highly similar duplicate sequences caused by the relatively recent genome duplication in the salmonid ancestor represents a particular challenge for the genome project. FLIcs will therefore be an extremely useful resource for the Atlantic salmon sequencing project. In addition to be helpful in order to distinguish between duplicate genome regions and in determining correct gene structures, FLIcs are an important resource for functional genomic studies and for investigation of regulatory elements controlling gene expression. In contrast to the large number of ESTs available, including the ESTs from 23 developmental and tissue specific cDNA libraries contributed by the Salmon Genome Project (SGP), the number of sequences where the full-length of the cDNA insert has been determined has been small. Results High quality full-length insert sequences from 560 pre-smolt white muscle tissue specific cDNAs were generated, accession numbers [GenBank: BT043497 - BT044056]. Five hundred and ten (91%) of the transcripts were annotated using Gene Ontology (GO) terms and 440 of the FLIcs are likely to contain a complete coding sequence (cCDS). The sequence information was used to identify putative paralogs, characterize salmon Kozak motifs, polyadenylation signal variation and to identify motifs likely to be involved in the regulation of particular genes. Finally, conserved 7-mers in the 3'UTRs were identified, of which some were identical to miRNA target sequences. Conclusion This paper describes the first Atlantic salmon FLIcs from a tissue and developmental stage specific cDNA library. We have demonstrated that many FLIcs contained a complete coding sequence (cCDS). This suggests that the remaining cDNA libraries generated by SGP represent a valuable cCDS FLIc source. The conservation of 7-mers in 3'UTRs indicates that these motifs are functionally important. Identity between some of these 7-mers and miRNA target sequences suggests that they are miRNA targets in Salmo salar transcripts as well. PMID:19878547
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples

PubMed Central

Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E.; Kosakovsky Pond, Sergei L.

2016-01-01

Abstract The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences’ Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data. PMID:29492273
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

PubMed Central

Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

2013-01-01

The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

PubMed Central

Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

2012-01-01

Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.

PubMed

Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory

2017-12-01

Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete-many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
Metaxa: a software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets.

PubMed

Bengtsson, Johan; Eriksson, K Martin; Hartmann, Martin; Wang, Zheng; Shenoy, Belle Damodara; Grelet, Gwen-Aëlle; Abarenkov, Kessy; Petri, Anna; Rosenblad, Magnus Alm; Nilsson, R Henrik

2011-10-01

The ribosomal small subunit (SSU) rRNA gene has emerged as an important genetic marker for taxonomic identification in environmental sequencing datasets. In addition to being present in the nucleus of eukaryotes and the core genome of prokaryotes, the gene is also found in the mitochondria of eukaryotes and in the chloroplasts of photosynthetic eukaryotes. These three sets of genes are conceptually paralogous and should in most situations not be aligned and analyzed jointly. To identify the origin of SSU sequences in complex sequence datasets has hitherto been a time-consuming and largely manual undertaking. However, the present study introduces Metaxa ( http://microbiology.se/software/metaxa/ ), an automated software tool to extract full-length and partial SSU sequences from larger sequence datasets and assign them to an archaeal, bacterial, nuclear eukaryote, mitochondrial, or chloroplast origin. Using data from reference databases and from full-length organelle and organism genomes, we show that Metaxa detects and scores SSU sequences for origin with very low proportions of false positives and negatives. We believe that this tool will be useful in microbial and evolutionary ecology as well as in metagenomics.
Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

PubMed

Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

2016-02-27

In the era of DNA throughput sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequence were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

Genome-wide analysis of esterase-like genes in the striped rice stem borer, Chilo suppressalis.

PubMed

Wang, Baoju; Wang, Ying; Zhang, Yang; Han, Ping; Li, Fei; Han, Zhaojun

2015-06-01

The striped rice stem borer, Chilo suppressalis, a destructive pest of rice, has developed high levels of resistance to certain insecticides. Esterases are reported to be involved in insecticide resistance in several insects. Therefore, this study systematically analyzed esterase-like genes in C. suppressalis. Fifty-one esterase-like genes were identified in the draft genomic sequences of the species, and 20 cDNA sequences were derived which encoded full- or nearly full-length proteins. The putative esterase proteins derived from these full-length genes are overall highly diversified. However, key residues that are functionally important including the serine residue in the active site are conserved in 18 out of the 20 proteins. Phylogenetic analysis revealed that most of these genes have homologues in other lepidoptera insects. Genes CsuEst6, CsuEst10, CsuEst11, and CsuEst51 were induced by the insecticide triazophos, and genes CsuEst9, CsuEst11, CsuEst14, and CsuEst51 were induced by the insecticide chlorantraniliprole. Our results provide a foundation for future studies of insecticide resistance in C. suppressalis and for comparative research with esterase genes from other insect species.
Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

PubMed Central

Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

2015-01-01

Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.

PubMed

Fimereli, Danai; Detours, Vincent; Konopka, Tomasz

2013-04-01

High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
The Status, Quality, and Expansion of the NIH Full-Length cDNA Project: The Mammalian Gene Collection (MGC)

PubMed Central

2004-01-01

The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
Characterization and chromosomal mapping of the human TFG gene involved in thyroid carcinoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mencinger, M.; Panagopoulos, I.; Andreasson, P.

1997-05-01

Homology searches in the Expressed Sequence Tag Database were performed using SPYGQ-rich regions as query sequences to find genes encoding protein regions similar to the N-terminal parts of the sarcoma-associated EWS and FUS proteins. Clone 22911 (T74973), encoding a SPYGQ-rich region in its 5{prime} end, and several other clones that overlapped 22911 were selected. The combined data made it possible to assemble a full-length cDNA sequence. This cDNA sequence is 1677 bp, containing an initiation codon ATG, an open reading frame of 400 amino acids, a poly(A) signal, and a poly(A) tail. We found 100% identity between the 5{prime} partmore » of the consensus sequence and the 598-bp-long sequence named TFG. The TFG sequence is fused to the 3{prime} end of NTRK1, generating the TRK-T3 fusion transcript found in papillary thyroid carcinoma. The cDNA therefore represents the full-length transcript of the TFG gene. TFG was localized to 3q11-q12 by fluorescence in situ hybridization. The 3{prime} and the 5{prime} ends of the TFG cDNA probe hybridized to a 2.2-kb band on Northern blot filters in all tissues examined. 28 refs., 5 figs., 1 tab.« less
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

PubMed Central

Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

2016-01-01

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.

PubMed

Yebra, Gonzalo; Hodcroft, Emma B; Ragonnet-Cronin, Manon L; Pillay, Deenan; Brown, Andrew J Leigh

2016-12-23

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows to use other genes. We aimed to determine what gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree's using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
Preparation and properties of pure, full-length IclR protein of Escherichia coli. Use of time-of-flight mass spectrometry to investigate the problems encountered.

PubMed Central

Donald, L. J.; Chernushevich, I. V.; Zhou, J.; Verentchikov, A.; Poppe-Schriemer, N.; Hosfield, D. J.; Westmore, J. B.; Ens, W.; Duckworth, H. W.; Standing, K. G.

1996-01-01

IclR protein, the repressor of the aceBAK operon of Escherichia coli, has been examined by time-of-flight mass spectrometry, with ionization by matrix assisted laser desorption or by electrospray. The purified protein was found to have a smaller mass than that predicted from the base sequence of the cloned iclR gene. Additional measurements were made on mixtures of peptides derived from IclR by treatment with trypsin and cyanogen bromide. They showed that the amino acid sequence is that predicted from the gene sequence, except that the protein has suffered truncation by removal of the N-terminal eight or, in some cases, nine amino acid residues. The peptide bond whose hydrolysis would remove eight residues is a typical target for the E. coli protease OmpT. We find that, by taking precautions to minimize Omp T proteolysis, or by eliminating it through mutation of the host strain, we can isolate full-length IclR protein (lacking only the N-terminal methionine residue). Full-length IclR is a much better DNA-binding protein than the truncated versions: it binds the aceBAK operator sequence 44-fold more tightly, presumably because of additional contacts that the N-terminal residues make with the DNA. Our experience thus demonstrates the advantages of using mass spectrometry to characterize newly purified proteins produced from cloned genes, especially where proteolysis or other covalent modification is a concern. This technique gives mass spectra from complex peptide mixtures that can be analyzed completely, without any fractionation of the mixtures, by reference to the amino acid sequence inferred from the base sequence of the cloned gene. PMID:8844850
Identification of human chromosome 22 transcribed sequences with ORF expressed sequence tags

PubMed Central

de Souza, Sandro J.; Camargo, Anamaria A.; Briones, Marcelo R. S.; Costa, Fernando F.; Nagai, Maria Aparecida; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; de Fátima Sonati, Maria; Tajara, Eloiza H.; Valentini, Sandro R.; Acencio, Marcio; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Bengtson, Mário Henrique; Carraro, Dirce M.; Carvalho, Alex F.; Carvalho, Lúcia Helena; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Costa, Maria Cristina R.; Curcio, Cyntia; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Leite, Luciana C. C.; Maia, Gustavo; Majumder, Paromita; Marins, Mozart; Matsukuma, Adriana; Melo, Analy S. A.; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana Gilbert; Rahal, Paula; Rainho, Claudia A.; da Ro's, Nancy; de Sá, Renata G.; Sales, Magaly M.; da Silva, Neusa P.; Silva, Tereza C.; da Silva, Wilson; Simão, Daniel F.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Zalcberg, Heloisa; Brentani, Ricardo R.; Reis, Luis F. L.; Dias-Neto, Emmanuel; Simpson, Andrew J. G.

2000-01-01

Transcribed sequences in the human genome can be identified with confidence only by alignment with sequences derived from cDNAs synthesized from naturally occurring mRNAs. We constructed a set of 250,000 cDNAs that represent partial expressed gene sequences and that are biased toward the central coding regions of the resulting transcripts. They are termed ORF expressed sequence tags (ORESTES). The 250,000 ORESTES were assembled into 81,429 contigs. Of these, 1,181 (1.45%) were found to match sequences in chromosome 22 with at least one ORESTES contig for 162 (65.6%) of the 247 known genes, for 67 (44.6%) of the 150 related genes, and for 45 of the 148 (30.4%) EST-predicted genes on this chromosome. Using a set of stringent criteria to validate our sequences, we identified a further 219 previously unannotated transcribed sequences on chromosome 22. Of these, 171 were in fact also defined by EST or full length cDNA sequences available in GenBank but not utilized in the initial annotation of the first human chromosome sequence. Thus despite representing less than 15% of all expressed human sequences in the public databases at the time of the present analysis, ORESTES sequences defined 48 transcribed sequences on chromosome 22 not defined by other sequences. All of the transcribed sequences defined by ORESTES coincided with DNA regions predicted as encoding exons by genscan. (http://genes.mit.edu/GENSCAN.html). PMID:11070084
Molecular Cloning and Characterization of cDNA Encoding a Putative Stress-Induced Heat-Shock Protein from Camelus dromedarius

PubMed Central

Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.

2011-01-01

Heat shock proteins are ubiquitous, induced under a number of environmental and metabolic stresses, with highly conserved DNA sequences among mammalian species. Camelus dromedaries (the Arabian camel) domesticated under semi-desert environments, is well adapted to tolerate and survive against severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of full length cDNA of encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in pET-b expression vector. The sequence analysis of HSPA6 gene showed 1932 bp-long open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GeneBank (accession number HQ214118.1). The BLAST analysis indicated that C. dromedaries HSPA6 gene nucleotides shared high similarity (77–91%) with heat shock gene nucleotide of other mammals. The deduced 643 amino acid sequences (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. The comparative analyses of camel HSPA6 protein sequences with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). Predicted camel HSPA6 protein structure using Protein 3D structural analysis high similarities with human and mouse HSPs. Taken together, this study indicates that the cDNA sequences of HSPA6 gene and its amino acid and protein structure from the Arabian camel are highly conserved and have similarities with other mammalian species. PMID:21845074
Isolation and characterization of a water stress-specific genomic gene, pwsi 18, from rice.

PubMed

Joshee, N; Kisaka, H; Kitagawa, Y

1998-01-01

One of the water stress-specific cDNA clones of rice characterised previously, wsi18, was selected for further study. The wsi18 gene can be induced by water stress conditions such as mannitol, NaCl, and dryness, but not by ABA, cold, or heat. A genomic clone for wsi18, pwsi18, contained about 1.7 kbp of the 5' upstream sequence, two introns, and the full coding sequence. The 5'-upstream sequence of pwsi18 contained putative cis-acting elements, namely an ABA-responsive element (ABRE), three G-boxes, three E-boxes, a MEF-2 sequence, four direct and two inverted repeats, and four sequences similar to DRE, which is involved in the dehydration response of Arabidopsis genes. The gusA reporter gene under the control of the pwsi18 promoter showed transient expression in response to water stress. Deletion of the downstream DRE-like sequence between the distal G-boxes-2 and -3 resulted in rather low GUS expression.
Mining new crystal protein genes from Bacillus thuringiensis on the basis of mixed plasmid-enriched genome sequencing and a computational pipeline.

PubMed

Ye, Weixing; Zhu, Lei; Liu, Yingying; Crickmore, Neil; Peng, Donghai; Ruan, Lifang; Sun, Ming

2012-07-01

We have designed a high-throughput system for the identification of novel crystal protein genes (cry) from Bacillus thuringiensis strains. The system was developed with two goals: (i) to acquire the mixed plasmid-enriched genomic sequence of B. thuringiensis using next-generation sequencing biotechnology, and (ii) to identify cry genes with a computational pipeline (using BtToxin_scanner). In our pipeline method, we employed three different kinds of well-developed prediction methods, BLAST, hidden Markov model (HMM), and support vector machine (SVM), to predict the presence of Cry toxin genes. The pipeline proved to be fast (average speed, 1.02 Mb/min for proteins and open reading frames [ORFs] and 1.80 Mb/min for nucleotide sequences), sensitive (it detected 40% more protein toxin genes than a keyword extraction method using genomic sequences downloaded from GenBank), and highly specific. Twenty-one strains from our laboratory's collection were selected based on their plasmid pattern and/or crystal morphology. The plasmid-enriched genomic DNA was extracted from these strains and mixed for Illumina sequencing. The sequencing data were de novo assembled, and a total of 113 candidate cry sequences were identified using the computational pipeline. Twenty-seven candidate sequences were selected on the basis of their low level of sequence identity to known cry genes, and eight full-length genes were obtained with PCR. Finally, three new cry-type genes (primary ranks) and five cry holotypes, which were designated cry8Ac1, cry7Ha1, cry21Ca1, cry32Fa1, and cry21Da1 by the B. thuringiensis Toxin Nomenclature Committee, were identified. The system described here is both efficient and cost-effective and can greatly accelerate the discovery of novel cry genes.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie.

PubMed

Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol; Na, Ki-Jeong

2010-12-01

P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie

PubMed Central

Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol

2010-01-01

P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance. PMID:21113104
Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

USDA-ARS?s Scientific Manuscript database

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Isolation of genes from female sterile flowers in Medicago sativa.

PubMed

Capomaccio, Stefano; Barone, Pierluigi; Reale, Lara; Veronesi, Fabio; Rosellini, Daniele

2009-06-01

A better knowledge of female sporogenesis and gametogenesis could have several practical applications, from commercial hybrid seed production to gene containment in GM crops. With the purpose of isolating genes involved in the megasporogenesis process, the cDNA-AFLP technique was employed to isolate transcript-derived fragments (TDF) differentially expressed between female-fertile and female-sterile full-sib alfalfa plants. This female sterility trait involves female-specific arrest of sporogenesis at early prophase associated with ectopic, massive callose deposition within the nucellus. Ninety-six TDFs were generated and BLAST analyses revealed similarities with genes involved in different Gene Ontology categories. Three TDFs were selected based on their putative functions: showing high similarity to a soybean flower-expressed beta 1,3-glucanase, to an Arabidopsis thaliana MAPKKK, and to an A. thaliana eukaryotic initiation translation factor eIF4G III, respectively. The full length mRNA sequences were obtained. RT-PCR and in situ hybridizations were performed to confirm differential expression during flower development. The genomic organization of the three genes was assessed through sequencing and Southern experiments. Sequence polymorphisms were found between sterile and fertile plants. Our approach based on differential display and bulked segregant analysis was successful in isolating genes that were differentially expressed between fertile and sterile alfalfa plants.
Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

PubMed Central

2010-01-01

Background Salmonids are one of the most intensely studied fish, in part due to their economic and environmental importance, and in part due to a recent whole genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes using northern pike as a diploid out-group show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate. PMID:20433749
Search Engine for Antimicrobial Resistance: A Cloud Compatible Pipeline and Web Interface for Rapidly Detecting Antimicrobial Resistance Genes Directly from Sequence Data.

PubMed

Rowe, Will; Baker, Kate S; Verner-Jeffreys, David; Baker-Austin, Craig; Ryan, Jim J; Maskell, Duncan; Pearce, Gareth

2015-01-01

Antimicrobial resistance remains a growing and significant concern in human and veterinary medicine. Current laboratory methods for the detection and surveillance of antimicrobial resistant bacteria are limited in their effectiveness and scope. With the rapidly developing field of whole genome sequencing beginning to be utilised in clinical practice, the ability to interrogate sequencing data quickly and easily for the presence of antimicrobial resistance genes will become increasingly important and useful for informing clinical decisions. Additionally, use of such tools will provide insight into the dynamics of antimicrobial resistance genes in metagenomic samples such as those used in environmental monitoring. Here we present the Search Engine for Antimicrobial Resistance (SEAR), a pipeline and web interface for detection of horizontally acquired antimicrobial resistance genes in raw sequencing data. The pipeline provides gene information, abundance estimation and the reconstructed sequence of antimicrobial resistance genes; it also provides web links to additional information on each gene. The pipeline utilises clustering and read mapping to annotate full-length genes relative to a user-defined database. It also uses local alignment of annotated genes to a range of online databases to provide additional information. We demonstrate SEAR's application in the detection and abundance estimation of antimicrobial resistance genes in two novel environmental metagenomes, 32 human faecal microbiome datasets and 126 clinical isolates of Shigella sonnei. We have developed a pipeline that contributes to the improved capacity for antimicrobial resistance detection afforded by next generation sequencing technologies, allowing for rapid detection of antimicrobial resistance genes directly from sequencing data. SEAR uses raw sequencing data via an intuitive interface so can be run rapidly without requiring advanced bioinformatic skills or resources. Finally, we show that SEAR is effective in detecting antimicrobial resistance genes in metagenomic and isolate sequencing data from both environmental metagenomes and sequencing data from clinical isolates.
Full-Genome Sequence of a Reassortant H1N2 Influenza A Virus Isolated from Pigs in Brazil.

PubMed

Schmidt, Candice; Cibulski, Samuel Paulo; Muterle Varela, Ana Paula; Mengue Scheffer, Camila; Wendlant, Adrieli; Quoos Mayer, Fabiana; Lopes de Almeida, Laura; Franco, Ana Cláudia; Roehe, Paulo Michel

2014-12-18

In this study, the full-genome sequence of a reassortant H1N2 swine influenza virus is reported. The isolate has the hemagglutinin (HA) and neuraminidase (NA) genes from human lineage (H1-δ cluster and N2), and the internal genes (polymerase basic 1 [PB1], polymerase basic 2 [PB2], polymerase acidic [PA], nucleoprotein [NP], matrix [M], and nonstructural [NS]) are derived from human 2009 pandemic H1N1 (H1N1pdm09) virus. Copyright © 2014 Schmidt et al.
Cloning and characterization of full length of a novel zebrafish gene Zsrg abundantly expressed in the germline stem cells.

PubMed

Lv, Daoyuan; Song, Ping; Chen, Yungui; Gong, Wuming; Mo, Saijun

2005-04-08

Using the digital differential display program of the National Center for Biotechnology Information, we identified a contig of expression sequence tags (ESTs) (Accession No. BM316936), which came from zebrafish ovary and testis libraries. The full-length cDNA of this transcript was cloned and further confirmed by polymerase chain reaction and sequencing. The full-length cDNA of the novel gene is 807bp and encodes a novel protein of 187 amino acids, which shares no significant homology with any other known proteins. Characterization of genomic sequences of the gene revealed that it spans 6kb on the linkage group 3 and is composed of five exons and four introns. RT-PCR analysis showed that it was expressed in mature oocytes and one-cell stage, and persisted until 24h of development. RT-PCR also revealed that it is expressed in gonad and kidney, with the highest level of expression in the testis. The expression sites of the novel gene in adult gonad were further localized by in situ hybridization to oogonia and growing oocytes in ovary and to spermatogonia, spermatocytes but not to spermatids in testis. Based on its abundance in testis and the germline stem cell-spermatogonia and oogonia, we hypothesize that it may function as a testicular development and gametogenesis related gene that plays important roles in spermatogenesis, and named it Zsrg (zebrafish testis spermatogenesis related gene, Zsrg).

Familial retinoblastoma due to intronic LINE-1 insertion causes aberrant and noncanonical mRNA splicing of the RB1 gene.

PubMed

Rodríguez-Martín, Carlos; Cidre, Florencia; Fernández-Teijeiro, Ana; Gómez-Mariano, Gema; de la Vega, Leticia; Ramos, Patricia; Zaballos, Ángel; Monzón, Sara; Alonso, Javier

2016-05-01

Retinoblastoma (RB, MIM 180200) is the paradigm of hereditary cancer. Individuals harboring a constitutional mutation in one allele of the RB1 gene have a high predisposition to develop RB. Here, we present the first case of familial RB caused by a de novo insertion of a full-length long interspersed element-1 (LINE-1) into intron 14 of the RB1 gene that caused a highly heterogeneous splicing pattern of RB1 mRNA. LINE-1 insertion was inferred by mRNA studies and full-length sequenced by massive parallel sequencing. Some of the aberrant mRNAs were produced by noncanonical acceptor splice sites, a new finding that up to date has not been described to occur upon LINE-1 retrotransposition. Our results clearly show that RNA-based strategies have the potential to detect disease-causing transposon insertions. It also confirms that the incorporation of new genetic approaches, such as massive parallel sequencing, contributes to characterize at the sequence level these unique and exceptional genetic alterations.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria.

PubMed

Gaby, John Christian; Buckley, Daniel H

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as stains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria

PubMed Central

Gaby, John Christian; Buckley, Daniel H.

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as stains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

PubMed

Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

2010-03-30

The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom-background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate the progress in tomato genomics, we developed a collection of fully-sequenced 13,227 Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants but rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional genomics and molecular breeding. Full-length cDNA sequences and their annotations are provided in the database KaFTom http://www.pgb.kazusa.or.jp/kaftom/ via the website of the National Bioresource Project Tomato http://tomato.nbrp.jp.
Genome sequence determination and metagenomic characterization of a Dehalococcoides mixed culture grown on cis-1,2-dichloroethene.

PubMed

Yohda, Masafumi; Yagi, Osami; Takechi, Ayane; Kitajima, Mizuki; Matsuda, Hisashi; Miyamura, Naoaki; Aizawa, Tomoko; Nakajima, Mutsuyasu; Sunairi, Michio; Daiba, Akito; Miyajima, Takashi; Teruya, Morimi; Teruya, Kuniko; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Juan, Ayaka; Nakano, Kazuma; Aoyama, Misako; Terabayashi, Yasunobu; Satou, Kazuhito; Hirano, Takashi

2015-07-01

A Dehalococcoides-containing bacterial consortium that performed dechlorination of 0.20 mM cis-1,2-dichloroethene to ethene in 14 days was obtained from the sediment mud of the lotus field. To obtain detailed information of the consortium, the metagenome was analyzed using the short-read next-generation sequencer SOLiD 3. Matching the obtained sequence tags with the reference genome sequences indicated that the Dehalococcoides sp. in the consortium was highly homologous to Dehalococcoides mccartyi CBDB1 and BAV1. Sequence comparison with the reference sequence constructed from 16S rRNA gene sequences in a public database showed the presence of Sedimentibacter, Sulfurospirillum, Clostridium, Desulfovibrio, Parabacteroides, Alistipes, Eubacterium, Peptostreptococcus and Proteocatella in addition to Dehalococcoides sp. After further enrichment, the members of the consortium were narrowed down to almost three species. Finally, the full-length circular genome sequence of the Dehalococcoides sp. in the consortium, D. mccartyi IBARAKI, was determined by analyzing the metagenome with the single-molecule DNA sequencer PacBio RS. The accuracy of the sequence was confirmed by matching it to the tag sequences obtained by SOLiD 3. The genome is 1,451,062 nt and the number of CDS is 1566, which includes 3 rRNA genes and 47 tRNA genes. There exist twenty-eight RDase genes that are accompanied by the genes for anchor proteins. The genome exhibits significant sequence identity with other Dehalococcoides spp. throughout the genome, but there exists significant difference in the distribution RDase genes. The combination of a short-read next-generation DNA sequencer and a long-read single-molecule DNA sequencer gives detailed information of a bacterial consortium. Copyright © 2014 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

PubMed

Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

2004-01-01

Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Isolation and characterization of full-length putative alcohol dehydrogenase genes from polygonum minus

NASA Astrophysics Data System (ADS)

Hamid, Nur Athirah Abd; Ismail, Ismanizan

2013-11-01

Polygonum minus, locally named as Kesum is an aromatic herb which is high in secondary metabolite content. Alcohol dehydrogenase is an important enzyme that catalyzes the reversible oxidation of alcohol and aldehyde with the presence of NAD(P)(H) as co-factor. The main focus of this research is to identify the gene of ADH. The total RNA was extracted from leaves of P. minus which was treated with 150 μM Jasmonic acid. Full-length cDNA sequence of ADH was isolated via rapid amplification cDNA end (RACE). Subsequently, in silico analysis was conducted on the full-length cDNA sequence and PCR was done on genomic DNA to determine the exon and intron organization. Two sequences of ADH, designated as PmADH1 and PmADH2 were successfully isolated. Both sequences have ORF of 801 bp which encode 266 aa residues. Nucleotide sequence comparison of PmADH1 and PmADH2 indicated that both sequences are highly similar at the ORF region but divergent in the 3' untranslated regions (UTR). The amino acid is differ at the 107 residue; PmADH1 contains Gly (G) residue while PmADH2 contains Cys (C) residue. The intron-exon organization pattern of both sequences are also same, with 3 introns and 4 exons. Based on in silico analysis, both sequences contain "classical" short chain alcohol dehydrogenases/reductases ((c) SDRs) conserved domain. The results suggest that both sequences are the members of short chain alcohol dehydrogenase family.
Rhipicephalus (Boophilus) microplus strain Deutsch, whole genome shotgun sequencing project first submission of genome sequence

USDA-ARS?s Scientific Manuscript database

The size and repetitive nature of the Rhipicephalus microplus genome makes obtaining a full genome sequence difficult. Cot filtration/selection techniques were used to reduce the repetitive fraction of the tick genome and enrich for the fraction of DNA with gene-containing regions. The Cot-selected ...
Genetic heterogeneity of the dnaK gene locus including transcription terminator region (TTR) in Campylobacter lari.

PubMed

Shitara, M; Tsuboi, Y; Sekizuka, T; Tazumi, A; Moorei, J E; Millar, B C; Taneike, I; Matsuda, M

2008-01-01

Nucleotide sequences of approximately 3.1 kbp consisting of the full-length open reading frame (ORF) for grpE, a non-coding (NC) region and a putative ORF for the full-length dnaK gene (1860 bp) were identified from a urease-positive thermophilic Campylobacter (UPTC) CF89-12 isolate. Then, following the construction of a new degenerate polymerase chain reaction (PCR) primer pair for amplification of the dnaK structural gene, including the transcription terminator region of C. lari isolates, the dnaK region was amplified successfully, TA-cloned and sequenced in nine C. lari isolates. The dnaK gene sequences commenced with an ATG and terminated with a TAA in all 10 isolates, including CF89-12. In addition, the putative ORFs for the dnaK gene locus from seven UPTC isolates consisted of 1860 bases, and the four urease-negative (UN) C. lari isolates included C. lari RM2100 reference strain 1866. Interestingly, different probable ribosome binding sites and hypothetically intrinsic p-independent terminator structures were identified between the seven UPTC and four UN C. lari isolates, respectively. Moreover, it is interesting to note that 20 out of a total of 28 polymorphic sites occurred among amino acid sequences of the dnaK ORF from 11 C. lari isolates, identified to be alternatively UPTC-specific or UN C. lari-specific. In the neighbour-joining tree based on the nucleotide sequence information of the dnaK gene, C. lari forms two major distinct clusters consisting of UPTC and UN C. lari isolates, respectively, with UN C. lari being more closely related to other thermophilic campylobacters than to UPTC.
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

PubMed

Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

2017-10-06

Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publically available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 millions (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Application of High-Throughput Next-Generation Sequencing for HLA Typing on Buccal Extracted DNA: Results from over 10,000 Donor Recruitment Samples

PubMed Central

Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F.; Zhang, Qiuheng

2016-01-01

Background Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Methods Multiplex long-range PCR primers amplified the full-length of HLA class I genes (A, B, C) from promotor to 3’ UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Results Using this automated workflow, over 10,063 NMDP registry donors were successfully typed under high-resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NDMP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Conclusion Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation. PMID:27798706
Application of High-Throughput Next-Generation Sequencing for HLA Typing on Buccal Extracted DNA: Results from over 10,000 Donor Recruitment Samples.

PubMed

Yin, Yuxin; Lan, James H; Nguyen, David; Valenzuela, Nicole; Takemura, Ping; Bolon, Yung-Tsi; Springer, Brianna; Saito, Katsuyuki; Zheng, Ying; Hague, Tim; Pasztor, Agnes; Horvath, Gyorgy; Rigo, Krisztina; Reed, Elaine F; Zhang, Qiuheng

2016-01-01

Unambiguous HLA typing is important in hematopoietic stem cell transplantation (HSCT), HLA disease association studies, and solid organ transplantation. However, current molecular typing methods only interrogate the antigen recognition site (ARS) of HLA genes, resulting in many cis-trans ambiguities that require additional typing methods to resolve. Here we report high-resolution HLA typing of 10,063 National Marrow Donor Program (NMDP) registry donors using long-range PCR by next generation sequencing (NGS) approach on buccal swab DNA. Multiplex long-range PCR primers amplified the full-length of HLA class I genes (A, B, C) from promotor to 3' UTR. Class II genes (DRB1, DQB1) were amplified from exon 2 through part of exon 4. PCR amplicons were pooled and sheared using Covaris fragmentation. Library preparation was performed using the Illumina TruSeq Nano kit on the Beckman FX automated platform. Each sample was tagged with a unique barcode, followed by 2×250 bp paired-end sequencing on the Illumina MiSeq. HLA typing was assigned using Omixon Twin software that combines two independent computational algorithms to ensure high confidence in allele calling. Consensus sequence and typing results were reported in Histoimmunogenetics Markup Language (HML) format. All homozygous alleles were confirmed by Luminex SSO typing and exon novelties were confirmed by Sanger sequencing. Using this automated workflow, over 10,063 NMDP registry donors were successfully typed under high-resolution by NGS. Despite known challenges of nucleic acid degradation and low DNA concentration commonly associated with buccal-based specimens, 97.8% of samples were successfully amplified using long-range PCR. Among these, 98.2% were successfully reported by NGS, with an accuracy rate of 99.84% in an independent blind Quality Control audit performed by the NDMP. In this study, NGS-HLA typing identified 23 null alleles (0.023%), 92 rare alleles (0.091%) and 42 exon novelties (0.042%). Long-range, unambiguous HLA genotyping is achievable on clinical buccal swab-extracted DNA. Importantly, full-length gene sequencing and the ability to curate full sequence data will permit future interrogation of the impact of introns, expanded exons, and other gene regulatory sequences on clinical outcomes in transplantation.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.

2000-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.

2001-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Informatic and genomic analysis of melanocyte cDNA libraries as a resource for the study of melanocyte development and function.

PubMed

Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J

2007-06-01

As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Human AZU-1 gene, variants thereof and expressed gene products

DOEpatents

Chen, Huei-Mei; Bissell, Mina

2004-06-22

A human AZU-1 gene, mutants, variants and fragments thereof. Protein products encoded by the AZU-1 gene and homologs encoded by the variants of AZU-1 gene acting as tumor suppressors or markers of malignancy progression and tumorigenicity reversion. Identification, isolation and characterization of AZU-1 and AZU-2 genes localized to a tumor suppressive locus at chromosome 10q26, highly expressed in nonmalignant and premalignant cells derived from a human breast tumor progression model. A recombinant full length protein sequences encoded by the AZU-1 gene and nucleotide sequences of AZU-1 and AZU-2 genes and variant and fragments thereof. Monoclonal or polyclonal antibodies specific to AZU-1, AZU-2 encoded protein and to AZU-1, or AZU-2 encoded protein homologs.
Species identification of mutans streptococci by groESL gene sequence.

PubMed

Hung, Wei-Chung; Tsai, Jui-Chang; Hsueh, Po-Ren; Chia, Jean-San; Teng, Lee-Jene

2005-09-01

The near full-length sequences of the groESL genes were determined and analysed among eight reference strains (serotypes a to h) representing five species of mutans group streptococci. The groES sequences from these reference strains revealed that there are two lengths (285 and 288 bp) in the five species. The intergenic spacer between groES and groEL appears to be a unique marker for species, with a variable size (ranging from 111 to 310 bp) and sequence. Phylogenetic analysis of groES and groEL separated the eight serotypes into two major clusters. Strains of serotypes b, c, e and f were highly related and had groES gene sequences of the same length, 288 bp, while strains of serotypes a, d, g and h were also closely related and their groES gene sequence lengths were 285 bp. The groESL sequences in clinical isolates of three serotypes of S. mutans were analysed for intraspecies polymorphism. The results showed that the groESL sequences could provide information for differentiation among species, but were unable to distinguish serotypes of the same species. Based on the determined sequences, a PCR assay was developed that could differentiate members of the mutans streptococci by amplicon size and provide an alternative way for distinguishing mutans streptococci from other viridans streptococci.
Inner retinal dystrophy in a patient with biallelic sequence variants in BRAT1.

PubMed

Oatts, Julius T; Duncan, Jacque L; Hoyt, Creig S; Slavotinek, Anne M; Moore, Anthony T

2017-12-01

Mutations in the BRCA1-associated protein required for the ataxia telangiectasia mutated (ATM) activation-1 (BRAT1) gene cause lethal neonatal rigidity and multifocal seizure syndrome characterized by rigidity and intractable seizures and a milder phenotype with intellectual disability, seizures, nonprogressive cerebellar ataxia or dyspraxia, and cerebellar atrophy. To date, nystagmus, cortical visual impairment, impairment of central vision, optic nerve hypoplasia, and optic atrophy have been described in this condition. This article describes the retinal findings in a patient with biallelic deleterious sequence variants in BRAT1. Case report of a child with biallelic sequence variants in the BRAT1 gene. This patient had developmental delay, microcephaly, nystagmus, and esotropia, and full-field electroretinography (ERG) revealed an inner retinal dystrophy. She was found on exome sequencing to have compound heterozygous sequence variants in the BRAT1 gene: one maternally inherited frameshift variant (c.294dupA, predicting p.Leu99Thrfs*92), which has previously been reported, and one paternally inherited novel missense variant (c.803G>A, p.Arg268His), which is likely to affect protein function. Biallelic sequence variants in BRAT1 have been reported to cause a variety of ocular and systemic manifestations, but to our knowledge, this is the first report of inner retinal dysfunction manifest as selective loss of full-field ERG scotopic and photopic b-wave amplitudes.
Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris.

PubMed

Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth; Welch, Kenneth C; Timp, Winston

2018-03-01

Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, liver is the principle location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and whole organism moderates energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism.
Full-Genome Sequence of Infectious Laryngotracheitis Virus (Gallid Alphaherpesvirus 1) Strain VFAR-043, Isolated in Peru

PubMed Central

Bendezu Eguis, Jorge; Montesinos, Ricardo; Fernández-Díaz, Manolo

2018-01-01

ABSTRACT We report here the first genome sequence of infectious laryngotracheitis virus isolated in Peru from tracheal tissues of layer chickens. The genome showed 99.98% identity to the J2 strain genome sequence. Single nucleotide polymorphisms were detected in five gene-coding sequences related to vaccine development, virus attachment, and viral immune evasion. PMID:29519822

Isolation of a full-length CC-NBS-LRR resistance gene analog candidate from sugar pine showing low nucleotide diversity.

Treesearch

K.D. Jermstad; L.A. Sheppard; B.B. Kinloch; A. Delfino-Mix; E.S. Ersoz; K.V. Krutovsky; D.B Neale

2006-01-01

The nucleotide-binding-site and leucine-rich-repeat (NBSâLRR) class of R proteins is abundant and widely distributed in plants. By using degenerate primers designed on the NBS domain in lettuce, we amplified sequences in sugar pine that shared sequence identity with many of the NBSâLRR class resistance genes catalogued in GenBank. The polymerase chain reaction products...
Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

PubMed Central

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
Evaluation of vector-primed cDNA library production from microgram quantities of total RNA.

PubMed

Kuo, Jonathan; Inman, Jason; Brownstein, Michael; Usdin, Ted B

2004-12-15

cDNA sequences are important for defining the coding region of genes, and full-length cDNA clones have proven to be useful for investigation of the function of gene products. We produced cDNA libraries containing 3.5-5 x 10(5) primary transformants, starting with 5 mug of total RNA prepared from mouse pituitary, adrenal, thymus, and pineal tissue, using a vector-primed cDNA synthesis method. Of approximately 1000 clones sequenced, approximately 20% contained the full open reading frames (ORFs) of known transcripts, based on the presence of the initiating methionine residue codon. The libraries were complex, with 94, 91, 83 and 55% of the clones from the thymus, adrenal, pineal and pituitary libraries, respectively, represented only once. Twenty-five full-length clones, not yet represented in the Mammalian Gene Collection, were identified. Thus, we have produced useful cDNA libraries for the isolation of full-length cDNA clones that are not yet available in the public domain, and demonstrated the utility of a simple method for making high-quality libraries from small amounts of starting material.
Isolation and in silico analysis of a novel H+-pyrophosphatase gene orthologue from the halophytic grass Leptochloa fusca

NASA Astrophysics Data System (ADS)

Rauf, Muhammad; Saeed, Nasir A.; Habib, Imran; Ahmed, Moddassir; Shahzad, Khurram; Mansoor, Shahid; Ali, Rashid

2017-02-01

Structure prediction can provide information about function and active sites of protein which helps to design new functional proteins. H+-pyrophosphatase is transmembrane protein involved in establishing proton motive force for active transport of Na+ across membrane by Na+/H+ antiporters. A full length novel H+-pyrophosphatase gene was isolated from halophytic grass Leptochloa fusca using RT-PCR and RACE method. Full length LfVP1 gene sequence of 2292 nucleotides encodes protein of 764 amino acids. DNA and protein sequences were used for characterization using bioinformatics tools. Various important potential sites were predicted by PROSITE webserver. Primary structural analysis showed LfVP1 as stable protein and Grand average hydropathy (GRAVY) indicated that LfVP1 protein has good hydrosolubility. Secondary structure analysis showed that LfVP1 protein sequence contains significant proportion of alpha helix and random coil. Protein membrane topology suggested the presence of 14 transmembrane domains and presence of catalytic domain in TM3. Three dimensional structure from LfVP1 protein sequence also indicated the presence of 14 transmembrane domains and hydrophobicity surface model showed amino acid hydrophobicity. Ramachandran plot showed that 98% amino acid residues were predicted in the favored region.
Molecular diversity and evolutionary history of rabies virus strains circulating in the Balkans.

PubMed

McElhinney, L M; Marston, D A; Freuling, C M; Cragg, W; Stankov, S; Lalosevic, D; Lalosevic, V; Müller, T; Fooks, A R

2011-09-01

Molecular studies of European classical rabies viruses (RABV) have revealed a number of geographically clustered lineages. To study the diversity of Balkan RABV, partial nucleoprotein (N) gene sequences were analysed from a unique panel of isolates (n = 210), collected from various hosts between 1972 and 2006. All of the Balkan isolates grouped within the European/Middle East Lineage, with the majority most closely related to East European strains. A number of RABV from Bosnia & Herzegovina and Montenegro, collected between 1986 and 2006, grouped with the West European strains, believed to be responsible for the rabies epizootic that spread throughout Europe in the latter half of the 20th Century. In contrast, no Serbian RABV belonged to this sublineage. However, a distinct group of Serbian fox RABV provided further evidence for the southwards wildlife-mediated movement of rabies from Hungary, Romania and Serbia into Bulgaria. To determine the optimal region for evolutionary analysis, partial, full and concatenated N-gene and glycoprotein (G) gene sequences were compared. Whilst both the divergence times and evolutionary rates were similar irrespective of genomic region, the 95 % highest probability density (HPD) limits were significantly reduced for full N-gene and concatenated NG-gene sequences compared with partial gene sequences. Bayesian coalescent analysis estimated the date of the most common recent ancestor of the Balkan RABV to be 1885 (95 % HPD, 1852-1913), and skyline plots suggested an expansion of the local viral population in 1980-1990, which coincides with the observed emergence of fox rabies in the region.
Characterization of capsaicin synthase and identification of its gene (csy1) for pungency factor capsaicin in pepper (Capsicum sp.)

PubMed Central

Prasad, B. C. Narasimha; Kumar, Vinod; Gururaj, H. B.; Parimalan, R.; Giridhar, P.; Ravishankar, G. A.

2006-01-01

Capsaicin is a unique alkaloid of the plant kingdom restricted to the genus Capsicum. Capsaicin is the pungency factor, a bioactive molecule of food and of medicinal importance. Capsaicin is useful as a counterirritant, antiarthritic, analgesic, antioxidant, and anticancer agent. Capsaicin biosynthesis involves condensation of vanillylamine and 8-methyl nonenoic acid, brought about by capsaicin synthase (CS). We found that CS activity correlated with genotype-specific capsaicin levels. We purified and characterized CS (≈35 kDa). Immunolocalization studies confirmed that CS is specifically localized to the placental tissues of Capsicum fruits. Western blot analysis revealed concomitant enhancement of CS levels and capsaicin accumulation during fruit development. We determined the N-terminal amino acid sequence of purified CS, cloned the CS gene (csy1) and sequenced full-length cDNA (981 bp). The deduced amino acid sequence of CS from full-length cDNA was 38 kDa. Functionality of csy1 through heterologous expression in recombinant Escherichia coli was also demonstrated. Here we report the gene responsible for capsaicin biosynthesis, which is unique to Capsicum spp. With this information on the CS gene, speculation on the gene for pungency is unequivocally resolved. Our findings have implications in the regulation of capsaicin levels in Capsicum genotypes. PMID:16938870
IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing

PubMed Central

Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai

2017-01-01

Abstract Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved as demonstrated by the gold standard data GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. PMID:27899656
Genetic diversity and natural selection of Plasmodium knowlesi merozoite surface protein 1 paralog gene in Malaysia.

PubMed

Ahmed, Md Atique; Fauzi, Muh; Han, Eun-Taek

2018-03-14

Human infections due to the monkey malaria parasite Plasmodium knowlesi is on the rise in most Southeast Asian countries specifically Malaysia. The C-terminal 19 kDa domain of PvMSP1P is a potential vaccine candidate, however, no study has been conducted in the orthologous gene of P. knowlesi. This study investigates level of polymorphisms, haplotypes and natural selection of full-length pkmsp1p in clinical samples from Malaysia. A total of 36 full-length pkmsp1p sequences along with the reference H-strain and 40 C-terminal pkmsp1p sequences from clinical isolates of Malaysia were downloaded from published genomes. Genetic diversity, polymorphism, haplotype and natural selection were determined using DnaSP 5.10 and MEGA 5.0 software. Genealogical relationships were determined using haplotype network tree in NETWORK software v5.0. Population genetic differentiation index (F ST ) and population structure of parasite was determined using Arlequin v3.5 and STRUCTURE v2.3.4 software. Comparison of 36 full-length pkmsp1p sequences along with the H-strain identified 339 SNPs (175 non-synonymous and 164 synonymous substitutions). The nucleotide diversity across the full-length gene was low compared to its ortholog pvmsp1p. The nucleotide diversity was higher toward the N-terminal domains (pkmsp1p-83 and 30) compared to the C-terminal domains (pkmsp1p-38, 33 and 19). Phylogenetic analysis of full-length genes identified 2 distinct clusters of P. knowlesi from Malaysian Borneo. The 40 pkmsp1p-19 sequences showed low polymorphisms with 16 polymorphisms leading to 18 haplotypes. In total there were 10 synonymous and 6 non-synonymous substitutions and 12 cysteine residues were intact within the two EGF domains. Evidence of strong purifying selection was observed within the full-length sequences as well in all the domains. Shared haplotypes of 40 pkmsp1p-19 were identified within Malaysian Borneo haplotypes. This study is the first to report on the genetic diversity and natural selection of pkmsp1p. A low level of genetic diversity and strong evidence of negative selection was detected and observed in all the domains of pkmsp1p of P. knowlesi indicating functional constrains. Shared haplotypes were identified within pkmsp1p-19 highlighting further evaluation using larger number of clinical samples from Malaysia.
Phylotranscriptomic consolidation of the jawed vertebrate timetree.

PubMed

Irisarri, Iker; Baurain, Denis; Brinkmann, Henner; Delsuc, Frédéric; Sire, Jean-Yves; Kupfer, Alexander; Petersen, Jörn; Jarek, Michael; Meyer, Axel; Vences, Miguel; Philippe, Hervé

2017-09-01

Phylogenomics is extremely powerful but introduces new challenges as no agreement exists on "standards" for data selection, curation and tree inference. We use jawed vertebrates (Gnathostomata) as model to address these issues. Despite considerable efforts in resolving their evolutionary history and macroevolution, few studies have included a full phylogenetic diversity of gnathostomes and some relationships remain controversial. We tested a novel bioinformatic pipeline to assemble large and accurate phylogenomic datasets from RNA sequencing and find this phylotranscriptomic approach successful and highly cost-effective. Increased sequencing effort up to ca. 10Gbp allows recovering more genes, but shallower sequencing (1.5Gbp) is sufficient to obtain thousands of full-length orthologous transcripts. We reconstruct a robust and strongly supported timetree of jawed vertebrates using 7,189 nuclear genes from 100 taxa, including 23 new transcriptomes from previously unsampled key species. Gene jackknifing of genomic data corroborates the robustness of our tree and allows calculating genome-wide divergence times by overcoming gene sampling bias. Mitochondrial genomes prove insufficient to resolve the deepest relationships because of limited signal and among-lineage rate heterogeneity. Our analyses emphasize the importance of large curated nuclear datasets to increase the accuracy of phylogenomics and provide a reference framework for the evolutionary history of jawed vertebrates.
Mining biological databases for candidate disease genes

NASA Astrophysics Data System (ADS)

Braun, Terry A.; Scheetz, Todd; Webster, Gregg L.; Casavant, Thomas L.

2001-07-01

The publicly-funded effort to sequence the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), has currently produced more than 93% of the 3 billion nucleotides of the human genome into a preliminary `draft' format. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the sequencing of model organisms (rat, mouse, fly, and others), gene discovery projects (ESTs and full-length), and new technologies such as expression analysis and resources (micro-arrays or gene chips). These resources are invaluable for the researchers identifying the functional genes of the genome that transcribe and translate into the transcriptome and proteome, both of which potentially contain orders of magnitude more complexity than the genome itself. Preliminary analyses of this data identified approximately 30,000 - 40,000 human `genes.' However, the bulk of the effort still remains -- to identify the functional and structural elements contained within the transcriptome and proteome, and to associate function in the transcriptome and proteome to genes. A fortuitous consequence of the HGP is the existence of hundreds of databases containing biological information that may contain relevant data pertaining to the identification of disease-causing genes. The task of mining these databases for information on candidate genes is a commercial application of enormous potential. We are developing a system to acquire and mine data from specific databases to aid our efforts to identify disease genes. A high speed cluster of Linux of workstations is used to analyze sequence and perform distributed sequence alignments as part of our data mining and processing. This system has been used to mine GeneMap99 sequences within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedle Syndrome (BBS).
Dehydration-induced tps gene transcripts from an anhydrobiotic nematode contain novel spliced leaders and encode atypical GT-20 family proteins.

PubMed

Goyal, K; Browne, J A; Burnell, A M; Tunnacliffe, A

2005-06-01

Accumulation of the non-reducing disaccharide trehalose is associated with desiccation tolerance during anhydrobiosis in a number of invertebrates, but there is little information on trehalose biosynthetic genes in these organisms. We have identified two trehalose-6-phosphate synthase (tps) genes in the anhydrobiotic nematode Aphelenchus avenae and determined full length cDNA sequences for both; for comparison, full length tps cDNAs from the model nematode, Caenorhabditis elegans, have also been obtained. The A. avenae genes encode very similar proteins containing the catalytic domain characteristic of the GT-20 family of glycosyltransferases and are most similar to tps-2 of C. elegans; no evidence was found for a gene in A. avenae corresponding to Ce-tps-1. Analysis of A. avenae tps cDNAs revealed several features of interest, including alternative trans-splicing of spliced leader sequences in Aav-tps-1, and four different, novel SL1-related trans-spliced leaders, which were different to the canonical SL1 sequence found in all other nematodes studied. The latter observation suggests that A. avenae does not comply with the strict evolutionary conservation of SL1 sequences observed in other species. Unusual features were also noted in predicted nematode TPS proteins, which distinguish them from homologues in other higher eukaryotes (plants and insects) and in micro-organisms. Phylogenetic analysis confirmed their membership of the GT-20 glycosyltransferase family, but indicated an accelerated rate of molecular evolution. Furthermore, nematode TPS proteins possess N- and C-terminal domains, which are unrelated to those of other eukaryotes: nematode C-terminal domains, for example, do not contain trehalose-6-phosphate phosphatase-like sequences, as seen in plant and insect homologues. During onset of anhydrobiosis, both tps genes in A. avenae are upregulated, but exposure to cold or increased osmolarity also results in gene induction, although to a lesser extent. Trehalose seems likely therefore to play a role in a number of stress responses in nematodes.
The Role of 16S rRNA Gene Sequencing in Identification of Microorganisms Misidentified by Conventional Methods

PubMed Central

Petti, C. A.; Polage, C. R.; Schreckenberger, P.

2005-01-01

Traditional methods for microbial identification require the recognition of differences in morphology, growth, enzymatic activity, and metabolism to define genera and species. Full and partial 16S rRNA gene sequencing methods have emerged as useful tools for identifying phenotypically aberrant microorganisms. We report on three bacterial blood isolates from three different College of American Pathologists-certified laboratories that were referred to ARUP Laboratories for definitive identification. Because phenotypic identification suggested unusual organisms not typically associated with the submitted clinical diagnosis, consultation with the Medical Director was sought and further testing was performed including partial 16S rRNA gene sequencing. All three patients had endocarditis, and conventional methods identified isolates from patients A, B, and C as a Facklamia sp., Eubacterium tenue, and a Bifidobacterium sp. 16S rRNA gene sequencing identified the isolates as Enterococcus faecalis, Cardiobacterium valvarum, and Streptococcus mutans, respectively. We conclude that the initial identifications of these three isolates were erroneous, may have misled clinicians, and potentially impacted patient care. 16S rRNA gene sequencing is a more objective identification tool, unaffected by phenotypic variation or technologist bias, and has the potential to reduce laboratory errors. PMID:16333109
The Early ANTP Gene Repertoire: Insights from the Placozoan Genome

PubMed Central

Schierwater, Bernd; Kamm, Kai; Srivastava, Mansi; Rokhsar, Daniel; Rosengarten, Rafael D.; Dellaporta, Stephen L.

2008-01-01

The evolution of ANTP genes in the Metazoa has been the subject of conflicting hypotheses derived from full or partial gene sequences and genomic organization in higher animals. Whole genome sequences have recently filled in some crucial gaps for the basal metazoan phyla Cnidaria and Porifera. Here we analyze the complete genome of Trichoplax adhaerens, representing the basal metazoan phylum Placozoa, for its set of ANTP class genes. The Trichoplax genome encodes representatives of Hox/ParaHox-like, NKL, and extended Hox genes. This repertoire possibly mirrors the condition of a hypothetical cnidarian-bilaterian ancestor. The evolution of the cnidarian and bilaterian ANTP gene repertoires can be deduced by a limited number of cis-duplications of NKL and “extended Hox” genes and the presence of a single ancestral “ProtoHox” gene. PMID:18716659
Conifer R2R3-MYB transcription factors: sequence analyses and gene expression in wood-forming tissues of white spruce (Picea glauca)

PubMed Central

Bedon, Frank; Grima-Pettenati, Jacqueline; Mackay, John

2007-01-01

Background Several members of the R2R3-MYB family of transcription factors act as regulators of lignin and phenylpropanoid metabolism during wood formation in angiosperm and gymnosperm plants. The angiosperm Arabidopsis has over one hundred R2R3-MYBs genes; however, only a few members of this family have been discovered in gymnosperms. Results We isolated and characterised full-length cDNAs encoding R2R3-MYB genes from the gymnosperms white spruce, Picea glauca (13 sequences), and loblolly pine, Pinus taeda L. (five sequences). Sequence similarities and phylogenetic analyses placed the spruce and pine sequences in diverse subgroups of the large R2R3-MYB family, although several of the sequences clustered closely together. We searched the highly variable C-terminal region of diverse plant MYBs for conserved amino acid sequences and identified 20 motifs in the spruce MYBs, nine of which have not previously been reported and three of which are specific to conifers. The number and length of the introns in spruce MYB genes varied significantly, but their positions were well conserved relative to angiosperm MYB genes. Quantitative RTPCR of MYB genes transcript abundance in root and stem tissues revealed diverse expression patterns; three MYB genes were preferentially expressed in secondary xylem, whereas others were preferentially expressed in phloem or were ubiquitous. The MYB genes expressed in xylem, and three others, were up-regulated in the compression wood of leaning trees within 76 hours of induction. Conclusion Our survey of 18 conifer R2R3-MYB genes clearly showed a gene family structure similar to that of Arabidopsis. Three of the sequences are likely to play a role in lignin metabolism and/or wood formation in gymnosperm trees, including a close homolog of the loblolly pine PtMYB4, shown to regulate lignin biosynthesis in transgenic tobacco. PMID:17397551
Whole genome sequences of Japanese porcine species C rotaviruses reveal a high diversity of genotypes of individual genes and will contribute to a comprehensive, generally accepted classification system.

PubMed

Niira, Kazutaka; Ito, Mika; Masuda, Tsuneyuki; Saitou, Toshiya; Abe, Tadatsugu; Komoto, Satoshi; Sato, Mitsuo; Yamasato, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Tuchiaka, Shinobu; Okada, Takashi; Omatsu, Tsutomu; Furuya, Tetsuya; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Shirai, Junsuke; Taniguchi, Koki; Mizutani, Tetsuya; Nagai, Makoto

2016-10-01

Porcine rotavirus C (RVC) is distributed throughout the world and is thought to be a pathogenic agent of diarrhea in piglets. Although, the VP7, VP4, and VP6 gene sequences of Japanese porcine RVCs are currently available, there is no whole-genome sequence data of Japanese RVC. Furthermore, only one to three sequences are available for porcine RVC VP1-VP3 and NSP1-NSP3 genes. Therefore, we determined nearly full-length whole-genome sequences of nine Japanese porcine RVCs from seven piglets with diarrhea and two healthy pigs and compared them with published RVC sequences from a database. The VP7 genes of two Japanese RVCs from healthy pigs were highly divergent from other known RVC strains and were provisionally classified as G12 and G13 based on the 86% nucleotide identity cut-off value. Pairwise sequence identity calculations and phylogenetic analyses revealed that candidate novel genotypes of porcine Japanese RVC were identified in the NSP1, NSP2 and NSP3 encoding genes, respectively. Furthermore, VP3 of Japanese porcine RVCs was shown to be closely related to human RVCs, suggesting a gene reassortment event between porcine and human RVCs and past interspecies transmission. The present study demonstrated that porcine RVCs show greater genetic diversity among strains than human and bovine RVCs. Copyright © 2016 Elsevier B.V. All rights reserved.
A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules

PubMed Central

Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

2015-01-01

A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules.

PubMed

Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

2015-01-01

A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotides mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archivation and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis.
Differentially expressed genes of Coptotermes formosanus (Isoptera: Rhinotermitidae) challenged by chemical insecticides.

PubMed

Zhang, Yi; Zhao, Yuanyuan; Qiu, Xuehong; Han, Richou

2013-08-01

Coptotermes formosanus Shiraki (Isoptera: Rhinotermitidae) termites are harmful social insects to wood constructions. The current control methods heavily depend on the chemical insecticides with increasing resistance. Analysis of the differentially expressed genes mediated by chemical insecticides will contribute to the understanding of the termite resistance to chemicals and to the establishment of alternative control measures. In the present article, a full-length cDNA library was constructed from the termites induced by a mixture of commonly used insecticides (0.01% sulfluramid and 0.01% triflumuron) for 24 h, by using the RNA ligase-mediated Rapid Amplification cDNA End method. Fifty-eight differentially expressed clones were obtained by polymerase chain reaction and confirmed by dot-blot hybridization. Forty-six known sequences were obtained, which clustered into 33 unique sequences grouped in 6 contigs and 27 singlets. Sixty-seven percent (22) of the sequences had counterpart genes from other organisms, whereas 33% (11) were undescribed. A Gene Ontology analysis classified 33 unique sequences into different functional categories. In general, most of the differential expression genes were involved in binding and catalytic activity.
Enrichment of individual KIR2DL4 sequences from genomic DNA using long-template PCR and allele-specific hybridization to magnetic bead-bound oligonucleotide probes.

PubMed

Roberts, C H; Turino, C; Madrigal, J A; Marsh, S G E

2007-06-01

DNA enrichment by allele-specific hybridization (DEASH) was used as a means to isolate individual alleles of the killer cell immunoglobulin-like receptor (KIR2DL4) gene from heterozygous genomic DNA. Using long-template polymerase chain reaction (LT-PCR), the complete KIR2DL4 gene was amplified from a cell line that had previously been characterized for its KIR gene content by PCR using sequence-specific primers (PCR-SSP). The whole gene amplicons were sequenced and we identified two heterozygous positions in accordance with the predictions of the PCR-SSP. The amplicons were then hybridized to allele-specific, biotinylated oligonucleotide probes and through binding to streptavidin-coated beads, the targeted alleles were enriched. A second PCR amplified only the exonic regions of the enriched allele, and these were then sequenced in full. We show DEASH to be capable of enriching single alleles from a heterozygous PCR product, and through sequencing the enriched DNA, we are able to produce complete coding sequences of the KIR2DL4 alleles in accordance with the typing predicted by PCR-SSP.
Characterization of the two intra-individual sequence variants in the 18S rRNA gene in the plant parasitic nematode, Rotylenchulus reniformis.

PubMed

Nyaku, Seloame T; Sripathi, Venkateswara R; Kantety, Ramesh V; Gu, Yong Q; Lawrence, Kathy; Sharma, Govind C

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in few metazoan organisms. More frequently it has mostly been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene.

Characterization of the Two Intra-Individual Sequence Variants in the 18S rRNA Gene in the Plant Parasitic Nematode, Rotylenchulus reniformis

PubMed Central

Nyaku, Seloame T.; Sripathi, Venkateswara R.; Kantety, Ramesh V.; Gu, Yong Q.; Lawrence, Kathy; Sharma, Govind C.

2013-01-01

The 18S rRNA gene is fundamental to cellular and organismal protein synthesis and because of its stable persistence through generations it is also used in phylogenetic analysis among taxa. Sequence variation in this gene within a single species is rare, but it has been observed in few metazoan organisms. More frequently it has mostly been reported in the non-transcribed spacer region. Here, we have identified two sequence variants within the near full coding region of 18S rRNA gene from a single reniform nematode (RN) Rotylenchulus reniformis labeled as reniform nematode variant 1 (RN_VAR1) and variant 2 (RN_VAR2). All sequences from three of the four isolates had both RN variants in their sequences; however, isolate 13B had only RN variant 2 sequence. Specific variable base sites (96 or 5.5%) were found within the 18S rRNA gene that can clearly distinguish the two 18S rDNA variants of RN, in 11 (25.0%) and 33 (75.0%) of the 44 RN clones, for RN_VAR1 and RN_VAR2, respectively. Neighbor-joining trees show that the RN_VAR1 is very similar to the previously existing R. reniformis sequence in GenBank, while the RN_VAR2 sequence is more divergent. This is the first report of the identification of two major variants of the 18S rRNA gene in the same single RN, and documents the specific base variation between the two variants, and hypothesizes on simultaneous co-existence of these two variants for this gene. PMID:23593343
Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

PubMed

Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

2013-02-19

Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.
An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

PubMed

Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

2011-01-01

cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Comparative Sequence and X-Inactivation Analyses of a Domain of Escape in Human Xp11.2 and the Conserved Segment in Mouse

PubMed Central

Tsuchiya, Karen D.; Greally, John M.; Yi, Yajun; Noel, Kevin P.; Truong, Jean-Pierre; Disteche, Christine M.

2004-01-01

We have performed X-inactivation and sequence analyses on 350 kb of sequence from human Xp11.2, a region shown previously to contain a cluster of genes that escape X inactivation, and we compared this region with the region of conserved synteny in mouse. We identified several new transcripts from this region in human and in mouse, which defined the full extent of the domain escaping X inactivation in both species. In human, escape from X inactivation involves an uninterrupted 235-kb domain of multiple genes. Despite highly conserved gene content and order between the two species, Smcx is the only mouse gene from the conserved segment that escapes inactivation. As repetitive sequences are believed to facilitate spreading of X inactivation along the chromosome, we compared the repetitive sequence composition of this region between the two species. We found that long terminal repeats (LTRs) were decreased in the human domain of escape, but not in the majority of the conserved mouse region adjacent to Smcx in which genes were subject to X inactivation, suggesting that these repeats might be excluded from escape domains to prevent spreading of silencing. Our findings indicate that genomic context, as well as gene-specific regulatory elements, interact to determine expression of a gene from the inactive X-chromosome. PMID:15197169
Single-molecule, full-length transcript sequencing provides insight into the extreme metabolism of the ruby-throated hummingbird Archilochus colubris

PubMed Central

Workman, Rachael E; Myrka, Alexander M; Wong, G William; Tseng, Elizabeth

2018-01-01

Abstract Background Hummingbirds oxidize ingested nectar sugars directly to fuel foraging but cannot sustain this fuel use during fasting periods, such as during the night or during long-distance migratory flights. Instead, fasting hummingbirds switch to oxidizing stored lipids that are derived from ingested sugars. The hummingbird liver plays a key role in moderating energy homeostasis and this remarkable capacity for fuel switching. Additionally, liver is the principle location of de novo lipogenesis, which can occur at exceptionally high rates, such as during premigratory fattening. Yet understanding how this tissue and whole organism moderates energy turnover is hampered by a lack of information regarding how relevant enzymes differ in sequence, expression, and regulation. Findings We generated a de novo transcriptome of the hummingbird liver using PacBio full-length cDNA sequencing (Iso-Seq), yielding 8.6Gb of sequencing data, or 2.6M reads from 4 different size fractions. We analyzed data using the SMRTAnalysis v3.1 Iso-Seq pipeline, then clustered isoforms into gene families to generate de novo gene contigs using Cogent. We performed orthology analysis to identify closely related sequences between our transcriptome and other avian and human gene sets. Finally, we closely examined homology of critical lipid metabolism genes between our transcriptome data and avian and human genomes. Conclusions We confirmed high levels of sequence divergence within hummingbird lipogenic enzymes, suggesting a high probability of adaptive divergent function in the hepatic lipogenic pathways. Our results leverage cutting-edge technology and a novel bioinformatics pipeline to provide a first direct look at the transcriptome of this incredible organism. PMID:29618047
Spliced synthetic genes as internal controls in RNA sequencing experiments.

PubMed

Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

2016-09-01

RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

PubMed Central

2009-01-01

Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

PubMed

Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

2009-08-06

Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.
CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

PubMed

Chou, A; Burke, J

1999-05-01

DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect and hence the development of effective tools for visualization and exploratory data analysis are of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color coded index. The reporting format of CRAWview gives a brief, high level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com). Contact :
Isoform Sequencing and State-of-Art Applications for Unravelling Complexity of Plant Transcriptomes

PubMed Central

An, Dong; Li, Changsheng; Humbeck, Klaus

2018-01-01

Single-molecule real-time (SMRT) sequencing developed by PacBio, also called third-generation sequencing (TGS), offers longer reads than the second-generation sequencing (SGS). Given its ability to obtain full-length transcripts without assembly, isoform sequencing (Iso-Seq) of transcriptomes by PacBio is advantageous for genome annotation, identification of novel genes and isoforms, as well as the discovery of long non-coding RNA (lncRNA). In addition, Iso-Seq gives access to the direct detection of alternative splicing, alternative polyadenylation (APA), gene fusion, and DNA modifications. Such applications of Iso-Seq facilitate the understanding of gene structure, post-transcriptional regulatory networks, and subsequently proteomic diversity. In this review, we summarize its applications in plant transcriptome study, specifically pointing out challenges associated with each step in the experimental design and highlight the development of bioinformatic pipelines. We aim to provide the community with an integrative overview and a comprehensive guidance to Iso-Seq, and thus to promote its applications in plant research. PMID:29346292
Deciphering the distance to antibiotic resistance for the pneumococcus using genome sequencing data

PubMed Central

Mobegi, Fredrick M.; Cremers, Amelieke J. H.; de Jonge, Marien I.; Bentley, Stephen D.; van Hijum, Sacha A. F. T.; Zomer, Aldert

2017-01-01

Advances in genome sequencing technologies and genome-wide association studies (GWAS) have provided unprecedented insights into the molecular basis of microbial phenotypes and enabled the identification of the underlying genetic variants in real populations. However, utilization of genome sequencing in clinical phenotyping of bacteria is challenging due to the lack of reliable and accurate approaches. Here, we report a method for predicting microbial resistance patterns using genome sequencing data. We analyzed whole genome sequences of 1,680 Streptococcus pneumoniae isolates from four independent populations using GWAS and identified probable hotspots of genetic variation which correlate with phenotypes of resistance to essential classes of antibiotics. With the premise that accumulation of putative resistance-conferring SNPs, potentially in combination with specific resistance genes, precedes full resistance, we retrogressively surveyed the hotspot loci and quantified the number of SNPs and/or genes, which if accumulated would confer full resistance to an otherwise susceptible strain. We name this approach the ‘distance to resistance’. It can be used to identify the creep towards complete antibiotics resistance in bacteria using genome sequencing. This approach serves as a basis for the development of future sequencing-based methods for predicting resistance profiles of bacterial strains in hospital microbiology and public health settings. PMID:28205635
Construction of Pará rubber tree genome and multi-transcriptome database accelerates rubber researches.

PubMed

Makita, Yuko; Kawashima, Mika; Lau, Nyok Sean; Othman, Ahmad Sofiman; Matsui, Minami

2018-01-19

Natural rubber is an economically important material. Currently the Pará rubber tree, Hevea brasiliensis is the main commercial source. Little is known about rubber biosynthesis at the molecular level. Next-generation sequencing (NGS) technologies brought draft genomes of three rubber cultivars and a variety of RNA sequencing (RNA-seq) data. However, no current genome or transcriptome databases (DB) are organized by gene. A gene-oriented database is a valuable support for rubber research. Based on our original draft genome sequence of H. brasiliensis RRIM600, we constructed a rubber tree genome and transcriptome DB. Our DB provides genome information including gene functional annotations and multi-transcriptome data of RNA-seq, full-length cDNAs including PacBio Isoform sequencing (Iso-Seq), ESTs and genome wide transcription start sites (TSSs) derived from CAGE technology. Using our original and publically available RNA-seq data, we calculated co-expressed genes for identifying functionally related gene sets and/or genes regulated by the same transcription factor (TF). Users can access multi-transcriptome data through both a gene-oriented web page and a genome browser. For the gene searching system, we provide keyword search, sequence homology search and gene expression search; users can also select their expression threshold easily. The rubber genome and transcriptome DB provides rubber tree genome sequence and multi-transcriptomics data. This DB is useful for comprehensive understanding of the rubber transcriptome. This will assist both industrial and academic researchers for rubber and economically important close relatives such as R. communis, M. esculenta and J. curcas. The Rubber Transcriptome DB release 2017.03 is accessible at http://matsui-lab.riken.jp/rubber/ .
RNA sequencing, de novo assembly and differential analysis of the gill transcriptome of freshwater climbing perch Anabas testudineus after six days of seawater exposure.

PubMed

Chen, X L; Lui, E Y; Ip, Y Kwong; Lam, S H

2018-06-21

To obtain transcriptomic insights into branchial responses to salinity challenge in Anabas testudineus, this study employed RNA sequencing (RNA-Seq) to analyse the gill transcriptome of A. testudineus exposed to seawater (SW) for 6 days compared with the freshwater (FW) control group. A combined FW and SW gill transcriptome was de novo assembled from 169.9 million 101 bp paired-end reads. In silico validation employing 17 A. testudineus Sanger full-length coding sequences showed that 15/17 of them had greater than 80% of their sequences aligned to the de novo assembled contigs where 5/17 had their full-length (100%) aligned and 9/17 had greater than 90% of their sequences aligned. The combined FW and SW gill transcriptome was mapped to 13780 unique human identifiers at E-value < 1.0E-20 while 952 and 886 identifiers were determined as up and down-regulated by 1.5 fold, respectively, in the gills of A. testudineus in SW when compared with FW. These genes were found to be associated with at least 23 biological processes. A larger proportion of genes encoding enzymes and transporters associated with molecular transport, energy production, metabolisms were up-regulated, while a larger proportion of genes encoding transmembrane receptors, G-protein coupled receptors, kinases and transcription regulators associated with cell cycle, growth, development, signalling, morphology and gene expression were relatively lower in the gills of A. testudineus in SW when compared with FW. High correlation (R = 0.99) was observed between RNA-Seq data and real-time quantitative PCR validation for 13 selected genes. The transcriptomic sequence information will facilitate development of molecular resources and tools while the findings will provide insights for future studies into branchial iono-osmoregulation and related cellular processes in A. testudineus. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck.

PubMed

Licciardello, Concetta; D'Agostino, Nunzio; Traini, Alessandra; Recupero, Giuseppe Reforgiato; Frusciante, Luigi; Chiusano, Maria Luisa

2014-02-03

Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. This study aimed at the characterization of the GST gene family in C. sinensis. Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar.
Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System

PubMed Central

Smith, L. Courtney

2012-01-01

The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species including the Sp185/333 gene family that has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphisms-based variants of the elements result in significant sequence diversity among the genes yet maintains similar structure among the members of the family. Sequence of a bacterial artificial chromosome insert shows a cluster of six, tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectible throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes. The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951
Genome of Horsepox Virus

PubMed Central

Tulman, E. R.; Delhon, G.; Afonso, C. L.; Lu, Z.; Zsak, L.; Sandybaev, N. T.; Kerembekova, U. Z.; Zaitsev, V. L.; Kutish, G. F.; Rock, D. L.

2006-01-01

Here we present the genomic sequence of horsepox virus (HSPV) isolate MNR-76, an orthopoxvirus (OPV) isolated in 1976 from diseased Mongolian horses. The 212-kbp genome contained 7.5-kbp inverted terminal repeats and lacked extensive terminal tandem repetition. HSPV contained 236 open reading frames (ORFs) with similarity to those in other OPVs, with those in the central 100-kbp region most conserved relative to other OPVs. Phylogenetic analysis of the conserved region indicated that HSPV is closely related to sequenced isolates of vaccinia virus (VACV) and rabbitpox virus, clearly grouping together these VACV-like viruses. Fifty-four HSPV ORFs likely represented fragments of 25 orthologous OPV genes, including in the central region the only known fragmented form of an OPV ribonucleotide reductase large subunit gene. In terminal genomic regions, HSPV lacked full-length homologues of genes variably fragmented in other VACV-like viruses but was unique in fragmentation of the homologue of VACV strain Copenhagen B6R, a gene intact in other known VACV-like viruses. Notably, HSPV contained in terminal genomic regions 17 kbp of OPV-like sequence absent in known VACV-like viruses, including fragments of genes intact in other OPVs and approximately 1.4 kb of sequence present only in cowpox virus (CPXV). HSPV also contained seven full-length genes fragmented or missing in other VACV-like viruses, including intact homologues of the CPXV strain GRI-90 D2L/I4R CrmB and D13L CD30-like tumor necrosis factor receptors, D3L/I3R and C1L ankyrin repeat proteins, B19R kelch-like protein, D7L BTB/POZ domain protein, and B22R variola virus B22R-like protein. These results indicated that HSPV contains unique genomic features likely contributing to a unique virulence/host range phenotype. They also indicated that while closely related to known VACV-like viruses, HSPV contains additional, potentially ancestral sequences absent in other VACV-like viruses. PMID:16940536
Sequence variation identified in the 18S rRNA gene of Theileria mutans and Theileria velifera from the African buffalo (Syncerus caffer).

PubMed

Chaisi, Mamohale E; Collins, Nicola E; Potgieter, Fred T; Oosthuizen, Marinda C

2013-01-16

The African buffalo (Syncerus caffer) is a natural reservoir host for both pathogenic and non-pathogenic Theileria species. These often occur naturally as mixed infections in buffalo. Although the benign and mildly pathogenic forms do not have any significant economic importance, their presence could complicate the interpretation of diagnostic test results aimed at the specific diagnosis of the pathogenic Theileria parva in cattle and buffalo in South Africa. The 18S rRNA gene has been used as the target in a quantitative real-time PCR (qPCR) assay for the detection of T. parva infections. However, the extent of sequence variation within this gene in the non-pathogenic Theileria spp. of the Africa buffalo is not well known. The aim of this study was, therefore, to characterise the full-length 18S rRNA genes of Theileria mutans, Theileria sp. (strain MSD) and T. velifera and to determine the possible influence of any sequence variation on the specific detection of T. parva using the 18S rRNA qPCR. The reverse line blot (RLB) hybridization assay was used to select samples which either tested positive for several different Theileria spp., or which hybridised only with the Babesia/Theileria genus-specific probe and not with any of the Babesia or Theileria species-specific probes. The full-length 18S rRNA genes from 14 samples, originating from 13 buffalo and one bovine from different localities in South Africa, were amplified, cloned and the resulting recombinants sequenced. Variations in the 18S rRNA gene sequences were identified in T. mutans, Theileria sp. (strain MSD) and T. velifera, with the greatest diversity observed amongst the T. mutans variants. This variation possibly explained why the RLB hybridization assay failed to detect T. mutans and T. velifera in some of the analysed samples. Copyright © 2012 Elsevier B.V. All rights reserved.
Expressed sequence tag analysis of adult human lens for the NEIBank Project: over 2000 non-redundant transcripts, novel genes and splice variants.

PubMed

Wistow, Graeme; Bernstein, Steven L; Wyatt, M Keith; Behal, Amita; Touchman, Jeffrey W; Bouffard, Gerald; Smith, Don; Peterson, Katherine

2002-06-15

To explore the expression profile of the human lens and to provide a resource for microarray studies, expressed sequence tag (EST) analysis has been performed on cDNA libraries from adult lenses. A cDNA library was constructed from two adult (40 year old) human lenses. Over two thousand clones were sequenced from the unamplified, un-normalized library. The library was then normalized and a further 2200 sequences were obtained. All the data were analyzed using GRIST (GRouping and Identification of Sequence Tags), a procedure for gene identification and clustering. The lens library (by) contains a low percentage of non-mRNA contaminants and a high fraction (over 75%) of apparently full length cDNA clones. Approximately 2000 reads from the unamplified library yields 810 clusters, potentially representing individual genes expressed in the lens. After normalization, the content of crystallins and other abundant cDNAs is markedly reduced and a similar number of reads from this library (fs) yields 1455 unique groups of which only two thirds correspond to named genes in GenBank. Among the most abundant cDNAs is one for a novel gene related to glutamine synthetase, which was designated "lengsin" (LGS). Analyses of ESTs also reveal examples of alternative transcripts, including a major alternative splice form for the lens specific membrane protein MP19. Variant forms for other transcripts, including those encoding the apoptosis inhibitor Livin and the armadillo repeat protein ARVCF, are also described. The lens cDNA libraries are a resource for gene discovery, full length cDNAs for functional studies and microarrays. The discovery of an abundant, novel transcript, lengsin, and a major novel splice form of MP19 reflect the utility of unamplified libraries constructed from dissected tissue. Many novel transcripts and splice forms are represented, some of which may be candidates for genetic diseases.
Identification of interleukin genes in Pogona vitticeps using a de novo transcriptome assembly from RNA-seq data.

PubMed

Livernois, Alexandra; Hardy, Kristine; Domaschenz, Renae; Papanicolaou, Alexie; Georges, Arthur; Sarre, Stephen D; Rao, Sudha; Ezaz, Tariq; Deakin, Janine E

2016-10-01

Interleukins are a group of cytokines with complex immunomodulatory functions that are important for regulating immunity in vertebrate species. Reptiles and mammals last shared a common ancestor more than 350 million years ago, so it is not surprising that low sequence identity has prevented divergent interleukin genes from being identified in the central bearded dragon lizard, Pogona vitticeps, in its genome assembly. To determine the complete nucleotide sequences of key interleukin genes, we constructed full-length transcripts, using the Trinity platform, from short paired-end read RNA sequences from stimulated spleen cells. De novo transcript reconstruction and analysis allowed us to identify interleukin genes that are missing from the published P. vitticeps assembly. Identification of key cytokines in P. vitticeps will provide insight into the essential molecular mechanisms and evolution of interleukin gene families and allow for characterization of the immune response in a lizard for comparison with mammals.
ECB deacylase mutants

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.

2002-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.

Both positive and negative regulatory elements mediate expression of a photoregulated CAB gene from Nicotiana plumbaginifolia.

PubMed Central

Castresana, C; Garcia-Luque, I; Alonso, E; Malik, V S; Cashmore, A R

1988-01-01

We have analyzed promoter regulatory elements from a photoregulated CAB gene (Cab-E) isolated from Nicotiana plumbaginifolia. These studies have been performed by introducing chimeric gene constructs into tobacco cells via Agrobacterium tumefaciens-mediated transformation. Expression studies on the regenerated transgenic plants have allowed us to characterize three positive and one negative cis-acting elements that influence photoregulated expression of the Cab-E gene. Within the upstream sequences we have identified two positive regulatory elements (PRE1 and PRE2) which confer maximum levels of photoregulated expression. These sequences contain multiple repeated elements related to the sequence-ACCGGCCCACTT-. We have also identified within the upstream region a negative regulatory element (NRE) extremely rich in AT sequences, which reduces the level of gene expression in the light. We have defined a light regulatory element (LRE) within the promoter region extending from -396 to -186 bp which confers photoregulated expression when fused to a constitutive nopaline synthase ('nos') promoter. Within this region there is a 132-bp element, extending from -368 to -234 bp, which on deletion from the Cab-E promoter reduces gene expression from high levels to undetectable levels. Finally, we have demonstrated for a full length Cab-E promoter conferring high levels of photoregulated expression, that sequences proximal to the Cab-E TATA box are not replaceable by corresponding sequences from a 'nos' promoter. This contrasts with the apparent equivalence of these Cab-E and 'nos' TATA box-proximal sequences in truncated promoters conferring low levels of photoregulated expression. Images PMID:2901343
Identification of a new genotype H wild-type mumps virus strain and its molecular relatedness to other virulent and attenuated strains.

PubMed

Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin

2003-06-01

A single clinical isolate of mumps virus designated 88-1961 was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based upon its nucleotide sequence of the SH gene 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps samples showed that the missense to silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences for the NP, M, and the L protein were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B, and G. No identifying molecular patterns in the non-coding (intergenic) or coding regions of 88-1961 were found when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
Generation and Analysis of Expressed Sequence Tags (ESTs) from Halophyte Atriplex canescens to Explore Salt-Responsive Related Genes

PubMed Central

Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu

2014-01-01

Little information is available on gene expression profiling of halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotations EST, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigenes sequences, 22 simple sequence repeats (SSRs) were also identified contributing to the study of A. canescens resources. PMID:24960361
Completion of full length genome sequence of novel avian paramyxovirus strain APMV/Shimane67 isolated from migratory wild geese in Japan.

PubMed

Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi

2016-11-01

The nucleotide sequences of nucleocapsid protein (N); phosphoprotein (P); matrix protein (M); hemagglutinin-neuraminidase (HN); and large polymerase protein (L) genes, 3'-end leader, 5'-end trailer and intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on fusion protein (F) gene sequence [46], the determination of the genome sequence of APMV/Shimane67 has been completed in this study. The genome of APMV/Shimane67 comprised 16,146 nucleotides in length and contains six genes in the order of 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of whole genome and each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 was grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 contains specific characteristics and is distinguishable from other types of APMV.
Infectious hematopoietic necrosis virus: monophyletic origin of European isolates from North American genogroup M.

PubMed

Enzmann, P J; Kurath, G; Fichtner, D; Bergmann, S M

2005-09-23

Infectious hematopoietic necrosis virus (IHNV) was first detected in Europe in 1987 in France and Italy, and later, in 1992, in Germany. The source of the virus and the route of introduction are unknown. The present study investigates the molecular epidemiology of IHNV outbreaks in Germany since its first introduction. The complete nucleotide sequences of the glycoprotein (G) and non-virion (NV) genes from 9 IHNV isolates from Germany have been determined, and this has allowed the identification of characteristic differences between these isolates. Phylogenetic analysis of partial G gene sequences (mid-G, 303 nucleotides) from North American IHNV isolates (Kurath et al. 2003) has revealed 3 major genogroups, designated U, M and L. Using this gene region with 2 different North American IHNV data sets, it was possible to group the European IHNV strains within the M genogroup, but not in any previously defined subgroup. Analysis of the full length G gene sequences indicated that an independent evolution of IHN viruses had occurred in Europe. IHN viruses in Europe seem to be of a monophyletic origin, again most closely related to North American isolates in the M genogroup. Analysis of the NV gene sequences also showed the European isolates to be monophyletic, but resolution of the 3 genogroups was poor with this gene region. As a result of comparative sequence analyses, several different genotypes have been identified circulating in Europe.
Alternative polyadenylation of the gene transcripts encoding a rat DNA polymerase beta.

PubMed

Konopiński, R; Nowak, R; Siedlecki, J A

1996-10-17

Rat cells produce two different transcripts of DNA polymerase beta (beta-Pol). The low-molecular-weight transcript (1.4 kb) was already sequenced. We report here the cloning and sequencing of the full-length cDNA, corresponding to the high-molecular-weight (HMW) transcript (4.0 kb) of beta-Pol. Sequence data strongly suggest that both transcripts are produced from a single gene by alternative polyadenylation. The HMW transcript contains the entire 1.4 kb transcript sequence and additional 2.2 kb on the 3' end. The 3' UTR of the HMW transcript contains some regulatory sequences which are not present in the 1.4-kb transcript. The A + U-rich fragment and (GU)21 sequence are believed to influence the stability of the mRNA. The functional significance of the A-rich region locally destabilizing double-stranded secondary structure remains unknown.
Dynamics of actin evolution in dinoflagellates.

PubMed

Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

2011-04-01

Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provides broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
Draft genome sequence of multidrug-resistant Staphylococcus haemolyticus IPK_TSA25 harbouring a Staphylococcus aureus plasmid, pS0385-1.

PubMed

Kim, Hyung Jun; Jang, Soojin

2017-12-01

Staphylococcus haemolyticus is the second most frequently isolated coagulase-negative staphylococci from blood cultures. Moreover, multidrug resistance associated with the genome flexibility of S. haemolyticus has been increasingly reported worldwide. Here we report the draft genome sequence of multidrug-resistant S. haemolyticus IPK_TSA25 isolated from a building surface in South Korea. Genomic DNA of S. haemolyticus IPK_TSA25 was sequenced using the PacBio RS II sequencing platform. Generated reads were assembled using PacBio SMRT Analysis 2.3.0. The draft genome was annotated and antibiotic resistance genes were identified. The genome of 2517398bp contains various antibiotic resistance genes associated with resistance to β-lactams, aminoglycosides and macrolides. Genome analysis also revealed chromosomal integration of the full-length Staphylococcus aureus plasmid pS0385-1 containing a tetracycline resistance gene. The genome sequence reported in this study will provide valuable information to understand the flexibility of the S. haemolyticus genome, which facilitates acquisition of antibiotic resistance genes and contributes to the dissemination of antibiotic resistance by this emerging pathogen. Copyright © 2017 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
Identification of genes associated with reproduction in the Mud Crab (Scylla olivacea) and their differential expression following serotonin stimulation.

PubMed

Kornthong, Napamanee; Cummins, Scott F; Chotwiwatthanakun, Charoonroj; Khornchatri, Kanjana; Engsusophon, Attakorn; Hanna, Peter J; Sobhon, Prasert

2014-01-01

The central nervous system (CNS) is often intimately involved in reproduction control and is therefore a target organ for transcriptomic investigations to identify reproduction-associated genes. In this study, 454 transcriptome sequencing was performed on pooled brain and ventral nerve cord of the female mud crab (Scylla olivacea) following serotonin injection (5 µg/g BW). A total of 197,468 sequence reads was obtained with an average length of 828 bp. Approximately 38.7% of 2,183 isotigs matched with significant similarity (E value < 1e-4) to sequences within the Genbank non-redundant (nr) database, with most significant matches being to crustacean and insect sequences. Approximately 32 putative neuropeptide genes were identified from nonmatching blast sequences. In addition, we identified full-length transcripts for crustacean reproductive-related genes, namely farnesoic acid o-methyltransferase (FAMeT), estrogen sulfotransferase (ESULT) and prostaglandin F synthase (PGFS). Following serotonin injection, which would normally initiate reproductive processes, we found up-regulation of FAMeT, ESULT and PGFS expression in the female CNS and ovary. Our data here provides an invaluable new resource for understanding the molecular role of the CNS on reproduction in S. olivacea.
Discovery, characterization and expression of a novel zebrafish gene, znfr, important for notochord formation.

PubMed

Xu, Yan; Zou, Peng; Liu, Yao; Deng, Fengjiao

2010-06-01

Genes specifically expressed in the notochord may be crucial for proper notochord development. Using the digital differential display program offered by the National Center for Biotechnology Information, we identified a novel EST sequence from a zebrafish ovary library (No. XM_701450). The full-length cDNA of this transcript was cloned by performing 3' and 5'-RACE and was further confirmed by PCR and sequencing. The resulting 614 bp gene was found to encode a novel 94 amino acid protein that did not share significant homology with any other known protein. Characterization of the genomic sequence revealed that the gene spanned 4.9 kb and was composed of four exons and three introns. RT-PCR gene expression analysis revealed that our gene of interest was expressed in ovary, kidney, brain, mature oocytes and during the early stages of embryogenesis. During embryonic development, znfr mRNA was found to be expressed in the embryonic shield, chordamesoderm and the vacuolated notochord cells by in situ hybridization. Based on this information, we hypothesize that this novel gene is an important maternal factor required for zebrafish notochord formation during early embryonic development. We have thus named this gene znfr (zebrafish notochord formation related).
Cloning and construction of recombinant palI gene from Klebsiella oxytoca on pET-32b into E. coli BL21 (DE3) pLysS for production of isomaltulose, a new generation of sugar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moeis, Maelita R., E-mail: sony@sith.itb.ac.id; Berlian, Liska, E-mail: sony@sith.itb.ac.id; Suhandono, Sony, E-mail: sony@sith.itb.ac.id

Klebsiella oxytoca produces sucrose isomerase which catalyses the conversion of sucrose to isomaltulose, a new generation of sugar. From the previous study, palI gene from Klebsiella oxytoca was succesfully isolated from sapodilla fruit (Manilkara zapota). The full-length palI gene sequence of Klebsiella oxytoca was cloned in E. coli DH5α. The deduced amino acid sequence shows 498 residues which includes conserved motif for sucrose isomerisation {sup 325}RLDRD{sup 329} and 97% identical to palI gene from Klebsiella sp. LX3 (GenBank:AAK82938.1). This fragment was succesfullly ligated into the expression vector pET-32b using overlap-extension PCR and cloned in Escherichia coli BL21 (DE3) pLysS. DNAmore » sequencing result shows that palI gene of Klebsiella oxytoca was inserted in-frame in pET-32b. This is the first report on cloning of palI gene from Klebsiella oxytoca.« less
Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

PubMed Central

Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

2015-01-01

Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with a large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877
Large-scale deletions of the ABCA1 gene in patients with hypoalphalipoproteinemia.

PubMed

Dron, Jacqueline S; Wang, Jian; Berberich, Amanda J; Iacocca, Michael A; Cao, Henian; Yang, Ping; Knoll, Joan; Tremblay, Karine; Brisson, Diane; Netzer, Christian; Gouni-Berthold, Ioanna; Gaudet, Daniel; Hegele, Robert A

2018-06-04

Copy-number variations (CNVs) have been studied in the context of familial hypercholesterolemia but have not yet been evaluated in patients with extremes of high-density lipoprotein (HDL) cholesterol levels. We evaluated targeted next-generation sequencing data from patients with very low HDL cholesterol (i.e. hypoalphalipoproteinemia) using the VarSeq-CNV caller algorithm to screen for CNVs disrupting the ABCA1, LCAT or APOA1 genes. In four individuals, we found three unique deletions in ABCA1: a heterozygous deletion of exon 4, a heterozygous deletion spanning exons 8 to 31, and a heterozygous deletion of the entire ABCA1 gene. Breakpoints were identified using Sanger sequencing, and the full-gene deletion was also confirmed using exome sequencing and the Affymetrix CytoScanTM HD Array. Before now, large-scale deletions in candidate HDL genes have not been associated with hypoalphalipoproteinemia; our findings indicate that CNVs in ABCA1 may be a previously unappreciated genetic determinant of low HDL cholesterol levels. By coupling bioinformatic analyses with next-generation sequencing data, we can successfully assess the spectrum of genetic determinants of many dyslipidemias, now including hypoalphalipoproteinemia. Published under license by The American Society for Biochemistry and Molecular Biology, Inc.
Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation

PubMed Central

Wegrzyn, Jill L.; Liechty, John D.; Stevens, Kristian A.; Wu, Le-Shin; Loopstra, Carol A.; Vasquez-Gross, Hans A.; Dougherty, William M.; Lin, Brian Y.; Zieve, Jacob J.; Martínez-García, Pedro J.; Holt, Carson; Yandell, Mark; Zimin, Aleksey V.; Yorke, James A.; Crepeau, Marc W.; Puiu, Daniela; Salzberg, Steven L.; de Jong, Pieter J.; Mockaitis, Keithanne; Main, Doreen; Langley, Charles H.; Neale, David B.

2014-01-01

The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%. PMID:24653211
Molecular Cloning and Sequence Analysis of a Phenylalanine Ammonia-Lyase Gene from Dendrobium

PubMed Central

Cai, Yongping; Lin, Yi

2013-01-01

In this study, a phenylalanine ammonia-lyase (PAL) gene was cloned from Dendrobium candidum using homology cloning and RACE. The full-length sequence and catalytic active sites that appear in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum are also found: PAL cDNA of D. candidum (designated Dc-PAL1, GenBank No. JQ765748) has 2,458 bps and contains a complete open reading frame (ORF) of 2,142 bps, which encodes 713 amino acid residues. The amino acid sequence of DcPAL1 has more than 80% sequence identity with the PAL genes of other plants, as indicated by multiple alignments. The dominant sites and catalytic active sites, which are similar to that showing in PAL proteins of Arabidopsis thaliana and Nicotiana tabacum, are also found in DcPAL1. Phylogenetic tree analysis revealed that DcPAL is more closely related to PALs from orchidaceae plants than to those of other plants. The differential expression patterns of PAL in protocorm-like body, leaf, stem, and root, suggest that the PAL gene performs multiple physiological functions in Dendrobium candidum. PMID:23638048
Molecular cloning of allelopathy related genes and their relation to HHO in Eupatorium adenophorum.

PubMed

Guo, Huiming; Pei, Xixiang; Wan, Fanghao; Cheng, Hongmei

2011-10-01

In this study, conserved sequence regions of HMGR, DXR, and CHS (encoding 3-hydroxy-3-methylglutaryl-CoA reductase, 1-deoxyxylulose-5-phosphate reductoisomerase and chalcone synthase, respectively) were amplified by reverse transcriptase (RT)-PCR from Eupatorium adenophorum. Quantitative real-time PCR showed that the expression of CHS was related to the level of HHO, an allelochemical isolated from E. adenophorum. Semi-quantitative RT-PCR showed that there was no significant difference in expression of genes among three different tissues, except for CHS. Southern blotting indicated that at least three CHS genes are present in the E. adenophorum genome. A full-length cDNA from CHS genes (named EaCHS1, GenBank ID: FJ913888) was cloned. The 1,455 bp cDNA contained an open reading frame (1,206 bp) encoding a protein of 401 amino acids. Preliminary bioinformatics analysis of EaCHS1 revealed that EaCHS1 was a member of CHS family, the subcellular localization predicted that EaCHS1 was a cytoplasmic protein. To the best of our knowledge, this is the first report of conserved sequences of these genes and of a full-length EaCHS1 gene in E. adenophorum. The results indicated that CHS gene is related to allelopathy of E. adenophorum.
GeneBuilder: interactive in silico prediction of gene structure.

PubMed

Milanesi, L; D'Angelo, D; Rogozin, I B

1999-01-01

Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene a the URL: http://www.itba.mi. cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
Molecular characterization of a nuclear topoisomerase II from Nicotiana tabacum that functionally complements a temperature-sensitive topoisomerase II yeast mutant.

PubMed

Singh, B N; Mudgil, Yashwanti; Sopory, S K; Reddy, M K

2003-07-01

We have successfully expressed enzymatically active plant topoisomerase II in Escherichia coli for the first time, which has enabled its biochemical characterization. Using a PCR-based strategy, we obtained a full-length cDNA and the corresponding genomic clone of tobacco topoisomerase II. The genomic clone has 18 exons interrupted by 17 introns. Most of the 5' and 3' splice junctions follow the typical canonical consensus dinucleotide sequence GU-AG present in other plant introns. The position of introns and phasing with respect to primary amino acid sequence in tobacco TopII and Arabidopsis TopII are highly conserved, suggesting that the two genes are evolved from the common ancestral type II topoisomerase gene. The cDNA encodes a polypeptide of 1482 amino acids. The primary amino acid sequence shows a striking sequence similarity, preserving all the structural domains that are conserved among eukaryotic type II topoisomerases in an identical spatial order. We have expressed the full-length polypeptide in E. coli and purified the recombinant protein to homogeneity. The full-length polypeptide relaxed supercoiled DNA and decatenated the catenated DNA in a Mg(2+)- and ATP-dependent manner, and this activity was inhibited by 4'-(9-acridinylamino)-3'-methoxymethanesulfonanilide (m-AMSA). The immunofluorescence and confocal microscopic studies, with antibodies developed against the N-terminal region of tobacco recombinant topoisomerase II, established the nuclear localization of topoisomerase II in tobacco BY2 cells. The regulated expression of tobacco topoisomerase II gene under the GAL1 promoter functionally complemented a temperature-sensitive TopII(ts) yeast mutant.
Wound induced Beta vulgaris polygalacturonase-inhibiting protein genes encode a longer leucine-rich repeat domain and inhibit fungal polygalacturonases

USDA-ARS?s Scientific Manuscript database

Polygalacturonase-inhibiting proteins (PGIPs) are leucine-rich repeat (LRR) proteins involved in plant defense. Sugar beet (Beta vulgaris L.) PGIP genes, BvPGIP1, BvPGIP2 and BvPGIP3, were isolated from two breeding lines, F1016 and F1010. Full-length cDNA sequences of the three BvPGIP genes encod...
The sequence, structure and evolutionary features of HOTAIR in mammals

PubMed Central

2011-01-01

Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals. Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275

Community analysis of a full-scale anaerobic bioreactor treating paper mill wastewater.

PubMed

Roest, Kees; Heilig, Hans G H J; Smidt, Hauke; de Vos, Willem M; Stams, Alfons J M; Akkermans, Antoon D L

2005-03-01

To get insight into the microbial community of an Upflow Anaerobic Sludge Blanket reactor treating paper mill wastewater, conventional microbiological methods were combined with 16S rRNA gene analyses. Particular attention was paid to microorganisms able to degrade propionate or butyrate in the presence or absence of sulphate. Serial enrichment dilutions allowed estimating the number of microorganisms per ml sludge that could use butyrate with or without sulphate (10(5)), propionate without sulphate (10(6)), or propionate and sulphate (10(8)). Quantitative RNA dot-blot hybridisation indicated that Archaea were two-times more abundant in the microbial community of anaerobic sludge than Bacteria. The microbial community composition was further characterised by 16S rRNA-gene-targeted Denaturing Gradient Gel Electrophoresis (DGGE) fingerprinting, and via cloning and sequencing of dominant amplicons from the bacterial and archaeal patterns. Most of the nearly full length (approximately 1.45 kb) bacterial 16S rRNA gene sequences showed less than 97% similarity to sequences present in public databases, in contrast to the archaeal clones (approximately. 1.3 kb) that were highly similar to known sequences. While Methanosaeta was found as the most abundant genus, also Crenarchaeote-relatives were identified. The microbial community was relatively stable over a period of 3 years (samples taken in July 1999, May 2001, March 2002 and June 2002) as indicated by the high similarity index calculated from DGGE profiles (81.9+/-2.7% for Bacteria and 75.1+/-3.1% for Archaea). 16S rRNA gene sequence analysis indicated the presence of unknown and yet uncultured microorganisms, but also showed that known sulphate-reducing bacteria and syntrophic fatty acid-oxidising microorganisms dominated the enrichments.
PaperBLAST: Text Mining Papers for Information about Homologs

DOE PAGES

Price, Morgan N.; Arkin, Adam P.

2017-08-15

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quicklymore » finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.« less
PaperBLAST: Text Mining Papers for Information about Homologs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, Morgan N.; Arkin, Adam P.

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quicklymore » finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.« less
PaperBLAST: Text Mining Papers for Information about Homologs

PubMed Central

Arkin, Adam P.

2017-01-01

ABSTRACT Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions. PMID:28845458
PaperBLAST: Text Mining Papers for Information about Homologs.

PubMed

Price, Morgan N; Arkin, Adam P

2017-01-01

Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
A taxonomic framework for cable bacteria and proposal of the candidate genera Electrothrix and Electronema.

PubMed

Trojan, Daniela; Schreiber, Lars; Bjerg, Jesper T; Bøggild, Andreas; Yang, Tingting; Kjeldsen, Kasper U; Schramm, Andreas

2016-07-01

Cable bacteria are long, multicellular filaments that can conduct electric currents over centimeter-scale distances. All cable bacteria identified to date belong to the deltaproteobacterial family Desulfobulbaceae and have not been isolated in pure culture yet. Their taxonomic delineation and exact phylogeny is uncertain, as most studies so far have reported only short partial 16S rRNA sequences or have relied on identification by a combination of filament morphology and 16S rRNA-targeted fluorescence in situ hybridization with a Desulfobulbaceae-specific probe. In this study, nearly full-length 16S rRNA gene sequences of 16 individual cable bacteria filaments from freshwater, salt marsh, and marine sites of four geographic locations are presented. These sequences formed a distinct, monophyletic sister clade to the genus Desulfobulbus and could be divided into six coherent, species-level clusters, arranged as two genus-level groups. The same grouping was retrieved by phylogenetic analysis of full or partial dsrAB genes encoding the dissimilatory sulfite reductase. Based on these results, it is proposed to accommodate cable bacteria within two novel candidate genera: the mostly marine "Candidatus Electrothrix", with four candidate species, and the mostly freshwater "Candidatus Electronema", with two candidate species. This taxonomic framework can be used to assign environmental sequences confidently to the cable bacteria clade, even without morphological information. Database searches revealed 185 16S rRNA gene sequences that affiliated within the clade formed by the proposed cable bacteria genera, of which 120 sequences could be assigned to one of the six candidate species, while the remaining 65 sequences indicated the existence of up to five additional species. Copyright © 2016 The Author(s). Published by Elsevier GmbH.. All rights reserved.
Identification of SLC20A1 and SLC15A4 among other genes as potential risk factors for combined pituitary hormone deficiency.

PubMed

Simm, Franziska; Griesbeck, Anne; Choukair, Daniela; Weiß, Birgit; Paramasivam, Nagarajan; Klammt, Jürgen; Schlesner, Matthias; Wiemann, Stefan; Martinez, Cristina; Hoffmann, Georg F; Pfäffle, Roland W; Bettendorf, Markus; Rappold, Gudrun A

2017-10-26

PurposeCombined pituitary hormone deficiency (CPHD) is characterized by a malformed or underdeveloped pituitary gland resulting in an impaired pituitary hormone secretion. Several transcription factors have been described in its etiology, but defects in known genes account for only a small proportion of cases.MethodsTo identify novel genetic causes for congenital hypopituitarism, we performed exome-sequencing studies on 10 patients with CPHD and their unaffected parents. Two candidate genes were sequenced in further 200 patients. Genotype data of known hypopituitary genes are reviewed.ResultsWe discovered 51 likely damaging variants in 38 genes; 12 of the 51 variants represent de novo events (24%); 11 of the 38 genes (29%) were present in the E12.5/E14.5 pituitary transcriptome. Targeted sequencing of two candidate genes, SLC20A1 and SLC15A4, of the solute carrier membrane transport protein family in 200 additional patients demonstrated two further variants predicted as damaging. We also found combinations of de novo (SLC20A1/SLC15A4) and transmitted variants (GLI2/LHX3) in the same individuals, leading to the full-blown CPHD phenotype.ConclusionThese data expand the pituitary target genes repertoire for diagnostics and further functional studies. Exome sequencing has identified a combination of rare variants in different genes that might explain incomplete penetrance in CPHD.Genetics in Medicine advance online publication, 26 October 2017; doi:10.1038/gim.2017.165.
Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs.

PubMed

Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi

2018-02-12

Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation.

PubMed

Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro

2015-11-18

RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains a large room for optimizing its workflow, in order to take full advantage of continuously developing sequencing capacity. Transcriptome sequencing for three embryonic stages of Madagascar ground gecko (Paroedura picta) was performed with the Illumina platform. The output reads were assembled de novo for reconstructing transcript sequences. In order to evaluate the completeness of transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs. To take advantage of increased read length of >150 nt, we demonstrated shortened RNA fragmentation time, which resulted in a dramatic shift of insert size distribution. To evaluate products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species)., The completeness assessment performed by the computational pipelines CEGMA and BUSCO referring to CVG, demonstrated higher accuracy and resolution than with the gene set previously established for this purpose. As a result of the assessment with CVG, we have derived the most comprehensive transcript sequence set of the Madagascar ground gecko by means of assembling individual libraries followed by clustering the assembled sequences based on their overall similarities. Our results provide several insights into optimizing de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as whole genome analyses.
Genomic analysis of Ascochyta rabiei identifies dynamic genome environments of solanapyrone biosynthesis gene cluster and a novel type of pathway-specific regulator

USDA-ARS?s Scientific Manuscript database

Secondary metabolite genes are often clustered together and situated in particular genomic regions such as the subtelomere, which can facilitate niche adaptation in fungi. Solanapyrones are toxic secondary metabolites produced by fungi occupying different ecological niches. Full genome sequencing of...
Isolation of a novel Orientia species (O. chuto sp. nov.) from a patient infected in Dubai.

PubMed

Izzard, Leonard; Fuller, Andrew; Blacksell, Stuart D; Paris, Daniel H; Richards, Allen L; Aukkanit, Nuntipa; Nguyen, Chelsea; Jiang, Ju; Fenwick, Stan; Day, Nicholas P J; Graves, Stephen; Stenos, John

2010-12-01

In July 2006, an Australian tourist returning from Dubai, in the United Arab Emirates (UAE), developed acute scrub typhus. Her signs and symptoms included fever, myalgia, headache, rash, and eschar. Orientia tsutsugamushi serology demonstrated a 4-fold rise in antibody titers in paired serum collections (1:512 to 1:8,192), with the sera reacting strongest against the Gilliam strain antigen. An Orientia species was isolated by the in vitro culture of the patient's acute blood taken prior to antibiotic treatment. The gene sequencing of the 16S rRNA gene (rrs), partial 56-kDa gene, and the full open reading frame 47-kDa gene was performed, and comparisons of this new Orientia sp. isolate to previously characterized strains demonstrated significant sequence diversity. The closest homology to the rrs sequence of the new Orientia sp. isolate was with three strains of O. tsutsugamushi (Ikeda, Kato, and Karp), with a nucleotide sequence similarity of 98.5%. The closest homology to the 47-kDa gene sequence was with O. tsutsugamushi strain Gilliam, with a nucleotide similarity of 82.3%, while the closest homology to the 56-kDa gene sequence was with O. tsutsugamushi strain TA686, with a nucleotide similarity of 53.1%. The molecular divergence and geographically unique origin lead us to believe that this organism should be considered a novel species. Therefore, we have proposed the name "Orientia chuto," and the prototype strain of this species is strain Dubai, named after the location in which the patient was infected.
Isolation of a Novel Orientia Species (O. chuto sp. nov.) from a Patient Infected in Dubai ▿

PubMed Central

Izzard, Leonard; Fuller, Andrew; Blacksell, Stuart D.; Paris, Daniel H.; Richards, Allen L.; Aukkanit, Nuntipa; Nguyen, Chelsea; Jiang, Ju; Fenwick, Stan; Day, Nicholas P. J.; Graves, Stephen; Stenos, John

2010-01-01

In July 2006, an Australian tourist returning from Dubai, in the United Arab Emirates (UAE), developed acute scrub typhus. Her signs and symptoms included fever, myalgia, headache, rash, and eschar. Orientia tsutsugamushi serology demonstrated a 4-fold rise in antibody titers in paired serum collections (1:512 to 1:8,192), with the sera reacting strongest against the Gilliam strain antigen. An Orientia species was isolated by the in vitro culture of the patient's acute blood taken prior to antibiotic treatment. The gene sequencing of the 16S rRNA gene (rrs), partial 56-kDa gene, and the full open reading frame 47-kDa gene was performed, and comparisons of this new Orientia sp. isolate to previously characterized strains demonstrated significant sequence diversity. The closest homology to the rrs sequence of the new Orientia sp. isolate was with three strains of O. tsutsugamushi (Ikeda, Kato, and Karp), with a nucleotide sequence similarity of 98.5%. The closest homology to the 47-kDa gene sequence was with O. tsutsugamushi strain Gilliam, with a nucleotide similarity of 82.3%, while the closest homology to the 56-kDa gene sequence was with O. tsutsugamushi strain TA686, with a nucleotide similarity of 53.1%. The molecular divergence and geographically unique origin lead us to believe that this organism should be considered a novel species. Therefore, we have proposed the name “Orientia chuto,” and the prototype strain of this species is strain Dubai, named after the location in which the patient was infected. PMID:20926708
Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms.

PubMed

Au, Chun Hang; Wa, Anna; Ho, Dona N; Chan, Tsun Leung; Ma, Edmond S K

2016-01-22

Genomic techniques in recent years have allowed the identification of many mutated genes important in the pathogenesis of acute myeloid leukemia (AML). Together with cytogenetic aberrations, these gene mutations are powerful prognostic markers in AML and can be used to guide patient management, for example selection of optimal post-remission therapy. The mutated genes also hold promise as therapeutic targets themselves. We evaluated the applicability of a gene panel for the detection of AML mutations in a diagnostic molecular pathology laboratory. Fifty patient samples comprising 46 AML and 4 other myeloid neoplasms were accrued for the study. They consisted of 19 males and 31 females at a median age of 60 years (range: 18-88 years). A total of 54 genes (full coding exons of 15 genes and exonic hotspots of 39 genes) were targeted by 568 amplicons that ranged from 225 to 275 bp. The combined coverage was 141 kb in sequence length. Amplicon libraries were prepared by TruSight myeloid sequencing panel (Illumina, CA) and paired-end sequencing runs were performed on a MiSeq (Illumina) genome sequencer. Sequences obtained were analyzed by in-house bioinformatics pipeline, namely BWA-MEM, Samtools, GATK, Pindel, Ensembl Variant Effect Predictor and a novel algorithm ITDseek. The mean count of sequencing reads obtained per sample was 3.81 million and the mean sequencing depth was over 3000X. Seventy-seven mutations in 24 genes were detected in 37 of 50 samples (74 %). On average, 2 mutations (range 1-5) were detected per positive sample. TP53 gene mutations were found in 3 out of 4 patients with complex and unfavorable cytogenetics. Comparing NGS results with that of conventional molecular testing showed a concordance rate of 95.5 %. After further resolution and application of a novel bioinformatics algorithm ITDseek to aid the detection of FLT3 internal tandem duplication (ITD), the concordance rate was revised to 98.2 %. Gene panel testing by NGS approach was applicable for sensitive and accurate detection of actionable AML gene mutations in the clinical laboratory to individualize patient management. A novel algorithm ITDseek was presented that improved the detection of FLT3-ITD of varying length, position and at low allelic burden.
Identification and characterization of cell-specific enhancer elements for the mouse ETF/Tead2 gene.

PubMed

Tanoue, Y; Yasunami, M; Suzuki, K; Ohkubo, H

2001-12-21

We have identified and characterized by transient transfection assays the cell-specific 117-bp enhancer sequence in the first intron of the mouse ETF (Embryonic TEA domain-containing factor)/Tead2 gene required for transcriptional activation in ETF/Tead2 gene-expressing cells, such as P19 cells. The 117-bp enhancer contains one GC-rich sequence (5'-GGGGCGGGG-3'), termed the GC box, and two tandemly repeated GA-rich sequences (5'-GGGGGAGGGG-3'), termed the proximal and distal GA elements. Further analyses, including transfection studies and electrophoretic mobility shift assays using a series of deletion and mutation constructs, indicated that Sp1, a putative activator, may be required to predominate over its competition with another unknown putative repressor, termed the GA element-binding factor, for binding to both the GC box, which overlapped with the proximal GA element, and the distal GA element in the 117-bp sequence in order to achieve a full enhancer activity. We also discuss a possible mechanism underlying the cell-specific enhancer activity of the 117-bp sequence.
[Development of specific and degenerated primers to CesA genes encoding flax (Linum usitatissimum L.) cellulose synthase].

PubMed

Grushetskaia, Z E; Lemesh, V A; Khotyleva, L V

2010-01-01

Cellulose synthase catalytic subunit genes, CesA, have been discovered in several higher plant species, and it has been shown that the CesA gene family has multiple members. HVR2 fragment of these genes determine the class specificity of the CESA protein and its participation in the primary or secondary cell wall synthesis. The aim of this study was development of specific and degenerated primers to flax CesA gene fragments leading to obtaining the class specific HVR2 region of the gene. Two pairs of specific primers to the certain fragments of CesA-1 and CesA-6 genes and one pair of degenerated primers to HVR2 region of all flax CesA genes were developed basing on comparison of six CesA EST sequences of flax and full cDNA sequences of Arabidopsis, poplar, maize and cotton plants, obtained from GenBank. After amplification of flax cDNA, the bands of expected size were detected (201 and 300 b.p. for the CesA-1 and CesA-6, and 600 b.p. for the HVR2 region of CesA respectively). The developed markers can be used for cloning and sequencing of flax CesA genes, identifying their number in flax genome, tissue and stage specificity.
De novo assembly and next-generation sequencing to analyse full-length gene variants from codon-barcoded libraries.

PubMed

Cho, Namjin; Hwang, Byungjin; Yoon, Jung-ki; Park, Sangun; Lee, Joongoo; Seo, Han Na; Lee, Jeewon; Huh, Sunghoon; Chung, Jinsoo; Bang, Duhee

2015-09-21

Interpreting epistatic interactions is crucial for understanding evolutionary dynamics of complex genetic systems and unveiling structure and function of genetic pathways. Although high resolution mapping of en masse variant libraries renders molecular biologists to address genotype-phenotype relationships, long-read sequencing technology remains indispensable to assess functional relationship between mutations that lie far apart. Here, we introduce JigsawSeq for multiplexed sequence identification of pooled gene variant libraries by combining a codon-based molecular barcoding strategy and de novo assembly of short-read data. We first validate JigsawSeq on small sub-pools and observed high precision and recall at various experimental settings. With extensive simulations, we then apply JigsawSeq to large-scale gene variant libraries to show that our method can be reliably scaled using next-generation sequencing. JigsawSeq may serve as a rapid screening tool for functional genomics and offer the opportunity to explore evolutionary trajectories of protein variants.
Mammalian cDNA Library from the NIH Mammalian Gene Collection (MGC) | Office of Cancer Genomics

Cancer.gov

The MGC provides the research community full-length clones for most of the defined (as of 2006) human and mouse genes, along with selected clones of cow and rat genes. Clones were designed to allow easy transfer of the ORF sequences into nearly any type of expression vector. MGC provides protein ‘expression-ready’ clones for each of the included human genes. MGC is part of the ORFeome Collaboration (OC).
Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

PubMed Central

Yang, Fengxi; Zhu, Genfa

2015-01-01

Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutant corresponding to different floral morphogenesis, which validated the specialized roles of them in floral patterning and further supported the effectiveness of our in silico analysis. This dataset generated in our study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and supports a valuable resource for molecular breeding of the orchid plant. PMID:26580566
Immune-Related Transcriptome of Coptotermes formosanus Shiraki Workers: The Defense Mechanism

PubMed Central

Hussain, Abid; Li, Yi-Feng; Cheng, Yu; Liu, Yang; Chen, Chuan-Cheng; Wen, Shuo-Yang

2013-01-01

Formosan subterranean termites, Coptotermes formosanus Shiraki, live socially in microbial-rich habitats. To understand the molecular mechanism by which termites combat pathogenic microbes, a full-length normalized cDNA library and four Suppression Subtractive Hybridization (SSH) libraries were constructed from termite workers infected with entomopathogenic fungi (Metarhizium anisopliae and Beauveria bassiana), Gram-positive Bacillus thuringiensis and Gram-negative Escherichia coli, and the libraries were analyzed. From the high quality normalized cDNA library, 439 immune-related sequences were identified. These sequences were categorized as pattern recognition receptors (47 sequences), signal modulators (52 sequences), signal transducers (137 sequences), effectors (39 sequences) and others (164 sequences). From the SSH libraries, 27, 17, 22 and 15 immune-related genes were identified from each SSH library treated with M. anisopliae, B. bassiana, B. thuringiensis and E. coli, respectively. When the normalized cDNA library was compared with the SSH libraries, 37 immune-related clusters were found in common; 56 clusters were identified in the SSH libraries, and 259 were identified in the normalized cDNA library. The immune-related gene expression pattern was further investigated using quantitative real time PCR (qPCR). Important immune-related genes were characterized, and their potential functions were discussed based on the integrated analysis of the results. We suggest that normalized cDNA and SSH libraries enable us to discover functional genes transcriptome. The results remarkably expand our knowledge about immune-inducible genes in C. formosanus Shiraki and enable the future development of novel control strategies for the management of Formosan subterranean termites. PMID:23874972
The carrot genome provides insights into crop origins and a foundation for future crop improvement

USDA-ARS?s Scientific Manuscript database

The sequencing of the carrot genome was an effort that formally began in 2012 and culminated with the publication and release of the genome in 2016. A full genome sequence provides the ultimate foundation to study genetics, gene function, and evolution of a species. The primary goal of the carrot ge...

The nop gene from Phanerochaete chrysosporium encodes a peroxidase with novel structural features

Treesearch

Luis F. Larrondo; Angel Gonzalez; Tomas Perez-Acle; Dan Cullen; Rafael Vicuna

2005-01-01

Inspection of the genome of the ligninolytic basidiomycete Phanerochaete chrysosporium revealed an unusual peroxidase-like sequence. The corresponding full length cDNA was sequenced and an archetypal secretion signal predicted. The deduced mature protein (NoP, novel peroxidase) contains 295 aa residues and is therefore considerably shorter than other Class II (fungal)...
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites

PubMed Central

Chen, Yue; Sanchez, Ana M.; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N.; Busch, Michael P.; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 107 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results of majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs. PMID:27314585
Genetic Characterization of a Panel of Diverse HIV-1 Isolates at Seven International Sites.

PubMed

Hora, Bhavna; Keating, Sheila M; Chen, Yue; Sanchez, Ana M; Sabino, Ester; Hunt, Gillian; Ledwaba, Johanna; Hackett, John; Swanson, Priscilla; Hewlett, Indira; Ragupathy, Viswanath; Vikram Vemula, Sai; Zeng, Peibin; Tee, Kok-Keng; Chow, Wei Zhen; Ji, Hezhao; Sandstrom, Paul; Denny, Thomas N; Busch, Michael P; Gao, Feng

2016-01-01

HIV-1 subtypes and drug resistance are routinely tested by many international surveillance groups. However, results from different sites often vary. A systematic comparison of results from multiple sites is needed to determine whether a standardized protocol is required for consistent and accurate data analysis. A panel of well-characterized HIV-1 isolates (N = 50) from the External Quality Assurance Program Oversight Laboratory (EQAPOL) was assembled for evaluation at seven international sites. This virus panel included seven subtypes, six circulating recombinant forms (CRFs), nine unique recombinant forms (URFs) and three group O viruses. Seven viruses contained 10 major drug resistance mutations (DRMs). HIV-1 isolates were prepared at a concentration of 107 copies/ml and compiled into blinded panels. Subtypes and DRMs were determined with partial or full pol gene sequences by conventional Sanger sequencing and/or Next Generation Sequencing (NGS). Subtype and DRM results were reported and decoded for comparison with full-length genome sequences generated by EQAPOL. The partial pol gene was amplified by RT-PCR and sequenced for 89.4%-100% of group M viruses at six sites. Subtyping results of majority of the viruses (83%-97.9%) were correctly determined for the partial pol sequences. All 10 major DRMs in seven isolates were detected at these six sites. The complete pol gene sequence was also obtained by NGS at one site. However, this method missed six group M viruses and sequences contained host chromosome fragments. Three group O viruses were only characterized with additional group O-specific RT-PCR primers employed by one site. These results indicate that PCR protocols and subtyping tools should be standardized to efficiently amplify diverse viruses and more consistently assign virus genotypes, which is critical for accurate global subtype and drug resistance surveillance. Targeted NGS analysis of partial pol sequences can serve as an alternative approach, especially for detection of low-abundance DRMs.
Infectious hematopoietic necrosis virus: Monophyletic origin of European isolates from North American Genogroup M

USGS Publications Warehouse

Enzmann, P.-J.; Kurath, G.; Fichtner, D.; Bergmann, S.M.

2005-01-01

Infectious hematopoietic necrosis virus (IHNV) was first detected in Europe in 1987 in France and Italy, and later, in 1992, in Germany. The source of the virus and the route of introduction are unknown. The present study investigates the molecular epidemiology of IHNV outbreaks in Germany since its first introduction. The complete nucleotide sequences of the glycoprotein (G) and non-virion (NV) genes from 9 IHNV isolates from Germany have been determined, and this has allowed the identification of characteristic differences between these isolates. Phylogenetic analysis of partial G gene sequences (mid-G, 303 nucleotides) from North American IHNV isolates (Kurath et al. 2003) has revealed 3 major genogroups, designated U, M and L. Using this gene region with 2 different North American IHNV data sets, it was possible to group the European IHNV strains within the M genogroup, but not in any previously defined subgroup. Analysis of the full length G gene sequences indicated that an independent evolution of IHN viruses had occurred in Europe. IHN viruses in Europe seem to be of a monophyletic origin, again most closely related to North American isolates in the M genogroup. Analysis of the NV gene sequences also showed the European isolates to be monophyletic, but resolution of the 3 genogroups was poor with this gene region. As a result of comparative sequence analyses, several different genotypes have been identified circulating in Europe. ?? Inter-Research 2005.
Cloning, characterization, and expression of Cytochrome b ( Cytb)—a key mitochondrial gene from Prorocentrum donghaiense

NASA Astrophysics Data System (ADS)

Zhao, Liyuan; Mi, Tiezhu; Zhen, Yu; Yu, Zhigang

2012-05-01

Mitochondrial cytochrome b (Cytb), one of the few proteins encoded by the mitochondrial DNA, plays an important role in transferring electrons. As a mitochondrial gene, it has been widely used for phylogenetic analysis. Previously, a 949-bp fragment of the coding gene and mRNA editing were characterized from Prorocentrum donghaiense, which might prove useful for resolving P. donghaiense from closely related species. However, the full-length coding region has not been characterized. In this study, we used rapid amplification of cDNA ends (RACE) to obtain full-length, 1 124 bp cDNA. Cytb transcript contained a standard initiation codon ATG, but did not have a recognizable stop codon. Homology comparison showed that the P. donghaiense Cytb had a high sequence identity to Cytb sequences from other dinoflagellate species. Phylogenetic analysis placed Cytb from P. donghaiense in the clade of dinoflagellates and it clustered together strongly with that from P. minimum. Based on the full-length sequence, we inferred 32 editing events at different positions, accounting for 2.93% of the Cytb gene. 34.4% (11) of the changes were A to G, 25% (8) were T to C, and 25% (8) were C to U, with smaller proportions of G to C and G to A edits (9.4% (3) and 6.2% (2), respectively). The expression level of the Cytb transcript was quantified by real-time PCR with a TaqMan probe at different times during the whole growth phase. The average Cytb transcript was present at 39.27±7.46 copies of cDNA per cell during the whole growth cycle, and the expression of Cytb was relatively stable over the different phases. These results deepen our understanding of the structure and characteristics of Cytb in P. donghaiense, and confirmed that Cytb in P. donghaiense is a candidate reference gene for studying the expression of other genes.
Cloning, sequencing, and expression of the Zymomonas mobilis phosphoglycerate mutase gene (pgm) in Escherichia coli.

PubMed Central

Yomano, L P; Scopes, R K; Ingram, L O

1993-01-01

Phosphoglycerate mutase is an essential glycolytic enzyme for Zymomonas mobilis, catalyzing the reversible interconversion of 3-phosphoglycerate and 2-phosphoglycerate. The pgm gene encoding this enzyme was cloned on a 5.2-kbp DNA fragment and expressed in Escherichia coli. Recombinants were identified by using antibodies directed against purified Z. mobilis phosphoglycerate mutase. The pgm gene contains a canonical ribosome-binding site, a biased pattern of codon usage, a long upstream untranslated region, and four promoters which share sequence homology. Interestingly, adhA and a D-specific 2-hydroxyacid dehydrogenase were found on the same DNA fragment and appear to form a cluster of genes which function in central metabolism. The translated sequence for Z. mobilis pgm was in full agreement with the 40 N-terminal amino acid residues determined by protein sequencing. The primary structure of the translated sequence is highly conserved (52 to 60% identity with other phosphoglycerate mutases) and also shares extensive homology with bisphosphoglycerate mutases (51 to 59% identity). Since Southern blots indicated the presence of only a single copy of pgm in the Z. mobilis chromosome, it is likely that the cloned pgm gene functions to provide both activities. Z. mobilis phosphoglycerate mutase is unusual in that it lacks the flexible tail and lysines at the carboxy terminus which are present in the enzyme isolated from all other organisms examined. Images PMID:8320209
Solexa-Sequencing Based Transcriptome Study of Plaice Skin Phenotype in Rex Rabbits (Oryctolagus cuniculus)

PubMed Central

Pan, Lei; Liu, Yan; Wei, Qiang; Xiao, Chenwen; Ji, Quanan; Bao, Guolian; Wu, Xinsheng

2015-01-01

Background Fur is an important genetically-determined characteristic of domestic rabbits; rabbit furs are of great economic value. We used the Solexa sequencing technology to assess gene expression in skin tissues from full-sib Rex rabbits of different phenotypes in order to explore the molecular mechanisms associated with fur determination. Methodology/Principal Findings Transcriptome analysis included de novo assembly, gene function identification, and gene function classification and enrichment. We obtained 74,032,912 and 71,126,891 short reads of 100 nt, which were assembled into 377,618 unique sequences by Trinity strategy (N50=680 nt). Based on BLAST results with known proteins, 50,228 sequences were identified at a cut-off E-value ≥ 10-5. Using Blast to Gene Ontology (GO), Clusters of Orthologous Groups (KOG) and Kyoto Encyclopedia of Genes and Genomes (KEGG), we obtained several genes with important protein functions. A total of 308 differentially expressed genes were obtained by transcriptome analysis of plaice and un-plaice phenotype animals; 209 additional differentially expressed genes were not found in any database. These genes included 49 that were only expressed in plaice skin rabbits. The novel genes may play important roles during skin growth and development. In addition, 99 known differentially expressed genes were assigned to PI3K-Akt signaling, focal adhesion, and ECM-receptor interactin, among others. Growth factors play a role in skin growth and development by regulating these signaling pathways. We confirmed the altered expression levels of seven target genes by qRT-PCR. And chosen a key gene for SNP to found the differentially between plaice and un-plaice phenotypes rabbit. Conclusions/Significance The rabbit transcriptome profiling data provide new insights in understanding the molecular mechanisms underlying rabbit skin growth and development. PMID:25955442
Complete mitochondrial genome of endangered Yellow-shouldered Amazon (Amazona barbadensis): two control region copies in parrot species of the Amazona genus.

PubMed

Urantowka, Adam Dawid; Hajduk, Kacper; Kosowska, Barbara

2013-08-01

Amazona barbadensis is an endangered species of parrot living in northern coastal Venezuela and in several Caribbean islands. In this study, we sequenced full mitochondrial genome of the considered species. The total length of the mitogenome was 18,983 bp and contained 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, duplicated control region, and degenerate copies of ND6 and tRNA (Glu) genes. High degree of identity between two copies of control region suggests their coincident evolution and functionality. Comparative analysis of both the control region sequences from four Amazona species revealed their 89.1% identity over a region of 1300 bp and indicates the presence of distinctive parts of two control region copies.
Characterization of defensin gene from abalone Haliotis discus hannai and its deduced protein

NASA Astrophysics Data System (ADS)

Hong, Xuguang; Sun, Xiuqin; Zheng, Minggang; Qu, Lingyun; Zan, Jindong; Zhang, Jinxing

2008-11-01

Defensin is one of preserved ancient host defensive materials formed in biological evolution. As a regulator and effector molecule, it is very important in animals’ acquired immune system. This paper reports the defensin gene from the mixed liver and kidney cDNA library of abalone Haliotis discus hannai Ino. Sequence analysis shows that the gene sequence of full-length cDNA encodes 42 mature peptides (including six Cys), molecular weight of 4 323 Da, and pI of 8.02. Amino acid sequence homology analysis shows that the peptides are highly similar (70% in common) to other insects defensin. Because of a typical insect-defensin structural character of mature peptide in the secondary structure, the polypeptide named Haliotis discus defensin (hd-def), a novel of antimicrobial peptides, belongs to insects defensin subfamily. The RT-PCR result of Haliotis discus defensin shows that the gene can be expressed only in the hepatopancreas by Gram-negative and positive bacteria stimulation, which is ascribed to inducible expression. Therefore, it is revealed that the Haliotis discus defensin gene expression was related to the antibacterial infection of Haliotis discus hannai Ino.
The Effect on the Transcriptome of Anemone coronaria following Infection with Rust (Tranzschelia discolor)

PubMed Central

Laura, Marina; Borghi, Cristina; Bobbio, Valentina; Allavena, Andrea

2015-01-01

In order to understand plant/pathogen interaction, the transcriptome of uninfected (1S) and infected (2I) plant was sequenced at 3’end by the GS FLX 454 platform. De novo assembly of high-quality reads generated 27,231 contigs leaving 37,191 singletons in the 1S and 38,393 in the 2I libraries. ESTcalc tool suggested that 71% of the transcriptome had been captured, with 99% of the genes present being represented by at least one read. Unigene annotation showed that 50.5% of the predicted translation products shared significant homology with protein sequences in GenBank. In all 253 differential transcript abundance (DTAs) were in higher abundance and 52 in lower abundance in the 2I library. 128 higher abundance DTA genes were of fungal origin and 49 were clearly plant sequences. A tBLASTn-based search of the sequences using as query the full length predicted polypeptide product of 50 R genes identified 16 R gene products. Only one R gene (PGIP) was up-regulated. The response of the plant to fungal invasion included the up-regulation of several pathogenesis related protein (PR) genes involved in JA signaling and other genes associated with defense response and down regulation of cell wall associated genes, non-race-specific disease resistance1 (NDR1) and other genes like myb, presqualene diphosphate phosphatase (PSDPase), a UDP-glycosyltransferase 74E2-like (UGT). The DTA genes identified here should provide a basis for understanding the A. coronaria/T. discolor interaction and leads for biotechnology-based disease resistance breeding. PMID:25768012
The effect on the transcriptome of Anemone coronaria following infection with rust (Tranzschelia discolor).

PubMed

Laura, Marina; Borghi, Cristina; Bobbio, Valentina; Allavena, Andrea

2015-01-01

In order to understand plant/pathogen interaction, the transcriptome of uninfected (1S) and infected (2I) plant was sequenced at 3'end by the GS FLX 454 platform. De novo assembly of high-quality reads generated 27,231 contigs leaving 37,191 singletons in the 1S and 38,393 in the 2I libraries. ESTcalc tool suggested that 71% of the transcriptome had been captured, with 99% of the genes present being represented by at least one read. Unigene annotation showed that 50.5% of the predicted translation products shared significant homology with protein sequences in GenBank. In all 253 differential transcript abundance (DTAs) were in higher abundance and 52 in lower abundance in the 2I library. 128 higher abundance DTA genes were of fungal origin and 49 were clearly plant sequences. A tBLASTn-based search of the sequences using as query the full length predicted polypeptide product of 50 R genes identified 16 R gene products. Only one R gene (PGIP) was up-regulated. The response of the plant to fungal invasion included the up-regulation of several pathogenesis related protein (PR) genes involved in JA signaling and other genes associated with defense response and down regulation of cell wall associated genes, non-race-specific disease resistance1 (NDR1) and other genes like myb, presqualene diphosphate phosphatase (PSDPase), a UDP-glycosyltransferase 74E2-like (UGT). The DTA genes identified here should provide a basis for understanding the A. coronaria/T. discolor interaction and leads for biotechnology-based disease resistance breeding.
Construction of a full-length cDNA Library from Chinese oak silkworm pupa and identification of a KK-42-binding protein gene in relation to pupa-diapause termination.

PubMed

Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai

2009-06-24

In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 x 10(5) cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination.
Construction of a full-length cDNA Library from Chinese oak silkworm pupa and identification of a KK-42-binding protein gene in relation to pupa-diapause termination

PubMed Central

Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai

2009-01-01

In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 105 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination. PMID:19564928
Transcriptional dynamics of the developing sweet cherry (Prunus avium L.) fruit: sequencing, annotation and expression profiling of exocarp-associated genes

PubMed Central

Alkio, Merianne; Jonas, Uwe; Declercq, Myriam; Van Nocker, Steven; Knoche, Moritz

2014-01-01

The exocarp, or skin, of fleshy fruit is a specialized tissue that protects the fruit, attracts seed dispersing fruit eaters, and has large economical relevance for fruit quality. Development of the exocarp involves regulated activities of many genes. This research analyzed global gene expression in the exocarp of developing sweet cherry (Prunus avium L., ‘Regina’), a fruit crop species with little public genomic resources. A catalog of transcript models (contigs) representing expressed genes was constructed from de novo assembled short complementary DNA (cDNA) sequences generated from developing fruit between flowering and maturity at 14 time points. Expression levels in each sample were estimated for 34 695 contigs from numbers of reads mapping to each contig. Contigs were annotated functionally based on BLAST, gene ontology and InterProScan analyses. Coregulated genes were detected using partitional clustering of expression patterns. The results are discussed with emphasis on genes putatively involved in cuticle deposition, cell wall metabolism and sugar transport. The high temporal resolution of the expression patterns presented here reveals finely tuned developmental specialization of individual members of gene families. Moreover, the de novo assembled sweet cherry fruit transcriptome with 7760 full-length protein coding sequences and over 20 000 other, annotated cDNA sequences together with their developmental expression patterns is expected to accelerate molecular research on this important tree fruit crop. PMID:26504533
PRAPI: post-transcriptional regulation analysis pipeline for Iso-Seq.

PubMed

Gao, Yubang; Wang, Huiyuan; Zhang, Hangxiao; Wang, Yongsheng; Chen, Jinfeng; Gu, Lianfeng

2018-05-01

The single-molecule real-time (SMRT) isoform sequencing (Iso-Seq) based on Pacific Bioscience (PacBio) platform has received increasing attention for its ability to explore full-length isoforms. Thus, comprehensive tools for Iso-Seq bioinformatics analysis are extremely useful. Here, we present a one-stop solution for Iso-Seq analysis, called PRAPI to analyze alternative transcription initiation (ATI), alternative splicing (AS), alternative cleavage and polyadenylation (APA), natural antisense transcripts (NAT), and circular RNAs (circRNAs) comprehensively. PRAPI is capable of combining Iso-Seq full-length isoforms with short read data, such as RNA-Seq or polyadenylation site sequencing (PAS-seq) for differential expression analysis of NAT, AS, APA and circRNAs. Furthermore, PRAPI can annotate new genes and correct mis-annotated genes when gene annotation is available. Finally, PRAPI generates high-quality vector graphics to visualize and highlight the Iso-Seq results. The Dockerfile of PRAPI is available at http://www.bioinfor.org/tool/PRAPI. lfgu@fafu.edu.cn.
Functional characterization of recombinant bromelain of Ananas comosus expressed in a prokaryotic system.

PubMed

George, Susan; Bhasker, Salini; Madhav, Harish; Nair, Archana; Chinnamma, Mohankumar

2014-02-01

Bromelain (BRM) is a defense protein present in the fruit and stem of pineapple (Ananas comosus) and it is grouped as a cysteine protease enzyme with diversified medicinal uses. Based on its therapeutic applications, bromelain has got sufficient attention in pharmaceutical industries. In the present study, the full coding gene of bromelain in pineapple stem (1,093 bp) was amplified by RT-PCR. The PCR product was cloned, sequenced, and characterized. The sequence analysis of the gene revealed the single nucleotide polymorphism and its phylogenetic relatedness. The peptide sequence deduced from the gene showed the amino acid variations, physicochemical properties and secondary and tertiary structural features of the protein. The full BRM gene was transformed to prokaryotic vector pET32b and expressed in Escherichia coli BL21 DE3pLysS host cells successfully. The identity of the recombinant bromelain (rBRM) protein was confirmed by Western blot analysis using anti-BRM-rabbit IgG antibody. The activity of recombinant bromelain compared with purified native bromelain was determined by protease assay. The inhibitory effect of rBRM compared with native BRM in the growth of Gram-positive and Gram-negative strains of Streptococcus agalactiae and Escherichia coli O111 was evident from the antibacterial sensitivity test. To the best of our knowledge, this is the first report showing the bactericidal property of rBRM expressed in a prokaryotic system.
Recombinant lactoferrin (Lf) of Vechur cow, the critical breed of Bos indicus and the Lf gene variants.

PubMed

Anisha, Shashidharan; Bhasker, Salini; Mohankumar, Chinnamma

2012-03-01

Vechur cow, categorized as a critically maintained breed by the FAO, is a unique breed of Bos indicus due to its extremely small size, less fodder intake, adaptability, easy domestication and traditional medicinal property of the milk. Lactoferrin (Lf) is an iron-binding glycoprotein that is found predominantly in the milk of mammals. The full coding region of Lf gene of Vechur cow was cloned, sequenced and expressed in a prokaryotic system. Antibacterial activity of the recombinant Lf showed suppression of bacterial growth. To the best of our knowledge this is the first time that the full coding region of Lf gene of B. indicus Vechur breed is sequenced, successfully expressed in a prokaryotic system and characterized. Comparative analysis of Lf gene sequence of five Vechur cows with B. taurus revealed 15 SNPs in the exon region associated with 11 amino acid substitutions. The amino acid arginine was noticed as a pronounced substitution and the tertiary structure analysis of the BLfV protein confirmed the positions of arginine in the β sheet region, random coil and helix region 1. Based on the recent reports on the nutritional therapies of arginine supplementation for wound healing and for cardiovascular diseases, the higher level of arginine in the lactoferrin protein of Vechur cow milk provides enormous scope for further therapeutic studies. Copyright © 2011 Elsevier B.V. All rights reserved.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.

PubMed

Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro

2010-05-07

Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Transcriptome sequencing and annotation of the halophytic microalga Dunaliella salina * #

PubMed Central

Hong, Ling; Liu, Jun-li; Midoun, Samira Z.; Miller, Philip C.

2017-01-01

The unicellular green alga Dunaliella salina is well adapted to salt stress and contains compounds (including β-carotene and vitamins) with potential commercial value. A large transcriptome database of D. salina during the adjustment, exponential and stationary growth phases was generated using a high throughput sequencing platform. We characterized the metabolic processes in D. salina with a focus on valuable metabolites, with the aim of manipulating D. salina to achieve greater economic value in large-scale production through a bioengineering strategy. Gene expression profiles under salt stress verified using quantitative polymerase chain reaction (qPCR) implied that salt can regulate the expression of key genes. This study generated a substantial fraction of D. salina transcriptional sequences for the entire growth cycle, providing a basis for the discovery of novel genes. This first full-scale transcriptome study of D. salina establishes a foundation for further comparative genomic studies. PMID:28990374
High-resolution phylogenetic microbial community profiling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin

Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less

High-resolution phylogenetic microbial community profiling

DOE PAGES

Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin; ...

2016-02-09

Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less
Genetic characterization of poxviruses in Camelus dromedarius in Ethiopia, 2011-2014.

PubMed

Gelaye, Esayas; Achenbach, Jenna Elizabeth; Ayelet, Gelagay; Jenberie, Shiferaw; Yami, Martha; Grabherr, Reingard; Loitsch, Angelika; Diallo, Adama; Lamien, Charles Euloge

2016-10-01

Camelpox and camel contagious ecthyma are infectious viral diseases of camelids caused by camelpox virus (CMLV) and camel contagious ecthyma virus (CCEV), respectively. Even though, in Ethiopia, pox disease has been creating significant economic losses in camel production, little is known on the responsible pathogens and their genetic diversity. Thus, the present study aimed at isolation, identification and genetic characterization of the causative viruses. Accordingly, clinical case observations, infectious virus isolation, and molecular and phylogenetic analysis of poxviruses infecting camels in three regions and six districts in the country, Afar (Chifra), Oromia (Arero, Miyu and Yabello) and Somali (Gursum and Jijiga) between 2011 and 2014 were undertaken. The full hemagglutinin (HA) and partial A-type inclusion protein (ATIP) genes of CMLV and full major envelope protein (B2L) gene of CCEV of Ethiopian isolates were sequenced, analyzed and compared among each other and to foreign isolates. The viral isolation confirmed the presence of infectious poxviruses. The preliminary screening by PCR showed 27 CMLVs and 20 CCEVs. The sequence analyses showed that the HA and ATIP gene sequences are highly conserved within the local isolates of CMLVs, and formed a single cluster together with isolates from Somalia and Syria. Unlike CMLVs, the B2L gene analysis of Ethiopian CCEV showed few genetic variations. The phylogenetic analysis revealed three clusters of CCEV in Ethiopia with the isolates clustering according to their geographical origins. To our knowledge, this is the first report indicating the existence of CCEV in Ethiopia where camel contagious ecthyma was misdiagnosed as camelpox. Additionally, this study has also disclosed the existence of co-infections with CMLV and CCEV. A comprehensive characterization of poxviruses affecting camels in Ethiopia and the full genome sequencing of representative isolates are recommended to better understand the dynamics of pox diseases of camels and to assist in the implementation of more efficient control measures. Copyright © 2016 Elsevier B.V. All rights reserved.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing

PubMed Central

Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro

2017-01-01

Abstract High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
Characterization of the glutathione S-transferase gene family through ESTs and expression analyses within common and pigmented cultivars of Citrus sinensis (L.) Osbeck

PubMed Central

2014-01-01

Background Glutathione S-transferases (GSTs) represent a ubiquitous gene family encoding detoxification enzymes able to recognize reactive electrophilic xenobiotic molecules as well as compounds of endogenous origin. Anthocyanin pigments require GSTs for their transport into the vacuole since their cytoplasmic retention is toxic to the cell. Anthocyanin accumulation in Citrus sinensis (L.) Osbeck fruit flesh determines different phenotypes affecting the typical pigmentation of Sicilian blood oranges. In this paper we describe: i) the characterization of the GST gene family in C. sinensis through a systematic EST analysis; ii) the validation of the EST assembly by exploiting the genome sequences of C. sinensis and C. clementina and their genome annotations; iii) GST gene expression profiling in six tissues/organs and in two different sweet orange cultivars, Cadenera (common) and Moro (pigmented). Results We identified 61 GST transcripts, described the full- or partial-length nature of the sequences and assigned to each sequence the GST class membership exploiting a comparative approach and the classification scheme proposed for plant species. A total of 23 full-length sequences were defined. Fifty-four of the 61 transcripts were successfully aligned to the C. sinensis and C. clementina genomes. Tissue specific expression profiling demonstrated that the expression of some GST transcripts was 'tissue-affected' and cultivar specific. A comparative analysis of C. sinensis GSTs with those from other plant species was also considered. Data from the current analysis are accessible at http://biosrv.cab.unina.it/citrusGST/, with the aim to provide a reference resource for C. sinensis GSTs. Conclusions This study aimed at the characterization of the GST gene family in C. sinensis. Based on expression patterns from two different cultivars and on sequence-comparative analyses, we also highlighted that two sequences, a Phi class GST and a Mapeg class GST, could be involved in the conjugation of anthocyanin pigments and in their transport into the vacuole, specifically in fruit flesh of the pigmented cultivar. PMID:24490620
Loss of ACTH expression in cultured human corticotroph macroadenoma cells is consistent with loss of the POMC gene signal sequence.

PubMed

Rees, D A; Hepburn, P J; McNicol, A M; Francis, K; Jasani, B; Lewis, M D; Farrell, W E; Lewis, B M; Scanlon, M F; Ham, J

2002-03-28

The proopiomelanocortin (POMC) gene is highly expressed in the pituitary gland where the resulting mRNA of 1200 base pairs (bp) gives rise to a full-length protein sequence. In peripheral tissues however both shorter and longer POMC variants have been described, these include for example placental tissue which contain 800 (truncated at the 5' end) and 1500 as well as the 1200 bp transcripts. The importance of the 800 bp transcript is unclear as the lack of a signal sequence renders the molecule to be non-functional. This transcript has not been previously demonstrated in the pituitary gland. In this report we show evidence of a 5' truncated POMC gene in human pituitary corticotroph macroadenoma cells (JE) maintained in primary culture for >1 year. The original tumour tissue and the derived cells during early passage (up to passage 4-5) immunostained for ACTH and in situ hybridisation confirmed the presence of the POMC gene in the cultured cells. These cells also secreted 15-40 pg/10(5) cells/24 h ACTH. In addition, as expected RT-PCR demonstrated the presence of all three POMC gene exons and is thus indicative of a full-length POMC gene. In late culture passages (passages 8-15) JE cells ceased to express ACTH and cell growth became very slow due presumably to cells reaching their Hayflick limit. ACTH immunostaining in these cells was undetectable and ACTH secretion was also at the detection limits of the assay and no greater than 10 pg/10(5) cells/24 h. ACTH precursor molecules were also undetectable. RT-PCR for the POMC gene in these late passage cells showed that only exon 3 was detectable, in contrast to early passage cells where all three exons were present. In summary we isolated in culture, human pituitary cells that possessed initially all three exons of the POMC gene and immunostained for ACTH. On further passaging these cells showed a loss of exons 1 and 2 in the POMC gene and a loss of ACTH immunostaining and secretion. We would like to suggest that the loss of ACTH peptide expression in these late passage cells is in part due to the loss of the POMC signal sequence. An alternative explanation for our findings is that there were originally two populations of corticotrophs in the cultures, one of which possessed the full-length POMC gene and the other only the 5' truncated POMC transcript and it is these latter cells which survived in culture. In either scenario this is the first report of the 5' truncated POMC gene occurring in pituitary cells.
Unusual RNA plant virus integration in the soybean genome leads to the production of small RNAs.

PubMed

da Fonseca, Guilherme Cordenonsi; de Oliveira, Luiz Felipe Valter; de Morais, Guilherme Loss; Abdelnor, Ricardo Vilela; Nepomuceno, Alexandre Lima; Waterhouse, Peter M; Farinelli, Laurent; Margis, Rogerio

2016-05-01

Horizontal gene transfer (HGT) is known to be a major force in genome evolution. The acquisition of genes from viruses by eukaryotic genomes is a well-studied example of HGT, including rare cases of non-retroviral RNA virus integration. The present study describes the integration of cucumber mosaic virus RNA-1 into soybean genome. After an initial metatranscriptomic analysis of small RNAs derived from soybean, the de novo assembly resulted a 3029-nt contig homologous to RNA-1. The integration of this sequence in the soybean genome was confirmed by DNA deep sequencing. The locus where the integration occurred harbors the full RNA-1 sequence followed by the partial sequence of an endogenous mRNA and another sequence of RNA-1 as an inverted repeat and allowing the formation of a hairpin structure. This region recombined into a retrotransposon located inside an exon of a soybean gene. The nucleotide similarity of the integrated sequence compared to other Cucumber mosaic virus sequences indicates that the integration event occurred recently. We described a rare event of non-retroviral RNA virus integration in soybean that leads to the production of a double-stranded RNA in a similar fashion to virus resistance RNAi plants. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

PubMed Central

Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

2004-01-01

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Characterization of a new endogenous endo-ß-1,4-glucanase of Formosan subterranean termite (Coptotermes formosanus)

USDA-ARS?s Scientific Manuscript database

The present work characterized a second endogenous cellulase (endo-ß-1,4-glucanase) gene, CfEG4, uncovered in the transcriptome of Formosan subterranean termite (Coptotermes formosanus). The full-length gene was cloned and sequenced. It is similar to the CfEG3a described earlier (Zhang et al. 2009) ...
Subtraction of cap-trapped full-length cDNA libraries to select rare transcripts.

PubMed

Hirozane-Kishikawa, Tomoko; Shiraki, Toshiyuki; Waki, Kazunori; Nakamura, Mari; Arakawa, Takahiro; Kawai, Jun; Fagiolini, Michela; Hensch, Takao K; Hayashizaki, Yoshihide; Carninci, Piero

2003-09-01

The normalization and subtraction of highly expressed cDNAs from relatively large tissues before cloning dramatically enhanced the gene discovery by sequencing for the mouse full-length cDNA encyclopedia, but these methods have not been suitable for limited RNA materials. To normalize and subtract full-length cDNA libraries derived from limited quantities of total RNA, here we report a method to subtract plasmid libraries excised from size-unbiased amplified lambda phage cDNA libraries that avoids heavily biasing steps such as PCR and plasmid library amplification. The proportion of full-length cDNAs and the gene discovery rate are high, and library diversity can be validated by in silico randomization.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Liebhaber, S.A.; Weiss, I.; Cash, F.E.

Synthesis of normal human hemoglobin A, {alpha}{sub 2}{beta}{sub 2}, is based upon balanced expression of genes in the {alpha}-globin gene cluster on chromosome 15 and the {beta}-globin gene cluster on chromosome 11. Full levels of erythroid-specific activation of the {beta}-globin cluster depend on sequences located at a considerable distance 5{prime} to the {beta}-globin gene, referred to as the locus-activating or dominant control region. The existence of an analogous element(s) upstream of the {alpha}-globin cluster has been suggested from observations on naturally occurring deletions and experimental studies. The authors have identified an individual with {alpha}-thalassemia in whom structurally normal {alpha}-globin genesmore » have been inactivated in cis by a discrete de novo 35-kilobase deletion located {approximately}30 kilobases 5{prime} from the {alpha}-globin gene cluster. They conclude that this deletion inactivates expression of the {alpha}-globin genes by removing one or more of the previously identified upstream regulatory sequences that are critical to expression of the {alpha}-globin genes.« less
Microbes in the neonatal intensive care unit resemble those found in the gut of premature infants

PubMed Central

2014-01-01

Background The source inoculum of gastrointestinal tract (GIT) microbes is largely influenced by delivery mode in full-term infants, but these influences may be decoupled in very low birth weight (VLBW, <1,500 g) neonates via conventional broad-spectrum antibiotic treatment. We hypothesize the built environment (BE), specifically room surfaces frequently touched by humans, is a predominant source of colonizing microbes in the gut of premature VLBW infants. Here, we present the first matched fecal-BE time series analysis of two preterm VLBW neonates housed in a neonatal intensive care unit (NICU) over the first month of life. Results Fresh fecal samples were collected every 3 days and metagenomes sequenced on an Illumina HiSeq2000 device. For each fecal sample, approximately 33 swabs were collected from each NICU room from 6 specified areas: sink, feeding and intubation tubing, hands of healthcare providers and parents, general surfaces, and nurse station electronics (keyboard, mouse, and cell phone). Swabs were processed using a recently developed ‘expectation maximization iterative reconstruction of genes from the environment’ (EMIRGE) amplicon pipeline in which full-length 16S rRNA amplicons were sheared and sequenced using an Illumina platform, and short reads reassembled into full-length genes. Over 24,000 full-length 16S rRNA sequences were produced, generating an average of approximately 12,000 operational taxonomic units (OTUs) (clustered at 97% nucleotide identity) per room-infant pair. Dominant gut taxa, including Staphylococcus epidermidis, Klebsiella pneumoniae, Bacteroides fragilis, and Escherichia coli, were widely distributed throughout the room environment with many gut colonizers detected in more than half of samples. Reconstructed genomes from infant gut colonizers revealed a suite of genes that confer resistance to antibiotics (for example, tetracycline, fluoroquinolone, and aminoglycoside) and sterilizing agents, which likely offer a competitive advantage in the NICU environment. Conclusions We have developed a high-throughput culture-independent approach that integrates room surveys based on full-length 16S rRNA gene sequences with metagenomic analysis of fecal samples collected from infants in the room. The approach enabled identification of discrete ICU reservoirs of microbes that also colonized the infant gut and provided evidence for the presence of certain organisms in the room prior to their detection in the gut. PMID:24468033
[Construction of the superantigen SEA transfected laryngocarcinoma cells].

PubMed

Ji, Xiaobin; Jingli, J V; Liu, Qicai; Xie, Jinghua

2013-04-01

To construct an eukaryotic expression vectors containing superantigen staphylococcal enterotoxin A (SEA) gene, and to identify its expression in laryngeal squamous carcinoma cells. SEA full-length gene fragment was obtained from ATCC13565 genome of the staphylococcus, referencing standard strains producing SEA. Coding sequence of SEA was artificially synthetized. Than, SEA gene fragments was subcloned into eukaryotic expression vector pIRES-EGFP. The recombinant plasmid pSEA-IRES-EGFP was constructed and was transfected to laryngocarcinoma Hep-2 cells. Resistant clones were screened by G418. The expression of SEA in laryngocarcinoma cells was identified with ELISA and RT-PCR method. The subclone of artificially synthetized SEA gene was subclone to eukaryotic expression vector pires-EGFP. Flanking sequence confirmed that SEA sequence was fully identical to the coding sequence of standard staphylococcus strains ATCC13565 in Genbank. After recombinant plasmid transfected to laryngocarcinoma cells, the resistant clones was obtained after screening for two weeks. The clones were selected. The specific gene fragment was obtained by RT-PCR amplification. ELISA assay confirmed that the content of SEA protein in supernatant fluid of cell culture had reached about Pg level. The recombinant eukaryotic expression vector containing superantigen SEA gene is successfully constructed, and is capable of effective expression and continued secretion of SEA protein in laryngochrcinoma Hep-2 cells after recombinant plasmid transfected to laryngocarcinoma cells.
Capsicum annuum dehydrin, an osmotic-stress gene in hot pepper plants.

PubMed

Chung, Eunsook; Kim, Soo-Yong; Yi, So Young; Choi, Doil

2003-06-30

Osmotic stress-related genes were selected from an EST database constructed from 7 cDNA libraries from different tissues of the hot pepper. A full-length cDNA of Capsicum annuum dehydrin (Cadhn), a late embryogenesis abundant (lea) gene, was selected from the 5' single pass sequenced cDNA clones and sequenced. The deduced polypeptide has 87% identity with potato dehydrin C17, but very little identity with the dehydrin genes of other organisms. It contains a serine-tract (S-segment) and 3 conserved lysine-rich domains (K-segments). Southern blot analysis showed that 2 copies are present in the hot pepper genome. Cadhn was induced by osmotic stress in leaf tissues as well as by the application of abscisic acid. The RNA was most abundant in green fruit. The expression of several osmotic stress-related genes was examined and Cadhn proved to be the most abundantly expressed of these in response to osmotic stress.
Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population

PubMed Central

Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

2016-01-01

Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. PMID:27189991
Virulence-associated and antibiotic resistance genes of microbial populations in cattle feces analyzed using a metagenomic approach.

PubMed

Durso, Lisa M; Harhay, Gregory P; Bono, James L; Smith, Timothy P L

2011-02-01

The bovine fecal microbiota impacts human food safety as well as animal health. Although the bacteria of cattle feces have been well characterized using culture-based and culture-independent methods, techniques have been lacking to correlate total community composition with community function. We used high throughput sequencing of total DNA extracted from fecal material to characterize general community composition and examine the repertoire of microbial genes present in beef cattle feces, including genes associated with antibiotic resistance and bacterial virulence. Results suggest that traditional 16S sequencing using "universal" primers to generate full-length sequence may under represent Acitinobacteria and Proteobacteria. Over eight percent (8.4%) of the sequences from our beef cattle fecal pool sample could be categorized as virulence genes, including a suite of genes associated with resistance to antibiotic and toxic compounds (RATC). This is a higher proportion of virulence genes found in Sargasso sea, chicken cecum, and cow rumen samples, but comparable to the proportion found in Antarctic marine derived lake, human fecal, and farm soil samples. The quantitative nature of metagenomic data, combined with the large number of RATC classes represented in samples from widely different habitats indicates that metagenomic data can be used to track relative amounts of antibiotic resistance genes in individual animals over time. Consequently, these data can be used to generate sample-specific and temporal antibiotic resistance gene profiles to facilitate an understanding of the ecology of the microbial communities in each habitat as well as the epidemiology of antibiotic resistant gene transport between and among habitats. Published by Elsevier B.V.
Identification of a Linked Set of Genes in Serpulina hyodysenteriae (B204) Predicted To Encode Closely Related 39-Kilodalton Extracytoplasmic Proteins

PubMed Central

Gabe, Jeffrey D.; Dragon, Elizabeth; Chang, Ray-Jen; McCaman, Michael T.

1998-01-01

A tandem pair of nearly identical genes from Serpulina hyodysenteriae (B204) were cloned and sequenced. The full open reading frame of one gene and the partial open reading frame of the neighboring gene appear to encode secreted proteins which are homologous to, yet distinct from, the 39-kDa extracytoplasmic protein purified from the membrane fraction of S. hyodysenteriae. We have designated these newly identified genes vspA and vspB (for variable surface protein). PMID:9440540
Generation of expressed sequence tags for discovery of genes responsible for floral traits of Chrysanthemum morifolium by next-generation sequencing technology.

PubMed

Sasaki, Katsutomo; Mitsuda, Nobutaka; Nashima, Kenji; Kishimoto, Kyutaro; Katayose, Yuichi; Kanamori, Hiroyuki; Ohmiya, Akemi

2017-09-04

Chrysanthemum morifolium is one of the most economically valuable ornamental plants worldwide. Chrysanthemum is an allohexaploid plant with a large genome that is commercially propagated by vegetative reproduction. New cultivars with different floral traits, such as color, morphology, and scent, have been generated mainly by classical cross-breeding and mutation breeding. However, only limited genetic resources and their genome information are available for the generation of new floral traits. To obtain useful information about molecular bases for floral traits of chrysanthemums, we read expressed sequence tags (ESTs) of chrysanthemums by high-throughput sequencing using the 454 pyrosequencing technology. We constructed normalized cDNA libraries, consisting of full-length, 3'-UTR, and 5'-UTR cDNAs derived from various tissues of chrysanthemums. These libraries produced a total number of 3,772,677 high-quality reads, which were assembled into 213,204 contigs. By comparing the data obtained with those of full genome-sequenced species, we confirmed that our chrysanthemum contig set contained the majority of all expressed genes, which was sufficient for further molecular analysis in chrysanthemums. We confirmed that our chrysanthemum EST set (contigs) contained a number of contigs that encoded transcription factors and enzymes involved in pigment and aroma compound metabolism that was comparable to that of other species. This information can serve as an informative resource for identifying genes involved in various biological processes in chrysanthemums. Moreover, the findings of our study will contribute to a better understanding of the floral characteristics of chrysanthemums including the myriad cultivars at the molecular level.
Identification and transcription profiling of NDUFS8 in Aedes taeniorhynchus (Diptera:Culididae): developmental regulation and environmental response

USDA-ARS?s Scientific Manuscript database

The cDNA of a NADH dehydrogenase -ubiquinone Fe-S protein 8 subunit (NDUFS8) gene from Aedes (Ochlerotatus) taeniorhynchus Wiedemann has been cloned and sequenced. The full-length mRNA sequence (824 bp) of AetNDUFS8 encodes an open reading region of 651 bp (i.e., 217 amino acids). To detect whether ...
Genome Sequences for Six Rhodanobacter Strains, Isolated from Soils and the Terrestrial Subsurface, with Variable Denitrification Capabilities

PubMed Central

Green, Stefan J.; Rishishwar, Lavanya; Prakash, Om; Katz, Lee S.; Mariño-Ramírez, Leonardo; Jordan, I. King; Munk, Christine; Ivanova, Natalia; Mikhailova, Natalia; Watson, David B.; Brown, Steven D.; Palumbo, Anthony V.; Brooks, Scott C.

2012-01-01

We report the first genome sequences for six strains of Rhodanobacter species isolated from a variety of soil and subsurface environments. Three of these strains are capable of complete denitrification and three others are not. However, all six strains contain most of the genes required for the respiration of nitrate to gaseous nitrogen. The nondenitrifying members of the genus lack only the gene for nitrate reduction, the first step in the full denitrification pathway. The data suggest that the environmental role of bacteria from the genus Rhodanobacter should be reevaluated. PMID:22843592
Identification of the full-length β-actin sequence and expression profiles in the tree shrew (Tupaia belangeri).

PubMed

Zheng, Yu; Yun, Chenxia; Wang, Qihui; Smith, Wanli W; Leng, Jing

2015-02-01

The tree shrew (Tupaia belangeri) diverges from the primate order (Primates) and is classified as a separate taxonomic group of mammals - Scandentia. It has been suggested that the tree shrew can be used as an animal model for studying human diseases; however, the genomic sequence of the tree shrew is largely unidentified. In the present study, we reported the full-length cDNA sequence of the housekeeping gene, β-actin, in the tree shrew. The amino acid sequence of β-actin in the tree shrew was compared to that of humans and other species; a simple phylogenetic relationship was discovered. Quantitative polymerase chain reaction (qPCR) and western blot analysis further demonstrated that the expression profiles of β-actin, as a general conservative housekeeping gene, in the tree shrew were similar to those in humans, although the expression levels varied among different types of tissue in the tree shrew. Our data provide evidence that the tree shrew has a close phylogenetic association with humans. These findings further enhance the potential that the tree shrew, as a species, may be used as an animal model for studying human disorders.

Transcript variations, phylogenetic tree and chromosomal localization of porcine aryl hydrocarbon receptor (AhR) and AhR nuclear translocator (ARNT) genes.

PubMed

Sadowska, Agnieszka; Paukszto, Lukasz; Nynca, Anna; Szczerbal, Izabela; Orlowska, Karina; Swigonska, Sylwia; Ruszkowska, Monika; Molcan, Tomasz; Jastrzebski, Jan P; Panasiewicz, Grzegorz; Ciereszko, Renata E

2017-03-01

Aryl hydrocarbon receptor (AhR) is a ligand-activated transcription factor best known for mediating xenobiotic-induced toxicity. AhR requires aryl hydrocarbon receptor nuclear translocator (ARNT) to form an active transcription complex and promote the activation of genes which have dioxin responsive element in their regulatory regions. The present study was performed to determine the complete cDNA sequences of porcine AhR and ARNT genes and their chromosomal localization. Total RNA from porcine livers were used to obtain the sequence of the entire porcine transcriptome by next-generation sequencing (NGS; lllumina HiSeq2500). In addition, both, in silico analysis and fluorescence in situ hybridization (FISH) were used to determine chromosomal localization of porcine AhR and ARNT genes. In silico analysis of nucleotide sequences showed that there were two transcript variants of AhR and ARNT genes in the pig. In addition, computer analysis revealed that AhR gene in the pig is located on chromosome 9 and ARNT on chromosome 4. The results of FISH experiment confirmed the localization of porcine AhR and ARNT genes. In the present study, for the first time, the full cDNAs of AhR and ARNT were demonstrated in the pig. In future, it would be interesting to determine the tissue distribution of AhR and ARNT transcript variants in the pig and to test whether these variants are associated with different biological functions and/or different activation pathways.
Identification of the Drosophila eIF4A gene as a target of the DREF transcription factor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ida, Hiroyuki; Insect Biomedical Research Center, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 606-8585; Yoshida, Hideki

2007-12-10

The DNA replication-related element-binding factor (DREF) regulates cell proliferation-related gene expression in Drosophila. We have carried out a genetic screening, taking advantage of the rough eye phenotype of transgenic flies that express full-length DREF in the eye imaginal discs and identified the eukaryotic initiation factor 4A (eIF4A) gene as a dominant suppressor of the DREF-induced rough eye phenotype. The eIF4A gene was here found to carry three DRE sequences, DRE1 (- 40 to - 47), DRE2 (- 48 to - 55), and DRE3 (- 267 to - 274) in its promoter region, these all being important for the eIF4A genemore » promoter activity in cultured Drosophila Kc cells and in living flies. Knockdown of DREF in Drosophila S2 cells decreased the eIF4A mRNA level and the eIF4A gene promoter activity. Furthermore, specific binding of DREF to genomic regions containing DRE sequences was demonstrated by chromatin immunoprecipitation assays using anti-DREF antibodies. Band mobility shift assays using Kc cell nuclear extracts revealed that DREF could bind to DRE1 and DRE3 sequences in the eIF4A gene promoter in vitro, but not to the DRE2 sequence. The results suggest that the eIF4A gene is under the control of the DREF pathway and DREF is therefore involved in the regulation of protein synthesis.« less
A rapid NGS strategy for comprehensive molecular diagnosis of Birt-Hogg-Dubé syndrome in patients with primary spontaneous pneumothorax.

PubMed

Zhang, Xinxin; Ma, Dehua; Zou, Wei; Ding, Yibing; Zhu, Chengchu; Min, Haiyan; Zhang, Bin; Wang, Wei; Chen, Baofu; Ye, Minhua; Cai, Minghui; Pan, Yanqing; Cao, Lei; Wan, Yueming; Jin, Yu; Gao, Qian; Yi, Long

2016-05-27

Primary spontaneous pneumothorax (PSP) or pulmonary cysts is one of the manifestations of Birt-Hogg-Dube syndrome (BHDS) that is caused by heterozygous mutations in FLCN gene. Most of the mutations are SNVs and small indels, and there are also approximately 10 % large intragenic deletions and duplications of the mutations. These molecular findings are generally obtained by disparate methods including Sanger sequencing and Multiple Ligation-dependent Probe Amplification in the clinical laboratory. In addition, as a genetically heterogeneous disorder, PSP may be caused by mutations in multiple genes include FBN1, COL3A1, CBS, SERPINA1 and TSC1/TSC2 genes. For differential diagnosis, these genes should also be screened which makes the diagnostic procedure more time-consuming and labor-intensive. Forty PSP patients were divided into 2 groups. Nineteen patients with different pathogenic mutations of FLCN previously identified by conventional Sanger sequencing and MLPA were included in test group, 21 random PSP patients without any genetic screening were included in blinded sample group. 7 PSP genes including FLCN, FBN1, COL3A1, CBS, SERPINA1 and TSC1/TSC2 were designed and enriched by Haloplex system, sequenced on a Miseq platform and analyzed in the 40 patients to evaluate the performance of the targeted-NGS method. We demonstrated that the full spectrum of genes associated with pneumothorax including FLCN gene mutations can be identified simultaneously in multiplexed sequence data. Noteworthy, by our in-house copy number analysis of the sequence data, we could not only detect intragenic deletions, but also determine approximate deletion junctions simultaneously. NGS based Haloplex target enrichment technology is proved to be a rapid and cost-effective screening strategy for the comprehensive molecular diagnosis of BHDS in PSP patients, as it can replace Sanger sequencing and MLPA by simultaneously detecting exonic and intronic SNVs, small indels, large intragenic deletions and determining deletion junctions in PSP-related genes.
The Inference of Gene Trees with Species Trees

PubMed Central

Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

2015-01-01

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970
Analysis of the Citrullus colocynthis Transcriptome during Water Deficit Stress

PubMed Central

Wang, Zhuoyu; Hu, Hongtao; Goertzen, Leslie R.; McElroy, J. Scott; Dane, Fenny

2014-01-01

Citrullus colocynthis is a very drought tolerant species, closely related to watermelon (C. lanatus var. lanatus), an economically important cucurbit crop. Drought is a threat to plant growth and development, and the discovery of drought inducible genes with various functions is of great importance. We used high throughput mRNA Illumina sequencing technology and bioinformatic strategies to analyze the C. colocynthis leaf transcriptome under drought treatment. Leaf samples at four different time points (0, 24, 36, or 48 hours of withholding water) were used for RNA extraction and Illumina sequencing. qRT-PCR of several drought responsive genes was performed to confirm the accuracy of RNA sequencing. Leaf transcriptome analysis provided the first glimpse of the drought responsive transcriptome of this unique cucurbit species. A total of 5038 full-length cDNAs were detected, with 2545 genes showing significant changes during drought stress. Principle component analysis indicated that drought was the major contributing factor regulating transcriptome changes. Up regulation of many transcription factors, stress signaling factors, detoxification genes, and genes involved in phytohormone signaling and citrulline metabolism occurred under the water deficit conditions. The C. colocynthis transcriptome data highlight the activation of a large set of drought related genes in this species, thus providing a valuable resource for future functional analysis of candidate genes in defense of drought stress. PMID:25118696
Cloning and characterization of an abalone (Haliotis discus hannai) actin gene

NASA Astrophysics Data System (ADS)

Ma, Hongming; Xu, Wei; Mai, Kangsen; Liufu, Zhiguo; Chen, Hong

2004-10-01

An actin encoding gene was cloned by using RT-PCR, 3‧ RACE and 5‧ RACE from abalone Haliotis discus hannai. The full length of the gene is 1532 base pairs, which contains a long 3‧ untranslated region of 307 base pairs and 79 base pairs of 5‧ untranslated sequence. The open reading frame encodes 376 amino acid residues. Sequence comparison with those of human and other mollusks showed high conservation among species at amino acid level. The identities was 96%, 97% and 96% respectively compared with Aplysia californica, Biomphalaria glabrata and Homo sapience β-actin. It is also indicated that this actin is more similar to the human cytoplasmic actin (β-actin) than to human muscle actin.
In silico analysis of 16S ribosomal RNA gene sequencing‐based methods for identification of medically important anaerobic bacteria

PubMed Central

Woo, Patrick C Y; Chung, Liliane M W; Teng, Jade L L; Tse, Herman; Pang, Sherby S Y; Lau, Veronica Y T; Wong, Vanessa W K; Kam, Kwok‐ling; Lau, Susanna K P; Yuen, Kwok‐Yung

2007-01-01

This study is the first study that provides useful guidelines to clinical microbiologists and technicians on the usefulness of full 16S rRNA sequencing, 5′‐end 527‐bp 16S rRNA sequencing and the existing MicroSeq full and 500 16S rDNA bacterial identification system (MicroSeq, Perkin‐Elmer Applied Biosystems Division, Foster City, California, USA) databases for the identification of all existing medically important anaerobic bacteria. Full and 527‐bp 16S rRNA sequencing are able to identify 52–63% of 130 Gram‐positive anaerobic rods, 72–73% of 86 Gram‐negative anaerobic rods and 78% of 23 anaerobic cocci. The existing MicroSeq databases are able to identify only 19–25% of 130 Gram‐positive anaerobic rods, 38% of 86 Gram‐negative anaerobic rods and 39% of 23 anaerobic cocci. These represent only 45–46% of those that should be confidently identified by full and 527‐bp 16S rRNA sequencing. To improve the usefulness of MicroSeq, bacterial species that should be confidently identified by full and/or 527‐bp 16S rRNA sequencing but not included in the existing MicroSeq databases should be included. PMID:17046845
Description and physical localization of the bovine survival of motor neuron gene (SMN).

PubMed

Pietrowski, D; Goldammer, T; Meinert, S; Schwerin, M; Förster, M

1998-01-01

Proximal spinal muscular atrophy (SMA) is an autosomal recessive disease in humans and other mammals, characterized by degeneration of anterior horn cells of the spinal cord. In humans, the survival of motor neuron gene (SMN) has been recognized as the SMA-determining gene and has been mapped to 5q13. In cattle, SMA is a recurrent, inherited disease that plays an important economic role in breeding programs of Brown Swiss stock. Now we have identified the full- length cDNA sequence of the bovine SMN gene. Molecular analysis and characterization of the sequence documents 85% identity to its human counterpart and three evolutionarily conserved domains in different species. Physical mapping data reveals that bovine SMN is localized to chromosome region 20q12-->q13, supporting the conserved synteny of this chromosomal region between humans and cattle.
[Molecular cloning and characterization of a novel Clonorchis sinensis antigenic protein containing tandem repeat sequences].

PubMed

Liu, Qian; Xu, Xue-Nian; Zhou, Yan; Cheng, Na; Dong, Yu-Ting; Zheng, Hua-Jun; Zhu, Yong-Qiang; Zhu, Yong-Qiang

2013-08-01

To find and clone new antigen genes from the lambda-ZAP cDNA expression library of adult Clonorchis sinensis, and determine the immunological characteristics of the recombinant proteins. The cDNA expression library of adult C. sinensis was screened by pooled sera of clonorchiasis patients. The sequences of the positive phage clones were compared with the sequences in EST database, and the full-length sequence of the gene (Cs22 gene) was obtained by RT-PCR. cDNA fragments containing 2 and 3 times tandem repeat sequences were generated by jumping PCR. The sequence encoding the mature peptide or the tandem repeat sequence was respectively cloned into the prokaryotic expression vector pET28a (+), and then transformed into E. coli Rosetta DE3 cells for expression. The recombinant proteins (rCs22-2r, rCs22-3r, rCs22M-2r, and rCs22M-3r) were purified by His-bind-resin (Ni-NTA) affinity chromatography. The immunogenicity of rCs22-2r and rCs22-3r was identified by ELISA. To evaluate the immunological diagnostic value of rCs22-2r and rCs22-3r, serum samples from 35 clonorchiasis patients, 31 healthy individuals, 15 schistosomiasis patients, 15 paragonimiasis westermani patients and 13 cysticercosis patients were examined by ELISA. To locate antigenic determinants, the pooled sera of clonorchiasis patients and healthy persons were analyzed for specific antibodies by ELISA with recombinant protein rCs22M-2r and rCs22M-3r containing the tandem repeat sequences. The full-length sequence of Cs22 antigen gene of C. sinensis was obtained. It contained 13 times tandem repeat sequences of EQQDGDEEGMGGDGGRGKEKGKVEGEDGAGEQKEQA. Bioinformatics analysis indicated that the protein (Cs22) belonged to GPI-anchored proteins family. The recombinant proteins rCs22-2r and rCs22-3r showed a certain level of immunogenicity. The positive rate by ELISA coated with the purified PrCs22-2r and PrCs22-3r for sera of clonorchiasis patients both were 45.7% (16/35), and 3.2% (1/31) for those of healthy persons. There was no cross reaction with sera of schistosomiasis and cysticercosis patients. The cross reaction with sera of paragonimiasis westermani patients was 1/15. The recombinant proteins rCs22M-2r and rCs22M-3r which only contained tandem repeats were specifically recognized by pooled sera of clonorchiasis patients. The Cs22 antigen gene of Clonorchis sinensis is obtained, and the recombinant proteins have certain diagnostic value. The antigenic determinant is located in tandem repeat sequences.
Antiviral resistance due to deletion in the neuraminidase gene and defective interfering-like viral polymerase basic 2 RNA of influenza A virus subtype H3N2.

PubMed

Trebbien, Ramona; Christiansen, Claus Bohn; Fischer, Thea Kølsen

2018-05-01

Antiviral treatment of influenza virus infections can lead to drug resistance of virus. This study investigates a selection of mutations in the full genome of H3N2 influenza A virus isolated from a patient in treatment with oseltamivir. Respiratory samples from a patient were collected before, during, and after antiviral treatment. Whole genome sequencing of the influenza virus by next generation sequencing, and low-frequency-variant analysis was performed. Neuraminidase-inhibition tests were performed with oseltamivir and zanamivir, and viruses were propagated in sial-transferase gene transfected Madin-Darby Canine Kidney cells. A deletion at amino acid position 245-248 in the neuraminidase gene occurred after initiation of treatment with oseltamivir. The deleted virus had highly reduced inhibition against oseltamivir but was sensitive to zanamivir. Nine days after discontinuation of oseltamivir treatment the deleted H3N2 virus was still present in the patient. After three passages of the deleted virus in cell culture, the deletion was retained. Several variant mutations appeared in the other genes of the H3N2 virus, where most striking were two major out-of-frame deletions in the polymerase basic 2 (PB2) gene, indicating defective interfering-like viral RNA. The viruses harboring the 245-248 deletion in the neuraminidase gene were still present after discontinuation of oseltamivir treatment and passages in cell cultures, indicating a potential risk for transmission of the deleted virus. Full genome deep sequencing was useful to reveal variant mutations that might be selected due to antiviral treatment, and defective interfering-like viral PB2 RNA in the respiratory samples was detected. Copyright © 2018 Elsevier B.V. All rights reserved.
SNP discovery and development of genetic markers for mapping innate immune response genes in common carp (Cyprinus carpio).

PubMed

Kongchum, Pawapol; Palti, Yniv; Hallerman, Eric M; Hulata, Gideon; David, Lior

2010-08-01

Single nucleotide polymorphisms (SNPs) in immune response genes have been reported as markers for susceptibility to infectious diseases in human and livestock. A disease caused by cyprinid herpesvirus 3 (CyHV-3) is highly contagious and virulent in common carp (Cyprinus carpio). With the aim to develop molecular tools for breeding CyHV-3-resistant carp, we have amplified and sequenced 11 candidate genes for viral disease resistance including TLR2, TLR3, TLR4ba, TLR7, TLR9, TLR21, TLR22, MyD88, TRAF6, type I IFN and IL-1beta. For each gene, we initially cloned and sequenced PCR amplicons from 8 to 12 fish (2-3 fish per strain) from the SNP discovery panel. We then identified and evaluated putative SNPs for their polymorphisms in the SNP discovery panel and validated their usefulness for linkage analysis in a full-sib family using the SNaPshot method. Our sequencing results and phylogenetic analyses suggested that TLR3, TLR7 and MyD88 genes are duplicated in the common carp genome. We, therefore, developed locus-specific PCR primers and SNP genotyping assays for the duplicated loci. A total of 48 SNP markers were developed from PCR fragments of the 13 loci (7 single-locus and 3 duplicated genes). Thirty-nine markers were polymorphic with estimated minor allele frequencies of more than 0.1. The utility of the SNP markers was evaluated in one full-sib family and revealed that 20 markers from 9 loci segregated in a disomic and Mendelian pattern and would be useful for linkage analysis. Published by Elsevier Ltd.
Transcriptome characterisation of Pinus tabuliformis and evolution of genes in the Pinus phylogeny

PubMed Central

2013-01-01

Background The Chinese pine (Pinus tabuliformis) is an indigenous conifer species in northern China but is relatively underdeveloped as a genomic resource; thus, limiting gene discovery and breeding. Large-scale transcriptome data were obtained using a next-generation sequencing platform to compensate for the lack of P. tabuliformis genomic information. Results The increasing amount of transcriptome data on Pinus provides an excellent resource for multi-gene phylogenetic analysis and studies on how conserved genes and functions are maintained in the face of species divergence. The first P. tabuliformis transcriptome from a normalised cDNA library of multiple tissues and individuals was sequenced in a full 454 GS-FLX run, producing 911,302 sequencing reads. The high quality overlapping expressed sequence tags (ESTs) were assembled into 46,584 putative transcripts, and more than 700 SSRs and 92,000 SNPs/InDels were characterised. Comparative analysis of the transcriptome of six conifer species yielded 191 orthologues, from which we inferred a phylogenetic tree, evolutionary patterns and calculated rates of gene diversion. We also identified 938 fast evolving sequences that may be useful for identifying genes that perhaps evolved in response to positive selection and might be responsible for speciation in the Pinus lineage. Conclusions A large collection of high-quality ESTs was obtained, de novo assembled and characterised, which represents a dramatic expansion of the current transcript catalogues of P. tabuliformis and which will gradually be applied in breeding programs of P. tabuliformis. Furthermore, these data will facilitate future studies of the comparative genomics of P. tabuliformis and other related species. PMID:23597112
A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

PubMed

Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

2012-06-15

The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.
A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

PubMed Central

Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

2012-01-01

The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961
Sequencing and analysis of 10967 full-length cDNA clones from Xenopus laevis and Xenopus tropicalis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morin, R D; Chang, E; Petrescu, A

2005-10-31

Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection initiative. Here we present an analysis of 10967 clones (8049 from X. laevis and 2918 from X. tropicalis). The clone set contains 2013 orthologs between X. laevis and X. tropicalis as well as 1795 paralog pairs within X. laevis. 1199 are in-paralogs, believed to have resulted from an allotetraploidization event approximately 30 million years ago, and the remaining 546 are likely out-paralogs that have resulted from more ancient gene duplications, prior to the divergence betweenmore » the two species. We do not detect any evidence for positive selection by the Yang and Nielsen maximum likelihood method of approximating d{sub N}/d{sub S}. However, d{sub N}/d{sub S} for X. laevis in-paralogs is elevated relative to X. tropicalis orthologs. This difference is highly significant, and indicates an overall relaxation of selective pressures on duplicated gene pairs. Within both groups of paralogs, we found evidence of subfunctionalization, manifested as differential expression of paralogous genes among tissues, as measured by EST information from public resources. We have observed, as expected, a higher instance of subfunctionalization in out-paralogs relative to in-paralogs.« less
De novo assembled expressed gene catalog of a fast-growing Eucalyptus tree produced by Illumina mRNA-Seq

PubMed Central

2010-01-01

Background De novo assembly of transcript sequences produced by short-read DNA sequencing technologies offers a rapid approach to obtain expressed gene catalogs for non-model organisms. A draft genome sequence will be produced in 2010 for a Eucalyptus tree species (E. grandis) representing the most important hardwood fibre crop in the world. Genome annotation of this valuable woody plant and genetic dissection of its superior growth and productivity will be greatly facilitated by the availability of a comprehensive collection of expressed gene sequences from multiple tissues and organs. Results We present an extensive expressed gene catalog for a commercially grown E. grandis × E. urophylla hybrid clone constructed using only Illumina mRNA-Seq technology and de novo assembly. A total of 18,894 transcript-derived contigs, a large proportion of which represent full-length protein coding genes were assembled and annotated. Analysis of assembly quality, length and diversity show that this dataset represent the most comprehensive expressed gene catalog for any Eucalyptus tree. mRNA-Seq analysis furthermore allowed digital expression profiling of all of the assembled transcripts across diverse xylogenic and non-xylogenic tissues, which is invaluable for ascribing putative gene functions. Conclusions De novo assembly of Illumina mRNA-Seq reads is an efficient approach for transcriptome sequencing and profiling in Eucalyptus and other non-model organisms. The transcriptome resource (Eucspresso, http://eucspresso.bi.up.ac.za/) generated by this study will be of value for genomic analysis of woody biomass production in Eucalyptus and for comparative genomic analysis of growth and development in woody and herbaceous plants. PMID:21122097
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

PubMed Central

Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka

2010-01-01

Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Flanking sequence determination and event-specific detection of genetically modified wheat B73-6-1.

PubMed

Xu, Junyi; Cao, Jijuan; Cao, Dongmei; Zhao, Tongtong; Huang, Xin; Zhang, Piqiao; Luan, Fengxia

2013-05-01

In order to establish a specific identification method for genetically modified (GM) wheat, exogenous insert DNA and flanking sequence between exogenous fragment and recombinant chromosome of GM wheat B73-6-1 were successfully acquired by means of conventional polymerase chain reaction (PCR) and thermal asymmetric interlaced (TAIL)-PCR strategies. Newly acquired exogenous fragment covered the full-length sequence of transformed genes such as transformed plasmid and corresponding functional genes including marker uidA, herbicide-resistant bar, ubiquitin promoter, and high-molecular-weight gluten subunit. The flanking sequence between insert DNA revealed high similarity with Triticum turgidum A gene (GenBank: AY494981.1). A specific PCR detection method for GM wheat B73-6-1 was established on the basis of primers designed according to the flanking sequence. This specific PCR method was validated by GM wheat, GM corn, GM soybean, GM rice, and non-GM wheat. The specifically amplified target band was observed only in GM wheat B73-6-1. This method is of high specificity, high reproducibility, rapid identification, and excellent accuracy for the identification of GM wheat B73-6-1.
Genome-wide analysis of the WRKY transcription factors in aegilops tauschii.

PubMed

Ma, Jianhui; Zhang, Daijing; Shao, Yun; Liu, Pei; Jiang, Lina; Li, Chunxi

2014-01-01

The WRKY transcription factors (TFs) play important roles in responding to abiotic and biotic stress in plants. However, due to its unfinished genome sequencing, relatively few WRKY TFs with full-length coding sequences (CDSs) have been identified in wheat. Instead, the Aegilops tauschii genome, which is the D-genome progenitor of the hexaploid wheat genome, provides important resources for the discovery of new genes. In this study, we performed a bioinformatics analysis to identify WRKY TFs with full-length CDSs from the A. tauschii genome. A detailed evolutionary analysis for all these TFs was conducted, and quantitative real-time PCR was carried out to investigate the expression patterns of the abiotic stress-related WRKY TFs under different abiotic stress conditions in A. tauschii seedlings. A total of 93 WRKY TFs were identified from A. tauschii, and 79 of them were found to be newly discovered genes compared with wheat. Gene phylogeny, gene structure and chromosome location of the 93 WRKY TFs were fully analyzed. These studies provide a global view of the WRKY TFs from A. tauschii and a firm foundation for further investigations in both A. tauschii and wheat. © 2015 S. Karger AG, Basel.
Evolutionary mechanisms involved in the virulence of infectious salmon anaemia virus (ISAV), a piscine orthomyxovirus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Markussen, Turhan; Jonassen, Christine Monceyron; Numanovic, Sanela

2008-05-10

Infectious salmon anaemia virus (ISAV) is an orthomyxovirus causing a multisystemic, emerging disease in Atlantic salmon. Here we present, for the first time, detailed sequence analyses of the full-genome sequence of a presumed avirulent isolate displaying a full-length hemagglutinin-esterase (HE) gene (HPR0), and compare this with full-genome sequences of 11 Norwegian ISAV isolates from clinically diseased fish. These analyses revealed the presence of a virulence marker right upstream of the putative cleavage site R{sub 267} in the fusion (F) protein, suggesting a Q{sub 266} {yields} L{sub 266} substitution to be a prerequisite for virulence. To gain virulence in isolates lackingmore » this substitution, a sequence insertion near the cleavage site seems to be required. This strongly suggests the involvement of a protease recognition pattern at the cleavage site of the fusion protein as a determinant of virulence, as seen in highly pathogenic influenza A virus H5 or H7 and the paramyxovirus Newcastle disease virus.« less

Production of a full-length infectious GFP-tagged cDNA clone of Beet mild yellowing virus for the study of plant-polerovirus interactions.

PubMed

Stevens, Mark; Viganó, Felicita

2007-04-01

The full-length cDNA of Beet mild yellowing virus (Broom's Barn isolate) was sequenced and cloned into the vector pLitmus 29 (pBMYV-BBfl). The sequence of BMYV-BBfl (5721 bases) shared 96% and 98% nucleotide identity with the other complete sequences of BMYV (BMYV-2ITB, France and BMYV-IPP, Germany respectively). Full-length capped RNA transcripts of pBMYV-BBfl were synthesised and found to be biologically active in Arabidopsis thaliana protoplasts following electroporation or PEG inoculation when the protoplasts were subsequently analysed using serological and molecular methods. The BMYV sequence was modified by inserting DNA that encoded the jellyfish green fluorescent protein (GFP) into the P5 gene close to its 3' end. A. thaliana protoplasts electroporated with these RNA transcripts were biologically active and up to 2% of transfected protoplasts showed GFP-specific fluorescence. The exploitation of these cDNA clones for the study of the biology of beet poleroviruses is discussed.
Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus

PubMed Central

Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

2012-01-01

Humulus lupulus is commonly known as hops, a member of the family moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. The large data of EST sequences are available in hops. The simple sequence repeat markers extracted from EST data are used as molecular markers for genetic characterization, in the present study. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs i.e. 60.44% in contig and 62.16% in singleton where as minimum frequency are observed for hexanucleotide SSR in contig (0.09%) and pentanucleotide SSR in singletons (0.12%). Maximum trinucleotide motifs code for Glutamic acid (GAA) while AT/TA were the most frequent repeat of dinucleotide SSRs. Flanking primer pairs were designed in-silico for the SSR containing sequences. Functional categorization of SSRs containing sequences was done through gene ontology terms like biological process, cellular component and molecular function. PMID:22368382
Complete genomic sequence of an infectious pancreatic necrosis virus isolated from rainbow trout (Oncorhynchus mykiss) in China.

PubMed

Ji, Feng; Zhao, Jing-Zhuang; Liu, Miao; Lu, Tong-Yan; Liu, Hong-Bai; Yin, Jiasheng; Xu, Li-Ming

2017-04-01

Infectious pancreatic necrosis (IPN) is a significant disease of farmed salmonids resulting in direct economic losses due to high mortality in China. However, no gene sequence of any Chinese infectious pancreatic necrosis virus (IPNV) isolates was available. In the study, moribund rainbow trout fry samples were collected during an outbreak of IPN in Yunnan province of southwest China in 2013. An IPNV was isolated and tentatively named ChRtm213. We determined the full genome sequence of the IPNV ChRtm213 and compared it with previously identified IPNV sequences worldwide. The sequences of different structural and non-structural protein genes were compared to those of other aquatic birnaviruses sequenced to date. The results indicated that the complete genome sequence of ChRtm213 strain contains a segment A (3099 nucleotides) coding a polyprotein VP2-VP4-VP3, and a segment B (2789 nucleotides) coding a RNA-dependent RNA polymerase VP1. The phylogenetic analyses showed that ChRtm213 strain fell within genogroup 1, serotype A9 (Jasper), having similarities of 96.3% (segment A) and 97.3% (segment B) with the IPNV strain AM98 from Japan. The results suggest that the Chinese IPNV isolate has relative closer relationship with Japanese IPNV strains. The sequence of ChRtm213 was the first gene sequence of IPNV isolates in China. This study provided a robust reference for diagnosis and/or control of IPNV prevalent in China.
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

PubMed Central

Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

2008-01-01

Background Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
Metagenomic Characterization of Antibiotic Resistance Genes in Full-Scale Reclaimed Water Distribution Systems and Corresponding Potable Systems.

PubMed

Garner, Emily; Chen, Chaoqi; Xia, Kang; Bowers, Jolene; Engelthaler, David M; McLain, Jean; Edwards, Marc A; Pruden, Amy

2018-06-05

Water reclamation provides a valuable resource for meeting nonpotable water demands. However, little is known about the potential for wastewater reuse to disseminate antibiotic resistance genes (ARGs). Here, samples were collected seasonally in 2014-2015 from four U.S. utilities' reclaimed and potable water distribution systems before treatment, after treatment, and at five points of use (POU). Shotgun metagenomic sequencing was used to profile the resistome (i.e., full contingent of ARGs) of a subset ( n = 38) of samples. Four ARGs ( qnrA, bla TEM , vanA, sul1) were quantified by quantitative polymerase chain reaction. Bacterial community composition (via 16S rRNA gene amplicon sequencing), horizontal gene transfer (via quantification of intI1 integrase and plasmid genes), and selection pressure (via detection of metals and antibiotics) were investigated as potential factors governing the presence of ARGs. Certain ARGs were elevated in all ( sul1; p ≤ 0.0011) or some ( bla TEM , qnrA; p ≤ 0.0145) reclaimed POU samples compared to corresponding potable samples. Bacterial community composition was weakly correlated with ARGs (Adonis, R 2 = 0.1424-0.1734) and associations were noted between 193 ARGs and plasmid-associated genes. This study establishes that reclaimed water could convey greater abundances of certain ARGs than potable waters and provides observations regarding factors that likely control ARG occurrence in reclaimed water systems.
[Construction and functional identification of eukaryotic expression vector carrying Sprague-Dawley rat MSX-2 gene].

PubMed

Yang, Xian-Xian; Zhang, Mei; Yan, Zhao-Wen; Zhang, Ru-Hong; Mu, Xiong-Zheng

2008-01-01

To construct a high effective eukaryotic expressing plasmid PcDNA 3.1-MSX-2 encoding Sprague-Dawley rat MSX-2 gene for the further study of MSX-2 gene function. The full length SD rat MSX-2 gene was amplified by PCR, and the full length DNA was inserted in the PMD1 8-T vector. It was isolated by restriction enzyme digest with BamHI and Xhol, then ligated into the cloning site of the PcDNA3.1 expression plasmid. The positive recombinant was identified by PCR analysis, restriction endonudease analysis and sequence analysis. Expression of RNA and protein was detected by RT-PCR and Western blot analysis in PcDNA3.1-MSX-2 transfected HEK293 cells. Sequence analysis and restriction endonudease analysis of PcDNA3.1-MSX-2 demonstrated that the position and size of MSX-2 cDNA insertion were consistent with the design. RT-PCR and Western blot analysis showed specific expression of mRNA and protein of MSX-2 in the transfected HEK293 cells. The high effective eukaryotic expression plasmid PcDNA3.1-MSX-2 encoding Sprague-Dawley Rat MSX-2 gene which is related to craniofacial development can be successfully reconstructed. It may serve as the basis for the further study of MSX-2 gene function.
Molecular characterization and functional analysis of a glutathione peroxidase gene from Aphelenchoides besseyi (Nematoda: Aphelenchoididae).

PubMed

Wang, Bu-Yong; Wen, Rong-Rong; Ma, Ling

2017-09-26

Aphelenchoides besseyi, the nematode agent of rice tip white disease, causes huge economic losses in almost all the rice-growing regions of the world. Glutathione peroxidase (GPx), an esophageal glands secretion protein, plays important roles in the parasitism, immune evasion, reproduction and pathogenesis of many plant-parasitic nematodes (PPNs). Therefore, GPx is a promising target for control A. besseyi. Here, the full-length sequence of the GPx gene from A. besseyi (AbGPx1) was cloned using the rapid amplification of cDNA ends method. The full-length 944 bp AbGPx1 sequence, which contains a 678 bp open reading frame, encodes a 225 amino acid protein. The deduced amino acid sequence of the AbGPxl shares highly homologous with other nematode GPxs, and showed the closest evolutionary relationship with DrGPx. In situ hybridization showed that AbGPx1 was constitutively expressed in the esophageal glands of A. besseyi, suggesting its potential roles in parasitism and reproduction. RNA interference (RNAi) was used to assess the functions of the AbGPx1 gene, and quantitative real-time PCR was used to monitor the RNAi effects. After treatment with dsRNA for 12 h, AbGPx1 expression levels and reproduction in the nematodes decreased compared with the same parameters in the control group; thus, the AbGPx1 gene is likely to be associated with the development, reproduction, and infection ability of A. besseyi. These findings may open new avenues towards nematode control.
Complete mitochondrial genome from South American catfish Pseudoplatystoma reticulatum (Eigenmann & Eigenmann) and its impact in Siluriformes phylogenetic tree.

PubMed

Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues

2017-02-01

The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rNA genes, 22 trNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.
Microbial community structure in a full-scale anaerobic treatment plant during start-up and first year of operation revealed by high-throughput 16S rRNA gene amplicon sequencing.

PubMed

Fykse, Else Marie; Aarskaug, Tone; Madslien, Elisabeth H; Dybwad, Marius

2016-12-01

High-throughput amplicon sequencing of six biomass samples from a full-scale anaerobic reactor at a Norwegian wood and pulp factory using Biothane Biobed Expanded Granular Sludge Bed (EGSB) technology during start-up and first year of operation was performed. A total of 106,166 16S rRNA gene sequences (V3-V5 region) were obtained. The number of operational taxonomic units (OTUs) ranged from 595 to 2472, and a total of 38 different phyla and 143 families were observed. The predominant phyla were Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Spirochaetes. A more diverse microbial community was observed in the inoculum biomass coming from an Upflow Anaerobic Sludge Blanket (USAB) reactor, reflecting an adaptation of the inoculum diversity to the specific conditions of the new reactor. In addition, no taxa classified as obligate pathogens were identified and potentially opportunistic pathogens were absent or observed in low abundances. No Legionella bacteria were identified by traditional culture-based and molecular methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
Complete DNA sequence of the mitochondrial genome of the treehopper Leptobelus gazella (Membracoidea: Hemiptera).

PubMed

Zhao, Xing; Liang, Ai-Ping

2016-09-01

The first complete DNA sequence of the mitochondrial genome (mitogenome) of Leptobelus gazelle (Membracoidea: Hemiptera) is determined in this study. The circular molecule is 16,007 bp in its full length, which encodes a set of 37 genes, including 13 proteins, 2 ribosomal RNAs, 22 transfer RNAs, and contains an A + T-rich region (CR). The gene numbers, content, and organization of L. gazelle are similar to other typical metazoan mitogenomes. Twelve of the 13 PCGs are initiated with ATR methionine or ATT isoleucine codons, except the atp8 gene that uses the ATC isoleucine as start signal. Ten of the 13 PCGs have complete termination codons, either TAA (nine genes) or TAG (cytb). The remaining 3 PCGs (cox1, cox2 and nad5) have incomplete termination codons T (AA). All of the 22 tRNAs can be folded in the form of a typical clover-leaf structure. The complete mitogenome sequence data of L. gazelle is useful for the phylogenetic and biogeographic studies of the Membracoidea and Hemiptera.
Major mutation events in structural genes of peste des petits ruminants virus through serial passages in vitro.

PubMed

Wu, Xiaodong; Liu, Fuxiao; Li, Lin; Zou, Yanli; Liu, Shan; Wang, Zhiliang

2016-06-01

Peste des petits ruminants (PPR) is an highly contagious disease of small ruminants, and caused by peste des petits ruminants virus (PPRV), a member of the genus Morbillivirus in the family Paramyxoviridae. The first outbreak of PPR in China was officially reported in July 2007, when a PPRV strain was successfully isolated from a sick goat in Tibet, followed by sequencing at a full-genome level (China/Tibet/Geg/07-30, GenBank: FJ905304.1). To date, this isolate has been virulently attenuated by more than 90 serial passages in Vero-Dog-SLAM cells at our laboratory. In this study, a total of nine strains by serial passages (namely the 10th, 20th, 30th, 40th, 50th, 60th, 70th, 80th, and 90th passages) were chosen for sequencing of six structural genes in PPRV. The sequence analysis showed that mutation rates in all viral genes were relatively low, and only a few identical mutations within certain genes were stably maintained after an earlier passage, perhaps indicating a predominance of mutants after such a passage.
Enhanced green fluorescent protein (egfp) gene expression in Tetraselmis subcordiformis chloroplast with endogenous regulators.

PubMed

Cui, Yulin; Zhao, Jialin; Hou, Shichang; Qin, Song

2016-05-01

On the basis of fundamental genetic transformation technologies, the goal of this study was to optimize Tetraselmis subcordiformis chloroplast transformation through the use of endogenous regulators. The genes rrn16S, rbcL, psbA, and psbC are commonly highly expressed in chloroplasts, and the regulators of these genes are often used in chloroplast transformation. For lack of a known chloroplast genome sequence, the genome-walking method was used here to obtain full sequences of T. subcordiformis endogenous regulators. The resulting regulators, including three promoters, two terminators, and a ribosome combination sequence, were inserted into the previously constructed plasmid pPSC-R, with the egfp gene included as a reporter gene, and five chloroplast expression vectors prepared. These vectors were successfully transformed into T. subcordiformis by particle bombardment and the efficiency of each vector tested by assessing EGFP fluorescence via microscopy. The results showed that these vectors exhibited higher efficiency than the former vector pPSC-G carrying exogenous regulators, and the vector pRFA with Prrn, psbA-5'RE, and TpsbA showed the highest efficiency. This research provides a set of effective endogenous regulators for T. subcordiformis and will facilitate future fundamental studies of this alga.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.

PubMed

Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji

2017-02-28

High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Comparative immunogenomics of molluscs.

PubMed

Schultz, Jonathan H; Adema, Coen M

2017-10-01

Comparative immunology, studying both vertebrates and invertebrates, provided the earliest descriptions of phagocytosis as a general immune mechanism. However, the large scale of animal diversity challenges all-inclusive investigations and the field of immunology has developed by mostly emphasizing study of a few vertebrate species. In addressing the lack of comprehensive understanding of animal immunity, especially that of invertebrates, comparative immunology helps toward management of invertebrates that are food sources, agricultural pests, pathogens, or transmit diseases, and helps interpret the evolution of animal immunity. Initial studies showed that the Mollusca (second largest animal phylum), and invertebrates in general, possess innate defenses but lack the lymphocytic immune system that characterizes vertebrate immunology. Recognizing the reality of both common and taxon-specific immune features, and applying up-to-date cell and molecular research capabilities, in-depth studies of a select number of bivalve and gastropod species continue to reveal novel aspects of molluscan immunity. The genomics era heralded a new stage of comparative immunology; large-scale efforts yielded an initial set of full molluscan genome sequences that is available for analyses of full complements of immune genes and regulatory sequences. Next-generation sequencing (NGS), due to lower cost and effort required, allows individual researchers to generate large sequence datasets for growing numbers of molluscs. RNAseq provides expression profiles that enable discovery of immune genes and genome sequences reveal distribution and diversity of immune factors across molluscan phylogeny. Although computational de novo sequence assembly will benefit from continued development and automated annotation may require some experimental validation, NGS is a powerful tool for comparative immunology, especially increasing coverage of the extensive molluscan diversity. To date, immunogenomics revealed new levels of complexity of molluscan defense by indicating sequence heterogeneity in individual snails and bivalves, and members of expanded immune gene families are expressed differentially to generate pathogen-specific defense responses. Copyright © 2017 Elsevier Ltd. All rights reserved.
De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits.

PubMed

Zhu, Haisheng; Liu, Jianting; Wen, Qingfang; Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng

2017-01-01

Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar 'Fusi-3'. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1-6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism.
Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data

PubMed Central

Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

2015-01-01

Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post-divergence gene flow. PMID:26187295
Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population.

PubMed

Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

2016-06-02

Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich repeats containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs and they can recognize pathogen molecules directly or indirectly. As such, divergence and copy number variants at these genes are found to be high between species. Within populations, positive and balancing selection are to be expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We use a combination of enrichment sequencing and pooling ten individuals, to specifically sequence NLR genes in a resource and cost-effective manner. We focus on the effects which different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are accurately verified using Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Flanking sequence determination and specific PCR identification of transgenic wheat B102-1-2.

PubMed

Cao, Jijuan; Xu, Junyi; Zhao, Tongtong; Cao, Dongmei; Huang, Xin; Zhang, Piqiao; Luan, Fengxia

2014-01-01

The exogenous fragment sequence and flanking sequence between the exogenous fragment and recombinant chromosome of transgenic wheat B102-1-2 were successfully acquired using genome walking technology. The newly acquired exogenous fragment encoded the full-length sequence of transformed genes with transformed plasmid and corresponding functional genes including ubi, vector pBANF-bar, vector pUbiGUSPlus, vector HSP, reporter vector pUbiGUSPlus, promoter ubiquitin, and coli DH1. A specific polymerase chain reaction (PCR) identification method for transgenic wheat B102-1-2 was established on the basis of designed primers according to flanking sequence. This established specific PCR strategy was validated by using transgenic wheat, transgenic corn, transgenic soybean, transgenic rice, and non-transgenic wheat. A specifically amplified target band was observed only in transgenic wheat B102-1-2. Therefore, this method is characterized by high specificity, high reproducibility, rapid identification, and excellent accuracy for the identification of transgenic wheat B102-1-2.
Comparative Sequence Analysis of the Plasmid-Encoded Regulator of Enteropathogenic Escherichia coli Strains

PubMed Central

Okeke, Iruka N.; Borneman, Jade A.; Shin, Sooan; Mellies, Jay L.; Quinn, Laura E.; Kaper, James B.

2001-01-01

Enteropathogenic Escherichia coli (EPEC) strains that carry the EPEC adherence factor (EAF) plasmid were screened for the presence of different EAF sequences, including those of the plasmid-encoded regulator (per). Considerable variation in gene content of EAF plasmids from different strains was seen. However, bfpA, the gene encoding the structural subunit for the bundle-forming pilus, bundlin, and per genes were found in 96.8% of strains. Sequence analysis of the per operon and its promoter region from 15 representative strains revealed that it is highly conserved. Most of the variation occurs in the 5′ two-thirds of the perA gene. In contrast, the C-terminal portion of the predicted PerA protein that contains the DNA-binding helix-turn-helix motif is 100% conserved in all strains that possess a full-length gene. In a minority of strains including the O119:H2 and canine isolates and in a subset of O128:H2 and O142:H6 strains, frameshift mutations in perA leading to premature truncation and consequent inactivation of the gene were identified. Cloned perA, -B, and -C genes from these strains, unlike those from strains with a functional operon, failed to activate the LEE1 operon and bfpA transcriptional fusions or to complement a per mutant in reference strain E2348/69. Furthermore, O119, O128, and canine strains that carry inactive per operons were deficient in virulence protein expression. The context in which the perABC operon occurs on the EAF plasmid varies. The sequence upstream of the per promoter region in EPEC reference strains E2348/69 and B171-8 was present in strains belonging to most serogroups. In a subset of O119:H2, O128:H2, and O142:H6 strains and in the canine isolate, this sequence was replaced by an IS1294-homologous sequence. PMID:11500429
Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India).

PubMed

Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

2016-03-01

16S rRNA sequences of morphologically and biochemically identified 21 thermophilic bacteria isolated from Unkeshwar hot springs (19°85'N and 78°25'E), Dist. Nanded (India) has been deposited in NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full Gene Bank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA i.e. visual comparison and evaluation respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/.

Horizontal gene transfer is more frequent with increased heterotrophy and contributes to parasite adaptation.

PubMed

Yang, Zhenzhen; Zhang, Yeting; Wafula, Eric K; Honaas, Loren A; Ralph, Paula E; Jones, Sam; Clarke, Christopher R; Liu, Siming; Su, Chun; Zhang, Huiting; Altman, Naomi S; Schuster, Stephan C; Timko, Michael P; Yoder, John I; Westwood, James H; dePamphilis, Claude W

2016-10-24

Horizontal gene transfer (HGT) is the transfer of genetic material across species boundaries and has been a driving force in prokaryotic evolution. HGT involving eukaryotes appears to be much less frequent, and the functional implications of HGT in eukaryotes are poorly understood. We test the hypothesis that parasitic plants, because of their intimate feeding contacts with host plant tissues, are especially prone to horizontal gene acquisition. We sought evidence of HGTs in transcriptomes of three parasitic members of Orobanchaceae, a plant family containing species spanning the full spectrum of parasitic capabilities, plus the free-living Lindenbergia Following initial phylogenetic detection and an extensive validation procedure, 52 high-confidence horizontal transfer events were detected, often from lineages of known host plants and with an increasing number of HGT events in species with the greatest parasitic dependence. Analyses of intron sequences in putative donor and recipient lineages provide evidence for integration of genomic fragments far more often than retro-processed RNA sequences. Purifying selection predominates in functionally transferred sequences, with a small fraction of adaptively evolving sites. HGT-acquired genes are preferentially expressed in the haustorium-the organ of parasitic plants-and are strongly biased in predicted gene functions, suggesting that expression products of horizontally acquired genes are contributing to the unique adaptive feeding structure of parasitic plants.
The FLEXGene repository: exploiting the fruits of the genome projects by creating a needed resource to face the challenges of the post-genomic era.

PubMed

Brizuela, Leonardo; Richardson, Aaron; Marsischky, Gerald; Labaer, Joshua

2002-01-01

Thanks to the results of the multiple completed and ongoing genome sequencing projects and to the newly available recombination-based cloning techniques, it is now possible to build gene repositories with no precedent in their composition, formatting, and potential. This new type of gene repository is necessary to address the challenges imposed by the post-genomic era, i.e., experimentation on a genome-wide scale. We are building the FLEXGene (Full Length EXpression-ready) repository. This unique resource will contain clones representing the complete ORFeome of different organisms, including Homo sapiens as well as several pathogens and model organisms. It will consist of a comprehensive, characterized (sequence-verified), and arrayed gene repository. This resource will allow full exploitation of the genomic information by enabling genome-wide scale experimentation at the level of functional/phenotypic assays as well as at the level of protein expression, purification, and analysis. Here we describe the rationale and construction of this resource and focus on the data obtained from the Saccharomyces cerevisiae project.
Metatranscriptomics of Soil Eukaryotic Communities.

PubMed

Yadav, Rajiv K; Bragalini, Claudia; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia

2016-01-01

Functions expressed by eukaryotic organisms in soil can be specifically studied by analyzing the pool of eukaryotic-specific polyadenylated mRNA directly extracted from environmental samples. In this chapter, we describe two alternative protocols for the extraction of high-quality RNA from soil samples. Total soil RNA or mRNA can be converted to cDNA for direct high-throughput sequencing. Polyadenylated mRNA-derived full-length cDNAs can also be cloned in expression plasmid vectors to constitute soil cDNA libraries, which can be subsequently screened for functional gene categories. Alternatively, the diversity of specific gene families can also be explored following cDNA sequence capture using exploratory oligonucleotide probes.
Horse cDNA clones encoding two MHC class I genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barbis, D.P.; Maher, J.K.; Stanek, J.

1994-12-31

Two full-length clones encoding MHC class I genes were isolated by screening a horse cDNA library, using a probe encoding in human HLA-A2.2Y allele. The library was made in the pcDNA1 vector (Invitrogen, San Diego, CA), using mRNA from peripheral blood lymphocytes obtained from a Thoroughbred stallion (No. 0834) homozygous for a common horse MHC haplotype (ELA-A2, -B2, -D2; Antczak et al. 1984; Donaldson et al. 1988). The clones were sequenced, using SP6 and T7 universal primers and horse-specific oligonucleotides designed to extend previously determined sequences.
Hybrid Sequencing of Full-Length cDNA Transcripts of Stems and Leaves in Dendrobium officinale

PubMed Central

He, Liu; Fu, Shuhua; Xu, Zhichao; Yan, Jun; Xu, Jiang; Zhou, Hong; Zhou, Jianguo; Chen, Xinlian; Li, Ying; Au, Kin Fai; Yao, Hui

2017-01-01

Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in the polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expressions. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one sugars will eventually be exported transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. In the stems, the content of polysaccharides is twice as much as that in the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem. PMID:28981454
VaDiR: an integrated approach to Variant Detection in RNA.

PubMed

Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

2018-02-01

Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA(VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Mutations in the NDP gene: contribution to Norrie disease, familial exudative vitreoretinopathy and retinopathy of prematurity.

PubMed

Dickinson, Joanne L; Sale, Michèle M; Passmore, Abraham; FitzGerald, Liesel M; Wheatley, Catherine M; Burdon, Kathryn P; Craig, Jamie E; Tengtrisorn, Supaporn; Carden, Susan M; Maclean, Hector; Mackey, David A

2006-01-01

To examine the contribution of mutations within the Norrie disease (NDP) gene to the clinically similar retinal diseases Norrie disease, X-linked familial exudative vitreoretinopathy (FEVR), Coat's disease and retinopathy of prematurity (ROP). A dataset comprising 13 Norrie-FEVR, one Coat's disease, 31 ROP patients and 90 ex-premature babies of <32 weeks' gestation underwent an ophthalmologic examination and were screened for mutations within the NDP gene by direct DNA sequencing, denaturing high-performance liquid chromatography or gel electrophoresis. Controls were only screened using denaturing high-performance liquid chromatography and gel electrophoresis. Confirmation of mutations identified was obtained by DNA sequencing. Evidence for two novel mutations in the NDP gene was presented: Leu103Val in one FEVR patient and His43Arg in monozygotic twin Norrie disease patients. Furthermore, a previously described 14-bp deletion located in the 5' unstranslated region of the NDP gene was detected in three cases of regressed ROP. A second heterozygotic 14-bp deletion was detected in an unaffected ex-premature girl. Only two of the 13 Norrie-FEVR index cases had the full features of Norrie disease with deafness and mental retardation. Two novel mutations within the coding region of the NDP gene were found, one associated with a severe disease phenotypes of Norrie disease and the other with FEVR. A deletion within the non-coding region was associated with only mild-regressed ROP, despite the presence of low birthweight, prematurity and exposure to oxygen. In full-term children with retinal detachment only 15% appear to have the full features of Norrie disease and this is important for counselling parents on the possible long-term outcome.
Study of canine parvovirus evolution: comparative analysis of full-length VP2 gene sequences from Argentina and international field strains.

PubMed

Gallo Calderón, Marina; Wilda, Maximiliano; Boado, Lorena; Keller, Leticia; Malirat, Viviana; Iglesias, Marcela; Mattion, Nora; La Torre, Jose

2012-02-01

The continuous emergence of new strains of canine parvovirus (CPV), poorly protected by current vaccination, is a concern among breeders, veterinarians, and dog owners around the world. Therefore, the understanding of the genetic variation in emerging CPV strains is crucial for the design of disease control strategies, including vaccines. In this paper, we obtained the sequences of the full-length gene encoding for the main capsid protein (VP2) of 11 canine parvovirus type 2 (CPV-2) Argentine representative field strains, selected from a total of 75 positive samples studied in our laboratory in the last 9 years. A comparative sequence analysis was performed on 9 CPV-2c, one CPV-2a, and one CPV-2b Argentine strains with respect to international strains reported in the GenBank database. In agreement with previous reports, a high degree of identity was found among CPV-2c Argentine strains (99.6-100% and 99.7-100% at nucleotide and amino acid levels, respectively). However, the appearance of a new substitution in the 440 position (T440A) in four CPV-2c Argentine strains obtained after the year 2009 gives support to the variability observed for this position located within the VP2, three-fold spike. This is the first report on the genetic characterization of the full-length VP2 gene of emerging CPV strains in South America and shows that all the Argentine CPV-2c isolates cluster together with European and North American CPV-2c strains.
Sequence analysis and gene expression of putative exo- and endo-glucanases from oil palm (Elaeis guineensis) during fungal infection.

PubMed

Yeoh, Keat-Ai; Othman, Abrizah; Meon, Sariah; Abdullah, Faridah; Ho, Chai-Ling

2012-10-15

Glucanases are enzymes that hydrolyze a variety β-d-glucosidic linkages. Plant β-1,3-glucanases are able to degrade fungal cell walls; and promote the release of cell-wall derived fungal elicitors. In this study, three full-length cDNA sequences encoding oil palm (Elaeis guineensis) glucanases were analyzed. Sequence analyses of the cDNA sequences suggested that EgGlc1-1 is a putative β-d-glucan exohydolase belonging to glycosyl hydrolase (GH) family 3 while EgGlc5-1 and EgGlc5-2 are putative glucan endo-1,3-β-glucosidases belonging to GH family 17. The transcript abundance of these genes in the roots and leaves of oil palm seedlings treated with Ganoderma boninense and Trichoderma harzianum was profiled to investigate the involvement of these glucanases in oil palm during fungal infection. The gene expression of EgGlc1-1 in the root of oil palm seedlings was increased by T. harzianum but suppressed by G. boninense; while the gene expression of both EgGlc5-1 and EgGlc5-2 in the roots of oil palm seedlings was suppressed by G. boninense or/and T. harzianum. Copyright © 2012 Elsevier GmbH. All rights reserved.
Microbial community structure in the gut of the New Zealand insect Auckland tree weta (Hemideina thoracica).

PubMed

Waite, David W; Dsouza, Melissa; Biswas, Kristi; Ward, Darren F; Deines, Peter; Taylor, Michael W

2015-05-01

The endemic New Zealand weta is an enigmatic insect. Although the insect is well known by its distinctive name, considerable size, and morphology, many basic aspects of weta biology remain unknown. Here, we employed cultivation-independent enumeration techniques and rRNA gene sequencing to investigate the gut microbiota of the Auckland tree weta (Hemideina thoracica). Fluorescence in situ hybridisation performed on different sections of the gut revealed a bacterial community of fluctuating density, while rRNA gene-targeted amplicon pyrosequencing revealed the presence of a microbial community containing high bacterial diversity, but an apparent absence of archaea. Bacteria were further studied using full-length 16S rRNA gene sequences, with statistical testing of bacterial community membership against publicly available termite- and cockroach-derived sequences, revealing that the weta gut microbiota is similar to that of cockroaches. These data represent the first analysis of the weta microbiota and provide initial insights into the potential function of these microorganisms.
Loss of GATA-1 Full Length as a Cause of Diamond–Blackfan Anemia Phenotype

PubMed Central

Parrella, Sara; Aspesi, Anna; Quarello, Paola; Garelli, Emanuela; Pavesi, Elisa; Carando, Adriana; Nardi, Margherita; Ellis, Steven R.; Ramenghi, Ugo; Dianzani, Irma

2015-01-01

Mutations in the hematopoietic transcription factor GATA-1 alter the proliferation/differentiation of hemopoietic progenitors. Mutations in exon 2 interfere with the synthesis of the full-length isoform of GATA-1 and lead to the production of a shortened isoform, GATA-1s. These mutations have been found in patients with Diamond–Blackfan anemia (DBA), a congenital erythroid aplasia typically caused by mutations in genes encoding ribosomal proteins. We sequenced GATA-1 in 23 patients that were negative for mutations in the most frequently mutated DBA genes. One patient showed a c.2T > C mutation in the initiation codon leading to the loss of the full-length GATA-1 isoform. PMID:24453067
Novel Method for High-Throughput Full-Length IGHV-D-J Sequencing of the Immune Repertoire from Bulk B-Cells with Single-Cell Resolution.

PubMed

Vergani, Stefano; Korsunsky, Ilya; Mazzarello, Andrea Nicola; Ferrer, Gerardo; Chiorazzi, Nicholas; Bagnara, Davide

2017-01-01

Efficient and accurate high-throughput DNA sequencing of the adaptive immune receptor repertoire (AIRR) is necessary to study immune diversity in healthy subjects and disease-related conditions. The high complexity and diversity of the AIRR coupled with the limited amount of starting material, which can compromise identification of the full biological diversity makes such sequencing particularly challenging. AIRR sequencing protocols often fail to fully capture the sampled AIRR diversity, especially for samples containing restricted numbers of B lymphocytes. Here, we describe a library preparation method for immunoglobulin sequencing that results in an exhaustive full-length repertoire where virtually every sampled B-cell is sequenced. This maximizes the likelihood of identifying and quantifying the entire IGHV-D-J repertoire of a sample, including the detection of rearrangements present in only one cell in the starting population. The methodology establishes the importance of circumventing genetic material dilution in the preamplification phases and incorporates the use of certain described concepts: (1) balancing the starting material amount and depth of sequencing, (2) avoiding IGHV gene-specific amplification, and (3) using Unique Molecular Identifier. Together, this methodology is highly efficient, in particular for detecting rare rearrangements in the sampled population and when only a limited amount of starting material is available.
A diverse family of serine proteinase genes expressed in cotton boll weevil (Anthonomus grandis): implications for the design of pest-resistant transgenic cotton plants.

PubMed

Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F

2004-09-01

Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

Abstract To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Evaluation of an Optimal Epidemiological Typing Scheme for Legionella pneumophila with Whole-Genome Sequence Data Using Validation Guidelines

PubMed Central

Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R.; Afshar, Baharak; Underwood, Anthony; Harrison, Timothy G.

2016-01-01

Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current “gold standard” typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila. However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard “typing panel,” previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. PMID:27280420
Evaluation of an Optimal Epidemiological Typing Scheme for Legionella pneumophila with Whole-Genome Sequence Data Using Validation Guidelines.

PubMed

David, Sophia; Mentasti, Massimo; Tewolde, Rediat; Aslett, Martin; Harris, Simon R; Afshar, Baharak; Underwood, Anthony; Fry, Norman K; Parkhill, Julian; Harrison, Timothy G

2016-08-01

Sequence-based typing (SBT), analogous to multilocus sequence typing (MLST), is the current "gold standard" typing method for investigation of legionellosis outbreaks caused by Legionella pneumophila However, as common sequence types (STs) cause many infections, some investigations remain unresolved. In this study, various whole-genome sequencing (WGS)-based methods were evaluated according to published guidelines, including (i) a single nucleotide polymorphism (SNP)-based method, (ii) extended MLST using different numbers of genes, (iii) determination of gene presence or absence, and (iv) a kmer-based method. L. pneumophila serogroup 1 isolates (n = 106) from the standard "typing panel," previously used by the European Society for Clinical Microbiology Study Group on Legionella Infections (ESGLI), were tested together with another 229 isolates. Over 98% of isolates were considered typeable using the SNP- and kmer-based methods. Percentages of isolates with complete extended MLST profiles ranged from 99.1% (50 genes) to 86.8% (1,455 genes), while only 41.5% produced a full profile with the gene presence/absence scheme. Replicates demonstrated that all methods offer 100% reproducibility. Indices of discrimination range from 0.972 (ribosomal MLST) to 0.999 (SNP based), and all values were higher than that achieved with SBT (0.940). Epidemiological concordance is generally inversely related to discriminatory power. We propose that an extended MLST scheme with ∼50 genes provides optimal epidemiological concordance while substantially improving the discrimination offered by SBT and can be used as part of a hierarchical typing scheme that should maintain backwards compatibility and increase discrimination where necessary. This analysis will be useful for the ESGLI to design a scheme that has the potential to become the new gold standard typing method for L. pneumophila. Copyright © 2016 David et al.
The central domain of bovine submaxillary mucin consists of over 50 tandem repeats of 329 amino acids. Chromosomal localization of the BSM1 gene and relations to ovine and porcine counterparts.

PubMed

Jiang, W; Gupta, D; Gallagher, D; Davis, S; Bhavanandan, V P

2000-04-01

We previously elucidated five distinct protein domains (I-V) for bovine submaxillary mucin, which is encoded by two genes, BSM1 and BSM2. Using Southern blot analysis, genomic cloning and sequencing of the BSM1 gene, we now show that the central domain (V) consists of approximately 55 tandem repeats of 329 amino acids and that domains III-V are encoded by a 58.4-kb exon, the largest exon known for all genes to date. The BSM1 gene was mapped by fluorescence in situ hybridization to the proximal half of chromosome 5 at bands q2. 2-q2.3. The amino-acid sequence of six tandem repeats (two full and four partial) were found to have only 92-94% identities. We propose that the variability in the amino-acid sequences of the mucin tandem repeat is important for generating the combinatorial library of saccharides that are necessary for the protective function of mucins. The deduced peptide sequences of the central domain match those determined from the purified bovine submaxillary mucin and also show 68-94% identity to published peptide sequences of ovine submaxillary mucin. This indicates that the core protein of ovine submaxillary mucin is closely related to that of bovine submaxillary mucin and contains similar tandem repeats in the central domain. In contrast, the central domain of porcine submaxillary mucin is reported to consist of 81-amino-acid tandem repeats. However, both bovine submaxillary mucin and porcine submaxillary mucin contain similar N-terminal and C-terminal domains and the corresponding genes are in the conserved linkage regions of the respective genomes.
Characterization and genetic analysis of an EIN4-like sequence (CaETR-1) located in QTL ARI implicated in Ascochyta blight resistance in chickpea

USDA-ARS?s Scientific Manuscript database

Two different alleles of an ethylene receptor gene (CaETR-1) of chickpea (Cicer aritinum) were isolated and characterized through synteny analysis with genome sequences of Medicago truncatula. The full length of CaETR-1 in cultivar FLIP84-92C (CaETR-1a) is 4,428 bp including the polyadenylation sig...
Analysis of expressed sequence tags from a single wheat cultivar facilitates interpretation of tandem mass spectrometry data and discrimination of gamma gliadin proteins that may play different functional roles in flour

USDA-ARS?s Scientific Manuscript database

The complement of gamma gliadin genes expressed in the wheat cultivar Butte 86 was evaluated by analyzing publicly available expressed sequence tag (EST) data. Eleven contigs were assembled from 153 Butte 86 ESTs. Nine of the contigs encoded full-length proteins and four of the proteins contained an...

[The genetical evolution of the full length genes of 5 EV 71 strains from 5 Shenzhen patients with hand-food-mouth disease associated with EV71 infection].

PubMed

Liu, Wei-long; Yang, Gui-lin; Wei, Qing; Zhang, Ming-xia; Chen, Xin-chun; Liu, Ying-xia; Gao, Yang; Zhou, Bo-ping

2011-02-01

To investigate the characteristics of molecular epidemiology and molecular evolution of 5 EV 71 (enterovirus 71, EV71) strains from 5 Shenzhen patients with hand-food-mouth disease associated with EV 71 infection. 5 EV 71 strains were isolated, and sequenced to analyzed the full length gene sequences in order to compare nucleotide and amino acid homology with other EV71 strains from other regions and countries as well as previous strains across the world through bioinformatics software. 5 strains of EV 71 belonged to sub-genotype C4 by analysis of nucleotide sequences of VP1 and VP4 of EV 71. The differences of nucleotide and amino acid sequences were much small with nucleotide homology of 93% and amino acid homology of 98% among these 5 strains. A phylogenetic tree analysis indicated that 2008 Shenzhen epidemic strains were the most close to 2004 Shenzhen circulating strains, and also much close to 1998 Shenzhen epidemic strains and 2008 Fuyang Anhui strains. The dead strain was very close to 2008 Fuyang Anhui epidemic strains. It can be speculated that this epidemic strains of EV 71 probably originate from the same ancient strain in the history, may from 1998 Shenzhen strain.
Deciphering of the Dual oxidase (Nox family) gene from kuruma shrimp, Marsupenaeus japonicus: full-length cDNA cloning and characterization.

PubMed

Inada, Mari; Kihara, Keisuke; Kono, Tomoya; Sudhakaran, Raja; Mekata, Tohru; Sakai, Masahiro; Yoshida, Terutoyo; Itami, Toshiaki

2013-02-01

In many physiological processes, including the innate immune system, free radicals such as nitric oxide (NO) and reactive oxygen species (ROS) play significant roles. In humans, 2 homologs of Dual oxidases (Duox) generate hydrogen peroxide (H(2)O(2)), which is a type of ROS. Here, we report the identification and characterization of a Duox from kuruma shrimp, Marsupenaeus japonicus. The full-length cDNA sequence of the M. japonicus Dual oxidase (MjDuox) gene contains 4695 bp and was generated using reverse transcriptase-polymerase chain reaction (RT-PCR) and random amplification of cDNA ends (RACE). The open reading frame of MjDuox encodes a protein of 1498 amino acids with an estimated mass of 173 kDa. In a homology analysis using amino acid sequences, MjDuox exhibited 69.3% sequence homology with the Duox of the red flour beetle, Tribolium castaneum. A transcriptional analysis revealed that the MjDuox mRNA is highly expressed in the gills of healthy kuruma shrimp. In the gills, MjDuox expression reached its peak 60 h after injection with WSSV and decreased to its normal level at 72 h. In gene knockdown experiments of free radical-generating enzymes, the survival rates decreased during the early stages of a white spot syndrome virus (WSSV) infection following the knockdown of the NADPH oxidase (MjNox) or MjDuox genes. In the present study, the identification, cloning and gene knockdown of the kuruma shrimp MjDuox are reported. Duoxes have been identified in vertebrates and some insects; however, few reports have investigated Duoxes in crustaceans. This study is the first to identify and clone a Dual oxidase from a crustacean species. Copyright © 2012 Elsevier Ltd. All rights reserved.
A comparative molecular analysis of water-filled limestone sinkholes in north-eastern Mexico.

PubMed

Sahl, Jason W; Gary, Marcus O; Harris, J Kirk; Spear, John R

2011-01-01

Sistema Zacatón in north-eastern Mexico is host to several deep, water-filled, anoxic, karstic sinkholes (cenotes). These cenotes were explored, mapped, and geochemically and microbiologically sampled by the autonomous underwater vehicle deep phreatic thermal explorer (DEPTHX). The community structure of the filterable fraction of the water column and extensive microbial mats that coat the cenote walls was investigated by comparative analysis of small-subunit (SSU) 16S rRNA gene sequences. Full-length Sanger gene sequence analysis revealed novel microbial diversity that included three putative bacterial candidate phyla and three additional groups that showed high intra-clade distance with poorly characterized bacterial candidate phyla. Limited functional gene sequence analysis in these anoxic environments identified genes associated with methanogenesis, sulfate reduction and anaerobic ammonium oxidation. A directed, barcoded amplicon, multiplex pyrosequencing approach was employed to compare ∼100,000 bacterial SSU gene sequences from water column and wall microbial mat samples from five cenotes in Sistema Zacatón. A new, high-resolution sequence distribution profile (SDP) method identified changes in specific phylogenetic types (phylotypes) in microbial mats at varied depths; Mantel tests showed a correlation of the genetic distances between mat communities in two cenotes and the geographic location of each cenote. Community structure profiles from the water column of three neighbouring cenotes showed distinct variation; statistically significant differences in the concentration of geochemical constituents suggest that the variation observed in microbial communities between neighbouring cenotes are due to geochemical variation. © 2010 Society for Applied Microbiology and Blackwell Publishing Ltd.
Unraveling Haplotype Diversity of the Apical Membrane Antigen-1 Gene in Plasmodium falciparum Populations in Thailand

PubMed Central

Lumkul, Lalita; Sawaswong, Vorthon; Simpalipan, Phumin; Kaewthamasorn, Morakot; Harnyuttanakorn, Pongchai; Pattaradilokrat, Sittiporn

2018-01-01

Development of an effective vaccine is critically needed for the prevention of malaria. One of the key antigens for malaria vaccines is the apical membrane antigen 1 (AMA-1) of the human malaria parasite Plasmodium falciparum, the surface protein for erythrocyte invasion of the parasite. The gene encoding AMA-1 has been sequenced from populations of P. falciparum worldwide, but the haplotype diversity of the gene in P. falciparum populations in the Greater Mekong Subregion (GMS), including Thailand, remains to be characterized. In the present study, the AMA-1 gene was PCR amplified and sequenced from the genomic DNA of 65 P. falciparum isolates from 5 endemic areas in Thailand. The nearly full-length 1,848 nucleotide sequence of AMA-1 was subjected to molecular analyses, including nucleotide sequence diversity, haplotype diversity and deduced amino acid sequence diversity and neutrality tests. Phylogenetic analysis and pairwise population differentiation (Fst indices) were performed to infer the population structure. The analyses identified 60 single nucleotide polymorphic loci, predominately located in domain I of AMA-1. A total of 31 unique AMA-1 haplotypes were identified, which included 11 novel ones. The phylogenetic tree of the AMA-1 haplotypes revealed multiple clades of AMA-1, each of which contained parasites of multiple geographical origins, consistent with the Fst indices indicating genetic homogeneity or gene flow among geographically distinct populations of P. falciparum in Thailand’s borders with Myanmar, Laos and Cambodia. In summary, the study revealed novel haplotypes and population structure needed for the further advancement of AMA-1-based malaria vaccines in the GMS. PMID:29742870
Genetic speciation of environmental Legionella isolates in Thailand.

PubMed

Paveenkittiporn, Wantana; Dejsirilert, Surang; Kalambaheti, Thareerat

2012-10-01

Legionella-like organisms were isolated during 2003-2007 from various water resources by culturing on selective media of Wadowsky-Yee-Okuda agar. The 256 isolates were identified as belonging to the Legionella genus based on detection of 108 bp PCR product of the 5S rRNA gene, while the inclusion as Legionella pneumophila were confirmed by PCR detection of a specific mip gene region of 168 bp. The 50 isolates, identified as non-pneumophila, were then subjected to DNA tree analysis, based on mip gene of ~650 bp and rnpB genes product ranged from 304 to 354 bp. Phylogenetic tree was constructed to predict their species in relative to the available database. The isolates of which their speciation, based on those two genes were inconclusive, were then investigated for the almost full-length of 16S rRNA sequences. The isolates were assigned as 16 known Legionella species, and proposed seven novel species based on their unique 16S rRNA sequence. Copyright © 2012 Elsevier B.V. All rights reserved.
Complete Chloroplast Genome Sequence of Holoparasite Cistanche deserticola (Orobanchaceae) Reveals Gene Loss and Horizontal Gene Transfer from Its Host Haloxylon ammodendron (Chenopodiaceae)

PubMed Central

Qiao, Qin; Ren, Zhumei; Zhao, Jiayuan; Yonezawa, Takahiro; Hasegawa, Masami; Crabbe, M. James C; Li, Jianqiang; Zhong, Yang

2013-01-01

Background The central function of chloroplasts is to carry out photosynthesis, and its gene content and structure are highly conserved across land plants. Parasitic plants, which have reduced photosynthetic ability, suffer gene losses from the chloroplast (cp) genome accompanied by the relaxation of selective constraints. Compared with the rapid rise in the number of cp genome sequences of photosynthetic organisms, there are limited data sets from parasitic plants. Principal Findings/Significance Here we report the complete sequence of the cp genome of Cistanche deserticola, a holoparasitic desert species belonging to the family Orobanchaceae. The cp genome of C. deserticola is greatly reduced both in size (102,657 bp) and in gene content, indicating that all genes required for photosynthesis suffer from gene loss and pseudogenization, except for psbM. The striking difference from other holoparasitic plants is that it retains almost a full set of tRNA genes, and it has lower dN/dS for most genes than another close holoparasitic plant, E. virginiana, suggesting that Cistanche deserticola has undergone fewer losses, either due to a reduced level of holoparasitism, or to a recent switch to this life history. We also found that the rpoC2 gene was present in two copies within C. deserticola. Its own copy has much shortened and turned out to be a pseudogene. Another copy, which was not located in its cp genome, was a homolog of the host plant, Haloxylon ammodendron (Chenopodiaceae), suggesting that it was acquired from its host via a horizontal gene transfer. PMID:23554920
Complete chloroplast genome sequence of holoparasite Cistanche deserticola (Orobanchaceae) reveals gene loss and horizontal gene transfer from its host Haloxylon ammodendron (Chenopodiaceae).

PubMed

Li, Xi; Zhang, Ti-Cao; Qiao, Qin; Ren, Zhumei; Zhao, Jiayuan; Yonezawa, Takahiro; Hasegawa, Masami; Crabbe, M James C; Li, Jianqiang; Zhong, Yang

2013-01-01

The central function of chloroplasts is to carry out photosynthesis, and its gene content and structure are highly conserved across land plants. Parasitic plants, which have reduced photosynthetic ability, suffer gene losses from the chloroplast (cp) genome accompanied by the relaxation of selective constraints. Compared with the rapid rise in the number of cp genome sequences of photosynthetic organisms, there are limited data sets from parasitic plants. PRINCIPAL FINDINGS/SIGNIFICANCE: Here we report the complete sequence of the cp genome of Cistanche deserticola, a holoparasitic desert species belonging to the family Orobanchaceae. The cp genome of C. deserticola is greatly reduced both in size (102,657 bp) and in gene content, indicating that all genes required for photosynthesis suffer from gene loss and pseudogenization, except for psbM. The striking difference from other holoparasitic plants is that it retains almost a full set of tRNA genes, and it has lower dN/dS for most genes than another close holoparasitic plant, E. virginiana, suggesting that Cistanche deserticola has undergone fewer losses, either due to a reduced level of holoparasitism, or to a recent switch to this life history. We also found that the rpoC2 gene was present in two copies within C. deserticola. Its own copy has much shortened and turned out to be a pseudogene. Another copy, which was not located in its cp genome, was a homolog of the host plant, Haloxylon ammodendron (Chenopodiaceae), suggesting that it was acquired from its host via a horizontal gene transfer.
A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors

PubMed Central

Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa

2006-01-01

Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels.

PubMed

Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed

2018-01-01

The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could attribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel "rare" variant (LPL:g.18704C>A) significantly associated to HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004-0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001-0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regards to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia.
A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels

PubMed Central

Al-Serri, Ahmad; Annice, Babitha G.; Alnaqeeb, Majed A.; Al-Kandari, Wafa Y.; Dashti, Mohammed

2018-01-01

The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for the genetic association with lipid level variation especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could attribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel “rare” variant (LPL:g.18704C>A) significantly associated to HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004–0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001–0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regards to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of aHIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening.

PubMed

Santos, Carla S; Pinheiro, Miguel; Silva, Ana I; Egas, Conceição; Vasconcelos, Marta W

2012-11-07

Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes-25.4-39.7%, whereas a smaller percentage, matched Pinus genes, 1.8-12.8%, probably a consequence of more public genomic information available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species. Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD.
CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences

PubMed Central

2012-01-01

Background The complete sequences of chloroplast genomes provide wealthy information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, powerful computational tools annotating the genome sequences are in urgent need. Results We have developed a web server CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan, ARAGORN and vmatch respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extractions of protein and mRNA sequences for given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tools. The edited annotations can then be uploaded to CPGAVAS for update and re-analyses repeatedly. Using known chloroplast genome sequences as test set, we show that CPGAVAS performs comparably to another application DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensible tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
Complete Deletion of the Fucose Operon in Haemophilus influenzae Is Associated with a Cluster in Multilocus Sequence Analysis-Based Phylogenetic Group II Related to Haemophilus haemolyticus: Implications for Identification and Typing

PubMed Central

de Gier, Camilla; Kirkham, Lea-Ann S.

2015-01-01

Nonhemolytic variants of Haemophilus haemolyticus are difficult to differentiate from Haemophilus influenzae despite a wide difference in pathogenic potential. A previous investigation characterized a challenging set of 60 clinical strains using multiple PCRs for marker genes and described strains that could not be unequivocally identified as either species. We have analyzed the same set of strains by multilocus sequence analysis (MLSA) and near-full-length 16S rRNA gene sequencing. MLSA unambiguously allocated all study strains to either of the two species, while identification by 16S rRNA sequence was inconclusive for three strains. Notably, the two methods yielded conflicting identifications for two strains. Most of the “fuzzy species” strains were identified as H. influenzae that had undergone complete deletion of the fucose operon. Such strains, which are untypeable by the H. influenzae multilocus sequence type (MLST) scheme, have sporadically been reported and predominantly belong to a single branch of H. influenzae MLSA phylogenetic group II. We also found evidence of interspecies recombination between H. influenzae and H. haemolyticus within the 16S rRNA genes. Establishing an accurate method for rapid and inexpensive identification of H. influenzae is important for disease surveillance and treatment. PMID:26378279
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.

PubMed Central

Sasaki, H; Yokoyama, E; Kuroiwa, A

1990-01-01

The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. Images PMID:1970866
Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects

PubMed Central

Johnson, Ben; Lowe, Gillian C.; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A.; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J.; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula HB; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E.; Watson, Steve P.; Morgan, Neil V.

2016-01-01

Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×109/L to 186×109/L. Of the 51 patients phenotypically tested, 37 (73%), had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified “pathogenic” or “likely pathogenic” variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. PMID:27479822
Digital data for Quick Response (QR) codes of thermophiles to identify and compare the bacterial species isolated from Unkeshwar hot springs (India)

PubMed Central

Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.

2015-01-01

16S rRNA sequences of morphologically and biochemically identified 21 thermophilic bacteria isolated from Unkeshwar hot springs (19°85′N and 78°25′E), Dist. Nanded (India) has been deposited in NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full Gene Bank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA i.e. visual comparison and evaluation respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. PMID:26793757
De novo sequencing and analysis of the transcriptome during the browning of fresh-cut Luffa cylindrica 'Fusi-3' fruits

PubMed Central

Chen, Mindong; Wang, Bin; Zhang, Qianrong; Xue, Zhuzheng

2017-01-01

Fresh-cut luffa (Luffa cylindrica) fruits commonly undergo browning. However, little is known about the molecular mechanisms regulating this process. We used the RNA-seq technique to analyze the transcriptomic changes occurring during the browning of fresh-cut fruits from luffa cultivar ‘Fusi-3’. Over 90 million high-quality reads were assembled into 58,073 Unigenes, and 60.86% of these were annotated based on sequences in four public databases. We detected 35,282 Unigenes with significant hits to sequences in the NCBInr database, and 24,427 Unigenes encoded proteins with sequences that were similar to those of known proteins in the Swiss-Prot database. Additionally, 20,546 and 13,021 Unigenes were similar to existing sequences in the Eukaryotic Orthologous Groups of proteins and Kyoto Encyclopedia of Genes and Genomes databases, respectively. Furthermore, 27,301 Unigenes were differentially expressed during the browning of fresh-cut luffa fruits (i.e., after 1–6 h). Moreover, 11 genes from five gene families (i.e., PPO, PAL, POD, CAT, and SOD) identified as potentially associated with enzymatic browning as well as four WRKY transcription factors were observed to be differentially regulated in fresh-cut luffa fruits. With the assistance of rapid amplification of cDNA ends technology, we obtained the full-length sequences of the 15 Unigenes. We also confirmed these Unigenes were expressed by quantitative real-time polymerase chain reaction analysis. This study provides a comprehensive transcriptome sequence resource, and may facilitate further studies aimed at identifying genes affecting luffa fruit browning for the exploitation of the underlying mechanism. PMID:29145430
Sequence and expression of the genes for HPr (ptsH) and enzyme I (ptsI) of the phosphoenolpyruvate-dependent phosphotransferase transport system from Streptococcus mutans.

PubMed Central

Boyd, D A; Cvitkovitch, D G; Hamilton, I R

1994-01-01

We report the sequencing of a 2,242-bp region of the Streptococcus mutants NG5 genome containing the genes for ptsH and ptsI, which encode HPr and enzyme I (EI), respectively, of the phosphoenolpyruvate-dependent phosphotransferase transport system. The sequence was obtained from two cloned overlapping genomic fragments; one expresses HPr and a truncated EI, while the other expresses a full-length EI in Escherichia coli, as determined by Western immunoblotting. The ptsI gene appeared to be expressed from a region located in the ptsH gene. The S. mutans NG5 pts operon does not appear to be linked to other phosphotransferase transport system proteins as has been found in other bacteria. A positive fermentation pattern on MacConkey-glucose plates by an E. coli ptsI mutant harboring the S. mutans NG5 ptsI gene on a plasmid indicated that the S. mutans NG5 EI can complement a defect in the E. coli gene. This was confirmed by protein phosphorylation experiments with 32P-labeled phosphoenolpyruvate indicating phosphotransfer from the S. mutans NG5 EI to the E. coli HPr. Two forms of the cloned EI, both truncated to varying degrees in the C-terminal region, were inefficiently phosphorylated and unable to complement fully the ptsI defect in the E. coli mutant. The deduced amino acid sequence of HPr shows a high degree of homology, particularly around the active site, to the same protein from other gram-positive bacteria, notably, S. salivarius, and to a lesser extent with those of gram-negative bacteria. The deduced amino acid sequence of S. mutans NG5 EI also shares several regions of homology with other sequenced EIs, notably, with the region around the active site, a region that contains the only conserved cystidyl residue among the various proteins and which may be involved in substrate binding. Images PMID:8132321
Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

PubMed Central

Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

2015-01-01

Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

PubMed

Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

2015-01-01

Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat.
Molecular characterization and expression analysis of ubiquitin-activating enzyme E1 gene in Citrus reticulata.

PubMed

Miao, Hong-Xia; Qin, Yong-Hua; Ye, Zi-Xing; Hu, Gui-Bing

2013-01-25

Ubiquitin-activating enzyme E1 (UBE1) catalyzes the first step in the ubiquitination reaction, which targets a protein for degradation via a proteasome pathway. UBE1 plays an important role in metabolic processes. In this study, full-length cDNA and DNA sequences of UBE1 gene, designated CrUBE1, were obtained from 'Wuzishatangju' (self-incompatible, SI) and 'Shatangju' (self-compatible, SC) mandarins. 5 amino acids and 8 bases were different in cDNA and DNA sequences of CrUBE1 between 'Wuzishatangju' and 'Shatangju', respectively. Southern blot analysis showed that there existed only one copy of the CrUBE1 gene in genome of 'Wuzishatangju' and 'Shatangju'. The temporal and spatial expression characteristics of the CrUBE1 gene were investigated using semi-quantitative RT-PCR (SqPCR) and quantitative real-time PCR (qPCR). The expression level of the CrUBE1 gene in anthers of 'Shatangju' was approximately 10-fold higher than in anthers of 'Wuzishatangju'. The highest expression level of CrUBE1 was detected in pistils at 7days after self-pollination of 'Wuzishatangju', which was approximately 5-fold higher than at 0 h. To obtain CrUBE1 protein, the full-length cDNA of CrUBE1 genes from 'Wuzishatangju' and 'Shatangju' were successfully expressed in Pichia pastoris. Pollen germination frequency of 'Wuzishatangju' was significantly inhibited with increasing of CrUBE1 protein concentrations from 'Wuzishatangju'. Copyright © 2012 Elsevier B.V. All rights reserved.
Rational Design of High-Number dsDNA Fragments Based on Thermodynamics for the Construction of Full-Length Genes in a Single Reaction.

PubMed

Birla, Bhagyashree S; Chou, Hui-Hsien

2015-01-01

Gene synthesis is frequently used in modern molecular biology research either to create novel genes or to obtain natural genes when the synthesis approach is more flexible and reliable than cloning. DNA chemical synthesis has limits on both its length and yield, thus full-length genes have to be hierarchically constructed from synthesized DNA fragments. Gibson Assembly and its derivatives are the simplest methods to assemble multiple double-stranded DNA fragments. Currently, up to 12 dsDNA fragments can be assembled at once with Gibson Assembly according to its vendor. In practice, the number of dsDNA fragments that can be assembled in a single reaction are much lower. We have developed a rational design method for gene construction that allows high-number dsDNA fragments to be assembled into full-length genes in a single reaction. Using this new design method and a modified version of the Gibson Assembly protocol, we have assembled 3 different genes from up to 45 dsDNA fragments at once. Our design method uses the thermodynamic analysis software Picky that identifies all unique junctions in a gene where consecutive DNA fragments are specifically made to connect to each other. Our novel method is generally applicable to most gene sequences, and can improve both the efficiency and cost of gene assembly.
Genotyping of the fish rhabdovirus, viral haemorrhagic septicaemia virus, by restriction fragment length polymorphisms

USGS Publications Warehouse

Einer-Jensen, Katja; Winton, James R.; Lorenzen, Niels

2005-01-01

The aim of this study was to develop a standardized molecular assay that used limited resources and equipment for routine genotyping of isolates of the fish rhabdovirus, viral haemorrhagic septicaemia virus (VHSV). Computer generated restriction maps, based on 62 unique full-length (1524 nt) sequences of the VHSV glycoprotein (G) gene, were used to predict restriction fragment length polymorphism (RFLP) patterns that were subsequently grouped and compared with a phylogenetic analysis of the G-gene sequences of the same set of isolates. Digestion of PCR amplicons from the full-lengthG-gene by a set of three restriction enzymes was predicted to accurately enable the assignment of the VHSV isolates into the four major genotypes discovered to date. Further sub-typing of the isolates into the recently described sub-lineages of genotype I was possible by applying three additional enzymes. Experimental evaluation of the method consisted of three steps: (i) RT-PCR amplification of the G-gene of VHSV isolates using purified viral RNA as template, (ii) digestion of the PCR products with a panel of restriction endonucleases and (iii) interpretation of the resulting RFLP profiles. The RFLP analysis was shown to approximate the level of genetic discrimination obtained by other, more labour-intensive, molecular techniques such as the ribonuclease protection assay or sequence analysis. In addition, 37 previously uncharacterised isolates from diverse sources were assigned to specific genotypes. While the assay was able to distinguish between marine and continental isolates of VHSV, the differences did not correlate with the pathogenicity of the isolates.
Codon-Anticodon Recognition in the Bacillus subtilis glyQS T Box Riboswitch

PubMed Central

Caserta, Enrico; Liu, Liang-Chun; Grundy, Frank J.; Henkin, Tina M.

2015-01-01

Many amino acid-related genes in Gram-positive bacteria are regulated by the T box riboswitch. The leader RNA of genes in the T box family controls the expression of downstream genes by monitoring the aminoacylation status of the cognate tRNA. Previous studies identified a three-nucleotide codon, termed the “Specifier Sequence,” in the riboswitch that corresponds to the amino acid identity of the downstream genes. Pairing of the Specifier Sequence with the anticodon of the cognate tRNA is the primary determinant of specific tRNA recognition. This interaction mimics codon-anticodon pairing in translation but occurs in the absence of the ribosome. The goal of the current study was to determine the effect of a full range of mismatches for comparison with codon recognition in translation. Mutations were individually introduced into the Specifier Sequence of the glyQS leader RNA and tRNAGly anticodon to test the effect of all possible pairing combinations on tRNA binding affinity and antitermination efficiency. The functional role of the conserved purine 3′ of the Specifier Sequence was also verifiedin this study. We found that substitutions at the Specifier Sequence resulted in reduced binding, the magnitude of which correlates well with the predicted stability of the RNA-RNA pairing. However, the tolerance for specific mismatches in antitermination was generally different from that during decoding, which reveals a unique tRNA recognition pattern in the T box antitermination system. PMID:26229106
Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

PubMed

Kovács, Endre R; Benko, Mária

2009-03-01

Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named as raptor adenovirus 1 (RAdV-1), has originally been detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product has implied a new siadenovirus type present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb pairs in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.
[Cloning and characterization of Caveolin-1 gene in pigeon, Columba livia domestica].

PubMed

Zhang, Ying; Yu, Jian-Feng; Yang, Li; Wang, Xing-Guo; Gu, Zhi-Liang

2010-10-01

Caveolins, a class of principal proteins forming the structure of caveolae in plasmalemma, were encoded by caveolins gene family. Caveolin-1 gene is a member of caveolins gene family. In the present study, a full-length of 2605 bp caveolin-1 cDNA sequence in Columba livia domestica, which included a 537 bp complete ORF encoding a 178 amino acids long putative peptide, were obtained by using RT-PCR and RACE technique. The Columba livia domestica caveolin-1 CDS shared 80.1% - 93.4% homology with Bos taurus, Canis lupus familiaris, Gallus gallus and Rattus norvegicus. Meanwhile, the putative amino acid sequence of Columba livia domestica caveolin-1 shared 85.4% - 97.2% homology with the above species. The semi-quantity RT-PCR revealed that Caveolin-1 expressions were detectable in all the Columba livia domestica tissues and the expressional level of caveolin-1 gene was high in adipose, medium in various muscles, low in liver. These results demonstrated that Caveolin-1 gene was potentially involved in some metabolic pathways in adipose and muscle.
Determination of the Core of a Minimal Bacterial Gene Set†

PubMed Central

Gil, Rosario; Silva, Francisco J.; Peretó, Juli; Moya, Andrés

2004-01-01

The availability of a large number of complete genome sequences raises the question of how many genes are essential for cellular life. Trying to reconstruct the core of the protein-coding gene set for a hypothetical minimal bacterial cell, we have performed a computational comparative analysis of eight bacterial genomes. Six of the analyzed genomes are very small due to a dramatic genome size reduction process, while the other two, corresponding to free-living relatives, are larger. The available data from several systematic experimental approaches to define all the essential genes in some completely sequenced bacterial genomes were also considered, and a reconstruction of a minimal metabolic machinery necessary to sustain life was carried out. The proposed minimal genome contains 206 protein-coding genes with all the genetic information necessary for self-maintenance and reproduction in the presence of a full complement of essential nutrients and in the absence of environmental stress. The main features of such a minimal gene set, as well as the metabolic functions that must be present in the hypothetical minimal cell, are discussed. PMID:15353568
Structural analysis of the RH-like blood group gene products in nonhuman primates

DOE Office of Scientific and Technical Information (OSTI.GOV)

Salvignol, I.; Calvas, P.; Blancher, A.

1995-03-01

Rh-related transcripts present in bone marrow samples from several species of nonhuman primates (chimpanzee, gorilla, gibbon, crab-eating macaque) have been amplified by RT-polymerase chain reaction using primers deduced from the sequence of human RH genes. Nucleotide sequence analysis of the nonhuman transcripts revealed a high degree of similarity to human blood group Rh sequences, suggesting a great conservation of the RH genes throughout evolution. Full-length transcripts, potentially encoding 417 amino acid long proteins homologous to Rh polypeptides, were characterized, as well as mRNA isoforms which harbored nucleotide deletions or insertions and potentially encode truncated proteins. Proteins of 30-40,000 M{sub r},more » immunologically related to human Rh proteins, were detected by western blot analysis with antipeptide antibodies, indicating that Rh-like transcripts are translated into membrane proteins. Comparison of human and nonhuman protein sequences was pivotal in clarifying the molecular basis of the blood group C/c polymorphism, showing that only the Pro103Ser substitution was correlated with C/c polymorphism. In addition, it was shown that a proline residue at position 102 was critical in the expression of C and c epitopes, most likely by providing an appropriate conformation of Rh polypeptides. From these data a phylogenetic reconstruction of the RH locus evolution has been calculated from which an unrooted phylogenetic tree could be proposed, indicating that African ape Rh-like genes would be closer to the human RhD gene than to the human RhCE gene. 55 refs., 4 figs., 1 tab.« less
Genome sequence of the photoarsenotrophic bacterium Ectothiorhodospira sp. strain BSL-9, isolated from a hypersaline alkaline arsenic-rich extreme environment

USGS Publications Warehouse

Hernandez-Maldonado, Jaime; Stoneburner, Brendon; Boren, Alison; Miller, Laurence; Rosen, Michael R.; Oremland, Ronald S.; Saltikov, Chad W

2016-01-01

The full genome sequence of Ectothiorhodospira sp. strain BSL-9 is reported here. This purple sulfur bacterium encodes an arxA-type arsenite oxidase within the arxB2AB1CD gene island and is capable of carrying out “photoarsenotrophy” anoxygenic photosynthetic arsenite oxidation. Its genome is composed of 3.5 Mb and has approximately 63% G+C content.
Expression of the histone chaperone SET/TAF-Iβ during the strobilation process of Mesocestoides corti (Platyhelminthes, Cestoda).

PubMed

Costa, Caroline B; Monteiro, Karina M; Teichmann, Aline; da Silva, Edileuza D; Lorenzatto, Karina R; Cancela, Martín; Paes, Jéssica A; Benitz, André de N D; Castillo, Estela; Margis, Rogério; Zaha, Arnaldo; Ferreira, Henrique B

2015-08-01

The histone chaperone SET/TAF-Iβ is implicated in processes of chromatin remodelling and gene expression regulation. It has been associated with the control of developmental processes, but little is known about its function in helminth parasites. In Mesocestoides corti, a partial cDNA sequence related to SET/TAF-Iβ was isolated in a screening for genes differentially expressed in larvae (tetrathyridia) and adult worms. Here, the full-length coding sequence of the M. corti SET/TAF-Iβ gene was analysed and the encoded protein (McSET/TAF) was compared with orthologous sequences, showing that McSET/TAF can be regarded as a SET/TAF-Iβ family member, with a typical nucleosome-assembly protein (NAP) domain and an acidic tail. The expression patterns of the McSET/TAF gene and protein were investigated during the strobilation process by RT-qPCR, using a set of five reference genes, and by immunoblot and immunofluorescence, using monospecific polyclonal antibodies. A gradual increase in McSET/TAF transcripts and McSET/TAF protein was observed upon development induction by trypsin, demonstrating McSET/TAF differential expression during strobilation. These results provided the first evidence for the involvement of a protein from the NAP family of epigenetic effectors in the regulation of cestode development.
Too much data, but little inter-changeability: a lesson learned from mining public data on tissue specificity of gene expression.

PubMed

Li, Shuyu; Li, Yiqun Helen; Wei, Tao; Su, Eric Wen; Duffin, Kevin; Liao, Birong

2006-10-25

The tissue expression pattern of a gene often provides an important clue to its potential role in a biological process. A vast amount of gene expression data have been and are being accumulated in public repository through different technology platforms. However, exploitations of these rich data sources remain limited in part due to issues of technology standardization. Our objective is to test the data comparability between SAGE and microarray technologies, through examining the expression pattern of genes under normal physiological states across variety of tissues. There are 42-54% of genes showing significant correlations in tissue expression patterns between SAGE and GeneChip, with 30-40% of genes whose expression patterns are positively correlated and 10-15% of genes whose expression patterns are negatively correlated at a statistically significant level (p = 0.05). Our analysis suggests that the discrepancy on the expression patterns derived from technology platforms is not likely from the heterogeneity of tissues used in these technologies, or other spurious correlations resulting from microarray probe design, abundance of genes, or gene function. The discrepancy can be partially explained by errors in the original assignment of SAGE tags to genes due to the evolution of sequence databases. In addition, sequence analysis has indicated that many SAGE tags and Affymetrix array probe sets are mapped to different splice variants or different sequence regions although they represent the same gene, which also contributes to the observed discrepancies between SAGE and array expression data. To our knowledge, this is the first report attempting to mine gene expression patterns across tissues using public data from different technology platforms. Unlike previous similar studies that only demonstrated the discrepancies between the two gene expression platforms, we carried out in-depth analysis to further investigate the cause for such discrepancies. Our study shows that the exploitation of rich public expression resource requires extensive knowledge about the technologies, and experiment. Informatic methodologies for better interoperability among platforms still remain a gap. One of the areas that can be improved practically is the accurate sequence mapping of SAGE tags and array probes to full-length genes.
Phylogenetic analysis of Newcastle disease viruses from Bangladesh suggests continuing evolution of genotype XIII.

PubMed

Barman, Lalita Rani; Nooruzzaman, Mohammed; Sarker, Rahul Deb; Rahman, Md Tazinur; Saife, Md Rajib Bin; Giasuddin, Mohammad; Das, Bidhan Chandra; Das, Priya Mohan; Chowdhury, Emdadul Haque; Islam, Mohammad Rafiqul

2017-10-01

A total of 23 Newcastle disease virus (NDV) isolates from Bangladesh taken between 2010 and 2012 were characterized on the basis of partial F gene sequences. All the isolates belonged to genotype XIII of class II NDV but segregated into three sub-clusters. One sub-cluster with 17 isolates aligned with sub-genotype XIIIc. The other two sub-clusters were phylogenetically distinct from the previously described sub-genotypes XIIIa, XIIIb and XIIIc and could be candidates of new sub-genotypes; however, that needs to be validated through full-length F gene sequence data. The results of the present study suggest that genotype XIII NDVs are under continuing evolution in Bangladesh.
Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

PubMed

Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

2013-07-01

Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of VP1 gene, an unusual strain of HBoV (CMH-S011-11), was initially identified as HBoV4. The complete genome sequence of CMH-S011-11 was performed and analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted to the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, nucleotide sequence of VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that this CMH-S011-11 was grouped within HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.
Linear Lepidopteran ambidensovirus 1 sequences drive random integration of a reporter gene in transfected Spodoptera frugiperda cells.

PubMed

Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry

2018-01-01

The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous works showed that JcDV-based circular plasmids experimentally integrate into insect cells genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV- gfp based molecules which were transfected into non permissive Spodoptera frugiperda ( Sf9 ) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present. However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than to the integration of viral sequences.
The Use of a Combined Bioinformatics Approach to Locate Antibiotic Resistance Genes on Plasmids From Whole Genome Sequences of Salmonella enterica Serovars From Humans in Ghana.

PubMed

Kudirkiene, Egle; Andoh, Linda A; Ahmed, Shahana; Herrero-Fresno, Ana; Dalsgaard, Anders; Obiri-Danso, Kwasi; Olsen, John E

2018-01-01

In the current study, we identified plasmids carrying antimicrobial resistance genes in draft whole genome sequences of 16 selected Salmonella enterica isolates representing six different serovars from humans in Ghana. The plasmids and the location of resistance genes in the genomes were predicted using a combination of PlasmidFinder, ResFinder, plasmidSPAdes and BLAST genomic analysis tools. Subsequently, S1-PFGE was employed for analysis of plasmid profiles. Whole genome sequencing confirmed the presence of antimicrobial resistance genes in Salmonella isolates showing multidrug resistance phenotypically. ESBL, either bla TEM52-B or bla CTX-M15 were present in two cephalosporin resistant isolates of S . Virchow and S . Poona, respectively. The systematic genome analysis revealed the presence of different plasmids in different serovars, with or without insertion of antimicrobial resistance genes. In S . Enteritidis, resistance genes were carried predominantly on plasmids of IncN type, in S . Typhimurium on plasmids of IncFII(S)/IncFIB(S)/IncQ1 type. In S . Virchow and in S . Poona, resistance genes were detected on plasmids of IncX1 and TrfA/IncHI2/IncHI2A type, respectively. The latter two plasmids were described for the first time in these serovars. The combination of genomic analytical tools allowed nearly full mapping of the resistance plasmids in all Salmonella strains analyzed. The results suggest that the improved analytical approach used in the current study may be used to identify plasmids that are specifically associated with resistance phenotypes in whole genome sequences. Such knowledge would allow the development of rapid multidrug resistance tracking tools in Salmonella populations using WGS.
Novel rare variations of the oxytocin receptor (OXTR) gene in autism spectrum disorder individuals.

PubMed

Liu, Xiaoxi; Kawashima, Minae; Miyagawa, Taku; Otowa, Takeshi; Latt, Khun Zaw; Thiri, Myo; Nishida, Hisami; Sugiyama, Toshiro; Tsurusaki, Yoshinori; Matsumoto, Naomichi; Mabuchi, Akihiko; Tokunaga, Katsushi; Sasaki, Tsukasa

2015-01-01

The oxytocin receptor (OXTR) gene has been implicated as a risk gene for autism spectrum disorder (ASD)-a neurodevelopmental disorder with essential features of impairments in social communication and reciprocal interaction. The genetic associations between common variations in OXTR and ASD have been reported in multiple ethnic populations. However, little is known about the distribution of rare variations within OXTR in ASD patients. In this study, we resequenced the full length of OXTR in 105 ASD individuals using an approach that combined the power of next-generation sequencing technology, long-range PCR and DNA pooling. We demonstrated that rare variants with minor allele frequency as low as 0.05% could be reliably detected by our method. We identified 28 novel variants including potential functional variants in the intron region and one rare missense variant (R150S). We subsequently performed Sanger sequencing and validated five novel variants located in previously suggested candidate regions in ASD individuals. Further sequencing of 312 healthy subjects showed that the burden of rare variants is significantly higher in ASDs compared with healthy individuals. Our results support that the rare variation in OXTR gene might be involved in ASD.
Phylogenetic analysis of nitrite, nitric oxide, and nitrous oxide respiratory enzymes reveal a complex evolutionary history for denitrification.

PubMed

Jones, Christopher M; Stres, Blaz; Rosenquist, Magnus; Hallin, Sara

2008-09-01

Denitrification is a facultative respiratory pathway in which nitrite (NO2(-)), nitric oxide (NO), and nitrous oxide (N2O) are successively reduced to nitrogen gas (N(2)), effectively closing the nitrogen cycle. The ability to denitrify is widely dispersed among prokaryotes, and this polyphyletic distribution has raised the possibility of horizontal gene transfer (HGT) having a substantial role in the evolution of denitrification. Comparisons of 16S rRNA and denitrification gene phylogenies in recent studies support this possibility; however, these results remain speculative as they are based on visual comparisons of phylogenies from partial sequences. We reanalyzed publicly available nirS, nirK, norB, and nosZ partial sequences using Bayesian and maximum likelihood phylogenetic inference. Concomitant analysis of denitrification genes with 16S rRNA sequences from the same organisms showed substantial differences between the trees, which were supported by examining the posterior probability of monophyletic constraints at different taxonomic levels. Although these differences suggest HGT of denitrification genes, the presence of structural variants for nirK, norB, and nosZ makes it difficult to determine HGT from other evolutionary events. Additional analysis using phylogenetic networks and likelihood ratio tests of phylogenies based on full-length sequences retrieved from genomes also revealed significant differences in tree topologies among denitrification and 16S rRNA gene phylogenies, with the exception of the nosZ gene phylogeny within the data set of the nirK-harboring genomes. However, inspection of codon usage and G + C content plots from complete genomes gave no evidence for recent HGT. Instead, the close proximity of denitrification gene copies in the genomes of several denitrifying bacteria suggests duplication. Although HGT cannot be ruled out as a factor in the evolution of denitrification genes, our analysis suggests that other phenomena, such gene duplication/divergence and lineage sorting, may have differently influenced the evolution of each denitrification gene.
ILG1 : a new integrase-like gene that is a marker of bacterial contamination by the laboratory Escherichia coli strain TOP10F'.

PubMed Central

Tian, Wenzhi; Chua, Kevin; Strober, Warren; Chu, Charles C.

2002-01-01

BACKGROUND: Identification of differentially expressed genes between normal and diseased states is an area of intense current medical research that can lead to the discovery of new therapeutic targets. However, isolation of differentially expressed genes by subtraction often suffers from unreported contamination of the resulting subtraction library with clones containing DNA sequences not from the original RNA samples. MATERIALS AND METHODS: Subtraction using cDNA representational difference analysis (RDA) was performed on human B cells from normal or common variable immunodeficiency patients. The material remaining after the subtraction was cloned and individual clones were sequenced. The sequence of one clone with similarity to integrases (ILG1, integrase-like gene-1) was used to obtain the full length cDNA sequence and as a probe for the presence of this sequence in RNA or genomic DNA samples. RESULTS: After five rounds of cDNA RDA, 23.3% of the clones from the resulting subtraction library contained Escherichia coli DNA. In addition, three clones contained the sequence of a new integrase, ILG1. The full length cDNA sequence of ILG1 exhibits prokaryotic, but not eukaryotic, features. At the DNA level, ILG1 is not similar to any known gene. At the protein level, ILG1 has 58% similarity to integrases from the cryptic P4 bacteriophage family (S clade). The catalytic domain of ILG1 contains the conserved features found in site-specific recombinases. The critical residues that form the catalytic active site pocket are conserved, including the highly conserved R-H-R-Y hallmark of these recombinases. Interestingly, ILG1 was not present in the original B cell populations. By probing genomic DNA, ILG1 could only be detected in the E. coli TOP10F' strain used in our laboratory for molecular cloning, but not in any of its precursor strains, including TOP10. Furthermore, bacteria cultured from the mouth of the laboratory worker who performed cDNA RDA were also positive for ILG1. CONCLUSIONS: In the course of our studies using cDNA RDA, we have isolated and identified ILG1, a likely active site-specific recombinase and new member of the bacteriophage P4 family of integrases. This family of integrases is implicated in the horizontal DNA transfer of pathogenic genes between bacterial species, such as those found in pathogenic strains of E. coli, Shigella, Yersinia, and Vibrio cholera. Using ILG1 as a marker of our laboratory E. coli strain TOP10F', our evidence suggests that contaminating bacterial DNA in our subtraction experiment is due to this laboratory bacterial strain, which colonized exposed surfaces of the laboratory worker. Thus, identification of differentially expressed genes between normal and diseased states could be dramatically improved by using extra precaution to prevent bacterial contamination of samples. PMID:12393938
Horizontal gene transfer is more frequent with increased heterotrophy and contributes to parasite adaptation

PubMed Central

Yang, Zhenzhen; Zhang, Yeting; Wafula, Eric K.; Honaas, Loren A.; Ralph, Paula E.; Jones, Sam; Clarke, Christopher R.; Liu, Siming; Su, Chun; Zhang, Huiting; Altman, Naomi S.; Schuster, Stephan C.; Timko, Michael P.; Yoder, John I.; dePamphilis, Claude W.

2016-01-01

Horizontal gene transfer (HGT) is the transfer of genetic material across species boundaries and has been a driving force in prokaryotic evolution. HGT involving eukaryotes appears to be much less frequent, and the functional implications of HGT in eukaryotes are poorly understood. We test the hypothesis that parasitic plants, because of their intimate feeding contacts with host plant tissues, are especially prone to horizontal gene acquisition. We sought evidence of HGTs in transcriptomes of three parasitic members of Orobanchaceae, a plant family containing species spanning the full spectrum of parasitic capabilities, plus the free-living Lindenbergia. Following initial phylogenetic detection and an extensive validation procedure, 52 high-confidence horizontal transfer events were detected, often from lineages of known host plants and with an increasing number of HGT events in species with the greatest parasitic dependence. Analyses of intron sequences in putative donor and recipient lineages provide evidence for integration of genomic fragments far more often than retro-processed RNA sequences. Purifying selection predominates in functionally transferred sequences, with a small fraction of adaptively evolving sites. HGT-acquired genes are preferentially expressed in the haustorium—the organ of parasitic plants—and are strongly biased in predicted gene functions, suggesting that expression products of horizontally acquired genes are contributing to the unique adaptive feeding structure of parasitic plants. PMID:27791104

Structure of Lmaj006129AAA, a hypothetical protein from Leishmania major

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arakaki, Tracy; Le Trong, Isolde; Structural Genomics of Pathogenic Protozoa

2006-03-01

The crystal structure of a conserved hypothetical protein from L. major, Pfam sequence family PF04543, structural genomics target ID Lmaj006129AAA, has been determined at a resolution of 1.6 Å. The gene product of structural genomics target Lmaj006129 from Leishmania major codes for a 164-residue protein of unknown function. When SeMet expression of the full-length gene product failed, several truncation variants were created with the aid of Ginzu, a domain-prediction method. 11 truncations were selected for expression, purification and crystallization based upon secondary-structure elements and disorder. The structure of one of these variants, Lmaj006129AAH, was solved by multiple-wavelength anomalous diffraction (MAD)more » using ELVES, an automatic protein crystal structure-determination system. This model was then successfully used as a molecular-replacement probe for the parent full-length target, Lmaj006129AAA. The final structure of Lmaj006129AAA was refined to an R value of 0.185 (R{sub free} = 0.229) at 1.60 Å resolution. Structure and sequence comparisons based on Lmaj006129AAA suggest that proteins belonging to Pfam sequence families PF04543 and PF01878 may share a common ligand-binding motif.« less
Targeting a Complex Transcriptome: The Construction of the Mouse Full-Length cDNA Encyclopedia

PubMed Central

Carninci, Piero; Waki, Kazunori; Shiraki, Toshiyuki; Konno, Hideaki; Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Arakawa, Takahiro; Ishii, Yoshiyuki; Sasaki, Daisuke; Bono, Hidemasa; Kondo, Shinji; Sugahara, Yuichi; Saito, Rintaro; Osato, Naoki; Fukuda, Shiro; Sato, Kenjiro; Watahiki, Akira; Hirozane-Kishikawa, Tomoko; Nakamura, Mari; Shibata, Yuko; Yasunishi, Ayako; Kikuchi, Noriko; Yoshiki, Atsushi; Kusakabe, Moriaki; Gustincich, Stefano; Beisel, Kirk; Pavan, William; Aidinis, Vassilis; Nakagawara, Akira; Held, William A.; Iwata, Hiroo; Kono, Tomohiro; Nakauchi, Hiromitsu; Lyons, Paul; Wells, Christine; Hume, David A.; Fagiolini, Michela; Hensch, Takao K.; Brinkmeier, Michelle; Camper, Sally; Hirota, Junji; Mombaerts, Peter; Muramatsu, Masami; Okazaki, Yasushi; Kawai, Jun; Hayashizaki, Yoshihide

2003-01-01

We report the construction of the mouse full-length cDNA encyclopedia,the most extensive view of a complex transcriptome,on the basis of preparing and sequencing 246 libraries. Before cloning,cDNAs were enriched in full-length by Cap-Trapper,and in most cases,aggressively subtracted/normalized. We have produced 1,442,236 successful 3′-end sequences clustered into 171,144 groups, from which 60,770 clones were fully sequenced cDNAs annotated in the FANTOM-2 annotation. We have also produced 547,149 5′ end reads,which clustered into 124,258 groups. Altogether, these cDNAs were further grouped in 70,000 transcriptional units (TU),which represent the best coverage of a transcriptome so far. By monitoring the extent of normalization/subtraction, we define the tentative equivalent coverage (TEC),which was estimated to be equivalent to >12,000,000 ESTs derived from standard libraries. High coverage explains discrepancies between the very large numbers of clusters (and TUs) of this project,which also include non-protein-coding RNAs,and the lower gene number estimation of genome annotations. Altogether,5′-end clusters identify regions that are potential promoters for 8637 known genes and 5′-end clusters suggest the presence of almost 63,000 transcriptional starting points. An estimate of the frequency of polyadenylation signals suggests that at least half of the singletons in the EST set represent real mRNAs. Clones accounting for about half of the predicted TUs await further sequencing. The continued high-discovery rate suggests that the task of transcriptome discovery is not yet complete. PMID:12819125
Complete mtDNA sequencing reveals mutations m.9185T>C and m.13513G>A in three patients with Leigh syndrome.

PubMed

Pelnena, Dita; Burnyte, Birute; Jankevics, Eriks; Lace, Baiba; Dagyte, Evelina; Grigalioniene, Kristina; Utkus, Algirdas; Krumina, Zita; Rozentale, Jolanta; Adomaitiene, Irina; Stavusis, Janis; Pliss, Liana; Inashkina, Inna

2017-12-12

The most common mitochondrial disorder in children is Leigh syndrome, which is a progressive and genetically heterogeneous neurodegenerative disorder caused by mutations in nuclear genes or mitochondrial DNA (mtDNA). In the present study, a novel and robust method of complete mtDNA sequencing, which allows amplification of the whole mitochondrial genome, was tested. Complete mtDNA sequencing was performed in a cohort of patients with suspected mitochondrial mutations. Patients from Latvia and Lithuania (n = 92 and n = 57, respectively) referred by clinical geneticists were included. The de novo point mutations m.9185T>C and m.13513G>A, respectively, were detected in two patients with lactic acidosis and neurodegenerative lesions. In one patient with neurodegenerative lesions, the mutation m.9185T>C was identified. These mutations are associated with Leigh syndrome. The present data suggest that full-length mtDNA sequencing is recommended as a supplement to nuclear gene testing and enzymatic assays to enhance mitochondrial disease diagnostics.
Horizontal gene transfer of the algal nuclear gene psbO to the photosynthetic sea slug Elysia chlorotica

PubMed Central

Rumpho, Mary E.; Worful, Jared M.; Lee, Jungho; Kannan, Krishna; Tyler, Mary S.; Bhattacharya, Debashish; Moustafa, Ahmed; Manhart, James R.

2008-01-01

The sea slug Elysia chlorotica acquires plastids by ingestion of its algal food source Vaucheria litorea. Organelles are sequestered in the mollusc's digestive epithelium, where they photosynthesize for months in the absence of algal nucleocytoplasm. This is perplexing because plastid metabolism depends on the nuclear genome for >90% of the needed proteins. Two possible explanations for the persistence of photosynthesis in the sea slug are (i) the ability of V. litorea plastids to retain genetic autonomy and/or (ii) more likely, the mollusc provides the essential plastid proteins. Under the latter scenario, genes supporting photosynthesis have been acquired by the animal via horizontal gene transfer and the encoded proteins are retargeted to the plastid. We sequenced the plastid genome and confirmed that it lacks the full complement of genes required for photosynthesis. In support of the second scenario, we demonstrated that a nuclear gene of oxygenic photosynthesis, psbO, is expressed in the sea slug and has integrated into the germline. The source of psbO in the sea slug is V. litorea because this sequence is identical from the predator and prey genomes. Evidence that the transferred gene has integrated into sea slug nuclear DNA comes from the finding of a highly diverged psbO 3′ flanking sequence in the algal and mollusc nuclear homologues and gene absence from the mitochondrial genome of E. chlorotica. We demonstrate that foreign organelle retention generates metabolic novelty (“green animals”) and is explained by anastomosis of distinct branches of the tree of life driven by predation and horizontal gene transfer. PMID:19004808
Avian influenza virus and Newcastle disease virus (NDV) surveillance in commercial breeding farm in China and the characterization of Class I NDV isolates.

PubMed

Hu, Beixia; Huang, Yanyan; He, Yefeng; Xu, Chuantian; Lu, Xishan; Zhang, Wei; Meng, Bin; Yan, Shigan; Zhang, Xiumei

2010-07-29

In order to determine the actual prevalence of avian influenza virus (AIV) and Newcastle disease virus (NDV) in ducks in Shandong province of China, extensive surveillance studies were carried out in the breeding ducks of an intensive farm from July 2007 to September 2008. Each month cloacal and tracheal swabs were taken from 30 randomly selected birds that appeared healthy. All of the swabs were negative for influenza A virus recovery, whereas 87.5% of tracheal swabs and 100% cloacal swabs collected in September 2007, were positive for Newcastle disease virus isolation. Several NDV isolates were recovered from tracheal and cloacal swabs of apparently healthy ducks. All of the isolates were apathogenic as determined by the MDT and ICPI. The HN gene and the variable region of F gene (nt 47-420) of four isolates selected at random were sequenced. A 374 bp region of F gene and the full length of HN gene were used for phylogenetic analysis. Four isolates were identified as the same isolate based on nucleotide sequences identities of 99.2-100%, displaying a closer phylogenetic relationship to lentogenic Class I viruses. There were 1.9-9.9% nucleotide differences between the isolates and other Class I virus in the variable region of F gene (nt 47-420), whereas there were 38.5-41.2% nucleotide difference between the isolates and Class II viruses. The amino acid sequences of the F protein cleavage sites in these isolates were 112-ERQERL-117. The full length of HN gene of these isolates was 1851 bp, coding 585 amino acids. The homology analysis of the nucleotide sequence of HN gene indicated that there were 2.0-4.2% nucleotide differences between the isolates and other Class I viruses, whereas there were 29.5-40.9% differences between the isolates and Class II viruses. The results shows that these isolates are not phylogenetically related to the vaccine strain (LaSota). This study adds to the understanding of the ecology of influenza viruses and Newcastle disease viruses in ducks and emphasizes the need for constant surveillance in times of an ongoing and expanding epidemic of AIV and NDV. Copyright (c) 2010 Elsevier B.V. All rights reserved.
HLA-E regulatory and coding region variability and haplotypes in a Brazilian population sample.

PubMed

Ramalho, Jaqueline; Veiga-Castelli, Luciana C; Donadi, Eduardo A; Mendes-Junior, Celso T; Castelli, Erick C

2017-11-01

The HLA-E gene is characterized by low but wide expression on different tissues. HLA-E is considered a conserved gene, being one of the least polymorphic class I HLA genes. The HLA-E molecule interacts with Natural Killer cell receptors and T lymphocytes receptors, and might activate or inhibit immune responses depending on the peptide associated with HLA-E and with which receptors HLA-E interacts to. Variable sites within the HLA-E regulatory and coding segments may influence the gene function by modifying its expression pattern or encoded molecule, thus, influencing its interaction with receptors and the peptide. Here we propose an approach to evaluate the gene structure, haplotype pattern and the complete HLA-E variability, including regulatory (promoter and 3'UTR) and coding segments (with introns), by using massively parallel sequencing. We investigated the variability of 420 samples from a very admixed population such as Brazilians by using this approach. Considering a segment of about 7kb, 63 variable sites were detected, arranged into 75 extended haplotypes. We detected 37 different promoter sequences (but few frequent ones), 27 different coding sequences (15 representing new HLA-E alleles) and 12 haplotypes at the 3'UTR segment, two of them presenting a summed frequency of 90%. Despite the number of coding alleles, they encode mainly two different full-length molecules, known as E*01:01 and E*01:03, which corresponds to about 90% of all. In addition, differently from what has been previously observed for other non classical HLA genes, the relationship among the HLA-E promoter, coding and 3'UTR haplotypes is not straightforward because the same promoter and 3'UTR haplotypes were many times associated with different HLA-E coding haplotypes. This data reinforces the presence of only two main full-length HLA-E molecules encoded by the many HLA-E alleles detected in our population sample. In addition, this data does indicate that the distal HLA-E promoter is by far the most variable segment. Further analyses involving the binding of transcription factors and non-coding RNAs, as well as the HLA-E expression in different tissues, are necessary to evaluate whether these variable sites at regulatory segments (or even at the coding sequence) may influence the gene expression profile. Copyright © 2017 Elsevier Ltd. All rights reserved.
Integrating alternative splicing detection into gene prediction.

PubMed

Foissac, Sylvain; Schiex, Thomas

2005-02-10

Alternative splicing (AS) is now considered as a major actor in transcriptome/proteome diversity and it cannot be neglected in the annotation process of a new genome. Despite considerable progresses in term of accuracy in computational gene prediction, the ability to reliably predict AS variants when there is local experimental evidence of it remains an open challenge for gene finders. We have used a new integrative approach that allows to incorporate AS detection into ab initio gene prediction. This method relies on the analysis of genomically aligned transcript sequences (ESTs and/or cDNAs), and has been implemented in the dynamic programming algorithm of the graph-based gene finder EuGENE. Given a genomic sequence and a set of aligned transcripts, this new version identifies the set of transcripts carrying evidence of alternative splicing events, and provides, in addition to the classical optimal gene prediction, alternative optimal predictions (among those which are consistent with the AS events detected). This allows for multiple annotations of a single gene in a way such that each predicted variant is supported by a transcript evidence (but not necessarily with a full-length coverage). This automatic combination of experimental data analysis and ab initio gene finding offers an ideal integration of alternatively spliced gene prediction inside a single annotation pipeline.
Molecular Cloning and Characterization of Novel Morus alba Germin-Like Protein Gene Which Encodes for a Silkworm Gut Digestion-Resistant Antimicrobial Protein

PubMed Central

Patnaik, Bharat Bhusan; Kim, Dong Hyun; Oh, Seung Han; Song, Yong-Su; Chanh, Nguyen Dang Minh; Kim, Jong Sun; Jung, Woo-jin; Saha, Atul Kumar; Bindroo, Bharat Bhushan; Han, Yeon Soo

2012-01-01

Background Silkworm fecal matter is considered one of the richest sources of antimicrobial and antiviral protein (substances) and such economically feasible and eco-friendly proteins acting as secondary metabolites from the insect system can be explored for their practical utility in conferring broad spectrum disease resistance against pathogenic microbial specimens. Methodology/Principal Findings Silkworm fecal matter extracts prepared in 0.02 M phosphate buffer saline (pH 7.4), at a temperature of 60°C was subjected to 40% saturated ammonium sulphate precipitation and purified by gel-filtration chromatography (GFC). SDS-PAGE under denaturing conditions showed a single band at about 21.5 kDa. The peak fraction, thus obtained by GFC wastested for homogeneityusing C18reverse-phase high performance liquid chromatography (HPLC). The activity of the purified protein was tested against selected Gram +/− bacteria and phytopathogenic Fusarium species with concentration-dependent inhibitionrelationship. The purified bioactive protein was subjected to matrix-assisted laser desorption and ionization-time of flight mass spectrometry (MALDI-TOF-MS) and N-terminal sequencing by Edman degradation towards its identification. The N-terminal first 18 amino acid sequence following the predicted signal peptide showed homology to plant germin-like proteins (Glp). In order to characterize the full-length gene sequence in detail, the partial cDNA was cloned and sequenced using degenerate primers, followed by 5′- and 3′-rapid amplification of cDNA ends (RACE-PCR). The full-length cDNA sequence composed of 630 bp encoding 209 amino acids and corresponded to germin-like proteins (Glps) involved in plant development and defense. Conclusions/Significance The study reports, characterization of novel Glpbelonging to subfamily 3 from M. alba by the purification of mature active protein from silkworm fecal matter. The N-terminal amino acid sequence of the purified protein was found similar to the deduced amino acid sequence (without the transit peptide sequence) of the full length cDNA from M. alba. PMID:23284650
Genomic organization of the neurofibromatosis 1 gene (NF1)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Y.; O`Connell, P.; Huntsman Breidenbach, H.

Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 locus has been partially characterized. Even though the full-length NF1 cDNA has been sequenced, the complete genomic structure of the NF1 gene has not been elucidated. The 5{prime} end of NF1 is embedded in a CpG island containing a NotI restriction site, and the remainder of the gene lies in the adjacent 350-kb NotI fragment. In our efforts to develop a comprehensive screen for NF1 mutations, we have isolated genomic DNA clones that together harbor the entire NF1 cDNA sequence. We have identified all intron-exon boundaries of the coding regionmore » and established that it is composed of 59 exons. Furthermore, we have defined the 3{prime}-untranslated region (3{prime}-UTR) of the NF1 gene; it spans approximately 3.5 kb of genomic DNA sequence and is continuous with the stop codon. Oligonucleotide primer pairs synthesized from exon-flanking DNA sequences were used in the polymerase chain reaction with cloned, chromosome 17-specific genomic DNA as template to amplify NF1 exons 1 through 27b and the exon containing the 3{prime}-UTR separately. This information should be useful for implementing a comprehensive NF1 mutation screen using genomic DNA as template. 41 refs., 3 figs., 2 tabs.« less
Plant Genome Resources at the National Center for Biotechnology Information

PubMed Central

Wheeler, David L.; Smith-White, Brian; Chetvernin, Vyacheslav; Resenchuk, Sergei; Dombrowski, Susan M.; Pechous, Steven W.; Tatusova, Tatiana; Ostell, James

2005-01-01

The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allow maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI. PMID:16010002
RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken and allows comprehensive annotation of the W-chromosome

PubMed Central

2013-01-01

Background Birds have a ZZ male: ZW female sex chromosome system and while the Z-linked DMRT1 gene is necessary for testis development, the exact mechanism of sex determination in birds remains unsolved. This is partly due to the poor annotation of the W chromosome, which is speculated to carry a female determinant. Few genes have been mapped to the W and little is known of their expression. Results We used RNA-seq to produce a comprehensive profile of gene expression in chicken blastoderms and embryonic gonads prior to sexual differentiation. We found robust sexually dimorphic gene expression in both tissues pre-dating gonadogenesis, including sex-linked and autosomal genes. This supports the hypothesis that sexual differentiation at the molecular level is at least partly cell autonomous in birds. Different sets of genes were sexually dimorphic in the two tissues, indicating that molecular sexual differentiation is tissue specific. Further analyses allowed the assembly of full-length transcripts for 26 W chromosome genes, providing a view of the W transcriptome in embryonic tissues. This is the first extensive analysis of W-linked genes and their expression profiles in early avian embryos. Conclusion Sexual differentiation at the molecular level is established in chicken early in embryogenesis, before gonadal sex differentiation. We find that the W chromosome is more transcriptionally active than previously thought, expand the number of known genes to 26 and present complete coding sequences for these W genes. This includes two novel W-linked sequences and three small RNAs reassigned to the W from the Un_Random chromosome. PMID:23531366
Gene Discovery through Transcriptome Sequencing for the Invasive Mussel Limnoperna fortunei

PubMed Central

Uliano-Silva, Marcela; Americo, Juliana Alves; Brindeiro, Rodrigo; Dondero, Francesco; Prosdocimi, Francisco; de Freitas Rebelo, Mauro

2014-01-01

The success of the Asian bivalve Limnoperna fortunei as an invader in South America is related to its high acclimation capability. It can inhabit waters with a wide range of temperatures and salinity and handle long-term periods of air exposure. We describe the transcriptome of L. fortunei aiming to give a first insight into the phenotypic plasticity that allows non-native taxa to become established and widespread. We sequenced 95,219 reads from five main tissues of the mussel L. fortunei using Roche’s 454 and assembled them to form a set of 84,063 unigenes (contigs and singletons) representing partial or complete gene sequences. We annotated 24,816 unigenes using a BLAST sequence similarity search against a NCBI nr database. Unigenes were divided into 20 eggNOG functional categories and 292 KEGG metabolic pathways. From the total unigenes, 1,351 represented putative full-length genes of which 73.2% were functionally annotated. We described the first partial and complete gene sequences in order to start understanding bivalve invasiveness. An expansion of the hsp70 gene family, seen also in other bivalves, is present in L. fortunei and could be involved in its adaptation to extreme environments, e.g. during intertidal periods. The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. Finally, the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach. PMID:25047650
Cell and molecular biology of the spiny dogfish Squalus acanthias and little skate Leucoraja erinacea: insights from in vitro cultured cells.

PubMed

Barnes, D W

2012-04-01

Two of the most commonly used elasmobranch experimental model species are the spiny dogfish Squalus acanthias and the little skate Leucoraja erinacea. Comparative biology and genomics with these species have provided useful information in physiology, pharmacology, toxicology, immunology, evolutionary developmental biology and genetics. A wealth of information has been obtained using in vitro approaches to study isolated cells and tissues from these organisms under circumstances in which the extracellular environment can be controlled. In addition to classical work with primary cell cultures, continuously proliferating cell lines have been derived recently, representing the first cell lines from cartilaginous fishes. These lines have proved to be valuable tools with which to explore functional genomic and biological questions and to test hypotheses at the molecular level. In genomic experiments, complementary (c)DNA libraries have been constructed, and c. 8000 unique transcripts identified, with over 3000 representing previously unknown gene sequences. A sub-set of messenger (m)RNAs has been detected for which the 3' untranslated regions show elements that are remarkably well conserved evolutionarily, representing novel, potentially regulatory gene sequences. The cell culture systems provide physiologically valid tools to study functional roles of these sequences and other aspects of elasmobranch molecular cell biology and physiology. Information derived from the use of in vitro cell cultures is valuable in revealing gene diversity and information for genomic sequence assembly, as well as for identification of new genes and molecular markers, construction of gene-array probes and acquisition of full-length cDNA sequences. © 2012 The Author. Journal of Fish Biology © 2012 The Fisheries Society of the British Isles.
Deep mRNA Sequencing of the Tritonia diomedea Brain Transcriptome Provides Access to Gene Homologues for Neuronal Excitability, Synaptic Transmission and Peptidergic Signalling

PubMed Central

Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.

2015-01-01

Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

PubMed Central

Boerner, Susan; McGinnis, Karen M.

2012-01-01

Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
'Candidatus Phytoplasma palmicola', associated with a lethal yellowing-type disease of coconut (Cocos nucifera L.) in Mozambique.

PubMed

Harrison, Nigel A; Davis, Robert E; Oropeza, Carlos; Helmick, Ericka E; Narváez, María; Eden-Green, Simon; Dollet, Michel; Dickinson, Matthew

2014-06-01

In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise similarity values based on alignment of nearly full-length 16S rRNA gene sequences (1530 bp) revealed that the Mozambique coconut phytoplasma (LYDM) shared 100% identity with a comparable sequence derived from a phytoplasma strain (LDN) responsible for Awka wilt disease of coconut in Nigeria, and shared 99.0-99.6% identity with 16S rRNA gene sequences from strains associated with Cape St Paul wilt (CSPW) disease of coconut in Ghana and Côte d'Ivoire. Similarity scores further determined that the 16S rRNA gene of the LYDM phytoplasma shared <97.5% sequence identity with all previously described members of 'Candidatus Phytoplasma'. The presence of unique regions in the 16S rRNA gene sequence distinguished the LYDM phytoplasma from all currently described members of 'Candidatus Phytoplasma', justifying its recognition as the reference strain of a novel taxon, 'Candidatus Phytoplasma palmicola'. Virtual RFLP profiles of the F2n/R2 portion (1251 bp) of the 16S rRNA gene and pattern similarity coefficients delineated coconut LYDM phytoplasma strains from Mozambique as novel members of established group 16SrXXII, subgroup A (16SrXXII-A). Similarity coefficients of 0.97 were obtained for comparisons between subgroup 16SrXXII-A strains and CSPW phytoplasmas from Ghana and Côte d'Ivoire. On this basis, the CSPW phytoplasma strains were designated members of a novel subgroup, 16SrXXII-B.
Znrg, a novel gene expressed mainly in the developing notochord of zebrafish.

PubMed

Zhou, Yaping; Xu, Yan; Li, Jianzhen; Liu, Yao; Zhang, Zhe; Deng, Fengjiao

2010-06-01

The notochord, a defining characteristic of the chordate embryo is a critical midline structure required for axial skeletal formation in vertebrates, and acts as a signaling center throughout embryonic development. We utilized the digital differential display program of the National Center for Biotechnology Information, and identified a contig of expressed sequence tags (no. Dr. 83747) from the zebrafish ovary library in Genbank. Full-length cDNA of the identified gene was cloned by 5'- and 3'- RACE, and the resulting sequence was confirmed by polymerase chain reaction and sequencing. The cDNA clone contains 2,505 base pairs and encodes a novel protein of 707 amino acids that shares no significant homology with any known proteins. This gene was expressed in mature oocytes and at the one-cell stage, and persisted until the 5th day of development, as determined by RT-PCR. Transcripts were detected by whole-mount RNA in situ hybridization from the two-cell stage to 72 h of embryonic development. This gene was uniformly distributed from the cleavage stage up to the blastula stage. During early gastrulation, it was present in the dorsal region, and became restricted to the notochord and pectoral fin at 48 and 72 h of embryonic development. Based on its abundance in the notochord, we hypothesized that the novel gene may play an important role in notochord development in zebrafish; we named this gene, zebrafish notochord-related gene, or znrg.
Molecular characterization of classical and nonclassical MHC class I genes from the golden pheasant (Chrysolophus pictus).

PubMed

Zeng, Q-Q; Zhong, G-H; He, K; Sun, D-D; Wan, Q-H

2016-02-01

Classical major histocompatibility complex (MHC) class I allelic polymorphism is essential for competent antigen presentation. To improve the genotyping efforts in the golden pheasant, it is necessary to differentiate more accurately between classical and nonclassical class I molecules. In our study, all MHC class I genes were isolated from one golden pheasant based on two overlapping PCR amplifications. In total, six full-length class I nucleotide sequences (A-F) were identified, and four were novel. Two (A and C) belonged to the IA1 gene, two (B and D) were alleles derived from the IA2 gene through transgene amplification, and two (E and F) comprised a third novel locus, IA3 that was excluded from the core region of the golden pheasant MHC-B. IA1 and IA2 exhibited the broad expression profiles characteristic of classical loci, while IA3 showed no expression in multiple tissues and was therefore defined as a nonclassical gene. Phylogenetic analysis indicated that the three IA genes in the golden pheasant share a much closer evolutionary relationship than the corresponding sequences in other galliform species. This observation was consistent with high sequence similarity among them, which likely arises from the homogenizing effect of recombination. Our careful distinction between the classical and nonclassical MHC class I genes in the golden pheasant lays the foundation for developing locus-specific genotyping and establishing a good molecular marker system of classical MHC I loci. © 2015 John Wiley & Sons Ltd.
Full-length Transcriptome Sequencing and Modular Organization Analysis of Naringin/Neoeriocitrin Related Gene Expression Pattern in Drynaria roosii.

PubMed

Sun, Mei-Yu; Li, Jing-Yi; Li, Dong; Huang, Feng-Jie; Wang, Di; Li, Hui; Xing, Quan; Zhu, Hui-Bin; Shi, Lei

2018-04-12

Drynaria roosii (Nakaike) is a traditional Chinese medicinal fern, known as 'GuSuiBu'. The corresponding effective components of naringin/neoeriocitrin share highly similar chemical structure and medicinal function. Our HPLC-MS/MS results showed that the accumulation of naringin/neoeriocitrin depended on specific tissues or ages. However, little was known about the expression patterns of naringin/neoeriocitrin related genes involved in their regulatory pathways. For lack of the basic genetic information, we applied a combination of SMRT sequencing and SGS to generate the complete and full-length transcriptome of D. roosii. According to the SGS data, the DEG-based heat map analysis revealed the naringin/neoeriocitrin related gene expression exhibited obvious tissue- and time-specific transcriptomic differences. Using the systems biology method of modular organization analysis, we clustered 16,472 DEGs into 17 gene modules and studied the relationships between modules and tissue/time point samples, as well as modules and naringin/neoeriocitrin contents. Hereinto, naringin/neoeriocitrin related DEGs distributed in nine distinct modules, and DEGs in these modules showed significant different patterns of transcript abundance to be linked with specific tissues or ages. Moreover, WGCNA results further identified that PAL, 4CL, C4H and C3H, HCT acted as the major hub genes involved in naringin and neoeriocitrin synthesis respectively and exhibited high co-expression with MYB- and bHLH-regulated genes. In this work, modular organization and co-expression networks elucidated the tissue- and time-specificity of gene expression pattern, as well as hub genes associated with naringin/neoeriocitrin synthesis in D. roosii. Simultaneously, the comprehensive transcriptome dataset provided the important genetic information for further research on D. roosii.
Cloning of TPS gene from eelgrass species Zostera marina and its functional identification by genetic transformation in rice.

PubMed

Zhao, Feng; Li, Qiuying; Weng, Manli; Wang, Xiuliang; Guo, Baotai; Wang, Li; Wang, Wei; Duan, Delin; Wang, Bin

2013-12-01

The full-length cDNA sequence (2613 bp) of the trehalose-6-phosphate synthase (TPS) gene of eelgrass Zostera marina (ZmTPS) was identified and cloned. Z. marina is a kind of seed-plant growing in sea water during its whole life history. The open reading frame (ORF) region of ZmTPS gene encodes a protein of 870 amino acid residues and a stop codon. The corresponding genomic DNA sequence is 3770 bp in length, which contains 3 exons and 2 introns. The ZmTPS gene was transformed into rice variety ZH11 via Agrobacterium-mediated transformation method. After antibiotic screening, molecular characterization, salt-tolerance and trehalose content determinations, two transgenic lines resistant to 150 mM NaCL solutions were screened. Our study results indicated that the ZmTPS gene was integrated into the genomic DNA of the two transgenic rice lines and could be expressed well. Moreover, the detection of the transformed ZmTPS gene in the progenies of the two transgenic lines was performed from T1 to T4 generations; and results suggested that the transformed ZmTPS gene can be transmitted from parent to the progeny in transgenic rice. © 2013.

Identification a nonsense mutation of APC gene in Chinese patients with familial adenomatous polyposis.

PubMed

Li, Haishan; Zhang, Lingling; Jiang, Quan; Shi, Zhenwang; Tong, Hanxing

2017-04-01

Familial adenomatous polyposis (FAP; Mendelian of Inherintance in Man ID, 175100) is a rare autosomal dominant disorder characterized by the development of numerous adenomatous polyps throughout the colon and rectum associated with an increased risk of colorectal cancer. FAP is at time accompanied with certain extraintestinal manifestations such as congenital hypertrophy of the retinal pigment epithelium, dental disorders and desmoid tumors. It is caused by mutations in the adenomatous polyposis coli ( APC ) gene. The present study reported on a Chinese family with FAP. Polymerase chain reaction and direct sequencing of the full coding sequence of the APC gene were performed to identify the mutation in this family. A nonsense mutation of the APC gene was identified in this pedigree. It is a heterozygous G>T substitution at position 2,971 in exon 15 of the APC gene, which formed a premature stop codon at amino acid residue 991 (p.Glu991*). The resulting truncated protein lacked 1,853 amino acids. The present study expanded the database on APC gene mutations in FAP and enriched the spectrum of known germline mutations of the APC gene. Prophylactic proctocolectomy may be considered as a possible treatment for carriers of the mutation.
Next Generation Sequencing of Actinobacteria for the Discovery of Novel Natural Products

PubMed Central

Gomez-Escribano, Juan Pablo; Alt, Silke; Bibb, Mervyn J.

2016-01-01

Like many fields of the biosciences, actinomycete natural products research has been revolutionised by next-generation DNA sequencing (NGS). Hundreds of new genome sequences from actinobacteria are made public every year, many of them as a result of projects aimed at identifying new natural products and their biosynthetic pathways through genome mining. Advances in these technologies in the last five years have meant not only a reduction in the cost of whole genome sequencing, but also a substantial increase in the quality of the data, having moved from obtaining a draft genome sequence comprised of several hundred short contigs, sometimes of doubtful reliability, to the possibility of obtaining an almost complete and accurate chromosome sequence in a single contig, allowing a detailed study of gene clusters and the design of strategies for refactoring and full gene cluster synthesis. The impact that these technologies are having in the discovery and study of natural products from actinobacteria, including those from the marine environment, is only starting to be realised. In this review we provide a historical perspective of the field, analyse the strengths and limitations of the most relevant technologies, and share the insights acquired during our genome mining projects. PMID:27089350
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing

PubMed Central

2011-01-01

Background Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. Results From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. Conclusion The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition. PMID:21492485
De novo assembly and transcriptome analysis of five major tissues of Jatropha curcas L. using GS FLX titanium platform of 454 pyrosequencing.

PubMed

Natarajan, Purushothaman; Parani, Madasamy

2011-04-15

Jatropha curcas L. is an important non-edible oilseed crop with promising future in biodiesel production. However, factors like oil yield, oil composition, toxic compounds in oil cake, pests and diseases limit its commercial potential. Well established genetic engineering methods using cloned genes could be used to address these limitations. Earlier, 10,983 unigenes from Sanger sequencing of ESTs, and 3,484 unique assembled transcripts from 454 pyrosequencing of uncloned cDNAs were reported. In order to expedite the process of gene discovery, we have undertaken 454 pyrosequencing of normalized cDNAs prepared from roots, mature leaves, flowers, developing seeds, and embryos of J. curcas. From 383,918 raw reads, we obtained 381,957 quality-filtered and trimmed reads that are suitable for the assembly of transcript sequences. De novo contig assembly of these reads generated 17,457 assembled transcripts (contigs) and 54,002 singletons. Average length of the assembled transcripts was 916 bp. About 30% of the transcripts were longer than 1000 bases, and the size of the longest transcript was 7,173 bases. BLASTX analysis revealed that 2,589 of these transcripts are full-length. The assembled transcripts were validated by RT-PCR analysis of 28 transcripts. The results showed that the transcripts were correctly assembled and represent actively expressed genes. KEGG pathway mapping showed that 2,320 transcripts are related to major biochemical pathways including the oil biosynthesis pathway. Overall, the current study reports 14,327 new assembled transcripts which included 2589 full-length transcripts and 27 transcripts that are directly involved in oil biosynthesis. The large number of transcripts reported in the current study together with existing ESTs and transcript sequences will serve as an invaluable genetic resource for crop improvement in jatropha. Sequence information of those genes that are involved in oil biosynthesis could be used for metabolic engineering of jatropha to increase oil content, and to modify oil composition.
Characterization of a Nonclassical Class I MHC Gene in a Reptile, the Galápagos Marine Iguana (Amblyrhynchus cristatus)

PubMed Central

Glaberman, Scott; Du Pasquier, Louis; Caccone, Adalgisa

2008-01-01

Squamates are a diverse order of vertebrates, representing more than 7,000 species. Yet, descriptions of full-length major histocompatibility complex (MHC) genes in this group are nearly absent from the literature, while the number of MHC studies continues to rise in other vertebrate taxa. The lack of basic information about MHC organization in squamates inhibits investigation into the relationship between MHC polymorphism and disease, and leaves a large taxonomic gap in our understanding of amniote MHC evolution. Here, we use both cDNA and genomic sequence data to characterize a class I MHC gene (Amcr-UA) from the Galápagos marine iguana, a member of the squamate subfamily Iguaninae. Amcr-UA appears to be functional since it is expressed in the blood and contains many of the conserved peptide-binding residues that are found in classical class I genes of other vertebrates. In addition, comparison of Amcr-UA to homologous sequences from other iguanine species shows that the antigen-binding portion of this gene is under purifying selection, rather than balancing selection, and therefore may have a conserved function. A striking feature of Amcr-UA is that both the cDNA and genomic sequences lack the transmembrane and cytoplasmic domains that are necessary to anchor the class I receptor molecule into the cell membrane, suggesting that the product of this gene is secreted and consequently not involved in classical class I antigen-presentation. The truncated and conserved character of Amcr-UA lead us to define it as a nonclassical gene that is related to the few available squamate class I sequences. However, phylogenetic analysis placed Amcr-UA in a basal position relative to other published classical MHC genes from squamates, suggesting that this gene diverged near the beginning of squamate diversification. PMID:18682845
Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India.

PubMed

Rekadwad, Bhagwan N; Khobragade, Chandrahasya N

2016-06-01

Microbiologists are routinely engaged isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from NCBI repository and generated QR codes for sequences (FASTA format and full Gene Bank information). 16SrRNA were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crator Lake (19° 58' N; 76° 31' E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. This generated digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory efforts and avoid misinterpretation of the species.
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.

PubMed

Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew

2012-12-20

The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
Alternative Splicing as a Target for Cancer Treatment.

PubMed

Martinez-Montiel, Nancy; Rosas-Murrieta, Nora Hilda; Anaya Ruiz, Maricruz; Monjaraz-Guzman, Eduardo; Martinez-Contreras, Rebeca

2018-02-11

Alternative splicing is a key mechanism determinant for gene expression in metazoan. During alternative splicing, non-coding sequences are removed to generate different mature messenger RNAs due to a combination of sequence elements and cellular factors that contribute to splicing regulation. A different combination of splicing sites, exonic or intronic sequences, mutually exclusive exons or retained introns could be selected during alternative splicing to generate different mature mRNAs that could in turn produce distinct protein products. Alternative splicing is the main source of protein diversity responsible for 90% of human gene expression, and it has recently become a hallmark for cancer with a full potential as a prognostic and therapeutic tool. Currently, more than 15,000 alternative splicing events have been associated to different aspects of cancer biology, including cell proliferation and invasion, apoptosis resistance and susceptibility to different chemotherapeutic drugs. Here, we present well established and newly discovered splicing events that occur in different cancer-related genes, their modification by several approaches and the current status of key tools developed to target alternative splicing with diagnostic and therapeutic purposes.
COMPLETE GENOMIC SEQUENCE OF VIRULENT PIGEON PARAMYXOVIRUS IN LAUGHING DOVES (STREPTOPELIA SENEGALENSIS) IN KENYA.

PubMed

Obanda, Vincent; Michuki, George; Jowers, Michael J; Rumberia, Cecilia; Mutinda, Mathew; Lwande, Olivia Wesula; Wangoru, Kihara; Kasiiti-Orengo, Jacquiline; Yongo, Moses; Angelone-Alasaad, Samer

2016-07-01

Following mass deaths of Laughing Doves (Streptopelia senegalensis) in different localities throughout Kenya, internal organs obtained during necropsy of two moribund birds were sampled and analyzed by next generation sequencing. We isolated the virulent strain of pigeon paramyxovirus type-1 (PPMV-1), PPMV1/Laughing Dove/Kenya/Isiolo/B2/2012, which had a characteristic fusion gene motif (110)GGRRQKRF(117). We obtained a partial full genome of 15,114 nucleotides. The phylogenetic relationship based on the fusion gene and genomic sequence grouped our isolate as class II genotype VI, a group of viruses commonly isolated from wild birds but potentially lethal to Chickens ( Gallus gallus domesticus ). The fusion gene isolate clustered with PPMV-I strains from pigeons (Columbidae) in Nigeria. The complete genome showed a basal and highly divergent lineage to American, European, and Asian strains, indicating a divergent evolutionary pathway. The isolated strain is highly virulent and apparently species-specific to Laughing Doves in Kenya. Risk of transmission of such a strain to poultry is potentially high whereas the cyclic epizootic in doves is a threat to conservation of wild Columbidae in Kenya.
Saponin Biosynthesis in Saponaria vaccaria. cDNAs Encoding β-Amyrin Synthase and a Triterpene Carboxylic Acid Glucosyltransferase1[OA

PubMed Central

Meesapyodsuk, Dauenpen; Balsevich, John; Reed, Darwin W.; Covello, Patrick S.

2007-01-01

Saponaria vaccaria (Caryophyllaceae), a soapwort, known in western Canada as cowcockle, contains bioactive oleanane-type saponins similar to those found in soapbark tree (Quillaja saponaria; Rosaceae). To improve our understanding of the biosynthesis of these saponins, a combined polymerase chain reaction and expressed sequence tag approach was taken to identify the genes involved. A cDNA encoding a β-amyrin synthase (SvBS) was isolated by reverse transcription-polymerase chain reaction and characterized by expression in yeast (Saccharomyces cerevisiae). The SvBS gene is predominantly expressed in leaves. A S. vaccaria developing seed expressed sequence tag collection was developed and used for the isolation of a full-length cDNA bearing sequence similarity to ester-forming glycosyltransferases. The gene product of the cDNA, classified as UGT74M1, was expressed in Escherichia coli, purified, and identified as a triterpene carboxylic acid glucosyltransferase. UGT74M1 is expressed in roots and leaves and appears to be involved in monodesmoside biosynthesis in S. vaccaria. PMID:17172290
A novel flavivirus detected in two Aedes spp. collected near the demilitarized zone of the Republic of Korea.

PubMed

Korkusol, Achareeya; Takhampunya, Ratree; Hang, Jun; Jarman, Richard G; Tippayachai, Bousaraporn; Kim, Heung-Chul; Chong, Sung-Tae; Davidson, Silas A; Klein, Terry A

2017-05-01

Flaviviruses comprise a large and diverse group of positive-stranded RNA viruses, including tick-, mosquito- and unknown-vector-borne flaviviruses. A novel flavivirus was detected in pools of Aedes vexans nipponii (n=1) and Aedes esoensis (n=3) collected in 2012 and 2013 near the demilitarized zone (DMZ), Republic of Korea (ROK). Phylogenetic analyses of the NS5, E gene and complete polyprotein coding sequence (CDS) showed that the novel virus fell within the Aedes-borne flaviviruses (ABFVs), with nucleotide identity ranging from 57.8-75.1 %, 46.1-74.2 % and 51.1-76.2 %, respectively. While the novel ABFV was distant from other flaviviruses within the group, it formed a clade with Ilomantsi virus (ILOV). Sequence alignments of the partial NS5 gene, full-length E gene and polyprotein CDS between the novel virus and ILOV showed approximately 76.2 % nucleotide identity and 90 % amino acid identity, respectively. The ABFV identified in Aedes mosquitoes from the ROK is a novel ABFV based on the sequence analyses and is designated as Panmunjeom flavivirus (PANFV).
Gene length as a biological timer to establish temporal transcriptional regulation

PubMed Central

Kirkconnell, Killeen S.; Magnuson, Brian; Paulsen, Michelle T.; Lu, Brian; Bedi, Karan; Ljungman, Mats

2017-01-01

ABSTRACT Transcriptional timing is inherently influenced by gene length, thus providing a mechanism for temporal regulation of gene expression. While gene size has been shown to be important for the expression timing of specific genes during early development, whether it plays a role in the timing of other global gene expression programs has not been extensively explored. Here, we investigate the role of gene length during the early transcriptional response of human fibroblasts to serum stimulation. Using the nascent sequencing techniques Bru-seq and BruUV-seq, we identified immediate genome-wide transcriptional changes following serum stimulation that were linked to rapid activation of enhancer elements. We identified 873 significantly induced and 209 significantly repressed genes. Variations in gene size allowed for a large group of genes to be simultaneously activated but produce full-length RNAs at different times. The median length of the group of serum-induced genes was significantly larger than the median length of all expressed genes, housekeeping genes, and serum-repressed genes. These gene length relationships were also observed in corresponding mouse orthologs, suggesting that relative gene size is evolutionarily conserved. The sizes of transcription factor and microRNA genes immediately induced after serum stimulation varied dramatically, setting up a cascade mechanism for temporal expression arising from a single activation event. The retention and expansion of large intronic sequences during evolution have likely played important roles in fine-tuning the temporal expression of target genes in various cellular response programs. PMID:28055303
Cloning and Expression Analysis of the Bombyx mori α-amylase Gene (Amy) from the Indigenous Thai Silkworm Strain, Nanglai

PubMed Central

Ngernyuang, Nipaporn; Kobayashi, Isao; Promboon, Amornrat; Ratanapo, Sunanta; Tamura, Toshiki; Ngernsiri, Lertluk

2011-01-01

α-Amylase is a common enzyme for hydrolyzing starch. In the silkworm, Bombyx mori L. (Lepidoptera: Bombycidae), α-amylase is found in both digestive fluid and hemolymph. Here, the complete genomic sequence of the Amy gene encoding α-amylase from a local Thai silkworm, the Nanglai strain, was obtained. This gene was 7981 bp long with 9 exons. The full length Amy cDNA sequence was 1749 bp containing a 1503 bp open reading frame. The ORF encoded 500 amino acid residues. The deduced protein showed 81–54% identity to other insect α-amylases and more than 50% identity to mammalian enzymes. Southern blot analysis revealed that in the Nanglai strain Amy is a single-copy gene. RT- PCR showed that Amy was transcribed only in the foregut. Transgenic B. mori also showed that the Amy promoter activates expression of the transgene only in the foregut. PMID:21529256
Identification of a mouse synaptic glycoprotein gene in cultured neurons.

PubMed

Yu, Albert Cheung-Hoi; Sun, Chun Xiao; Li, Qiang; Liu, Hua Dong; Wang, Chen Ran; Zhao, Guo Ping; Jin, Meilei; Lau, Lok Ting; Fung, Yin-Wan Wendy; Liu, Shuang

2005-10-01

Neuronal differentiation and aging are known to involve many genes, which may also be differentially expressed during these developmental processes. From primary cultured cerebral cortical neurons, we have previously identified various differentially expressed gene transcripts from cultured cortical neurons using the technique of arbitrarily primed PCR (RAP-PCR). Among these transcripts, clone 0-2 was found to have high homology to rat and human synaptic glycoprotein. By in silico analysis using an EST database and the FACTURA software, the full-length sequence of 0-2 was assembled and the clone was named as mouse synaptic glycoprotein homolog 2 (mSC2). DNA sequencing revealed transcript size of mSC2 being smaller than the human and rat homologs. RT-PCR indicated that mSC2 was expressed differentially at various culture days. The mSC2 gene was located in various tissues with higher expression in brain, lung, and liver. Functions of mSC2 in neurons and other tissues remain elusive and will require more investigation.
Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Onda, M.; Kudo, S.; Fukuda, M.

Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE gene arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification ofmore » this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.« less
Whole-genome characterization of a Peruvian alpaca rotavirus isolate expressing a novel VP4 genotype.

PubMed

Rojas, Miguel; Gonçalves, Jorge Luiz S; Dias, Helver G; Manchego, Alberto; Pezo, Danilo; Santos, Norma

2016-11-30

The SA44 isolate of Rotavirus A (RVA) was identified from a neonatal Peruvian alpaca presenting with diarrhea, and the full-length genome sequence of the isolate (designated RVA/Alpaca-tc/PER/SA44/2014/G3P[40]) was determined. Phylogenetic analyses showed that the isolate possessed the genotype constellation G3-P[40]-I8-R3-C3-M3-A9-N3-T3-E3-H6, which differs considerably from those of RVA strains isolated from other species of the order Artiodactyla. Overall, the genetic constellation of the SA44 strain was quite similar to those of RVA strains isolated from a bat in Asia (MSLH14 and MYAS33). Nonetheless, phylogenetic analyses of each genome segment identified a distinct combination of genes. Several sequences were closely related to corresponding gene sequences in RVA strains from other species, including human (VP1, VP2, NSP1, and NSP2), simian (VP3 and NSP5), bat (VP6 and NSP4), and equine (NSP3). The VP7 gene sequence was closely related to RVA strains from a Peruvian alpaca (K'ayra/3368-10; 99.0% nucleotide and 99.7% amino acid identity) and from humans (RCH272; 95% nucleotide and 99.0% amino acid identity). The nucleotide sequence of the VP4 gene was distantly related to other VP4 sequences and was designated as the reference strain for the new P[40] genotype. This unique genetic makeup suggests that the SA44 strain emerged from multiple reassortment events between bat-, equine-, and human-like RVA strains. Copyright © 2016 Elsevier B.V. All rights reserved.
RNAi-mediated endogene silencing in strawberry fruit: detection of primary and secondary siRNAs by deep sequencing.

PubMed

Härtl, Katja; Kalinowski, Gregor; Hoffmann, Thomas; Preuss, Anja; Schwab, Wilfried

2017-05-01

RNA interference (RNAi) has been exploited as a reverse genetic tool for functional genomics in the nonmodel species strawberry (Fragaria × ananassa) since 2006. Here, we analysed for the first time different but overlapping nucleotide sections (>200 nt) of two endogenous genes, FaCHS (chalcone synthase) and FaOMT (O-methyltransferase), as inducer sequences and a transitive vector system to compare their gene silencing efficiencies. In total, ten vectors were assembled each containing the nucleotide sequence of one fragment in sense and corresponding antisense orientation separated by an intron (inverted hairpin construct, ihp). All sequence fragments along the full lengths of both target genes resulted in a significant down-regulation of the respective gene expression and related metabolite levels. Quantitative PCR data and successful application of a transitive vector system coinciding with a phenotypic change suggested propagation of the silencing signal. The spreading of the signal in strawberry fruit in the 3' direction was shown for the first time by the detection of secondary small interfering RNAs (siRNAs) outside of the primary targets by deep sequencing. Down-regulation of endogenes by the transitive method was less effective than silencing by ihp constructs probably because the numbers of primary siRNAs exceeded the quantity of secondary siRNAs by three orders of magnitude. Besides, we observed consistent hotspots of primary and secondary siRNA formation along the target sequence which fall within a distance of less than 200 nt. Thus, ihp vectors seem to be superior over the transitive vector system for functional genomics in strawberry fruit. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information

PubMed Central

Chen, Tsute; Yu, Wen-Han; Izard, Jacques; Baranova, Oxana V.; Lakshmanan, Abirami; Dewhirst, Floyd E.

2010-01-01

The human oral microbiome is the most studied human microflora, but 53% of the species have not yet been validly named and 35% remain uncultivated. The uncultivated taxa are known primarily from 16S rRNA sequence information. Sequence information tied solely to obscure isolate or clone numbers, and usually lacking accurate phylogenetic placement, is a major impediment to working with human oral microbiome data. The goal of creating the Human Oral Microbiome Database (HOMD) is to provide the scientific community with a body site-specific comprehensive database for the more than 600 prokaryote species that are present in the human oral cavity based on a curated 16S rRNA gene-based provisional naming scheme. Currently, two primary types of information are provided in HOMD—taxonomic and genomic. Named oral species and taxa identified from 16S rRNA gene sequence analysis of oral isolates and cloning studies were placed into defined 16S rRNA phylotypes and each given unique Human Oral Taxon (HOT) number. The HOT interlinks phenotypic, phylogenetic, genomic, clinical and bibliographic information for each taxon. A BLAST search tool is provided to match user 16S rRNA gene sequences to a curated, full length, 16S rRNA gene reference data set. For genomic analysis, HOMD provides comprehensive set of analysis tools and maintains frequently updated annotations for all the human oral microbial genomes that have been sequenced and publicly released. Oral bacterial genome sequences, determined as part of the Human Microbiome Project, are being added to the HOMD as they become available. We provide HOMD as a conceptual model for the presentation of microbiome data for other human body sites. Database URL: http://www.homd.org PMID:20624719
Exome sequencing and genome-wide linkage analysis in 17 families illustrate the complex contribution of TTN truncating variants to dilated cardiomyopathy.

PubMed

Norton, Nadine; Li, Duanxiang; Rampersaud, Evadnie; Morales, Ana; Martin, Eden R; Zuchner, Stephan; Guo, Shengru; Gonzalez, Michael; Hedges, Dale J; Robertson, Peggy D; Krumm, Niklas; Nickerson, Deborah A; Hershberger, Ray E

2013-04-01

BACKGROUND- Familial dilated cardiomyopathy (DCM) is a genetically heterogeneous disease with >30 known genes. TTN truncating variants were recently implicated in a candidate gene study to cause 25% of familial and 18% of sporadic DCM cases. METHODS AND RESULTS- We used an unbiased genome-wide approach using both linkage analysis and variant filtering across the exome sequences of 48 individuals affected with DCM from 17 families to identify genetic cause. Linkage analysis ranked the TTN region as falling under the second highest genome-wide multipoint linkage peak, multipoint logarithm of odds, 1.59. We identified 6 TTN truncating variants carried by individuals affected with DCM in 7 of 17 DCM families (logarithm of odds, 2.99); 2 of these 7 families also had novel missense variants that segregated with disease. Two additional novel truncating TTN variants did not segregate with DCM. Nucleotide diversity at the TTN locus, including missense variants, was comparable with 5 other known DCM genes. The average number of missense variants in the exome sequences from the DCM cases or the ≈5400 cases from the Exome Sequencing Project was ≈23 per individual. The average number of TTN truncating variants in the Exome Sequencing Project was 0.014 per individual. We also identified a region (chr9q21.11-q22.31) with no known DCM genes with a maximum heterogeneity logarithm of odds score of 1.74. CONCLUSIONS- These data suggest that TTN truncating variants contribute to DCM cause. However, the lack of segregation of all identified TTN truncating variants illustrates the challenge of determining variant pathogenicity even with full exome sequencing.
IG and TR single chain fragment variable (scFv) sequence analysis: a new advanced functionality of IMGT/V-QUEST and IMGT/HighV-QUEST.

PubMed

Giudicelli, Véronique; Duroux, Patrice; Kossida, Sofia; Lefranc, Marie-Paule

2017-06-26

IMGT®, the international ImMunoGeneTics information system® ( http://www.imgt.org ), was created in 1989 in Montpellier, France (CNRS and Montpellier University) to manage the huge and complex diversity of the antigen receptors, and is at the origin of immunoinformatics, a science at the interface between immunogenetics and bioinformatics. Immunoglobulins (IG) or antibodies and T cell receptors (TR) are managed and described in the IMGT® databases and tools at the level of receptor, chain and domain. The analysis of the IG and TR variable (V) domain rearranged nucleotide sequences is performed by IMGT/V-QUEST (online since 1997, 50 sequences per batch) and, for next generation sequencing (NGS), by IMGT/HighV-QUEST, the high throughput version of IMGT/V-QUEST (portal begun in 2010, 500,000 sequences per batch). In vitro combinatorial libraries of engineered antibody single chain Fragment variable (scFv) which mimic the in vivo natural diversity of the immune adaptive responses are extensively screened for the discovery of novel antigen binding specificities. However the analysis of NGS full length scFv (~850 bp) represents a challenge as they contain two V domains connected by a linker and there is no tool for the analysis of two V domains in a single chain. The functionality "Analyis of single chain Fragment variable (scFv)" has been implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST for the analysis of the two V domains of IG and TR scFv. It proceeds in five steps: search for a first closest V-REGION, full characterization of the first V-(D)-J-REGION, then search for a second V-REGION and full characterization of the second V-(D)-J-REGION, and finally linker delimitation. For each sequence or NGS read, positions of the 5'V-DOMAIN, linker and 3'V-DOMAIN in the scFv are provided in the 'V-orientated' sense. Each V-DOMAIN is fully characterized (gene identification, sequence description, junction analysis, characterization of mutations and amino changes). The functionality is generic and can analyse any IG or TR single chain nucleotide sequence containing two V domains, provided that the corresponding species IMGT reference directory is available. The "Analysis of single chain Fragment variable (scFv)" implemented in IMGT/V-QUEST and, for NGS, in IMGT/HighV-QUEST provides the identification and full characterization of the two V domains of full-length scFv (~850 bp) nucleotide sequences from combinatorial libraries. The analysis can also be performed on concatenated paired chains of expressed antigen receptor IG or TR repertoires.

NemaPath: online exploration of KEGG-based metabolic pathways for nematodes

PubMed Central

Wylie, Todd; Martin, John; Abubucker, Sahar; Yin, Yong; Messina, David; Wang, Zhengyuan; McCarter, James P; Mitreva, Makedonka

2008-01-01

Background Nematode.net is a web-accessible resource for investigating gene sequences from parasitic and free-living nematode genomes. Beyond the well-characterized model nematode C. elegans, over 500,000 expressed sequence tags (ESTs) and nearly 600,000 genome survey sequences (GSSs) have been generated from 36 nematode species as part of the Parasitic Nematode Genomics Program undertaken by the Genome Center at Washington University School of Medicine. However, these sequencing data are not present in most publicly available protein databases, which only include sequences in Swiss-Prot. Swiss-Prot, in turn, relies on GenBank/Embl/DDJP for predicted proteins from complete genomes or full-length proteins. Description Here we present the NemaPath pathway server, a web-based pathway-level visualization tool for navigating putative metabolic pathways for over 30 nematode species, including 27 parasites. The NemaPath approach consists of two parts: 1) a backend tool to align and evaluate nematode genomic sequences (curated EST contigs) against the annotated Kyoto Encyclopedia of Genes and Genomes (KEGG) protein database; 2) a web viewing application that displays annotated KEGG pathway maps based on desired confidence levels of primary sequence similarity as defined by a user. NemaPath also provides cross-referenced access to nematode genome information provided by other tools available on Nematode.net, including: detailed NemaGene EST cluster information; putative translations; GBrowse EST cluster views; links from nematode data to external databases for corresponding synonymous C. elegans counterparts, subject matches in KEGG's gene database, and also KEGG Ontology (KO) identification. Conclusion The NemaPath server hosts metabolic pathway mappings for 30 nematode species and is available on the World Wide Web at . The nematode source sequences used for the metabolic pathway mappings are available via FTP , as provided by the Genome Center at Washington University School of Medicine. PMID:18983679
Identification of a novel species of papillomavirus in giraffe lesions using nanopore sequencing.

PubMed

Vanmechelen, Bert; Bertelsen, Mads Frost; Rector, Annabel; Van den Oord, Joost J; Laenen, Lies; Vergote, Valentijn; Maes, Piet

2017-03-01

Papillomaviridae form a large family of viruses that are known to infect a variety of vertebrates, including mammals, reptiles, birds and fish. Infections usually give rise to minor skin lesions but can in some cases lead to the development of malignant neoplasia. In this study, we identified a novel species of papillomavirus (PV), isolated from warts of four giraffes (Giraffa camelopardalis). The sequence of the L1 gene was determined and found to be identical for all isolates. Using nanopore sequencing, the full sequence of the PV genome could be determined. The coding region of the genome was found to contain seven open reading frames (ORF), encoding the early proteins E1, E2 and E5-E7 as well as the late proteins L1 and L2. In addition to these ORFs, a region located within the E2 gene is thought, based on sequence similarities to other papillomaviruses, to encode an E4 protein, although no start codon could be identified. Based on the sequence of the L1 gene, this novel PV was found to be most similar to Capreolus capreolus papillomavirus 1 (CcaPV1), with 67.96% nucleotide identity. We therefore suggest that the virus identified here is given the name Giraffa camelopardalis papillomavirus 1 (GcPV1) and is classified as a novel species within the genus Deltapapillomavirus, in line with the current guidelines for the nomenclature and classification of PVs. Copyright © 2017 Elsevier B.V. All rights reserved.
Whole exome sequencing identifies genetic variants in inherited thrombocytopenia with secondary qualitative function defects.

PubMed

Johnson, Ben; Lowe, Gillian C; Futterer, Jane; Lordkipanidzé, Marie; MacDonald, David; Simpson, Michael A; Sanchez-Guiú, Isabel; Drake, Sian; Bem, Danai; Leo, Vincenzo; Fletcher, Sarah J; Dawood, Ban; Rivera, José; Allsup, David; Biss, Tina; Bolton-Maggs, Paula Hb; Collins, Peter; Curry, Nicola; Grimley, Charlotte; James, Beki; Makris, Mike; Motwani, Jayashree; Pavord, Sue; Talks, Katherine; Thachil, Jecko; Wilde, Jonathan; Williams, Mike; Harrison, Paul; Gissen, Paul; Mundell, Stuart; Mumford, Andrew; Daly, Martina E; Watson, Steve P; Morgan, Neil V

2016-10-01

Inherited thrombocytopenias are a heterogeneous group of disorders characterized by abnormally low platelet counts which can be associated with abnormal bleeding. Next-generation sequencing has previously been employed in these disorders for the confirmation of suspected genetic abnormalities, and more recently in the discovery of novel disease-causing genes. However its full potential has not yet been exploited. Over the past 6 years we have sequenced the exomes from 55 patients, including 37 index cases and 18 additional family members, all of whom were recruited to the UK Genotyping and Phenotyping of Platelets study. All patients had inherited or sustained thrombocytopenia of unknown etiology with platelet counts varying from 11×10 9 /L to 186×10 9 /L. Of the 51 patients phenotypically tested, 37 (73%), had an additional secondary qualitative platelet defect. Using whole exome sequencing analysis we have identified "pathogenic" or "likely pathogenic" variants in 46% (17/37) of our index patients with thrombocytopenia. In addition, we report variants of uncertain significance in 12 index cases, including novel candidate genetic variants in previously unreported genes in four index cases. These results demonstrate that whole exome sequencing is an efficient method for elucidating potential pathogenic genetic variants in inherited thrombocytopenia. Whole exome sequencing also has the added benefit of discovering potentially pathogenic genetic variants for further study in novel genes not previously implicated in inherited thrombocytopenia. Copyright© Ferrata Storti Foundation.
Sexual selection drives evolution and rapid turnover of male gene expression.

PubMed

Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Dean, Rebecca; Montgomery, Stephen H; Pointer, Marie A; Mank, Judith E

2015-04-07

The profound and pervasive differences in gene expression observed between males and females, and the unique evolutionary properties of these genes in many species, have led to the widespread assumption that they are the product of sexual selection and sexual conflict. However, we still lack a clear understanding of the connection between sexual selection and transcriptional dimorphism, often termed sex-biased gene expression. Moreover, the relative contribution of sexual selection vs. drift in shaping broad patterns of expression, divergence, and polymorphism remains unknown. To assess the role of sexual selection in shaping these patterns, we assembled transcriptomes from an avian clade representing the full range of sexual dimorphism and sexual selection. We use these species to test the links between sexual selection and sex-biased gene expression evolution in a comparative framework. Through ancestral reconstruction of sex bias, we demonstrate a rapid turnover of sex bias across this clade driven by sexual selection and show it to be primarily the result of expression changes in males. We use phylogenetically controlled comparative methods to demonstrate that phenotypic measures of sexual selection predict the proportion of male-biased but not female-biased gene expression. Although male-biased genes show elevated rates of coding sequence evolution, consistent with previous reports in a range of taxa, there is no association between sexual selection and rates of coding sequence evolution, suggesting that expression changes may be more important than coding sequence in sexual selection. Taken together, our results highlight the power of sexual selection to act on gene expression differences and shape genome evolution.
First complete genome sequence of infectious laryngotracheitis virus

PubMed Central

2011-01-01

Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528
Definition of RNA Polymerase II CoTC Terminator Elements in the Human Genome

PubMed Central

Nojima, Takayuki; Dienstbier, Martin; Murphy, Shona; Proudfoot, Nicholas J.; Dye, Michael J.

2013-01-01

Summary Mammalian RNA polymerase II (Pol II) transcription termination is an essential step in protein-coding gene expression that is mediated by pre-mRNA processing activities and DNA-encoded terminator elements. Although much is known about the role of pre-mRNA processing in termination, our understanding of the characteristics and generality of terminator elements is limited. Whereas promoter databases list up to 40,000 known and potential Pol II promoter sequences, fewer than ten Pol II terminator sequences have been described. Using our knowledge of the human β-globin terminator mechanism, we have developed a selection strategy for mapping mammalian Pol II terminator elements. We report the identification of 78 cotranscriptional cleavage (CoTC)-type terminator elements at endogenous gene loci. The results of this analysis pave the way for the full understanding of Pol II termination pathways and their roles in gene expression. PMID:23562152
Identification and characterization of transcripts from the biotin biosynthetic operon of Bacillus subtilis.

PubMed Central

Perkins, J B; Bower, S; Howitt, C L; Yocum, R R; Pero, J

1996-01-01

Northern (RNA) blot analysis of the Bacillus subtilis biotin operon, bioWAFDBIorf2, detected at least two steady-state polycistronic transcripts initiated from a putative vegetative (Pbio) promoter that precedes the operon, i.e., a full-length 7.2-kb transcript covering the entire operon and a more abundant 5.1-kb transcript covering just the first five genes of the operon. Biotin and the B. subtilis birA gene product regulated synthesis of the transcripts. Moreover, replacing the putative Pbio promoter and regulatory sequence with a constitutive SP01 phage promoter resulted in higher-level constitutive synthesis. Removal of a rho-independent terminator-like sequence located between the fifth (bioB) and sixth (bioI) genes prevented accumulation of the 5.1-kb transcript, suggesting that the putative terminator functions to limit expression of bioI, which is thought to be involved in an early step in biotin synthesis. PMID:8892842
Identification and characterization of transcripts from the biotin biosynthetic operon of Bacillus subtilis.

PubMed

Perkins, J B; Bower, S; Howitt, C L; Yocum, R R; Pero, J

1996-11-01

Northern (RNA) blot analysis of the Bacillus subtilis biotin operon, bioWAFDBIorf2, detected at least two steady-state polycistronic transcripts initiated from a putative vegetative (Pbio) promoter that precedes the operon, i.e., a full-length 7.2-kb transcript covering the entire operon and a more abundant 5.1-kb transcript covering just the first five genes of the operon. Biotin and the B. subtilis birA gene product regulated synthesis of the transcripts. Moreover, replacing the putative Pbio promoter and regulatory sequence with a constitutive SP01 phage promoter resulted in higher-level constitutive synthesis. Removal of a rho-independent terminator-like sequence located between the fifth (bioB) and sixth (bioI) genes prevented accumulation of the 5.1-kb transcript, suggesting that the putative terminator functions to limit expression of bioI, which is thought to be involved in an early step in biotin synthesis.
Phylogenetic analysis of canine distemper virus in domestic dogs in Nanjing, China.

PubMed

Bi, Zhenwei; Wang, Yongshan; Wang, Xiaoli; Xia, Xingxia

2015-02-01

Canine distemper virus (CDV) infects a broad range of carnivores, including wild and domestic Canidae. The hemagglutinin gene, which encodes the attachment protein that determines viral tropism, has been widely used to determine the relationship between CDV strains of different lineages circulating worldwide. We determined the full-length H gene sequences of seven CDV field strains detected in domestic dogs in Nanjing, China. A phylogenetic analysis of the H gene sequences of CDV strains from different geographic regions and vaccine strains was performed. Four of the seven CDV strains were grouped in the same cluster of the Asia-1 lineage to which the vast majority of Chinese CDV strains belong, whereas the other three were clustered within the Asia-4 lineage, which has never been detected in China. This represents the first record of detection of strains of the Asia-4 lineage in China since this lineage was reported in Thailand in 2013.
Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening

PubMed Central

2012-01-01

Background Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant’s molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Results Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes-25.4-39.7%, whereas a smaller percentage, matched Pinus genes, 1.8-12.8%, probably a consequence of more public genomic information available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species. Conclusions Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD. PMID:23134679
BSP-01: Full Four-Digit Typing for Class I and II HLA Genes | Frederick National Laboratory for Cancer Research

Cancer.gov

The Basic Science Program will receive genomic DNA at a concentration of 50 ng/ul.Human leukocyte antigen (HLA) typing will be performed using atargeted next-generation sequencing (NGS) method.Briefly, locus-specific primers are use
Sequence Elements Upstream of the Core Promoter Are Necessary for Full Transcription of the Capsule Gene Operon in Streptococcus pneumoniae Strain D39

PubMed Central

Wen, Zhensong; Sertil, Odeniel; Cheng, Yongxin; Zhang, Shanshan; Liu, Xue; Wang, Wen-Ching

2015-01-01

Streptococcus pneumoniae is a major bacterial pathogen in humans. Its polysaccharide capsule is a key virulence factor that promotes bacterial evasion of human phagocytic killing. While S. pneumoniae produces at least 94 antigenically different types of capsule, the genes for biosynthesis of almost all capsular types are arranged in the same locus. The transcription of the capsular polysaccharide (cps) locus is not well understood. This study determined the transcriptional features of the cps locus in the type 2 virulent strain D39. The initial analysis revealed that the cps genes are cotranscribed from a major transcription start site at the −25 nucleotide (G) upstream of cps2A, the first gene in the locus. Using unmarked chromosomal truncations and a luciferase-based transcriptional reporter, we showed that the full transcription of the cps genes not only depends on the core promoter immediately upstream of cps2A, but also requires additional elements upstream of the core promoter, particularly a 59-bp sequence immediately upstream of the core promoter. Unmarked deletions of these promoter elements in the D39 genome also led to significant reduction in CPS production and virulence in mice. Lastly, common cps gene (cps2ABCD) mutants did not show significant abnormality in cps transcription, although they produced significantly less CPS, indicating that the CpsABCD proteins are involved in the encapsulation of S. pneumoniae in a posttranscriptional manner. This study has yielded important information on the transcriptional characteristics of the cps locus in S. pneumoniae. PMID:25733517
Statistical method to compare massive parallel sequencing pipelines.

PubMed

Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

2017-03-01

Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS) that cuts drastically sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method allows thus pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

PubMed Central

2014-01-01

Background The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
Detailed transcriptome description of the neglected cestode Taenia multiceps.

PubMed

Wu, Xuhang; Fu, Yan; Yang, Deying; Zhang, Runhui; Zheng, Wanpeng; Nie, Huaming; Xie, Yue; Yan, Ning; Hao, Guiying; Gu, Xiaobin; Wang, Shuxian; Peng, Xuerong; Yang, Guangyou

2012-01-01

The larval stage of Taenia multiceps, a global cestode, encysts in the central nervous system (CNS) of sheep and other livestock. This frequently leads to their death and huge socioeconomic losses, especially in developing countries. This parasite can also cause zoonotic infections in humans, but has been largely neglected due to a lack of diagnostic techniques and studies. Recent developments in next-generation sequencing provide an opportunity to explore the transcriptome of T. multiceps. We obtained a total of 31,282 unigenes (mean length 920 bp) using Illumina paired-end sequencing technology and a new Trinity de novo assembler without a referenced genome. Individual transcription molecules were determined by sequence-based annotations and/or domain-based annotations against public databases (Nr, UniprotKB/Swiss-Prot, COG, KEGG, UniProtKB/TrEMBL, InterPro and Pfam). We identified 26,110 (83.47%) unigenes and inferred 20,896 (66.8%) coding sequences (CDS). Further comparative transcripts analysis with other cestodes (Taenia pisiformis, Taenia solium, Echincoccus granulosus and Echincoccus multilocularis) and intestinal parasites (Trichinella spiralis, Ancylostoma caninum and Ascaris suum) showed that 5,100 common genes were shared among three Taenia tapeworms, 261 conserved genes were detected among five Taeniidae cestodes, and 109 common genes were found in four zoonotic intestinal parasites. Some of the common genes were genes required for parasite survival, involved in parasite-host interactions. In addition, we amplified two full-length CDS of unigenes from the common genes using RT-PCR. This study provides an extensive transcriptome of the adult stage of T. multiceps, and demonstrates that comparative transcriptomic investigations deserve to be further studied. This transcriptome dataset forms a substantial public information platform to achieve a fundamental understanding of the biology of T. multiceps, and helps in the identification of drug targets and parasite-host interaction studies.
Characterisation of a collagen gene subfamily from the potato cyst nematode Globodera pallida.

PubMed

Gray, L J; Curtis, R H; Jones, J T

2001-01-24

We have isolated two full-length genomic DNA sequences, which encode the cuticle collagen proteins GP-COL-1 and GP-COL-2, from the potato cyst nematode Globodera pallida. A third, partial collagen gene ORF termed gp-col-t(t=truncated) has also been isolated and appears to represent an unexpressed pseudogene. The gp-col-1 and gp-col-2 genes both contain three short (<97 bp) introns which disrupt coding regions predicted to specify proteins with molecular weights of 33 and 32.7 kDa respectively. All three sequences show high similarity to each other and to the previously isolated G. pallida cDNA clone gp-col-8. The conserved pattern of cysteine residues and non-(Gly-X-Y)(n) region sequence similarity observed in all four G. pallida genes suggests that these molecules form part of the same subfamily of collagens. Southern analysis indicates that this subfamily is likely to contain further members. The G. pallida collagen sequences show striking similarity to twelve genes from Caenorhabditis elegans which collectively represent the recently classified Group 1a collagen subfamily. No data exists on the function of this subfamily in C. elegans. gp-col-1 and gp-col-2 are developmentally regulated with transcripts of both genes detected in adult virgin and gravid females but not in pre-parasitic second stage juveniles. A similar expression pattern is observed for the Group 1a collagen lemmi 5 from Meloidogyne incognita perhaps indicating a generic link between subfamily and function during the various changes in cuticular structure which accompany nematode growth and reproduction. Immunochemical studies indicate that the GP-COL-1 protein is specifically located in the hypodermis of G. pallida adult females.
Phylogenomics from Whole Genome Sequences Using aTRAM.

PubMed

Allen, Julie M; Boyd, Bret; Nguyen, Nam-Phuong; Vachaspati, Pranjal; Warnow, Tandy; Huang, Daisie I; Grady, Patrick G S; Bell, Kayce C; Cronk, Quentin C B; Mugisha, Lawrence; Pittendrigh, Barry R; Leonardi, M Soledad; Reed, David L; Johnson, Kevin P

2017-09-01

Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long-term utility of the data. Currently, for organisms with moderate to small genomes ($<$1000 Mbp) it is feasible to sequence the entire genome at modest coverage ($10-30\\times$). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out-groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close- to distantly related taxa at high to low levels of coverage.Both the concatenated analysis and the coalescent-based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above $5-10\\times$, aTRAM was successful at assembling 80-90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
A decade after the first full human genome sequencing: when will we understand our own genome?

PubMed

Eisenhaber, Frank

2012-10-01

The contrast between the pomp of celebrating the first full human genome sequencing in 2000 and the cautious tone of recollections a decade thereafter could hardly be greater. The promises with regard to medical cures and biotechnology applications have been realized not even nearly to the expectations. Understanding the human genomes means knowing the genes' and proteins' functions and their interconnectedness via biomolecular mechanisms. This articles estimates how long will it take to achieve this goal if we extrapolate from the previous decade (indeed, a century!) and the possible disruptive trends in science, technology and society that may accelerate the pace of progress dramatically.
Translation-coupling systems

DOEpatents

Pfleger, Brian; Mendez-Perez, Daniel

2013-11-05

Disclosed are systems and methods for coupling translation of a target gene to a detectable response gene. A version of the invention includes a translation-coupling cassette. The translation-coupling cassette includes a target gene, a response gene, a response-gene translation control element, and a secondary structure-forming sequence that reversibly forms a secondary structure masking the response-gene translation control element. Masking of the response-gene translation control element inhibits translation of the response gene. Full translation of the target gene results in unfolding of the secondary structure and consequent translation of the response gene. Translation of the target gene is determined by detecting presence of the response-gene protein product. The invention further includes RNA transcripts of the translation-coupling cassettes, vectors comprising the translation-coupling cassettes, hosts comprising the translation-coupling cassettes, methods of using the translation-coupling cassettes, and gene products produced with the translation-coupling cassettes.
Translation-coupling systems

DOEpatents

Pfleger, Brian; Mendez-Perez, Daniel

2015-05-19

Disclosed are systems and methods for coupling translation of a target gene to a detectable response gene. A version of the invention includes a translation-coupling cassette. The translation-coupling cassette includes a target gene, a response gene, a response-gene translation control element, and a secondary structure-forming sequence that reversibly forms a secondary structure masking the response-gene translation control element. Masking of the response-gene translation control element inhibits translation of the response gene. Full translation of the target gene results in unfolding of the secondary structure and consequent translation of the response gene. Translation of the target gene is determined by detecting presence of the response-gene protein product. The invention further includes RNA transcripts of the translation-coupling cassettes, vectors comprising the translation-coupling cassettes, hosts comprising the translation-coupling cassettes, methods of using the translation-coupling cassettes, and gene products produced with the translation-coupling cassettes.

Variable Copy Number, Intra-Genomic Heterogeneities and Lateral Transfers of the 16S rRNA Gene in Pseudomonas

PubMed Central

Bodilis, Josselin; Nsigue-Meilo, Sandrine; Besaury, Ludovic; Quillet, Laurent

2012-01-01

Even though the 16S rRNA gene is the most commonly used taxonomic marker in microbial ecology, its poor resolution is still not fully understood at the intra-genus level. In this work, the number of rRNA gene operons, intra-genomic heterogeneities and lateral transfers were investigated at a fine-scale resolution, throughout the Pseudomonas genus. In addition to nineteen sequenced Pseudomonas strains, we determined the 16S rRNA copy number in four other Pseudomonas strains by Southern hybridization and Pulsed-Field Gel Electrophoresis, and studied the intra-genomic heterogeneities by Denaturing Gradient Gel Electrophoresis and sequencing. Although the variable copy number (from four to seven) seems to be correlated with the evolutionary distance, some close strains in the P. fluorescens lineage showed a different number of 16S rRNA genes, whereas all the strains in the P. aeruginosa lineage displayed the same number of genes (four copies). Further study of the intra-genomic heterogeneities revealed that most of the Pseudomonas strains (15 out of 19 strains) had at least two different 16S rRNA alleles. A great difference (5 or 19 nucleotides, essentially grouped near the V1 hypervariable region) was observed only in two sequenced strains. In one of our strains studied (MFY30 strain), we found a difference of 12 nucleotides (grouped in the V3 hypervariable region) between copies of the 16S rRNA gene. Finally, occurrence of partial lateral transfers of the 16S rRNA gene was further investigated in 1803 full-length sequences of Pseudomonas available in the databases. Remarkably, we found that the two most variable regions (the V1 and V3 hypervariable regions) had probably been laterally transferred from another evolutionary distant Pseudomonas strain for at least 48.3 and 41.6% of the 16S rRNA sequences, respectively. In conclusion, we strongly recommend removing these regions of the 16S rRNA gene during the intra-genus diversity studies. PMID:22545126
The Complete Genome of a New Betabaculovirus from Clostera anastomosis

PubMed Central

Yin, Feifei; Zhu, Zheng; Liu, Xiaoping; Hou, Dianhai; Wang, Jun; Zhang, Lei; Wang, Manli; Kou, Zheng; Wang, Hualin; Deng, Fei; Hu, Zhihong

2015-01-01

Clostera anastomosis (Lepidoptera: Notodontidae) is a defoliating forest insect pest. Clostera anastomosis granulovirus-B (ClasGV-B) belonging to the genus Betabaculovirus of family Baculoviridae has been used for biological control of the pest. Here we reported the full genome sequence of ClasGV-B and compared it to other previously sequenced baculoviruses. The circular double-stranded DNA genome is 107,439 bp in length, with a G+C content of 37.8% and contains 123 open reading frames (ORFs) representing 93% of the genome. ClasGV-B contains 37 baculovirus core genes, 25 lepidopteran baculovirus specific genes, 19 betabaculovirus specific genes, 39 other genes with homologues to baculoviruses and 3 ORFs unique to ClasGV-B. Hrs appear to be absent from the ClasGV-B genome, however, two non-hr repeats were found. Phylogenetic tree based on 37 core genes from 73 baculovirus genomes placed ClasGV-B in the clade b of betabaculoviruses and was most closely related to Erinnyis ello GV (ErelGV). The gene arrangement of ClasGV-B also shared the strongest collinearity with ErelGV but differed from Clostera anachoreta GV (ClanGV), Clostera anastomosis GV-A (ClasGV-A, previously also called CaLGV) and Epinotia aporema GV (EpapGV) with a 20 kb inversion. ClasGV-B genome contains three copies of polyhedron envelope protein gene (pep) and phylogenetic tree divides the PEPs of betabaculoviruses into three major clades: PEP-1, PEP-2 and PEP/P10. ClasGV-B also contains three homologues of P10 which all harbor an N-terminal coiled-coil domain and a C-terminal basic sequence. ClasGV-B encodes three fibroblast growth factor (FGF) homologues which are conserved in all sequenced betabaculoviruses. Phylogenetic analysis placed these three FGFs into different groups and suggested that the FGFs were evolved at the early stage of the betabaculovirus expansion. ClasGV-B is different from previously reported ClasGV-A and ClanGV isolated from Notodontidae in sequence and gene arrangement, indicating the virus is a new notodontid betabaculovirus. PMID:26168260
The most conserved genome segments for life detection on Earth and other planets.

PubMed

Isenbarger, Thomas A; Carr, Christopher E; Johnson, Sarah Stewart; Finney, Michael; Church, George M; Gilbert, Walter; Zuber, Maria T; Ruvkun, Gary

2008-12-01

On Earth, very simple but powerful methods to detect and classify broad taxa of life by the polymerase chain reaction (PCR) are now standard practice. Using DNA primers corresponding to the 16S ribosomal RNA gene, one can survey a sample from any environment for its microbial inhabitants. Due to massive meteoritic exchange between Earth and Mars (as well as other planets), a reasonable case can be made for life on Mars or other planets to be related to life on Earth. In this case, the supremely sensitive technologies used to study life on Earth, including in extreme environments, can be applied to the search for life on other planets. Though the 16S gene has become the standard for life detection on Earth, no genome comparisons have established that the ribosomal genes are, in fact, the most conserved DNA segments across the kingdoms of life. We present here a computational comparison of full genomes from 13 diverse organisms from the Archaea, Bacteria, and Eucarya to identify genetic sequences conserved across the widest divisions of life. Our results identify the 16S and 23S ribosomal RNA genes as well as other universally conserved nucleotide sequences in genes encoding particular classes of transfer RNAs and within the nucleotide binding domains of ABC transporters as the most conserved DNA sequence segments across phylogeny. This set of sequences defines a core set of DNA regions that have changed the least over billions of years of evolution and provides a means to identify and classify divergent life, including ancestrally related life on other planets.
Characterization of an AGAMOUS-like MADS Box Protein, a Probable Constituent of Flowering and Fruit Ripening Regulatory System in Banana

PubMed Central

Roy Choudhury, Swarup; Roy, Sujit; Nag, Anish; Singh, Sanjay Kumar; Sengupta, Dibyendu N.

2012-01-01

The MADS-box family of genes has been shown to play a significant role in the development of reproductive organs, including dry and fleshy fruits. In this study, the molecular properties of an AGAMOUS like MADS box transcription factor in banana cultivar Giant governor (Musa sp, AAA group, subgroup Cavendish) has been elucidated. We have detected a CArG-box sequence binding AGAMOUS MADS-box protein in banana flower and fruit nuclear extracts in DNA-protein interaction assays. The protein fraction in the DNA-protein complex was analyzed by mass spectrometry and using this information we have obtained the full length cDNA of the corresponding protein. The deduced protein sequence showed ∼95% amino acid sequence homology with MA-MADS5, a MADS-box protein described previously from banana. We have characterized the domains of the identified AGAMOUS MADS-box protein involved in DNA binding and homodimer formation in vitro using full-length and truncated versions of affinity purified recombinant proteins. Furthermore, in order to gain insight about how DNA bending is achieved by this MADS-box factor, we performed circular permutation and phasing analysis using the wild type recombinant protein. The AGAMOUS MADS-box protein identified in this study has been found to predominantly accumulate in the climacteric fruit pulp and also in female flower ovary. In vivo and in vitro assays have revealed specific binding of the identified AGAMOUS MADS-box protein to CArG-box sequence in the promoters of major ripening genes in banana fruit. Overall, the expression patterns of this MADS-box protein in banana female flower ovary and during various phases of fruit ripening along with the interaction of the protein to the CArG-box sequence in the promoters of major ripening genes lead to interesting assumption about the possible involvement of this AGAMOUS MADS-box factor in banana fruit ripening and floral reproductive organ development. PMID:22984496
Characterization of an AGAMOUS-like MADS box protein, a probable constituent of flowering and fruit ripening regulatory system in banana.

PubMed

Roy Choudhury, Swarup; Roy, Sujit; Nag, Anish; Singh, Sanjay Kumar; Sengupta, Dibyendu N

2012-01-01

The MADS-box family of genes has been shown to play a significant role in the development of reproductive organs, including dry and fleshy fruits. In this study, the molecular properties of an AGAMOUS like MADS box transcription factor in banana cultivar Giant governor (Musa sp, AAA group, subgroup Cavendish) has been elucidated. We have detected a CArG-box sequence binding AGAMOUS MADS-box protein in banana flower and fruit nuclear extracts in DNA-protein interaction assays. The protein fraction in the DNA-protein complex was analyzed by mass spectrometry and using this information we have obtained the full length cDNA of the corresponding protein. The deduced protein sequence showed ~95% amino acid sequence homology with MA-MADS5, a MADS-box protein described previously from banana. We have characterized the domains of the identified AGAMOUS MADS-box protein involved in DNA binding and homodimer formation in vitro using full-length and truncated versions of affinity purified recombinant proteins. Furthermore, in order to gain insight about how DNA bending is achieved by this MADS-box factor, we performed circular permutation and phasing analysis using the wild type recombinant protein. The AGAMOUS MADS-box protein identified in this study has been found to predominantly accumulate in the climacteric fruit pulp and also in female flower ovary. In vivo and in vitro assays have revealed specific binding of the identified AGAMOUS MADS-box protein to CArG-box sequence in the promoters of major ripening genes in banana fruit. Overall, the expression patterns of this MADS-box protein in banana female flower ovary and during various phases of fruit ripening along with the interaction of the protein to the CArG-box sequence in the promoters of major ripening genes lead to interesting assumption about the possible involvement of this AGAMOUS MADS-box factor in banana fruit ripening and floral reproductive organ development.
Alternative Splicing Profile and Sex-Preferential Gene Expression in the Female and Male Pacific Abalone Haliotis discus hannai.

PubMed

Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang

2017-03-09

In order to characterize the female or male transcriptome of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 genes that coded for proteins were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 isoforms were genome-widely identified with at least two isoforms from female and male transcriptome databases. We found that the number of isoforms and their alternatively spliced patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome database and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone.
Alternative Splicing Profile and Sex-Preferential Gene Expression in the Female and Male Pacific Abalone Haliotis discus hannai

PubMed Central

Kim, Mi Ae; Rhee, Jae-Sung; Kim, Tae Ha; Lee, Jung Sick; Choi, Ah-Young; Choi, Beom-Soon; Choi, Ik-Young; Sohn, Young Chang

2017-01-01

In order to characterize the female or male transcriptome of the Pacific abalone and further increase genomic resources, we sequenced the mRNA of full-length complementary DNA (cDNA) libraries derived from pooled tissues of female and male Haliotis discus hannai by employing the Iso-Seq protocol of the PacBio RSII platform. We successfully assembled whole full-length cDNA sequences and constructed a transcriptome database that included isoform information. After clustering, a total of 15,110 and 12,145 genes that coded for proteins were identified in female and male abalones, respectively. A total of 13,057 putative orthologs were retained from each transcriptome in abalones. Overall Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analyzed in each database showed a similar composition between sexes. In addition, a total of 519 and 391 isoforms were genome-widely identified with at least two isoforms from female and male transcriptome databases. We found that the number of isoforms and their alternatively spliced patterns are variable and sex-dependent. This information represents the first significant contribution to sex-preferential genomic resources of the Pacific abalone. The availability of whole female and male transcriptome database and their isoform information will be useful to improve our understanding of molecular responses and also for the analysis of population dynamics in the Pacific abalone. PMID:28282934
Orthology and paralogy constraints: satisfiability and consistency.

PubMed

Lafond, Manuel; El-Mabrouk, Nadia

2014-01-01

A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
Orthology and paralogy constraints: satisfiability and consistency

PubMed Central

2014-01-01

Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships. PMID:25572629
Evolution and molecular epidemiology of classical swine fever virus during a multi-annual outbreak amongst European wild boar.

PubMed

Goller, Katja V; Gabriel, Claudia; Dimna, Mireille Le; Le Potier, Marie-Frédérique; Rossi, Sophie; Staubach, Christoph; Merboth, Matthias; Beer, Martin; Blome, Sandra

2016-03-01

Classical swine fever is a viral disease of pigs that carries tremendous socio-economic impact. In outbreak situations, genetic typing is carried out for the purpose of molecular epidemiology in both domestic pigs and wild boar. These analyses are usually based on harmonized partial sequences. However, for high-resolution analyses towards the understanding of genetic variability and virus evolution, full-genome sequences are more appropriate. In this study, a unique set of representative virus strains was investigated that was collected during an outbreak in French free-ranging wild boar in the Vosges-du-Nord mountains between 2003 and 2007. Comparative sequence and evolutionary analyses of the nearly full-length sequences showed only slow evolution of classical swine fever virus strains over the years and no impact of vaccination on mutation rates. However, substitution rates varied amongst protein genes; furthermore, a spatial and temporal pattern could be observed whereby two separate clusters were formed that coincided with physical barriers.
Full genome sequence analysis of a novel adenovirus of rhesus macaque origin indicates a new simian adenovirus type and species.

PubMed

Malouli, Daniel; Howell, Grant L; Legasse, Alfred W; Kahl, Christoph; Axthelm, Michael K; Hansen, Scott G; Früh, Klaus

2014-09-01

Multiple novel simian adenoviruses have been isolated over the past years and their potential to cross the species barrier and infect the human population is an ever present threat. Here we describe the isolation and full genome sequencing of a novel simian adenovirus (SAdV) isolated from the urine of two independent, never co-housed, late stage simian immunodeficiency virus (SIV)-infected rhesus macaques. The viral genome sequences revealed a novel type with a unique genome length, GC content, E3 region and DNA polymerase amino acid sequence that is sufficiently distinct from all currently known human- or simian adenovirus species to warrant classifying these isolates as a novel species of simian adenovirus. This new species, termed Simian mastadenovirus D (SAdV-D), displays the standard genome organization for the genus Mastadenovirus containing only one copy of the fiber gene which sets it apart from the old world monkey adenovirus species HAdV-G, SAdV-B and SAdV-C.
Characterization of hairless (Hr) and FGF5 genes provides insights into the molecular basis of hair loss in cetaceans

PubMed Central

2013-01-01

Background Hair is one of the main distinguishing characteristics of mammals and it has many important biological functions. Cetaceans originated from terrestrial mammals and they have evolved a series of adaptations to aquatic environments, which are of evolutionary significance. However, the molecular mechanisms underlying their aquatic adaptations have not been well explored. This study provided insights into the evolution of hair loss during the transition from land to water by investigating and comparing two essential regulators of hair follicle development and hair follicle cycling, i.e., the Hairless (Hr) and FGF5 genes, in representative cetaceans and their terrestrial relatives. Results The full open reading frame sequences of the Hr and FGF5 genes were characterized in seven cetaceans. The sequence characteristics and evolutionary analyses suggested the functional loss of the Hr gene in cetaceans, which supports the loss of hair during their full adaptation to aquatic habitats. By contrast, positive selection for the FGF5 gene was found in cetaceans where a series of positively selected amino acid residues were identified. Conclusions This is the first study to investigate the molecular basis of the hair loss in cetaceans. Our investigation of Hr and FGF5, two indispensable regulators of the hair cycle, provide some new insights into the molecular basis of hair loss in cetaceans. The results suggest that positive selection for the FGF5 gene might have promoted the termination of hair growth and early entry into the catagen stage of hair follicle cycling. Consequently, the hair follicle cycle was disrupted and the hair was lost completely due to the loss of the Hr gene function in cetaceans. This suggests that cetaceans have evolved an effective and complex mechanism for hair loss. PMID:23394579
Sequencing of Pax6 Loci from the Elephant Shark Reveals a Family of Pax6 Genes in Vertebrate Genomes, Forged by Ancient Duplications and Divergences

PubMed Central

Gautier, Philippe; Loosli, Felix; Tay, Boon-Hui; Tay, Alice; Murdoch, Emma; Coutinho, Pedro; van Heyningen, Veronica; Brenner, Sydney; Venkatesh, Byrappa; Kleinjan, Dirk A.

2013-01-01

Pax6 is a developmental control gene essential for eye development throughout the animal kingdom. In addition, Pax6 plays key roles in other parts of the CNS, olfactory system, and pancreas. In mammals a single Pax6 gene encoding multiple isoforms delivers these pleiotropic functions. Here we provide evidence that the genomes of many other vertebrate species contain multiple Pax6 loci. We sequenced Pax6-containing BACs from the cartilaginous elephant shark (Callorhinchus milii) and found two distinct Pax6 loci. Pax6.1 is highly similar to mammalian Pax6, while Pax6.2 encodes a paired-less Pax6. Using synteny relationships, we identify homologs of this novel paired-less Pax6.2 gene in lizard and in frog, as well as in zebrafish and in other teleosts. In zebrafish two full-length Pax6 duplicates were known previously, originating from the fish-specific genome duplication (FSGD) and expressed in divergent patterns due to paralog-specific loss of cis-elements. We show that teleosts other than zebrafish also maintain duplicate full-length Pax6 loci, but differences in gene and regulatory domain structure suggest that these Pax6 paralogs originate from a more ancient duplication event and are hence renamed as Pax6.3. Sequence comparisons between mammalian and elephant shark Pax6.1 loci highlight the presence of short- and long-range conserved noncoding elements (CNEs). Functional analysis demonstrates the ancient role of long-range enhancers for Pax6 transcription. We show that the paired-less Pax6.2 ortholog in zebrafish is expressed specifically in the developing retina. Transgenic analysis of elephant shark and zebrafish Pax6.2 CNEs with homology to the mouse NRE/Pα internal promoter revealed highly specific retinal expression. Finally, morpholino depletion of zebrafish Pax6.2 resulted in a “small eye” phenotype, supporting a role in retinal development. In summary, our study reveals that the pleiotropic functions of Pax6 in vertebrates are served by a divergent family of Pax6 genes, forged by ancient duplication events and by independent, lineage-specific gene losses. PMID:23359656
Cloning, expression and activity analysis of a novel fibrinolytic serine protease from Arenicola cristata

NASA Astrophysics Data System (ADS)

Zhao, Chunling; Ju, Jiyu

2015-06-01

The full-length cDNA of a protease gene from a marine annelid Arenicola cristata was amplified through rapid amplification of cDNA ends technique and sequenced. The size of the cDNA was 936 bp in length, including an open reading frame encoding a polypeptide of 270 amino acid residues. The deduced amino acid sequnce consisted of pro- and mature sequences. The protease belonged to the serine protease family because it contained the highly conserved sequence GDSGGP. This protease was novel as it showed a low amino acid sequence similarity (< 40%) to other serine proteases. The gene encoding the active form of A. cristata serine protease was cloned and expressed in E. coli. Purified recombinant protease in a supernatant could dissolve an artificial fibrin plate with plasminogen-rich fibrin, whereas the plasminogen-free fibrin showed no clear zone caused by hydrolysis. This result suggested that the recombinant protease showed an indirect fibrinolytic activity of dissolving fibrin, and was probably a plasminogen activator. A rat model with venous thrombosis was established to demonstrate that the recombinant protease could also hydrolyze blood clot in vivo. Therefore, this recombinant protease may be used as a thrombolytic agent for thrombosis treatment. To our knowledge, this study is the first of reporting the fibrinolytic serine protease gene in A. cristata.
Biochemical and Molecular Phylogenetic Study of Agriculturally Useful Association of a Nitrogen-Fixing Cyanobacterium and Nodule Sinorhizobium with Medicago sativa L.

PubMed Central

Karaushu, E. V.; Kravzova, T. R.; Vorobey, N. A.; Kiriziy, D. A.; Olkhovich, O. P.; Taran, N. Yu.; Kots, S. Ya.; Omarova, E.

2015-01-01

Seed inoculation with bacterial consortium was found to increase legume yield, providing a higher growth than the standard nitrogen treatment methods. Alfalfa plants were inoculated by mono- and binary compositions of nitrogen-fixing microorganisms. Their physiological and biochemical properties were estimated. Inoculation by microbial consortium of Sinorhizobium meliloti T17 together with a new cyanobacterial isolate Nostoc PTV was more efficient than the single-rhizobium strain inoculation. This treatment provides an intensification of the processes of biological nitrogen fixation by rhizobia bacteria in the root nodules and an intensification of plant photosynthesis. Inoculation by bacterial consortium stimulates growth of plant mass and rhizogenesis and leads to increased productivity of alfalfa and to improving the amino acid composition of plant leaves. The full nucleotide sequence of the rRNA gene cluster and partial sequence of the dinitrogenase reductase (nifH) gene of Nostoc PTV were deposited to GenBank (JQ259185.1, JQ259186.1). Comparison of these gene sequences of Nostoc PTV with all sequences present at the GenBank shows that this cyanobacterial strain does not have 100% identity with any organisms investigated previously. Phylogenetic analysis showed that this cyanobacterium clustered with high credibility values with Nostoc muscorum. PMID:26114100
Biochemical and Molecular Phylogenetic Study of Agriculturally Useful Association of a Nitrogen-Fixing Cyanobacterium and Nodule Sinorhizobium with Medicago sativa L.

PubMed

Karaushu, E V; Lazebnaya, I V; Kravzova, T R; Vorobey, N A; Lazebny, O E; Kiriziy, D A; Olkhovich, O P; Taran, N Yu; Kots, S Ya; Popova, A A; Omarova, E; Koksharova, O A

2015-01-01

Seed inoculation with bacterial consortium was found to increase legume yield, providing a higher growth than the standard nitrogen treatment methods. Alfalfa plants were inoculated by mono- and binary compositions of nitrogen-fixing microorganisms. Their physiological and biochemical properties were estimated. Inoculation by microbial consortium of Sinorhizobium meliloti T17 together with a new cyanobacterial isolate Nostoc PTV was more efficient than the single-rhizobium strain inoculation. This treatment provides an intensification of the processes of biological nitrogen fixation by rhizobia bacteria in the root nodules and an intensification of plant photosynthesis. Inoculation by bacterial consortium stimulates growth of plant mass and rhizogenesis and leads to increased productivity of alfalfa and to improving the amino acid composition of plant leaves. The full nucleotide sequence of the rRNA gene cluster and partial sequence of the dinitrogenase reductase (nifH) gene of Nostoc PTV were deposited to GenBank (JQ259185.1, JQ259186.1). Comparison of these gene sequences of Nostoc PTV with all sequences present at the GenBank shows that this cyanobacterial strain does not have 100% identity with any organisms investigated previously. Phylogenetic analysis showed that this cyanobacterium clustered with high credibility values with Nostoc muscorum.
Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

PubMed

Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

2018-04-01

Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health as well as more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. Personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing do not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd.. All rights reserved.
Multi-species sequence comparison reveals conservation of ghrelin gene-derived splice variants encoding a truncated ghrelin peptide.

PubMed

Seim, Inge; Jeffery, Penny L; Thomas, Patrick B; Walpole, Carina M; Maugham, Michelle; Fung, Jenny N T; Yap, Pei-Yi; O'Keeffe, Angela J; Lai, John; Whiteside, Eliza J; Herington, Adrian C; Chopin, Lisa K

2016-06-01

The peptide hormone ghrelin is a potent orexigen produced predominantly in the stomach. It has a number of other biological actions, including roles in appetite stimulation, energy balance, the stimulation of growth hormone release and the regulation of cell proliferation. Recently, several ghrelin gene splice variants have been described. Here, we attempted to identify conserved alternative splicing of the ghrelin gene by cross-species sequence comparisons. We identified a novel human exon 2-deleted variant and provide preliminary evidence that this splice variant and in1-ghrelin encode a C-terminally truncated form of the ghrelin peptide, termed minighrelin. These variants are expressed in humans and mice, demonstrating conservation of alternative splicing spanning 90 million years. Minighrelin appears to have similar actions to full-length ghrelin, as treatment with exogenous minighrelin peptide stimulates appetite and feeding in mice. Forced expression of the exon 2-deleted preproghrelin variant mirrors the effect of the canonical preproghrelin, stimulating cell proliferation and migration in the PC3 prostate cancer cell line. This is the first study to characterise an exon 2-deleted preproghrelin variant and to demonstrate sequence conservation of ghrelin gene-derived splice variants that encode a truncated ghrelin peptide. This adds further impetus for studies into the alternative splicing of the ghrelin gene and the function of novel ghrelin peptides in vertebrates.
Chalcone synthase genes from milk thistle (Silybum marianum): isolation and expression analysis.

PubMed

Sanjari, Sepideh; Shobbar, Zahra Sadat; Ebrahimi, Mohsen; Hasanloo, Tahereh; Sadat-Noori, Seyed-Ahmad; Tirnaz, Soodeh

2015-12-01

Silymarin is a flavonoid compound derived from milk thistle (Silybum marianum) seeds which has several pharmacological applications. Chalcone synthase (CHS) is a key enzyme in the biosynthesis of flavonoids; thereby, the identification of CHS encoding genes in milk thistle plant can be of great importance. In the current research, fragments of CHS genes were amplified using degenerate primers based on the conserved parts of Asteraceae CHS genes, and then cloned and sequenced. Analysis of the resultant nucleotide and deduced amino acid sequences led to the identification of two different members of CHS gene family,SmCHS1 and SmCHS2. Third member, full-length cDNA (SmCHS3) was isolated by rapid amplification of cDNA ends (RACE), whose open reading frame contained 1239 bp including exon 1 (190 bp) and exon 2 (1049 bp), encoding 63 and 349 amino acids, respectively. In silico analysis of SmCHS3 sequence contains all the conserved CHS sites and shares high homology with CHS proteins from other plants.Real-time PCR analysis indicated that SmCHS1 and SmCHS3 had the highest transcript level in petals in the early flowering stage and in the stem of five upper leaves, followed by five upper leaves in the mid-flowering stage which are most probably involved in anthocyanin and silymarin biosynthesis.
Full-length genome sequences of porcine epidemic diarrhoea virus strain CV777; Use of NGS to analyse genomic and sub-genomic RNAs

PubMed Central

Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice; Grasland, Béatrice; Frossard, Jean-Pierre; Dastjerdi, Akbar; Hulst, Marcel; Hanke, Dennis; Pohlmann, Anne; Blome, Sandra; van der Poel, Wim H. M.; Steinbach, Falko; Blanchard, Yannick; Lavazza, Antonio; Bøtner, Anette

2018-01-01

Porcine epidemic diarrhoea virus, strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine the variability between different stocks of the PEDV strain CV777, sequencing of the full-length genome (ca. 28kb) has been performed in 6 different laboratories, using different protocols. Not surprisingly, each of the different full genome sequences were distinct from each other and from the reference sequence (Accession number AF353511) but they are >99% identical. Unique and shared differences between sequences were identified. The coding region for the surface-exposed spike protein showed the highest proportion of variability including both point mutations and small deletions. The predicted expression of the ORF3 gene product was more dramatically affected in three different variants of this virus through either loss of the initiation codon or gain of a premature termination codon. The genome of one isolate had a substantially rearranged 5´-terminal sequence. This rearrangement was validated through the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies. PMID:29494671

Analysis of prostate-specific antigen transcripts in chimpanzees, cynomolgus monkeys, baboons, and African green monkeys.

PubMed

Mubiru, James N; Yang, Alice S; Olsen, Christian; Nayak, Sudhir; Livi, Carolina B; Dick, Edward J; Owston, Michael; Garcia-Forey, Magdalena; Shade, Robert E; Rogers, Jeffrey

2014-01-01

The function of prostate-specific antigen (PSA) is to liquefy the semen coagulum so that the released sperm can fuse with the ovum. Fifteen spliced variants of the PSA gene have been reported in humans, but little is known about alternative splicing in nonhuman primates. Positive selection has been reported in sex- and reproductive-related genes from sea urchins to Drosophila to humans; however, there are few studies of adaptive evolution of the PSA gene. Here, using polymerase chain reaction (PCR) product cloning and sequencing, we study PSA transcript variant heterogeneity in the prostates of chimpanzees (Pan troglodytes), cynomolgus monkeys (Macaca fascicularis), baboons (Papio hamadryas anubis), and African green monkeys (Chlorocebus aethiops). Six PSA variants were identified in the chimpanzee prostate, but only two variants were found in cynomolgus monkeys, baboons, and African green monkeys. In the chimpanzee the full-length transcript is expressed at the same magnitude as the transcripts that retain intron 3. We have found previously unidentified splice variants of the PSA gene, some of which might be linked to disease conditions. Selection on the PSA gene was studied in 11 primate species by computational methods using the sequences reported here for African green monkey, cynomolgus monkey, baboon, and chimpanzee and other sequences available in public databases. A codon-based analysis (dN/dS) of the PSA gene identified potential adaptive evolution at five residue sites (Arg45, Lys70, Gln144, Pro189, and Thr203).
Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis

PubMed Central

dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães

2017-01-01

Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligossacarides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable elements (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages. PMID:28068432
Transcriptome Analysis of Leaves, Flowers and Fruits Perisperm of Coffea arabica L. Reveals the Differential Expression of Genes Involved in Raffinose Biosynthesis.

PubMed

Ivamoto, Suzana Tiemi; Reis, Osvaldo; Domingues, Douglas Silva; Dos Santos, Tiago Benedito; de Oliveira, Fernanda Freitas; Pot, David; Leroy, Thierry; Vieira, Luiz Gonzaga Esteves; Carazzolle, Marcelo Falsarella; Pereira, Gonçalo Amarante Guimarães; Pereira, Luiz Filipe Protasio

2017-01-01

Coffea arabica L. is an important crop in several developing countries. Despite its economic importance, minimal transcriptome data are available for fruit tissues, especially during fruit development where several compounds related to coffee quality are produced. To understand the molecular aspects related to coffee fruit and grain development, we report a large-scale transcriptome analysis of leaf, flower and perisperm fruit tissue development. Illumina sequencing yielded 41,881,572 high-quality filtered reads. De novo assembly generated 65,364 unigenes with an average length of 1,264 bp. A total of 24,548 unigenes were annotated as protein coding genes, including 12,560 full-length sequences. In the annotation process, we identified nine candidate genes related to the biosynthesis of raffinose family oligossacarides (RFOs). These sugars confer osmoprotection and are accumulated during initial fruit development. Four genes from this pathway had their transcriptional pattern validated by quantitative reverse transcription polymerase chain reaction (qRT-PCR). Furthermore, we identified ~24,000 putative target sites for microRNAs (miRNAs) and 134 putative transcriptionally active transposable elements (TE) sequences in our dataset. This C. arabica transcriptomic atlas provides an important step for identifying candidate genes related to several coffee metabolic pathways, especially those related to fruit chemical composition and therefore beverage quality. Our results are the starting point for enhancing our knowledge about the coffee genes that are transcribed during the flowering and initial fruit development stages.
Isolation of expressed sequences from the region commonly deleted in Velo-cardio-facial syndrome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sirotkin, H.; Morrow, B.; DasGupta, R.

Velo-cardio-facial syndrome (VCFS) is a relatively common autosomal dominant genetic disorder characterized by cleft palate, cardiac abnormalities, learning disabilities and a characteristic facial dysmorphology. Most VCFS patients have interstitial deletions of 22q11 of 1-2 mb. In an effort to isolate the gene(s) responsible for VCFS we have utilized a hybrid selection protocol to recover expressed sequences from three non-overlapping YACs comprising almost 1 mb of the commonly deleted region. Total yeast genomic DNA or isolated YAC DNA was immobilized on Hybond-N filters, blocked with yeast and human ribosomal and human repetitive sequences and hybridized with a mixture of random primedmore » short fragment cDNA libraries. Six human short fragment libraries derived from total fetus, fetal brain, adult brain, testes, thymus and spleen have been used for the selections. Short fragment cDNAs retained on the filter were passed through a second round of selection and cloned into lambda gt10. cDNAs shown to originate from the YACs and from chromosome 22 are being used to isolate full length cDNAs. Three genes known to be present on these YACs, catechol-O-methyltransferase, tuple 1 and clathrin heavy chain have been recovered. Additionally, a gene related to the murine p120 gene and a number of novel short cDNAs have been isolated. The role of these genes in VCFS is being investigated.« less
Indexed variation graphs for efficient and accurate resistome profiling.

PubMed

Rowe, Will P M; Winn, Martyn D

2018-05-14

Antimicrobial resistance remains a major threat to global health. Profiling the collective antimicrobial resistance genes within a metagenome (the "resistome") facilitates greater understanding of antimicrobial resistance gene diversity and dynamics. In turn, this can allow for gene surveillance, individualised treatment of bacterial infections and more sustainable use of antimicrobials. However, resistome profiling can be complicated by high similarity between reference genes, as well as the sheer volume of sequencing data and the complexity of analysis workflows. We have developed an efficient and accurate method for resistome profiling that addresses these complications and improves upon currently available tools. Our method combines a variation graph representation of gene sets with an LSH Forest indexing scheme to allow for fast classification of metagenomic sequence reads using similarity-search queries. Subsequent hierarchical local alignment of classified reads against graph traversals enables accurate reconstruction of full-length gene sequences using a scoring scheme. We provide our implementation, GROOT, and show it to be both faster and more accurate than a current reference-dependent tool for resistome profiling. GROOT runs on a laptop and can process a typical 2 gigabyte metagenome in 2 minutes using a single CPU. Our method is not restricted to resistome profiling and has the potential to improve current metagenomic workflows. GROOT is written in Go and is available at https://github.com/will-rowe/groot (MIT license). will.rowe@stfc.ac.uk. Supplementary data are available at Bioinformatics online.
[Cloning and functional characterization of phytoene desaturase in Andrographis paniculata].

PubMed

Shen, Qin-qin; Li, Li-xia; Zhan, Peng-lin; Wang, Qiang

2015-10-01

A full-length cDNA of phytoene desaturase (PDS) gene from Andrographis paniculata was obtained through RACE-PCR. The cDNA sequence consists of 2 224 bp with an intact ORF of 1 752 bp (GeneBank: KP982892), encoding a ploypeptide of 584 amino acids. Homology analysis showed that the deduced protein has extensive sequence similarities to PDS from other plants, and contains a conserved NAD ( H) -binding domain of plant dehydrase cofactor binding-domain in N-terminal. Phylogenetic analysis demonstrated that ApPDS was more related to PDS of Sesamum indicum and Pogostemon cablin. The semi-quantitative RT-PCR analysis revealed that ApPDS expressed in whole aboveground tissues with the highest expression in leaves. Virus induced gene silencing (VIGS) was performed to characterize the functional of ApPDS in planta. Significant photobleaching was not observed in infiltrated leaves, while the PDS gene has been down-regulated significantly at the yellowish area. To the best of our knowledge, this represents the first report of PDS gene cloning and functional characterization from A. paniculata, which lays the foundation for further investigation of new genes, especially that correlative to andrographolide biosynthetic pathway.
Cotesia vestalis parasitization suppresses expression of a Plutella xylostella thioredoxin

USDA-ARS?s Scientific Manuscript database

Thioredoxins (Trxs) are a family of small, highly conserved and ubiquitous proteins involved in protecting organisms against toxic reactive oxygen species (ROS). In this study, a typical thioredoxin gene, PxTrx, was isolated from Plutella xylostella. The full-length cDNA sequence is composed of 959 ...
A developmental transcriptome map for allotetraploid arachis hypogaea

USDA-ARS?s Scientific Manuscript database

The advent of the genome sequences of Arachis duranensis and Arachis ipaensis has ushered in a new era for peanut genomics. With the goal of producing a gene atlas for cultivated peanut (Arachis hypogaea), 22 different tissue types and ontogenies that represent the full development of peanut were s...
Genome-wide identification and expression profiling of the SnRK2 gene family in Malus prunifolia.

PubMed

Shao, Yun; Qin, Yuan; Zou, Yangjun; Ma, Fengwang

2014-11-15

Sucrose non-fermenting-1-related protein kinase 2 (SnRK2) constitutes a small plant-specific serine/threonine kinase family with essential roles in the abscisic acid (ABA) signal pathway and in responses to osmotic stress. Although a genome-wide analysis of this family has been conducted in some species, little is known about SnRK2 genes in apple (Malus domestica). We identified 14 putative sequences encoding 12 deduced SnRK2 proteins within the apple genome. Gene chromosomal location and synteny analysis of the apple SnRK2 genes indicated that tandem and segmental duplications have likely contributed to the expansion and evolution of these genes. All 12 full-length coding sequences were confirmed by cloning from Malus prunifolia. The gene structure and motif compositions of the apple SnRK2 genes were analyzed. Phylogenetic analysis showed that MpSnRK2s could be classified into four groups. Profiling of these genes presented differential patterns of expression in various tissues. Under stress conditions, transcript levels for some family members were up-regulated in the leaves in response to drought, salinity, or ABA treatments. This suggested their possible roles in plant response to abiotic stress. Our findings provide essential information about SnRK2 genes in apple and will contribute to further functional dissection of this gene family. Copyright © 2014 Elsevier B.V. All rights reserved.
Antennal transcriptome analysis and comparison of olfactory genes in two sympatric defoliators, Dendrolimus houi and Dendrolimus kikuchii (Lepidoptera: Lasiocampidae).

PubMed

Zhang, Sufang; Zhang, Zhen; Wang, Hongbin; Kong, Xiangbo

2014-09-01

The Yunnan pine and Simao pine caterpillar moths, Dendrolimus houi Lajonquière and Dendrolimus kikuchii Matsumura (Lepidoptera: Lasiocampidae), are two closely related and sympatric pests of coniferous forests in southwestern China, and olfactory communication systems of these two insects have received considerable attention because of their economic importance. However, there is little information on the molecular aspect of odor detection about these insects. Furthermore, although lepidopteran species have been widely used in studies of insect olfaction, few work made comparison between sister moths on the olfactory recognition mechanisms. In this study, next-generation sequencing of the antennal transcriptome of these two moths were performed to identify the major olfactory genes. After comparing the antennal transcriptome of these two moths, we found that they exhibit highly similar transcripts-associated GO terms. Chemosensory gene families were further analyzed in both species. We identified 23 putative odorant binding proteins (OBP), 17 chemosensory proteins (CSP), two sensory neuron membrane proteins (SNMP), 33 odorant receptors (OR), and 10 ionotropic receptors (IR) in D. houi; and 27 putative OBPs, 17 CSPs, two SNMPs, 33 ORs, and nine IRs in D. kikuchii. All these transcripts were full-length or almost full-length. The predicted protein sequences were compared with orthologs in other species of Lepidoptera and model insects, including Bombyx mori, Manduca sexta, Heliothis virescens, Danaus plexippus, Sesamia inferens, Cydia pomonella, and Drosophila melanogaster. The sequence homologies of the orthologous genes in D. houi and D. kikuchii are very high. Furthermore, the olfactory genes were classed according to their expression level, and the highly expressed genes are our target for further function investigation. Interestingly, many highly expressed genes are ortholog gene of D. houi and D. kikuchii. We also found that the Classic OBPs were further separated into three groups according to their motifs, which will help future functional researches. Surprisingly, no pheromone receptor was identified in the two Dendrolimus species, which may indicate a special pheromone identification mechanism in Dendrolimus. Our work allows for further functional studies of pheromones and host volatile recognition genes, and give novel candidate targets for pest management. Copyright © 2014 Elsevier Ltd. All rights reserved.
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

PubMed

Herold, Keith E; Rasooly, Avraham

2003-12-01

Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
Comparison of next generation sequencing technologies for transcriptome characterization

PubMed Central

2009-01-01

Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Chitin synthase in the filarial parasite, Brugia malayi.

PubMed

Harris, M T; Lai, K; Arnold, K; Martinez, H F; Specht, C A; Fuhrman, J A

2000-12-01

Fragments of putative chitin synthase (chs) genes from two filarial species (Brugia malayi and Dirofilaria immitis) were amplified by PCR using degenerate primers. The full genomic and cDNA sequences were obtained for the B. malayi chs gene (Bm-chs-1); the predicted amino acid sequence is highly similar, over a large region, to two CHS sequences of the nematode Caenorhabditis elegans and also to two insect CHS sequences. Bm-chs-1 is abundantly transcribed in B. malayi adult females, independent of their fertilization status, but is also expressed in males and microfilariae. Oocytes and early embryos contain large amounts of Bm-chs-1 transcript by in situ hybridization, but later stage embryos within the maternal uterus show little or no Bm-chs-1 transcript. No specific hybridization could be demonstrated in maternal somatic tissues. Polyclonal antibodies were raised against a peptide expressed from a recombinant cDNA fragment of Bm-chs-1; immunostaining detected CHS protein in oocytes and early to midstage embryos. These studies characterize a gene that is likely to be essential to oogenesis and embryonic development in a parasitic nematode. Because chitin synthesis and eggshell formation begin after fertilization, the presence of CHS protein in early oocytes suggests that the enzyme must be activated as a result of fertilization. These studies also demonstrate that chitin synthesis may not be restricted to eggshell formation in nematodes, as the Bm-chs-1 gene is transcribed in life cycle stages other than adult females.
Whole genome co-expression analysis of soybean cytochrome P450 genes identifies nodulation-specific P450 monooxygenases

PubMed Central

2010-01-01

Background Cytochrome P450 monooxygenases (P450s) catalyze oxidation of various substrates using oxygen and NAD(P)H. Plant P450s are involved in the biosynthesis of primary and secondary metabolites performing diverse biological functions. The recent availability of the soybean genome sequence allows us to identify and analyze soybean putative P450s at a genome scale. Co-expression analysis using an available soybean microarray and Illumina sequencing data provides clues for functional annotation of these enzymes. This approach is based on the assumption that genes that have similar expression patterns across a set of conditions may have a functional relationship. Results We have identified a total number of 332 full-length P450 genes and 378 pseudogenes from the soybean genome. From the full-length sequences, 195 genes belong to A-type, which could be further divided into 20 families. The remaining 137 genes belong to non-A type P450s and are classified into 28 families. A total of 178 probe sets were found to correspond to P450 genes on the Affymetrix soybean array. Out of these probe sets, 108 represented single genes. Using the 28 publicly available microarray libraries that contain organ-specific information, some tissue-specific P450s were identified. Similarly, stress responsive soybean P450s were retrieved from 99 microarray soybean libraries. We also utilized Illumina transcriptome sequencing technology to analyze the expressions of all 332 soybean P450 genes. This dataset contains total RNAs isolated from nodules, roots, root tips, leaves, flowers, green pods, apical meristem, mock-inoculated and Bradyrhizobium japonicum-infected root hair cells. The tissue-specific expression patterns of these P450 genes were analyzed and the expression of a representative set of genes were confirmed by qRT-PCR. We performed the co-expression analysis on many of the 108 P450 genes on the Affymetrix arrays. First we confirmed that CYP93C5 (an isoflavone synthase gene) is co-expressed with several genes encoding isoflavonoid-related metabolic enzymes. We then focused on nodulation-induced P450s and found that CYP728H1 was co-expressed with the genes involved in phenylpropanoid metabolism. Similarly, CYP736A34 was highly co-expressed with lipoxygenase, lectin and CYP83D1, all of which are involved in root and nodule development. Conclusions The genome scale analysis of P450s in soybean reveals many unique features of these important enzymes in this crop although the functions of most of them are largely unknown. Gene co-expression analysis proves to be a useful tool to infer the function of uncharacterized genes. Our work presented here could provide important leads toward functional genomics studies of soybean P450s and their regulatory network through the integration of reverse genetics, biochemistry, and metabolic profiling tools. The identification of nodule-specific P450s and their further exploitation may help us to better understand the intriguing process of soybean and rhizobium interaction. PMID:21062474
Positive selection on MHC class II DRB and DQB genes in the bank vole (Myodes glareolus).

PubMed

Scherman, Kristin; Råberg, Lars; Westerdahl, Helena

2014-05-01

The major histocompatibility complex (MHC) class IIB genes show considerable sequence similarity between loci. The MHC class II DQB and DRB genes are known to exhibit a high level of polymorphism, most likely maintained by parasite-mediated selection. Studies of the MHC in wild rodents have focused on DRB, whilst DQB has been given much less attention. Here, we characterised DQB genes in Swedish bank voles Myodes glareolus, using full-length transcripts. We then designed primers that specifically amplify exon 2 from DRB (202 bp) and DQB (205 bp) and investigated molecular signatures of natural selection on DRB and DQB alleles. The presence of two separate gene clusters was confirmed using BLASTN and phylogenetic analysis, where our seven transcripts clustered according to either DQB or DRB homologues. These gene clusters were again confirmed on exon 2 data from 454-amplicon sequencing. Our DRB primers amplify a similar number of alleles per individual as previously published DRB primers, though our reads are longer. Traditional d N/d S analyses of DRB sequences in the bank vole have not found a conclusive signal of positive selection. Using a more advanced substitution model (the Kumar method) we found positive selection in the peptide binding region (PBR) of both DRB and DQB genes. Maximum likelihood models of codon substitutions detected positively selected sites located in the PBR of both DQB and DRB. Interestingly, these analyses detected at least twice as many positively selected sites in DQB than DRB, suggesting that DQB has been under stronger positive selection than DRB over evolutionary time.
Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

PubMed Central

2010-01-01

Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
Identification and Characterization of 40 Isolated Rehmannia glutinosa MYB Family Genes and Their Expression Profiles in Response to Shading and Continuous Cropping

PubMed Central

Wang, Fengqing; Suo, Yanfei; Wei, He; Li, Mingjie; Xie, Caixia; Wang, Lina; Chen, Xinjian; Zhang, Zhongyi

2015-01-01

The v-myb avian myeloblastosis viral oncogene homolog (MYB) superfamily constitutes one of the most abundant groups of transcription factors (TFs) described in plants. To date, little is known about the MYB genes in Rehmannia glutinosa. Forty unique MYB genes with full-length cDNA sequences were isolated. These 40 genes were grouped into five categories, one R1R2R3-MYB, four TRFL MYBs, four SMH MYBs, 25 R2R3-MYBs, and six MYB-related members. The MYB DNA-binding domain (DBD) sequence composition was conserved among proteins of the same subgroup. As expected, most of the closely related members in the phylogenetic tree exhibited common motifs. Additionally, the gene structure and motifs of the R. glutinosa MYB genes were analyzed. MYB gene expression was analyzed in the leaf and the tuberous root under two abiotic stress conditions. Expression profiles showed that most R. glutinosa MYB genes were expressed in the leaf and the tuberous root, suggesting that MYB genes are involved in various physiological and developmental processes in R. glutinosa. Seven MYB genes were up-regulated in response to shading in at least one tissue. Two MYB genes showed increased expression and 13 MYB genes showed decreased expression in the tuberous root under continuous cropping. This investigation is the first comprehensive study of the MYB gene family in R. glutinosa. PMID:26147429
Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels

PubMed Central

Jasim, Anfal A.; Al-Bustan, Suzanne A.; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

2018-01-01

Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3′ UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism. PMID:29686695
Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels.

PubMed

Jasim, Anfal A; Al-Bustan, Suzanne A; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

2018-01-01

Common variants of Apolipoprotein A5 ( APOA 5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3' UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism.
Isoform Sequencing Provides a More Comprehensive View of the Panax ginseng Transcriptome.

PubMed

Jo, Ick-Hyun; Lee, Jinsu; Hong, Chi Eun; Lee, Dong Jin; Bae, Wonsil; Park, Sin-Gi; Ahn, Yong Ju; Kim, Young Chang; Kim, Jang Uk; Lee, Jung Woo; Hyun, Dong Yun; Rhee, Sung-Keun; Hong, Chang Pyo; Bang, Kyong Hwan; Ryu, Hojin

2017-09-15

Korean ginseng ( Panax ginseng C.A. Meyer) has been widely used for medicinal purposes and contains potent plant secondary metabolites, including ginsenosides. To obtain transcriptomic data that offers a more comprehensive view of functional genomics in P. ginseng , we generated genome-wide transcriptome data from four different P. ginseng tissues using PacBio isoform sequencing (Iso-Seq) technology. A total of 135,317 assembled transcripts were generated with an average length of 3.2 kb and high assembly completeness. Of those unigenes, 67.5% were predicted to be complete full-length (FL) open reading frames (ORFs) and exhibited a high gene annotation rate. Furthermore, we successfully identified unique full-length genes involved in triterpenoid saponin synthesis and plant hormonal signaling pathways, including auxin and cytokinin. Studies on the functional genomics of P. ginseng seedlings have confirmed the rapid upregulation of negative feed-back loops by auxin and cytokinin signaling cues. The conserved evolutionary mechanisms in the auxin and cytokinin canonical signaling pathways of P. ginseng are more complex than those in Arabidopsis thaliana . Our analysis also revealed a more detailed view of transcriptome-wide alternative isoforms for 88 genes. Finally, transposable elements (TEs) were also identified, suggesting transcriptional activity of TEs in P. ginseng . In conclusion, our results suggest that long-read, full-length or partial-unigene data with high-quality assemblies are invaluable resources as transcriptomic references in P. ginseng and can be used for comparative analyses in closely related medicinal plants.

Near full-length 16S rRNA gene next-generation sequencing revealed Asaia as a common midgut bacterium of wild and domesticated Queensland fruit fly larvae.

PubMed

Deutscher, Ania T; Burke, Catherine M; Darling, Aaron E; Riegler, Markus; Reynolds, Olivia L; Chapman, Toni A

2018-05-05

Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts that laboratory-adaptation (domestication) and mass-rearing have on the tephritid larval gut microbiome. Read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome studies, and the diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets). Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae, but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected within adult tephritid species, was detected in larvae of both wild and domesticated populations (55 out of 56 larval gut samples). Larvae from the same single peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that the wild flies from different locations had different Asaia strains. Variation in the gut bacterial communities of B. tryoni larvae depends on diet, domestication, and horizontal acquisition. Bacterial variation in wild larvae suggests that more than one bacterial species can perform the same functional role; however, Asaia could be an important gut bacterium in larvae and warrants further study. A greater understanding of the functions of the bacteria detected in larvae could lead to increased fly quality and performance as part of the SIT.
Analysis of the complete genome of subgroup A' hepatitis B virus isolates from South Africa.

PubMed

Kramvis, Anna; Weitzmann, Louise; Owiredu, William K B A; Kew, Michael C

2002-04-01

A phylogenetic analysis is presented of six complete and seven pre-S1/S2/S gene sequences of hepatitis B virus (HBV) isolates from South Africa. Five of the full-length sequences and all of the pre-S2/S sequences have been previously reported. Four of the six complete genomes and three of the five incomplete sequences clustered with subgroup A', a unique segment of genotype A of HBV previously identified in 60% of South African isolates using analysis of the pre-S2/S region alone. This separation was also evident when the polymerase open reading frame was analysed, but not on analysis of either the X or pre-core/core genes. Amino acids were identified in the pre-S1 and polymerase regions specific to subgroup A'. In common with genotype D, 10 of 11 genotype A South African isolates had an 11 amino acid deletion in the amino end of the pre-S1 region. This deletion is also found in hepadnaviruses from non-human primates.
Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology

PubMed Central

Udy, Dylan B.; Voorhies, Mark; Chan, Patricia P.; Lowe, Todd M.; Dumont, Sophie

2015-01-01

The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes—and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics. PMID:26252667
SELMAP - SELEX affinity landscape MAPping of transcription factor binding sites using integrated microfluidics

PubMed Central

Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron

2016-01-01

Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
Draft De Novo Transcriptome of the Rat Kangaroo Potorous tridactylus as a Tool for Cell Biology.

PubMed

Udy, Dylan B; Voorhies, Mark; Chan, Patricia P; Lowe, Todd M; Dumont, Sophie

2015-01-01

The rat kangaroo (long-nosed potoroo, Potorous tridactylus) is a marsupial native to Australia. Cultured rat kangaroo kidney epithelial cells (PtK) are commonly used to study cell biological processes. These mammalian cells are large, adherent, and flat, and contain large and few chromosomes-and are thus ideal for imaging intra-cellular dynamics such as those of mitosis. Despite this, neither the rat kangaroo genome nor transcriptome have been sequenced, creating a challenge for probing the molecular basis of these cellular dynamics. Here, we present the sequencing, assembly and annotation of the draft rat kangaroo de novo transcriptome. We sequenced 679 million reads that mapped to 347,323 Trinity transcripts and 20,079 Unigenes. We present statistics emerging from transcriptome-wide analyses, and analyses suggesting that the transcriptome covers full-length sequences of most genes, many with multiple isoforms. We also validate our findings with a proof-of-concept gene knockdown experiment. We expect that this high quality transcriptome will make rat kangaroo cells a more tractable system for linking molecular-scale function and cellular-scale dynamics.
Conservation and divergence of ADAM family proteins in the Xenopus genome

PubMed Central

2010-01-01

Background Members of the disintegrin metalloproteinase (ADAM) family play important roles in cellular and developmental processes through their functions as proteases and/or binding partners for other proteins. The amphibian Xenopus has long been used as a model for early vertebrate development, but genome-wide analyses for large gene families were not possible until the recent completion of the X. tropicalis genome sequence and the availability of large scale expression sequence tag (EST) databases. In this study we carried out a systematic analysis of the X. tropicalis genome and uncovered several interesting features of ADAM genes in this species. Results Based on the X. tropicalis genome sequence and EST databases, we identified Xenopus orthologues of mammalian ADAMs and obtained full-length cDNA clones for these genes. The deduced protein sequences, synteny and exon-intron boundaries are conserved between most human and X. tropicalis orthologues. The alternative splicing patterns of certain Xenopus ADAM genes, such as adams 22 and 28, are similar to those of their mammalian orthologues. However, we were unable to identify an orthologue for ADAM7 or 8. The Xenopus orthologue of ADAM15, an active metalloproteinase in mammals, does not contain the conserved zinc-binding motif and is hence considered proteolytically inactive. We also found evidence for gain of ADAM genes in Xenopus as compared to other species. There is a homologue of ADAM10 in Xenopus that is missing in most mammals. Furthermore, a single scaffold of X. tropicalis genome contains four genes encoding ADAM28 homologues, suggesting genome duplication in this region. Conclusions Our genome-wide analysis of ADAM genes in X. tropicalis revealed both conservation and evolutionary divergence of these genes in this amphibian species. On the one hand, all ADAMs implicated in normal development and health in other species are conserved in X. tropicalis. On the other hand, some ADAM genes and ADAM protease activities are absent, while other novel ADAM proteins in this species are predicted by this study. The conservation and unique divergence of ADAM genes in Xenopus probably reflect the particular selective pressures these amphibian species faced during evolution. PMID:20630080
Cloning and developmental expression of pea ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit N-methyltransferase

DOEpatents

Houtz, Robert L.

1998-01-01

The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) .epsilon.N-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the .epsilon.-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Cloning and developmental expression of pea ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit epsilon N-methyltransferase

DOEpatents

Houtz, Robert L.

1999-01-01

The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) .sup..epsilon. N-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the .epsilon.-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed.
Cloning and developmental expression of pea ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit N-methyltransferase

DOEpatents

Houtz, R.L.

1998-03-03

The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS) {epsilon}N-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the {epsilon}-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 5 figs.
Cloning and developmental expression of pea ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit epsilon N-methyltransferase

DOEpatents

Houtz, R.L.

1999-02-02

The gene sequence for ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) large subunit (LS){sup {epsilon}}N-methyltransferase (protein methylase III or Rubisco LSMT) is disclosed. This enzyme catalyzes methylation of the {epsilon}-amine of lysine-14 in the large subunit of Rubisco. In addition, a full-length cDNA clone for Rubisco LSMT is disclosed. Transgenic plants and methods of producing same which (1) have the Rubisco LSMT gene inserted into the DNA, and (2) have the Rubisco LSMT gene product or the action of the gene product deleted from the DNA are also provided. Further, methods of using the gene to selectively deliver desired agents to a plant are also disclosed. 8 figs.
[Identification of genes that are specifically/preferentially expressed in developing cotton fibers by mRNA fluorescence differential display (FDD)].

PubMed

Sun, Jie; Li, Yuan-Li; Wang, Ruo-Hai; Xia, Gui-Xian

2004-01-01

Fluorescence differential display (FDD) technique was used to identify genes that are specifically or preferentially expressed in different developmental stages of cotton fiber cells. One hundred and nine differentially displayed cDNA fragments were isolated using 9, 21 and 27 DPA (days postanthesis) fibers as experimental materials. By a combination of two rounds of reverse Northern hybridization and Northern blot analyses, a number of such cDNA fragments were proved to represent fiber-specific/preferential genes. Sequencing determination and database searching indicated that most of these genes are novel. This work is an important step towards cloning the full-length cDNAs and characterizing the cellular functions of aforementioned genes in fiber development.
Characterization of two chitin synthase genes of the red flour beetle, Tribolium castaneum, and alternate exon usage in one of the genes during development.

PubMed

Arakane, Yasuyuki; Hogenkamp, David G; Zhu, Yu Cheng; Kramer, Karl J; Specht, Charles A; Beeman, Richard W; Kanost, Michael R; Muthukrishnan, Subbaratnam

2004-03-01

Two chitin synthase (CHS) genes of the red flour beetle, Tribolium castaneum, were sequenced and their transcription patterns during development examined. By screening a BAC library of genomic DNA from T. castaneum (Tc) with a DNA probe encoding the catalytic domain of a putative Tribolium CHS, several clones that contained CHS genes were identified. Two distinct PCR products were amplified from these BAC clones and confirmed to be highly similar to CHS genes from other insects, nematodes and fungi. The DNA sequences of these genes, TcCHS1 and TcCHS2, were determined by amplification of overlapping PCR fragments from two of the BAC DNAs and mapped to different linkage groups. Each ORF was identified and full-length cDNAs were also amplified, cloned and sequenced. TcCHS1 and TcCHS2 encode transmembrane proteins of 1558 and 1464 amino acids, respectively. The TcCHS1 gene was found to use alternate exons, each encoding 59 amino acids, a feature not found in the TcCHS2 gene. During development, Tribolium expressed TcCHS1 predominantly in the embryonic and pupal stages, whereas TcCHS2 was prevalent in the late larval and adult stages. The alternate exon 8a of TcCHS1 was utilized over a much broader range of development than exon 8b. We propose that the two isoforms of the TcCHS1 enzyme are used predominantly for the formation of chitin in embryonic and pupal cuticles, whereas TcCHS2 is utilized primarily for the synthesis of peritrophic membrane-associated chitin in the midgut.
Molecular and functional characterization of a Taenia adhesion gene family (TAF) encoding potential protective antigens of Taenia saginata oncospheres.

PubMed

Gonzalez, Luis Miguel; Bonay, Pedro; Benitez, Laura; Ferrer, Elizabeth; Harrison, Leslie J S; Parkhouse, R Michael E; Garate, Teresa

2007-02-01

Two clones from an activated Taenia saginata oncosphere cDNA library, Ts45W and Ts45S, were isolated and sequenced. Both of these genes belong to the Taenia ovis 45W gene family. The Ts45W and Ts45S cDNAs are 997- and 1,004-bp-long, each corresponding to 255 amino acids and with theoretical molecular masses of 27.8 and 27.7 kDa, respectively. Southern blot profiles obtained with Ts45W cDNA as a probe suggest that these two genes are members of a multigene family with tandem organization. The full genomic sequence was determined for the Ts45W gene and a new family member, the Ts45W/2 gene. The genomic sequences of the T. saginata Ts45W and Ts45W/2 genes were at least 2.2 kb in length with four exons separated by three introns. Exons 1 and 4 coded for hydrophobic domains, while, importantly, exons 2 and 3 coded for fibronectin homologous domains. These domains are presumably responsible for the demonstrated cell adhesion and, perhaps, the protective nature of this family of molecules and the acronym TAF (Taenia adhesion family) is proposed for this group of genes. We hypothesize that these TAF proteins and another T. saginata-protective antigen, HP6, have evolved the dual functions of facilitating tissue invasion and stimulating protective immunity to first ensure primary infection and subsequently to establish a concomitant protective immunity to protect the host from death or debilitation through superinfection by subsequent infections and thus help ensure parasite survival.
The suppression of tomato defence response genes upon potato cyst nematode infection indicates a key regulatory role of miRNAs.

PubMed

Święcicka, Magdalena; Skowron, Waldemar; Cieszyński, Piotr; Dąbrowska-Bronk, Joanna; Matuszkiewicz, Mateusz; Filipecki, Marcin; Koter, Marek Daniel

2017-04-01

Potato cyst nematode Globodera rostochiensis is an obligate parasite of solanaceous plants, triggering metabolic and morphological changes in roots which may result in substantial crop yield losses. Previously, we used the cDNA-AFLP to study the transcriptional dynamics in nematode infected tomato roots. Now, we present the rescreening of already published, upregulated transcript-derived fragment dataset using the most current tomato transcriptome sequences. Our reanalysis allowed to add 54 novel genes to 135, already found as upregulated in tomato roots upon G. rostochiensis infection (in total - 189). We also created completely new catalogue of downregulated sequences leading to the discovery of 76 novel genes. Functional classification of candidates showed that the 'wound, stress and defence response' category was enriched in the downregulated genes. We confirmed the transcriptional dynamics of six genes by qRT-PCR. To place our results in a broader context, we compared the tomato data with Arabidopsis thaliana, revealing similar proportions of upregulated and downregulated genes as well as similar enrichment of defence related transcripts in the downregulated group. Since transcript suppression is quite common in plant-nematode interactions, we assessed the possibility of miRNA-mediated inverse correlation on several tomato sequences belonging to NB-LRR and receptor-like kinase families. The qRT-PCR of miRNAs and putative target transcripts showed an opposite expression pattern in 9 cases. These results together with in silico analyses of potential miRNA targeting to the full repertoire of tomato R-genes show that miRNA mediated gene suppression may be a key regulatory mechanism during nematode parasitism. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Sequence and Role in Virulence of the Three Plasmid Complement of the Model Tumor-Inducing Bacterium Pseudomonas savastanoi pv. savastanoi NCPPB 3335

PubMed Central

Bardaji, Leire; Pérez-Martínez, Isabel; Rodríguez-Moreno, Luis; Rodríguez-Palenzuela, Pablo; Sundin, George W.; Ramos, Cayo; Murillo, Jesús

2011-01-01

Pseudomonas savastanoi pv. savastanoi NCPPB 3335 is a model for the study of the molecular basis of disease production and tumor formation in woody hosts, and its draft genome sequence has been recently obtained. Here we closed the sequence of the plasmid complement of this strain, composed of three circular molecules of 78,357 nt (pPsv48A), 45,220 nt (pPsv48B), and 42,103 nt (pPsv48C), all belonging to the pPT23A-like family of plasmids widely distributed in the P. syringae complex. A total of 152 coding sequences were predicted in the plasmid complement, of which 38 are hypothetical proteins and seven correspond to putative virulence genes. Plasmid pPsv48A contains an incomplete Type IVB secretion system, the type III secretion system (T3SS) effector gene hopAF1, gene ptz, involved in cytokinin biosynthesis, and three copies of a gene highly conserved in plant-associated proteobacteria, which is preceded by a hrp box motif. A complete Type IVA secretion system, a well conserved origin of transfer (oriT), and a homolog of the T3SS effector gene hopAO1 are present in pPsv48B, while pPsv48C contains a gene with significant homology to isopentenyl-diphosphate delta-isomerase, type 1. Several potential mobile elements were found on the three plasmids, including three types of MITE, a derivative of IS801, and a new transposon effector, ISPsy30. Although the replication regions of these three plasmids are phylogenetically closely related, their structure is diverse, suggesting that the plasmid architecture results from an active exchange of sequences. Artificial inoculations of olive plants with mutants cured of plasmids pPsv48A and pPsv48B showed that pPsv48A is necessary for full virulence and for the development of mature xylem vessels within the knots; we were unable to obtain mutants cured of pPsv48C, which contains five putative toxin-antitoxin genes. PMID:22022435
Candidate Genes Expressed in Tolerant Common Wheat With Resistant to English Grain Aphid.

PubMed

Luo, Kun; Zhang, Gaisheng; Wang, Chunping; Ouellet, Thérèse; Wu, Jingjing; Zhu, Qidi; Zhao, Huiyan

2014-10-01

The English grain aphid, Sitobion avenae (F.) (Hemiptera: Aphididae), is a common worldwide pest of wheat (Triticum aestivum L.). The use of improved resistant cultivars by the farmers is the most effective and environmentally friendly method to control this aphid in the field. The winter wheat genotypes 98-10-35 and Amigo are resistant to S. avenae. To identify genes responsible for resistance to S. avenae in these genotypes, differential-display reverse transcription-polymerase chain reaction was used to identify the corresponding differentially expressed sequences in current study. Two backcross progenies were obtained by crossing the two resistant genotypes with the susceptible genotype 1376. Six potential expected-differential bands were sequenced. Lengths of the expressed sequence tags ranged from 128 to 532 bp. Although these expressed sequences were likely associated with S. avenae resistance, there was one expressed sequence tag located on 7DL chromosome, and its potential function may associate with the ability to maintain photosynthesis in wheat. That serves as an active way for tolerant common wheat with resistant to S. avenae. Cloning the full length of these sequences would help us thoroughly understand the mechanism of wheat resistance to S. avenae and be valuable for breeding cultivars with S. avenae resistance. © 2014 Entomological Society of America.
GeneSilico protein structure prediction meta-server.

PubMed

Kurowski, Michal A; Bujnicki, Janusz M

2003-07-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server

PubMed Central

Kurowski, Michal A.; Bujnicki, Janusz M.

2003-01-01

Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
Bacterial diversity at different stages of the composting process

PubMed Central

2010-01-01

Background Composting is an aerobic microbiological process that is facilitated by bacteria and fungi. Composting is also a method to produce fertilizer or soil conditioner. Tightened EU legislation now requires treatment of the continuously growing quantities of organic municipal waste before final disposal. However, some full-scale composting plants experience difficulties with the efficiency of biowaste degradation and with the emission of noxious odours. In this study we examine the bacterial species richness and community structure of an optimally working pilot-scale compost plant, as well as a full-scale composting plant experiencing typical problems. Bacterial species composition was determined by isolating total DNA followed by amplifying and sequencing the gene encoding the 16S ribosomal RNA. Results Over 1500 almost full-length 16S rRNA gene sequences were analysed and of these, over 500 were present only as singletons. Most of the sequences observed in either one or both of the composting processes studied here were similar to the bacterial species reported earlier in composts, including bacteria from the phyla Actinobacteria, Bacteroidetes, Firmicutes, Proteobacteria and Deinococcus-Thermus. In addition, a number of previously undetected bacterial phylotypes were observed. Statistical calculations estimated a total bacterial diversity of over 2000 different phylotypes in the studied composts. Conclusions Interestingly, locally enriched or evolved bacterial variants of familiar compost species were observed in both composts. A detailed comparison of the bacterial diversity revealed a large difference in composts at the species and strain level from the different composting plants. However, at the genus level, the difference was much smaller and illustrated a delay of the composting process in the full-scale, sub-optimally performing plants. PMID:20350306
Single molecule sequencing to track plasmid diversity of hospital-associated carbapenemase-producing Enterobacteriaceae

PubMed Central

Conlan, Sean; Thomas, Pamela J.; Deming, Clayton; Park, Morgan; Lau, Anna F.; Dekker, John P.; Snitkin, Evan S.; Clark, Tyson A.; Luong, Khai; Song, Yi; Tsai, Yu-Chih; Boitano, Matthew; Gupta, Jyoti; Brooks, Shelise Y.; Schmidt, Brian; Young, Alice C.; Thomas, James W.; Bouffard, Gerard G.; Blakesley, Robert W.; Mullikin, James C.; Korlach, Jonas; Henderson, David K.; Frank, Karen M.; Palmore, Tara N.; Segre, Julia A.

2014-01-01

Public health officials have raised concerns that plasmid transfer between Enterobacteriaceae species may spread resistance to carbapenems, an antibiotic class of last resort, thereby rendering common healthcare-associated infections nearly impossible to treat. We performed comprehensive surveillance and genomic sequencing to identify carbapenem-resistant Enterobacteriaceae in the NIH Clinical Center patient population and hospital environment in order to to articulate the diversity of carbapenemase-encoding plasmids and survey the mobility of and assess the mobility of these plasmids between bacterial species. We isolated a repertoire of carbapenemase-encoding Enterobacteriaceae, including multiple strains of Klebsiella pneumoniae, Klebsiella oxytoca, Escherichia coli, Enterobacter cloacae, Citrobacter freundii, and Pantoea species. Long-read genome sequencing with full end-to-end assembly revealed that these organisms carry the carbapenem-resistance genes on a wide array of plasmids. Klebsiella pneumoniae and Enterobacter cloacae isolated simultaneously from a single patient harbored two different carbapenemase-encoding plasmids, overriding the epidemiological scenario of plasmid transfer between organisms within this patient. We did, however, find evidence supporting horizontal transfer of carbapenemase-encoding plasmids between Klebsiella pneumoniae, Enterobacter cloacae and Citrobacter freundii in the hospital environment. Our comprehensive sequence data, with full plasmid identification, challenges assumptions about horizontal gene transfer events within patients and identified wider possible connections between patients and the hospital environment. In addition, we identified a new carbapenemase-encoding plasmid of potentially high clinical impact carried by Klebsiella pneumoniae, Escherichia coli, Enterobacter cloacae and Pantoea species, from unrelated patients and the hospital environment. PMID:25232178

Cloning a Chymotrypsin-Like 1 (CTRL-1) Protease cDNA from the Jellyfish Nemopilema nomurai

PubMed Central

Heo, Yunwi; Kwon, Young Chul; Bae, Seong Kyeong; Hwang, Duhyeon; Yang, Hye Ryeon; Choudhary, Indu; Lee, Hyunkyoung; Yum, Seungshic; Shin, Kyoungsoon; Yoon, Won Duk; Kang, Changkeun; Kim, Euikyung

2016-01-01

An enzyme in a nematocyst extract of the Nemopilema nomurai jellyfish, caught off the coast of the Republic of Korea, catalyzed the cleavage of chymotrypsin substrate in an amidolytic kinetic assay, and this activity was inhibited by the serine protease inhibitor, phenylmethanesulfonyl fluoride. We isolated the full-length cDNA sequence of this enzyme, which contains 850 nucleotides, with an open reading frame of 801 encoding 266 amino acids. A blast analysis of the deduced amino acid sequence showed 41% identity with human chymotrypsin-like (CTRL) and the CTRL-1 precursor. Therefore, we designated this enzyme N. nomurai CTRL-1. The primary structure of N. nomurai CTRL-1 includes a leader peptide and a highly conserved catalytic triad of His69, Asp117, and Ser216. The disulfide bonds of chymotrypsin and the substrate-binding sites are highly conserved compared with the CTRLs of other species, including mammalian species. Nemopilema nomurai CTRL-1 is evolutionarily more closely related to Actinopterygii than to Scyphozoan (Aurelia aurita) or Hydrozoan (Hydra vulgaris). The N. nomurai CTRL1 was amplified from the genomic DNA with PCR using specific primers designed based on the full-length cDNA, and then sequenced. The N. nomurai CTRL1 gene contains 2434 nucleotides and four distinct exons. The 5′ donor splice (GT) and 3′ acceptor splice sequences (AG) are wholly conserved. This is the first report of the CTRL1 gene and cDNA structures in the jellyfish N. nomurai. PMID:27399771
Cloning a Chymotrypsin-Like 1 (CTRL-1) Protease cDNA from the Jellyfish Nemopilema nomurai.

PubMed

Heo, Yunwi; Kwon, Young Chul; Bae, Seong Kyeong; Hwang, Duhyeon; Yang, Hye Ryeon; Choudhary, Indu; Lee, Hyunkyoung; Yum, Seungshic; Shin, Kyoungsoon; Yoon, Won Duk; Kang, Changkeun; Kim, Euikyung

2016-07-05

An enzyme in a nematocyst extract of the Nemopilema nomurai jellyfish, caught off the coast of the Republic of Korea, catalyzed the cleavage of chymotrypsin substrate in an amidolytic kinetic assay, and this activity was inhibited by the serine protease inhibitor, phenylmethanesulfonyl fluoride. We isolated the full-length cDNA sequence of this enzyme, which contains 850 nucleotides, with an open reading frame of 801 encoding 266 amino acids. A blast analysis of the deduced amino acid sequence showed 41% identity with human chymotrypsin-like (CTRL) and the CTRL-1 precursor. Therefore, we designated this enzyme N. nomurai CTRL-1. The primary structure of N. nomurai CTRL-1 includes a leader peptide and a highly conserved catalytic triad of His(69), Asp(117), and Ser(216). The disulfide bonds of chymotrypsin and the substrate-binding sites are highly conserved compared with the CTRLs of other species, including mammalian species. Nemopilema nomurai CTRL-1 is evolutionarily more closely related to Actinopterygii than to Scyphozoan (Aurelia aurita) or Hydrozoan (Hydra vulgaris). The N. nomurai CTRL1 was amplified from the genomic DNA with PCR using specific primers designed based on the full-length cDNA, and then sequenced. The N. nomurai CTRL1 gene contains 2434 nucleotides and four distinct exons. The 5' donor splice (GT) and 3' acceptor splice sequences (AG) are wholly conserved. This is the first report of the CTRL1 gene and cDNA structures in the jellyfish N. nomurai.
Identification of candidate MLO powdery mildew susceptibility genes in cultivated Solanaceae and functional characterization of tobacco NtMLO1.

PubMed

Appiano, Michela; Pavan, Stefano; Catalano, Domenico; Zheng, Zheng; Bracuto, Valentina; Lotti, Concetta; Visser, Richard G F; Ricciardi, Luigi; Bai, Yuling

2015-10-01

Specific homologs of the plant Mildew Locus O (MLO) gene family act as susceptibility factors towards the powdery mildew (PM) fungal disease, causing significant economic losses in agricultural settings. Thus, in order to obtain PM resistant phenotypes, a general breeding strategy has been proposed, based on the selective inactivation of MLO susceptibility genes across cultivated species. In this study, PCR-based methodologies were used in order to isolate MLO genes from cultivated solanaceous crops that are hosts for PM fungi, namely eggplant, potato and tobacco, which were named SmMLO1, StMLO1 and NtMLO1, respectively. Based on phylogenetic analysis and sequence alignment, these genes were predicted to be orthologs of tomato SlMLO1 and pepper CaMLO2, previously shown to be required for PM pathogenesis. Full-length sequence of the tobacco homolog NtMLO1 was used for a heterologous transgenic complementation assay, resulting in its characterization as a PM susceptibility gene. The same assay showed that a single nucleotide change in a mutated NtMLO1 allele leads to complete gene loss-of-function. Results here presented, also including a complete overview of the tobacco and potato MLO gene families, are valuable to study MLO gene evolution in Solanaceae and for molecular breeding approaches aimed at introducing PM resistance using strategies of reverse genetics.
Identification and characterization of a novel serine-threonine kinase gene from the Xp22 region.

PubMed

Montini, E; Andolfi, G; Caruso, A; Buchner, G; Walpole, S M; Mariani, M; Consalez, G; Trump, D; Ballabio, A; Franco, B

1998-08-01

Eukaryotic protein kinases are part of a large and expanding family of proteins. Through our transcriptional mapping effort in the Xp22 region, we have isolated and sequenced the full-length transcript of STK9, a novel cDNA highly homologous to serine-threonine kinases. A number of human genetic disorders have been mapped to the region where STK9 has been localized including Nance-Horan (NH) syndrome, oral-facial-digital syndrome type 1 (OFD1), and a novel locus for nonsyndromic sensorineural deafness (DFN6). To evaluate the possible involvement of STK9 in any of the above-mentioned disorders, a 2416-bp full-length cDNA was assembled. The entire genomic structure of the gene, which is composed of 20 coding exons, was determined. Northern analysis revealed a transcript larger than 9.5 kb in several tissues including brain, lung, and kidney. The mouse homologue (Stk9) was identified and mapped in the mouse in the region syntenic to human Xp. This location is compatible with the location of the Xcat mutant, which shows congenital cataracts very similar to those observed in NH patients. Sequence homologies, expression pattern, and mapping information in both human and mouse make STK9 a candidate gene for the above-mentioned disorders. Copyright 1998 Academic Press.
Identification of tissue-specific, abiotic stress-responsive gene expression patterns in wine grape (Vitis vinifera L.) based on curation and mining of large-scale EST data sets

PubMed Central

2011-01-01

Background Abiotic stresses, such as water deficit and soil salinity, result in changes in physiology, nutrient use, and vegetative growth in vines, and ultimately, yield and flavor in berries of wine grape, Vitis vinifera L. Large-scale expressed sequence tags (ESTs) were generated, curated, and analyzed to identify major genetic determinants responsible for stress-adaptive responses. Although roots serve as the first site of perception and/or injury for many types of abiotic stress, EST sequencing in root tissues of wine grape exposed to abiotic stresses has been extremely limited to date. To overcome this limitation, large-scale EST sequencing was conducted from root tissues exposed to multiple abiotic stresses. Results A total of 62,236 expressed sequence tags (ESTs) were generated from leaf, berry, and root tissues from vines subjected to abiotic stresses and compared with 32,286 ESTs sequenced from 20 public cDNA libraries. Curation to correct annotation errors, clustering and assembly of the berry and leaf ESTs with currently available V. vinifera full-length transcripts and ESTs yielded a total of 13,278 unique sequences, with 2302 singletons and 10,976 mapped to V. vinifera gene models. Of these, 739 transcripts were found to have significant differential expression in stressed leaves and berries including 250 genes not described previously as being abiotic stress responsive. In a second analysis of 16,452 ESTs from a normalized root cDNA library derived from roots exposed to multiple, short-term, abiotic stresses, 135 genes with root-enriched expression patterns were identified on the basis of their relative EST abundance in roots relative to other tissues. Conclusions The large-scale analysis of relative EST frequency counts among a diverse collection of 23 different cDNA libraries from leaf, berry, and root tissues of wine grape exposed to a variety of abiotic stress conditions revealed distinct, tissue-specific expression patterns, previously unrecognized stress-induced genes, and many novel genes with root-enriched mRNA expression for improving our understanding of root biology and manipulation of rootstock traits in wine grape. mRNA abundance estimates based on EST library-enriched expression patterns showed only modest correlations between microarray and quantitative, real-time reverse transcription-polymerase chain reaction (qRT-PCR) methods highlighting the need for deep-sequencing expression profiling methods. PMID:21592389
Transcriptome analysis of Bupleurum chinense focusing on genes involved in the biosynthesis of saikosaponins

PubMed Central

2011-01-01

Abstract Background Bupleurum chinense DC. is a widely used traditional Chinese medicinal plant. Saikosaponins are the major bioactive constituents of B. chinense, but relatively little is known about saikosaponin biosynthesis. The 454 pyrosequencing technology provides a promising opportunity for finding novel genes that participate in plant metabolism. Consequently, this technology may help to identify the candidate genes involved in the saikosaponin biosynthetic pathway. Results One-quarter of the 454 pyrosequencing runs produced a total of 195, 088 high-quality reads, with an average read length of 356 bases (NCBI SRA accession SRA039388). A de novo assembly generated 24, 037 unique sequences (22, 748 contigs and 1, 289 singletons), 12, 649 (52.6%) of which were annotated against three public protein databases using a basic local alignment search tool (E-value ≤1e-10). All unique sequences were compared with NCBI expressed sequence tags (ESTs) (237) and encoding sequences (44) from the Bupleurum genus, and with a Sanger-sequenced EST dataset (3, 111). The 23, 173 (96.4%) unique sequences obtained in the present study represent novel Bupleurum genes. The ESTs of genes related to saikosaponin biosynthesis were found to encode known enzymes that catalyze the formation of the saikosaponin backbone; 246 cytochrome P450 (P450s) and 102 glycosyltransferases (GTs) unique sequences were also found in the 454 dataset. Full length cDNAs of 7 P450s and 7 uridine diphosphate GTs (UGTs) were verified by reverse transcriptase polymerase chain reaction or by cloning using 5' and/or 3' rapid amplification of cDNA ends. Two P450s and three UGTs were identified as the most likely candidates involved in saikosaponin biosynthesis. This finding was based on the coordinate up-regulation of their expression with β-AS in methyl jasmonate-treated adventitious roots and on their similar expression patterns with β-AS in various B. chinense tissues. Conclusions A collection of high-quality ESTs for B. chinense obtained by 454 pyrosequencing is provided here for the first time. These data should aid further research on the functional genomics of B. chinense and other Bupleurum species. The candidate genes for enzymes involved in saikosaponin biosynthesis, especially the P450s and UGTs, that were revealed provide a substantial foundation for follow-up research on the metabolism and regulation of the saikosaponins. PMID:22047182
Molecular cloning and sequence analysis of full-length growth hormone cDNAs from six important economic fishes.

PubMed

Zhang, Jing-Nan; Song, Ping; Hu, Jia-Rui; Mo, Sai-Jun; Peng, Mao-Yu; Zhou, Wei; Zou, Ji-Xing; Hu, Yin-Chang

2005-01-01

In this study,the full-length cDNAs of GH (Growth Hormone) gene was isolated from six important economic fishes, Siniperca kneri, Epinephelus coioides, Monopterus albus, Silurus asotus, Misgurnus anguillicaudatus and Carassius auratus gibelio Bloch. It is the first time to clone these GH sequences except E. coioides GH. The lengths of the above cDNAs are as follows: 953 bp, 1 023 bp, 825 bp, 1 082 bp, 1 154 bp and 1 180 bp. Each sequence includes an ORF of about 600 bp which encodes a protein of about 200 amino acid: S. kneri, E. coioides and M. albus GHs of 204 amino acid, S. asotus GH of 200 amino acid, M. anguillicaudatus and C. auratus gibelio GHs of 210 amino acid. Then detailed sequence analysis of the six GHs with many other fish sequences was performed. The six sequences all showed high homology to other sequences, especially to sequences within the same order, and many conserved residues were identified, most localized in five domains. The phylogenetic trees (MP and NJ) of many fish GH ORF sequences (including the new six) with Amia calva as outgroup were generally resolved and largely congruent with the morphology-based tree though some incongruities were observed, suggesting GH ORF should be paid more attention to in teleostean phylogeny.
Molecular cloning, characterization and expression analysis of TLR9, MyD88 and TRAF6 genes in common carp (Cyprinus carpio).

PubMed

Kongchum, Pawapol; Hallerman, Eric M; Hulata, Gideon; David, Lior; Palti, Yniv

2011-01-01

Induction of innate immune pathways is critical for early host defense, but there is limited understanding of how teleost fishes recognize pathogen molecules and activate these pathways. In mammals, cells of the innate immune system detect pathogenic molecular structures using pattern recognition receptors (PRRs). TLR9 functions as a PRR that recognizes CpG motifs in bacterial and viral DNA and requires adaptor molecules MyD88 and TRAF6 for signal transduction. Here we report full-length cDNA isolation, structural characterization and tissue mRNA expression analysis of the common carp (cc) TLR9, MyD88 and TRAF6 gene orthologs. The ccTLR9 open-reading frame (ORF) is predicted to encode a 1064-amino acid (aa) protein. We found that MyD88 and TRAF6 genes are duplicated in common carp. This is the first report of TRAF6 duplication in a vertebrate genome and stronger evidence in support of MyD88 duplication is provided. The ccMyD88a and b ORFs are predicted to encode 288-aa and 284-aa peptides, respectively. They share 91% aa sequence identity between paralogs. The ccTRAF6a and b ORFs are both predicted to encode 543-aa peptides sharing 95% aa sequence identity between paralogs. The ccTLR9 gene is contained in a single large exon. The ccMyD88a and ccMyD88b coding sequences span five exons. The TRAF6b gene spans six exons. PCR amplification to obtain the entire coding sequence of ccTRAF6a gene was not successful. The 2104-bp fragment amplified covers the 3' end of the gene and it contains a partial sequence of one exon and three complete exons. The predicated protein domains of the ccTLR9, ccMyD88 and ccTRAF6 are conserved and resemble orthologs from other vertebrates. Real-time quantitative PCR assays of the ccTLR9, MyD88a and b, and TRAF6a and b gene transcripts in healthy common carp indicated that mRNA expression varied between tissues. Differential expression of duplicate copies were found for ccMyD88 and ccTRAF6 in white and red muscle tissues, suggesting that paralogs may have evolved and attained a new function. The genomic information we describe in this paper provides evidence of sequence and structural conservation of immune response genes in common carp. Published by Elsevier Ltd.
The Genome and Development-Dependent Transcriptomes of Pyronema confluens: A Window into Fungal Evolution

PubMed Central

Traeger, Stefanie; Altegoer, Florian; Freitag, Michael; Gabaldon, Toni; Kempken, Frank; Kumar, Abhishek; Marcet-Houben, Marina; Pöggeler, Stefanie; Stajich, Jason E.; Nowrousian, Minou

2013-01-01

Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ∼13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g. the genomic environment of the mating type genes that is conserved in higher filamentous ascomycetes, but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44 that is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion mutant, showing functional conservation of this developmental regulator. PMID:24068976
Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library.

PubMed

Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen

2018-01-01

Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes, however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper ( Capsicum annuum ) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F 1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F 1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex highly repetitive heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea)

PubMed Central

Parton, Angela; Bayne, Christopher J.; Barnes, David W.

2010-01-01

Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories “envelope” and “oxidoreductase activity” but the SAE transcripts did not. GO analysis of SAE transcripts identified the category “anatomical structure formation” that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. PMID:20471924
Analysis and functional annotation of expressed sequence tags from in vitro cell lines of elasmobranchs: Spiny dogfish shark (Squalus acanthias) and little skate (Leucoraja erinacea).

PubMed

Parton, Angela; Bayne, Christopher J; Barnes, David W

2010-09-01

Elasmobranchs are the most commonly used experimental models among the jawed, cartilaginous fish (Chondrichthyes). Previously we developed cell lines from embryos of two elasmobranchs, Squalus acanthias the spiny dogfish shark (SAE line), and Leucoraja erinacea the little skate (LEE-1 line). From these lines cDNA libraries were derived and expressed sequence tags (ESTs) generated. From the SAE cell line 4303 unique transcripts were identified, with 1848 of these representing unknown sequences (showing no BLASTX identification). From the LEE-1 cell line, 3660 unique transcripts were identified, and unknown, unique sequences totaled 1333. Gene Ontology (GO) annotation showed that GO assignments for the two cell lines were in general similar. These results suggest that the procedures used to derive the cell lines led to isolation of cell types of the same general embryonic origin from both species. The LEE-1 transcripts included GO categories "envelope" and "oxidoreductase activity" but the SAE transcripts did not. GO analysis of SAE transcripts identified the category "anatomical structure formation" that was not present in LEE-1 cells. Increased organelle compartments may exist within LEE-1 cells compared to SAE cells, and the higher oxidoreductase activity in LEE-1 cells may indicate a role for these cells in responses associated with innate immunity or in steroidogenesis. These EST libraries from elasmobranch cell lines provide information for assembly of genomic sequences and are useful in revealing gene diversity, new genes and molecular markers, as well as in providing means for elucidation of full-length cDNAs and probes for gene array analyses. This is the first study of this type with members of the Chondrichthyes. Copyright 2010 Elsevier Inc. All rights reserved.
Novel phytochrome sequences in Arabidopsis thaliana: Structure, evolution, and differential expression of a plant regulatory photoreceptor family

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharrock, R.A.; Quail, P.H.

1989-01-01

Phytochrome is a plant regulatory photoreceptor that mediates red light effects on a wide variety of physiological and molecular responses. DNA blot analysis indicates that the Arabidopsis thaliana genome contains four to five phytochrome-related gene sequences. The authors have isolated and sequenced cDNA clones corresponding to three of these genes and have deduced the amino acid sequence of the full-length polypeptide encoded in each case. One of these proteins (phyA) shows 65-80% amino acid sequence identity with the major, etiolated-tissue phytochrome apoproteins described previously in other plant species. The other two polypeptides (phyB and phyC) are unique in that theymore » have low sequence identity with each other, with phyA, and with all previously described phytochromes. The phyA, phyB, and phyC proteins are of similar molecular mass, have related hydropathic profiles, and contain a conserved chromophore attachment region. However, the sequence comparison data indicate that the three phy genes diverged early in plant evolution, well before the divergence of the two major groups of angiosperms, the monocots and dicots. The steady-state level of the phyA transcript is high in dark-grown A. thaliana seedlings and is down-regulated by light. In contrast, the phyB and phyC transcripts are present at lower levels and are not strongly light-regulated. These findings indicate that the red/far red light-responsive phytochrome photoreceptor system in A. thaliana, and perhaps in all higher plants, consists of a family of chromoproteins that are heterogeneous in structure and regulation.« less
Markers and mapping revisited: finding your gene.

PubMed

Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

2009-01-01

This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.
The spectrum and clinical impact of epigenetic modifier mutations in myeloma

PubMed Central

Pawlyn, Charlotte; Kaiser, Martin F; Heuck, Christoph; Melchor, Lorenzo; Wardell, Christopher P; Murison, Alex; Chavan, Shweta; Johnson, David C; Begum, Dil; Dahir, Nasrin; Proszek, Paula; Cairns, David A; Boyle, Eileen M; Jones, John R; Cook, Gordon; Drayson, Mark T; Owen, Roger G; Gregory, Walter M; Jackson, Graham H; Barlogie, Bart; Davies, Faith E; Walker, Brian A; Morgan, Gareth J

2016-01-01

Purpose Epigenetic dysregulation is known to be an important contributor to myeloma pathogenesis but, unlike in other B cell malignancies, the full spectrum of somatic mutations in epigenetic modifiers has not been previously reported. We sought to address this using results from whole-exome sequencing in the context of a large prospective clinical trial of newly diagnosed patients and targeted sequencing in a cohort of previously treated patients for comparison. Experimental Design Whole-exome sequencing analysis of 463 presenting myeloma cases entered in the UK NCRI Myeloma XI study and targeted sequencing analysis of 156 previously treated cases from the University of Arkansas for Medical Sciences. We correlated the presence of mutations with clinical outcome from diagnosis and compared the mutations found at diagnosis with later stages of disease. Results In diagnostic myeloma patient samples we identify significant mutations in genes encoding the histone 1 linker protein, previously identified in other B-cell malignancies. Our data suggest an adverse prognostic impact from the presence of lesions in genes encoding DNA methylation modifiers and the histone demethylase KDM6A/UTX. The frequency of mutations in epigenetic modifiers appears to increase following treatment most notably in genes encoding histone methyltransferases and DNA methylation modifiers. Conclusions Numerous mutations identified raise the possibility of targeted treatment strategies for patients either at diagnosis or relapse supporting the use of sequencing-based diagnostics in myeloma to help guide therapy as more epigenetic targeted agents become available. PMID:27235425
High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

PubMed Central

2010-01-01

Background Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. Results A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. Conclusions We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism. PMID:20937131
The Xylella fastidosa RTX operons: evidence for the evolution of protein mosaics through novel genetic exchanges.

PubMed

Gambetta, Gregory A; Matthews, Mark A; Syvanen, Michael

2018-05-04

Xylella fastidiosa (Xf) is a gram negative bacterium inhabiting the plant vascular system. In most species this bacterium lives as a benign symbiote, but in several agriculturally important plants (e.g. coffee, citrus, grapevine) Xf is pathogenic. Xf has four loci encoding homologues to hemolysin RTX proteins, virulence factors involved in a wide range of plant pathogen interactions. We show that all four genes are expressed during pathogenesis in grapevine. The sequences from these four genes have a complex repetitive structure. At the C-termini, sequence diversity between strains is what would be expected from orthologous genes. However, within strains there is no N-terminal homology, indicating these loci encode RTXs of different functions and/or specificities. More striking is that many of the orthologous loci between strains share this extreme variation at the N-termini. Thus these RTX orthologues are most easily visualized as fusions between the orthologous C-termini and different N-termini. Further, the four genes are found in operons having a peculiar structure with an extensively duplicated module encoding a small protein with homology to the N-terminal region of the full length RTX. Surprisingly, some of these small peptides are most similar not to their corresponding full length RTX, but to the N-termini of RTXs from other Xf strains, and even other remotely related species. These results demonstrate that these genes are expressed in planta during pathogenesis. Their structure suggests extensive evolutionary restructuring through horizontal gene transfers and heterologous recombination mechanisms. The sum of the evidence suggests these repetitive modules are a novel kind of mobile genetic element.
De Novo Assembly and Comparative Transcriptome Analyses of Red and Green Morphs of Sweet Basil Grown in Full Sunlight.

PubMed

Torre, Sara; Tattini, Massimiliano; Brunetti, Cecilia; Guidi, Lucia; Gori, Antonella; Marzano, Cristina; Landi, Marco; Sebastiani, Federico

2016-01-01

Sweet basil (Ocimum basilicum), one of the most popular cultivated herbs worldwide, displays a number of varieties differing in several characteristics, such as the color of the leaves. The development of a reference transcriptome for sweet basil, and the analysis of differentially expressed genes in acyanic and cyanic cultivars exposed to natural sunlight irradiance, has interest from horticultural and biological point of views. There is still great uncertainty about the significance of anthocyanins in photoprotection, and how green and red morphs may perform when exposed to photo-inhibitory light, a condition plants face on daily and seasonal basis. We sequenced the leaf transcriptome of the green-leaved Tigullio (TIG) and the purple-leaved Red Rubin (RR) exposed to full sunlight over a four-week experimental period. We assembled and annotated 111,007 transcripts. A total of 5,468 and 5,969 potential SSRs were identified in TIG and RR, respectively, out of which 66 were polymorphic in silico. Comparative analysis of the two transcriptomes showed 2,372 differentially expressed genes (DEGs) clustered in 222 enriched Gene ontology terms. Green and red basil mostly differed for transcripts abundance of genes involved in secondary metabolism. While the biosynthesis of waxes was up-regulated in red basil, the biosynthesis of flavonols and carotenoids was up-regulated in green basil. Data from our study provides a comprehensive transcriptome survey, gene sequence resources and microsatellites that can be used for further investigations in sweet basil. The analysis of DEGs and their functional classification also offers new insights on the functional role of anthocyanins in photoprotection.
Approach to determine the diversity of Legionella species by nested PCR-DGGE in aquatic environments.

PubMed

Huang, Wen-Chien; Tsai, Hsin-Chi; Tao, Chi-Wei; Chen, Jung-Sheng; Shih, Yi-Jia; Kao, Po-Min; Huang, Tung-Yi; Hsu, Bing-Mu

2017-01-01

In this study, we describe a nested PCR-DGGE strategy to detect Legionella communities from river water samples. The nearly full-length 16S rRNA gene was amplified using bacterial primer in the first step. After, the amplicons were employed as DNA templates in the second PCR using Legionella specific primer. The third round of gene amplification was conducted to gain PCR fragments apposite for DGGE analysis. Then the total numbers of amplified genes were observed in DGGE bands of products gained with primers specific for the diversity of Legionella species. The DGGE patterns are thus potential for a high-throughput preliminary determination of aquatic environmental Legionella species before sequencing. Comparative DNA sequence analysis of excised DGGE unique band patterns showed the identity of the Legionella community members, including a reference profile with two pathogenic species of Legionella strains. In addition, only members of Legionella pneumophila and uncultured Legionella sp. were detected. Development of three step nested PCR-DGGE tactic is seen as a useful method for studying the diversity of Legionella community. The method is rapid and provided sequence information for phylogenetic analysis.
Approach to determine the diversity of Legionella species by nested PCR-DGGE in aquatic environments

PubMed Central

Huang, Wen-Chien; Tsai, Hsin-Chi; Tao, Chi-Wei; Chen, Jung-Sheng; Shih, Yi-Jia; Kao, Po-Min; Huang, Tung-Yi; Hsu, Bing-Mu

2017-01-01

In this study, we describe a nested PCR-DGGE strategy to detect Legionella communities from river water samples. The nearly full-length 16S rRNA gene was amplified using bacterial primer in the first step. After, the amplicons were employed as DNA templates in the second PCR using Legionella specific primer. The third round of gene amplification was conducted to gain PCR fragments apposite for DGGE analysis. Then the total numbers of amplified genes were observed in DGGE bands of products gained with primers specific for the diversity of Legionella species. The DGGE patterns are thus potential for a high-throughput preliminary determination of aquatic environmental Legionella species before sequencing. Comparative DNA sequence analysis of excised DGGE unique band patterns showed the identity of the Legionella community members, including a reference profile with two pathogenic species of Legionella strains. In addition, only members of Legionella pneumophila and uncultured Legionella sp. were detected. Development of three step nested PCR-DGGE tactic is seen as a useful method for studying the diversity of Legionella community. The method is rapid and provided sequence information for phylogenetic analysis. PMID:28166249

The chaperonin-60 universal target is a barcode for bacteria that enables de novo assembly of metagenomic sequence data.

PubMed

Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E

2012-01-01

Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found from cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
Characterization of a gene family abundantly expressed in Oenothera organensis pollen that shows sequence similarity to polygalacturonase.

PubMed Central

Brown, S M; Crouch, M L

1990-01-01

We have isolated and characterized cDNA clones of a gene family (P2) expressed in Oenothera organensis pollen. This family contains approximately six to eight family members and is expressed at high levels only in pollen. The predicted protein sequence from a near full-length cDNA clone shows that the protein products of these genes are at least 38,000 daltons. We identified the protein encoded by one of the cDNAs in this family by using antibodies to beta-galactosidase/pollen cDNA fusion proteins. Immunoblot analysis using these antibodies identifies a family of proteins of approximately 40 kilodaltons that is present in mature pollen, indicating that these mRNAs are not stored solely for translation after pollen germination. These proteins accumulate late in pollen development and are not detectable in other parts of the plant. Although not present in unpollinated or self-pollinated styles, the 40-kilodalton to 45-kilodalton antigens are detectable in extracts from cross-pollinated styles, suggesting that the proteins are present in pollen tubes growing through the style during pollination. The proteins are also present in pollen tubes growing in vitro. Both nucleotide and amino acid sequences are similar to the published sequences for cDNAs encoding the enzyme polygalacturonase, which suggests that the P2 gene family may function in depolymerizing pectin during pollen development, germination, and tube growth. Cross-hybridizing RNAs and immunoreactive proteins were detected in pollen from a wide variety of plant species, which indicates that the P2 family of polygalacturonase-like genes are conserved and may be expressed in the pollen from many angiosperms. PMID:2152116
[cDNA library construction from panicle meristem of finger millet].

PubMed

Radchuk, V; Pirko, Ia V; Isaenkov, S V; Emets, A I; Blium, Ia B

2014-01-01

The protocol for production of full-size cDNA using SuperScript Full-Length cDNA Library Construction Kit II (Invitrogen) was tested and high quality cDNA library from meristematic tissue of finger millet panicle (Eleusine coracana (L.) Gaertn) was created. The titer of obtained cDNA library comprised 3.01 x 10(5) CFU/ml in avarage. In average the length of cDNA insertion consisted about 1070 base pairs, the effectivity of cDNA fragment insertions--99.5%. The selective sequencing of cDNA clones from created library was performed. The sequences of cDNA clones were identified with usage of BLAST-search. The results of cDNA library analysis and selective sequencing represents prove good functionality and full length character of inserted cDNA clones. Obtained cDNA library from meristematic tissue of finger millet panicle represents good and valuable source for isolation and identification of key genes regulating metabolism and meristematic development and for mining of new molecular markers to conduct out high quality genetic investigations and molecular breeding as well.
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
Novel and recently evolved miRNA clusters regulate expansive F-box gene networks through phasiRNAs in wild diploid strawberry

USDA-ARS?s Scientific Manuscript database

The wild strawberry, Fragaria vesca, has recently emerged as an excellent model for investigating flower and fruit traits in economically important fruit crops. Its history of physiological studies combined with sequenced genome and a full complement of molecular genetic tools facilitate investigat...
Ammonia-oxidizing bacteria dominate ammonia oxidation in a full-scale wastewater treatment plant revealed by DNA-based stable isotope probing.

PubMed

Pan, Kai-Ling; Gao, Jing-Feng; Li, Hong-Yu; Fan, Xiao-Yan; Li, Ding-Chang; Jiang, Hao

2018-05-01

A full-scale wastewater treatment plant (WWTP) with three separate treatment processes was selected to investigate the effects of seasonality and treatment process on the community structures of ammonia-oxidizing archaea (AOA) and bacteria (AOB). And then DNA-based stable isotope probing (DNA-SIP) was applied to explore the active ammonia oxidizers. The results of high-throughput sequencing indicated that treatment processes varied AOB communities rather than AOA communities. AOA slightly outnumbered AOB in most of the samples, whose abundance was significantly correlated with temperature. DNA-SIP results showed that the majority of AOB amoA gene was labeled by 13 C-substrate, while just a small amount of AOA amoA gene was labeled. As revealed by high-throughput sequencing of heavy DNA, Nitrosomonadaceae-like AOB, Nitrosomonas sp. NP1, Nitrosomonas oligotropha and Nitrosomonas marina were the active AOB, and Nitrososphaera viennensis dominated the active AOA. The results indicated that AOB, not AOA, dominated active ammonia oxidation in the test WWTP. Copyright © 2018 Elsevier Ltd. All rights reserved.
Isolation of Persicaria minor sesquiterpene synthase promoter and its deletions for transgenic Arabidopsis thaliana

NASA Astrophysics Data System (ADS)

Omar, Aimi Farehah; Ismail, Ismanizan

2016-11-01

Sesquiterpene synthase (SS) catalyzes the formation of sesquiterpenes from farnesyl diphosphate (FDP) via carbocation intermediates. In this study, the promoter region of sesquiterpene synthase was isolated from Persicaria minor to identify possible cis-acting elements in the promoter. The full-length PmSS promoter of P. minor is 1824-bp sequences. The sequence was analyzed and several putative cis-acting regulatory elements were identified. Three cis-acting regulatory elements were selected for deletion analysis which are cis-acting element involved in wound responsiveness (WUN), cis - acting element involved in defense and stress responsiveness (TC) and cis-acting element involved in ABA responsiveness (ABRE). Series of deletions were conducted to assess the promoter activity producing three truncated fragments promoter; Prom 2 1606-bp, Prom 3 1144- bp, and Prom 4 921-bp. The full-length promoter and its deletion series were cloned into the pBGWFS7 vector which contain β-glucuronidase (GUS) gene and green fluorescent protein (GFP) as the reporter gene. All constructs were successfully transformed into Arabidopsis thaliana based on PCR of positive BASTA resistance plants.
Characterisation of single domain ATP-binding cassette protien homologues of Theileria parva.

PubMed

Kibe, M K; Macklin, M; Gobright, E; Bishop, R; Urakawa, T; ole-MoiYoi, O K

2001-09-01

Two distinct genes encoding single domain, ATP-binding cassette transport protein homologues of Theileria parva were cloned and sequenced. Neither of the genes is tandemly duplicated. One gene, TpABC1, encodes a predicted protein of 593 amino acids with an N-terminal hydrophobic domain containing six potential membrane-spanning segments. A single discontinuous ATP-binding element was located in the C-terminal region of TpABC1. The second gene, TpABC2, also contains a single C-terminal ATP-binding motif. Copies of TpABC2 were present at four loci in the T. parva genome on three different chromosomes. TpABC1 exhibited allelic polymorphism between stocks of the parasite. Comparison of cDNA and genomic sequences revealed that TpABC1 contained seven short introns, between 29 and 84 bp in length. The full-length TpABC1 protein was expressed in insect cells using the baculovirus system. Application of antibodies raised against the recombinant antigen to western blots of T. parva piroplasm lysates detected an 85 kDa protein in this life-cycle stage.
Correlation of ophthalmic examination with carrier status in females potentially harboring a severe Norrie disease gene mutation.

PubMed

Khan, Arif O; Aldahmesh, Mohammed A; Meyer, Brian

2008-04-01

To correlate ophthalmic findings with carrier status for a severe Norrie disease (ND) gene mutation (C95F). Prospective interventional case series. Six potential carriers and 1 obligate carrier from a family harboring the mutation. An ophthalmologist blind to the pedigree performed a full ophthalmic examination for the 7 asymptomatic family members. A peripheral blood sample was collected from each for ND gene sequencing. Ophthalmic examination findings (with attention to the presence or absence of retinal findings) and results of ND gene sequencing. Three carriers were identified by molecular genetics, and all 3 of them had peripheral retinal abnormality. However, 3 of the 4 genetically identified noncarriers also exhibited peripheral retinal abnormality. Two of these noncarriers with retinal findings were the offspring of a confirmed noncarrier. The genetically identified noncarrier with a normal peripheral retinal examination was the daughter of an obligate carrier. The presence of peripheral retinal changes was not useful for carrier prediction in a family harboring ND. There are likely additional loci responsible for phenotypic expression.
Noncontiguous finished genome sequence and description of Intestinimonas massiliensis sp. nov strain GD2T , the second Intestinimonas species cultured from the human gut.

PubMed

Afouda, Pamela; Durand, Guillaume A; Lagier, Jean-Christophe; Labas, Noémie; Cadoret, Fréderic; Armstrong, Nicholas; Raoult, Didier; Dubourg, Grégory

2018-04-14

Intestinimonas massiliensis sp. nov strain GD2 T is a new species of the genus Intestinimonas (the second, following Intestinimonas butyriciproducens gen. nov., sp. nov). First isolated from the gut microbiota of a healthy subject of French origin using a culturomics approach combined with taxono-genomics, it is strictly anaerobic, nonspore-forming, rod-shaped, with catalase- and oxidase-negative reactions. Its growth was observed after preincubation in an anaerobic blood culture enriched with sheep blood (5%) and rumen fluid (5%), incubated at 37°C. Its phenotypic and genotypic descriptions are presented in this paper with a full annotation of its genome sequence. This genome consists of 3,104,261 bp in length and contains 3,074 predicted genes, including 3,012 protein-coding genes and 62 RNA-coding genes. Strain GD2 T significantly produces butyrate and is frequently found among available 16S rRNA gene amplicon datasets, which leads consideration of Intestinimonas massiliensis as an important human gut commensal. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
Evidence for Phex haploinsufficiency in murine X-linked hypophosphatemia.

PubMed

Wang, L; Du, L; Ecarot, B

1999-04-01

Mutations in the PHEX gene (phosphate-regulating gene with homology to endopeptidases on the X-chromosome) are responsible for X-linked hypophosphatemia (HYP). We previously reported the full-length coding sequence of murine Phex cDNA and provided evidence of Phex expression in bone and tooth. Here, we report the cloning of the entire 3.5-kb 3'UTR of the Phex gene, yielding a total of 6248 bp for the Phex transcript. Southern blot and RT-PCR analyses revealed that the 3' end of the coding sequence and the 3'UTR of the Phex gene, spanning exons 16 to 22, are deleted in Hyp, the mouse model for HYP. Northern blot analysis of bone revealed lack of expression of stable Phex mRNA from the mutant allele and expression of Phex transcripts from the wild-type allele in Hyp heterozygous females. Expression of the Phex protein in heterozygotes was confirmed by Western analysis with antibodies raised against a COOH-terminal peptide of the mouse Phex protein. Taken together, these results indicate that the dominant pattern of Hyp inheritance in mice is due to Phex haploinsufficiency.
Experimental annotation of the human genome using microarray technology.

PubMed

Shoemaker, D D; Schadt, E E; Armour, C D; He, Y D; Garrett-Engele, P; McDonagh, P D; Loerch, P M; Leonardson, A; Lum, P Y; Cavet, G; Wu, L F; Altschuler, S J; Edwards, S; King, J; Tsang, J S; Schimmack, G; Schelter, J M; Koch, J; Ziman, M; Marton, M J; Li, B; Cundiff, P; Ward, T; Castle, J; Krolewski, M; Meyer, M R; Mao, M; Burchard, J; Kidd, M J; Dai, H; Phillips, J W; Linsley, P S; Stoughton, R; Scherer, S; Boguski, M S

2001-02-15

The most important product of the sequencing of a genome is a complete, accurate catalogue of genes and their products, primarily messenger RNA transcripts and their cognate proteins. Such a catalogue cannot be constructed by computational annotation alone; it requires experimental validation on a genome scale. Using 'exon' and 'tiling' arrays fabricated by ink-jet oligonucleotide synthesis, we devised an experimental approach to validate and refine computational gene predictions and define full-length transcripts on the basis of co-regulated expression of their exons. These methods can provide more accurate gene numbers and allow the detection of mRNA splice variants and identification of the tissue- and disease-specific conditions under which genes are expressed. We apply our technique to chromosome 22q under 69 experimental condition pairs, and to the entire human genome under two experimental conditions. We discuss implications for more comprehensive, consistent and reliable genome annotation, more efficient, full-length complementary DNA cloning strategies and application to complex diseases.
Comparison of species identification of endocarditis associated viridans streptococci using rnpB genotyping and 2 MALDI-TOF systems.

PubMed

Isaksson, Jenny; Rasmussen, Magnus; Nilson, Bo; Stadler, Liselott Svensson; Kurland, Siri; Olaison, Lars; Ek, Elisabeth; Herrmann, Björn

2015-04-01

Streptococcus spp. are important causes of infective endocarditis but challenging in species identification. This study compared identification based on sequence determination of the rnpB gene with 2 systems of matrix-assisted laser desorption ionization-time of flight mass spectrometry, MALDI Biotyper (Bruker) and VITEK MS IVD (bioMérieux). Blood culture isolates of viridans streptococci from 63 patients with infective endocarditis were tested. The 3 methods showed full agreement for all 36 isolates identified in the Anginosus, Bovis, and Mutans groups or identified as Streptococcus cristatus, Streptococcus gordonii, or Streptococcus sanguinis. None of the methods could reliably identify the 23 isolates to the species level when designated as Streptococcus mitis, Streptococcus oralis, or Streptococcus tigurinus. In 7 isolates classified to the Mitis group, the rnpB sequences deviated strikingly from all reference sequences, and additional analysis of sodA and groEL genes indicated the occurrence of yet unidentified Streptococcus spp. Copyright © 2015 Elsevier Inc. All rights reserved.
Digital data for quick response (QR) codes of alkalophilic Bacillus pumilus to identify and to compare bacilli isolated from Lonar Crator Lake, India

PubMed Central

Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.

2016-01-01

Microbiologists are routinely engaged isolation, identification and comparison of isolated bacteria for their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from NCBI repository and generated QR codes for sequences (FASTA format and full Gene Bank information). 16SrRNA were used to generate quick response (QR) codes of Bacillus pumilus isolated from Lonar Crator Lake (19° 58′ N; 76° 31′ E), India. Bacillus pumilus 16S rRNA gene sequences were used to generate CGR, FCGR and PCA. These can be used for visual comparison and evaluation respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/. This generated digital data helps to evaluate and compare any Bacillus pumilus strain, minimizes laboratory efforts and avoid misinterpretation of the species. PMID:27141529
RNase 1 genes from the Family Sciuridae define a novel rodent ribonuclease cluster

PubMed Central

Siegel, Steven J.; Percopo, Caroline M.; Dyer, Kimberly D.; Zhao, Wei; Roth, V. Louise; Mercer, John M.; Rosenberg, Helene F.

2009-01-01

The RNase A ribonucleases are complex group of functionally diverse secretory proteins with conserved enzymatic activity. We have identified novel RNase 1 genes from four species of squirrel (order Rodentia, family Sciuridae). Squirrel RNase 1 genes encode typical RNase A ribonucleases, each with eight cysteines, a conserved CKXXNTF signature motif, and a canonical His12-Lys41-His119 catalytic triad. Two alleles encode Callosciurus prevostii RNase 1, which include a Ser18↔Pro, analogous to the sequence polymorphisms found among the RNase 1 duplications in the genome of Rattus exulans. Interestingly, although the squirrel RNase 1 genes are closely related to one another (77 to 95% amino acid sequence identity), the cluster as a whole is distinct and divergent from the clusters including RNase 1 genes from other rodent species. We examined the specific sites at which Sciuridae RNase 1s diverge from Muridae / Cricetidae RNase 1s, and determined that the divergent sites are located on the external surface, with complete sparing of the catalytic crevice. The full significance of these findings awaits a more complete understanding of biological role of mammalian RNase 1s. PMID:19771477
Structure, organization and expression of common carp (Cyprinus carpio L.) SLP-76 gene.

PubMed

Huang, Rong; Sun, Xiao-Feng; Hu, Wei; Wang, Ya-Ping; Guo, Qiong-Lin

2008-05-01

SLP-76 is an important member of the SLP-76 family of adapters, and it plays a key role in TCR signaling and T cell function. Partial cDNA sequence of SLP-76 of common carp (Cyprinus carpio L.) was isolated from thymus cDNA library by the method of suppression subtractive hybridization (SSH). Subsequently, the full length cDNA of carp SLP-76 was obtained by means of 3' RACE and 5' RACE, respectively. The full length cDNA of carp SLP-76 was 2007 bp, consisting of a 5'-terminal untranslated region (UTR) of 285 bp, a 3'-terminal UTR of 240 bp, and an open reading frame of 1482 bp. Sequence comparison showed that the deduced amino acid sequence of carp SLP-76 had an overall similarity of 34-73% to that of other species homologues, and it was composed of an NH2-terminal domain, a central proline-rich domain, and a C-terminal SH2 domain. Amino acid sequence analysis indicated the existence of a Gads binding site R-X-X-K, a 10-aa-long sequence which binds to the SH3 domain of LCK in vitro, and three conserved tyrosine-containing sequence in the NH2-terminal domain. Then we used PCR to obtain a genomic DNA which covers the entire coding region of carp SLP-76. In the 9.2k-long genomic sequence, twenty one exons and twenty introns were identified. RT-PCR results showed that carp SLP-76 was expressed predominantly in hematopoietic tissues, and was upregulated in thymus tissue of four-month carp compared to one-year old carp. RT-PCR and virtual northern hybridization results showed that carp SLP-76 was also upregulated in thymus tissue of GH transgenic carp at the age of four-months. These results suggest that the expression level of SLP-76 gene may be related to thymocyte development in teleosts.
Incidence and Carrier Frequency of CFTR Gene Mutations in Pregnancies With Echogenic Bowel in Nova Scotia and Prince Edward Island.

PubMed

Miller, Michelle E; Allen, Victoria M; Brock, Jo-Ann K

2018-03-01

Fetal echogenic bowel (echogenic bowel) is associated with cystic fibrosis (CF), with a reported incidence ranging from 1% to 13%. Prenatal testing for CF in the setting of echogenic bowel can be done by screening parental or fetal samples for pathogenic CFTR variants. If only one pathogenic variant is identified, sequencing of the CFTR gene can be undertaken, to identify a second pathogenic variant not covered in the standard screening panel. Full gene sequencing, however, also introduces the potential to identify variants of uncertain significance (VUSs) that can create counselling challenges and cause parental anxiety. To provide accurate counselling for families in the study population, the incidence of CF associated with echogenic bowel and the carrier frequency of CFTR variants were investigated. All pregnancies for which CF testing was undertaken for the indication of echogenic bowel (from Nova Scotia and Prince Edward Island) were identified (January 2007-July 2017). The CFTR screening and sequencing results were reviewed, and fetal outcomes related to CF were assessed. A total of 463 pregnancies with echogenic bowel were tested. Four were confirmed to be affected with CF, giving an incidence of 0.9% in this cohort. The carrier frequency of CF among all parents in the cohort was 5.0% (1 in 20); however, when excluding parents of affected fetuses, the carrier frequency for the population was estimated at 4.1% (1 in 25). CFTR gene sequencing identified an additional VUS in two samples. The incidence of CF in pregnancies with echogenic bowel in Nova Scotia and Prince Edward Island is 0.9%, with an estimated population carrier frequency of 4.1%. These results provide the basis for improved counselling to assess the risk of CF in the pregnancy, after parental carrier screening, using Bayesian probability. Counselling regarding VUSs should be undertaken before gene sequencing. Copyright © 2017 Society of Obstetricians and Gynaecologists of Canada. Published by Elsevier Inc. All rights reserved.
Xander: employing a novel method for efficient gene-targeted metagenomic assembly

DOE PAGES

Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; ...

2015-08-05

Here, metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility ofmore » this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.« less
Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

PubMed

Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

2015-01-01

RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes, however with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outweighs others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
A novel reassortant H3N8 influenza virus isolated from drinking water for duck in a domestic duck farm in Poyang Lake area.

PubMed

Dong, Bei Bei; Xu, Cui Ling; Dong, Li Bo; Cheng, Hui Jian; Yang, Lei; Zou, Shu Mei; Chen, Min; Bai, Tian; Zhang, Ye; Gao, Rong Bao; Li, Xiao Dan; Shi, Jing Hong; Yuan, Hui; Yang, Jing; Chen, Tao; Zhu, Yun; Xiong, Ying; Yang, Shuai; Shu, Yue Long

2013-07-01

To conduct a full genome sequence analysis for genetic characterization of an H3N8 influenza virus isolated from drinking water of a domestic duck farm in Poyang Lake area in 2011. The virus was cultivated by specific pathogen free (SPF) chicken embryo eggs and was subtyped into hemagglutinin (HA) and neuraminidase (NA) by real-time PCR method. Eight gene segments were sequenced and phylogenetic analysis was conducted. The NA gene of this virus belongs to North American lineage; other seven genes belong to Eurasian lineage. Compared with the viruses containing NA gene, the PB2 and PB1 gene came from different clades. And this indicates that the virus was a novel reassortant genotype. The HA receptor binding preference was avian-like and the cleavage site sequence showed a low pathogenic feature. There was no drug resistance mutation of M2 protein. The mutations of Asn30Asp, and Thr215Ala of the M1 protein implied the potential of pathogenicity increase in mice. The finding of novel genotype of H3N8 virus in drinking water in this duck farm near Poyang Lake highlighted the importance of strengthening the surveillance of avian influenza in this region, which could contribute to pinpointing the influenza ecological relations among avian, swine, and human. Copyright © 2013 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.

Seven New Complete Plastome Sequences Reveal Rampant Independent Loss of the ndh Gene Family across Orchids and Associated Instability of the Inverted Repeat/Small Single-Copy Region Boundaries.

PubMed

Kim, Hyoung Tae; Kim, Jung Sung; Moore, Michael J; Neubig, Kurt M; Williams, Norris H; Whitten, W Mark; Kim, Joo-Hwan

2015-01-01

Earlier research has revealed that the ndh loci have been pseudogenized, truncated, or deleted from most orchid plastomes sequenced to date, including in all available plastomes of the two most species-rich subfamilies, Orchidoideae and Epidendroideae. This study sought to resolve deeper-level phylogenetic relationships among major orchid groups and to refine the history of gene loss in the ndh loci across orchids. The complete plastomes of seven orchids, Oncidium sphacelatum (Epidendroideae), Masdevallia coccinea (Epidendroideae), Sobralia callosa (Epidendroideae), Sobralia aff. bouchei (Epidendroideae), Elleanthus sodiroi (Epidendroideae), Paphiopedilum armeniacum (Cypripedioideae), and Phragmipedium longifolium (Cypripedioideae) were sequenced and analyzed in conjunction with all other available orchid and monocot plastomes. Most ndh loci were found to be pseudogenized or lost in Oncidium, Paphiopedilum and Phragmipedium, but surprisingly, all ndh loci were found to retain full, intact reading frames in Sobralia, Elleanthus and Masdevallia. Character mapping suggests that the ndh genes were present in the common ancestor of orchids but have experienced independent, significant losses at least eight times across four subfamilies. In addition, ndhF gene loss was correlated with shifts in the position of the junction of the inverted repeat (IR) and small single-copy (SSC) regions. The Orchidaceae have unprecedented levels of homoplasy in ndh gene presence/absence, which may be correlated in part with the unusual life history of orchids. These results also suggest that ndhF plays a role in IR/SSC junction stability.
AUDIOME: a tiered exome sequencing-based comprehensive gene panel for the diagnosis of heterogeneous nonsyndromic sensorineural hearing loss.

PubMed

Guan, Qiaoning; Balciuniene, Jorune; Cao, Kajia; Fan, Zhiqian; Biswas, Sawona; Wilkens, Alisha; Gallo, Daniel J; Bedoukian, Emma; Tarpinian, Jennifer; Jayaraman, Pushkala; Sarmady, Mahdi; Dulik, Matthew; Santani, Avni; Spinner, Nancy; Abou Tayoun, Ahmad N; Krantz, Ian D; Conlin, Laura K; Luo, Minjie

2018-03-29

PurposeHereditary hearing loss is highly heterogeneous. To keep up with rapidly emerging disease-causing genes, we developed the AUDIOME test for nonsyndromic hearing loss (NSHL) using an exome sequencing (ES) platform and targeted analysis for the curated genes.MethodsA tiered strategy was implemented for this test. Tier 1 includes combined Sanger and targeted deletion analyses of the two most common NSHL genes and two mitochondrial genes. Nondiagnostic tier 1 cases are subjected to ES and array followed by targeted analysis of the remaining AUDIOME genes.ResultsES resulted in good coverage of the selected genes with 98.24% of targeted bases at >15 ×. A fill-in strategy was developed for the poorly covered regions, which generally fell within GC-rich or highly homologous regions. Prospective testing of 33 patients with NSHL revealed a diagnosis in 11 (33%) and a possible diagnosis in 8 cases (24.2%). Among those, 10 individuals had variants in tier 1 genes. The ES data in the remaining nondiagnostic cases are readily available for further analysis.ConclusionThe tiered and ES-based test provides an efficient and cost-effective diagnostic strategy for NSHL, with the potential to reflex to full exome to identify causal changes outside of the AUDIOME test.Genetics in Medicine advance online publication, 29 March 2018; doi:10.1038/gim.2018.48.
Expression of a polyubiquitin promoter isolated from Gladiolus.

PubMed

Joung, Young Hee; Kamo, Kathryn

2006-10-01

A polyubiquitin promoter (GUBQ1) including its 5'UTR and intron was isolated from the floral monocot Gladiolus because high levels of expression could not be obtained using publicly available promoters isolated from either cereals or dicots. Sequencing of the promoter revealed highly conserved 5' and 3' intron splicing sites for the 1.234 kb intron. The coding sequence of the first two ubiquitin genes showed the highest homology (87 and 86%, respectively) to the ubiquitin genes of Nicotiana tabacum and Oryza sativa RUBQ2. Transient expression following gene gun bombardment showed that relative levels of GUS activity with the GUBQ1 promoter were comparable to the CaMV 35S promoter in gladiolus, tobacco, rose, rice, and the floral monocot freesia. The highest levels of GUS expression with GUBQ1 were attained with Gladiolus. The full-length GUBQ1 promoter including 5'UTR and intron were necessary for maximum GUS expression in Gladiolus. The relative GUS activity for the promoter only was 9%, and the activity for the promoter with 5'UTR and 399 bp of the full-length 1.234 kb intron was 41%. Arabidopsis plants transformed with uidA under GUBQ1 showed moderate GUS expression throughout young leaves and in the vasculature of older leaves. The highest levels of transient GUS expression in Gladiolus have been achieved using the GUBQ1 promoter. This promoter should be useful for genetic engineering of disease resistance in Gladiolus, rose, and freesia, where high levels of gene expression are important.
Isolation and Expression Profile of the Ca2+-Activated Chloride Channel-like Membrane Protein 6 Gene in Xenopus laevis

PubMed Central

Lee, Ra Mi; Ryu, Rae Hyung; Jeong, Seong Won; Oh, Soo Jin; Huang, Hue; Han, Jin Soo; Lee, Chi Ho; Lee, C. Justin; Jan, Lily Yeh

2011-01-01

To clone the first anion channel from Xenopus laevis (X. laevis), we isolated a calcium-activated chloride channel (CLCA)-like membrane protein 6 gene (CMP6) in X. laevis. As a first step in gene isolation, an expressed sequence tags database was screened to find the partial cDNA fragment. A putative partial cDNA sequence was obtained by comparison with rat CLCAs identified in our laboratory. First stranded cDNA was synthesized by reverse transcription polymerase-chain reaction (RT-PCR) using a specific primer designed for the target cDNA. Repeating the 5' and 3' rapid amplification of cDNA ends, full-length cDNA was constructed from the cDNA pool. The full-length CMP6 cDNA completed via 5'- and 3'-RACE was 2,940 bp long and had an open reading frame (ORF) of 940 amino acids. The predicted 940 polypeptides have four major transmembrane domains and showed about 50% identity with that of rat brain CLCAs in our previously published data. Semi-quantification analysis revealed that CMP6 was most abundantly expressed in small intestine, colon and liver. However, all tissues except small intestine, colon and liver had undetectable levels. This result became more credible after we did real-time PCR quantification for the target gene. In view of all CLCA studies focused on human or murine channels, this finding suggests a hypothetical protein as an ion channel, an X. laevis CLCA. PMID:21826170
Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.

PubMed

Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm

2016-04-22

Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence, and in the case of protein coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant Multipass calculates the likelihood and nucleotide sequence of several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20 % more error-free sequences than current state of the art methods, and provides sequence characteristics that allow generation of a set of high confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.
Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

PubMed

Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

2004-02-01

To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
Epidemic Keratoconjunctivitis Due to the Novel Hexon-Chimeric-Intermediate 22,37/H8 Human Adenovirus ▿

PubMed Central

Aoki, Koki; Ishiko, Hiroaki; Konno, Tsunetada; Shimada, Yasushi; Hayashi, Akio; Kaneko, Hisatoshi; Ohguchi, Takeshi; Tagawa, Yoshitsugu; Ohno, Shigeaki; Yamazaki, Shudo

2008-01-01

In a 2-month period in 2003, we encountered an outbreak of epidemic keratoconjunctivitis (EKC) in Japan. We detected 67 human adenoviruses (HAdVs) by PCR from eye swabs of patients with EKC at five eye clinics in different parts of Japan. Forty-one of the 67 HAdV DNAs from the swabs were identified as HAdV-37 by phylogenetic analysis using a partial hexon gene sequence. When the restriction patterns of these viral genomes were compared with that of the HAdV-37 prototype strain, one isolate showed a never-before-seen restriction pattern. Within 1 year, we encountered three more EKC cases caused by a genetically identical virus: two nosocomial infections at two different university hospitals and a sporadic infection at an eye clinic. We determined the nucleotide sequences of the full-length hexon and fiber genes of these isolates and compared them to those of the 51 prototype strains. Surprisingly, the sequence of the hexon (ɛ determinant) loop-1 and -2 regions showed the highest nucleotide identity with HAdV-22, a rare EKC isolate. However, the nucleotide sequence of the fiber gene was identical to that of the HAdV-8 prototype strain. 22 We propose that this virus is a new hexon-chimeric intermediate HAdV-22,37/H8, and may be an etiological agent of EKC. PMID:18701656
Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

PubMed

Roux-Rouquie, M; Marilley, M

2000-09-15

We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.
[Frequency of intron 1 inversion of factor VIII gene in Chinese hemophilia A patients with case report of a female patient with heterozygous intron 1 inversion].

PubMed

Yan, Zhen-yu; Liang, Yan; Yan, Mei; Fan, Lian-kai; Xiao, Bai; Hua, Bao-lai; Liu, Jing-zhong; Zhao, Yong-qiang

2008-10-21

To investigate the frequency of intron 1 inversion (inv1) in FVIII gene in Chinese hemophilia A (HA) patients and to investigate the mechanism of pathogenesis. Peripheral blood samples were collected from 158 unrelated HA patients, aged 20 (1 - 73), including one female HA patient, aged 5, and several family members of a patient positive in inv1. One-stage method was used to assay the FVIII activity (FVIII:C). Long distance PCR and multiple PCR in duplex reactions were used to screen for the intron 22 inversion (inv22) and inv1 of the FVIII coding gene (F8). The F8 coding sequence was amplified with PCR and sequenced with an automatic sequencer. Two unrelated patients (pedigrees) were detected as inv1 positive with a positive rate of 1.26%. A rare female HA patient with inv1 was also discovered in a positive family (3 HA cases were found in this family and regarded as one case in calculating the total detection rate). The full length of FVIII was sequenced, and no other mutation was detected. There frequency of FVIII inv1 is low in Chinese HA patients compared with other populations. Female HA patients are heterozygous for FVIII inv1 and that may be resulted from nonrandom inactivation of X chromosome.
Identification of cDNAs encoding viper venom hyaluronidases: cross-generic sequence conservation of full-length and unusually short variant transcripts.

PubMed

Harrison, Robert A; Ibison, Frances; Wilbraham, Davina; Wagstaff, Simon C

2007-05-01

The immobilisation of prey by snakes is most efficiently achieved by the rapid dissemination of venom from its site of injection into the blood stream. Hyaluronidase is a common component of snake venoms and has been termed the "venom spreading factor". In the absence of nucleotide or protein sequence data to confirm the functional identity of this venom component, we interrogated a venom gland EST database for the saw-scaled viper, Echis ocellatus (Nigeria), using the gene ontology (GO) term "carbohydrate metabolism". A single hyalurononglucosaminadase-activity matching sequence (EOC00242) was found and used to design PCR primers to acquire the full-length cDNA sequence. Although very different from the bee venom and mammalian hyaluronidase sequences, the E. ocellatus sequence retained all the catalytic, positional and structural residues that characterise this class of carbohydrate metabolising hydrolases. An extraordinarily high level of sequence identity (>95%) was observed in analogous venom gland cDNA sequences isolated (by PCR) from another saw-scaled viper species, E. pyramidum leakeyi (Kenya), and from the sahara horned viper, Cerastes cerastes cerastes (Egypt) and the puff adder, Bitis arietans (Nigeria). Smaller amplicons, lacking hyaluronidase catalytic residues because of 768 bp or 855 bp central deletions, appear to encode either truncated peptides without hyaluronidase activity, or are non-translated transcripts because they lack consensus translation initiating motifs.
A polymorphism upstream MIR1279 gene is associated with pericarditis development in Systemic Lupus Erythematosus and contributes to definition of a genetic risk profile for this complication.

PubMed

Ciccacci, C; Perricone, C; Politi, C; Rufini, S; Ceccarelli, F; Cipriano, E; Alessandri, C; Latini, A; Valesini, G; Novelli, G; Conti, F; Borgiani, P

2017-07-01

Recently, a study has shown that a polymorphism in the region of MIR1279 modulates the expression of the TRAF3IP2 gene. Since polymorphisms in the TRAF3IP2 gene have been described in association with systemic lupus erithematosus (SLE) susceptibility and with the development of pericarditis, our aim is to verify if the MIR1279 gene variability could also be involved. The rs1463335 SNP, located upstream MIR1279 gene, was analyzed by allelic discrimination assay in 315 Italian SLE patients and 201 healthy controls. Moreover, the MIR1279 gene was full sequenced in 50 patients. A case/control association study and a genotype/phenotype correlation analysis were performed. We also constructed a pericarditis genetic risk profile for patients with SLE. The full sequencing of the MIR1279 gene in patients with SLE did not reveal any novel or known variation. The variant allele of the rs1463335 SNP was significantly associated with susceptibility to pericarditis ( P = 0.017 and OR = 1.67). A risk profile model for pericarditis considering the risk alleles of MIR1279 and three other genes (STAT4, PTPN2 and TRAF3IP2) showed that patients with 4 or 5 risk alleles have a higher risk of developing pericarditis ( OR = 4.09 with P = 0.001 and OR = 6.04 with P = 0.04 respectively). In conclusion, we describe for the first time the contribution of a MIR1279 SNP in pericarditis development in patients with SLE and a genetic risk profile model that could be useful to identify patients more susceptible to developing pericarditis in SLE. This approach could help to improve the prediction and the management of this complication.
Sequencing and phylogenetic analysis of the coding region of six common rotavirus strains: evidence for intragenogroup reassortment among co-circulating G1P[8] and G2P[4] strains from the United States.

PubMed

Bányai, K; Mijatovic-Rustempasic, S; Hull, J J; Esona, M D; Freeman, M M; Frace, A M; Bowen, M D; Gentsch, J R

2011-03-01

The segmented genome of rotaviruses provides an opportunity for rotavirus strains to generate a large genetic diversity through reassortment; however, this mechanism is considered to play little role in the generation of mosaic gene constellations between Wa-like and DS-1-like strains in genes other than the neutralization antigens. A pilot study was undertaken to analyze these two epidemiologically important strains at the genomic level in order to (i) identify intergenogroup reassortment and (ii) to make available additional reference genome sequences of G1P[8] and G2P[4] for future genomics analyses. The full or nearly complete coding region of all 11 genes for 3 G1P[8] (LB2719, LB2758, and LB2771) and 3 G2P[4] (LB2744, LB2764, and LB2772) strains isolated from children hospitalized with severe diarrhea in Long Beach, California, where these strains were circulating at comparable rates during 2005-2006 are described in this study. Based on the full-genome classification system, all G1P[8] strains had a conserved genomic constellation: G1-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-E1-H1 and were mostly identical to the few Wa-like strains whose genome sequences have already been determined. Similarly, the genome sequences of the 3 G2P[4] strains were highly conserved: G2-P[4]-I2-R2-C2-M2-A2-N2-T2-E2-E2-H2 and displayed an overall lesser genetic divergence with reference DS-1-like strains. While intergenogroup reassortment was not seen between the G1P[8] and G2P[4] strains studied here, evidence for intragenogroup reassortment events was identified. Similar studies in the post-rotavirus genomic era will help uncover whether intergenogroup reassortment affecting the backbone genes could play a significant role in any potential vaccine breakthrough events by evading immunity of vaccinated children. Copyright © 2011 Wiley-Liss, Inc.
Transcriptome analysis and gene expression profiling of abortive and developing ovules during fruit development in hazelnut.

PubMed

Cheng, Yunqing; Liu, Jianfeng; Zhang, Huidi; Wang, Ju; Zhao, Yixin; Geng, Wanting

2015-01-01

A high ratio of blank fruit in hazelnut (Corylus heterophylla Fisch) is a very common phenomenon that causes serious yield losses in northeast China. The development of blank fruit in the Corylus genus is known to be associated with embryo abortion. However, little is known about the molecular mechanisms responsible for embryo abortion during the nut development stage. Genomic information for C. heterophylla Fisch is not available; therefore, data related to transcriptome and gene expression profiling of developing and abortive ovules are needed. In this study, de novo transcriptome sequencing and RNA-seq analysis were conducted using short-read sequencing technology (Illumina HiSeq 2000). The results of the transcriptome assembly analysis revealed genetic information that was associated with the fruit development stage. Two digital gene expression libraries were constructed, one for a full (normally developing) ovule and one for an empty (abortive) ovule. Transcriptome sequencing and assembly results revealed 55,353 unigenes, including 18,751 clusters and 36,602 singletons. These results were annotated using the public databases NR, NT, Swiss-Prot, KEGG, COG, and GO. Using digital gene expression profiling, gene expression differences in developing and abortive ovules were identified. A total of 1,637 and 715 unigenes were significantly upregulated and downregulated, respectively, in abortive ovules, compared with developing ovules. Quantitative real-time polymerase chain reaction analysis was used in order to verify the differential expression of some genes. The transcriptome and digital gene expression profiling data of normally developing and abortive ovules in hazelnut provide exhaustive information that will improve our understanding of the molecular mechanisms of abortive ovule formation in hazelnut.
Two Functional Copies of the DGCR6 Gene Are Present on Human Chromosome 22q11 Due to a Duplication of an Ancestral Locus

PubMed Central

Edelmann, Lisa; Stankiewicz, Pavel; Spiteri, Elizabeth; Pandita, Raj K.; Shaffer, Lisa; Lupski, James; Morrow, Bernice E.

2001-01-01

The DGCR6 (DiGeorge critical region) gene encodes a putative protein with sequence similarity to gonadal (gdl), a Drosophila melanogaster gene of unknown function. We mapped the DGCR6 gene to chromosome 22q11 within a low copy repeat, termed sc11.1a, and identified a second copy of the gene, DGCR6L, within the duplicate locus, termed sc11.1b. Both sc11.1 repeats are deleted in most persons with velo-cardio-facial syndrome/DiGeorge syndrome (VCFS/DGS), and they map immediately adjacent and internal to the low copy repeats, termed LCR22, that mediate the deletions associated with VCFS/DGS. We sequenced genomic clones from both loci and determined that the putative initiator methionine is located further upstream than originally described, but in a position similar to the mouse and chicken orthologs. DGCR6L encodes a highly homologous, functional copy of DGCR6, with some base changes rendering amino acid differences. Expression studies of the two genes indicate that both genes are widely expressed in fetal and adult tissues. Evolutionary studies using FISH mapping in several different species of ape combined with sequence analysis of DGCR6 in a number of different primate species indicate that the duplication is at least 12 million years old and may date back to before the divergence of Catarrhines from Platyrrhines, 35 mya. These data suggest that there has been selective evolutionary pressure toward the functional maintenance of both paralogs. Interestingly, a full-length HERV-K provirus integrated into the sc11.1a locus after the divergence of chimpanzees and humans. PMID:11157784
Analysis of cellulose synthase genes from domesticated apple identifies collinear genes WDR53 and CesA8A: partial co-expression, bicistronic mRNA, and alternative splicing of CESA8A

PubMed Central

Guerriero, Gea; Spadiut, Oliver; Kerschbamer, Christine; Giorno, Filomena; Baric, Sanja; Ezcurra, Inés

2016-01-01

Cellulose synthase (CesA) genes constitute a complex multigene family with six major phylogenetic clades in angiosperms. The recently sequenced genome of domestic apple, Malus×domestica, was mined for CesA genes, by blasting full-length cellulose synthase protein (CESA) sequences annotated in the apple genome against protein databases from the plant models Arabidopsis thaliana and Populus trichocarpa. Thirteen genes belonging to the six angiosperm CesA clades and coding for proteins with conserved residues typical of processive glycosyltransferases from family 2 were detected. Based on their phylogenetic relationship to Arabidopsis CESAs, as well as expression patterns, a nomenclature is proposed to facilitate further studies. Examination of their genomic organization revealed that MdCesA8-A is closely linked and co-oriented with WDR53, a gene coding for a WD40 repeat protein. The WDR53 and CesA8 genes display conserved collinearity in dicots and are partially co-expressed in the apple xylem. Interestingly, the presence of a bicistronic WDR53–CesA8A transcript was detected in phytoplasma-infected phloem tissues of apple. The bicistronic transcript contains a spliced intergenic sequence that is predicted to fold into hairpin structures typical of internal ribosome entry sites, suggesting its potential cap-independent translation. Surprisingly, the CesA8A cistron is alternatively spliced and lacks the zinc-binding domain. The possible roles of WDR53 and the alternatively spliced CESA8 variant during cellulose biosynthesis in M.×domestica are discussed. PMID:23048131
Gene finding in metatranscriptomic sequences.

PubMed

Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

2014-01-01

Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TranGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TranGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture about the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
Two Cycloartenol Synthases for Phytosterol Biosynthesis in Polygala tenuifolia Willd.

PubMed

Jin, Mei Lan; Lee, Woo Moon; Kim, Ok Tae

2017-11-15

Oxidosqualene cyclases (OSCs) are enzymes that play a key role in control of the biosynthesis of phytosterols and triterpene saponins. In order to uncover OSC genes from Polygala tenuifolia seedlings induced by methyl jasmonate (MeJA), RNA-sequencing analysis was performed using the Illumina sequencing platform. A total of 148,488,632 high-quality reads from two samples (control and the MeJA treated) were generated. We screened genes related to phytosterol and triterpene saponin biosynthesis and analyzed the transcriptional changes of differentially expressed unigene (DEUG) values calculated by fragments per kilobase million (FPKM). In our datasets, two full-length cDNAs of putative OSC genes, PtCAS1 , and PtCAS2 , were found, in addition to the PtBS (β-amyrin synthase) gene reported in our previous studies and the two cycloartenol synthase genes of P. tenuifolia . All genes were isolated and characterized in yeast cells. The functional expression of the two PtCAS genes in yeast cells showed that the genes all produce a cycloartenol as the sole product. When qRT-PCR analysis from different tissues was performed, the expressions of PtCAS1 and PtCAS2 were highest in flowers and roots, respectively. After MeJA treatment, the transcripts of PtCAS1 and PtCAS2 genes increased by 1.5- and 2-fold, respectively. Given these results, we discuss the potential roles of the two PtCAS genes in relation to triterpenoid biosynthesis.
Characteristics of carboxylesterase genes and their expression-level between acaricide-susceptible and resistant Tetranychus cinnabarinus (Boisduval).

PubMed

Wei, Peng; Shi, Li; Shen, Guangmao; Xu, Zhifeng; Liu, Jialu; Pan, Yu; He, Lin

2016-07-01

Carboxylesterases (CarEs) play important roles in metabolism and detoxification of dietary and environmental xenobiotics in insects and mites. On the basis of the Tetranychuscinnabarinus transcriptome dataset, 23 CarE genes (6 genes are full sequence and 17 genes are partial sequence) were identified. Synergist bioassay showed that CarEs were involved in acaricide detoxification and resistance in fenpropathrin- (FeR) and cyflumetofen-resistant (CyR) strains. In order to further reveal the relationship between CarE gene's expression and acaricide-resistance in T. cinnabarinus, we profiled their expression in susceptible (SS) and resistant strains (FeR, and CyR). There were 8 and 4 over-expressed carboxylesterase genes in FeR and CyR, respectively, from which the over-expressions were detected at mRNA level, but not DNA level. Pesticide induction experiment elucidated that 4 of 8 and 2 of 4 up-regulated genes were inducible with significance in FeR and CyR strains, respectively, but they could not be induced in SS strain, which indicated that these genes became more enhanced and effective to withstand the pesticides' stress in resistant T. cinnabarinus. Most expression-changed and all inducible genes possess the Abhydrolase_3 motif, which is a catalytic domain for hydrolyzing. As a whole, these findings in current study provide clues for further elucidating the function and regulation mechanism of these carboxylesterase genes in T. cinnabarinus' resistance formation. Copyright © 2015 Elsevier B.V. All rights reserved.
Pyrin gene and mutants thereof, which cause familial Mediterranean fever

DOEpatents

Kastner, Daniel L [Bethesda, MD; Aksentijevichh, Ivona [Bethesda, MD; Centola, Michael [Tacoma Park, MD; Deng, Zuoming [Gaithersburg, MD; Sood, Ramen [Rockville, MD; Collins, Francis S [Rockville, MD; Blake, Trevor [Laytonsville, MD; Liu, P Paul [Ellicott City, MD; Fischel-Ghodsian, Nathan [Los Angeles, CA; Gumucio, Deborah L [Ann Arbor, MI; Richards, Robert I [North Adelaide, AU; Ricke, Darrell O [San Diego, CA; Doggett, Norman A [Santa Cruz, NM; Pras, Mordechai [Tel-Hashomer, IL

2003-09-30

The invention provides the nucleic acid sequence encoding the protein associated with familial Mediterranean fever (FMF). The cDNA sequence is designated as MEFV. The invention is also directed towards fragments of the DNA sequence, as well as the corresponding sequence for the RNA transcript and fragments thereof. Another aspect of the invention provides the amino acid sequence for a protein (pyrin) associated with FMF. The invention is directed towards both the full length amino acid sequence, fusion proteins containing the amino acid sequence and fragments thereof. The invention is also directed towards mutants of the nucleic acid and amino acid sequences associated with FMF. In particular, the invention discloses three missense mutations, clustered in within about 40 to 50 amino acids, in the highly conserved rfp (B30.2) domain at the C-terminal of the protein. These mutants include M6801, M694V, K695R, and V726A. Additionally, the invention includes methods for diagnosing a patient at risk for having FMF and kits therefor.
Definition of RNA polymerase II CoTC terminator elements in the human genome.

PubMed

Nojima, Takayuki; Dienstbier, Martin; Murphy, Shona; Proudfoot, Nicholas J; Dye, Michael J

2013-04-25

Mammalian RNA polymerase II (Pol II) transcription termination is an essential step in protein-coding gene expression that is mediated by pre-mRNA processing activities and DNA-encoded terminator elements. Although much is known about the role of pre-mRNA processing in termination, our understanding of the characteristics and generality of terminator elements is limited. Whereas promoter databases list up to 40,000 known and potential Pol II promoter sequences, fewer than ten Pol II terminator sequences have been described. Using our knowledge of the human β-globin terminator mechanism, we have developed a selection strategy for mapping mammalian Pol II terminator elements. We report the identification of 78 cotranscriptional cleavage (CoTC)-type terminator elements at endogenous gene loci. The results of this analysis pave the way for the full understanding of Pol II termination pathways and their roles in gene expression. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.