study gene sequences: Topics by Science.gov

Sample records for study gene sequences

Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing.

PubMed

Seoane-Zonjic, Pedro; Cañas, Rafael A; Bautista, Rocío; Gómez-Maldonado, Josefa; Arrillaga, Isabel; Fernández-Pozo, Noé; Claros, M Gonzalo; Cánovas, Francisco M; Ávila, Concepción

2016-02-27

In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to understand the full complexity of conifer genomes. Using techniques focused on specific genes, gene models can be established that can aid in the assembly of gene-rich regions, and this information can be used to compare genomes and understand functional evolution. In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures, and a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members. The experimental approach used in this study is a valuable combined technique to study genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences that can be used for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.
Mouse mammary tumor virus-like gene sequences are present in lung patient specimens

PubMed Central

2011-01-01

Background Previous studies have reported on the presence of Murine Mammary Tumor Virus (MMTV)-like gene sequences in human cancer tissue specimens. Here, we search for MMTV-like gene sequences in lung disease specimens, including carcinomas, from a Mexican population. This study was based on our previous study reporting that the INER51 lung cancer cell line, from a pleural effusion of a Mexican patient, contains MMTV-like env gene sequences. Results The MMTV-like env gene sequences were detected in three out of 18 specimens studied, by PCR using a specific set of MMTV-like primers. The three identified MMTV-like gene sequences, which were designated INER6, HZ101, and HZ14, were 99%, 98%, and 97% homologous, respectively, to GenBank sequence accession number AY161347. The INER6 and HZ101 samples were isolated from lung cancer specimens, and the HZ14 sample was isolated from an acute inflammatory lung infiltrate specimen. Two of the env sequences exhibited disruption of the reading frame due to mutations. Conclusion In summary, we identified MMTV-like gene sequences in 2 out of 11 (18%) lung carcinomas and 1 out of 7 (14%) acute inflammatory lung infiltrate specimens from a Mexican population. PMID:21943279
International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

PubMed Central

Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

2015-01-01

This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

PubMed Central

Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

2012-01-01

Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
Characterization and Amplification of Gene-Based Simple Sequence Repeat (SSR) Markers in Date Palm.

PubMed

Zhao, Yongli; Keremane, Manjunath; Prakash, Channapatna S; He, Guohao

2017-01-01

The paucity of molecular markers limits the application of genetic and genomic research in date palm (Phoenix dactylifera L.). Availability of expressed sequence tag (EST) sequences in date palm may provide a good resource for developing gene-based markers. This study characterizes a substantial fraction of transcriptome sequences containing simple sequence repeats (SSRs) from the EST sequences in date palm. The EST sequences studied are mainly homologous to those of Elaeis guineensis and Musa acuminata. A total of 911 gene-based SSR markers, characterized with functional annotations, have provided a useful basis not only for discovering candidate genes and understanding genetic basis of traits of interest but also for developing genetic and genomic tools for molecular research in date palm, such as diversity study, quantitative trait locus (QTL) mapping, and molecular breeding. The procedures of DNA extraction, polymerase chain reaction (PCR) amplification of these gene-based SSR markers, and gel electrophoresis of PCR products are described in this chapter.
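As an illustration of what identifying SSRs in EST sequences can involve, the following minimal Python sketch scans a sequence for perfect di-, tri- and tetranucleotide repeats. It is not the procedure described in the chapter: the function name `find_ssrs` and the minimum repeat counts (6, 5 and 4) are assumptions chosen only for the example.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq.

    start is a 0-based position. The default thresholds are hypothetical
    and are not taken from the date palm study.
    """
    if min_repeats is None:
        # motif length -> minimum number of tandem copies (assumed values)
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
print(find_ssrs(example))  # [(3, 'AT', 7), (20, 'GAA', 5)]
```

Primer design around each hit, which the chapter covers experimentally, is outside the scope of this sketch.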
Mouse Vk gene classification by nucleic acid sequence similarity.

PubMed

Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

1989-01-01

Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
COI (cytochrome oxidase-I) sequence based studies of Carangid fishes from Kakinada coast, India.

PubMed

Persis, M; Chandra Sekhar Reddy, A; Rao, L M; Khedkar, G D; Ravinder, K; Nasruddin, K

2009-09-01

Mitochondrial DNA, cytochrome oxidase-1 gene sequences were analyzed for species identification and phylogenetic relationship among the very high food value and commercially important Indian carangid fish species. Sequence analysis of COI gene very clearly indicated that all the 28 fish species fell into five distinct groups, which are genetically distant from each other and exhibited identical phylogenetic reservation. All the COI gene sequences from 28 fishes provide sufficient phylogenetic information and evolutionary relationship to distinguish the carangid species unambiguously. This study proves the utility of mtDNA COI gene sequence based approach in identifying fish species at a faster pace.
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs.

PubMed

Powell, Bradford C; Hutchison, Clyde A

2006-01-19

Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes.
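The first step of the approach, collecting all ORFs of at least 30 codons, can be sketched in Python as below. The abstract does not define an ORF precisely, so this sketch assumes an ORF runs from an ATG to the next in-frame stop codon (TAA, TAG or TGA), scans all six frames, and counts the start codon but not the stop codon toward the length. The function names are illustrative, and the COG clustering step is not shown.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, frame, start, end) for ORFs of at least min_codons codons.

    Coordinates are 0-based and end-exclusive on the scanned strand; end
    includes the stop codon. ATG-to-stop is an assumed ORF definition.
    """
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, frame, start, i + 3))
                    start = None
    return results


# One ORF of 31 codons (ATG + 30 x GCT) in frame 2 of the forward strand.
example = "CC" + "ATG" + "GCT" * 30 + "TAA" + "G"
print(find_orfs(example))  # [('+', 2, 2, 98)]
```

A low threshold such as 30 codons produces many spurious ORFs, which is why the abstract relies on conservation across genomes to separate likely genes from noise.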
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

PubMed Central

Powell, Bradford C; Hutchison, Clyde A

2006-01-01

Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene prediction. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp.

PubMed

Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong

2015-03-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
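The reported ORF length and protein length agree if the 567 bp are assumed to include the stop codon, which the abstract does not state explicitly. A small Python sketch of that arithmetic follows; the function name is illustrative.

```python
def orf_to_protein_length(orf_bp, includes_stop=True):
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumes the ORF length is a whole number of codons and, by default,
    that the final codon is a stop codon that encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 567 bp = 189 codons; one of them is the assumed stop codon.
print(orf_to_protein_length(567))  # 188
```

Dividing the reported 20.4 kDa by 188 residues gives roughly 108.5 Da per residue, which is in the usual range for an average amino acid residue.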
Cloning and sequence analysis demonstrate the chromate reduction ability of a novel chromate reductase gene from Serratia sp

PubMed Central

DENG, PENG; TAN, XIAOQING; WU, YING; BAI, QUNHUA; JIA, YAN; XIAO, HONG

2015-01-01

The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
A cricket Gene Index: a genomic resource for studying neurobiology, speciation, and molecular evolution

PubMed Central

Danley, Patrick D; Mullen, Sean P; Liu, Fenglong; Nene, Vishvanath; Quackenbush, John; Shaw, Kerry L

2007-01-01

Background As the developmental costs of genomic tools decline, genomic approaches to non-model systems are becoming more feasible. Many of these systems may lack advanced genetic tools but are extremely valuable models in other biological fields. Here we report the development of expressed sequence tags (ESTs) in an orthopteroid insect, a model for the study of neurobiology, speciation, and evolution. Results We report the sequencing of 14,502 ESTs from clones derived from a nerve cord cDNA library, and the subsequent construction of a Gene Index from these sequences, from the Hawaiian trigonidiine cricket Laupala kohalensis. The Gene Index contains 8607 unique sequences comprised of 2575 tentative consensus (TC) sequences and 6032 singletons. For each of the unique sequences, an attempt was made to assign a provisional annotation and to categorize its function using a Gene Ontology-based classification through a sequence-based comparison to known proteins. In addition, a set of unique 70 base pair oligomers that can be used for DNA microarrays was developed. All Gene Index information is posted at the DFCI Gene Indices web page. Conclusion Orthopterans are models used to understand the neurophysiological basis of complex motor patterns such as flight and stridulation. The sequences presented in the cricket Gene Index will provide neurophysiologists with many genetic tools that have been largely absent in this field. The cricket Gene Index is one of only two gene indices to be developed in an evolutionary model system. Species within the genus Laupala have speciated recently, rapidly, and extensively. Therefore, the genes identified in the cricket Gene Index can be used to study the genomics of speciation. Furthermore, this gene index represents a significant EST resource for basal insects. As such, this resource is a valuable comparative tool for the understanding of invertebrate molecular evolution. The sequences presented here will provide much needed genomic resources for three distinct but overlapping fields of inquiry: neurobiology, speciation, and molecular evolution. PMID:17459168
Gene finding in metatranscriptomic sequences.

PubMed

Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

2014-01-01

Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
[Progress on molecular biology of Isaria farinosa, pathogen of host of Ophiocordyceps sinensis during the artificial culture].

PubMed

Liu, Fei; Wu, Xiao-Li; Liu, Ying; Chen, Da-Xia; Zhang, De-Li; Yang, Da-Jian

2016-02-01

Isaria farinosa is a pathogen of the host of Ophiocordyceps sinensis. This study reviews progress in the molecular biology of I. farinosa, based on a bibliometric analysis and on the sequences (including gene sequences) of I. farinosa deposited in NCBI. The results indicate that countries differ in the number of papers published and in the kinds and numbers of sequences (including gene sequences) deposited. China has published the most papers and deposited the most sequences (including gene sequences), while America has deposited the most functional genes. Research on the pathogen has focused mainly on biological control, and molecular studies have concentrated on phylogenetic classification. In recent years, some protease genes and chitinase genes have been studied. As its effect on the health of O. sinensis increases, and as the whole sequence and more of the pharmacological activities of I. farinosa become publicly known, research on the molecular biology of I. farinosa is expected to become deeper and broader. Copyright© by the Chinese Pharmaceutical Association.
In silico comparison of genomic regions containing genes coding for enzymes and transcription factors for the phenylpropanoid pathway in Phaseolus vulgaris L. and Glycine max L. Merr

PubMed Central

Reinprecht, Yarmilla; Yadegari, Zeinab; Perry, Gregory E.; Siddiqua, Mahbuba; Wright, Lori C.; McClean, Phillip E.; Pauls, K. Peter

2013-01-01

Legumes contain a variety of phytochemicals derived from the phenylpropanoid pathway that have important effects on human health as well as seed coat color, plant disease resistance and nodulation. However, the information about the genes involved in this important pathway is fragmentary in common bean (Phaseolus vulgaris L.). The objectives of this research were to isolate genes that function in and control the phenylpropanoid pathway in common bean, determine their genomic locations in silico in common bean and soybean, and analyze sequences of the 4CL gene family in two common bean genotypes. Sequences of phenylpropanoid pathway genes available for common bean or other plant species were aligned, and the conserved regions were used to design sequence-specific primers. The PCR products were cloned and sequenced and the gene sequences along with common bean gene-based (g) markers were BLASTed against the Glycine max v.1.0 genome and the P. vulgaris v.1.0 (Andean) early release genome. In addition, gene sequences were BLASTed against the OAC Rex (Mesoamerican) genome sequence assembly. In total, fragments of 46 structural and regulatory phenylpropanoid pathway genes were characterized in this way and placed in silico on common bean and soybean sequence maps. The maps contain over 250 common bean g and SSR (simple sequence repeat) markers and identify the positions of more than 60 additional phenylpropanoid pathway gene sequences, plus the putative locations of seed coat color genes. The majority of cloned phenylpropanoid pathway gene sequences were mapped to one location in the common bean genome but had two positions in soybean. The comparison of the genomic maps confirmed previous studies, which show that common bean and soybean share genomic regions, including those containing phenylpropanoid pathway gene sequences, with conserved synteny. Indels identified in the comparison of Andean and Mesoamerican common bean 4CL gene sequences might be used to develop inter-pool phenylpropanoid pathway gene-based markers. We anticipate that the information obtained by this study will simplify and accelerate selections of common bean with specific phenylpropanoid pathway alleles to increase the contents of beneficial phenylpropanoids in common bean and other legumes. PMID:24046770
HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment

PubMed Central

Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

2016-01-01

Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

PubMed

Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

2016-08-30

Influenza is an acute respiratory illness and has become a serious public health problem worldwide. Studying the HA and NA genes of influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain the complete coding sequences of the Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequences worldwide from the Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples from influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover the complete coding sequences of the HA and NA genes. Of 145 influenza A/H3N2 samples tested, 71 were successfully sequenced for the complete coding sequences of both the HA and NA genes. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.
Targeted gene enrichment and high-throughput sequencing for environmental biomonitoring: a case study using freshwater macroinvertebrates.

PubMed

Dowle, Eddy J; Pochon, Xavier; C Banks, Jonathan; Shearer, Karen; Wood, Susanna A

2016-09-01

Recent studies have advocated biomonitoring using DNA techniques. In this study, two high-throughput sequencing (HTS)-based methods were evaluated: amplicon metabarcoding of the cytochrome C oxidase subunit I (COI) mitochondrial gene and gene enrichment using MYbaits (targeting nine different genes including COI). The gene-enrichment method does not require PCR amplification and thus avoids biases associated with universal primers. Macroinvertebrate samples were collected from 12 New Zealand rivers. Macroinvertebrates were morphologically identified and enumerated, and their biomass determined. DNA was extracted from all macroinvertebrate samples and HTS undertaken using the Illumina MiSeq platform. Macroinvertebrate communities were characterized from sequence data using either six genes (three of the original nine were not used) or just the COI gene in isolation. The gene-enrichment method (all genes) detected the highest number of taxa and obtained the strongest Spearman rank correlations between the number of sequence reads, abundance and biomass in 67% of the samples. Median detection rates across rare (<1% of the total abundance or biomass), moderately abundant (1-5%) and highly abundant (>5%) taxa were highest using the gene-enrichment method (all genes). Our data indicated primer biases occurred during amplicon metabarcoding with greater than 80% of sequence reads originating from one taxon in several samples. The accuracy and sensitivity of both HTS methods would be improved with more comprehensive reference sequence databases. The data from this study illustrate the challenges of using PCR amplification-based methods for biomonitoring and highlight the potential benefits of using approaches, such as gene enrichment, which circumvent the need for an initial PCR step. © 2015 John Wiley & Sons Ltd.
De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

PubMed Central

2013-01-01

Background: Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration do not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results: The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion: The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
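The abstract above reports an N50 of 562 bases for the assembled transcripts. For readers unfamiliar with the statistic, here is a minimal Python sketch of the usual N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The function name is an assumption for illustration, and the study's own tooling is not described in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy example: total is 20, and 6 + 5 = 11 covers at least half, so N50 is 5.
example = n50([2, 3, 4, 5, 6])
```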
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).

PubMed

Liang, Jian-Ying; Lin, Rui-Qing

2016-11-01

In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced, and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete R. tetragona mt genome is 71.4%. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
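The A + T content quoted above is a simple base-composition statistic. The sketch below shows one way to compute it in Python. It assumes the sequence is a plain string and, as an illustrative choice, ignores ambiguous bases such as N. The function name is hypothetical.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


# Toy sequence, not real R. tetragona data: 4 of 6 bases are A or T.
toy = at_content("AATTGC")
```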

Adenovirus EIIA early promoter: transcriptional control elements and induction by the viral pre-early EIA gene, which appears to be sequence independent.

PubMed Central

Murthy, S C; Bhat, G P; Thimmappaya, B

1985-01-01

A molecular dissection of the adenovirus EIIA early (E) promoter was undertaken to study the sequence elements required for transcription and to examine the nucleotide sequences, if any, specific for its trans-activation by the viral pre-early EIA gene product. A chimeric gene in which the EIIA-E promoter region was fused to the coding sequences of the bacterial chloramphenicol acetyltransferase (CAT) gene was used in transient assays to identify the transcriptional control regions. Deletion mapping studies revealed that the upstream DNA sequences up to -86 were sufficient for the optimal basal level transcription in HeLa cells and also for the EIA-induced transcription. A series of linker-scanning (LS) mutants were constructed to precisely identify the nucleotide sequences that control transcription. Analysis of these LS mutants allowed us to identify two regions of the promoter that are critical for the EIIA-E transcription. These regions are located between -29 and -21 (region I) and between -82 and -66 (region II). Mutations in region I affected initiation and appeared functionally similar to the "TATA" sequence of the commonly studied promoters. To examine whether or not the EIIA-E promoter contained DNA sequences specific for the trans-activation by the EIA, the LS mutants were analyzed in a cotransfection assay containing a plasmid carrying the EIA gene. CAT activity of all of the LS mutants was induced by the EIA gene in this assay, suggesting that the induction of transcription of the EIIA-E promoter by the EIA gene is not sequence-specific. PMID:3857577
Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes

PubMed Central

Premzl, Marko

2015-01-01

Using the eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third-party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences), and D-dopachrome tautomerase and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. For example, the present study first described primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks for future experiments on the two eutherian gene data sets were proposed. PMID:25941635
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with the possibility to add sequences from the public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665
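The abstract above reports the share of contigs with an ORF longer than 300 bp. As an illustration of what such a check involves, the Python sketch below finds the longest ATG-to-stop open reading frame on the forward strand of a sequence. This is a simplified assumption, not the authors' procedure: it uses the standard stop codons, ignores the reverse strand, and the function name and the 300 bp threshold helper are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq):
    """Length in bp of the longest ATG..stop ORF in the three forward frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def has_long_orf(seq, min_bp=300):
    """True if the forward strand carries an ORF longer than min_bp."""
    return longest_orf_length(seq) > min_bp
```

A fuller analysis would also scan the reverse complement and might allow ORFs that run off the end of a partial contig.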
Complete cpDNA genome sequence of Smilax china and phylogenetic placement of Liliales--influences of gene partitions and taxon sampling.

PubMed

Liu, Juan; Qi, Zhe-Chen; Zhao, Yun-Peng; Fu, Cheng-Xin; Jenny Xiang, Qiu-Yun

2012-09-01

The complete nucleotide sequence of the chloroplast genome (cpDNA) of Smilax china L. (Smilacaceae) is reported. It is the first complete cp genome sequence in Liliales. Genomic analyses were conducted to examine the rate and pattern of cpDNA genome evolution in Smilax relative to other major lineages of monocots. The cpDNA genomic sequences were combined with those available for Lilium to evaluate the phylogenetic position of Liliales and to investigate the influence of taxon sampling, gene sampling, gene function, natural selection, and substitution rate on phylogenetic inference in monocots. Phylogenetic analyses using sequence data of gene groups partitioned according to gene function, selection force, and total substitution rate demonstrated evident impacts of these factors on phylogenetic inference of monocots and the placement of Liliales, suggesting potential evolutionary convergence or adaptation of some cpDNA genes in monocots. Our study also demonstrated that reduced taxon sampling reduced the bootstrap support for the placement of Liliales in the cpDNA phylogenomic analysis. Analyses of sequences of 77 protein genes with some missing data and sequences of 81 genes (all protein genes plus the rRNA genes) support a sister relationship of Liliales to the commelinids-Asparagales clade, consistent with the APG III system. Analyses of 63 cpDNA protein genes for 32 taxa with few missing data, however, support a sister relationship of Liliales (represented by Smilax and Lilium) to Dioscoreales-Pandanales. Topology tests indicated that these two alignments do not significantly differ given any of these three cpDNA genomic sequence data sets. Furthermore, we found no saturation effect of the data, suggesting that the cpDNA genomic sequence data used in the study are appropriate for monocot phylogenetic study and that long-branch attraction is unlikely to explain the two well-supported, conflicting placements of Liliales. Further analyses using sufficient nuclear data remain necessary to evaluate these two phylogenetic hypotheses regarding the position of Liliales and to address the causes of signal conflict among genes and partitions. Copyright © 2012 Elsevier Inc. All rights reserved.
Molecular characterization of nucleopolyhedrovirus of three lepidopteran pests using late expression factor-8 gene.

PubMed

Jose, Jency; Jalali, S K; Shivalingaswamy, T M; Kumar, N K Krishna; Bhatnagar, R; Bandyopadhyay, A

2013-06-01

A PCR-based method for detection of viral DNA in nucleopolyhedroviruses (NPVs) of three lepidopterans, Spodoptera litura, Amsacta albistriga and Helicoverpa armigera, was developed by targeting the late expression factor-8 (lef-8) gene of the three NPVs with specific primers. Amplicons of 689, 699 and 665 bp were amplified, respectively, and the nucleotide sequences were submitted to GenBank and accession numbers were obtained. The sequences of the lef-8 gene of S. litura NPV and H. armigera NPV matched those of their respective references in the GenBank database, thereby confirming their identity; the sequence of A. albistriga NPV, however, was the first such sequence submitted to the GenBank database. Sequence similarity analysis of the three NPV lef-8 genes sequenced in the present study revealed no significant similarity between them; however, A. albistriga NPV and S. litura NPV were found to be closely related. CLUSTAL alignment of the sequences generated revealed general relatedness among the NPV lef-8 genes. The study confirmed that the lef-8 gene can be used for quick and correct discriminatory identification of insect viruses.
Mining, identification and function analysis of microRNAs and target genes in peanut (Arachis hypogaea L.).

PubMed

Zhang, Tingting; Hu, Shuhao; Yan, Caixia; Li, Chunjuan; Zhao, Xiaobo; Wan, Shubo; Shan, Shihua

2017-02-01

In the present investigation, a total of 60 conserved peanut (Arachis hypogaea L.) microRNA (miRNA) sequences, belonging to 16 families, were identified using bioinformatics methods. A total of 392 target gene sequences were identified for 58 miRNAs with Target-align software and BLASTx analyses. Gene Ontology (GO) functional analysis suggested that these target genes were involved in mediating peanut growth and development, signal transduction and stress resistance. Fifty-five miRNA sequences were verified by a poly(A) tailing test, a success rate of 91.67%. Twenty peanut target gene sequences were randomly selected, and the 5' rapid amplification of cDNA ends (5'-RACE) method was used to validate the cleavage sites of these target genes. Of these, 14 (70%) peanut miRNA targets were verified by means of gel electrophoresis, cloning and sequencing. Furthermore, functional analysis and homologous sequence retrieval were conducted for target gene sequences, and 26 target genes were chosen for experimental study of stress resistance. Real-time fluorescence quantitative PCR (qRT-PCR) was applied to measure the expression levels of resistance-associated miRNAs and their target genes in peanut exposed to Aspergillus flavus (A. flavus) infection and drought stress, respectively. As a result, 5 miRNA-target groups were found to accord with the mode of miRNAs negatively controlling the expression of their target genes. This study preliminarily determined the biological functions of some resistance-associated miRNAs and their target genes in peanut. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
DNA sequence-based comparative studies between non-extremophile and extremophile organisms with implications in exobiology

NASA Astrophysics Data System (ADS)

Holden, Todd; Marchese, P.; Tremberger, G., Jr.; Cheung, E.; Subramaniam, R.; Sullivan, R.; Schneider, P.; Flamholz, A.; Lieberman, D.; Cheung, T.

2008-08-01

We have characterized function related DNA sequences of various organisms using informatics techniques, including fractal dimension calculation, nucleotide and multi-nucleotide statistics, and sequence fluctuation analysis. Our analysis shows trends which differentiate extremophile from non-extremophile organisms, which could be reproduced in extraterrestrial life. Among the systems studied are radiation repair genes, genes involved in thermal shocks, and genes involved in drug resistance. We also evaluate sequence level changes that have occurred during short term evolution (several thousand generations) under extreme conditions.
Discovery of genes related to insecticide resistance in Bactrocera dorsalis by functional genomic analysis of a de novo assembled transcriptome.

PubMed

Hsu, Ju-Chun; Chien, Ting-Ying; Hu, Chia-Cheng; Chen, Mei-Ju May; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S; Chen, Chien-Yu

2012-01-01

Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs.

PubMed

Holmes, Roger S; Goldberg, Erwin

2009-10-01

Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals.
Computational analyses of mammalian lactate dehydrogenases: human, mouse, opossum and platypus LDHs

PubMed Central

Holmes, Roger S; Goldberg, Erwin

2009-01-01

Computational methods were used to predict the amino acid sequences and gene locations for mammalian lactate dehydrogenase (LDH) genes and proteins using genome sequence databanks. Human LDHA, LDHC and LDH6A genes were located in tandem on chromosome 11, while LDH6B and LDH6C genes were on chromosomes 15 and 12, respectively. Opossum LDHC and LDH6B genes were located in tandem with the opossum LDHA gene on chromosome 5 and contained 7 (LDHA and LDHC) or 8 (LDH6B) exons. An amino acid sequence prediction for the opossum LDH6B subunit gave an extended N-terminal sequence, similar to the human and mouse LDH6B sequences, which may support the export of this enzyme into mitochondria. The platypus genome contained at least 3 LDH genes encoding LDHA, LDHB and LDH6B subunits. Phylogenetic studies and sequence analyses indicated that LDHA, LDHB and LDH6B genes are present in all mammalian genomes examined, including a monotreme species (platypus), whereas the LDHC gene may have arisen more recently in marsupial mammals. PMID:19679512
De novo Transcriptome Assembly of Common Wild Rice (Oryza rufipogon Griff.) and Discovery of Drought-Response Genes in Root Tissue Based on Transcriptomic Data.

PubMed

Tian, Xin-Jie; Long, Yan; Wang, Jiao; Zhang, Jing-Wen; Wang, Yan-Yan; Li, Wei-Min; Peng, Yu-Fa; Yuan, Qian-Hua; Pei, Xin-Wu

2015-01-01

The perennial O. rufipogon (common wild rice), which is considered to be the ancestor of Asian cultivated rice species, contains many useful genetic resources, including drought resistance genes. However, few studies have identified the drought resistance and tissue-specific genes in common wild rice. In this study, transcriptome sequencing libraries were constructed, including drought-treated roots (DR) and control leaves (CL) and roots (CR). Using Illumina sequencing technology, we generated 16.75 million bases of high-quality sequence data for common wild rice and conducted de novo assembly and annotation of genes without prior genome information. These reads were assembled into 119,332 unigenes with an average length of 715 bp. A total of 88,813 distinct sequences (74.42% of unigenes) significantly matched known genes in the NCBI NT database. Differentially expressed gene (DEG) analysis showed that 3617 genes were up-regulated and 4171 genes were down-regulated in the CR library compared with the CL library. Among the DEGs, 535 genes were expressed in roots but not in shoots. A similar comparison between the DR and CR libraries showed that 1393 genes were up-regulated and 315 genes were down-regulated in the DR library compared with the CR library. Finally, 37 genes that were specifically expressed in roots were screened after comparing the DEGs identified in the above-described analyses. This study provides a transcriptome sequence resource for common wild rice plants and establishes a digital gene expression profile of wild rice plants under drought conditions using the assembled transcriptome data as a reference. Several tissue-specific and drought-stress-related candidate genes were identified, representing a fully characterized transcriptome and providing a valuable resource for genetic and genomic studies in plants.
Phylogenetic relationships of Malassezia species based on multilocus sequence analysis.

PubMed

Castellá, Gemma; Coutinho, Selene Dall' Acqua; Cabañes, F Javier

2014-01-01

Members of the genus Malassezia are lipophilic basidiomycetous yeasts, which are part of the normal cutaneous microbiota of humans and other warm-blooded animals. Currently, this genus consists of 14 species that have been characterized by phenetic and molecular methods. Although several molecular methods have been used to identify and/or differentiate Malassezia species, the sequencing of the rRNA genes and the chitin synthase-2 gene (CHS2) are the most widely employed. There is little information about the β-tubulin gene in the genus Malassezia, a gene that has been used for the analysis of complex species groups. The aim of the present study was to sequence a fragment of the β-tubulin gene of Malassezia species and analyze their phylogenetic relationships using a multilocus sequence approach based on two rRNA genes (ITS including 5.8S rRNA and D1/D2 region of 26S rRNA) together with two protein-encoding genes (CHS2 and β-tubulin). The phylogenetic study of the partial β-tubulin gene sequences indicated that this molecular marker can be used to assess diversity and identify new species. The multilocus sequence analysis of the four loci provides robust support to delineate species at the terminal nodes and could help to estimate divergence times for the origin and diversification of Malassezia species.
Draft Genome Sequence, and a Sequence-Defined Genetic Linkage Map of the Legume Crop Species Lupinus angustifolius L

PubMed Central

Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W.; Howieson, John G.; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species. PMID:23734219
Draft genome sequence, and a sequence-defined genetic linkage map of the legume crop species Lupinus angustifolius L.

PubMed

Yang, Huaan; Tao, Ye; Zheng, Zequn; Zhang, Qisen; Zhou, Gaofeng; Sweetingham, Mark W; Howieson, John G; Li, Chengdao

2013-01-01

Lupin (Lupinus angustifolius L.) is the most recently domesticated crop in major agricultural cultivation. Its seeds are high in protein and dietary fibre, but low in oil and starch. Medical and dietetic studies have shown that consuming lupin-enriched food has significant health benefits. We report the draft assembly from a whole genome shotgun sequencing dataset for this legume species with 26.9x coverage of the genome, which is predicted to contain 57,807 genes. Analysis of the annotated genes with metabolic pathways provided a partial understanding of some key features of lupin, such as the amino acid profile of storage proteins in seeds. Furthermore, we applied the NGS-based RAD-sequencing technology to obtain 8,244 sequence-defined markers for anchoring the genomic sequences. A total of 4,214 scaffolds from the genome sequence assembly were aligned into the genetic map. The combination of the draft assembly and a sequence-defined genetic map made it possible to locate and study functional genes of agronomic interest. The identification of co-segregating SNP markers, scaffold sequences and gene annotation facilitated the identification of a candidate R gene associated with resistance to the major lupin disease anthracnose. We demonstrated that the combination of medium-depth genome sequencing and a high-density genetic linkage map by application of NGS technology is a cost-effective approach to generating genome sequence data and a large number of molecular markers to study the genomics, genetics and functional genes of lupin, and to apply them to molecular plant breeding. This strategy does not require prior genome knowledge, which potentiates its application to a wide range of non-model species.
AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

PubMed

Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

2014-01-01

A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.
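The abstract above describes sampling a large pool of homologous sequences so that the subset stays taxonomically diverse. The Python sketch below is not the AST algorithm, whose details are not given here. It is a deliberately simple round-robin illustration of the general idea, and the record layout, function name, and `rank` parameter are all assumptions.

```python
from collections import defaultdict


def sample_by_taxon(records, k, rank=0):
    """Pick up to k sequence IDs, cycling across taxonomic groups.

    records: iterable of (seq_id, lineage) pairs, where lineage is a tuple
             such as ("Bacteria", "Proteobacteria").
    rank:    index into the lineage used to define the groups (hypothetical).
    """
    groups = defaultdict(list)
    for seq_id, lineage in records:
        groups[lineage[rank]].append(seq_id)

    # One queue per group, in a deterministic (sorted) group order.
    queues = [list(ids) for _, ids in sorted(groups.items())]
    picked = []
    while len(picked) < k and any(queues):
        for queue in queues:
            if queue and len(picked) < k:
                picked.append(queue.pop(0))
    return picked


# Hypothetical pool biased towards one well-studied group.
pool = [
    ("a1", ("Bacteria",)), ("a2", ("Bacteria",)), ("a3", ("Bacteria",)),
    ("p1", ("Plants",)), ("f1", ("Fungi",)),
]
subset = sample_by_taxon(pool, k=3)
```

Cycling across groups keeps an over-represented clade from crowding out sparsely sampled ones, which is the bias the abstract lists as motivation (3).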
Efficient identification of Y chromosome sequences in the human and Drosophila genomes.

PubMed

Carvalho, Antonio Bernardo; Clark, Andrew G

2013-11-01

Notwithstanding their biological importance, Y chromosomes remain poorly known in most species. A major obstacle to their study is the identification of Y chromosome sequences; due to its high content of repetitive DNA, in most genome projects, the Y chromosome sequence is fragmented into a large number of small, unmapped scaffolds. Identification of Y-linked genes among these fragments has yielded important insights about the origin and evolution of Y chromosomes, but the process is labor intensive, restricting studies to a small number of species. Apart from these fragmentary assemblies, in a few mammalian species, the euchromatic sequence of the Y is essentially complete, owing to painstaking BAC mapping and sequencing. Here we use female short-read sequencing and k-mer comparison to identify Y-linked sequences in two very different genomes, Drosophila virilis and human. Using this method, essentially all D. virilis scaffolds were unambiguously classified as Y-linked or not Y-linked. We found 800 new scaffolds (totaling 8.5 Mbp), and four new genes in the Y chromosome of D. virilis, including JYalpha, a gene involved in hybrid male sterility. Our results also strongly support the preponderance of gene gains over gene losses in the evolution of the Drosophila Y. In the intensively studied human genome, used here as a positive control, we recovered all previously known genes or gene families, plus a small amount (283 kb) of new, unfinished sequence. Hence, this method works in large and complex genomes and can be applied to any species with sex chromosomes.
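The method described above rests on the observation that Y-linked sequence should be absent from female reads. A minimal Python sketch of that idea follows: for each scaffold it computes the fraction of k-mers that are not found among the female-read k-mers and flags scaffolds above a threshold. The value of k, the threshold, and all names are illustrative assumptions and are not the parameters used by the authors.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of strand-independent k-mers (the smaller of k-mer and reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def unmatched_fraction(scaffold, female_kmers, k):
    """Fraction of a scaffold's k-mers that are absent from the female reads."""
    kmers = canonical_kmers(scaffold, k)
    if not kmers:
        return 0.0
    return len(kmers - female_kmers) / len(kmers)


def classify_scaffolds(scaffolds, female_reads, k=15, threshold=0.8):
    """Map scaffold name -> True if it looks Y-linked under these assumptions."""
    female_kmers = set()
    for read in female_reads:
        female_kmers |= canonical_kmers(read, k)
    return {
        name: unmatched_fraction(seq, female_kmers, k) >= threshold
        for name, seq in scaffolds.items()
    }


# Toy data with a tiny k so the example stays readable.
calls = classify_scaffolds(
    {"autosomal": "CGTACG", "y_candidate": "TTTTGGGGTT"},
    female_reads=["ACGTACGTAC"],
    k=4,
)
```

On real data the k-mer sets would be far too large for plain Python sets, and repeats shared between the Y and other chromosomes make the threshold a judgment call.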
AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

PubMed Central

Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

2014-01-01

A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. PMID:24892935
De novo characterization of fall dormant and nondormant alfalfa (Medicago sativa L.) leaf transcriptome and identification of candidate genes related to fall dormancy.

PubMed

Zhang, Senhao; Shi, Yinghua; Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

2015-01-01

Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression comparisons between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. Gene Ontology and pathway information was also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa.
De Novo Characterization of Fall Dormant and Nondormant Alfalfa (Medicago sativa L.) Leaf Transcriptome and Identification of Candidate Genes Related to Fall Dormancy

PubMed Central

Cheng, Ningning; Du, Hongqi; Fan, Wenna; Wang, Chengzhang

2015-01-01

Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide. Fall dormancy is an adaptive character related to the biomass production and winter survival in alfalfa. The physiological, biochemical and molecular mechanisms causing fall dormancy and the related genes have not been well studied. In this study, we sequenced two standard varieties of alfalfa (dormant and non-dormant) at two time points and generated approximately 160 million high quality paired-end sequence reads using sequencing by synthesis (SBS) technology. The de novo transcriptome assembly generated a set of 192,875 transcripts with an average length of 856 bp representing about 165.1 Mb of the alfalfa leaf transcriptome. After assembly, 111,062 (57.6%) transcripts were annotated against the NCBI non-redundant database. A total of 30,165 (15.6%) transcripts were mapped to 323 Kyoto Encyclopedia of Genes and Genomes pathways. We also identified 41,973 simple sequence repeats, which can be used to generate markers for alfalfa, and 1,541 transcription factors were identified across 1,350 transcripts. Gene expression comparisons between dormant and non-dormant alfalfa at different time points were performed, and we identified several differentially expressed genes potentially related to fall dormancy. Gene Ontology and pathway information was also identified. We sequenced and assembled the leaf transcriptome of alfalfa related to fall dormancy, and also identified some genes of interest involved in the fall dormancy mechanism. Thus, our research focused on studying fall dormancy in alfalfa through transcriptome sequencing. The sequencing and gene expression data generated in this study may be used further to elucidate the complete mechanisms governing fall dormancy in alfalfa. PMID:25799491
Complexity of genetic sequences modified by horizontal gene transfer and degraded-DNA uptake

NASA Astrophysics Data System (ADS)

Tremberger, George; Dehipawala, S.; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

2015-09-01

Horizontal gene transfer has been a major vehicle for efficient transfer of genetic materials among living species and could be one of the sources for noncoding DNA incorporation into a genome. Our previous study of lncRNA sequence complexity in terms of fractal dimension and information entropy shows a tight regulation among the studied genes in numerous diseases. The role of sequence complexity in horizontally transferred genes was investigated with Mealybug in symbiotic relation with a 139K genome microbe and Deinococcus radiodurans as examples. The fractal dimension and entropy showed correlation R-sq of 0.82 (N = 6) for the studied Deinococcus radiodurans sequences. For comparison the Deinococcus radiodurans oxidative stress tolerant catalase and superoxide dismutase genes under extracellular dGMP growth condition showed R-sq ~ 0.42 (N = 6); and the studied arsenate reductase horizontally transferred genes for toxicity survival in several microorganisms showed no correlation. Simulation results showed that R-sq < 0.4 would be improbable at less than one percent chance, suggestive of additional selection pressure when compared to the R-sq ~ 0.29 (N = 21) in the studied transferred genes in Mealybug. The mild correlation of R-sq ~ 0.5 for fractal dimension versus transcription level in the studied Deinococcus radiodurans sequences upon extracellular dGMP growth condition would suggest that lower fractal dimension with less electron density fluctuation favors higher transcription level.

Three copies of a single protein II-encoding sequence in the genome of Neisseria gonorrhoeae JS3: evidence for gene conversion and gene duplication.

PubMed

van der Ley, P

1988-11-01

Gonococci express a family of related outer membrane proteins designated protein II (P.II). These surface proteins are subject to both phase variation and antigenic variation. The P.II gene repertoire of Neisseria gonorrhoeae strain JS3 was found to consist of at least ten genes, eight of which were cloned. Sequence analysis and DNA hybridization studies revealed that one particular P.II-encoding sequence is present in three distinct, but almost identical, copies in the JS3 genome. These genes encode the P.II protein that was previously identified as P.IIc. Comparison of their sequences shows that the multiple copies of this P.IIc-encoding gene might have been generated by both gene conversion and gene duplication.
Exome-wide DNA capture and next generation sequencing in domestic and wild species.

PubMed

Cosart, Ted; Beja-Pereira, Albano; Chen, Shanyuan; Ng, Sarah B; Shendure, Jay; Luikart, Gordon

2011-07-05

Gene-targeted and genome-wide markers are crucial to advance evolutionary biology, agriculture, and biodiversity conservation by improving our understanding of genetic processes underlying adaptation and speciation. Unfortunately, for eukaryotic species with large genomes it remains costly to obtain genome sequences and to develop genome resources such as genome-wide SNPs. A method is needed to allow gene-targeted, next-generation sequencing that is flexible enough to include any gene or number of genes, unlike transcriptome sequencing. Such a method would allow sequencing of many individuals, avoiding ascertainment bias in subsequent population genetic analyses. We demonstrate the usefulness of a recent technology, exon capture, for genome-wide, gene-targeted marker discovery in species with no genome resources. We use coding gene sequences from the domestic cow genome sequence (Bos taurus) to capture (enrich for), and subsequently sequence, thousands of exons of B. taurus, B. indicus, and Bison bison (wild bison). Our capture array has probes for 16,131 exons in 2,570 genes, including 203 candidate genes with known function and of interest for their association with disease and other fitness traits. We successfully sequenced and mapped exon sequences from across the 29 autosomes and X chromosome in the B. taurus genome sequence. Exon capture and high-throughput sequencing identified thousands of putative SNPs spread evenly across all reference chromosomes, in all three individuals, including hundreds of SNPs in our targeted candidate genes. This study shows exon capture can be customized for SNP discovery in many individuals and for non-model species without genomic resources. Our captured exome subset was small enough for affordable next-generation sequencing, and successfully captured exons from a divergent wild species using the domestic cow genome as reference.
Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

PubMed Central

Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

2015-01-01

Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729
New approach for the study of mite reproduction: The first transcriptome analysis of a mite, Phytoseiulus persimilis (Acari: Phytoseiidae).

PubMed

Cabrera, Ana R; Donohue, Kevin V; Khalil, Sayed M S; Scholl, Elizabeth; Opperman, Charles; Sonenshine, Daniel E; Roe, R Michael

2011-01-01

Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yield sequences of genes critical during physiological processes poorly understood in acarines, i.e., the regulation of female reproduction in mites. The predatory mite, Phytoseiulus persimilis, was selected to conduct a transcriptome analysis using 454 pyrosequencing. The objective of this project was to obtain DNA-sequence information of expressed genes from P. persimilis with special interest in sequences corresponding to vitellogenin (Vg) and the vitellogenin receptor (VgR). These genes are critical to the understanding of vitellogenesis, and they will facilitate the study of the regulation of mite female reproduction. A total of 12,556 contiguous sequences (contigs) were assembled with an average size of 935 bp. From these sequences, the putative translated peptides of 11 contigs were similar in amino acid sequences to other arthropod Vgs, while 6 were similar to VgRs. We selected some of these sequences to conduct stage-specific expression studies to further determine their function.
Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With the help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria.
Molecular phylogeny, population genetics, and evolution of heterocystous cyanobacteria using nifH gene sequences.

PubMed

Singh, Prashant; Singh, Satya Shila; Elster, Josef; Mishra, Arun Kumar

2013-06-01

In order to assess phylogeny, population genetics, and approximation of future course of cyanobacterial evolution based on nifH gene sequences, 41 heterocystous cyanobacterial strains collected from all over India have been used in the present study. NifH gene sequence analysis data confirm that the heterocystous cyanobacteria are monophyletic while the stigonematales show polyphyletic origin with grave intermixing. Further, analysis of nifH gene sequence data using intricate mathematical extrapolations revealed that the nucleotide diversity and recombination frequency is much greater in Nostocales than the Stigonematales. Similarly, DNA divergence studies showed significant values of divergence with greater gene conversion tracts in the unbranched (Nostocales) than the branched (Stigonematales) strains. Our data strongly support the origin of true branching cyanobacterial strains from the unbranched strains.
Pseudomonas sp. strain CA5 (a selenite-reducing bacterium) 16S rRNA gene complete sequence. National Institute of Health, National Center for Biotechnology Information, GenBank sequence. Accession FJ422810.1.

USDA-ARS's Scientific Manuscript database

This study used 1321 base pair 16S rRNA gene sequence methods to confirm the phylogenetic position of a soil isolate as a bacterium belonging to the genus Pseudomonas sp. Morphological, biochemical characteristics, and fatty acid profiles are consistent with the 16S rRNA gene sequence identification...
DNA sequence responsible for the amplification of adjacent genes.

PubMed

Pasion, S G; Hartigan, J A; Kumar, V; Biswas, D K

1987-10-01

A 10.3-kb DNA fragment in the 5'-flanking region of the rat prolactin (rPRL) gene was isolated from F1BGH(1)2C1, a strain of rat pituitary tumor cells (GH cells) that produces prolactin in response to 5-bromodeoxyuridine (BrdU). Following transfection and integration into genomic DNA of recipient mouse L cells, this DNA induced amplification of the adjacent thymidine kinase gene from Herpes simplex virus type 1 (HSV1TK). We confirmed the ability of this "Amplicon" sequence to induce amplification of other linked or unlinked genes in DNA-mediated gene transfer studies. When transferred into the mouse L cells with the 10.3-5'rPRL gene sequence of BrdU-responsive cells, both the human growth hormone and the HSV1TK genes are amplified in response to 5-bromodeoxyuridine. This observation is substantiated by BrdU-induced amplification of the cotransferred bacterial Neo gene. Cotransfection studies reveal that the BrdU-induced amplification capability is associated with a 4-kb DNA sequence in the 5'-flanking region of the rPRL gene of BrdU-responsive cells. These results demonstrate that genes of heterologous origin, linked or unlinked, and selected or unselected, can be coamplified when located within the amplification boundary of the Amplicon sequence.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

PubMed

Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

2010-12-15

Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Lactobacillus heilongjiangensis sp. nov., isolated from Chinese pickle.

PubMed

Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

2013-11-01

A Gram-stain-positive bacterial strain, S4-3(T), was isolated from traditional pickle in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, pheS gene sequence analysis, rpoA gene sequence analysis, dnaK gene sequence analysis, fatty acid methyl ester (FAME) analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain S4-3(T) showed 97.9-98.7 % 16S rRNA gene sequence similarities, 84.4-94.1 % pheS gene sequence similarities and 94.4-96.9 % rpoA gene sequence similarities to the type strains of Lactobacillus nantensis, Lactobacillus mindensis, Lactobacillus crustorum, Lactobacillus futsaii, Lactobacillus farciminis and Lactobacillus kimchiensis. dnaK gene sequence similarities between S4-3(T) and Lactobacillus nantensis LMG 23510(T), Lactobacillus mindensis LMG 21932(T), Lactobacillus crustorum LMG 23699(T), Lactobacillus futsaii JCM 17355(T) and Lactobacillus farciminis LMG 9200(T) were 95.4, 91.5, 90.4, 91.7 and 93.1 %, respectively. Based upon the data obtained in the present study, a novel species, Lactobacillus heilongjiangensis sp. nov., is proposed and the type strain is S4-3(T) ( = LMG 26166(T) = NCIMB 14701(T)).
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing.

PubMed

Jäger, Marten; Ott, Claus-Eric; Grünhagen, Johannes; Hecht, Jochen; Schell, Hanna; Mundlos, Stefan; Duda, Georg N; Robinson, Peter N; Lienau, Jasmin

2011-03-24

The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism.
Composite transcriptome assembly of RNA-seq data in a sheep model for delayed bone healing

PubMed Central

2011-01-01

Background: The sheep is an important model organism for many types of medically relevant research, but molecular genetic experiments in the sheep have been limited by the lack of knowledge about ovine gene sequences. Results: Prior to our study, mRNA sequences for only 1,556 partial or complete ovine genes were publicly available. Therefore, we developed a composite de novo transcriptome assembly method for next-generation sequence data to combine known ovine mRNA and EST sequences, mRNA sequences from mouse and cow, and sequences assembled de novo from short read RNA-Seq data into a composite reference transcriptome, and identified transcripts from over 12 thousand previously undescribed ovine genes. Gene expression analysis based on these data revealed substantially different expression profiles in standard versus delayed bone healing in an ovine tibial osteotomy model. Hundreds of transcripts were differentially expressed between standard and delayed healing and between the time points of the standard and delayed healing groups. We used the sheep sequences to design quantitative RT-PCR assays with which we validated the differential expression of 26 genes that had been identified by RNA-seq analysis. A number of clusters of characteristic expression profiles could be identified, some of which showed striking differences between the standard and delayed healing groups. Gene Ontology (GO) analysis showed that the differentially expressed genes were enriched in terms including extracellular matrix, cartilage development, contractile fiber, and chemokine activity. Conclusions: Our results provide a first atlas of gene expression profiles and differentially expressed genes in standard and delayed bone healing in a large-animal model and provide a number of clues as to the shifts in gene expression that underlie delayed bone healing. In the course of our study, we identified transcripts of 13,987 ovine genes, including 12,431 genes for which no sequence information was previously available. This information will provide a basis for future molecular research involving the sheep as a model organism. PMID:21435219
Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

PubMed

Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

2013-01-01

Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules with complementarity to human genes in plant tissue from crops that have a long history of safe consumption, such as lettuce, tomato, corn, soy and rice, support a conclusion that long dsRNAs do not present a significant dietary risk.
Genetic Diversity among Clostridium botulinum Strains Harboring bont/A2 and bont/A3 Genes

PubMed Central

Raphael, Brian H.; Joseph, Lavin A.; Meno, Sarah R.; Fernández, Rafael A.; Maslanka, Susan E.

2012-01-01

Clostridium botulinum type A strains are known to be genetically diverse and widespread throughout the world. Genetic diversity studies have focused mainly on strains harboring one type A botulinum toxin gene, bont/A1, although all reported bont/A gene variants have been associated with botulism cases. Our study provides insight into the genetic diversity of C. botulinum type A strains, which contain bont/A2 (n = 42) and bont/A3 (n = 4) genes, isolated from diverse samples and geographic origins. Genetic diversity was assessed by using bont nucleotide sequencing, content analysis of the bont gene clusters, multilocus sequence typing (MLST), and pulsed-field gel electrophoresis (PFGE). Sequences of bont genes obtained in this study showed 99.9 to 100% identity with other bont/A2 or bont/A3 gene sequences available in public databases. The neurotoxin gene clusters of the subtype A2 and A3 strains analyzed in this study were similar in gene content. C. botulinum strains harboring bont/A2 and bont/A3 genes were divided into six and two MLST profiles, respectively. Four groups of strains shared a similarity of at least 95% by PFGE; the largest group included 21 out of 46 strains. The strains analyzed in this study showed relatively limited genetic diversity using either MLST or PFGE. PMID:23042179
Bioinformatics analysis and detection of gelatinase encoded gene in Lysinibacillus sphaericus

NASA Astrophysics Data System (ADS)

Repin, Rul Aisyah Mat; Mutalib, Sahilah Abdul; Shahimi, Safiyyah; Khalid, Rozida Mohd.; Ayob, Mohd. Khan; Bakar, Mohd. Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, we performed a bioinformatics analysis of the genome sequence of Lysinibacillus sphaericus (L. sphaericus) to identify the gene encoding gelatinase. L. sphaericus was isolated from soil and is a bacterium whose gelatinase is species-specific to porcine and bovine gelatin. This bacterium therefore offers the possibility of producing enzymes specific to each of these two meat species. The main focus of this research was to identify the gelatinase-encoding gene of L. sphaericus using bioinformatics analysis of the partially sequenced genome. Three candidate genes were identified: gelatinase candidate gene 1 (P1), NODE_71_length_93919_cov_158.931839_21, which is 1563 base pairs (bp) in size with a 520-amino-acid sequence; gelatinase candidate gene 2 (P2), NODE_23_length_52851_cov_190.061386_17, which is 1776 bp in size with a 591-amino-acid sequence; and gelatinase candidate gene 3 (P3), NODE_106_length_32943_cov_169.147919_8, which is 1701 bp in size with a 566-amino-acid sequence. Three pairs of oligonucleotide primers, named F1, R1, F2, R2, F3 and R3, were designed to target short sequences of cDNA by PCR. The amplicons were reliably 1563 bp in size for candidate gene P1 and 1701 bp in size for candidate gene P3. Thus, the bioinformatics analysis of L. sphaericus identified candidate genes encoding gelatinase.
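The gene sizes and protein lengths reported for the three candidates are mutually consistent if each bp count includes the stop codon, which is an assumption here rather than something the abstract states. A short check in Python, with a hypothetical helper name:

```python
# Consistency check between reported ORF sizes (bp) and protein lengths (aa).
# Assumption: the reported bp counts include one terminal stop codon.

def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# (bp, aa) pairs as reported in the abstract for candidates P1 to P3.
reported = {"P1": (1563, 520), "P2": (1776, 591), "P3": (1701, 566)}

for name, (bp, aa) in reported.items():
    print(name, bp, "bp ->", protein_length(bp), "aa; matches report:",
          protein_length(bp) == aa)
```

Under that assumption all three candidates agree with the reported amino acid counts.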
Partial sequencing of sodA gene and its application to identification of Streptococcus dysgalactiae subsp. dysgalactiae isolated from farmed fish.

PubMed

Nomoto, R; Kagawa, H; Yoshida, T

2008-01-01

To investigate the difference between Lancefield group C Streptococcus dysgalactiae (GCSD) strains isolated from diseased fish and animals by sequencing and phylogenetic analysis of the sodA gene. The sodA gene of Strep. dysgalactiae strains isolated from fish and animals was amplified and its nucleotide sequences were determined. Although 100% sequence identity was observed among fish GCSD strains, the determined sequences from animal isolates showed variations against fish isolate sequences. Thus, all fish GCSD strains were clearly separated from the GCSD strains of other origin by using phylogenetic tree analysis. In addition, an original primer set was designed based on the determined sequences to specifically amplify the sodA gene of fish GCSD strains. The primer set yielded amplification products from only fish GCSD strains. By sequencing analysis of the sodA gene, the genetic divergence between Strep. dysgalactiae strains isolated from fish and mammals was demonstrated. Moreover, an original oligonucleotide primer set, which could simply detect the genotype of fish GCSD strains, was designed. This study shows that Strep. dysgalactiae isolated from diseased fish could be distinguished from conventional GCSD strains by the difference in the sequence of the sodA gene.
Uptake, Results, and Outcomes of Germline Multiple-Gene Sequencing After Diagnosis of Breast Cancer.

PubMed

Kurian, Allison W; Ward, Kevin C; Hamilton, Ann S; Deapen, Dennis M; Abrahamse, Paul; Bondarenko, Irina; Li, Yun; Hawley, Sarah T; Morrow, Monica; Jagsi, Reshma; Katz, Steven J

2018-05-10

Low-cost sequencing of multiple genes is increasingly available for cancer risk assessment. Little is known about uptake or outcomes of multiple-gene sequencing after breast cancer diagnosis in community practice. To examine the effect of multiple-gene sequencing on the experience and treatment outcomes for patients with breast cancer. For this population-based retrospective cohort study, patients with breast cancer diagnosed from January 2013 to December 2015 and accrued from SEER registries across Georgia and in Los Angeles, California, were surveyed (n = 5080, response rate = 70%). Responses were merged with SEER data and results of clinical genetic tests, either BRCA1 and BRCA2 (BRCA1/2) sequencing only or including additional other genes (multiple-gene sequencing), provided by 4 laboratories. Type of testing (multiple-gene sequencing vs BRCA1/2-only sequencing), test results (negative, variant of unknown significance, or pathogenic variant), patient experiences with testing (timing of testing, who discussed results), and treatment (strength of patient consideration of, and surgeon recommendation for, prophylactic mastectomy), and prophylactic mastectomy receipt. We defined a patient subgroup with higher pretest risk of carrying a pathogenic variant according to practice guidelines. Among 5026 patients (mean [SD] age, 59.9 [10.7]), 1316 (26.2%) were linked to genetic results from any laboratory. Multiple-gene sequencing increasingly replaced BRCA1/2-only testing over time: in 2013, the rate of multiple-gene sequencing was 25.6% and BRCA1/2-only testing, 74.4%; in 2015 the rate of multiple-gene sequencing was 66.5% and BRCA1/2-only testing, 33.5%. Multiple-gene sequencing was more often ordered by genetic counselors (multiple-gene sequencing, 25.5% and BRCA1/2-only testing, 15.3%) and delayed until after surgery (multiple-gene sequencing, 32.5% and BRCA1/2-only testing, 19.9%). Multiple-gene sequencing substantially increased rate of detection of any pathogenic variant (multiple-gene sequencing: higher-risk patients, 12%; average-risk patients, 4.2% and BRCA1/2-only testing: higher-risk patients, 7.8%; average-risk patients, 2.2%) and variants of uncertain significance, especially in minorities (multiple-gene sequencing: white patients, 23.7%; black patients, 44.5%; and Asian patients, 50.9% and BRCA1/2-only testing: white patients, 2.2%; black patients, 5.6%; and Asian patients, 0%). Multiple-gene sequencing was not associated with an increase in the rate of prophylactic mastectomy use, which was highest with pathogenic variants in BRCA1/2 (BRCA1/2, 79.0%; other pathogenic variant, 37.6%; variant of uncertain significance, 30.2%; negative, 35.3%). Multiple-gene sequencing rapidly replaced BRCA1/2-only testing for patients with breast cancer in the community and enabled 2-fold higher detection of clinically relevant pathogenic variants without an associated increase in prophylactic mastectomy. However, important targets for improvement in the clinical utility of multiple-gene sequencing include postsurgical delay and racial/ethnic disparity in variants of uncertain significance.
Identification of genes from pattern formation, tyrosine kinase, and potassium channel families by DNA amplification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kamb, A.; Weir, M.; Rudy, B.

1989-06-01

The study of gene family members has been aided by the isolation of related genes on the basis of DNA homology. The authors have adapted the polymerase chain reaction to screen animal genomes very rapidly and reliably for likely gene family members. Using conserved amino acid sequences to design degenerate oligonucleotide primers, they have shown that the genome of the nematode Caenorhabditis elegans contains sequences homologous to many Drosophila genes involved in pattern formation, including the segment polarity gene wingless (vertebrate int-1), and homeobox sequences characteristic of the Antennapedia, engrailed, and paired families. In addition, they have used this method to show that C. elegans contains at least five different sequences homologous to genes in the tyrosine kinase family. Lastly, they have isolated six potassium channel sequences from humans, a result that validates the utility of the method with large genomes and suggests that human potassium channel gene diversity may be extensive.
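The idea of designing degenerate primers from a conserved amino acid sequence can be illustrated with a minimal Python sketch. It is not the authors' procedure: it simply back-translates a short peptide with the standard genetic code and collapses each codon position into an IUPAC ambiguity code. The function name `degenerate_primer` and the example peptides are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODON_TABLE.setdefault(aa, []).append(codon)

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {frozenset(v): k for k, v in {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}.items()}


def degenerate_primer(peptide):
    """Back-translate a peptide into a degenerate oligonucleotide.

    Returns (primer, degeneracy), where degeneracy is the number of distinct
    sequences in the resulting primer pool. Collapsing position by position
    can cover more sequences than the true codons for six-codon amino acids
    (Leu, Ser, Arg); this sketch accepts that simplification.
    """
    primer = []
    degeneracy = 1
    for aa in peptide.upper():
        codons = CODON_TABLE[aa]
        for pos in range(3):
            bases = frozenset(c[pos] for c in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("MDC"))
```

Peptides rich in Met and Trp (one codon each) give the least degenerate primers, which is one reason conserved motifs containing those residues are attractive primer sites.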
Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites.

PubMed

Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

2012-10-01

To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, according with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.
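Sequence similarity figures such as those quoted above are typically computed as percent identity over an alignment. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with `-` as the gap character; the alignment step itself (needed here, since the fragments differ in length) is outside the sketch, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences have a gap are ignored; a gap opposite
    a base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


print(percent_identity("ACGT", "ACGA"))
```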
Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites

PubMed Central

Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

2012-01-01

To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi’an China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi’an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%–99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, according with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites. PMID:23024043

Complete mitochondrial genome sequence of the heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus).

PubMed

Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai

2016-05-01

In this study we sequenced, for the first time, the complete mitochondrial genome of a heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus). The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Computational Tools and Algorithms for Designing Customized Synthetic Genes

PubMed Central

Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris

2014-01-01

Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations. PMID:25340050
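As an illustration of the single-objective, first-generation approach mentioned above, the sketch below recodes a coding sequence by replacing every codon with the most frequently used synonymous codon. It does not reproduce any tool covered by the review. The `USAGE` table is a tiny placeholder with illustrative frequencies for three amino acids only, and the function name `optimize_codons` is hypothetical.

```python
# Placeholder codon-usage table: amino acid -> {codon: relative frequency}.
# The numbers are illustrative only, not measured values for any organism.
USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
}


def optimize_codons(dna, usage):
    """Replace each codon with the most frequent synonymous codon.

    The encoded protein is unchanged because every replacement codon is
    drawn from the same amino acid's entry in the usage table.
    """
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codon_to_aa = {codon: aa for aa, freqs in usage.items() for codon in freqs}
    recoded = []
    for i in range(0, len(dna), 3):
        aa = codon_to_aa[dna[i:i + 3].upper()]
        recoded.append(max(usage[aa], key=usage[aa].get))
    return "".join(recoded)


print(optimize_codons("ATGAAGTTA", USAGE))
```

A greedy rule like this handles one objective at a time; combining it with constraints such as restriction-site avoidance is where the heuristic search described in the abstract becomes necessary.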
Advanced colorectal adenoma related gene expression signature may predict prognostic for colorectal cancer patients with adenoma-carcinoma sequence.

PubMed

Li, Bing; Shi, Xiao-Yu; Liao, Dai-Xiang; Cao, Bang-Rong; Luo, Cheng-Hua; Cheng, Shu-Jun

2015-01-01

There are still no absolute parameters predicting progression of adenoma into cancer. The present study aimed to characterize functional differences in the multistep carcinogenetic process of the adenoma-carcinoma sequence. All samples were collected and mRNA expression profiling was performed using Agilent Microarray high-throughput gene-chip technology. The characteristics of the mRNA expression profiles of the adenoma-carcinoma sequence were then described with bioinformatics software, and we analyzed the relationship between gene expression profiles of the adenoma-adenocarcinoma sequence and clinical prognosis of colorectal cancer. The mRNA expression of the adenoma-carcinoma sequence differed significantly between the high-grade intraepithelial neoplasia group and the adenocarcinoma group. Gene ontology biological process enrichment analysis of differentially expressed genes between the high-grade intraepithelial neoplasia group and the adenocarcinoma group showed that genes were enriched in extracellular structure organization, skeletal system development, biological adhesion and regulation of growth, with FDR-corrected P values below 0.05. In addition, IPR-related proteins were mainly insulin-like growth factor binding proteins. The variable trends of gene expression profiles for the adenoma-carcinoma sequence were mainly concentrated in high-grade intraepithelial neoplasia and adenocarcinoma. The differentially expressed genes are significantly correlated between the high-grade intraepithelial neoplasia group and the adenocarcinoma group. Bioinformatics analysis is an effective way to study gene expression profiles in the adenoma-carcinoma sequence, and may provide an effective tool for extending colorectal cancer research strategies to colorectal adenoma or advanced adenoma.
Exome Sequence Analysis of 14 Families With High Myopia.

PubMed

Kloss, Bethany A; Tompson, Stuart W; Whisenhunt, Kristina N; Quow, Krystina L; Huang, Samuel J; Pavelec, Derek M; Rosenberg, Thomas; Young, Terri L

2017-04-01

To identify causal gene mutations in 14 families with autosomal dominant (AD) high myopia using exome sequencing. Select individuals from 14 large Caucasian families with high myopia were exome sequenced. Gene variants were filtered to identify potential pathogenic changes. Sanger sequencing was used to confirm variants in original DNA, and to test for disease cosegregation in additional family members. Candidate genes and chromosomal loci previously associated with myopic refractive error and its endophenotypes were comprehensively screened. In 14 high myopia families, we identified 73 rare and 31 novel gene variants as candidates for pathogenicity. In seven of these families, two of the novel and eight of the rare variants were within known myopia loci. A total of 104 heterozygous nonsynonymous rare variants in 104 genes were identified in 10 out of 14 probands. Each variant cosegregated with affection status. No rare variants were identified in genes known to cause myopia or in genes closest to published genome-wide association study association signals for refractive error or its endophenotypes. Whole exome sequencing was performed to determine gene variants implicated in the pathogenesis of AD high myopia. This study provides new genes for consideration in the pathogenesis of high myopia, and may aid in the development of genetic profiling of those at greatest risk for attendant ocular morbidities of this disorder.
Transcriptome Assembly, Gene Annotation and Tissue Gene Expression Atlas of the Rainbow Trout

PubMed Central

Salem, Mohamed; Paneru, Bam; Al-Tobasei, Rafet; Abdouni, Fatima; Thorgaard, Gary H.; Rexroad, Caird E.; Yao, Jianbo

2015-01-01

Efforts to obtain a comprehensive genome sequence for rainbow trout are ongoing and will be complemented by transcriptome information that will enhance genome assembly and annotation. Previously, transcriptome reference sequences were reported using data from different sources. Although the previous work added a great wealth of sequences, a complete and well-annotated transcriptome is still needed. In addition, gene expression in different tissues was not completely addressed in the previous studies. In this study, non-normalized cDNA libraries were sequenced from 13 different tissues of a single doubled haploid rainbow trout from the same source used for the rainbow trout genome sequence. A total of ~1.167 billion paired-end reads were de novo assembled using the Trinity RNA-Seq assembler, yielding 474,524 contigs > 500 base-pairs. Of them, 287,593 had homologies to the NCBI non-redundant protein database. The longest contig of each cluster was selected as a reference, yielding 44,990 representative contigs. A total of 4,146 contigs (9.2%), including 710 full-length sequences, did not match any mRNA sequences in the current rainbow trout genome reference. Mapping reads to the reference genome identified an additional 11,843 transcripts not annotated in the genome. A digital gene expression atlas revealed 7,678 housekeeping and 4,021 tissue-specific genes. Expression of about 16,000–32,000 genes (35–71% of the identified genes) accounted for basic and specialized functions of each tissue. White muscle and stomach had the least complex transcriptomes, with high percentages of their total mRNA contributed by a small number of genes. Brain, testis and intestine, in contrast, had complex transcriptomes, with large numbers of genes involved in their expression patterns. This study provides comprehensive de novo transcriptome information that is suitable for functional and comparative genomics studies in rainbow trout, including annotation of the genome. PMID:25793877
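The two filtering steps described above (keep contigs longer than 500 bp, then take the longest contig of each cluster as the representative) can be sketched as follows. This is an illustration only, not the authors' pipeline; the input layout of `(cluster_id, contig_id, sequence)` tuples and the function name `representative_contigs` are assumptions.

```python
def representative_contigs(contigs, min_length=500):
    """Pick the longest contig per cluster among contigs longer than min_length.

    `contigs` is an iterable of (cluster_id, contig_id, sequence) tuples.
    Returns a dict mapping cluster_id to the chosen contig_id. Clusters with
    no contig above the length threshold are omitted.
    """
    best = {}
    for cluster_id, contig_id, seq in contigs:
        if len(seq) <= min_length:
            continue  # fails the length filter
        current = best.get(cluster_id)
        if current is None or len(seq) > len(current[1]):
            best[cluster_id] = (contig_id, seq)
    return {cluster: contig_id for cluster, (contig_id, _) in best.items()}


# Toy input with a small threshold so the example stays readable.
toy = [
    ("c1", "a", "A" * 6),
    ("c1", "b", "A" * 9),
    ("c2", "c", "A" * 4),
    ("c2", "d", "A" * 7),
]
print(representative_contigs(toy, min_length=5))
```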
Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

PubMed Central

Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

2015-01-01

Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
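A CDR3 length distribution of the kind measured above reduces to counting sequence lengths over the set of unique CDR3 sequences. The sketch below assumes the CDR3 sequences have already been extracted from the reads, which is the hard part and is not shown; the function name and the toy sequences are hypothetical.

```python
from collections import Counter


def cdr3_length_distribution(cdr3_sequences):
    """Count unique CDR3 sequences by length.

    Duplicate sequences are collapsed first, so the result describes
    repertoire diversity rather than read abundance.
    """
    unique = set(cdr3_sequences)
    return Counter(len(seq) for seq in unique)


# Toy amino acid strings standing in for extracted CDR3 regions.
toy_cdr3 = ["CASSLGQ", "CASSLGQ", "CASSF", "CASRT"]
print(cdr3_length_distribution(toy_cdr3))
```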
Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study

PubMed Central

Raethong, Nachon; Wong-ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

2016-01-01

Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H+-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction. PMID:27274991
Sequence- and Structure-Based Functional Annotation and Assessment of Metabolic Transporters in Aspergillus oryzae: A Representative Case Study.

PubMed

Raethong, Nachon; Wong-Ekkabut, Jirasak; Laoteng, Kobkul; Vongsangnak, Wanwipa

2016-01-01

Aspergillus oryzae is widely used for the industrial production of enzymes. In A. oryzae metabolism, transporters appear to play crucial roles in controlling the flux of molecules for energy generation, nutrients delivery, and waste elimination in the cell. While the A. oryzae genome sequence is available, transporter annotation remains limited and thus the connectivity of metabolic networks is incomplete. In this study, we developed a metabolic annotation strategy to understand the relationship between the sequence, structure, and function for annotation of A. oryzae metabolic transporters. Sequence-based analysis with manual curation showed that 58 genes of 12,096 total genes in the A. oryzae genome encoded metabolic transporters. Under consensus integrative databases, 55 unambiguous metabolic transporter genes were distributed into channels and pores (7 genes), electrochemical potential-driven transporters (33 genes), and primary active transporters (15 genes). To reveal the transporter functional role, a combination of homology modeling and molecular dynamics simulation was implemented to assess the relationship between sequence to structure and structure to function. As in the energy metabolism of A. oryzae, the H(+)-ATPase encoded by the AO090005000842 gene was selected as a representative case study of multilevel linkage annotation. Our developed strategy can be used for enhancing metabolic network reconstruction.
Partial Gene Sequencing of CYP1A, Vitellogenin, and Metallothionein in Mosquitofish Gambusia yucatana and Gambusia sexradiata.

PubMed

Vázquez-Euán, Roberto; Escalante-Herrera, Karla S; Rodríguez-Fuentes, Gabriela

2017-01-01

Ground characteristics in the Yucatan Peninsula make recovery and treatment of wastewater very expensive. This situation has contributed to an increase of pollutants in the aquifer. Unfortunately, studies related to the effects of those pollutants in native organisms are scarce. The aim of this work was to obtain partial sequences of widely known genes used as biomarkers of pollutant effect in Gambusia yucatana and Gambusia sexradiata. The studied genes were: cytochrome P450 1A (CYP1A); vitellogenin (VTG); metallothionein (MT), and two housekeeping genes, 18S and β-actin. From reported sequences of Gambusia affinis, primers were designed and amplification was done in the local Gambusia species exposed for 48 h to gasoline (100 µL/L, stirred for 24 h pre-exposure). Preliminary results revealed partial sequences of all genes with an approximate average length of 200 bp. BLAST analysis of found sequences indicated a minimum of 97% identity with reported sequences for G. affinis or Gambusia holbrooki showing great similarity.
The zebrafish reference genome sequence and its relationship to the human genome.

PubMed

Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; 
Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L

2013-04-25

Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome

PubMed Central

Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, 
Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.

2013-01-01

Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Suitability of partial 16S ribosomal RNA gene sequence analysis for the identification of dangerous bacterial pathogens.

PubMed

Ruppitsch, W; Stöger, A; Indra, A; Grif, K; Schabereiter-Gurtner, C; Hirschl, A; Allerberger, F

2007-03-01

In a bioterrorism event a rapid tool is needed to identify relevant dangerous bacteria. The aim of the study was to assess the usefulness of partial 16S rRNA gene sequence analysis and the suitability of diverse databases for identifying dangerous bacterial pathogens. For rapid identification purposes a 500-bp fragment of the 16S rRNA gene of 28 isolates comprising Bacillus anthracis, Brucella melitensis, Burkholderia mallei, Burkholderia pseudomallei, Francisella tularensis, Yersinia pestis, and eight genus-related and unrelated control strains was amplified and sequenced. The obtained sequence data were submitted to three public and two commercial sequence databases for species identification. The most frequent reason for incorrect identification was the lack of the respective 16S rRNA gene sequences in the database. Sequence analysis of a 500-bp 16S rDNA fragment allows the rapid identification of dangerous bacterial species. However, for discrimination of closely related species sequencing of the entire 16S rRNA gene, additional sequencing of the 23S rRNA gene or sequencing of the 16S-23S rRNA intergenic spacer is essential. This work provides comprehensive information on the suitability of partial 16S rDNA analysis and diverse databases for rapid and accurate identification of dangerous bacterial pathogens.
Gene Deletion in Barley Mediated by LTR-retrotransposon BARE

PubMed Central

Shang, Yi; Yang, Fei; Schulman, Alan H.; Zhu, Jinghuan; Jia, Yong; Wang, Junmei; Zhang, Xiao-Qi; Jia, Qiaojun; Hua, Wei; Yang, Jianming; Li, Chengdao

2017-01-01

A poly-row branched spike (prbs) barley mutant was obtained from soaking a two-rowed barley inflorescence in a solution of maize genomic DNA. Positional cloning and sequencing demonstrated that the prbs mutant resulted from a 28 kb deletion including the inflorescence architecture gene HvRA2. Sequence annotation revealed that the HvRA2 gene is flanked by two LTR (long terminal repeat) retrotransposons (BARE) sharing 89% sequence identity. A recombination between the integrase (IN) gene regions of the two BARE copies resulted in the formation of an intact BARE and loss of HvRA2. No maize DNA was detected in the recombination region although the flanking sequences of HvRA2 gene showed over 73% of sequence identity with repetitive sequences on 10 maize chromosomes. It is still unknown whether the interaction of retrotransposons between barley and maize has resulted in the recombination observed in the present study. PMID:28252053
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-06-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. Copyright © 2016 Khedkar and Seshasayee.
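Point (c) above, that a symmetric interreplichore translocation leaves a gene's distance from oriC unchanged, follows from the geometry of a circular chromosome and can be checked with a short sketch. It assumes a single circular chromosome with the terminus directly opposite oriC, which real genomes only approximate; the function names and coordinates are hypothetical.

```python
def oric_distance(pos, oric, genome_len):
    """Shortest distance along a circular chromosome from pos to oriC."""
    d = abs(pos - oric) % genome_len
    return min(d, genome_len - d)


def replichore(pos, oric, genome_len):
    """Which half of the chromosome a position lies in, relative to oriC.

    Assumes the terminus sits exactly opposite oriC.
    """
    return "right" if (pos - oric) % genome_len < genome_len / 2 else "left"


def mirror_across_oric(pos, oric, genome_len):
    """Position reached by a symmetric translocation to the other replichore."""
    return (2 * oric - pos) % genome_len


genome_len, oric, gene = 1000, 100, 300
moved = mirror_across_oric(gene, oric, genome_len)
print(moved, oric_distance(moved, oric, genome_len), replichore(moved, oric, genome_len))
```

In the toy example the gene moves from the right to the left replichore while staying 200 units from oriC, which is the property the abstract describes.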
Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

PubMed Central

Khedkar, Supriya; Seshasayee, Aswin Sai Narain

2016-01-01

Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194
A de novo transcriptome and valid reference genes for quantitative real-time PCR in Colaphellus bowringi.

PubMed

Tan, Qian-Qian; Zhu, Li; Li, Yi; Liu, Wen; Ma, Wei-Hua; Lei, Chao-Liang; Wang, Xiao-Ping

2015-01-01

The cabbage beetle Colaphellus bowringi Baly is a serious insect pest of crucifers and undergoes reproductive diapause in soil. An understanding of the molecular mechanisms of diapause regulation, insecticide resistance, and other physiological processes is helpful for developing new management strategies for this beetle. However, the lack of genomic information and valid reference genes limits knowledge on the molecular bases of these physiological processes in this species. Using Illumina sequencing, we obtained more than 57 million sequence reads derived from C. bowringi, which were assembled into 39,390 unique sequences. A Clusters of Orthologous Groups classification was obtained for 9,048 of these sequences, covering 25 categories, and 16,951 were assigned to 255 Kyoto Encyclopedia of Genes and Genomes pathways. Eleven candidate reference gene sequences from the transcriptome were then identified through reverse transcriptase polymerase chain reaction. Among these candidate genes, EF1α, ACT1, and RPL19 proved to be the most stable reference genes for different reverse transcriptase quantitative polymerase chain reaction experiments in C. bowringi. Conversely, aTUB and GAPDH were the least stable reference genes. The abundant putative C. bowringi transcript sequences reported enrich the genomic resources of this beetle. Importantly, the larger number of gene sequences and valid reference genes provide a valuable platform for future gene expression studies, especially with regard to exploring the molecular mechanisms of different physiological processes in this species.
The complete CDS of the prion protein (PRNP) gene of African lion (Panthera leo).

PubMed

Maj, Andrzej; Spellman, Garth M; Sarver, Shane K

2008-04-01

We provide the complete PRNP CDS sequence for the African lion, which is different from the previously published sequence and more similar to other carnivore sequences. The newly obtained prion protein sequence differs from the domestic cat sequence at three amino acid positions and contains only four octapeptide repeats. We recommend that this sequence be used as the reference sequence for future studies of the PRNP gene for this species.
IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing

PubMed Central

Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai

2017-01-01

Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE is widespread, including tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. PMID:27899656
Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing

PubMed Central

Weirather, Jason L.; Afshar, Pegah Tootoonchi; Clark, Tyson A.; Tseng, Elizabeth; Powers, Linda S.; Underwood, Jason G.; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

2015-01-01

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. PMID:26040699
Maturity onset diabetes of youth (MODY) in Turkish children: sequence analysis of 11 causative genes by next generation sequencing.

PubMed

Ağladıoğlu, Sebahat Yılmaz; Aycan, Zehra; Çetinkaya, Semra; Baş, Veysel Nijat; Önder, Aşan; Peltek Kendirci, Havva Nur; Doğan, Haldun; Ceylaner, Serdar

2016-04-01

Maturity-onset diabetes of the youth (MODY) is a genetically and clinically heterogeneous group of diseases and is often misdiagnosed as type 1 or type 2 diabetes. The aim of this study is to investigate both novel and proven mutations of 11 MODY genes in Turkish children by using targeted next generation sequencing. A panel of 11 MODY genes was screened in 43 children with MODY diagnosed by clinical criteria. Index cases were studied with the Illumina MiSeq platform, and family screening and confirmation of mutations were done by Sanger sequencing. We identified 28 (65%) point mutations among 43 patients. Eighteen patients have GCK mutations, four have HNF1A, one has HNF4A, one has HNF1B, two have NEUROD1, one has PDX1 gene variations and one patient has both HNF1A and HNF4A heterozygote mutations. This is the first study including molecular studies of 11 MODY genes in Turkish children. GCK is the most frequent type of MODY in our study population. The very high frequency of novel mutations (42%) in our study population supports the view that, in heterogeneous disorders like MODY, sequence analysis provides rapid, cost-effective and accurate genetic diagnosis.

Dynamics of actin evolution in dinoflagellates.

PubMed

Kim, Sunju; Bachvaroff, Tsvetan R; Handy, Sara M; Delwiche, Charles F

2011-04-01

Dinoflagellates have unique nuclei and intriguing genome characteristics with very high DNA content making complete genome sequencing difficult. In dinoflagellates, many genes are found in multicopy gene families, but the processes involved in the establishment and maintenance of these gene families are poorly understood. Understanding the dynamics of gene family evolution in dinoflagellates requires comparisons at different evolutionary scales. Studies of closely related species provide fine-scale information relative to species divergence, whereas comparisons of more distantly related species provide broad context. We selected the actin gene family as a highly expressed conserved gene previously studied in dinoflagellates. Of the 142 sequences determined in this study, 103 were from the two closely related species, Dinophysis acuminata and D. caudata, including full length and partial cDNA sequences as well as partial genomic amplicons. For these two Dinophysis species, at least three types of sequences could be identified. Most copies (79%) were relatively similar and in nucleotide trees, the sequences formed two bushy clades corresponding to the two species. In comparisons within species, only eight to ten nucleotide differences were found between these copies. The two remaining types formed clades containing sequences from both species. One type included the most similar sequences in between-species comparisons with as few as 12 nucleotide differences between species. The second type included the most divergent sequences in comparisons between and within species with up to 93 nucleotide differences between sequences. In all the sequences, most variation occurred in synonymous sites or the 5' UnTranslated Region (UTR), although there was still limited amino acid variation between most sequences. Several potential pseudogenes were found (approximately 10% of all sequences depending on species) with incomplete open reading frames due to frameshifts or early stop codons. Overall, variation in the actin gene family fits best with the "birth and death" model of evolution based on recent duplications, pseudogenes, and incomplete lineage sorting. Divergence between species was similar to variation within species, so that actin may be too conserved to be useful for phylogenetic estimation of closely related species.
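The pseudogene criterion mentioned here (an incomplete open reading frame caused by a frameshift or an early stop codon) can be sketched as a simple check on a full-length coding sequence. This is a minimal illustration, not the authors' procedure: the function name and the returned labels are hypothetical, the standard genetic code's stop codons are assumed, and the input is assumed to start at the first codon position.

```python
# Stop codons of the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_problems(cds):
    """Return a list of labels describing why a CDS looks like a pseudogene.

    An empty list means no problem was detected by these two simple checks.
    """
    seq = cds.upper()
    problems = []

    # A length that is not a multiple of three suggests an indel-induced frameshift.
    if len(seq) % 3 != 0:
        problems.append("frameshift")

    # Split into complete in-frame codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # A stop codon anywhere before the last complete codon is premature.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature_stop")

    return problems
```

Partial cDNA sequences or genomic amplicons, which the study also included, would first need their reading frame established (for example by alignment to a full-length copy) before a check like this is meaningful.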
Application of industrial scale genomics to discovery of therapeutic targets in heart failure.

PubMed

Mehraban, F; Tomlinson, J E

2001-12-01

In recent years intense activity in both academic and industrial sectors has provided a wealth of information on the human genome with an associated impressive increase in the number of novel gene sequences deposited in sequence data repositories and patent applications. This genomic industrial revolution has transformed the way in which drug target discovery is now approached. In this article we discuss how various differential gene expression (DGE) technologies are being utilized for cardiovascular disease (CVD) drug target discovery. Other approaches such as sequencing cDNA from cardiovascular derived tissues and cells coupled with bioinformatic sequence analysis are used with the aim of identifying novel gene sequences that may be exploited towards target discovery. Additional leverage from gene sequence information is obtained through identification of polymorphisms that may confer disease susceptibility and/or affect drug responsiveness. Pharmacogenomic studies are described wherein gene expression-based techniques are used to evaluate drug response and/or efficacy. Industrial-scale genomics supports and addresses not only novel target gene discovery but also the burgeoning issues in pharmaceutical and clinical cardiovascular medicine relative to polymorphic gene responses.
Mumps virus F gene and HN gene sequencing as a molecular tool to study mumps virus transmission.

PubMed

Gouma, Sigrid; Cremer, Jeroen; Parkkali, Saara; Veldhuijzen, Irene; van Binnendijk, Rob S; Koopmans, Marion P G

2016-11-01

Various mumps outbreaks have occurred in the Netherlands since 2004, particularly among persons who had received 2 doses of measles, mumps, and rubella (MMR) vaccination. Genomic typing of pathogens can be used to track outbreaks, but the established genotyping of mumps virus based on the small hydrophobic (SH) gene sequences did not provide sufficient resolution. Therefore, we expanded the sequencing to include fusion (F) gene and haemagglutinin-neuraminidase (HN) gene sequences in addition to the SH gene sequences from 109 mumps virus genotype G strains obtained between 2004 and mid 2015 in the Netherlands. When the molecular information from these 3 genes was combined, we were able to identify separate mumps virus clusters and track mumps virus transmission. The analyses suggested that multiple mumps virus introductions occurred in the Netherlands between 2004 and 2015 resulting in several mumps outbreaks throughout this period, whereas during some local outbreaks the molecular data pointed towards endemic circulation. Combined analysis of epidemiological data and sequence data collected in 2015 showed good support for the phylogenetic clustering. Copyright © 2016 Elsevier B.V. All rights reserved.
[Sequencing technology in gene diagnosis and its application].

PubMed

Yibin, Guo

2014-11-01

The study of gene mutation is one of the hot topics in the life sciences, and the related detection methods and diagnostic technologies have developed rapidly. Sequencing technology plays an indispensable role in the definitive diagnosis and classification of genetic diseases. In this review, we summarize the research progress in sequencing technology, evaluate the advantages and disadvantages of first- to third-generation sequencing technologies, and describe their application in gene diagnosis. We also discuss the outlook for their future development.
Study of cnidarian-algal symbiosis in the "omics" age.

PubMed

Meyer, Eli; Weis, Virginia M

2012-08-01

The symbiotic associations between cnidarians and dinoflagellate algae (Symbiodinium) support productive and diverse ecosystems in coral reefs. Many aspects of this association, including the mechanistic basis of host-symbiont recognition and metabolic interaction, remain poorly understood. The first completed genome sequence for a symbiotic anthozoan is now available (the coral Acropora digitifera), and extensive expressed sequence tag resources are available for a variety of other symbiotic corals and anemones. These resources make it possible to profile gene expression, protein abundance, and protein localization associated with the symbiotic state. Here we review the history of "omics" studies of cnidarian-algal symbiosis and the current availability of sequence resources for corals and anemones, identifying genes putatively involved in symbiosis across 10 anthozoan species. The public availability of candidate symbiosis-associated genes leaves the field of cnidarian-algal symbiosis poised for in-depth comparative studies of sequence diversity and gene expression and for targeted functional studies of genes associated with symbiosis. Reviewing the progress to date suggests directions for future investigations of cnidarian-algal symbiosis that include (i) sequencing of Symbiodinium, (ii) proteomic analysis of the symbiosome membrane complex, (iii) glycomic analysis of Symbiodinium cell surfaces, and (iv) expression profiling of the gastrodermal cells hosting Symbiodinium.
Study on the Evolution of Genes Mutation Related With Gastrointestinal Stromal Tumors

ClinicalTrials.gov

2012-01-05

Full gene sequences of c-KIT, PDGFRA and DOG1 are analyzed with the screening-sequencing approach; the study investigates the characteristics and variations associated with the different gene mutations of c-KIT, PDGFRA and DOG1 in GIST patients.
The complete mitochondrial genome sequence of the maned wolf (Chrysocyon brachyurus).

PubMed

Zhao, Chao; Yang, Xiufeng; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai

2016-01-01

In this study, the complete mitochondrial genome of the maned wolf (Chrysocyon brachyurus), the only species in the genus Chrysocyon, was sequenced and reported for the first time using blood samples obtained from a female individual in Shanghai Zoo, China. Sequence analysis showed that the genome structure was in accordance with that of other Canidae species and that it contained a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region.
Phylogenetic Analysis of Ruminant Theileria spp. from China Based on 28S Ribosomal RNA Gene

PubMed Central

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze

2013-01-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the ribosomal large-subunit RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode. PMID:24327775
Phylogenetic analysis of ruminant Theileria spp. from China based on 28S ribosomal RNA gene.

PubMed

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze; Yin, Hong; Luo, Jianxun

2013-10-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the ribosomal large-subunit RNA genes (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode.
Using variable rate models to identify genes under selection in sequence pairs: their validity and limitations for EST sequences.

PubMed

Church, Sheri A; Livingstone, Kevin; Lai, Zhao; Kozik, Alexander; Knapp, Steven J; Michelmore, Richard W; Rieseberg, Loren H

2007-02-01

Using likelihood-based variable selection models, we determined if positive selection was acting on 523 EST sequence pairs from two lineages of sunflower and lettuce. Variable rate models are generally not used for comparisons of sequence pairs due to the limited information and the inaccuracy of estimates of specific substitution rates. However, previous studies have shown that the likelihood ratio test (LRT) is reliable for detecting positive selection, even with low numbers of sequences. These analyses identified 56 genes that show a signature of selection, of which 75% were not identified by simpler models that average selection across codons. Subsequent mapping studies in sunflower show four of five of the positively selected genes identified by these methods mapped to domestication QTLs. We discuss the validity and limitations of using variable rate models for comparisons of sequence pairs, as well as the limitations of using ESTs for identification of positively selected genes.
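The likelihood ratio test referred to above compares the maximised log-likelihoods of a null model and a nested alternative that allows positive selection. The sketch below shows only the arithmetic of the test. It assumes, hypothetically, that the two models differ by two free parameters, so that the statistic is compared against a chi-square distribution with 2 degrees of freedom; the abstract does not state which models or degrees of freedom the authors used. The function name is likewise an assumption.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns (statistic, p_value). For 2 degrees of freedom the chi-square
    survival function has the closed form exp(-x / 2), so no statistics
    library is needed. The choice of df = 2 is an assumption of this sketch.
    """
    # Twice the log-likelihood gain of the alternative over the null model.
    stat = 2.0 * (lnl_alt - lnl_null)
    # Numerical optimisation can leave the nested model marginally ahead;
    # clamp at zero so the p-value stays in [0, 1].
    stat = max(stat, 0.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

For example, log-likelihoods of -1000.0 (null) and -995.0 (alternative) give a statistic of 10.0 and a p-value of exp(-5), roughly 0.0067.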
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

PubMed

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-03-01

Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.
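Nucleotide identity figures such as the 81% to 98% range reported here are computed from pairwise alignments. The sketch below shows one simple way to do that, under the assumption that the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator. Alignment tools differ in how they treat gaps, and the abstract does not say which convention was used, so the function name and this convention are illustrative only.

```python
def percent_identity(aln_a, aln_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are skipped (an assumed convention).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Applied to two aligned 10-base fragments that differ at a single position, the function returns 90.0.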
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

PubMed Central

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-01-01

Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199
Mutational Analysis of Extranodal NK/T-Cell Lymphoma Using Targeted Sequencing with a Comprehensive Cancer Panel.

PubMed

Choi, Seungkyu; Go, Jai Hyang; Kim, Eun Kyung; Lee, Hojung; Lee, Won Mi; Cho, Chun-Sung; Han, Kyudong

2016-09-01

Extranodal natural killer (NK)/T-cell lymphoma, nasal type (NKTCL), is a malignant disorder of cytotoxic lymphocytes of NK or T cells. It is an aggressive neoplasm with a very poor prognosis. Although extranodal NKTCL reportedly has a strong association with Epstein-Barr virus, the molecular pathogenesis of NKTCL has been unexplored. The recent technological advancements in next-generation sequencing (NGS) have made DNA sequencing cost- and time-effective, with more reliable results. Using the Ion Proton Comprehensive Cancer Panel, we sequenced 409 cancer-related genes to identify somatic mutations in five NKTCL tissue samples. The sequencing analysis detected 25 mutations in 21 genes. Among them, KMT2D, a histone modification-related gene, was the most frequently mutated gene (four of the five cases). This result was consistent with recent NGS studies that have suggested KMT2D as a novel driver gene in NKTCL. Mutations were also found in ARID1A, a chromatin remodeling gene, and TP53, which also recurred in recent NGS studies. We also found mutations in 18 novel candidate genes, with molecular functions that were potentially implicated in cancer development. We suggest that these genes may result in multiple oncogenic events and may be used as potential bio-markers of NKTCL in the future.
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity

NASA Astrophysics Data System (ADS)

Mukherjee, Shashi Bajaj; Sen, Pradip Kumar

2010-10-01

Studying periodic patterns would be expected to be a standard line of attack for recognizing DNA sequence features in gene identification and similar problems, yet surprisingly little significant work has been done in this direction. This paper studies statistical properties of the DNA sequences of complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and a standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
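The idea of mapping DNA to numbers and inspecting its Fourier spectrum can be illustrated with the well-known period-3 signal of protein-coding regions. The sketch below is not the authors' technique: it assumes one common mapping (a binary indicator sequence per base) and evaluates the discrete Fourier transform only at frequency 1/3. The abstract does not say which mappings or periodicity parameters were used, and the function name and the normalisation by sequence length are choices made for this example.

```python
import cmath


def period3_power(seq):
    """Spectral power at frequency 1/3, summed over the four base indicators.

    Each base b defines a 0/1 indicator sequence u_b[n]; the function returns
    sum_b |sum_n u_b[n] * exp(-2*pi*i*n/3)|^2 divided by the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0

    # exp(-2*pi*i*k/3) takes only three distinct values, indexed by k mod 3.
    phase = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]

    total = 0.0
    for base in "ACGT":
        # DFT coefficient of this base's indicator sequence at frequency 1/3.
        coeff = sum(phase[i % 3] for i, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total / n
```

A sequence with strict three-base periodicity such as "ATG" repeated ten times scores high, while a homopolymer of the same length scores approximately zero. In practice such a score would be computed in sliding windows and compared between annotated coding and non-coding regions.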
Transcriptome sequencing and de novo analysis of the copepod Calanus sinicus using 454 GS FLX.

PubMed

Ning, Juan; Wang, Minxiao; Li, Chaolun; Sun, Song

2013-01-01

Despite their species abundance and primary economic importance, genomic information about copepods is still limited. In particular, genomic resources are lacking for the copepod Calanus sinicus, which is a dominant species in the coastal waters of East Asia. In this study, we performed de novo transcriptome sequencing to produce a large number of expressed sequence tags for the copepod C. sinicus. Copepodid larvae and adults were used as the basic material for transcriptome sequencing. Using 454 pyrosequencing, a total of 1,470,799 reads were obtained, which were assembled into 56,809 high quality expressed sequence tags. Based on their sequence similarity to known proteins, about 14,000 different genes were identified, including members of all major conserved signaling pathways. Transcripts that were putatively involved with growth, lipid metabolism, molting, and diapause were also identified among these genes. Differentially expressed genes related to several processes were found in C. sinicus copepodid larvae and adults. We detected 284,154 single nucleotide polymorphisms (SNPs) that provide a resource for gene function studies. Our data provide the most comprehensive transcriptome resource available for C. sinicus. This resource allowed us to identify genes associated with primary physiological processes and SNPs in coding regions, which facilitated the quantitative analysis of differential gene expression. These data should provide a foundation for future genetic and genomic studies of this and related species.
Allexiviruses may have acquired inserted sequences between the CP and CRP genes to change the translation reinitiation strategy of CRP.

PubMed

Yoshida, Naoto; Shimura, Hanako; Masuta, Chikara

2018-06-01

Allexiviruses are economically important garlic viruses that are involved in garlic mosaic diseases. In this study, we characterized the allexivirus cysteine-rich protein (CRP) gene located just downstream of the coat protein (CP) gene in the viral genome. We determined the nucleotide sequences of the CP and CRP genes from numerous allexivirus isolates and performed a phylogenetic analysis. According to the resulting phylogenetic tree, we found that allexiviruses were clearly divided into two major groups (group I and group II) based on the sequences of the CP and CRP genes. In addition, the allexiviruses in group II had distinct sequences just before the CRP gene, while group I isolates did not. The inserted sequence between the CP and CRP genes was partially complementary to garlic 18S rRNA. Using a potato virus X vector, we showed that the CRPs affected viral accumulation and symptom induction in Nicotiana benthamiana, suggesting that the allexivirus CRP is a pathogenicity determinant. We assume that the inserted sequences before the CRP gene may have been generated during viral evolution to alter the termination-reinitiation mechanism for coupled translation of CP and CRP.
The utility of DNA sequences of an intron from the beta-fibrinogen gene in phylogenetic analysis of woodpeckers (Aves: Picidae).

PubMed

Prychitko, T M; Moore, W S

1997-10-01

Estimating phylogenies from DNA sequence data has become the major methodology of molecular phylogenetics. To date, molecular phylogenetics of the vertebrates has been very dependent on mtDNA, but studies involving mtDNA are limited because the several genes comprising the mt-genome are inherited as a single linkage group. The only apparent solution to this problem is to sequence additional genes, each representing a distinct linkage group, so that the resultant gene trees provide independent estimates of the species tree. There exists the need to find novel gene sequences which contain enough phylogenetic information to resolve relationships between closely related species. A possible source is the nuclear-encoded introns, because they evolve more rapidly than exons. We designed primers to amplify and sequence intron 7 of the beta-fibrinogen gene for a recently evolved group, the woodpeckers. We sequenced the entire intron for 10 specimens representing five species. Nucleotide substitutions are randomly distributed along the length of the intron, suggesting selective neutrality. A preliminary analysis indicates that the phylogenetic signal in the intron is as strong as that in the mitochondrial encoded cytochrome b (cyt b) gene. The topology of the beta-fibrinogen tree is identical to that of the cyt b tree. This analysis demonstrates the ability of intron 7 of beta-fibrinogen to provide well resolved, independent gene trees for recently evolved groups and establishes it as a source of sequences to be used in other phylogenetic studies. Copyright 1997 Academic Press
SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

NASA Astrophysics Data System (ADS)

Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

2015-09-01

The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, affecting the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is the starting gene of the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to identify the presence of the SxtA gene in Alexandrium minutum.
pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

PubMed

Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

2013-08-01

With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870

Nonencapsulated or nontypeable Haemophilus influenzae are more likely than their encapsulated or serotypeable counterparts to have mutations in their fucose operon.

PubMed

Shuel, Michelle L; Karlowsky, Kathleen E; Law, Dennis K S; Tsang, Raymond S W

2011-12-01

Population biology of Haemophilus influenzae can be studied by multilocus sequence typing (MLST), and isolates are assigned sequence types (STs) based on nucleotide sequence variations in seven housekeeping genes, including fucK. However, the ST cannot be assigned if one of the housekeeping genes is absent or cannot be detected by the current protocol. Occasionally, strains of H. influenzae have been reported to lack the fucK gene. In this study, we examined the prevalence of this mutation among our collection of H. influenzae isolates. Of the 704 isolates studied, including 282 encapsulated and 422 nonencapsulated isolates, nine were not typeable by MLST owing to failure to detect the fucK gene. All nine fucK-negative isolates were nonencapsulated and belonged to various biotypes. DNA sequencing of the fucose operon region confirmed complete deletion of genes in the operon in seven of the nine isolates, while in the remaining two isolates, some of the genes were found intact or in parts. The significance of these findings is discussed.
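The typing rule described in the abstract above (a sequence type is defined by the allele numbers at seven housekeeping loci, and cannot be assigned when one locus is undetected) can be illustrated with a minimal Python sketch. Everything except the name fucK is an assumption made for illustration: the other locus names are placeholders, and the profile table and `assign_st` function are hypothetical, not part of any real MLST database or tool.

```python
from typing import Dict, Optional, Tuple

# Only fucK is named in the abstract; the other six names are placeholders
# standing in for the remaining housekeeping genes of the scheme.
LOCI = ("locus1", "locus2", "locus3", "fucK", "locus5", "locus6", "locus7")

# Hypothetical profile table: tuple of allele numbers -> sequence type (ST).
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 2, 1, 3, 1, 1, 2): 2,
}


def assign_st(alleles: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the ST for an isolate, or None if it cannot be typed.

    An isolate is untypeable when any locus is missing (for example a
    deleted fucK gene) or when its allele profile is not in the table.
    """
    profile = []
    for locus in LOCI:
        allele = alleles.get(locus)
        if allele is None:
            return None  # locus absent or not detected: no ST possible
        profile.append(allele)
    return PROFILES.get(tuple(profile))


typed = assign_st({locus: 1 for locus in LOCI})
fuck_negative = assign_st({**{locus: 1 for locus in LOCI}, "fucK": None})
```

In this sketch a fucK-negative isolate yields `None`, mirroring the nine isolates that the study could not type by MLST.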
Leuconostoc pseudomesenteroides WCFur3 partial 16S rRNA gene

USDA-ARS's Scientific Manuscript database

This study used a partial 535 base pair 16S rRNA gene sequence to identify a bacterial isolate. Fatty acid profiles are consistent with the 16S rRNA gene sequence identification of this bacterium. The isolate was obtained from a compost bin in Fort Collins, Colorado, USA. The 16S rRNA gene sequen...
Mitochondrial gene sequences alone or combined with ITS region sequences provide firm molecular criteria for the classification of Lecanicillium species.

PubMed

Kouvelis, Vassili N; Sialakouma, Aphrodite; Typas, Milton A

2008-07-01

The recent revision of Verticillium sect. Prostrata led to the introduction of the genus Lecanicillium, which comprises the majority of the entomopathogenic strains. Sixty-five strains previously classified as Verticillium lecanii or Verticillium sp. from different geographical regions and hosts were examined and their phylogenetic relationships were determined using sequences from three mitochondrial (mt) genes [the small rRNA subunit (rns), the NADH dehydrogenase subunits 1 (nad1) and 3 (nad3)] and the ITS region. In general, single gene phylogenetic trees differentiated and placed the strains examined in well-supported (by BS analysis) groups of L. lecanii, L. longisporum, L. muscarium, and L. nodulosum, although in some cases a few uncertainties still remained. nad1 was the most informative single gene in phylogenetic analyses and was also found to contain group I introns with putative open reading frames (ORFs) encoding GIY-YIG endonucleases. The combined use of mt gene sequences resolved taxonomic uncertainties arising from ITS analysis and, alone or in combination with ITS sequences, helped in placing uncharacterised Verticillium lecanii and Verticillium sp. firmly into Lecanicillium species. Results from the combined data of all the mt genes and from all the mt genes together with the ITS region were very similar. Furthermore, a relaxed correlation with host specificity -- at least for Homoptera -- was indicated for the rns and the combined mt gene sequences. Thus, the usefulness of mt gene sequences as a convenient molecular tool in phylogenetic studies of entomopathogenic fungi was demonstrated.
Harnessing Whole Genome Sequencing in Medical Mycology.

PubMed

Cuomo, Christina A

2017-01-01

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
Molecular basis of length polymorphism in the human zeta-globin gene complex.

PubMed Central

Goodbourn, S E; Higgs, D R; Clegg, J B; Weatherall, D J

1983-01-01

The length polymorphism between the human zeta-globin gene and its pseudogene is caused by an allele-specific variation in the copy number of a tandemly repeating 36-base-pair sequence. This sequence is related to a tandemly repeated 14-base-pair sequence in the 5' flanking region of the human insulin gene, which is known to cause length polymorphism, and to a repetitive sequence in intervening sequence (IVS) 1 of the pseudo-zeta-globin gene. Evidence is presented that the latter is also of variable length, probably because of differences in the copy number of the tandem repeat. The homology between the three length polymorphisms may be an indication of the presence of a more widespread group of related sequences in the human genome, which might be useful for generalized linkage studies. PMID:6308667
The sequence and organization of complete mitochondrial genome of the yellowfin tuna, Thunnus albacares (Bonnaterre, 1788).

PubMed

Pang, Jiaohui; Cheng, Qiqun; Sun, Dandan; Zhang, Heng; Jin, Shaofei

2016-09-01

Yellowfin tuna (Thunnus albacares) is one of the most economically important fishes in the world. In the present study, we determined the complete mitochondrial DNA sequence and organization of T. albacares. The entire mitochondrial genome is a circular molecule of 16,528 bp in length, which encodes 37 genes in all. These genes comprise 13 protein-coding genes (ATP6 and 8, COI-III, Cytb, ND1-6 and ND4L), 22 transfer RNA genes (tRNAs), and 2 ribosomal RNA genes (12S and 16S rRNAs). The complete mitochondrial genome sequence of T. albacares can provide basic information for studies on the molecular taxonomy and conservation genetics of teleost fishes.
Polymorphism and selection in the major histocompatibility complex DRA and DQA genes in the family Equidae.

PubMed

Janova, Eva; Matiasovic, Jan; Vahala, Jiri; Vodicka, Roman; Van Dyk, Enette; Horin, Petr

2009-07-01

The major histocompatibility complex genes coding for antigen binding and presenting molecules are the most polymorphic genes in the vertebrate genome. We studied the DRA and DQA gene polymorphism of the family Equidae. In addition to 11 previously reported DRA and 24 DQA alleles, six new DRA sequences and 13 new DQA alleles were identified in the genus Equus. Phylogenetic analysis of both DRA and DQA sequences provided evidence for trans-species polymorphism in the family Equidae. The phylogenetic trees differed from species relationships defined by standard taxonomy of Equidae and from trees based on mitochondrial or neutral gene sequence data. Analysis of selection showed differences between the less variable DRA and more variable DQA genes. DRA alleles were more often shared by more species. The DQA sequences analysed showed strong amongst-species positive selection; the selected amino acid positions mostly corresponded to selected positions in rodent and human DQA genes.
Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

PubMed Central

Parker, Jennifer K.; Havird, Justin C.

2012-01-01

Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops. PMID:22194287
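As a rough illustration of the dN/dS ratio used in the abstract above, the following Python sketch computes it from pre-counted differences and sites, applying the Jukes-Cantor correction to each proportion. This is an assumption about one common way to estimate the ratio: the abstract does not say which estimator or software the authors used, and the function names and example counts here are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS for one pairwise comparison.

    dN is the corrected number of nonsynonymous substitutions per
    nonsynonymous site; dS is the synonymous equivalent.
    """
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0.0:
        raise ValueError("dS is zero, so the ratio is undefined")
    return dn / ds


# Made-up counts for a highly similar pair of sequences.
ratio = dn_ds(nonsyn_diffs=2, nonsyn_sites=700, syn_diffs=6, syn_sites=300)
```

With these invented counts the ratio is well below 1, the situation the abstract reports for all genes; the study's point is that the environmentally mediated genes sit closer to 1 than the housekeeping genes do.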
Differentiation of Xylella fastidiosa strains via multilocus sequence analysis of environmentally mediated genes (MLSA-E).

PubMed

Parker, Jennifer K; Havird, Justin C; De La Fuente, Leonardo

2012-03-01

Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing improved disease management in economically important crops.
The LAM-PCR Method to Sequence LV Integration Sites.

PubMed

Wang, Wei; Bartholomae, Cynthia C; Gabriel, Richard; Deichmann, Annette; Schmidt, Manfred

2016-01-01

Integrating viral gene transfer vectors are commonly used gene delivery tools in clinical gene therapy trials, providing stable integration and continuous expression of the transgene in the treated host cell. However, integration of the reverse-transcribed vector DNA into the host genome is a potentially mutagenic event that may directly contribute to unwanted side effects. A comprehensive and accurate analysis of the integration site (IS) repertoire is indispensable to study clonality in transduced cells obtained from patients undergoing gene therapy and to identify potential in vivo selection of affected cell clones. To date, next-generation sequencing (NGS) of vector-genome junctions allows sophisticated studies on the integration repertoire in vitro and in vivo. We have explored the use of the Illumina MiSeq Personal Sequencer platform to sequence vector ISs amplified by non-restrictive linear amplification-mediated PCR (nrLAM-PCR) and LAM-PCR. MiSeq-based high-quality IS sequence retrieval is accomplished by the introduction of a double-barcode strategy that substantially reduces the frequency of IS sequence collisions compared to the conventionally used single-barcode protocol. Here, we present an updated protocol of (nr)LAM-PCR for the analysis of lentiviral IS using a double-barcode system followed by deep sequencing on the MiSeq device.
Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

NASA Astrophysics Data System (ADS)

Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

2011-10-01

The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and biochemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient, with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.
Exome Sequencing of 18 Chinese Families with Congenital Cataracts: A New Sight of the NHS Gene

PubMed Central

Sun, Wenmin; Xiao, Xueshan; Li, Shiqiang; Guo, Xiangming; Zhang, Qingjiong

2014-01-01

Purpose The aim of this study was to investigate the mutation spectrum and frequency of 34 known genes in 18 Chinese families with congenital cataracts. Methods Genomic DNA and clinical data were collected from 18 families with congenital cataracts. Variations in 34 cataract-associated genes were screened by whole exome sequencing and then validated by Sanger sequencing. Results Eleven candidate variants in seven of the 34 genes were detected by exome sequencing and then confirmed by Sanger sequencing, including two variants predicted to be benign and nine pathogenic mutations. The nine mutations were present in 9 of the 18 (50%) families with congenital cataracts. In the four families with mutations in the X-linked NHS gene, no abnormalities other than cataract were recorded, and a pseudo-dominant inheritance form was suggested, as female carriers also had different forms of cataracts. Conclusion This study expands the mutation spectrum and frequency of genes responsible for congenital cataract. Mutation in NHS is a common cause of nonsyndromic congenital cataract with pseudo-autosomal dominant inheritance. Combined with our previous studies, a genetic basis could be identified in 67.6% of families with congenital cataracts in our case series, in which mutations in genes encoding crystallins, genes encoding connexins, and NHS are responsible for 29.4%, 14.7%, and 11.8% of families, respectively. Our results suggest that mutations in NHS are a common cause of congenital cataract, both syndromic and nonsyndromic. PMID:24968223
Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Onda, M.; Kudo, S.; Fukuda, M.

Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.
Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II

PubMed Central

Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter

2017-01-01

The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Long-PCR based next generation sequencing of the whole mitochondrial genome of the peacock skate Pavoraja nitida (Elasmobranchii: Arhynchobatidae).

PubMed

Yang, Lei; Naylor, Gavin J P

2016-01-01

We determined the complete mitochondrial genome sequence (16,760 bp) of the peacock skate Pavoraja nitida using a long-PCR based next generation sequencing method. It has 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region in the typical vertebrate arrangement. Primers, protocols, and procedures used to obtain this mitogenome are provided. We anticipate that this approach will facilitate rapid collection of mitogenome sequences for studies on phylogenetic relationships, population genetics, and conservation of cartilaginous fishes.
De Novo Assembly and Characterization of Four Anthozoan (Phylum Cnidaria) Transcriptomes.

PubMed

Kitchen, Sheila A; Crowder, Camerron M; Poole, Angela Z; Weis, Virginia M; Meyer, Eli

2015-09-17

Many nonmodel species exemplify important biological questions but lack the sequence resources required to study the genes and genomic regions underlying traits of interest. Reef-building corals are famously sensitive to rising seawater temperatures, motivating ongoing research into their stress responses and long-term prospects in a changing climate. A comprehensive understanding of these processes will require extending beyond the sequenced coral genome (Acropora digitifera) to encompass diverse coral species and related anthozoans. Toward that end, we have assembled and annotated reference transcriptomes to develop catalogs of gene sequences for three scleractinian corals (Fungia scutaria, Montastraea cavernosa, Seriatopora hystrix) and a temperate anemone (Anthopleura elegantissima). High-throughput sequencing of cDNA libraries produced ~20-30 million reads per sample, and de novo assembly of these reads produced ~75,000-110,000 transcripts from each sample with size distributions (mean ~1.4 kb, N50 ~2 kb) comparable to the distribution of gene models from the coral genome (mean ~1.7 kb, N50 ~2.2 kb). Each assembly includes matches for more than half the gene models from A. digitifera (54-67%) and many reasonably complete transcripts (~5300-6700) spanning nearly the entire gene (ortholog hit ratios ≥0.75). The catalogs of gene sequences developed in this study made it possible to identify hundreds to thousands of orthologs across diverse scleractinian species and related taxa. We used these sequences for phylogenetic inference, recovering known relationships and demonstrating superior performance over phylogenetic trees constructed using single mitochondrial loci. The resources developed in this study provide gene sequences and genetic markers for several anthozoan species. To enhance the utility of these resources for the research community, we developed searchable databases enabling researchers to rapidly recover sequences for genes of interest. Our analysis of de novo assembly quality highlights metrics that we expect will be useful for evaluating the relative quality of other de novo transcriptome assemblies. The identification of orthologous sequences and phylogenetic reconstruction demonstrates the feasibility of these methods for clarifying the substantial uncertainties in the existing scleractinian phylogeny. Copyright © 2015 Kitchen et al.
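The N50 statistic quoted in the abstract above is the length of the transcript (or contig) at which the running total of lengths, taken from longest to shortest, first reaches half of the whole assembly. A minimal Python sketch follows; the function name `n50` and the example lengths are illustrative assumptions, not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    of the sequence at which the cumulative total first reaches at least
    half of the summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Made-up transcript lengths in base pairs.
example = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because N50 is weighted toward long sequences, it is usually larger than the mean length, which matches the pattern in the abstract (mean ~1.4 kb against N50 ~2 kb).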
Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca).

PubMed

Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun

2010-01-01

A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein-coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
Gene identification and analysis of transcripts differentially regulated in fracture healing by EST sequencing in the domestic sheep.

PubMed

Hecht, Jochen; Kuhl, Heiner; Haas, Stefan A; Bauer, Sebastian; Poustka, Albert J; Lienau, Jasmin; Schell, Hanna; Stiege, Asita C; Seitz, Volkhard; Reinhardt, Richard; Duda, Georg N; Mundlos, Stefan; Robinson, Peter N

2006-07-05

The sheep is an important model animal for testing novel fracture treatments and other medical applications. Despite these medical uses and the well known economic and cultural importance of the sheep, relatively little research has been performed into sheep genetics, and DNA sequences are available for only a small number of sheep genes. In this work we have sequenced over 47 thousand expressed sequence tags (ESTs) from libraries developed from healing bone in a sheep model of fracture healing. These ESTs were clustered with the previously available 10 thousand sheep ESTs to a total of 19087 contigs with an average length of 603 nucleotides. We used the newly identified sequences to develop RT-PCR assays for 78 sheep genes and measured differential expression during the course of fracture healing between days 7 and 42 postfracture. All genes showed significant shifts at one or more time points. 23 of the genes were differentially expressed between postfracture days 7 and 10, which could reflect an important role for these genes for the initiation of osteogenesis. The sequences we have identified in this work are a valuable resource for future studies on musculoskeletal healing and regeneration using sheep and represent an important head-start for genomic sequencing projects for Ovis aries, with partial or complete sequences being made available for over 5,800 previously unsequenced sheep genes.
Targeted Re-Sequencing Emulsion PCR Panel for Myopathies: Results in 94 Cases.

PubMed

Punetha, Jaya; Kesari, Akanchha; Uapinyoying, Prech; Giri, Mamta; Clarke, Nigel F; Waddell, Leigh B; North, Kathryn N; Ghaoui, Roula; O'Grady, Gina L; Oates, Emily C; Sandaradura, Sarah A; Bönnemann, Carsten G; Donkervoort, Sandra; Plotz, Paul H; Smith, Edward C; Tesi-Rocha, Carolina; Bertorini, Tulio E; Tarnopolsky, Mark A; Reitter, Bernd; Hausmanowa-Petrusewicz, Irena; Hoffman, Eric P

2016-05-27

Molecular diagnostics in the genetic myopathies often requires testing of the largest and most complex transcript units in the human genome (DMD, TTN, NEB). Iteratively targeting single genes for sequencing has traditionally entailed high costs and long turnaround times. Exome sequencing has begun to supplant single targeted genes, but there are concerns regarding coverage and needed depth of the very large and complex genes that frequently cause myopathies. Our objective was to evaluate the efficiency of next-generation sequencing technologies in providing molecular diagnostics for patients with previously undiagnosed myopathies. We tested a targeted re-sequencing approach, using a 45 gene emulsion PCR myopathy panel, with subsequent sequencing on the Illumina platform in 94 undiagnosed patients. We compared the targeted re-sequencing approach to exome sequencing for 10 of these patients. We detected likely pathogenic mutations in 33 out of 94 patients, a molecular diagnostic rate of approximately 35%. The remaining patients showed variants of unknown significance (35/94 patients) or no mutations detected in the 45 genes tested (26/94 patients). Mutation detection rates were similar for targeted re-sequencing and whole exome sequencing; however, exome sequencing showed better distribution of reads and fewer exon dropouts. Given that costs of highly parallel re-sequencing and whole exome sequencing are similar, and that exome sequencing now takes considerably less laboratory processing time than targeted re-sequencing, we recommend exome sequencing as the standard approach for molecular diagnostics of myopathies.
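The outcome counts quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only restates the numbers from the abstract (33, 35 and 26 of 94 patients); the variable and function names are illustrative and not taken from the study.

```python
# Outcome counts as reported in the abstract (94 previously undiagnosed patients).
total_patients = 94
outcomes = {
    "likely pathogenic mutation": 33,
    "variant of unknown significance": 35,
    "no mutation detected": 26,
}

# The three outcome groups should account for every patient.
assert sum(outcomes.values()) == total_patients


def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


rates = {label: percent(n, total_patients) for label, n in outcomes.items()}
for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

The first rate is the "approximately 35%" molecular diagnostic rate given in the abstract.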
New FeFe-hydrogenase genes identified in a metagenomic fosmid library from a municipal wastewater treatment plant as revealed by high-throughput sequencing.

PubMed

Tomazetto, Geizecler; Wibberg, Daniel; Schlüter, Andreas; Oliveira, Valéria M

2015-01-01

A fosmid metagenomic library was constructed with total community DNA obtained from a municipal wastewater treatment plant (MWWTP), with the aim of identifying new FeFe-hydrogenase genes encoding the enzymes most important for hydrogen metabolism. The dataset generated by pyrosequencing of the fosmid library was mined to identify environmental gene tags (EGTs) assigned to FeFe-hydrogenase. The majority of EGTs representing FeFe-hydrogenase genes were affiliated with the class Clostridia, suggesting that this group is the main hydrogen producer in the MWWTP analyzed. From the assembled sequences, three FeFe-hydrogenase genes were predicted based on detection of the L2 motif (MPCxxKxxE) in the encoded gene product, confirming true FeFe-hydrogenase sequences. These sequences were used to design specific primers to detect fosmids encoding FeFe-hydrogenase genes predicted from the dataset. Three identified fosmids were completely sequenced. The cloned genomic fragments within these fosmids are closely related to members of the Spirochaetaceae, Bacteroidales and Firmicutes, and their FeFe-hydrogenase sequences are characterized by the structure type M3, which is common to clostridial enzymes. FeFe-hydrogenase sequences found in this study represent hitherto undetected sequences, indicating the high genetic diversity of these enzymes in MWWTP. Results suggest that MWWTP should be considered reservoirs for new FeFe-hydrogenase genes. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
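The L2 motif quoted in this abstract (MPCxxKxxE, with x standing for any residue) translates directly into a regular expression. The following is a minimal sketch of such a motif scan, not the authors' pipeline; the two peptides are made-up toy strings, not real hydrogenase sequences, and the function name is hypothetical.

```python
import re

# L2 motif from the abstract: M-P-C-x-x-K-x-x-E, where x is any residue.
L2_MOTIF = re.compile(r"MPC..K..E")


def find_l2_motif(protein_seq):
    """Return (0-based start, matched text) for each non-overlapping L2 motif hit."""
    return [(m.start(), m.group()) for m in L2_MOTIF.finditer(protein_seq.upper())]


# Hypothetical toy peptides for illustration only.
candidates = {
    "orf_with_motif": "GTAAMPCTAKKFEAGL",
    "orf_without_motif": "GTAAMPCTARKFEAGL",  # R replaces the conserved K
}

hits = {name: find_l2_motif(seq) for name, seq in candidates.items()}
for name, found in hits.items():
    print(name, found)
```

A scan of this kind only flags candidates; the abstract does not describe how the authors implemented motif detection.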

A detailed gene expression study of the Miscanthus genus reveals changes in the transcriptome associated with the rejuvenation of spring rhizomes.

PubMed

Barling, Adam; Swaminathan, Kankshita; Mitros, Therese; James, Brandon T; Morris, Juliette; Ngamboma, Ornella; Hall, Megan C; Kirkpatrick, Jessica; Alabady, Magdy; Spence, Ashley K; Hudson, Matthew E; Rokhsar, Daniel S; Moose, Stephen P

2013-12-09

The Miscanthus genus of perennial C4 grasses contains promising biofuel crops for temperate climates. However, few genomic resources exist for Miscanthus, which limits understanding of its interesting biology and future genetic improvement. A comprehensive catalog of expressed sequences was generated from a variety of Miscanthus species and tissue types, with an emphasis on characterizing gene expression changes in spring compared to fall rhizomes. Illumina short read sequencing technology was used to produce transcriptome sequences from different tissues and organs during distinct developmental stages for multiple Miscanthus species, including Miscanthus sinensis, Miscanthus sacchariflorus, and their interspecific hybrid Miscanthus × giganteus. More than fifty billion base-pairs of Miscanthus transcript sequence were produced. Overall, 26,230 Sorghum gene models (i.e., ~96% of predicted Sorghum genes) had at least five Miscanthus reads mapped to them, suggesting that a large portion of the Miscanthus transcriptome is represented in this dataset. The Miscanthus × giganteus data were used to identify genes preferentially expressed in a single tissue, such as the spring rhizome, using Sorghum bicolor as a reference. Quantitative real-time PCR was used to verify examples of preferential expression predicted via RNA-Seq. Contiguous consensus transcript sequences were assembled for each species and annotated using InterProScan. Sequences from the assembled transcriptome were used to amplify genomic segments from a doubled haploid Miscanthus sinensis and from Miscanthus × giganteus to further disentangle the allelic and paralogous variations in genes. This large expressed sequence tag collection creates a valuable resource for the study of Miscanthus biology by providing detailed gene sequence information and tissue-preferred expression patterns.
We have successfully generated a database of transcriptome assemblies and demonstrated its use in the study of genes of interest. Analysis of gene expression profiles revealed biological pathways that exhibit altered regulation in spring compared to fall rhizomes, which is consistent with their different physiological functions. The expression profiles of the subterranean rhizome provide a better understanding of the biological activities of the underground stem structures that are essential for perenniality and the storage or remobilization of carbon and nutrient resources.
On the value of nuclear and mitochondrial gene sequences for reconstructing the phylogeny of vanilloid orchids (Vanilloideae, Orchidaceae)

PubMed Central

Cameron, Kenneth M.

2009-01-01

Background and Aims: Most molecular phylogenetic studies of Orchidaceae have relied heavily on DNA sequences from the plastid genome. Nuclear and mitochondrial loci have only been superficially examined for their systematic value. Since 40% of the genera within Vanilloideae are achlorophyllous mycoheterotrophs, this is an ideal group of orchids in which to evaluate non-plastid gene sequences. Methods: Phylogenetic reconstructions for Vanilloideae were produced using independent and combined data from the nuclear 18S, 5·8S and 26S rDNA genes and the mitochondrial atpA gene and nad1b-c intron. Key Results: These new data indicate placements for genera such as Lecanorchis and Galeola, for which plastid gene sequences have been mostly unavailable. Nuclear and mitochondrial parsimony jackknife trees are congruent with each other and previously published trees based solely on plastid data. Because of high rates of sequence divergence among vanilloid orchids, even the short 5·8S rDNA gene provides impressive levels of resolution and support. Conclusions: Orchid systematists are encouraged to sequence nuclear and mitochondrial gene regions along with the growing number of plastid loci available. PMID:19251715
New data on epizootiology and genetics of piroplasms based on sequences of small ribosomal subunit and cytochrome b genes.

PubMed

Criado, A; Martinez, J; Buling, A; Barba, J C; Merino, S; Jefferies, R; Irwin, P J

2006-12-20

As a continuation of our studies on the molecular epizootiology of piroplasmosis in Spain and other countries, we present in this contribution the finding of new hosts for some piroplasms, as well as information on their 18S rRNA gene sequences. Genetic data were complemented with sequences of the apocytochrome b gene (whenever possible). The following conclusions were drawn from these molecular studies: Theileria annulata is capable of infecting dogs, since it was diagnosed in a symptomatic animal. According to cytochrome b sequences, isolates from cows and the dog show slight differences. The same isolates showed, however, identical sequences in the 18S rRNA gene. This exemplifies well the usefulness of the mitochondrial gene for examining infra-specific variation. Babesia bovis is an occasional parasite of equines, since it was detected in two symptomatic horses. We found evidence of genetic polymorphism occurring in the 18S rRNA gene of Spanish T. equi-like and B. ovis isolates. B. bennetti from a Spanish seagull is loosely related to B. ovis, and might represent a genetically distinct branch of babesids. A partial sequence of a cytochrome b pseudogene was obtained for the first time in Babesia canis rossi from South Africa. The pseudogene is distantly related to the B. bigemina cytochrome b gene. These new findings confirm the ability of some piroplasms to infect multiple hosts, as well as the existence of relatively wide genetic polymorphism in the cytochrome b gene. In addition, the existence of mtDNA-like pseudogenes of possible nuclear location in piroplasms is of interest because of their possible impact on molecular phylogeny studies.
Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic

PubMed Central

Yebra, Gonzalo; Hodcroft, Emma B.; Ragonnet-Cronin, Manon L.; Pillay, Deenan; Brown, Andrew J. Leigh; Fraser, Christophe; Kellam, Paul; de Oliveira, Tulio; Dennis, Ann; Hoppe, Anne; Kityo, Cissy; Frampton, Dan; Ssemwanga, Deogratius; Tanser, Frank; Keshani, Jagoda; Lingappa, Jairam; Herbeck, Joshua; Wawer, Maria; Essex, Max; Cohen, Myron S.; Paton, Nicholas; Ratmann, Oliver; Kaleebu, Pontiano; Hayes, Richard; Fidler, Sarah; Quinn, Thomas; Novitsky, Vladimir; Haywards, Andrew; Nastouli, Eleni; Morris, Steven; Clark, Duncan; Kozlakidis, Zisis

2016-01-01

HIV molecular epidemiology studies analyse viral pol gene sequences due to their availability, but whole genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ), and compared their topologies to the corresponding true tree’s using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and gag and partial pol sequences showing the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially when using the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences. PMID:28008945
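The sub-sampling design described in this abstract (six gene combinations, four sampling depths, 100 replicates each, drawn from 4662 simulated sequences) can be laid out as a simple grid. The sketch below is an assumption-laden illustration: the helper `subsample_ids`, the rounding rule, uniform sampling without replacement, and the fixed seed are all hypothetical choices, since the abstract does not say how the replicates were drawn.

```python
import itertools
import random

# Design parameters taken from the abstract.
gene_sets = ["gag-pol-env", "gag-pol", "gag", "pol", "env", "partial pol"]
sampling_depths = [1.00, 0.60, 0.20, 0.05]
n_replicates = 100
n_sequences = 4662


def subsample_ids(n_total, depth, rng):
    """Draw a random subset of sequence indices at the given sampling depth.

    Uniform sampling without replacement is an assumption made for this sketch.
    """
    k = round(n_total * depth)
    return sorted(rng.sample(range(n_total), k))


# One entry per (gene set, sampling depth, replicate) combination.
design = list(itertools.product(gene_sets, sampling_depths, range(n_replicates)))
print(len(design), "sub-sampled datasets in the full grid")

rng = random.Random(0)  # fixed seed so the sketch is reproducible
example = subsample_ids(n_sequences, 0.05, rng)
print(len(example), "sequences retained at 5% sampling depth")
```

Each grid entry would correspond to one alignment from which a tree is inferred and compared with the known transmission tree.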
Characterization of a rabbit germ-line VH gene that is a candidate donor for VH gene conversion in mutant Alicia rabbits.

PubMed

Chen, H T; Alexander, C B; Mage, R G

1995-06-15

Normal rabbits preferentially rearrange the 3'-most VH gene, VH1, to encode Igs with VHa allotypes, which constitute the majority of rabbit serum Igs. A gene conversion-like mechanism is employed to diversify the primary Ab repertoire. In mutant Alicia rabbits, which were derived from a rabbit with the VHa2 allotype, the VH1 gene was deleted. Our previous studies showed that the first functional gene (VH4) or VH4-like genes were rearranged in 2- to 8-wk-old homozygous Alicia rabbits. The VH1a2-like sequences that were found in splenic mRNA from 6-wk and older Alicia rabbits still had some residues that were typical of VH4. The appearance of sequences resembling VH1a2 may have been caused by gene conversions that altered the sequences of the rearranged VH, or there may have been rearrangement of upstream VH1a2-like genes later in development. To investigate this further, we constructed a cosmid library and isolated a VH1a2-like gene, VH12-1-6, with a sequence almost identical to VH1a2. This gene had a deleted base in the heptamer of its recombination signal sequence. However, even if this defect diminished or eliminated its ability to rearrange, the a2-like gene could have acted as a donor for gene-conversion-like alteration of rearranged VH genes. Sequence comparisons suggested that this gene or a gene like it could have acted as a donor for gene conversion in mutant Alicia and in normal rabbits.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes of medical importance in humans. However, the mitochondrial (mt) genome of this fluke was not previously known. The present study determined the complete mt genome sequence of E. hortense and assessed its phylogenetic relationships with other digenean species for which complete mt genome sequences are available in GenBank, using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequence of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Characterization of the complete mitochondrial genome of Marshallagia marshalli and phylogenetic implications for the superfamily Trichostrongyloidea.

PubMed

Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua

2018-01-01

Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goats, and wild ruminants, causing significant socioeconomic losses worldwide. To date, studies of the molecular biology of M. marshalli have been limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophyly of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

PubMed Central

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-01-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.

PubMed

Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong

2018-05-01

This study investigated the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out by Illumina 2000 and complete genomic sequences were obtained. Gene function annotation and bioinformatics analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. There were 5373 protein coding genes, and 3732, 3614, and 3942 genes were annotated in the GO, KEGG, and COG databases, respectively. There were 12 genes related to chromium metabolism in the Serratia sp. S2 genome. The whole genome sequence of Serratia sp. S2 has been submitted to the GenBank database under accession number LNRP00000000. Our findings may provide a theoretical basis for the subsequent development of new biotechnology to remediate environmental chromium pollution.
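G+C content, as reported in this abstract, is the share of G and C bases among the called bases of a sequence. The sketch below is a generic illustration, not the authors' analysis: the function name and the toy sequence are hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption. The final lines only multiply the two figures quoted in the abstract.

```python
def gc_content(seq):
    """Return percent G+C among unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


# Hypothetical toy sequence; the two N bases are ignored by the calculation.
toy = "ATGCGCGCTANNGC"
print(f"toy G+C content: {gc_content(toy):.2f}%")

# Figures quoted in the abstract.
genome_size_bp = 5_604_115
reported_gc_percent = 57.61

# Approximate number of G or C bases implied by the reported values.
implied_gc_bases = round(genome_size_bp * reported_gc_percent / 100)
print(f"implied G+C bases: {implied_gc_bases:,}")
```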
Isolation of nucleotide binding site-leucine rich repeat and kinase resistance gene analogues from sugarcane (Saccharum spp.).

PubMed

Glynn, Neil C; Comstock, Jack C; Sood, Sushma G; Dang, Phat M; Chaparro, Jose X

2008-01-01

Resistance gene analogues (RGAs) have been isolated from many crops and offer potential in breeding for disease resistance through marker-assisted selection, either as closely linked or as perfect markers. Many R-gene sequences contain kinase domains, and indeed kinase genes have been reported as being proximal to R-genes, making kinase analogues an additionally promising target. The first step towards utilizing RGAs as markers for disease resistance is isolation and characterization of the sequences. Sugarcane clone US01-1158 was identified as resistant to yellow leaf caused by the sugarcane yellow leaf virus (SCYLV) and moderately resistant to rust caused by Puccinia melanocephala Sydow & Sydow. Degenerate primers that had previously proved useful for isolating RGAs and kinase analogues in wheat and soybean were used to amplify DNA from sugarcane (Saccharum spp.) clone US-01-1158. Sequences generated from 1512 positive clones were assembled into 134 contigs of between two and 105 sequences. Comparison of the contig consensuses with the NCBI sequence database using BLASTx showed that 20 had sequence homology to nucleotide binding site and leucine rich repeat (NBS-LRR) RGAs, and eight to kinase genes. Alignment of the deduced amino acid sequences with similar sequences from the NCBI database allowed the identification of several conserved domains. The alignment and resulting phenetic tree showed that many of the sequences had greater similarity to sequences from other species than to one another. The use of degenerate primers is a useful method for isolating novel sugarcane RGA and kinase gene analogues. Further studies are needed to evaluate the role of these genes in disease resistance.
Differentiated evolutionary conservatism and lack of polymorphism of crucial sex determination genes (SRY and SOX9) in four species of the family Canidae.

PubMed

Nowacka-Woszuk, Joanna; Switonski, Marek

2009-01-01

The sex determination process is under the control of several genes, of which two (SRY and SOX9), encoding transcription factors, play a crucial role. It is well known that mutations in these genes may cause the development of an intersexual phenotype. The aim of this study was to conduct a comparative analysis of the coding sequence and 5'-flanking regions of both genes in four species of the family Canidae (the dog, red fox, arctic fox and Chinese raccoon dog). Similarity of the coding sequence of the SOX9 gene among the studied species was higher (99.7-99.9%) than in the case of the SRY gene (96.7-97.3%). Only single nucleotide changes were found in the compared coding sequences, whereas in the 5'-flanking region of both genes nucleotide substitutions, as well as insertions and deletions, were observed. None of the changes detected in the 5'-flanking region occurred within the potential consensus sequences for transcription factors. No polymorphism was found for either of these genes in any of the analyzed species.
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride

PubMed Central

Matroudi, S.; Zamani, M.R.; Motallebi, M.

2008-01-01

In this study, Trichoderma atroviride was selected as an overproducer of chitinase among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate, the genomic and cDNA clones encoding chit33 were isolated and sequenced. Comparison of the genomic and cDNA sequences to define the gene structure indicates that this gene contains three short introns and an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins is discussed. The coding sequence of the chit33 gene was cloned into the pET26b(+) expression vector and expressed in E. coli. PMID:24031242
Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

PubMed Central

Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

2015-01-01

There is continuing interest in analysing gene architecture and gene expression to determine any relationship between them. Advances in high-quality sequencing technologies and large-scale resource datasets have improved understanding of these relationships and enabled cross-referencing of expression data to large genome datasets. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results in the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. This research applied quantile regression techniques to the statistical analysis of coding and noncoding sequence length and gene expression data in the plant Arabidopsis thaliana and the fruit fly Drosophila melanogaster, to determine whether a relationship exists and whether there are any differences or similarities between these species. The quantile regression analysis found that the correlations between coding sequence length and gene expression varied, and that similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between the animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
Discrimination of the Lactobacillus acidophilus group using sequencing, species-specific PCR and SNaPshot mini-sequencing technology based on the recA gene.

PubMed

Huang, Chien-Hsun; Chang, Mu-Tzu; Huang, Mu-Chiou; Wang, Li-Tin; Huang, Lina; Lee, Fwu-Ling

2012-10-01

It is difficult to clearly identify specific species and subspecies of the Lactobacillus acidophilus group using phenotypic and genotypic (16S rDNA sequence analysis) techniques alone. The aim of this study was to use the recA gene for species discrimination in the L. acidophilus group, as well as to develop a species-specific primer and a single nucleotide polymorphism primer based on the recA gene sequence for species and subspecies identification. The average sequence similarity for the recA gene among type strains was 80.0%, and most members of the L. acidophilus group could be clearly distinguished. A species-specific primer was designed from the recA gene sequences and used for polymerase chain reaction with template DNA from Lactobacillus strains. A single 231-bp species-specific band was found only in L. delbrueckii. A SNaPshot mini-sequencing assay using recA as a target gene was also developed. The specificity of the mini-sequencing assay was evaluated using 31 strains of L. delbrueckii species, and the assay was able to unambiguously discriminate strains belonging to the subspecies L. delbrueckii subsp. bulgaricus. The phylogenetic relationships of most strains in the L. acidophilus group can be resolved using recA gene sequencing, and a novel method to identify L. delbrueckii and L. delbrueckii subsp. bulgaricus was developed by combining species-specific polymerase chain reaction with SNaPshot mini-sequencing. Copyright © 2012 Society of Chemical Industry.
Resistance gene candidates identified by PCR with degenerate oligonucleotide primers map to clusters of resistance genes in lettuce.

PubMed

Shen, K A; Meyers, B C; Islam-Faridi, M N; Chin, D B; Stelly, D M; Michelmore, R W

1998-08-01

The recent cloning of genes for resistance against diverse pathogens from a variety of plants has revealed that many share conserved sequence motifs. This provides the possibility of isolating numerous additional resistance genes by polymerase chain reaction (PCR) with degenerate oligonucleotide primers. We amplified resistance gene candidates (RGCs) from lettuce with multiple combinations of primers with low degeneracy designed from motifs in the nucleotide binding sites (NBSs) of RPS2 of Arabidopsis thaliana and N of tobacco. Genomic DNA, cDNA, and bacterial artificial chromosome (BAC) clones were successfully used as templates. Four families of sequences were identified that had the same similarity to each other as to resistance genes from other species. The relationship of the amplified products to resistance genes was evaluated by several sequence and genetic criteria. The amplified products contained open reading frames with additional sequences characteristic of NBSs. Hybridization of RGCs to genomic DNA and to BAC clones revealed large numbers of related sequences. Genetic analysis demonstrated the existence of clustered multigene families for each of the four RGC sequences. This parallels classical genetic data on clustering of disease resistance genes. Two of the four families mapped to known clusters of resistance genes; these two families were therefore studied in greater detail. Additional evidence that these RGCs could be resistance genes was gained by the identification of leucine-rich repeat (LRR) regions in sequences adjoining the NBS similar to those in RPM1 and RPS2 of A. thaliana. Fluorescent in situ hybridization confirmed the clustered genomic distribution of these sequences. The use of PCR with degenerate oligonucleotide primers is therefore an efficient method to identify numerous RGCs in plants.
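A degenerate primer, as used in this abstract, is a mixture of oligonucleotides written with IUPAC ambiguity codes, and its degeneracy is the number of distinct sequences in that mixture. The sketch below expands a primer using the standard IUPAC nucleotide codes; the primer `GGNGGRATG` is a made-up example and not one of the primers used in the study, and the function names are hypothetical.

```python
import itertools

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer):
    """List every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in itertools.product(*choices)]


# Hypothetical primer for illustration only: N has 4 options and R has 2.
toy_primer = "GGNGGRATG"
variants = expand(toy_primer)
print(degeneracy(toy_primer), "variants:", variants)
```

Low-degeneracy primers, as described in the abstract, keep this variant count small, so each individual sequence is present at a higher concentration in the reaction.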
[Polymorphism of KPI-A genes from plants of the subgenus Potatoe (sect. Petota, Estolonifera and Lycopersicum) and subgenus Solanum].

PubMed

Krinitsyna, A A; Mel'nikova, N V; Belenikin, M S; Poltronieri, P; Santino, A; Kudriavtseva, A V; Savilova, A M; Speranskaia, A S

2013-01-01

Kunitz-type proteinase inhibitor proteins of group A (KPI-A) are involved in the protection of potato plants from pathogens and pests. Although sequences of a large number of KPI-A genes from different species of cultivated potato (Solanum tuberosum subsp. tuberosum) and a few genes from tomato (Solanum lycopersicum) are known to date, information about the allelic diversity of these genes in other species of the genus Solanum is lacking. In our work, the consensus sequences of the KPI-A genes were established in two species of subgenus Potatoe sect. Petota (Solanum tuberosum subsp. andigenum, 5 genes, and Solanum stoloniferum, 2 genes) and in the subgenus Solanum (Solanum nigrum, 5 genes) by amplification, cloning, sequencing and subsequent analysis. The determined sequences of KPI-A genes were 97-100% identical to known sequences of the cultivated potato of sect. Petota (cultivated potato Solanum tuberosum subsp. tuberosum) and sect. Etuberosum (S. palustre). The interspecific variability of these genes did not exceed the intraspecific variability for all studied species except Solanum lycopersicum. The distribution of highly variable and conserved sequences in the mature protein-encoding regions was uniform for all investigated KPI-A genes. However, our attempts to amplify the homologous genes using the same primers and the genomes of Solanum dulcamarum, Solanum lycopersicum and Mandragora officinarum resulted in no product formation. Phylogenetic analysis of KPI-A diversity showed that the sequences of S. lycopersicum form an independent cluster, whereas KPI-A of S. nigrum and species of sect. Etuberosum and sect. Petota are closely related and do not form species-specific subclusters. Although Solanum nigrum is resistant to all known races of the oomycete Phytophthora infestans, which causes one of the most economically important diseases of solanaceous plants, the amino acid sequences encoded by the KPI-A genes of its genome differ little or not at all from those encoded in the genomes of cultivated potatoes affected by P. infestans.
Phytoplasma phylogenetics based on analysis of secA and 23S rRNA gene sequences for improved resolution of candidate species of 'Candidatus Phytoplasma'.

PubMed

Hodgetts, Jennifer; Boonham, Neil; Mumford, Rick; Harrison, Nigel; Dickinson, Matthew

2008-08-01

Phytoplasma phylogenetics has focused primarily on sequences of the non-coding 16S rRNA gene and the 16S-23S rRNA intergenic spacer region (16-23S ISR), and primers that enable amplification of these regions from all phytoplasmas by PCR are well established. In this study, primers based on the secA gene have been developed into a semi-nested PCR assay that results in a sequence of the expected size (about 480 bp) from all 34 phytoplasmas examined, including strains representative of 12 16Sr groups. Phylogenetic analysis of secA gene sequences showed similar clustering of phytoplasmas when compared with clusters resolved by similar sequence analyses of a 16-23S ISR-23S rRNA gene contig or of the 16S rRNA gene alone. The main differences between trees were in the branch lengths, which were elongated in the 16-23S ISR-23S rRNA gene tree when compared with the 16S rRNA gene tree and elongated still further in the secA gene tree, despite this being a shorter sequence. The improved resolution in the secA gene-derived phylogenetic tree resulted in the 16SrII group splitting into two distinct clusters, while phytoplasmas associated with coconut lethal yellowing-type diseases split into three distinct groups, thereby supporting past proposals that they represent different candidate species within 'Candidatus Phytoplasma'. The ability to differentiate 16Sr groups and subgroups by virtual RFLP analysis of secA gene sequences suggests that this gene may provide an informative alternative molecular marker for pathogen identification and diagnosis of phytoplasma diseases.
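Virtual RFLP analysis, as mentioned above, amounts to digesting sequences in silico and comparing the resulting fragment lengths. The following is a minimal sketch under stated assumptions: the abstract does not name the enzymes used, so AluI (recognition site AGCT, cut after AG) and the toy sequence are illustrative choices only.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases of the recognition site that lie to the
    left of the cut position.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    # Skip zero-length pieces that arise when a cut falls at a sequence end.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# Toy sequence (assumption); AluI cuts AG^CT, so cut_offset is 2.
toy_sequence = "TTAGCTGGGGAGCTCC"
print(digest(toy_sequence, "AGCT", 2))
```

Two sequences that yield different fragment-length lists for the same enzyme would show different banding patterns, which is the basis for separating groups and subgroups.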
Deep sequencing-based transcriptome analysis of Plutella xylostella larvae parasitized by Diadegma semiclausum

PubMed Central

2011-01-01

Background Parasitoid insects manipulate their hosts' physiology by injecting various factors into their host upon parasitization. Transcriptomic methods provide a powerful approach to study insect host-parasitoid interactions at the molecular level. In order to investigate the effects of parasitization by an ichneumonid wasp (Diadegma semiclausum) on the host (Plutella xylostella), the larval transcriptome profile was analyzed using a short-read deep sequencing method (Illumina). Symbiotic polydnaviruses (PDVs) associated with ichneumonid parasitoids, known as ichnoviruses, play significant roles in host immune suppression and developmental regulation. In the current study, D. semiclausum ichnovirus (DsIV) genes expressed in P. xylostella were identified and their sequences compared with other reported PDVs. Five of these genes encode proteins of unknown identity that have not previously been reported. Results De novo assembly of cDNA sequence data generated 172,660 contigs between 100 and 10000 bp in length, with 35% being > 200 bp in length. Parasitization had significant impacts on expression levels of 928 identified insect host transcripts. Gene ontology data illustrated that the majority of the differentially expressed genes are involved in binding, catalytic activity, and metabolic and cellular processes. In addition, the results show that transcription levels of antimicrobial peptides, such as gloverin, cecropin E and lysozyme, were up-regulated after parasitism. Expression of ichnovirus genes was detected in parasitized larvae, with 19 unique sequences identified from five PDV gene families including vankyrin, viral innexin, repeat elements, a cysteine-rich motif, and polar residue rich protein. Vankyrin 1 and repeat element 1 genes showed the highest transcription levels among the DsIV genes. Conclusion This study provides detailed information on differential expression of P. xylostella larval genes following parasitization and on DsIV genes expressed in the host, and also improves our current understanding of this host-parasitoid interaction. PMID:21906285

Complete mitochondrial genome sequence of Indian medium carp, Labeo gonius (Hamilton, 1822) and its comparison with other related carp species.

PubMed

Behera, Bijay Kumar; Kumari, Kavita; Baisvar, Vishwamitra Singh; Rout, Ajaya Kumar; Pakrashi, Sudip; Paria, Prasenjet; Jena, J K

2017-01-01

In the present study, the complete mitochondrial genome sequence of Labeo gonius is reported using a PGM sequencer (Ion Torrent). The complete mitogenome of L. gonius, which is 16,614 bp in length, was obtained by de novo assembly of genomic reads using the Torrent Mapping Alignment Program (TMAP). The mitogenome of L. gonius comprises 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and a D-loop as the control region, with a gene order and organization similar to those of most other fish mitogenomes in the NCBI databases. The mitogenome in the present study has 99% similarity to the complete mitogenome sequence of Labeo fimbriatus, as reported earlier. The phylogenetic analysis of Cypriniformes showed that their mitogenomes are closely related to each other. The complete mitogenome sequence of L. gonius should be helpful in understanding the population genetics, phylogenetics, and evolution of Indian carps.
Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

PubMed

Weirather, Jason L; Afshar, Pegah Tootoonchi; Clark, Tyson A; Tseng, Elizabeth; Powers, Linda S; Underwood, Jason G; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

2015-10-15

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Experience of targeted Usher exome sequencing as a clinical test

PubMed Central

Besnard, Thomas; García-García, Gema; Baux, David; Vaché, Christel; Faugère, Valérie; Larrieu, Lise; Léonard, Susana; Millan, Jose M; Malcolm, Sue; Claustres, Mireille; Roux, Anne-Françoise

2014-01-01

We show that massively parallel targeted sequencing of 19 genes provides a new and reliable strategy for molecular diagnosis of Usher syndrome (USH) and nonsyndromic deafness, particularly appropriate for these disorders characterized by a high clinical and genetic heterogeneity and a complex structure of several of the genes involved. A series of 71 patients including Usher patients previously screened by Sanger sequencing plus newly referred patients was studied. Ninety-eight percent of the variants previously identified by Sanger sequencing were found by next-generation sequencing (NGS). NGS proved to be efficient as it offers analysis of all relevant genes which is laborious to reach with Sanger sequencing. Among the 13 newly referred Usher patients, both mutations in the same gene were identified in 77% of cases (10 patients) and one candidate pathogenic variant in two additional patients. This work can be considered as pilot for implementing NGS for genetically heterogeneous diseases in clinical service. PMID:24498627
Draft genome sequence of a KPC-2-producing Klebsiella pneumoniae ST340 carrying blaCTX-M-15 and blaCTX-M-59 genes: a rich genome of mobile genetic elements and genes encoding antibiotic resistance.

PubMed

Casella, Tiago; de Morais, Andressa Batista Zequini; de Paula Barcelos, Diego Diniz; Tolentino, Fernanda Modesto; Cerdeira, Louise Teixeira; Bueno, Maria Fernanda Campagnari; Francisco, Gabriela Rodrigues; de Andrade, Leonardo Neves; da Costa Darini, Ana Lucia; de Oliveira Garcia, Doroti; Lincopan, Nilton; Nogueira, Mara Corrêa Lelles

2018-06-01

Klebsiella pneumoniae is considered an opportunistic pathogen and an important agent of nosocomial and community infections. It presents the ability to capture and harbour several antimicrobial resistance genes and, in this context, the extensive use of carbapenems to treat serious infections has been responsible for the selection of several resistance genes. This study reports the draft genome sequence of a KPC-2-producing K. pneumoniae strain (Kp10) simultaneously harbouring blaCTX-M-15 and blaCTX-M-59 genes isolated from urine culture of a patient with Parkinson's disease. Classical microbiological methods were applied to isolate and identify the strain, and PCR and sequencing were used to identify and characterise the genes and the genetic environment. Whole-genome sequencing (WGS) was performed using a Nextera XT DNA library and a NextSeq platform. WGS analysis revealed the presence of 5915 coding genes, 46 RNA-encoding genes and 255 pseudogenes. Kp10 belonged to sequence type 340 (ST340) of clonal complex 258 (CC258) and carried 20 transferable genes associated with antimicrobial resistance, comprising seven drug classes. Although the simultaneous presence of different blaCTX-M genes in the same strain is rarely reported, the blaKPC-2, blaCTX-M-15 and blaCTX-M-59 genes were not associated with the same genetic mobile structure in Kp10. These results confirm the capacity of K. pneumoniae to harbour several antimicrobial resistance genes. Thus, this draft genome could help in future epidemiological studies regarding the dissemination of clinically relevant resistance genes. Copyright © 2018 International Society for Chemotherapy of Infection and Cancer. Published by Elsevier Ltd. All rights reserved.
A regulatory sequence from the retinoid X receptor γ gene directs expression to horizontal cells and photoreceptors in the embryonic chicken retina.

PubMed

Blixt, Maria K E; Hallböök, Finn

2016-01-01

Combining techniques of episomal vector gene-specific Cre expression and genomic integration using the piggyBac transposon system enables studies of gene expression-specific cell lineage tracing in the chicken retina. In this work, we aimed to target the retinal horizontal cell progenitors. A 208 bp gene regulatory sequence from the chicken retinoid X receptor γ gene (RXRγ208) was used to drive Cre expression. RXRγ is expressed in progenitors and photoreceptors during development. The vector was combined with a piggyBac "donor" vector containing a floxed STOP sequence followed by enhanced green fluorescent protein (EGFP), as well as a piggyBac helper vector for efficient integration into the host cell genome. The vectors were introduced into the embryonic chicken retina with in ovo electroporation. Tissue electroporation allows targeting at specific developmental time points and in specific structures. Cells that drove Cre expression from the regulatory RXRγ208 sequence excised the floxed STOP sequence and expressed GFP. The approach generated a stable lineage with robust expression of GFP in retinal cells that have activated transcription from the RXRγ208 sequence. Furthermore, GFP was expressed in cells that express horizontal or photoreceptor markers when electroporation was performed between developmental stages 22 and 28. Electroporation of a stage 12 optic cup gave multiple cell types in accordance with RXRγ gene expression in the early retina. In this study, we describe an easy, cost-effective, and time-efficient method for testing regulatory sequences in general. More specifically, our results open up the possibility for further studies of the RXRγ-gene regulatory network governing the formation of photoreceptor and horizontal cells. In addition, the method presents approaches to target the expression of effector genes, such as regulators of cell fate or cell cycle progression, to these cells and their progenitors.
Feline hypersomatotropism and acromegaly tumorigenesis: a potential role for the AIP gene.

PubMed

Scudder, C J; Niessen, S J; Catchpole, B; Fowkes, R C; Church, D B; Forcada, Y

2017-04-01

Acromegaly in humans is usually sporadic; however, up to 20% of familial isolated pituitary adenomas are caused by germline sequence variants of the aryl-hydrocarbon-receptor interacting protein (AIP) gene. Feline acromegaly has similarities to human acromegalic families with AIP mutations. The aim of this study was to sequence the feline AIP gene, identify sequence variants and compare the AIP gene sequence between feline acromegalic and control cats, and in acromegalic siblings. The feline AIP gene was amplified through PCR using whole blood genomic DNA from 10 acromegalic and 10 control cats, and 3 sibling pairs affected by acromegaly. PCR products were sequenced and compared with the published predicted feline AIP gene. A single nonsynonymous SNP was identified in exon 1 (AIP:c.9T > G) of two acromegalic cats and none of the control cats, as well as both members of one sibling pair. The region of this SNP is considered essential for the interaction of the AIP protein with its receptor. This sequence variant has not previously been reported in humans. Two additional synonymous sequence variants were identified (AIP:c.481C > T and AIP:c.826C > T). This is the first molecular study to investigate a potential genetic cause of feline acromegaly, and it identified a nonsynonymous AIP single nucleotide polymorphism in 20% of the acromegalic cat population evaluated, as well as in one of the sibling pairs evaluated. Copyright © 2016 Elsevier Inc. All rights reserved.
Porcine insulin receptor substrate 4 (IRS4) gene: cloning, polymorphism and association study

USDA-ARS's Scientific Manuscript database

Using PCR and IPCR techniques we obtained a 4498 bp nucleotide sequence FN424076 encompassing the complete coding sequence of the porcine IRS4 gene and its proximal promoter. The 1269-amino acid porcine protein deduced from the nucleotide sequence shares 92% identity with the human IRS4 and possesse...
Genome-wide localization and expression profiling establish Sp2 as a sequence-specific transcription factor regulating vitally important genes

PubMed Central

Terrados, Gloria; Finkernagel, Florian; Stielow, Bastian; Sadic, Dennis; Neubert, Juliane; Herdt, Olga; Krause, Michael; Scharfe, Maren; Jarek, Michael; Suske, Guntram

2012-01-01

The transcription factor Sp2 is essential for early mouse development and for proliferation of mouse embryonic fibroblasts in culture. Yet its mechanisms of action and its target genes are largely unknown. In this study, we have combined RNA interference, in vitro DNA binding, chromatin immunoprecipitation sequencing and global gene-expression profiling to investigate the role of Sp2 for cellular functions, to define target sites and to identify genes regulated by Sp2. We show that Sp2 is important for cellular proliferation, that it binds to GC-boxes and that it occupies proximal promoters of genes essential for vital cellular processes including gene expression, replication, metabolism and signalling. Moreover, we identified key target genes and cellular pathways that are directly regulated by Sp2. Most significantly, Sp2 binds and activates numerous sequence-specific transcription factor and co-activator genes, and represses the whole battery of cholesterol synthesis genes. Our results establish Sp2 as a sequence-specific regulator of vitally important genes. PMID:22684502
The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry)

PubMed Central

Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro

2018-01-01

Abstract Background The genus Potentilla is closely related to that of Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. Findings In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the genome of P. micrantha, spanning 2674 sequence contigs, with an N50 size of 335,712, estimated to cover 80% of the total genome size of the species was developed. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of bench-marking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that the majority of the gene-rich regions of the genome have been sequenced. Conclusions Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae and they also shed light on the evolution of genome size and organization in this family. PMID:29659812
The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry).

PubMed

Buti, Matteo; Moretto, Marco; Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Brilli, Matteo; Lomsadze, Alexandre; Sonego, Paolo; Giongo, Lara; Alonge, Michael; Velasco, Riccardo; Varotto, Claudio; Šurbanovski, Nada; Borodovsky, Mark; Ward, Judson A; Engelen, Kristof; Cavallini, Andrea; Cestaro, Alessandro; Sargent, Daniel James

2018-04-01

The genus Potentilla is closely related to that of Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. A 327 Mbp sequence and annotation of the genome of P. micrantha, spanning 2674 sequence contigs, with an N50 size of 335,712, estimated to cover 80% of the total genome size of the species was developed. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of bench-marking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that the majority of the gene-rich regions of the genome have been sequenced. Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae and they also shed light on the evolution of genome size and organization in this family.
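The N50 statistic quoted for the assembly is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that calculation follows; the contig lengths in the example are made up for illustration and are not the P. micrantha data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total
        # assembly length defines N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Made-up contig lengths (assumption, for illustration only).
print(n50([2, 3, 4, 5, 6]))
```

In this toy case the total is 20, and the two longest contigs (6 and 5) already sum to 11, so the function returns 5.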
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

PubMed Central

Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

2000-01-01

The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
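The search for evolutionary conserved regions (ECRs) above a threshold of sequence identity can be pictured as a sliding-window scan over a pairwise alignment. The sketch below is an assumption-laden illustration: the abstract does not state the window size or identity threshold, so the default values here are placeholders, and the input is assumed to be two already-aligned strings of equal length with "-" for gaps.

```python
def conserved_regions(aln_a, aln_b, window=100, min_identity=0.7):
    """Return merged (start, end) alignment intervals whose windows meet the threshold.

    Intervals are half-open and expressed in alignment columns.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where the two sequences carry the same base, 0 for mismatches and gaps.
    matches = [int(a == b and a != "-")
               for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent window: extend the previous region.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy alignment (assumption): a conserved half followed by a divergent half.
seq_a = "ACGTACGTAC" + "AAAAAAAAAA"
seq_b = "ACGTACGTAC" + "CCCCCCCCCC"
print(conserved_regions(seq_a, seq_b, window=5, min_identity=1.0))
```

Merging overlapping windows reports each conserved stretch once, which makes it easier to compare the hits against predicted genes.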
rpoB Gene Sequencing for Identification of Corynebacterium Species

PubMed Central

Khamis, Atieh; Raoult, Didier; La Scola, Bernard

2004-01-01

The genus Corynebacterium is a heterogeneous group of species comprising human and animal pathogens and environmental bacteria. It is defined on the basis of several phenotypic characters and the results of DNA-DNA relatedness and, more recently, 16S rRNA gene sequencing. However, the 16S rRNA gene is not polymorphic enough to ensure reliable phylogenetic studies and needs to be completely sequenced for accurate identification. The almost complete rpoB sequences of 56 Corynebacterium species were determined by both PCR and genome walking methods. In all cases the percent similarities between different species were lower than those observed by 16S rRNA gene sequencing, even for those species with degrees of high similarity. Several clusters supported by high bootstrap values were identified. In order to propose a method for strain identification which does not require sequencing of the complete rpoB sequence (approximately 3,500 bp), we identified an area with a high degree of polymorphism, bordered by conserved sequences that can be used as universal primers for PCR amplification and sequencing. The sequence of this fragment (434 to 452 bp) allows accurate species identification and may be used in the future for routine sequence-based identification of Corynebacterium species. PMID:15364970
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2013-12-01

The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequence of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.
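Short identical tandem repeats such as those described for the A + T-rich region can be located with a back-referencing regular expression. This is a minimal sketch under assumptions: the toy sequence is invented, and the function only finds perfect, directly adjacent copies, not the nearly identical repeats also mentioned in the abstract.

```python
import re


def tandem_repeats(sequence, unit_len=5, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats of a fixed unit length.

    min_copies must be at least 2.
    """
    # A captured unit followed by at least (min_copies - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)) // unit_len)
        for m in pattern.finditer(sequence.upper())
    ]


# Invented sequence (assumption): three copies of ATTTA flanked by GG and CC.
print(tandem_repeats("GGATTTAATTTAATTTACC"))
```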
Generation of a total of 6483 expressed sequence tags from 60 day-old bovine whole fetus and fetal placenta.

PubMed

Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y

2004-05-01

Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.
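The catalog size reported above follows from simple set arithmetic: the unique sequences of the two libraries are added and the sequences found in both are counted once. The sketch below only re-derives the figures given in the abstract; the function name is a label chosen here for illustration.

```python
def catalog_size(unique_a, unique_b, shared):
    """Size of the union of two sets given their sizes and their overlap."""
    return unique_a + unique_b - shared


# Figures taken from the abstract.
fetus_unique, placenta_unique, shared = 3464, 795, 94

# Each library's unique sequences split into known-gene hits and no-match sequences.
assert 1851 + 1613 == fetus_unique
assert 504 + 291 == placenta_unique

# Characterized clones from both libraries add up to the total in the title.
assert 5357 + 1126 == 6483

print(catalog_size(fetus_unique, placenta_unique, shared))
```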
The GENCODE exome: sequencing the complete human exome

PubMed Central

Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno

2011-01-01

Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
Development of phoH as a Novel Signature Gene for Assessing Marine Phage Diversity

PubMed Central

Goldsmith, Dawn B.; Crosti, Giuseppe; Dwivedi, Bhakti; McDaniel, Lauren D.; Varsani, Arvind; Suttle, Curtis A.; Weinbauer, Markus G.; Sandaa, Ruth-Anne; Breitbart, Mya

2011-01-01

Phages play a key role in the marine environment by regulating the transfer of energy between trophic levels and influencing global carbon and nutrient cycles. The diversity of marine phage communities remains difficult to characterize because of the lack of a signature gene common to all phages. Recent studies have demonstrated the presence of host-derived auxiliary metabolic genes in phage genomes, such as those belonging to the Pho regulon, which regulates phosphate uptake and metabolism under low-phosphate conditions. Among the completely sequenced phage genomes in GenBank, this study identified Pho regulon genes in nearly 40% of the marine phage genomes, while only 4% of nonmarine phage genomes contained these genes. While several Pho regulon genes were identified, phoH was the most prevalent, appearing in 42 out of 602 completely sequenced phage genomes. Phylogenetic analysis demonstrated that phage phoH sequences formed a cluster distinct from those of their bacterial hosts. PCR primers designed to amplify a region of the phoH gene were used to determine the diversity of phage phoH sequences throughout a depth profile in the Sargasso Sea and at six locations worldwide. phoH was present at all sites examined, and a high diversity of phoH sequences was recovered. Most phoH sequences belonged to clusters without any cultured representatives. Each depth and geographic location had a distinct phoH composition, although most phoH clusters were recovered from multiple sites. Overall, phoH is an effective signature gene for examining phage diversity in the marine environment. PMID:21926220
Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.

PubMed

Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia

2017-07-24

Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed for infection with T. vaginalis. A total of 110 sequences were identified when the T. vaginalis 18S ribosomal RNA gene was sequenced. These sequences were used to prepare a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences were in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. The low migration rate of local people in this province may contribute to the genetic conservation of T. vaginalis. The unique genetic feature of our isolates may suggest a different clinical presentation of trichomoniasis, including metronidazole susceptibility and T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis is of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.
Assessing the genetic diversity of Cu resistance in mine tailings through high-throughput recovery of full-length copA genes

PubMed Central

Li, Xiaofang; Zhu, Yong-Guan; Shaban, Babak; Bruxner, Timothy J. C.; Bond, Philip L.; Huang, Longbin

2015-01-01

Characterizing the genetic diversity of microbial copper (Cu) resistance at the community level remains challenging, mainly due to the polymorphism of the core functional gene copA. In this study, a local BLASTN method using a copA database built in this study was developed to recover full-length putative copA sequences from an assembled tailings metagenome; these sequences were then screened for potentially functioning CopA using conserved metal-binding motifs, inferred by evolutionary trace analysis of CopA sequences from known Cu resistant microorganisms. In total, 99 putative copA sequences were recovered from the tailings metagenome, out of which 70 were found with high potential to be functioning in Cu resistance. Phylogenetic analysis of selected copA sequences detected in the tailings metagenome showed that topology of the copA phylogeny is largely congruent with that of the 16S-based phylogeny of the tailings microbial community obtained in our previous study, indicating that the development of copA diversity in the tailings might be mainly through vertical descent with few lateral gene transfer events. The method established here can be used to explore copA (and potentially other metal resistance genes) diversity in any metagenome and has the potential to exhaust the full-length gene sequences for downstream analyses. PMID:26286020
Genomic analysis of NAC transcription factors in banana (Musa acuminata) and definition of NAC orthologous groups for monocots and dicots.

PubMed

Cenci, Albero; Guignon, Valentin; Roux, Nicolas; Rouard, Mathieu

2014-05-01

Identifying the molecular mechanisms underlying tolerance to abiotic stresses is important in crop breeding. A comprehensive understanding of the gene families associated with drought tolerance is therefore highly relevant. NAC transcription factors form a large plant-specific gene family involved in the regulation of tissue development and responses to biotic and abiotic stresses. The main goal of this study was to set up a framework of orthologous groups determined by an expert sequence comparison of NAC genes from both monocots and dicots. In order to clarify the orthologous relationships among NAC genes of different species, we performed an in-depth comparative study of four divergent taxa, in dicots and monocots, whose genomes have already been completely sequenced: Arabidopsis thaliana, Vitis vinifera, Musa acuminata and Oryza sativa. Due to independent evolution, NAC copy number is highly variable in these plant genomes. Based on an expert NAC sequence comparison, we propose forty orthologous groups of NAC sequences that were probably derived from an ancestor gene present in the most recent common ancestor of dicots and monocots. These orthologous groups provide a curated resource for large-scale protein sequence annotation of NAC transcription factors. The established orthology relationships also provide a useful reference for NAC function studies in newly sequenced genomes such as M. acuminata and other plant species.
Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

PubMed Central

Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

2012-01-01

High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309

Transcriptome response to elevated CO2, water deficit, and thermal stress in peanut

USDA-ARS's Scientific Manuscript database

Previously, our laboratories have performed gene expression studies using EST sequencing and spotted microarrays to investigate tissue-specific gene expression and response to abiotic stress. While these studies have provided valuable insight into these processes, they are constrained by sequencer t...
Detection of a new bat gammaherpesvirus in the Philippines.

PubMed

Watanabe, Shumpei; Ueda, Naoya; Iha, Koichiro; Masangkay, Joseph S; Fujii, Hikaru; Alviola, Phillip; Mizutani, Tetsuya; Maeda, Ken; Yamane, Daisuke; Walid, Azab; Kato, Kentaro; Kyuwa, Shigeru; Tohya, Yukinobu; Yoshikawa, Yasuhiro; Akashi, Hiroomi

2009-08-01

A new bat herpesvirus was detected in the spleen of an insectivorous bat (Hipposideros diadema, family Hipposideridae) collected on Panay Island, the Philippines. PCR analyses were performed using COnsensus-DEgenerate Hybrid Oligonucleotide Primers (CODEHOPs) targeting the herpesvirus DNA polymerase (DPOL) gene. Although we obtained PCR products with CODEHOPs, direct sequencing using the primers was not possible because of high degree of degeneracy. Direct sequencing technology developed in our rapid determination system of viral RNA sequences (RDV) was applied in this study, and a partial DPOL nucleotide sequence was determined. In addition, a partial gB gene nucleotide sequence was also determined using the same strategy. We connected the partial gB and DPOL sequences with long-distance PCR, and a 3741-bp nucleotide fragment, including the 3' part of the gB gene and the 5' part of the DPOL gene, was finally determined. Phylogenetic analysis showed that the sequence was novel and most similar to those of the subfamily Gammaherpesvirinae.
PCR detection of Anaplasma phagocytophilum in goat flocks in an area endemic for tick-borne fever in Switzerland.

PubMed

Silaghi, C; Scheuerle, M C; Friche Passos, L M; Thiel, C; Pfister, K

2011-02-01

Central Switzerland is a highly endemic region for tick-borne fever (TBF) in cattle; however, little is known about A. phagocytophilum in goats. In the present study, 72 animals from six goat flocks (373 EDTA blood-samples) in Central Switzerland were analysed for A. phagocytophilum DNA. A real-time PCR targeting the msp2 gene of A. phagocytophilum was performed and in positive samples the partial 16S rRNA, groEL and msp4 genes were amplified for sequence analysis. Four DNA extracts were positive. Different sequence types on the basis of the amplified genes were found. For comparison, sequences of A. phagocytophilum from 12 cattle (originating from Switzerland and Southern Germany) were analysed. The 16S rRNA gene sequences from cattle were all identical, but the groEL and msp4 genes differed depending on the origin of the cattle samples and differed from the variants from goats. This study clearly provides molecular evidence for the presence of different types of A. phagocytophilum in goat flocks in Switzerland, a fact which deserves more thorough attention in clinical studies.
Deep transcriptome sequencing provides new insights into the structural and functional organization of the wheat genome.

PubMed

Pingault, Lise; Choulet, Frédéric; Alberti, Adriana; Glover, Natasha; Wincker, Patrick; Feuillet, Catherine; Paux, Etienne

2015-02-10

Because of its size, allohexaploid nature, and high repeat content, the bread wheat genome is a good model to study the impact of the genome structure on gene organization, function, and regulation. However, because of the lack of a reference genome sequence, such studies have long been hampered and our knowledge of the wheat gene space is still limited. The access to the reference sequence of the wheat chromosome 3B provided us with an opportunity to study the wheat transcriptome and its relationships to genome and gene structure at a level that has never been reached before. By combining this sequence with RNA-seq data, we construct a fine transcriptome map of the chromosome 3B. More than 8,800 transcription sites are identified, that are distributed throughout the entire chromosome. Expression level, expression breadth, alternative splicing as well as several structural features of genes, including transcript length, number of exons, and cumulative intron length are investigated. Our analysis reveals a non-monotonic relationship between gene expression and structure and leads to the hypothesis that gene structure is determined by its function, whereas gene expression is subject to energetic cost. Moreover, we observe a recombination-based partitioning at the gene structure and function level. Our analysis provides new insights into the relationships between gene and genome structure and function. It reveals mechanisms conserved with other plant species as well as superimposed evolutionary forces that shaped the wheat gene space, likely participating in wheat adaptation.
Cold Shock Exoribonuclease R (VacB) is Involved in Aeromonas hydrophila Pathogenesis

EPA Science Inventory

In this study, we cloned and sequenced a virulence-associated gene (vacB) from a clinical isolate SSU of Aeromonas hydrophila. We identified this gene based on our recently annotated genome sequence of the environmental isolate ATCC 7966T of A. hydrophila and the vacB gene of Shi...
Cold Shock Exoribonuclease R(VacB) is involved in Aeromonas hydrophila Virulence

EPA Science Inventory

In this study, we cloned and sequenced a virulence-associated gene (vacB) from a clinical isolate SSU of Aeromonas hydrophila. We identified this gene based on our recently annotated genome sequence of the environmental isolate ATCC 7966T of A. hydrophila and the vacB gene of Shi...
Homologous recombination within the capsid gene of porcine circovirus type 2 subgroup viruses via natural co-infection

USDA-ARS's Scientific Manuscript database

Several studies had reported homologous recombination between porcine circovirus type 2 (PCV2)-group 1 (Gp1) and -group 2 (Gp2) viruses. Interestingly, the recombination events described thus far mapped either within the Rep gene sequences or the sequences flanking the Rep gene region. Previously, ...
Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032.

PubMed

Yang, Shihui; Vera, Jessica M; Grass, Jeff; Savvakis, Giannis; Moskvin, Oleg V; Yang, Yongfu; McIlwain, Sean J; Lyu, Yucai; Zinonos, Irene; Hebert, Alexander S; Coon, Joshua J; Bates, Donna M; Sato, Trey K; Brown, Steven D; Himmel, Michael E; Zhang, Min; Landick, Robert; Pappas, Katherine M; Zhang, Yaoping

2018-01-01

Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion. 
The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.
The complete mitochondrial genome sequence of Eimeria magna (Apicomplexa: Coccidia).

PubMed

Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Liu, Guo-Hua; Wang, Chun-Ren; Zhu, Xing-Quan

2015-01-01

In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Eimeria magna from rabbits for the first time, and compared its gene content and genome organization with those of seven Eimeria spp. from domestic chickens. The size of the complete mt genome sequence of E. magna is 6249 bp, which consists of 3 protein-coding genes (cytb, cox1 and cox3), 12 gene fragments for the large subunit (LSU) rRNA, and 7 gene fragments for the small subunit (SSU) rRNA, without transfer RNA genes, in accordance with that of Eimeria spp. from chickens. The putative direction of translation for the three genes (cytb, cox1 and cox3) was the same as in Eimeria species from domestic chickens. The content of A + T is 65.16% for the E. magna mt genome (29.73% A, 35.43% T, 17.09% G and 17.75% C). The E. magna mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Eimeria spp. and has implications for the molecular diagnosis and control of rabbit coccidiosis.
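The base composition quoted in the abstract above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' method: the function names `base_composition` and `at_content` are hypothetical, and ignoring non-ACGT symbols is an assumption made here for illustration.

```python
def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T in seq.

    Assumption: symbols other than A, C, G, T (e.g. N or gaps) are ignored.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * n / total for base, n in counts.items()}


def at_content(seq: str) -> float:
    """Return the A + T content of seq as a percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Percentages reported in the abstract for the E. magna mt genome.
reported = {"A": 29.73, "T": 35.43, "G": 17.09, "C": 17.75}
reported_at = round(reported["A"] + reported["T"], 2)
reported_total = round(sum(reported.values()), 2)
```

With the reported values, `reported_at` reproduces the stated 65.16% A + T, and the four percentages sum to 100%.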
Genome-wide identification of aquaporin encoding genes in Brassica oleracea and their phylogenetic sequence comparison to Brassica crops and Arabidopsis

PubMed Central

Diehn, Till A.; Pommerrenig, Benjamin; Bernhardt, Nadine; Hartmann, Anja; Bienert, Gerd P.

2015-01-01

Aquaporins (AQPs) are essential channel proteins that regulate plant water homeostasis and the uptake and distribution of uncharged solutes such as metalloids, urea, ammonia, and carbon dioxide. Despite their importance as crop plants, little is known about AQP gene and protein function in cabbage (Brassica oleracea) and other Brassica species. The recent releases of the genome sequences of B. oleracea and Brassica rapa allow comparative genomic studies in these species to investigate the evolution and features of Brassica genes and proteins. In this study, we identified all AQP genes in B. oleracea by a genome-wide survey. In total, 67 genes of four plant AQP subfamilies were identified. Their full-length gene sequences and locations on chromosomes and scaffolds were manually curated. The identification of six additional full-length AQP sequences in the B. rapa genome added to the recently published AQP protein family of this species. A phylogenetic analysis of AQPs of Arabidopsis thaliana, B. oleracea and B. rapa allowed us to follow AQP evolution in closely related species and to systematically classify and (re-) name these isoforms. Thirty-three groups of AQP-orthologous genes were identified between B. oleracea and Arabidopsis and their expression was analyzed in different organs. The two selectivity filters, gene structure and coding sequences were highly conserved within each AQP subfamily while sequence variations in some introns and untranslated regions were frequent. These data suggest a similar substrate selectivity and function of Brassica AQPs compared to Arabidopsis orthologs. The comparative analyses of all AQP subfamilies in three Brassicaceae species give initial insights into AQP evolution in these taxa. Based on the genome-wide AQP identification in B. oleracea and the sequence analysis and reprocessing of Brassica AQP information, our dataset provides a sequence resource for further investigations of the physiological and molecular functions of Brassica crop AQPs. PMID:25904922
Comparative and evolutionary studies of vertebrate ALDH1A-like genes and proteins.

PubMed

Holmes, Roger S

2015-06-05

Vertebrate ALDH1A-like genes encode cytosolic enzymes capable of metabolizing all-trans-retinaldehyde to retinoic acid which is a molecular 'signal' guiding vertebrate development and adipogenesis. Bioinformatic analyses of vertebrate and invertebrate genomes were undertaken using known ALDH1A1, ALDH1A2 and ALDH1A3 amino acid sequences. Comparative analyses of the corresponding human genes provided evidence for distinct modes of gene regulation and expression with putative transcription factor binding sites (TFBS), CpG islands and micro-RNA binding sites identified for the human genes. ALDH1A-like sequences were identified for all mammalian, bird, lizard and frog genomes examined, whereas fish genomes displayed a more restricted distribution pattern for ALDH1A1 and ALDH1A3 genes. The ALDH1A1 gene was absent in many bony fish genomes examined, with the ALDH1A3 gene also absent in the medaka and tilapia genomes. Multiple ALDH1A1-like genes were identified in mouse, rat and marsupial genomes. Vertebrate ALDH1A1, ALDH1A2 and ALDH1A3 subunit sequences were highly conserved throughout vertebrate evolution. Comparative amino acid substitution rates showed that mammalian ALDH1A2 sequences were more highly conserved than for the ALDH1A1 and ALDH1A3 sequences. Phylogenetic studies supported an hypothesis for ALDH1A2 as a likely primordial gene originating in invertebrate genomes and undergoing sequential gene duplication to generate two additional genes, ALDH1A1 and ALDH1A3, in most vertebrate genomes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Evolution of the chalcone synthase gene family in the genus Ipomoea.

PubMed Central

Durbin, M L; Learn, G H; Huttley, G A; Clegg, M T

1995-01-01

The evolution of the chalcone synthase [CHS; malonyl-CoA:4-coumaroyl-CoA malonyltransferase (cyclizing), EC 2.3.1.74] multigene family in the genus Ipomoea is explored. Thirteen CHS genes from seven Ipomoea species (family Convolvulaceae) were sequenced--three from genomic clones and the remainder from PCR amplification with primers designed from the 5' flanking region and the end of the 3' coding region of Ipomoea purpurea Roth. Analysis of the data indicates a duplication of CHS that predates the divergence of the Ipomoea species in this study. The Ipomoea CHS genes are among the most rapidly evolving of the CHS genes sequenced to date. The CHS genes in this study are most closely related to the Petunia CHS-B gene, which is also rapidly evolving and highly divergent from the rest of the Petunia CHS sequences. PMID:7724563
GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

PubMed Central

Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.

2011-01-01

GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria.

PubMed

Gaby, John Christian; Buckley, Daniel H

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm.
A comprehensive aligned nifH gene database: a multipurpose tool for studies of nitrogen-fixing bacteria

PubMed Central

Gaby, John Christian; Buckley, Daniel H.

2014-01-01

We describe a nitrogenase gene sequence database that facilitates analysis of the evolution and ecology of nitrogen-fixing organisms. The database contains 32 954 aligned nitrogenase nifH sequences linked to phylogenetic trees and associated sequence metadata. The database includes 185 linked multigene entries including full-length nifH, nifD, nifK and 16S ribosomal RNA (rRNA) gene sequences. Evolutionary analyses enabled by the multigene entries support an ancient horizontal transfer of nitrogenase genes between Archaea and Bacteria and provide evidence that nifH has a different history of horizontal gene transfer from the nifDK enzyme core. Further analyses show that lineages in nitrogenase cluster I and cluster III have different rates of substitution within nifD, suggesting that nifD is under different selection pressure in these two lineages. Finally, we find that the genetic divergence of nifH and 16S rRNA genes does not correlate well at sequence dissimilarity values used commonly to define microbial species, as strains having <3% sequence dissimilarity in their 16S rRNA genes can have up to 23% dissimilarity in nifH. The nifH database has a number of uses including phylogenetic and evolutionary analyses, the design and assessment of primers/probes and the evaluation of nitrogenase sequence diversity. Database URL: http://www.css.cornell.edu/faculty/buckley/nifh.htm PMID:24501396
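The comparison of 16S rRNA and nifH divergence in the abstract above rests on percent sequence dissimilarity between aligned sequences. The sketch below shows one common way to compute it; the function name `percent_dissimilarity` is hypothetical, and skipping gapped columns is an assumption, since the abstract does not state how the authors treated gaps.

```python
def percent_dissimilarity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of mismatched positions between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped,
    so the denominator is the number of ungapped aligned positions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy aligned pair: 1 mismatch in 10 ungapped positions.
toy = percent_dissimilarity("ACGTACGTAC", "ACGTACGTAA")
```

Under this definition, two strains would fall below the 3% threshold mentioned in the abstract if fewer than 3 of every 100 compared 16S rRNA positions differ.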
Sequence analysis and expression of the M1 and M2 matrix protein genes of hirame rhabdovirus (HIRRV)

USGS Publications Warehouse

Nishizawa, T.; Kurath, G.; Winton, J.R.

1997-01-01

We have cloned and sequenced a 2318 nucleotide region of the genomic RNA of hirame rhabdovirus (HIRRV), an important viral pathogen of Japanese flounder Paralichthys olivaceus. This region comprises approximately two-thirds of the 3' end of the nucleocapsid protein (N) gene and the complete matrix protein (M1 and M2) genes with the associated intergenic regions. The partial N gene sequence was 812 nucleotides in length with an open reading frame (ORF) that encoded the carboxyl-terminal 250 amino acids of the N protein. The M1 and M2 genes were 771 and 700 nucleotides in length, respectively, with ORFs encoding proteins of 227 and 193 amino acids. The M1 gene sequence contained an additional small ORF that could encode a highly basic, arginine-rich protein of 25 amino acids. Comparisons of the N, M1, and M2 gene sequences of HIRRV with the corresponding sequences of the fish rhabdoviruses, infectious hematopoietic necrosis virus (IHNV) or viral hemorrhagic septicemia virus (VHSV) indicated that HIRRV was more closely related to IHNV than to VHSV, but was clearly distinct from either. The putative consensus gene termination sequence for IHNV and VHSV, AGAYAG(A)(7), was present in the N-M1, M1-M2, and M2-G intergenic regions of HIRRV as were the putative transcription initiation sequences YGGCAC and AACA. An Escherichia coli expression system was used to produce recombinant proteins from the M1 and M2 genes of HIRRV. These were the same size as the authentic M1 and M2 proteins and reacted with anti-HIRRV rabbit serum in western blots. These reagents can be used for further study of the fish immune response and to test novel control methods.
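The consensus signals named in the abstract above use IUPAC nucleotide codes, in which Y stands for C or T. A minimal sketch of how such motifs could be located in a sequence follows. It reads AGAYAG(A)(7) as AGAYAG followed by seven A residues, which is an interpretation of the notation, and it assumes the sequence is written in the DNA alphabet; the names `motif_regex` and `find_sites` are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'AGAYAG' into a regex pattern."""
    return "".join(IUPAC[ch] for ch in motif.upper())


def find_sites(pattern: str, seq: str) -> list[int]:
    """Return 0-based start positions of non-overlapping matches in seq."""
    return [m.start() for m in re.finditer(pattern, seq.upper())]


# Assumed reading of AGAYAG(A)(7): the hexamer followed by seven A residues.
TERMINATION = motif_regex("AGAYAG") + "A{7}"
# Putative transcription initiation signal from the abstract.
INITIATION = motif_regex("YGGCAC")

example = "CCAGATAGAAAAAAACCTGGCACAACA"  # made-up sequence for illustration
termination_sites = find_sites(TERMINATION, example)
initiation_sites = find_sites(INITIATION, example)
```

Applied to an intergenic region, the returned positions would mark candidate gene-end and gene-start signals to inspect by eye.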
Genetic diversity of Clostridium perfringens type A isolates from animals, food poisoning outbreaks and sludge

PubMed Central

Johansson, Anders; Aspan, Anna; Bagge, Elisabeth; Båverud, Viveca; Engström, Björn E; Johansson, Karl-Erik

2006-01-01

Background: Clostridium perfringens, a serious pathogen, causes enteric diseases in domestic animals and food poisoning in humans. The epidemiological relationship between C. perfringens isolates from the same source has previously been investigated chiefly by pulsed-field gel electrophoresis (PFGE). In this study the genetic diversity of C. perfringens isolated from various animals, from food poisoning outbreaks and from sludge was investigated. Results: We used PFGE to examine the genetic diversity of 95 C. perfringens type A isolates from eight different sources. The isolates were also examined for the presence of the beta2 toxin gene (cpb2) and the enterotoxin gene (cpe). The cpb2 gene from the 28 cpb2-positive isolates was also partially sequenced (519 bp, corresponding to positions 188 to 706 in the consensus cpb2 sequence). The results of PFGE revealed a wide genetic diversity among the C. perfringens type A isolates. The genetic relatedness of the isolates ranged from 58 to 100% and 56 distinct PFGE types were identified. Almost all clusters with similar patterns comprised isolates with a known epidemiological correlation. Most of the isolates from pig, horse and sheep carried the cpb2 gene. All isolates originating from food poisoning outbreaks carried the cpe gene and three of these also carried cpb2. Two evolutionarily different populations were identified by sequence analysis of the partially sequenced cpb2 genes from our study and cpb2 sequences previously deposited in GenBank. Conclusion: As revealed by PFGE, there was a wide genetic diversity among C. perfringens isolates from different sources. Epidemiologically related isolates showed a high genetic similarity, as expected, while isolates with no obvious epidemiological relationship showed a lesser degree of genetic similarity. The wide diversity revealed by PFGE was not reflected in the 16S rRNA sequences, which had a considerable degree of sequence similarity. Sequence comparison of the partially sequenced cpb2 gene revealed two genetically different populations. This is to our knowledge the first study in which the genetic diversity of C. perfringens isolates from different animal species, from food poisoning outbreaks and from sludge has been investigated. PMID:16737528
Computational Identification Of CDR3 Sequence Archetypes Among Immunoglobulin Sequences in Chronic Lymphocytic Leukemia

PubMed Central

Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

2009-01-01

The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL. PMID:18640719
Computational identification of CDR3 sequence archetypes among immunoglobulin sequences in chronic lymphocytic leukemia.

PubMed

Messmer, Bradley T; Raphael, Benjamin J; Aerni, Sarah J; Widhopf, George F; Rassenti, Laura Z; Gribben, John G; Kay, Neil E; Kipps, Thomas J

2009-03-01

The leukemia cells of unrelated patients with chronic lymphocytic leukemia (CLL) display a restricted repertoire of immunoglobulin (Ig) gene rearrangements with preferential usage of certain Ig gene segments. We developed a computational method to rigorously quantify biases in Ig sequence similarity in large patient databases and to identify groups of patients with unusual levels of sequence similarity. We applied our method to sequences from 1577 CLL patients through the CLL Research Consortium (CRC), and identified 67 similarity groups into which roughly 20% of all patients could be assigned. Immunoglobulin light chain class was highly correlated within all groups and light chain gene usage was similar within sets. Surprisingly, over 40% of the identified groups were composed of somatically mutated genes. This study significantly expands the evidence that antigen selection shapes the Ig repertoire in CLL.

Microbial evolution of sulphate reduction when lateral gene transfer is geographically restricted.

PubMed

Chi Fru, E

2011-07-01

Lateral gene transfer (LGT) is an important mechanism by which micro-organisms acquire new functions. This process has been suggested to be central to prokaryotic evolution in various environments. However, the influence of geographical constraints on the evolution of laterally acquired genes in microbial metabolic evolution is not yet well understood. In this study, the influence of geographical isolation on the evolution of laterally acquired dissimilatory sulphite reductase (dsr) gene sequences in the sulphate-reducing micro-organisms (SRM) was investigated. Sequences on four continental blocks related to SRM known to have received dsr by LGT were analysed using standard phylogenetic and multidimensional statistical methods. Sequences related to lineages with large genetic diversity correlated positively with habitat divergence. Those affiliated to Thermodesulfobacterium indicated strong biogeographical delineation; hydrothermal-vent sequences clustered independently from hot-spring sequences. Some of the hydrothermal-vent and hot-spring sequences suggested to have been acquired from a common ancestral source may have diverged upon isolation within distinct habitats. In contrast, analysis of some Desulfotomaculum sequences indicated they could have been transferred from different ancestral sources but converged upon isolation within the same niche. These results hint that, after lateral acquisition of dsr genes, barriers to gene flow probably play a strong role in their subsequent evolution.
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)

PubMed Central

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-01-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
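The assembly statistics quoted in the abstract above (N50 and genome coverage) can be illustrated with a short Python sketch. The abstract does not list the individual scaffold lengths, so the `toy_lengths` list is hypothetical and only shows how an N50 is typically computed. The coverage figure uses the two numbers given in the abstract and treats 622 Mb as 622,000,000 bp, which is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths, for illustration only (not from the study).
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# Coverage of the estimated genome, using the figures from the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000  # assumes 1 Mb = 1,000,000 bp
coverage_percent = 100 * assembled_bp / estimated_genome_bp

print(toy_n50)
print(round(coverage_percent, 1))
```

With the toy lengths, the running total passes half of the 300 total bases at the second-longest sequence, so the sketch reports 80. The coverage calculation agrees with the 91% stated in the abstract.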
Developing JSequitur to Study the Hierarchical Structure of Biological Sequences in a Grammatical Inference Framework of String Compression Algorithms.

PubMed

Galbadrakh, Bulgan; Lee, Kyung-Eun; Park, Hyun-Seok

2012-12-01

Grammatical inference methods are expected to find grammatical structures hidden in biological sequences. One hopes that studies of grammar serve as an appropriate tool for theory formation. Thus, we have developed JSequitur for automatically generating the grammatical structure of biological sequences in an inference framework of string compression algorithms. Our original motivation was to find any grammatical traits of several cancer genes that can be detected by string compression algorithms. Through this research, we could not find any meaningful unique traits of the cancer genes yet, but we could observe some interesting traits in regards to the relationship among gene length, similarity of sequences, the patterns of the generated grammar, and compression rate.
FunGene: the functional gene pipeline and repository.

PubMed

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie.

PubMed

Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol; Na, Ki-Jeong

2010-12-01

P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance.
Novel insertion mutation of ABCB1 gene in an ivermectin-sensitive Border Collie

PubMed Central

Han, Jae-Ik; Son, Hyoung-Won; Park, Seung-Cheol

2010-01-01

P-glycoprotein (P-gp) is encoded by the ABCB1 gene and acts as an efflux pump for xenobiotics. In the Border Collie, a nonsense mutation caused by a 4-base pair deletion in the ABCB1 gene is associated with a premature stop to P-gp synthesis. In this study, we examined the full-length coding sequence of the ABCB1 gene in an ivermectin-sensitive Border Collie that lacked the aforementioned deletion mutation. The sequence was compared to the corresponding sequences of a wild-type Beagle and seven ivermectin-tolerant family members of the Border Collie. When compared to the wild-type Beagle sequence, that of the ivermectin-sensitive Border Collie was found to have one insertion mutation and eight single nucleotide polymorphisms (SNPs) in the coding sequence of the ABCB1 gene. While the eight SNPs were also found in the family members' sequences, the insertion mutation was found only in the ivermectin-sensitive dog. These results suggest the possibility that the SNPs are species-specific features of the ABCB1 gene in Border Collies, and that the insertion mutation may be related to ivermectin intolerance. PMID:21113104
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

PubMed

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-02-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Genome sequence of an aflatoxigenic pathogen of Argentinian peanut, Aspergillus arachidicola

USDA-ARS's Scientific Manuscript database

In this study we sequenced the genome of the A. arachidicola Type strain (CBS 117610) and found its genome size to be 38.9 Mb, and its number of predicted genes to be 12,091, which are values comparable to those in other sequenced Aspergilli. Of its predicted genes, 691 were identified as unique to ...
DArT Markers Effectively Target Gene Space in the Rye Genome

PubMed Central

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline, we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology- and microcolinearity-based identification of candidate genes. PMID:27833625
DArT Markers Effectively Target Gene Space in the Rye Genome.

PubMed

Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

2016-01-01

Large genome size and complexity considerably hamper genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using the Blast2Go pipeline, we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore, we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. The data obtained demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information that will complement the results of studies in which DArT genotyping was deployed, by simplifying the gene ontology- and microcolinearity-based identification of candidate genes.
Genes expressed during the development and ripening of watermelon fruit.

PubMed

Levi, A; Davis, A; Hernandez, A; Wechter, P; Thimmapuram, J; Trebitsh, T; Tadmor, Y; Katzir, N; Portnoy, V; King, S

2006-11-01

A normalized cDNA library was constructed using watermelon flesh mRNA from three distinct developmental time-points and was subtracted by hybridization with leaf cDNA. Random cDNA clones of the watermelon flesh subtraction library were sequenced from the 5' end in order to identify potentially informative genes associated with fruit setting, development, and ripening. One-thousand and forty-six 5'-end sequences (expressed sequence tags; ESTs) were assembled into 832 non-redundant sequences, designated as "EST-unigenes". Of these 832 "EST-unigenes", 254 (approximately 30%) have no significant homology to sequences published so far for other plant species. Additionally, 168 "EST-unigenes" (approximately 20%) correspond to genes with unknown function, whereas 410 "EST-unigenes" (approximately 50%) correspond to genes with known function in other plant species. These "EST-unigenes" are mainly associated with metabolism, membrane transport, cytoskeleton synthesis and structure, cell wall formation and cell division, signal transduction, nucleic acid binding and transcription factors, defense and stress response, and secondary metabolism. This study provides the scientific community with novel genetic information for watermelon as well as an expanded pool of genes associated with fruit development in watermelon. These genes will be useful targets in future genetic and functional genomic studies of watermelon and its development.
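As a quick consistency check on the counts reported in the abstract above, the following Python sketch recomputes the share of each "EST-unigene" category. All counts come from the abstract. The category labels used as dictionary keys are shorthand introduced here, not the authors' terms.

```python
# Counts taken from the abstract; dictionary keys are shorthand labels.
total_unigenes = 832
categories = {
    "no significant homology": 254,
    "unknown function": 168,
    "known function": 410,
}

# The three categories should account for every unigene.
assert sum(categories.values()) == total_unigenes

# Percentage share of each category, rounded to one decimal place.
shares = {
    name: round(100 * count / total_unigenes, 1)
    for name, count in categories.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The three categories sum to 832, and the computed shares (30.5%, 20.2% and 49.3%) are consistent with the approximate 30%, 20% and 50% quoted in the abstract.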
Targeted next-generation sequencing in steroid-resistant nephrotic syndrome: mutations in multiple glomerular genes may influence disease severity.

PubMed

Bullich, Gemma; Trujillano, Daniel; Santín, Sheila; Ossowski, Stephan; Mendizábal, Santiago; Fraga, Gloria; Madrid, Álvaro; Ariceta, Gema; Ballarín, José; Torra, Roser; Estivill, Xavier; Ars, Elisabet

2015-09-01

Genetic diagnosis of steroid-resistant nephrotic syndrome (SRNS) using Sanger sequencing is complicated by the high genetic heterogeneity and phenotypic variability of this disease. We aimed to improve the genetic diagnosis of SRNS by simultaneously sequencing 26 glomerular genes using massive parallel sequencing and to study whether mutations in multiple genes increase disease severity. High-throughput mutation analysis was performed in 50 SRNS and/or focal segmental glomerulosclerosis (FSGS) patients, a validation cohort of 25 patients with known pathogenic mutations, and a discovery cohort of 25 uncharacterized patients with probable genetic etiology. In the validation cohort, we identified the 42 previously known pathogenic mutations across NPHS1, NPHS2, WT1, TRPC6, and INF2 genes. In the discovery cohort, disease-causing mutations in SRNS/FSGS genes were found in nine patients. We detected three patients with mutations in an SRNS/FSGS gene and COL4A3. Two of them were familial cases and presented a more severe phenotype than family members with mutation in only one gene. In conclusion, our results show that massive parallel sequencing is feasible and robust for genetic diagnosis of SRNS/FSGS. Our results indicate that patients carrying mutations in an SRNS/FSGS gene and also in COL4A3 gene have increased disease severity.
The transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) male reproductive organs.

PubMed

Azevedo, Renata V D M; Dias, Denise B S; Bretãs, Jorge A C; Mazzoni, Camila J; Souza, Nataly A; Albano, Rodolpho M; Wagner, Glauber; Davila, Alberto M R; Peixoto, Alexandre A

2012-01-01

It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. We generated 2678 high quality ESTs ("Expressed Sequence Tags") of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with a L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). The transcriptome analysis of L. longipalpis male reproductive organs is one step further in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGPs sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies.
The Transcriptome of Lutzomyia longipalpis (Diptera: Psychodidae) Male Reproductive Organs

PubMed Central

Bretãs, Jorge A. C.; Mazzoni, Camila J.; Souza, Nataly A.; Albano, Rodolpho M.; Wagner, Glauber; Davila, Alberto M. R.; Peixoto, Alexandre A.

2012-01-01

Background: It has been suggested that genes involved in the reproductive biology of insect disease vectors are potential targets for future alternative methods of control. Little is known about the molecular biology of reproduction in phlebotomine sand flies and there is no information available concerning genes that are expressed in male reproductive organs of Lutzomyia longipalpis, the main vector of American visceral leishmaniasis and a species complex. Methods/Principal Findings: We generated 2678 high quality ESTs (“Expressed Sequence Tags”) of L. longipalpis male reproductive organs that were grouped in 1391 non-redundant sequences (1136 singlets and 255 clusters). BLAST analysis revealed that only 57% of these sequences share similarity with a L. longipalpis female EST database. Although no more than 36% of the non-redundant sequences showed similarity to protein sequences deposited in databases, more than half of them presented the best-match hits with mosquito genes. Gene ontology analysis identified subsets of genes involved in biological processes such as protein biosynthesis and DNA replication, which are probably associated with spermatogenesis. A number of non-redundant sequences were also identified as putative male reproductive gland proteins (mRGPs), also known as male accessory gland protein genes (Acps). Conclusions: The transcriptome analysis of L. longipalpis male reproductive organs is one step further in the study of the molecular basis of the reproductive biology of this important species complex. It has allowed the identification of genes potentially involved in spermatogenesis as well as putative mRGPs sequences, which have been studied in many insect species because of their effects on female post-mating behavior and physiology and their potential role in sexual selection and speciation. These data open a number of new avenues for further research in the molecular and evolutionary reproductive biology of sand flies. PMID:22496818
Deep sequencing and genome-wide analysis reveals the expansion of MicroRNA genes in the gall midge Mayetiola destructor

PubMed Central

2013-01-01

Background: MicroRNAs (miRNAs) are small non-coding RNAs that play critical roles in regulating post-transcriptional gene expression. Gall midges encompass a large group of insects that are of economic importance and also possess fascinating biological traits. The gall midge Mayetiola destructor, commonly known as the Hessian fly, is a destructive pest of wheat and model organism for studying gall midge biology and insect – host plant interactions. Results: In this study, we systematically analyzed miRNAs from the Hessian fly. Deep-sequencing a Hessian fly larval transcriptome led to the identification of 89 miRNA species that are either identical or very similar to known miRNAs from other insects, and 184 novel miRNAs that have not been reported from other species. A genome-wide search through a draft Hessian fly genome sequence identified a total of 611 putative miRNA-encoding genes based on sequence similarity and the existence of a stem-loop structure for miRNA precursors. Analysis of the 611 putative genes revealed a striking feature: the dramatic expansion of several miRNA gene families. The largest family contained 91 genes that encoded 20 different miRNAs. Microarray analyses revealed that the expression of miRNA genes was strictly regulated during Hessian fly larval development and that the abundance of many miRNA genes was affected by host genotypes. Conclusion: The identification of a large number of miRNAs for the first time from a gall midge provides a foundation for further studies of miRNA functions in gall midge biology and behavior. The dramatic expansion of identical or similar miRNAs provides a unique system to study functional relations among miRNA iso-genes as well as changes in sequence specificity due to small changes in miRNAs and in their mRNA targets. These results may also facilitate the identification of miRNA genes for potential pest control through transgenic approaches. PMID:23496979
Phylogenetic Relationship of Necoclí Virus to Other South American Hantaviruses (Bunyaviridae: Hantavirus).

PubMed

Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F

2015-07-01

The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
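The pairwise comparisons in the abstract above report percent nonidentity between aligned amino acid sequences. A minimal Python sketch of that calculation follows. It assumes the two sequences are already aligned to equal length and that gapped positions are skipped; the abstract does not say how gaps were handled, so that rule is an assumption. The two peptide strings are hypothetical and are not Necoclí virus data.

```python
def percent_nonidentity(seq_a, seq_b, gap="-"):
    """Percent of compared alignment columns at which two sequences differ.

    Assumes seq_a and seq_b are pre-aligned (equal length). Columns with
    a gap in either sequence are skipped (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for residue_a, residue_b in zip(seq_a, seq_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a != residue_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100 * mismatches / compared


# Hypothetical aligned peptides, for illustration only.
example = percent_nonidentity("MSTLQELQ", "MSNLQEIQ")
print(example)
```

The two toy peptides differ at 2 of 8 positions, so the sketch reports 25.0. Dedicated alignment and phylogenetics software may define identity differently, for example in how it treats gaps and ambiguous residues.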
Molecular epidemiology and evolution of avian infectious bronchitis virus in Spain over a fourteen-year period.

PubMed

Dolz, Roser; Pujols, Joan; Ordóñez, German; Porta, Ramon; Majó, Natàlia

2008-04-25

An in-depth molecular study of infectious bronchitis viruses (IBV), with particular interest in the evolutionary aspects of IBV in Spain, was carried out based on the S1 gene molecular characterization of twenty-six Spanish strains isolated over a fourteen-year period. Four genotypes were identified based on S1 gene sequence analyses and phylogenetic studies. A drastic virus population shift was demonstrated over time and the novel Italy 02 serotype was shown to have displaced the previous predominant serotype 4/91 in the field. Detailed analyses of the synonymous to non-synonymous ratio of the S1 gene sequences of this new serotype Italy 02 suggested positive selection pressures might have contributed to the successful establishment of the Italy 02 serotype in our country. In addition, differences in the fitness abilities of new emergent genotypes were indicated. Furthermore, intergenic sequences (IGs)-like motifs within S1 gene sequences of IBV isolates were suggested to enhance the recombination abilities of certain serotypes.
Isolation and characterization of NBS–LRR resistance gene analogues from mango

PubMed Central

Lei, Xintao; Yao, Quansheng; Xu, Xuerong; Liu, Yang

2014-01-01

The nucleotide-binding site (NBS)–leucine-rich repeat (LRR) gene family is a class of R genes in plants. NBS genes play a very important role in disease defence. To further study the variation and homology of mango NBS–LRR genes, 16 resistance gene analogues (RGAs) (GenBank accession number HM446507-22) were isolated from the polymerase chain reaction fragments and sequenced by using two degenerate primer sets. The total nucleotide diversity index Pi was 0.362, and 236 variation sites were found among 16 RGAs. The degree of homology between the RGAs varied from 44.4% to 98.5%. Sixteen RGAs could be translated into amino acid sequences. The high level of homology in the protein sequences of the P-loop and kinase-2 of the NBS domain between the RGAs isolated in this study and previously characterized R genes indicated that these cloned sequences belonged to the NBS–LRR gene family. Moreover, these 16 RGAs could be classified into the non-TIR–NBS–LRR gene family because only tryptophan (W) could be claimed as the final residue of the kinase-2 domain of all RGAs isolated here. From our results, we concluded that our mango NBS–LRR genes possessed a high level of variation from the mango genome, which may allow mango to recognize many different pathogenic virulence factors. PMID:26740762
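The nucleotide diversity index (Pi) and the count of variable sites mentioned in the abstract above can be sketched in Python as follows. The sketch uses the common definition of Pi as the average number of pairwise nucleotide differences per site. It assumes gap-free aligned sequences of equal length, and the three short sequences are hypothetical. The abstract does not state which software or gap treatment the authors used, so this is not a reproduction of their calculation.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average number of pairwise differences per site (Pi).

    Assumes all sequences are aligned, equal in length and gap-free.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    site_count = len(aligned_seqs[0])
    if any(len(seq) != site_count for seq in aligned_seqs):
        raise ValueError("sequences must have equal length")
    pairs = list(combinations(aligned_seqs, 2))
    total_differences = sum(
        sum(base_a != base_b for base_a, base_b in zip(seq_a, seq_b))
        for seq_a, seq_b in pairs
    )
    return total_differences / (len(pairs) * site_count)


def variable_sites(aligned_seqs):
    """Count alignment columns containing more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*aligned_seqs))


# Hypothetical toy alignment, for illustration only.
toy_alignment = ["ACGT", "ACGA", "TCGA"]
toy_pi = nucleotide_diversity(toy_alignment)
toy_variable = variable_sites(toy_alignment)
print(round(toy_pi, 3), toy_variable)
```

For the toy alignment, the three pairs differ at 1, 2 and 1 positions over 4 sites, giving Pi = 4 / (3 × 4) ≈ 0.333, with 2 variable sites.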
Exome Sequencing and Linkage Analysis Identified Novel Candidate Genes in Recessive Intellectual Disability Associated with Ataxia.

PubMed

Jazayeri, Roshanak; Hu, Hao; Fattahi, Zohreh; Musante, Luciana; Abedini, Seyedeh Sedigheh; Hosseini, Masoumeh; Wienker, Thomas F; Ropers, Hans Hilger; Najmabadi, Hossein; Kahrizi, Kimia

2015-10-01

Intellectual disability (ID) is a neuro-developmental disorder which causes considerable socio-economic problems. Some ID individuals are also affected by ataxia, and the condition includes different mutations affecting several genes. We used whole exome sequencing (WES) in combination with homozygosity mapping (HM) to identify the genetic defects in five consanguineous families among our cohort study, with two affected children with ID and ataxia as major clinical symptoms. We identified three novel candidate genes, RIPPLY1, MRPL10, SNX14, and a new mutation in known gene SURF1. All are autosomal genes, except RIPPLY1, which is located on the X chromosome. Two are housekeeping genes, implicated in transcription and translation regulation and intracellular trafficking, and two encode mitochondrial proteins. The pathogenesis of these variants was evaluated by mutation classification, bioinformatic methods, review of medical and biological relevance, co-segregation studies in the particular family, and a normal population study. Linkage analysis and exome sequencing of a small number of affected family members is a powerful new technique which can be used to decrease the number of candidate genes in heterogenic disorders such as ID, and may even identify the responsible gene(s).
Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.

PubMed

Batty, Elizabeth M; Chaemchuen, Suwittra; Blacksell, Stuart; Richards, Allen L; Paris, Daniel; Bowden, Rory; Chan, Caroline; Lachumanan, Ramkumar; Day, Nicholas; Donnelly, Peter; Chen, Swaine; Salje, Jeanne

2018-06-01

Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.

Identification of Cis-Acting Promoter Elements in Cold- and Dehydration-Induced Transcriptional Pathways in Arabidopsis, Rice, and Soybean

PubMed Central

Maruyama, Kyonoshin; Todaka, Daisuke; Mizoi, Junya; Yoshida, Takuya; Kidokoro, Satoshi; Matsukura, Satoko; Takasaki, Hironori; Sakurai, Tetsuya; Yamamoto, Yoshiharu Y.; Yoshiwara, Kyouko; Kojima, Mikiko; Sakakibara, Hitoshi; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko

2012-01-01

The genomes of three plants, Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and soybean (Glycine max), have been sequenced, and their many genes and promoters have been predicted. In Arabidopsis, cis-acting promoter elements involved in cold- and dehydration-responsive gene expression have been extensively analysed; however, the characteristics of such cis-acting promoter sequences in cold- and dehydration-inducible genes of rice and soybean remain to be clarified. In this study, we performed microarray analyses using the three species, and compared characteristics of identified cold- and dehydration-inducible genes. Transcription profiles of the cold- and dehydration-responsive genes were similar among these three species, showing representative upregulated (dehydrin/LEA) and downregulated (photosynthesis-related) genes. All (4^6 = 4096) hexamer sequences in the promoters of the three species were investigated, revealing the frequency of conserved sequences in cold- and dehydration-inducible promoters. A core sequence of the abscisic acid-responsive element (ABRE) was the most conserved in dehydration-inducible promoters of all three species, suggesting that transcriptional regulation for dehydration-inducible genes is similar among these three species, with the ABRE-dependent transcriptional pathway. In contrast, for cold-inducible promoters, the conserved hexamer sequences were diversified among these three species, suggesting the existence of diverse transcriptional regulatory pathways for cold-inducible genes among the species. PMID:22184637
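The hexamer survey described in the abstract above can be sketched as follows. This is only an illustration of enumerating all 4^6 hexamers and counting how many promoters contain each one; the function names and the toy promoter sequences are hypothetical, and the study's actual statistics are not reproduced here.

```python
from collections import Counter
from itertools import product


def all_hexamers():
    """Every possible DNA hexamer over the alphabet A, C, G, T."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def promoters_containing(promoters, k=6):
    """Count, for each k-mer, the number of promoters that contain it.

    A k-mer is counted at most once per promoter, however many times
    it occurs within that promoter.
    """
    counts = Counter()
    for seq in promoters:
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


# Toy promoter fragments (hypothetical, for illustration only).
toy_promoters = ["TTACGTGGCA", "CCACGTGGTT", "AAAAAAAAAA"]
hexamers = all_hexamers()
counts = promoters_containing(toy_promoters)
```

Comparing such counts between inducible promoters and a background promoter set is one way a conserved hexamer could be flagged; the choice of background and test is left open here.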
Molecular evolution of the CYP2D subfamily in primates: purifying selection on substrate recognition sites without the frequent or long-tract gene conversion.

PubMed

Yasukochi, Yoshiki; Satta, Yoko

2015-03-25

The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

PubMed Central

Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

2003-01-01

Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to the dbEST division of GenBank.] PMID:12618375
Molecular Evolution of the CYP2D Subfamily in Primates: Purifying Selection on Substrate Recognition Sites without the Frequent or Long-Tract Gene Conversion

PubMed Central

Yasukochi, Yoshiki; Satta, Yoko

2015-01-01

The human cytochrome P450 (CYP) 2D6 gene is a member of the CYP2D gene subfamily, along with the CYP2D7P and CYP2D8P pseudogenes. Although the CYP2D6 enzyme has been studied extensively because of its clinical importance, the evolution of the CYP2D subfamily has not yet been fully understood. Therefore, the goal of this study was to reveal the evolutionary process of the human drug metabolic system. Here, we investigate molecular evolution of the CYP2D subfamily in primates by comparing 14 CYP2D sequences from humans to New World monkey genomes. Window analysis and statistical tests revealed that entire genomic sequences of paralogous genes were extensively homogenized by gene conversion during molecular evolution of CYP2D genes in primates. A neighbor-joining tree based on genomic sequences at the nonsubstrate recognition sites showed that CYP2D6 and CYP2D8 genes were clustered together due to gene conversion. In contrast, a phylogenetic tree using amino acid sequences at substrate recognition sites did not cluster the CYP2D6 and CYP2D8 genes, suggesting that the functional constraint on substrate specificity is one of the causes for purifying selection at the substrate recognition sites. Our results suggest that the CYP2D gene subfamily in primates has evolved to maintain the regioselectivity for a substrate hydroxylation activity between individual enzymes, even though extensive gene conversion has occurred across CYP2D coding sequences. PMID:25808902
Genetic variability among Schistosoma japonicum isolates from the Philippines, Japan and China revealed by sequence analysis of three mitochondrial genes.

PubMed

Chen, Fen; Li, Juan; Sugiyama, Hiromu; Zhou, Dong-Hui; Song, Hui-Qun; Zhao, Guang-Hui; Zhu, Xing-Quan

2015-02-01

The present study examined sequence variability in the mitochondrial (mt) protein-coding genes cytochrome b (cytb) and NADH dehydrogenase subunits 2 and 6 (nad2 and nad6) among 24 isolates of Schistosoma japonicum from different endemic regions in the Philippines, Japan and China. The complete cytb, nad2 and nad6 genes were amplified and sequenced separately from individual schistosomes. Sequence variations for isolates from the Philippines were 0-0.5% for cytb, 0-0.6% for nad2, and 0-0.9% for nad6. Variation was 0-0.5%, 0.1-0.8% and 0-0.7% for the corresponding genes in schistosome samples from mainland China. For worms from Japan, genetic variations were 0-0.2%, 0.1-0.2% and 0 for the three genes, respectively. Sequence variations were 0-1.0%, 0-1.8% and 0-1.1% for cytb, nad2 and nad6, respectively, among schistosome isolates from different geographical strains in the Philippines, Japan and China. Across the three mtDNA genes, the lowest sequence variation was found between isolates from mainland China and the Philippines, and the highest between isolates from Japan and the Philippines. Phylogenetic analyses based on the combined sequences of cytb, nad2 and nad6 revealed that all isolates from the Philippines clustered together as a sister group to samples from Yunnan and Zhejiang provinces in China, while isolates from Yamanashi in Japan formed a solitary clade. These results demonstrated the usefulness of the combined three mtDNA sequences for studying genetic diversity and population structure among S. japonicum isolates from the Philippines, China and Japan.
The American cranberry: first insights into the whole genome of a species adapted to bog habitat.

PubMed

Polashock, James; Zelzion, Ehud; Fajardo, Diego; Zalapa, Juan; Georgi, Laura; Bhattacharya, Debashish; Vorsa, Nicholi

2014-06-13

The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364 and represents 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance.
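The assembly summary above reports a scaffold N50. As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the assembly size. The function name and the toy lengths are hypothetical; the cranberry scaffold lengths themselves are not available here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy scaffold lengths in bp (hypothetical data).
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

In the toy case the total is 20 bp, and the two longest scaffolds (6 + 5 = 11 bp) already cover half of it, so the N50 is 5 bp.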
The American cranberry: first insights into the whole genome of a species adapted to bog habitat

PubMed Central

2014-01-01

Background: The American cranberry (Vaccinium macrocarpon Ait.) is one of only three widely cultivated fruit crops native to North America; the other two are blueberry (Vaccinium spp.) and native grape (Vitis spp.). In terms of taxonomy, cranberries are in the core Ericales, an order for which genome sequence data are currently lacking. In addition, cranberries produce a host of important polyphenolic secondary compounds, some of which are beneficial to human health. Whereas next-generation sequencing technology is allowing the advancement of whole-genome sequencing, one major obstacle to the successful assembly from short-read sequence data of complex diploid (and higher ploidy) organisms is heterozygosity. Cranberry has the advantage of being diploid (2n = 2x = 24) and self-fertile. To minimize the issue of heterozygosity, we sequenced the genome of a fifth-generation inbred genotype (F ≥ 0.97) derived from five generations of selfing originating from the cultivar Ben Lear. Results: The genome size of V. macrocarpon has been estimated to be about 470 Mb. Genomic sequences were assembled into 229,745 scaffolds representing 420 Mbp (N50 = 4,237 bp) with 20X average coverage. The number of predicted genes was 36,364 and represents 17.7% of the assembled genome. Of the predicted genes, 30,090 were assigned to candidate genes based on homology. Genes supported by transcriptome data totaled 13,170 (36%). Conclusions: Shotgun sequencing of the cranberry genome, with an average sequencing coverage of 20X, allowed efficient assembly and gene calling. The candidate genes identified represent a useful collection to further study important biochemical pathways and cellular processes and to use for marker development for breeding and the study of horticultural characteristics, such as disease resistance. PMID:24927653
Microinjection of CRISPR/Cas9 Protein into Channel Catfish, Ictalurus punctatus, Embryos for Gene Editing.

PubMed

Elaswad, Ahmed; Khalil, Karim; Cline, David; Page-McCaw, Patrick; Chen, Wenbiao; Michel, Maximilian; Cone, Roger; Dunham, Rex

2018-01-20

The complete genome of the channel catfish, Ictalurus punctatus, has been sequenced, leading to greater opportunities for studying channel catfish gene function. Gene knockout has been used to study these gene functions in vivo. The clustered regularly interspaced short palindromic repeats/CRISPR associated protein 9 (CRISPR/Cas9) system is a powerful tool used to edit genomic DNA sequences to alter gene function. While the traditional approach has been to introduce CRISPR/Cas9 mRNA into the single cell embryos through microinjection, this can be a slow and inefficient process in catfish. Here, a detailed protocol for microinjection of channel catfish embryos with CRISPR/Cas9 protein is described. Briefly, eggs and sperm were collected and then artificial fertilization performed. Fertilized eggs were transferred to a Petri dish containing Holtfreter's solution. Injection volume was calibrated and then guide RNAs/Cas9 targeting the toll/interleukin 1 receptor domain-containing adapter molecule (TICAM 1) gene and rhamnose binding lectin (RBL) gene were microinjected into the yolk of one-cell embryos. The gene knockout was successful as indels were confirmed by DNA sequencing. The predicted protein sequence alterations due to these mutations included frameshift and truncated protein due to premature stop codons.
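The last sentence of the abstract above mentions frameshifts and premature stop codons caused by indels. A minimal sketch of how such consequences can be predicted from a coding sequence is given below. It uses the standard genetic code; the sequences are invented for illustration and are not the TICAM 1 or RBL sequences, and the helper names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def is_frameshift(reference, edited):
    """An indel shifts the reading frame when its net length is not a multiple of 3."""
    return (len(edited) - len(reference)) % 3 != 0


# Invented wild-type sequence and a 1-bp deletion allele (illustration only).
wild_type = "ATGGCATTGACCAAAGGTTAA"
edited = wild_type[:3] + wild_type[4:]  # delete the fourth base

wt_protein = translate(wild_type)
edited_protein = translate(edited)
shifted = is_frameshift(wild_type, edited)
```

Here the 1-bp deletion shifts the frame and brings a TGA codon into frame, so the predicted protein is truncated after two residues.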
Both positive and negative regulatory elements mediate expression of a photoregulated CAB gene from Nicotiana plumbaginifolia.

PubMed Central

Castresana, C; Garcia-Luque, I; Alonso, E; Malik, V S; Cashmore, A R

1988-01-01

We have analyzed promoter regulatory elements from a photoregulated CAB gene (Cab-E) isolated from Nicotiana plumbaginifolia. These studies have been performed by introducing chimeric gene constructs into tobacco cells via Agrobacterium tumefaciens-mediated transformation. Expression studies on the regenerated transgenic plants have allowed us to characterize three positive and one negative cis-acting elements that influence photoregulated expression of the Cab-E gene. Within the upstream sequences we have identified two positive regulatory elements (PRE1 and PRE2) which confer maximum levels of photoregulated expression. These sequences contain multiple repeated elements related to the sequence -ACCGGCCCACTT-. We have also identified within the upstream region a negative regulatory element (NRE) extremely rich in AT sequences, which reduces the level of gene expression in the light. We have defined a light regulatory element (LRE) within the promoter region extending from -396 to -186 bp which confers photoregulated expression when fused to a constitutive nopaline synthase ('nos') promoter. Within this region there is a 132-bp element, extending from -368 to -234 bp, which on deletion from the Cab-E promoter reduces gene expression from high levels to undetectable levels. Finally, we have demonstrated for a full length Cab-E promoter conferring high levels of photoregulated expression, that sequences proximal to the Cab-E TATA box are not replaceable by corresponding sequences from a 'nos' promoter. This contrasts with the apparent equivalence of these Cab-E and 'nos' TATA box-proximal sequences in truncated promoters conferring low levels of photoregulated expression. PMID:2901343
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

PubMed Central

2009-01-01

Background: Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results: We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion: We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

PubMed

Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

2009-08-06

Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.
Development of a PCR-based marker utilizing a deletion mutation in the dihydroflavonol 4-reductase (DFR) gene responsible for the lack of anthocyanin production in yellow onions (Allium cepa).

PubMed

Kim, Sunggil; Yoo, Kil Sun; Pike, Leonard M

2005-02-01

Bulb color in onions (Allium cepa) is an important trait, but the mechanism of color inheritance is poorly understood at the molecular level. A previous study showed that inactivation of the dihydroflavonol 4-reductase (DFR) gene at the transcriptional level resulted in a lack of anthocyanin production in yellow onions. The objectives of the present study were the identification of the critical mutations in the DFR gene (DFR-A) and the development of a PCR-based marker for allelic selection. We report the isolation of two additional DFR homologs (DFR-B and DFR-C). No unique sequences were identified in either DFR homolog, even in the untranslated region (UTR). Both genes shared more than 95% nucleotide sequence identity with the DFR-A gene. To obtain a unique sequence from each gene, we isolated the promoter regions. Sequences of the DFR-A and DFR-B promoters differed completely from one another, except for an approximately 100-bp sequence adjacent to the 5'UTR. It was possible to specifically amplify only the DFR-A gene using primers designed to anneal to the unique promoter region. The sequences of yellow and red DFR-A alleles were the same except for a single base-pair change in the promoter and an approximately 800-bp deletion within the 3' region of the yellow DFR-A allele. This deletion was used to develop a co-dominant PCR-based marker that segregated perfectly with color phenotypes in the F2 population. These results indicate that a deletion mutation in the yellow DFR-A gene results in the lack of anthocyanin production in yellow onions.
Targeted amplicon sequencing (TAS): a scalable next-gen approach to multilocus, multitaxa phylogenetics.

PubMed

Bybee, Seth M; Bracken-Grissom, Heather; Haynes, Benjamin D; Hermansen, Russell A; Byers, Robert L; Clement, Mark J; Udall, Joshua A; Wilcox, Edward R; Crandall, Keith A

2011-01-01

Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads and perform quality control checks and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach.
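The abstract above describes sorting raw reads by 10-bp multiplexing identifiers and writing them out as FASTA grouped by sample. The sketch below illustrates that general idea only. It is not the BarcodeCrucher implementation: it assumes the identifier sits at the very start of each read, requires an exact match, and skips quality control and per-gene sorting. All names and sequences are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=10):
    """Group reads by sample using an exact-match leading barcode.

    `reads` is an iterable of (read_id, sequence) pairs. Reads whose
    leading `barcode_len` bases match no known barcode are discarded.
    The barcode is trimmed from each retained read.
    """
    by_sample = defaultdict(list)
    for read_id, seq in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            continue  # unknown or unreadable barcode
        by_sample[sample].append((read_id, seq[barcode_len:]))
    return dict(by_sample)


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text."""
    return "".join(f">{read_id}\n{seq}\n" for read_id, seq in records)


# Toy barcodes and reads (hypothetical data).
barcodes = {"ACGTACGTAC": "taxonA", "TTGGCCAATT": "taxonB"}
reads = [
    ("r1", "ACGTACGTACGGGTTT"),
    ("r2", "TTGGCCAATTCCCAAA"),
    ("r3", "NNNNNNNNNNAAAA"),
]
grouped = demultiplex(reads, barcodes)
fasta_a = to_fasta(grouped["taxonA"])
```

A production tool would typically also tolerate a small number of barcode mismatches and filter on base quality; those steps are omitted to keep the sketch short.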
Targeted Amplicon Sequencing (TAS): A Scalable Next-Gen Approach to Multilocus, Multitaxa Phylogenetics

PubMed Central

Bybee, Seth M.; Bracken-Grissom, Heather; Haynes, Benjamin D.; Hermansen, Russell A.; Byers, Robert L.; Clement, Mark J.; Udall, Joshua A.; Wilcox, Edward R.; Crandall, Keith A.

2011-01-01

Next-gen sequencing technologies have revolutionized data collection in genetic studies and advanced genome biology to novel frontiers. However, to date, next-gen technologies have been used principally for whole genome sequencing and transcriptome sequencing. Yet many questions in population genetics and systematics rely on sequencing specific genes of known function or diversity levels. Here, we describe a targeted amplicon sequencing (TAS) approach capitalizing on next-gen capacity to sequence large numbers of targeted gene regions from a large number of samples. Our TAS approach is easily scalable, simple in execution, neither time- nor labor-intensive, relatively inexpensive, and can be applied to a broad diversity of organisms and/or genes. Our TAS approach includes a bioinformatic application, BarcodeCrucher, to take raw next-gen sequence reads and perform quality control checks and convert the data into FASTA format organized by gene and sample, ready for phylogenetic analyses. We demonstrate our approach by sequencing targeted genes of known phylogenetic utility to estimate a phylogeny for the Pancrustacea. We generated data from 44 taxa using 68 different 10-bp multiplexing identifiers. The overall quality of data produced was robust and was informative for phylogeny estimation. The potential for this method to produce copious amounts of data from a single 454 plate (e.g., 325 taxa for 24 loci) significantly reduces sequencing expenses incurred from traditional Sanger sequencing. We further discuss the advantages and disadvantages of this method, while offering suggestions to enhance the approach. PMID:22002916
A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes

PubMed Central

Herbold, Craig W.; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander

2015-01-01

High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it requires fewer primers overall and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305
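The claim that interchangeable barcoding primers need fewer oligonucleotides than fusion primers can be illustrated with a simple count. The sketch below rests on a counting model that is an assumption, not taken from the record: a fusion design synthesizes a forward and a reverse primer for every gene-barcode pair, while the general design needs one gene-specific primer pair per gene plus one barcoding primer per barcode. The function names are hypothetical.

```python
def fusion_primer_count(n_genes: int, n_barcodes: int) -> int:
    """Assumed model: forward and reverse fusion primers for every gene x barcode pair."""
    return 2 * n_genes * n_barcodes

def general_primer_count(n_genes: int, n_barcodes: int) -> int:
    """Assumed model: one primer pair per gene plus one barcoding primer per barcode."""
    return 2 * n_genes + n_barcodes

# Illustrative numbers only: 10 target genes, 96 barcodes.
print(fusion_primer_count(10, 96))   # 1920
print(general_primer_count(10, 96))  # 116
```

Under this model the fusion count grows with the product of genes and barcodes, whereas the general count grows with their sum, which is one way to read the economy argument.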
Signature of genetic associations in oral cancer.

PubMed

Sharma, Vishwas; Nandan, Amrita; Sharma, Amitesh Kumar; Singh, Harpreet; Bharadwaj, Mausumi; Sinha, Dhirendra Narain; Mehrotra, Ravi

2017-10-01

Oral cancer etiology is complex and controlled by multi-factorial events including genetic events. Candidate gene studies, genome-wide association studies, and next-generation sequencing identified various chromosomal loci to be associated with oral cancer. There is no available review that could give us the comprehensive picture of genetic loci identified to be associated with oral cancer by candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based approaches. A systematic literature search was performed in the PubMed database to identify the loci associated with oral cancer by exclusive candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based study approaches. The information on loci associated with oral cancer is made available online through the resource "ORNATE." Next, screening of the loci validated by candidate gene studies and next-generation sequencing approach or by two independent studies within candidate gene studies or next-generation sequencing approaches was performed. A total of 264 loci were identified to be associated with oral cancer by candidate gene studies, genome-wide association studies, and next-generation sequencing approaches. In total, 28 loci, that is, 14q32.33 (AKT1), 5q22.2 (APC), 11q22.3 (ATM), 2q33.1 (CASP8), 11q13.3 (CCND1), 16q22.1 (CDH1), 9p21.3 (CDKN2A), 1q31.1 (COX-2), 7p11.2 (EGFR), 22q13.2 (EP300), 4q35.2 (FAT1), 4q31.3 (FBXW7), 4p16.3 (FGFR3), 1p13.3 (GSTM1-GSTT1), 11q13.2 (GSTP1), 11p15.5 (H-RAS), 3p25.3 (hOGG1), 1q32.1 (IL-10), 4q13.3 (IL-8), 12p12.1 (KRAS), 12q15 (MDM2), 12q13.12 (MLL2), 9q34.3 (NOTCH1), 17p13.1 (p53), 3q26.32 (PIK3CA), 10q23.31 (PTEN), 13q14.2 (RB1), and 5q14.2 (XRCC4), were validated to be associated with oral cancer. "ORNATE" gives a snapshot of genetic loci associated with oral cancer. All 28 loci were validated to be linked to oral cancer; further fine-mapping followed by gene-by-gene and gene-environment interaction studies is needed to confirm their involvement in modifying oral cancer.
FARME DB: a functional antibiotic resistance element database

PubMed Central

Wallace, James C.; Port, Jesse A.; Smith, Marissa N.; Faustman, Elaine M.

2017-01-01

Antibiotic resistance (AR) is a major global public health threat but few resources exist that catalog AR genes outside of a clinical context. Current AR sequence databases are assembled almost exclusively from genomic sequences derived from clinical bacterial isolates and thus do not include many microbial sequences derived from environmental samples that confer resistance in functional metagenomic studies. These environmental metagenomic sequences often show little or no similarity to AR sequences from clinical isolates using standard classification criteria. In addition, existing AR databases provide no information about flanking sequences containing regulatory or mobile genetic elements. To help address this issue, we created an annotated database of DNA and protein sequences derived exclusively from environmental metagenomic sequences showing AR in laboratory experiments. Our Functional Antibiotic Resistant Metagenomic Element (FARME) database is a compilation of publicly available DNA sequences and predicted protein sequences conferring AR as well as regulatory elements, mobile genetic elements and predicted proteins flanking antibiotic resistant genes. FARME is the first database to focus on functional metagenomic AR gene elements and provides a resource to better understand AR in the 99% of bacteria which cannot be cultured and the relationship between environmental AR sequences and antibiotic resistant genes derived from cultured isolates. Database URL: http://staff.washington.edu/jwallace/farme PMID:28077567
Analysis of xylem formation in pine by cDNA sequencing

NASA Technical Reports Server (NTRS)

Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.;

1998-01-01

Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

PubMed

Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

PubMed Central

Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563

Co-regulation of the atrial natriuretic factor and cardiac myosin light chain-2 genes during alpha-adrenergic stimulation of neonatal rat ventricular cells. Identification of cis sequences within an embryonic and a constitutive contractile protein gene which mediate inducible expression.

PubMed

Knowlton, K U; Baracchini, E; Ross, R S; Harris, A N; Henderson, S A; Evans, S M; Glembotski, C C; Chien, K R

1991-04-25

To study the mechanisms which mediate the transcriptional activation of cardiac genes during alpha adrenergic stimulation, the present study examined the regulated expression of three cardiac genes, a ventricular embryonic gene (atrial natriuretic factor, ANF), a constitutively expressed contractile protein gene (cardiac MLC-2), and a cardiac sodium channel gene. alpha 1-Adrenergic stimulation activates the expression and release of ANF from neonatal ventricular cells. As assessed by RNase protection analyses, treatment with alpha-adrenergic agonists increases the steady-state levels of ANF mRNA by greater than 15-fold. However, a rat cardiac sodium channel gene mRNA is not induced, indicating that alpha-adrenergic stimulation does not lead to an increase in the expression of all cardiac genes. Studies employing a series of rat ANF luciferase and rat MLC-2 luciferase fusion genes identify 315- and 92-base pair cis regulatory sequences within an embryonic gene (ANF) and a constitutively expressed contractile protein gene (MLC-2), respectively, which mediate alpha-adrenergic-inducible gene expression. Transfection of various ANF luciferase reporters into neonatal rat ventricular cells demonstrated that upstream sequences which mediate tissue-specific expression (-3003 to -638) can be segregated from those responsible for inducibility. The lack of inducibility of a cardiac Na+ channel gene, and the segregation of ANF gene sequences which mediate cardiac specific from those which mediate inducible expression, provides further insight into the relationship between muscle-specific and inducible expression during cardiac myocyte hypertrophy. Based on these results, a testable model is proposed for the induction of embryonic cardiac genes and constitutively expressed contractile protein genes and the noninducibility of a subset of cardiac genes during alpha-adrenergic stimulation of neonatal rat ventricular cells.
De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.

PubMed

Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min

2013-02-01

Lentinula edodes has been utilized as food as well as in popular medicine; moreover, extracts isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about the genes involved in these properties, and the absence of L. edodes genomes has been a barrier to the development of functional genomics research. However, high throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L. edodes, we leveraged Solexa sequencing technology in de novo assembly of the L. edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size = 689 bp) including 18,120 unigenes with coding sequence (CDS). Based on similarity search with known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous group (COG) terms. Our data provide the first comprehensive sequence resource available for functional genomics studies in L. edodes, and demonstrate the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.
JGI Plant Genomics Gene Annotation Pipeline

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

2014-07-14

Plant genomes vary in size and are highly complex with a high amount of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high quality and stable gene annotation. Thanks to the advancement of sequencing technology, many plant species genomes have been sequenced and transcriptomes are also sequenced. To use these vastly large amounts of sequence data to make gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.
Molecular Epidemiological Survey and Genetic Characterization of Anaplasma Species in Mongolian Livestock.

PubMed

Ochirkhuu, Nyamsuren; Konnai, Satoru; Odbileg, Raadan; Murata, Shiro; Ohashi, Kazuhiko

2017-08-01

Anaplasma species are obligate intracellular rickettsial pathogens that cause great economic loss to the animal industry. Few studies on Anaplasma infections in Mongolian livestock have been conducted. This study examined the prevalence of Anaplasma marginale, Anaplasma ovis, Anaplasma phagocytophilum, and Anaplasma bovis by polymerase chain reaction assay in 928 blood samples collected from native cattle and dairy cattle (Bos taurus), yaks (Bos grunniens), sheep (Ovis aries), and goats (Capra aegagrus hircus) in four provinces of Ulaanbaatar city in Mongolia. We genetically characterized positive samples through sequencing analysis based on the heat-shock protein groEL, major surface protein 4 (msp4), and 16S rRNA genes. Only A. ovis was detected in Mongolian livestock (cattle, yaks, sheep, and goats), with 413 animals (44.5%) positive for groEL and 308 animals (33.2%) positive for msp4 genes. In the phylogenetic tree, we separated A. ovis sequences into two distinct clusters based on the groEL gene. One cluster comprised sequences derived mainly from sheep and goats, which was similar to that in A. ovis isolates from other countries. The other divergent cluster comprised sequences derived from cattle and yaks and appeared to be newly branched from that in previously published single isolates in Mongolian cattle. In addition, the msp4 gene of A. ovis using same and different samples with groEL gene of the pathogen demonstrated that all sequences derived from all animal species, except for three sequences derived from cattle and yak, were clustered together, and were identical or similar to those in isolates from other countries. We used 16S rRNA gene sequences to investigate the genetically divergent A. ovis and identified high homology of 99.3-100%. However, the sequences derived from cattle did not match those derived from sheep and goats. The results of this study on the prevalence and molecular characterization of A. ovis in Mongolian livestock can facilitate the control of infectious diseases in livestock.
Mollusk genes encoding lysine tRNA (UUU) contain introns.

PubMed

Matsuo, M; Abe, Y; Saruta, Y; Okada, N

1995-11-20

New intron-containing genes encoding tRNAs were discovered when genomic DNA isolated from various animal species was amplified by the polymerase chain reaction (PCR) with primers based on sequences of rabbit tRNA(Lys). From sequencing analysis of the products of PCR, we found that introns are present in several genes encoding tRNA(Lys) in mollusks, such as Loligo bleekeri (squid) and Octopus vulgaris (octopus). These introns were specific to genes encoding tRNA(Lys)(UUU) and were not present in genes encoding tRNA(Lys)(CUU). In addition, the sequences of the introns were different from one another. To confirm the results of our initial experiments, we isolated and sequenced genes encoding tRNA(Lys)(CUU) and tRNA(Lys)(UUU). The gene for tRNA(Lys)(UUU) from squid contained an intron, whose sequence was the same as that identified by PCR, and the gene formed a cluster with a corresponding pseudogene. Several DNA regions of 2.1 kb containing this cluster appeared to be tandemly arrayed in the squid genome. By contrast, the gene encoding tRNA(Lys)(CUU) did not contain an intron, as shown also by PCR. The tRNA(Lys)(UUU) that corresponded to the analyzed gene was isolated and characterized. The present study provides the first example of an intron-containing gene encoding a tRNA in mollusks and suggests the universality of introns in such genes in higher eukaryotes.
Comparative sequence analysis of B5R gene of zoonotic buffalo pox virus isolates with other orthopoxviruses.

PubMed

Chandranaik, B M; Singh, Raj Kumar; Hosamani, Mahusudan; Krishnappa, Giriappa; Harish, Balur R; Chethana, C S; Renukaprasad, C

2011-02-01

The present paper describes the isolation of buffalo pox virus from scab lesions and its molecular characterization through B5R gene sequencing. During our study, pustular pox lesions were observed on the teats and mammary parenchyma of cattle and buffaloes, and the disease was of significant zoonotic importance since similar lesions were produced on the hands, legs, and face of people in close contact with the affected animals. The collected scab materials were subjected to virus isolation in 9-11-day-old chicken embryos by the chorioallantoic membrane route and in the Vero cell line. The virus was confirmed by a sensitive and rapid diagnostic polymerase chain reaction using primers that amplify the "A type inclusion" gene, and further, the B5R gene of the virus was sequenced and compared with the corresponding sequences of other orthopoxviruses. The results showed high sequence homology of our isolates with other orthopoxviruses.
Integration of targeted sequencing and NIPT into clinical practice in a Chinese family with maple syrup urine disease.

PubMed

You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying

2014-08-01

This article demonstrates a prominent noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.
Mitochondrial genomes of the jungle crow Corvus macrorhynchos (Passeriformes: Corvidae) from shed feathers and a phylogenetic analysis of genus Corvus using mitochondrial protein-coding genes.

PubMed

Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M

2016-07-01

The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
RNA-Seq analysis of yak ovary: improving yak gene structure information and mining reproduction-related genes.

PubMed

Lan, DaoLiang; Xiong, XianRong; Wei, YanLi; Xu, Tong; Zhong, JinCheng; Zhi, XiangDong; Wang, Yong; Li, Jian

2014-09-01

RNA-Seq, a high-throughput (HT) sequencing technique, has been used effectively in large-scale transcriptomic studies, and is particularly useful for improving gene structure information and mining of new genes. In this study, RNA-Seq HT technology was employed to analyze the transcriptome of yak ovary. After Illumina-Solexa deep sequencing, 26826516 clean reads with a total of 4828772880 bp were obtained from the ovary library. Alignment analysis showed that 16992 yak genes mapped to the yak genome and 3734 of these genes were involved in alternative splicing. Gene structure refinement analysis showed that 7340 genes that were annotated in the yak genome could be extended at the 5' or 3' ends based on the alignments between the transcripts and the genome sequence. Novel transcript prediction analysis identified 6321 new transcripts with lengths ranging from 180 to 14884 bp, and 2267 of them were predicted to code proteins. BLAST analysis of the new transcripts showed that 1200-4933 mapped to the non-redundant (nr), nucleotide (nt) and/or SwissProt sequence databases. Comparative statistical analysis of the new mapped transcripts showed that the majority of them were similar to genes in Bos taurus (41.4%), Bos grunniens mutus (33.0%), Ovis aries (6.3%), Homo sapiens (2.8%), Mus musculus (1.6%) and other species. Functional analysis showed that these expressed genes were involved in various Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes pathways. GO analysis of the new transcripts found that the largest proportion of them was associated with reproduction. The results of this study will provide a basis for describing the normal transcriptome map of yak ovary and for future studies on yak breeding performance. Moreover, the results confirmed that RNA-Seq HT technology is highly advantageous in improving gene structure information and mining of new genes, as well as in providing valuable data to expand the yak genome information.
Detection and characterization of Pasteuria 16S rRNA gene sequences from nematodes and soils.

PubMed

Duan, Y P; Castro, H F; Hewlett, T E; White, J H; Ogram, A V

2003-01-01

Various bacterial species in the genus Pasteuria have great potential as biocontrol agents against plant-parasitic nematodes, although study of this important genus is hampered by the current inability to cultivate Pasteuria species outside their host. To aid in the study of this genus, an extensive 16S rRNA gene sequence phylogeny was constructed and this information was used to develop cultivation-independent methods for detection of Pasteuria in soils and nematodes. Thirty new clones of Pasteuria 16S rRNA genes were obtained directly from nematodes and soil samples. These were sequenced and used to construct an extensive phylogeny of this genus. These sequences were divided into two deeply branching clades within the low-G + C, Gram-positive division; some sequences appear to represent novel species within the genus Pasteuria. In addition, a surprising degree of 16S rRNA gene sequence diversity was observed within what had previously been designated a single strain of Pasteuria penetrans (P-20). PCR primers specific to Pasteuria 16S rRNA for detection of Pasteuria in soils were also designed and evaluated. Detection limits for soil DNA were 100-10,000 Pasteuria endospores per gram of soil.
Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

PubMed Central

2009-01-01

Background One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). 
Methods and assay sequences are reported in this paper. Conclusion This automated process allows laboratories to discover DNA variations in a short time and at low cost. PMID:19835634
Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes.

PubMed

Bennett, Richard R; Schneider, Hal E; Estrella, Elicia; Burgess, Stephanie; Cheng, Andrew S; Barrett, Caitlin; Lip, Va; Lai, Poh San; Shen, Yiping; Wu, Bai-Lin; Darras, Basil T; Beggs, Alan H; Kunkel, Louis M

2009-10-18

One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, PCR assay design focused on individual assay parameters rather than on matching conditions across a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler, making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer-aided design tools. If true, this would allow automation and optimization of the mutation detection process, resulting in reduced cost and increased throughput. An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD, CAV3, CAPN3, FKRP, TRIM32, LMNA, SGCA, SGCB, SGCG and SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay sequences are reported in this paper. This automated process allows laboratories to discover DNA variations in a short time and at low cost.
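The idea of matching conditions across a set of assays can be illustrated with a small sketch. The abstract does not say how the authors' design tools estimate melting temperature, so the Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a stand-in assumption, and the primer sequences, function names and the 4 °C tolerance are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def shares_annealing_window(primer_pairs, max_spread=4):
    """True if every primer in every assay falls within max_spread deg C of the others."""
    tms = [wallace_tm(p) for pair in primer_pairs for p in pair]
    return max(tms) - min(tms) <= max_spread


# Hypothetical primer pairs (made-up sequences, not taken from the paper).
assays = [
    ("ATGCATGCATGCATGCATGC", "GCATGCATGCATGCATGCAT"),
    ("ATATGCGCATATGCGCATAT", "TTAAGGCCTTAAGGCCTTAA"),
]

if __name__ == "__main__":
    for forward, reverse in assays:
        print(forward, wallace_tm(forward), reverse, wallace_tm(reverse))
    print("single thermal-cycler run possible:", shares_annealing_window(assays))
```

A set that passes such a check could in principle share one cycling program, which is the practical benefit the abstract describes; real designs would also weigh amplicon length, primer dimers and salt-corrected Tm models.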
Information Propagation in Developmental Enhancers

NASA Astrophysics Data System (ADS)

Jena, Siddhartha; Levine, Michael

Rather than encoding information about protein sequence, certain lengths of noncoding DNA, called enhancers, interact with protein machinery such as transcription factors to precisely regulate gene expression. Enhancers have been studied extensively in the fruit fly Drosophila melanogaster, where they regulate the expression of developmental genes that establish the blueprint of the adult fly. It has been suggested that enhancer sequences possess a specific but unknown syntax with regards to the placement and strength of transcription factor binding sites. Moreover, studies in divergent fly species have shown that compensatory evolution allows for maintenance of enhancer functionality despite considerable variation in primary DNA sequence. Here, the possible role of enhancers as signal processing modules is studied as a way of explaining these two findings. We first demonstrate how this framework can be used to explain the fine-tuned spatiotemporal dynamics of gene expression. We then explore the evolutionary pressure on enhancer sequences and the resulting emergence of enhancers that are linked by compensatory mutations. This study provides a possible mechanism for the function of multiple enhancers linked to a single gene.
tRNADB-CE: tRNA gene database well-timed in the era of big sequence data.

PubMed

Abe, Takashi; Inokuchi, Hachiro; Yamada, Yuko; Muto, Akira; Iwasaki, Yuki; Ikemura, Toshimichi

2014-01-01

The expert-curated tRNA gene database "tRNADB-CE" (http://trna.ie.niigata-u.ac.jp) was constructed by analyzing 1,966 complete and 5,272 draft prokaryotic genomes, 171 viral, 121 chloroplast, and 12 eukaryotic genomes, plus fragment sequences obtained by metagenome studies of environmental samples. In total, 595,115 tRNA genes have been registered, twice as many as compiled previously; for each, the sequence, clover-leaf structure, and results of sequence-similarity and oligonucleotide-pattern searches can be browsed. To gather collective knowledge with help from experts in tRNA research, we added a column for recording comments on each tRNA. By grouping bacterial tRNAs with an identical sequence, we found high phylogenetic conservation of tRNA sequences, especially at the phylum level. Since many tRNAs of unknown species origin from metagenomic sequences are identical in sequence to those found in known prokaryotic species, the identical sequence group (ISG) can provide phylogenetic markers for investigating the microbial community in an environmental ecosystem. This strategy can be applied to the huge numbers of short sequences obtained from next-generation sequencers, showing that tRNADB-CE is a well-timed database in the era of big sequence data. We also discuss how a batch-learning self-organizing map with oligonucleotide composition is useful for efficient knowledge discovery from big sequence data.
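The identical sequence group (ISG) idea can be sketched in a few lines. This is only an illustration of grouping records by exact sequence identity, not the database's actual pipeline; the function name, record layout and the toy sequences (which are not real tRNA genes) are assumptions.

```python
from collections import defaultdict


def identical_sequence_groups(records):
    """Group (source, sequence) records by exact sequence; keep groups with >1 source."""
    groups = defaultdict(set)
    for source, seq in records:
        groups[seq.upper()].add(source)
    return {seq: sorted(sources) for seq, sources in groups.items() if len(sources) > 1}


# Toy records: made-up sequences standing in for tRNA genes.
records = [
    ("Escherichia coli", "GGGTCGTTAG"),
    ("metagenome_read_1", "gggtcgttag"),
    ("Bacillus subtilis", "GCCGGTGTAG"),
]

if __name__ == "__main__":
    for seq, sources in identical_sequence_groups(records).items():
        print(seq, "->", sources)
```

In this toy case the metagenomic read falls into the same group as a sequence of known origin, which is the sense in which an ISG could act as a phylogenetic marker for an unassigned read.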
Properties of a U1 RNA enhancer-like sequence.

PubMed Central

Ciliberto, G; Palla, F; Tebb, G; Mattaj, I W; Philipson, L

1987-01-01

The properties of a X. laevis U1B snRNA gene enhancer have been studied by microinjection in Xenopus oocytes. The enhancer-like sequence, defined as a short DNA stretch that is able to activate transcription in an orientation independent manner, is interchangeable between different U snRNA genes. The enhancer sequence alone does not, however, efficiently activate transcription from an SV40 pol II promoter but regains its activity when combined with the U-gene specific proximal sequence element. DNase I protection experiments show that the X. laevis U1B enhancer can interact specifically with a nuclear factor present in mammalian cells. PMID:3031597
Chromatin and RNAi factors protect the C. elegans germline against repetitive sequences

PubMed Central

Robert, Valérie J.P.; Sijen, Titia; van Wolfswinkel, Josien; Plasterk, Ronald H.A.

2005-01-01

Protection of genomes against invasion by repetitive sequences, such as transposons, viruses, and repetitive transgenes, involves strong and selective silencing of these sequences. During silencing of repetitive transgenes, a trans effect (“cosuppression”) occurs that results in silencing of cognate endogenous genes. Here we report RNA interference (RNAi) screens performed to catalog genes required for cosuppression in the Caenorhabditis elegans germline. We find factors with a putative role in chromatin remodeling and factors involved in RNAi. Together with molecular data also presented in this study, these results suggest that in C. elegans repetitive sequences trigger transcriptional gene silencing using RNAi and chromatin factors. PMID:15774721
Isolation and characterization of a water stress-specific genomic gene, pwsi 18, from rice.

PubMed

Joshee, N; Kisaka, H; Kitagawa, Y

1998-01-01

One of the water stress-specific cDNA clones of rice characterised previously, wsi18, was selected for further study. The wsi18 gene can be induced by water stress conditions such as mannitol, NaCl, and dryness, but not by ABA, cold, or heat. A genomic clone for wsi18, pwsi18, contained about 1.7 kbp of the 5' upstream sequence, two introns, and the full coding sequence. The 5'-upstream sequence of pwsi18 contained putative cis-acting elements, namely an ABA-responsive element (ABRE), three G-boxes, three E-boxes, a MEF-2 sequence, four direct and two inverted repeats, and four sequences similar to DRE, which is involved in the dehydration response of Arabidopsis genes. The gusA reporter gene under the control of the pwsi18 promoter showed transient expression in response to water stress. Deletion of the downstream DRE-like sequence between the distal G-boxes-2 and -3 resulted in rather low GUS expression.
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

PubMed Central

Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

2010-01-01

Olive (Olea europaea L.), an important source of edible oil, originated in the Near East. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, discover novel genes, and investigate the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs showed no homology to the database, 2024 ESTs had homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI GenBank. A total of 635 unique EST sequences were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity to the olive data existing in NCBI. The EST data and consensus sequences generated here were submitted to NCBI as a valuable resource for functional genomic studies of olive. PMID:21197085
Epoxyalkane:Coenzyme M Transferase Gene Diversity and Distribution in Groundwater Samples from Chlorinated-Ethene-Contaminated Sites

PubMed Central

Liu, Xikun

2016-01-01

ABSTRACT Epoxyalkane:coenzyme M transferase (EaCoMT) plays a critical role in the aerobic biodegradation and assimilation of alkenes, including ethene, propene, and the toxic chloroethene vinyl chloride (VC). To improve our understanding of the diversity and distribution of EaCoMT genes in the environment, novel EaCoMT-specific terminal-restriction fragment length polymorphism (T-RFLP) and nested-PCR methods were developed and applied to groundwater samples from six different contaminated sites. T-RFLP analysis revealed 192 different EaCoMT T-RFs. Using clone libraries, we retrieved 139 EaCoMT gene sequences from these samples. Phylogenetic analysis revealed that a majority of the sequences (78.4%) grouped with EaCoMT genes found in VC- and ethene-assimilating Mycobacterium strains and Nocardioides sp. strain JS614. The four most-abundant T-RFs were also matched with EaCoMT clone sequences related to Mycobacterium and Nocardioides strains. The remaining EaCoMT sequences clustered within two emergent EaCoMT gene subgroups represented by sequences found in propene-assimilating Gordonia rubripertincta strain B-276 and Xanthobacter autotrophicus strain Py2. EaCoMT gene abundance was positively correlated with VC and ethene concentrations at the sites studied. IMPORTANCE The EaCoMT gene plays a critical role in assimilation of short-chain alkenes, such as ethene, VC, and propene. An improved understanding of EaCoMT gene diversity and distribution is significant to the field of bioremediation in several ways. The expansion of the EaCoMT gene database and identification of incorrectly annotated EaCoMT genes currently in the database will facilitate improved design of environmental molecular diagnostic tools and high-throughput sequencing approaches for future bioremediation studies. Our results further suggest that potentially significant aerobic VC degraders in the environment are not well represented in pure culture. 
Future research should aim to isolate and characterize aerobic VC-degrading bacteria from these underrepresented groups. PMID:27016563
Defining the Estimated Core Genome of Bacterial Populations Using a Bayesian Decision Model

PubMed Central

van Tonder, Andries J.; Mistry, Shilan; Bray, James E.; Hill, Dorothea M. C.; Cody, Alison J.; Farmer, Chris L.; Klugman, Keith P.; von Gottberg, Anne; Bentley, Stephen D.; Parkhill, Julian; Jolley, Keith A.; Maiden, Martin C. J.; Brueggemann, Angela B.

2014-01-01

The bacterial core genome is of intense interest and the volume of whole genome sequence data in the public domain available to investigate it has increased dramatically. The aim of our study was to develop a model to estimate the bacterial core genome from next-generation whole genome sequencing data and use this model to identify novel genes associated with important biological functions. Five bacterial datasets were analysed, comprising 2096 genomes in total. We developed a Bayesian decision model to estimate the number of core genes, calculated pairwise evolutionary distances (p-distances) based on nucleotide sequence diversity, and plotted the median p-distance for each core gene relative to its genome location. We designed visually-informative genome diagrams to depict areas of interest in genomes. Case studies demonstrated how the model could identify areas for further study, e.g. 25% of the core genes with higher sequence diversity in the Campylobacter jejuni and Neisseria meningitidis genomes encoded hypothetical proteins. The core gene with the highest p-distance value in C. jejuni was annotated in the reference genome as a putative hydrolase, but further work revealed that it shared sequence homology with beta-lactamase/metallo-beta-lactamases (enzymes that provide resistance to a range of broad-spectrum antibiotics) and thioredoxin reductase genes (which reduce oxidative stress and are essential for DNA replication) in other C. jejuni genomes. Our Bayesian model of estimating the core genome is principled, easy to use and can be applied to large genome datasets. This study also highlighted the lack of knowledge currently available for many core genes in bacterial genomes of significant global public health importance. PMID:25144616
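The p-distance statistic mentioned above is simply the proportion of aligned sites at which two sequences differ. The sketch below shows that calculation and a median over all allele pairs of one gene; it does not reproduce the Bayesian decision model, and skipping gapped columns, the function names and the toy alleles are assumptions.

```python
from itertools import combinations
from statistics import median


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap in either sequence are ignored
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def median_p_distance(alleles):
    """Median pairwise p-distance across all allele pairs of one gene."""
    return median(p_distance(a, b) for a, b in combinations(alleles, 2))


if __name__ == "__main__":
    toy_alleles = ["ACGT", "ACGA", "ACGT"]  # made-up aligned alleles
    print(median_p_distance(toy_alleles))
```

Plotting this median for each core gene against its genome position would give the kind of diversity profile the abstract describes.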

PCR Primers to Study the Diversity of Expressed Fungal Genes Encoding Lignocellulolytic Enzymes in Soils Using High-Throughput Sequencing

PubMed Central

Barbi, Florian; Bragalini, Claudia; Vallon, Laurent; Prudent, Elsa; Dubost, Audrey; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia

2014-01-01

Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly designed primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may lead to the discovery of complex patterns in gene expression of soil fungal communities. PMID:25545363
De novo sequencing and analysis of the transcriptome of Panax ginseng in the leaf-expansion period.

PubMed

Liu, Shichao; Wang, Siming; Liu, Meichen; Yang, Fei; Zhang, Hui; Liu, Shiyang; Wang, Qun; Zhao, Yu

2016-08-01

Panax ginseng, a traditional Chinese medicine, is used worldwide for its variety of health benefits and its treatment efficacy. However, it is difficult to cultivate due to its vulnerability to environmental stresses. The present study provided the first report, to the best of our knowledge, of transcriptome analysis of ginseng at the leaf-expansion stage. Using the Illumina sequencing platform, >40,000,000 high-quality paired-end reads were obtained and assembled into 100,533 unique sequences. When the sequences were searched against the publicly available National Center for Biotechnology Information protein database using the Basic Local Alignment Search Tool, 61,599 sequences exhibited similarity to known proteins. Functional annotation and classification, including use of the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases, revealed that the activated genes in ginseng were predominantly ribonuclease-like storage genes, environmental stress genes, pathogenesis-related genes and other antioxidant genes. A number of candidate genes in environmental stress-associated pathways were also identified. These novel data provide useful information on the growth and development stages of ginseng, and serve as an important public information platform for further understanding of the molecular mechanisms and functional genomics of ginseng.
Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

PubMed Central

Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

2011-01-01

Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. Comparative analysis of the genomes of the virulent strain Lg2 and the non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains share a high degree of sequence identity, but Lg2 has a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster is composed of 15 genes, of which eight are highly conserved with those in the exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent Lg2-derived strain L. garvieae Lg2-S showed that two conserved genes were each disrupted by a single-base-pair deletion. These results strongly suggest that the capsule is crucial for virulence of Lg2. Several features suggest that the capsule gene cluster of Lg2 may be a genomic island: insertion sequences flanking both ends, a GC content different from the chromosomal average, integration into a locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into the virulence mechanisms of L. garvieae in fish. PMID:21829716
Immunoglobulin heavy variable (IGHV) genes and alleles: new entities, new names and implications for research and prognostication in chronic lymphocytic leukaemia.

PubMed

Xochelli, Aliki; Agathangelidis, Andreas; Kavakiotis, Ioannis; Minga, Evangelia; Sutton, Lesley Ann; Baliakas, Panagiotis; Chouvarda, Ioanna; Giudicelli, Véronique; Vlahavas, Ioannis; Maglaveras, Nikos; Bonello, Lisa; Trentin, Livio; Tedeschi, Alessandra; Panagiotidis, Panagiotis; Geisler, Christian; Langerak, Anton W; Pospisilova, Sarka; Jelinek, Diane F; Oscier, David; Chiorazzi, Nicholas; Darzentas, Nikos; Davi, Fred; Ghia, Paolo; Rosenquist, Richard; Hadzidimitriou, Anastasia; Belessi, Chrysoula; Lefranc, Marie-Paule; Stamatopoulos, Kostas

2015-01-01

Next-generation sequencing studies in Homo sapiens have identified novel immunoglobulin heavy variable (IGHV) genes and alleles, necessitating changes in the international ImMunoGeneTics information system (IMGT) GENE-DB and reference directories of IMGT/V-QUEST. In chronic lymphocytic leukaemia (CLL), the somatic hypermutation (SHM) status of the clonotypic rearranged IGHV gene is strongly associated with patient outcome. Correct determination of this parameter strictly depends on the comparison of the nucleotide sequence of the clonotypic rearranged IGHV gene with that of the closest germline counterpart. Consequently, changes in the reference directories could, in principle, affect the correct interpretation of the IGHV mutational status in CLL. To this end, we analyzed 8066 productive IG heavy chain (IGH) rearrangement sequences from our consortium both before and after the latest update of the IMGT/V-QUEST reference directory. Differences were identified in 405 cases (5 % of the cohort). In 291/405 sequences (71.9 %), changes concerned only the IGHV gene or allele name, whereas a change in the percent germline identity (%GI) was noted in 114/405 (28.1 %) sequences; in 50/114 (43.8 %) sequences, changes in the %GI led to a change in the mutational set. In conclusion, recent changes in the IMGT reference directories affected the interpretation of SHM in a sizeable number of IGH rearrangement sequences from CLL patients. This indicates that both physicians and researchers should consider a re-evaluation of IG sequence data, especially for those IGH rearrangement sequences that, to date, have a GI close to 98 %, where caution is warranted.
Sequence diversity within the reovirus S2 gene: reovirus genes reassort in nature, and their termini are predicted to form a panhandle motif.

PubMed Central

Chapell, J D; Goral, M I; Rodgers, S E; dePamphilis, C W; Dermody, T S

1994-01-01

To better understand genetic diversity within mammalian reoviruses, we determined S2 nucleotide and deduced sigma 2 amino acid sequences of nine reovirus strains and compared these sequences with those of prototype strains of the three reovirus serotypes. The S2 gene and sigma 2 protein are highly conserved among the four type 1, one type 2, and seven type 3 strains studied. Phylogenetic analyses based on S2 nucleotide sequences of the 12 reovirus strains indicate that diversity within the S2 gene is independent of viral serotype. Additionally, we found marked topological differences between phylogenetic trees generated from S1 and S2 gene nucleotide sequences of the seven type 3 strains. These results demonstrate that reovirus S1 and S2 genes have distinct evolutionary histories, thus providing phylogenetic evidence for lateral transfer of reovirus genes in nature. When variability among the 12 sigma 2-encoding S2 nucleotide sequences was analyzed at synonymous positions, we found that approximately 60 nucleotides at the 5' terminus and 30 nucleotides at the 3' terminus were markedly conserved in comparison with other sigma 2-encoding regions of S2. Predictions of RNA secondary structures indicate that the more conserved S2 sequences participate in the formation of an extended region of duplex RNA interrupted by a pair of stem-loops. Among the 12 deduced sigma 2 amino acid sequences examined, substitutions were observed at only 11% of amino acid positions. This finding suggests that constraints on the structure or function of sigma 2, perhaps in part because of its location in the virion core, have limited sequence diversity within this protein. PMID:8289378
Using microarrays to identify positional candidate genes for QTL: the case study of ACTH response in pigs.

PubMed

Jouffe, Vincent; Rowe, Suzanne; Liaubet, Laurence; Buitenhuis, Bart; Hornshøj, Henrik; SanCristobal, Magali; Mormède, Pierre; de Koning, D J

2009-07-16

Microarray studies can supplement QTL studies by suggesting potential candidate genes in the QTL regions, which by themselves are too large to provide a limited selection of candidate genes. Here we provide a case study where we explore ways to integrate QTL data and microarray data for the pig, which has only a partial genome sequence. We outline various procedures to localize differentially expressed genes on the pig genome and link this with information on published QTL. The starting point is a set of 237 differentially expressed cDNA clones in adrenal tissue from two pig breeds, before and after treatment with adrenocorticotropic hormone (ACTH). Different approaches to localize the differentially expressed (DE) genes to the pig genome showed different levels of success and a clear lack of concordance for some genes between the various approaches. For a focused analysis on 12 genes, overlapping QTL from the public domain were presented. Also, differentially expressed genes underlying QTL for ACTH response were described. Using the latest version of the draft sequence, the differentially expressed genes were mapped to the pig genome. This enabled co-location of DE genes and previously studied QTL regions, but the draft genome sequence is still incomplete and will contain many errors. A further step to explore links between DE genes and QTL at the pathway level was largely unsuccessful due to the lack of annotation of the pig genome. This could be improved by further comparative mapping analyses but this would be time consuming. This paper provides a case study for the integration of QTL data and microarray data for a species with limited genome sequence information and annotation. The results illustrate the challenges that must be addressed but also provide a roadmap for future work that is applicable to other non-model species.
Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

PubMed Central

Deng, Youping; Dong, Yinghua; Thodima, Venkata; Clem, Rollie J; Passarelli, A Lorena

2006-01-01

Background Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. Results We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. Conclusion S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses. PMID:17052344
The complete mitochondrial genome of Conus tulipa (Neogastropoda: Conidae).

PubMed

Chen, Po-Wei; Hsiao, Sheng-Tai; Huang, Chih-Wei; Chen, Kao-Sung; Tseng, Chen-Te; Wu, Wen-Lung; Hwang, Deng-Fwu

2016-07-01

The complete mitogenome of the cone snail Conus tulipa (Linnaeus, 1758) was sequenced using next-generation sequencing. The assembled mitogenome is 16,599 bp in length and includes 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of C. tulipa is 28.7% A, 15.2% C, 18.4% G and 37.7% T. It shows 81.1% identity to the cone snail C. consors, 78.5% to C. borgesi and 77.5% to C. textile. Using the 13 protein-coding genes and 2 ribosomal RNA genes of C. tulipa, together with those of 18 other closely related species, we constructed a species phylogenetic tree to verify the accuracy and utility of the newly determined mitogenome sequence. The complete mitogenome of C. tulipa provides essential molecular data for further phylogeographic and evolutionary analyses of cone snails.
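Base composition figures like those reported for C. tulipa come from a simple per-base count. The following is a minimal sketch on a made-up toy sequence, not the mitogenome itself; the function name is an assumption, and ambiguous bases are simply excluded from the total.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among unambiguous bases, rounded to one decimal."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: round(100 * counts[base] / total, 1) for base in "ACGT"}


# Percentages as reported in the abstract above.
reported = {"A": 28.7, "C": 15.2, "G": 18.4, "T": 37.7}
at_content = round(reported["A"] + reported["T"], 1)

if __name__ == "__main__":
    print(base_composition("AATTTGCAAT"))  # toy sequence, not C. tulipa
    print("A+T content from reported values:", at_content, "%")
```

Summing the reported A and T values gives an A+T content of 66.4%, so the mitogenome is AT-rich.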
Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

PubMed Central

Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

2014-01-01

RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209
The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).

PubMed

Choi, Kyoung Su; Park, SeonJoo

2016-09-01

The complete chloroplast (cp) genome sequence of Euonymus japonicus, the first sequenced in the genus Euonymus, is reported in this study. The total length was 157 637 bp, including a pair of 26 678 bp inverted repeat (IR) regions separated by a small single copy (SSC) region of 18 340 bp and a large single copy (LSC) region of 85 941 bp. The genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes contain introns, of which three (clpP, ycf3, and rps12) contain two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.
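The reported region lengths can be checked against the total, since a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The short sketch below does that arithmetic with the figures from the abstract; the variable names are illustrative.

```python
# Region lengths in bp, as reported in the abstract.
LSC = 85_941
SSC = 18_340
IR = 26_678  # present in two copies

total_length = LSC + SSC + 2 * IR

# Unique gene counts, as reported in the abstract.
coding, rrna, trna = 74, 4, 29
unique_genes = coding + rrna + trna

if __name__ == "__main__":
    print("total length:", total_length, "bp")
    print("unique genes:", unique_genes)
```

Both sums agree with the abstract: 85 941 + 18 340 + 2 × 26 678 = 157 637 bp, and 74 + 4 + 29 = 107 unique genes.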
Sequence variations of the alpha-globin genes: scanning of high CG content genes with DHPLC and DG-DGGE.

PubMed

Lacerra, Giuseppina; Fiorito, Mirella; Musollino, Gennaro; Di Noce, Francesca; Esposito, Maria; Nigro, Vincenzo; Gaudiano, Carlo; Carestia, Clementina

2004-10-01

The alpha-globin chains are encoded by two duplicated genes (HBA2 and HBA1, 5'-3') showing overall sequence homology >96% and average CG content >60%. alpha-Thalassemia, the most prevalent worldwide autosomal recessive disorder, is a hereditary anemia caused by sequence variations of these genes in about 25% of carriers. We evaluated the overall sensitivity and suitability of DHPLC and DG-DGGE in scanning both the alpha-globin genes by carrying out a retrospective analysis of 19 variant alleles in 29 genotypes. The HBA2 alleles c.1A>G, c.79G>A, and c.281T>G, and the HBA1 allele c.475C>A were new. Three pathogenic sequence variations were associated in cis with nonpathogenic variations in all families studied; they were the HBA2 variation c.2T>C associated with c.-24C>G, and the HBA2 variations c.391G>C and c.427T>C, both associated with c.565G>A. We set up original experimental conditions for DHPLC and DG-DGGE and analyzed 10 normal subjects, 46 heterozygotes, seven homozygotes, seven compound heterozygotes, and six compound heterozygotes for a hybrid gene. Both the methodologies gave reproducible results and no false-positive was detected. DHPLC showed 100% sensitivity and DG-DGGE nearly 90%. About 100% of the sequence from the cap site to the polyA addition site could be scanned by DHPLC, about 87% by DG-DGGE. It is noteworthy that the three most common pathogenic sequence variations (HBA2 alleles c.2T>C, c.95+2_95+6del, and c.523A>G) were unambiguously detected by both the methodologies. Genotype diagnosis must be confirmed with PCR sequencing of single amplicons or with an allele-specific method. This study can be helpful for scanning genes with high CG content and offers a model suitable for duplicated genes with high homology. Copyright 2004 Wiley-Liss, Inc.
Bioinformatic Characterization of Genes and Proteins Involved in Blood Clotting in Lampreys.

PubMed

Doolittle, Russell F

2015-10-01

Lampreys and hagfish are the earliest diverging of extant vertebrates and are obvious targets for investigating the origins of complex biochemical systems found in mammals. Currently, the simplest approach for such inquiries is to search for the presence of relevant genes in whole genome sequence (WGS) assemblies. Unhappily, in the past a high-quality complete genome sequence has not been available for either lampreys or hagfish, precluding the possibility of proving gene absence. Recently, improved but still incomplete genome assemblies for two species of lamprey have been posted, and, taken together with an extensive collection of short sequences in the NCBI trace archive, they have made it possible to make reliable counts for specific gene families. Particularly, a multi-source tactic has been used to study the lamprey blood clotting system with regard to the presence and absence of genes known to occur in higher vertebrates. As was suggested in earlier studies, lampreys lack genes for coagulation factors VIII and IX, both of which are critical for the "intrinsic" clotting system and responsible for hemophilia in humans. On the other hand, they have three each of genes for factors VII and X, participants in the "extrinsic" clotting system. The strategy of using raw trace sequence "reads" together with partial WGS assemblies for lampreys can be used in studies on the early evolution of other biochemical systems in vertebrates.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples.

PubMed

Laird Smith, Melissa; Murrell, Ben; Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E; Kosakovsky Pond, Sergei L; Smith, Davey M

2016-07-01

The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences' Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data.
Nucleotide sequence of the ribosomal RNA gene of Physarum polycephalum: intron 2 and its flanking regions of the 26S rRNA gene.

PubMed Central

Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y

1981-01-01

The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc.Natl.Acad.Sci.USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2) of about 0.5 kb length and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, though the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. PMID:6171776
Phenotypic and genotypic discrepancy of Streptococcus pneumoniae strains isolated from Asian countries.

PubMed

Ko, Kwan Soo; Oh, Won Sup; Peck, Kyong Ran; Lee, Jang Ho; Lee, Nam Yong; Song, Jae-Hoon

2005-07-01

Non-typeable isolates of Streptococcus pneumoniae collected from Asian countries were characterized by optochin susceptibility test, bile solubility test, multilocus sequence typing of housekeeping genes, amplification of virulence-related genes, 16S rDNA-RsaI digestion, and 16S rDNA sequencing. Six of 54 non-typeable pneumococcal isolates showed divergence of gene sequences of recP and xpt from typical pneumococcal strains. Of these six atypical pneumococcal strains, two showed different results in optochin susceptibility or bile solubility test from typical pneumococcal strains. All six isolates showed high sequence dissimilarities of multilocus sequence typing, 16S rDNA sequences, and lytA sequences from typical S. pneumoniae strains. Data from this study suggest that classic tests such as optochin susceptibility and bile solubility tests may lead to incorrect identification of S. pneumoniae. These atypical strains may belong to different bacterial species from S. pneumoniae.
Isolation and molecular characterization of partial FSH and LH receptor genes in Arabian camels (Camelus dromedarius)

PubMed Central

Jelokhani-Niaraki, Saber; Tahmoorespur, Mojtaba; Bitaraf-Sani, Morteza

2015-01-01

Very little is known about LHR and FSHR genes of domestic dromedary camels. The main objective of this study was to determine and analyze partial genomic regions of FSHR and LHR genes in dromedary camels for the first time. To this end, a total of 50 DNA samples belonging to dromedary camels raised in Iran were sent for sequencing (25 samples of each gene). We compared the nucleotide sequences of Camelus dromedarius with corresponding sequences of previously published FSHR and LHR genes in bactrian camels and other species. According to the data, the same nucleotide variation was identified in both regions of the two camel species. The alignment of deduced protein sequences of the two different species revealed an amino acid variation at the FSHR region. No evidence of amino acid variation was observed, however, in LHR sequences. Phylogenetic analysis indicated that both camel species had a close relationship and clustered together in a separate branch. This was further confirmed by genetic distance values illustrating significant sequence identity between Camelus dromedarius and Camelus bactrianus. Interestingly, sequence comparisons revealed heterozygote patterns in FSHR sequences isolated from dromedary camels of Iran. In comparison to other species, this camel contains three amino acid substitutions at positions 5, 67, and 105 in the FSHR coding region. These positions are found exclusively in camels and can be considered as species specific. The results of our study can be used for hormone functionality research (FSHR and LHR) as well as reproduction-linked polymorphisms and breeding programs. PMID:27844002
Genetic variations in merozoite surface antigen genes of Babesia bovis detected in Vietnamese cattle and water buffaloes.

PubMed

Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Tuvshintulga, Bumduuren; Hayashida, Kyoko; Igarashi, Ikuo; Inoue, Noboru; Long, Phung Thang; Lan, Dinh Thi Bich

2015-03-01

The genes that encode merozoite surface antigens (MSAs) in Babesia bovis are genetically diverse. In this study, we analyzed the genetic diversity of B. bovis MSA-1, MSA-2b, and MSA-2c genes in Vietnamese cattle and water buffaloes. Blood DNA samples from 258 cattle and 49 water buffaloes reared in the Thua Thien Hue province of Vietnam were screened with a B. bovis-specific diagnostic PCR assay. The B. bovis-positive DNA samples (23 cattle and 16 water buffaloes) were then subjected to PCR assays to amplify the MSA-1, MSA-2b, and MSA-2c genes. Sequencing analyses showed that the Vietnamese MSA-1 and MSA-2b sequences are genetically diverse, whereas MSA-2c is relatively conserved. The nucleotide identity values for these MSA gene sequences were similar in the cattle and water buffaloes. Consistent with the sequencing data, the Vietnamese MSA-1 and MSA-2b sequences were dispersed across several clades in the corresponding phylogenetic trees, whereas the MSA-2c sequences occurred in a single clade. Cattle- and water-buffalo-derived sequences also often clustered together on the phylogenetic trees. The Vietnamese MSA-1, MSA-2b, and MSA-2c sequences were then screened for recombination with automated methods. Of the seven recombination events detected, five and two were associated with the MSA-2b and MSA-2c recombinant sequences, respectively, whereas no MSA-1 recombinants were detected among the sequences analyzed. Recombination between the sequences derived from cattle and water buffaloes was very common, and the resultant recombinant sequences were found in both host animals. These data indicate that the genetic diversity of the MSA sequences does not differ between cattle and water buffaloes in Vietnam. They also suggest that recombination between the B. bovis MSA sequences in both cattle and water buffaloes might contribute to the genetic variation in these genes in Vietnam. Copyright © 2015 Elsevier B.V. All rights reserved.
Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

PubMed

Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

2015-07-01

The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
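The percentage ranges quoted above come from pairwise comparison of aligned sequences. The abstract does not say which distance measure was used, so the following is only a minimal sketch of one common choice, the uncorrected proportion of differing sites (p-distance). The function name and the toy sequences are invented for illustration and are not data from the study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences.

    Columns where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gapped or ambiguous columns
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned fragments (hypothetical, not real pcox1 data): one mismatch in ten sites.
toy_a = "ATGCTTGGAT"
toy_b = "ATGCTAGGAT"
print(f"{100 * p_distance(toy_a, toy_b):.1f}%")  # prints 10.0%
```

Applying such a function to every pair of isolates within a species, and then across species, would give ranges of the kind reported in the abstract.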
The unique C- and N-terminal sequences of Metallothionein isoform 3 mediate growth inhibition and Vectorial active transport in MCF-7 cells.

PubMed

Voels, Brent; Wang, Liping; Sens, Donald A; Garrett, Scott H; Zhang, Ke; Somji, Seema

2017-05-25

The 3rd isoform of the metallothionein (MT3) gene family has been shown to be overexpressed in most ductal breast cancers. A previous study has shown that the stable transfection of MCF-7 cells with the MT3 gene inhibits cell growth. The goal of the present study was to determine the role of the unique C-terminal and N-terminal sequences of MT3 on phenotypic properties and gene expression profiles of MCF-7 cells. MCF-7 cells were transfected with various metallothionein gene constructs which contain the insertion or the removal of the unique MT3 C- and N-terminal domains. Global gene expression analysis was performed on the MCF-7 cells containing the various constructs and the expression of the unique C- and N-terminal domains of MT3 was correlated to phenotypic properties of the cells. The results of the present study demonstrate that the C-terminal sequence of MT3, in the absence of the N-terminal sequence, induces dome formation in MCF-7 cells, which in cell cultures is the phenotypic manifestation of a cell's ability to perform vectorial active transport. Global gene expression analysis demonstrated that the increased expression of the GAGE gene family correlated with dome formation. Expression of the C-terminal domain induced GAGE gene expression, whereas the N-terminal domain inhibited it, and the inhibitory effect of the N-terminal domain was dominant over the effect of the C-terminal domain of MT3. Transfection with the metallothionein 1E gene increased the expression of GAGE genes. In addition, both the C- and the N-terminal sequences of the MT3 gene had growth inhibitory properties, which correlated to an increased expression of the interferon alpha-inducible protein 6. Our study shows that the C-terminal domain of MT3 confers dome formation in MCF-7 cells and the presence of this domain induces expression of the GAGE family of genes.
The differential effects of MT3 and metallothionein 1E on the expression of GAGE genes suggest unique roles of these genes in the development and progression of breast cancer. The finding that interferon alpha-inducible protein 6 expression is associated with the ability of MT3 to inhibit growth needs further investigation.

Identification of an expressed gene in Dipylidium caninum.

PubMed

Miranda, Rodrigo R C; Costa-Júnior, Livio M; Campos, Artur K; Santos, Hudson A; Rabelo, Elida M L

2004-10-01

Recombinant DNA studies have been focused on developing vaccines to different cestodes. But few studies involving Dipylidium caninum molecular biology and genes have been done. Only partial sequences of mitochondrial DNA and ribosomal RNA gene are available in databases. Any molecular work with this parasite, including epidemiology, study of drug-resistant strains, and vaccine development, is hampered by the lack of knowledge of its genome. Thus, the knowledge of specific genes of different developmental stages of D. caninum is crucial to locate potential targets to be used as candidates to develop a vaccine and/or new drugs against this parasite. Here we report, for the first time, the sequencing of a fragment of a D. caninum expressed gene.
The complete mitochondrial genome sequence of the Tibetan red fox (Vulpes vulpes montana).

PubMed

Zhang, Jin; Zhang, Honghai; Zhao, Chao; Chen, Lei; Sha, Weilai; Liu, Guangshuai

2015-01-01

In this study, the complete mitochondrial genome of the Tibetan red fox (Vulpes vulpes montana) was sequenced for the first time using blood samples obtained from a wild female red fox captured in Lhasa, Tibet, China. The Qinghai-Tibet Plateau is the highest plateau in the world, with an average elevation above 3500 m. Sequence analysis showed that the genome contains a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region (CR). The variable tandem repeats in the CR are the main reason for the length variability of the mitochondrial genome among canids.
Complete mitochondrial genome of Eagle Owl (Bubo bubo, Strigiformes; Strigidae) from China.

PubMed

Hengjiu, Tian; Jianwei, Ji; Shi, Yang; Zhiming, Zhang; Laghari, Muhammad Younis; Narejo, Naeem Tariq; Lashari, Punhal

2016-01-01

In the present study, the complete mitochondrial genome sequence of Bubo bubo using PCR amplification, sequencing and assembling has been obtained for the first time. The total length of the mitochondrial genome was 16,250 bp, with the base composition of 29.88% A, 34.16% C, 14.35% G, and 21.58% T. It contained 37 genes (2 ribosomal RNA genes, 13 protein-coding genes and 22 transfer RNA genes) and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of Bubo bubo provides an important data set for further investigation on the phylogenetic relationships within Strigiformes.
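Base composition figures like those above are obtained by counting each nucleotide and dividing by the sequence length. The sketch below shows that calculation on a short invented sequence. The function name is hypothetical, and ignoring non-ACGT symbols is an assumption of this example rather than something stated in the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T in a sequence, ignoring any other symbols."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100 * counts[base] / total for base in "ACGT"}


# Toy sequence (hypothetical, not from the Bubo bubo mitogenome).
for base, pct in base_composition("AACCCGTTAC").items():
    print(f"{base}: {pct:.2f}%")
```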
Cloning of a CACTA transposon-like insertion in intron I of tomato invertase Lin5 gene and identification of transposase-like sequences of Solanaceae species.

PubMed

Proels, Reinhard K; Roitsch, Thomas

2006-03-01

Very few CACTA transposon-like sequences have been described in Solanaceae species. Sequence information has been restricted to partial transposase (TPase)-like fragments, and no target gene of CACTA-like transposon insertion has been described in tomato to date. In this manuscript, we report on a CACTA transposon-like insertion in intron I of tomato (Lycopersicon esculentum) invertase gene Lin5 and TPase-like sequences of several Solanaceae species. Consensus primers deduced from the TPase region of the tomato CACTA transposon-like element allowed the amplification of similar sequences from various Solanaceae species of different subfamilies including Solaneae (Solanum tuberosum), Cestreae (Nicotiana tabacum) and Datureae (Datura stramonium). This demonstrates the ubiquitous presence of CACTA-like elements in Solanaceae genomes. The obtained partial sequences are highly conserved, and allow further detection and detailed analysis of CACTA-like transposons throughout Solanaceae species. CACTA-like transposon sequences make possible the evaluation of their use for genome analysis, functional studies of genes and the evolutionary relationships between plant species.
The nuclear 18S ribosomal RNA gene as a source of phylogenetic information in the genus Taenia.

PubMed

Yan, Hongbin; Lou, Zhongzi; Li, Li; Ni, Xingwei; Guo, Aijiang; Li, Hongmin; Zheng, Yadong; Dyachenko, Viktor; Jia, Wanzhong

2013-03-01

Most species of the genus Taenia are of considerable medical and veterinary significance. In this study, complete nuclear 18S rRNA gene sequences were obtained from seven members of genus Taenia [Taenia multiceps, Taenia saginata, Taenia asiatica, Taenia solium, Taenia pisiformis, Taenia hydatigena, and Taenia taeniaeformis] and a phylogeny inferred using these sequences. Most of the variable sites fall within the variable regions, V1-V5. We show that sequences from the nuclear 18S ribosomal RNA gene have considerable promise as sources of phylogenetic information within the genus Taenia. Furthermore, given that almost all the variable sites lie within defined variable portions of that gene, it will be appropriate and economical to sequence only those regions for additional species of Taenia.
A transcriptome resource for the koala (Phascolarctos cinereus): insights into koala retrovirus transcription and sequence diversity.

PubMed

Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter

2014-09-11

The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene. Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population.
This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org.
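The reciprocal best hit requirement mentioned in this abstract can be illustrated with a small sketch. It assumes the best hit of each sequence has already been determined in both search directions (for example from similarity-search output) and stored in two dictionaries. The identifiers and the function name are hypothetical, and this is not the authors' pipeline.

```python
def reciprocal_best_hits(best_a_to_b: dict, best_b_to_a: dict) -> list:
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Hypothetical best-hit tables: koala transcript -> wallaby gene, and the reverse.
koala_to_wallaby = {"koala_t1": "wal_g1", "koala_t2": "wal_g2", "koala_t3": "wal_g2"}
wallaby_to_koala = {"wal_g1": "koala_t1", "wal_g2": "koala_t3"}

print(reciprocal_best_hits(koala_to_wallaby, wallaby_to_koala))
# [('koala_t1', 'wal_g1'), ('koala_t3', 'wal_g2')]
```

Here `koala_t2` is dropped because its best hit, `wal_g2`, points back to a different koala transcript. This is how the filter reduces many assembled transcripts to a smaller set of putative one-to-one gene matches.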
MicroRNAs Form Triplexes with Double Stranded DNA at Sequence-Specific Binding Sites; a Eukaryotic Mechanism via which microRNAs Could Directly Alter Gene Expression

PubMed Central

Grace, Christy R.; Ferreira, Antonio M.; Waddell, M. Brett; Ridout, Granger; Naeve, Deanna; Leuze, Michael; LoCascio, Philip F.; Panetta, John C.; Wilkinson, Mark R.; Pui, Ching-Hon; Naeve, Clayton W.; Uberbacher, Edward C.; Bonten, Erik J.; Evans, William E.

2016-01-01

MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA) and typically down-regulating their stability or translation. Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence (i.e., NMR, FRET, SPR) that purine or pyrimidine-rich microRNAs of appropriate length and sequence form triple-helical structures with purine-rich sequences of duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs could interact with gene promoter regions to modify gene transcription. PMID:26844769
Development of PCR primers specific for the amplification and direct sequencing of gyrB genes from microbacteria, order Actinomycetales.

PubMed

Richert, Kathrin; Brambilla, Evelyne; Stackebrandt, Erko

2005-01-01

PCR primer sets were developed for the specific amplification and sequence analysis of the gene encoding gyrase subunit B (gyrB) in members of the family Microbacteriaceae, class Actinobacteria. The family contains species that are highly related according to 16S rRNA gene sequence analyses. In order to test whether gyrB gene sequence analysis is appropriate to discriminate between closely related species, we evaluated the 16S rRNA gene phylogeny of its members. As the published universal primer set for gyrB failed to amplify the corresponding gene of the majority of the 80 type strains of the family, three new primer sets were identified that generated fragments with a composite sequence length of about 900 nt. However, the amplification of all three fragments was successful in only 25% of the 80 type strains. In this study, the substitution frequencies in genes encoding gyrase and 16S rDNA were compared for 10 strains of nine genera. The frequency of gyrB nucleotide substitution is significantly higher than that of the 16S rDNA, and no linear correlation exists between the similarities of both molecules among members of the Microbacteriaceae. The phylogenetic analyses using the gyrB sequences provide higher resolution than those using 16S rDNA sequences and seem able to discriminate between closely related species.
Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

PubMed

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-06-01

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was approximately 98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
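Two of the assembly statistics above can be made concrete with a short sketch: N50 and the fraction of the estimated genome covered. The abstract does not describe how N50 was computed, so the function below uses the conventional definition (the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly). The toy length list is invented; the coverage figures are taken from the abstract.

```python
def n50(lengths):
    """Conventional N50: length at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy scaffold lengths (hypothetical): total 30, half 15, reached at the second scaffold.
print(n50([10, 8, 6, 4, 2]))  # prints 8

# Coverage of the k-mer-estimated genome size, using the figures in the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000
print(f"{100 * assembled_bp / estimated_genome_bp:.0f}%")  # prints 91%
```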
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, the genome sequence of Enterobacter aerogenes was analyzed bioinformatically to identify genes encoding gelatinase. The bacterium was isolated from hot spring water and is described as a gelatinase-producing bacterium specific to porcine and fish gelatin, which offers the possibility of producing enzymes specific to the gelatin of each species. Partial genome sequencing of E. aerogenes yielded a total sequence size of 5.0 megabase pairs (Mbp). The pre-processing pipeline reported 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, 67.5% of them longer than 1 kilobase pair (kbp), with an N50 contig length of 124,856 bp and a GC content of 55.17%. Protein prediction analysis identified about 4,705 protein-coding genes. Two candidate genes were selected because they had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases: NODE_9_length_26866_cov_148.013245_12, a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
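The nucleotide and amino acid lengths quoted for the two candidate genes are consistent with each other if each ORF length includes a terminal stop codon. That is an assumption of this sketch, not something the abstract states. The hypothetical helper below makes the arithmetic explicit.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp.

    Assumes the ORF is a whole number of codons; if includes_stop is True,
    the final codon is a stop codon and does not encode an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# ORF lengths reported in the abstract for the two candidate gelatinase genes.
print(protein_length_from_orf(1029))  # prints 342
print(protein_length_from_orf(717))   # prints 238
```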
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

PubMed

Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

2013-01-01

Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
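The quadripartite structure described above implies a simple consistency check: the large and small single-copy regions plus two copies of the inverted repeat should add up to the full genome length. The sketch below performs that check with the figures from the abstract; the function name is hypothetical.

```python
# Region lengths in bp, taken from the abstract above.
LSC = 82_695            # large single-copy region
SSC = 17_555            # small single-copy region
IR = 25_539             # one inverted repeat (the genome carries two copies)
REPORTED_TOTAL = 151_328


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular genome made of LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total, total == REPORTED_TOTAL)  # prints 151328 True
```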
An analysis of the sequence of the BAD gene among patients with maturity-onset diabetes of the young (MODY).

PubMed

Antosik, Karolina; Gnyś, Piotr; Jarosz-Chobot, Przemysława; Myśliwiec, Małgorzata; Szadkowska, Agnieszka; Małecki, Maciej; Młynarski, Wojciech; Borowiec, Maciej

2017-01-01

Monogenic diabetes is a rare disease caused by single gene mutations. Maturity onset diabetes of the young (MODY) is one of the major forms of monogenic diabetes recognised in the paediatric population. To date, 13 genes have been related to MODY development. The aim of the study was to analyse the sequence of the BCL2-associated agonist of cell death (BAD) gene in patients with clinical suspicion of GCK-MODY, but who were negative for glucokinase (GCK) gene mutations. A group of 122 diabetic patients were recruited from the "Polish Registry for Paediatric and Adolescent Diabetes - nationwide genetic screening for monogenic diabetes" project. The molecular testing was performed by Sanger sequencing. A total of 10 sequence variants of the BAD gene were identified in 122 analysed diabetic patients. Among the analysed patients suspected of MODY, one possible pathogenic variant was identified in one patient; however, further confirmation is required for a certain identification.
Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

PubMed Central

2012-01-01

Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

PubMed Central

Dasenko, Mark A.

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recent study in which a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for classifying true upstream prokaryotic sequences from background upstream sequences. The upstream sequences of conserved coding genes across genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position-specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS-based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), a widely used method for sequence classification. The upstream sequence classification performance was evaluated using cross-validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
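As an illustration of the PSSM idea mentioned in this abstract, the following minimal sketch builds a log-odds position-specific scoring matrix from aligned sequences and scores a candidate against it. It is not the authors' procedure: the toy sequences, the uniform background of 0.25, the pseudocount, and the function names `build_pssm` and `score` are all assumptions made for the example, and the PLS classification step itself is not shown.

```python
import math

ALPHABET = "ACGT"

def build_pssm(aligned_seqs, pseudocount=1.0, background=0.25):
    """Return one dict per alignment column mapping base -> log2-odds score."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in aligned_seqs]
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for base in ALPHABET:
            # Smoothed column frequency compared with a uniform background.
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, seq):
    """Sum the per-position log-odds scores of seq under the PSSM."""
    return sum(col[base] for col, base in zip(pssm, seq))

# Hypothetical aligned "upstream" sequences, invented for illustration only.
toy_upstream = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(toy_upstream)
print(score(pssm, "TATAAT") > score(pssm, "GCGCGC"))
```

Scores of this kind could serve as input features for a classifier such as PLS, RF or SVM; how the study actually encoded its features is not stated in the abstract.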
Regulatory elements of the floral homeotic gene AGAMOUS identified by phylogenetic footprinting and shadowing.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hong, R. L., Hamaguchi, L., Busch, M. A., and Weigel, D.

2003-06-01

OAK-B135 In Arabidopsis thaliana, cis-regulatory sequences of the floral homeotic gene AGAMOUS (AG) are located in the second intron. This 3 kb intron contains binding sites for two direct activators of AG, LEAFY (LFY) and WUSCHEL (WUS), along with other putative regulatory elements. We have used phylogenetic footprinting and the related technique of phylogenetic shadowing to identify putative cis-regulatory elements in this intron. Among 29 Brassicaceae, several other motifs, but not the LFY and WUS binding sites previously identified, are largely invariant. Using reporter gene analyses, we tested six of these motifs and found that they are all functionally important for activity of AG regulatory sequences in A. thaliana. Although there is little obvious sequence similarity outside the Brassicaceae, the intron from cucumber AG has at least partial activity in A. thaliana. Our studies underscore the value of the comparative approach as a tool that complements gene-by-gene promoter dissection, but also highlight that sequence-based studies alone are insufficient for a complete identification of cis-regulatory sites.
First Transcriptome and Digital Gene Expression Analysis in Neuroptera with an Emphasis on Chemoreception Genes in Chrysopa pallens (Rambur).

PubMed

Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie

2013-01-01

Chrysopa pallens (Rambur) are the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and also could lead to effective applications of C. pallens in integrated pest management. However no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers on this species. To facilitate identifying sets of genes involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries of different tissues were constructed and gene expression profiles were compared among different tissues in males and females. Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provided large-scale sequence information for further studies in C. pallens.
RNA sequencing to study gene expression and SNP variations associated with growth in zebrafish fed a plant protein-based diet.

PubMed

Ulloa, Pilar E; Rincón, Gonzalo; Islas-Trejo, Alma; Araneda, Cristian; Iturra, Patricia; Neira, Roberto; Medrano, Juan F

2015-06-01

The objectives of this study were to measure gene expression in zebrafish and then identify SNP to be used as potential markers in a growth association study. We developed an approach where muscle samples collected from low- and high-growth fish were analyzed using RNA-Sequencing (RNA-seq), and SNP were chosen from the genes that were differentially expressed between the low and high groups. A population of 24 families was fed a plant protein-based diet from the larval to adult stages. From a total of 440 males, 5 % of the fish from both tails of the weight gain distribution were selected. Total RNA was extracted from individual muscle of 8 low-growth and 8 high-growth fish. Two pooled RNA-Seq libraries were prepared for each phenotype using 4 fish per library. Libraries were sequenced using the Illumina GAII Sequencer and analyzed using the CLCBio genomic workbench software. One hundred and twenty-four genes were differentially expressed between phenotypes (p value < 0.05 and FDR < 0.2). From these genes, 164 SNP were selected and genotyped in 240 fish samples. Marker-trait analysis revealed 5 SNP associated with growth in key genes (Nars, Lmod2b, Cuzd1, Acta1b, and Plac8l1). These genes are good candidates for further growth studies in fish and to consider for identification of potential SNPs associated with different growth rates in response to a plant protein-based diet.
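The tail-selection step described in this abstract (5 % of 440 males from each end of the weight-gain distribution, i.e. 22 fish per tail) can be sketched as follows. The weight-gain values are synthetic and the function name `select_tails` is hypothetical; the study's actual measurements and selection code are not given in the record.

```python
import random

def select_tails(values, fraction=0.05):
    """Return (low_ids, high_ids): indices of the lowest and highest `fraction` of values."""
    n_tail = round(len(values) * fraction)
    # Indices sorted by ascending value; the ends of this list are the two tails.
    order = sorted(range(len(values)), key=lambda i: values[i])
    return order[:n_tail], order[len(order) - n_tail:]

# Synthetic weight gains for 440 fish (invented numbers, fixed seed for repeatability).
rng = random.Random(0)
weight_gain = [rng.gauss(100.0, 15.0) for _ in range(440)]

low_ids, high_ids = select_tails(weight_gain)
print(len(low_ids), len(high_ids))
```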
Complexity and Entropy Analysis of DNMT1 Gene

USDA-ARS's Scientific Manuscript database

Background: The application of complexity information on DNA sequence and protein in biological processes are well established in this study. Available sequences for DNMT1 gene, which is a maintenance methyltransferase is responsible for copying DNA methylation patterns to the daughter strands durin...
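The record above is truncated and does not say which complexity or entropy measures were applied. Purely as an illustration of what an entropy analysis of a DNA sequence can look like, the sketch below computes the Shannon entropy of the k-mer distribution of a sequence; the choice of measure, the function name `shannon_entropy`, and the toy sequences are assumptions, not details from the study.

```python
import math
from collections import Counter

def shannon_entropy(seq, k=1):
    """Shannon entropy in bits of the overlapping k-mer distribution of seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    n = len(kmers)
    # H = sum over k-mers of p * log2(1 / p), with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

print(shannon_entropy("AAAAAAAA"))  # one repeated base: no uncertainty
print(shannon_entropy("ACGTACGT"))  # four equally frequent bases
```

Low values indicate repetitive, low-complexity sequence, while values near `2 * k` bits indicate a nearly uniform k-mer composition.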

Pooled Resequencing of 122 Ulcerative Colitis Genes in a Large Dutch Cohort Suggests Population-Specific Associations of Rare Variants in MUC2.

PubMed

Visschedijk, Marijn C; Alberts, Rudi; Mucha, Soren; Deelen, Patrick; de Jong, Dirk J; Pierik, Marieke; Spekhorst, Lieke M; Imhann, Floris; van der Meulen-de Jong, Andrea E; van der Woude, C Janneke; van Bodegraven, Adriaan A; Oldenburg, Bas; Löwenberg, Mark; Dijkstra, Gerard; Ellinghaus, David; Schreiber, Stefan; Wijmenga, Cisca; Rivas, Manuel A; Franke, Andre; van Diemen, Cleo C; Weersma, Rinse K

2016-01-01

Genome-wide association studies have revealed several common genetic risk variants for ulcerative colitis (UC). However, little is known about the contribution of rare, large effect genetic variants to UC susceptibility. In this study, we performed a deep targeted re-sequencing of 122 genes in Dutch UC patients in order to investigate the contribution of rare variants to the genetic susceptibility to UC. The selection of genes consists of 111 established human UC susceptibility genes and 11 genes that lead to spontaneous colitis when knocked-out in mice. In addition, we sequenced the promoter regions of 45 genes where known variants exert cis-eQTL-effects. Targeted pooled re-sequencing was performed on DNA of 790 Dutch UC cases. The Genome of the Netherlands project provided sequence data of 500 healthy controls. After quality control and prioritization based on allele frequency and pathogenicity probability, follow-up genotyping of 171 rare variants was performed on 1021 Dutch UC cases and 1166 Dutch controls. Single-variant association and gene-based analyses identified an association of rare variants in the MUC2 gene with UC. The associated variants in the Dutch population could not be replicated in a German replication cohort (1026 UC cases, 3532 controls). In conclusion, this study has identified a putative role for MUC2 on UC susceptibility in the Dutch population and suggests a population-specific contribution of rare variants to UC.
Novel aromatic ring-hydroxylating dioxygenase genes from coastal marine sediments of Patagonia

PubMed Central

Lozada, Mariana; Riva Mercadal, Juan P; Guerrero, Leandro D; Di Marzio, Walter D; Ferrero, Marcela A; Dionisi, Hebe M

2008-01-01

Background Polycyclic aromatic hydrocarbons (PAHs), widespread pollutants in the marine environment, can produce adverse effects in marine organisms and can be transferred to humans through seafood. Our knowledge of PAH-degrading bacterial populations in the marine environment is still very limited, and mainly originates from studies of cultured bacteria. In this work, genes coding catabolic enzymes from PAH-biodegradation pathways were characterized in coastal sediments of Patagonia with different levels of PAH contamination. Results Genes encoding for the catalytic alpha subunit of aromatic ring-hydroxylating dioxygenases (ARHDs) were amplified from intertidal sediment samples using two different primer sets. Products were cloned and screened by restriction fragment length polymorphism analysis. Clones representing each restriction pattern were selected in each library for sequencing. A total of 500 clones were screened in 9 gene libraries, and 193 clones were sequenced. Libraries contained one to five different ARHD gene types, and this number was correlated with the number of PAHs found in the samples above the quantification limit (r = 0.834, p < 0.05). Overall, eight different ARHD gene types were detected in the sediments. In five of them, their deduced amino acid sequences formed deeply rooted branches with previously described ARHD peptide sequences, exhibiting less than 70% identity to them. They contain consensus sequences of the Rieske type [2Fe-2S] cluster binding site, suggesting that these gene fragments encode for ARHDs. On the other hand, three gene types were closely related to previously described ARHDs: archetypical nahAc-like genes, phnAc-like genes as identified in Alcaligenes faecalis AFK2, and phnA1-like genes from marine PAH-degraders from the genus Cycloclasticus. Conclusion These results show the presence of hitherto unidentified ARHD genes in this sub-Antarctic marine environment exposed to anthropogenic contamination. 
This information can be used to study the geographical distribution and ecological significance of bacterial populations carrying these genes, and to design molecular assays to monitor the progress and effectiveness of remediation technologies. PMID:18366740
Digital Gene Expression Analysis Based on De Novo Transcriptome Assembly Reveals New Genes Associated with Floral Organ Differentiation of the Orchid Plant Cymbidium ensifolium

PubMed Central

Yang, Fengxi; Zhu, Genfa

2015-01-01

Cymbidium ensifolium belongs to the genus Cymbidium of the orchid family. Owing to its spectacular flower morphology, C. ensifolium has considerable ecological and cultural value. However, limited genetic data is available for this non-model plant, and the molecular mechanism underlying floral organ identity is still poorly understood. In this study, we characterize the floral transcriptome of C. ensifolium and present, for the first time, extensive sequence and transcript abundance data of individual floral organs. After sequencing, over 10 Gb clean sequence data were generated and assembled into 111,892 unigenes with an average length of 932.03 base pairs, including 1,227 clusters and 110,665 singletons. Assembled sequences were annotated with gene descriptions, gene ontology, clusters of orthologous group terms, the Kyoto Encyclopedia of Genes and Genomes, and the plant transcription factor database. From these annotations, 131 flowering-associated unigenes, 61 CONSTANS-LIKE (COL) unigenes and 90 floral homeotic genes were identified. In addition, four digital gene expression libraries were constructed for the sepal, petal, labellum and gynostemium, and 1,058 genes corresponding to individual floral organ development were identified. Among them, eight MADS-box genes were further investigated by full-length cDNA sequence analysis and expression validation, which revealed two APETALA1/AGL9-like MADS-box genes preferentially expressed in the sepal and petal, two AGAMOUS-like genes particularly restricted to the gynostemium, and four DEF-like genes distinctively expressed in different floral organs. The spatial expression of these genes varied distinctly in different floral mutant corresponding to different floral morphogenesis, which validated the specialized roles of them in floral patterning and further supported the effectiveness of our in silico analysis. 
This dataset generated in our study provides new insights into the molecular mechanisms underlying floral patterning of Cymbidium and supports a valuable resource for molecular breeding of the orchid plant. PMID:26580566
Completion of the mitochondrial genome sequence of onion (Allium cepa L.) containing the CMS-S male-sterile cytoplasm and identification of an independent event of the ccmFN gene split.

PubMed

Kim, Bongju; Kim, Kyunghee; Yang, Tae-Jin; Kim, Sunggil

2016-11-01

Cytoplasmic male-sterility (CMS) conferred by the CMS-S cytoplasm has been most commonly used for onion (Allium cepa L.) F1 hybrid seed production. We first report the complete mitochondrial genome sequence containing CMS-S cytoplasm in this study. Initially, seven contigs were de novo assembled from 150-bp paired-end raw reads produced from the total genomic DNA using the Illumina NextSeq500 platform. These contigs were connected into a single circular genome consisting of 316,363 bp (GenBank accession: KU318712) by PCR amplification. Although all 24 core protein-coding genes were present, no ribosomal protein-coding genes, except rps12, were identified in the onion mitochondrial genome. Unusual trans-splicing of the cox2 gene was verified, and the cox1 gene was identified as part of the chimeric orf725 gene, which is a candidate gene responsible for inducing CMS. In addition to orf725, two small chimeric genes were identified, but no transcripts were detected for these two open reading frames. Thirteen chloroplast-derived sequences, with sizes of 126-13,986 bp, were identified in the intergenic regions. Almost 10 % of the onion mitochondrial genome was composed of repeat sequences. The vast majority of repeats were short repeats of <100 base pairs. Interestingly, the gene encoding ccmFN was split into two genes. This is the first time the ccmFN gene split has been identified outside the Brassicaceae family. The breakpoint in the onion ccmFN gene was different from that of Brassicaceae species. This split of the ccmFN gene was also present in 30 other Allium species. The complete onion mitochondrial genome sequence reported in this study would be fundamental information for elucidation of onion CMS evolution.
Dealing with the incidental finding of secondary variants by the example of SRNS patients undergoing targeted next-generation sequencing.

PubMed

Weber, Stefanie; Büscher, Anja K; Hagmann, Henning; Liebau, Max C; Heberle, Christian; Ludwig, Michael; Rath, Sabine; Alberer, Martin; Beissert, Antje; Zenker, Martin; Hoyer, Peter F; Konrad, Martin; Klein, Hanns-Georg; Hoefele, Julia

2016-01-01

Steroid-resistant nephrotic syndrome (SRNS) is a severe cause of progressive renal disease. Genetic forms of SRNS can present with autosomal recessive or autosomal dominant inheritance. Recent studies have identified mutations in multiple podocyte genes responsible for SRNS. Improved sequencing methods (next-generation sequencing, NGS) now promise rapid mutational testing of SRNS genes. In the present study, a simultaneous screening of ten SRNS genes in 37 SRNS patients was performed by NGS. In 38 % of the patients, causative mutations in one SRNS gene were found. In 22 % of the patients, in addition to these mutations, a secondary variant in a different gene was identified. This high incidence of accumulating sequence variants was unexpected but, although they might have modifier effects, the pathogenic potential of these additional sequence variants seems unclear so far. The example of molecular diagnostics by NGS in SRNS patients shows that these new sequencing technologies might provide further insight into molecular pathogenicity in genetic disorders but will also generate results, which will be difficult to interpret and complicate genetic counseling. Although NGS promises more frequent identification of disease-causing mutations, the identification of causative mutations, the interpretation of incidental findings and possible pitfalls might pose problems, which hopefully will decrease by further experience and elucidation of molecular interactions.
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds

PubMed Central

Dean, Rebecca; Harrison, Peter W.; Wright, Alison E.; Zimmer, Fabian; Mank, Judith E.

2015-01-01

The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. PMID:26067773
Influence of 5'-flanking sequence on 4.5SI RNA gene transcription by RNA polymerase III.

PubMed

Gogolevskaya, Irina K; Stasenko, Danil V; Tatosyan, Karina A; Kramerov, Dmitri A

2018-05-01

Short nuclear 4.5SI RNA can be found in three related rodent families. Its function remains unknown. The genes of 4.5SI RNA contain an internal promoter of RNA polymerase III composed of the boxes A and B. Here, the effect of the sequence immediately upstream of the mouse 4.5SI RNA gene on its transcription was studied. The gene with deletions and substitutions in the 5'-flanking sequence was used to transfect HeLa cells and its transcriptional activity was evaluated from the cellular level of 4.5SI RNA. Single-nucleotide substitutions in the region adjacent to the transcription start site (positions -2 to -8) decreased the expression activity of the gene down to 40%-60% of the control. The substitution of the conserved pentanucleotide AGAAT (positions -14 to -18) could either decrease (43%-56%) or increase (134%) the gene expression. A TATA-like box (TACATGA) was found at positions -24 to -30 of the 4.5SI RNA gene. Its replacement with a polylinker fragment of the vector did not decrease the transcription level, while its replacement with a GC-rich sequence almost completely (down to 2%-5%) suppressed the transcription of the 4.5SI RNA gene. The effect of plasmid sequences bordering the gene on its transcription by RNA polymerase III is discussed.
Contribution of WUSCHEL-related homeobox (WOX) genes to identify the phylogenetic relationships among Petunia species

PubMed Central

Segatto, Ana Lúcia Anversa; Thompson, Claudia Elizabeth; Freitas, Loreta Brandão

2016-01-01

Abstract Developmental genes are believed to contribute to major changes during plant evolution, from infrageneric to higher levels. Due to their putative high sequence conservation, developmental genes are rarely used as molecular markers, and few studies including these sequences at low taxonomic levels exist. WUSCHEL-related homeobox genes (WOX) are transcription factors exclusively present in plants and are involved in developmental processes. In this study, we characterized the infrageneric genetic variation of Petunia WOX genes. We obtained phylogenetic relationships consistent with other phylogenies based on nuclear markers, but with higher statistical support, resolution in terminals, and compatibility with flower morphological changes. PMID:27768156
Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

PubMed Central

2011-01-01

Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies. PMID:21749684
Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes.

PubMed

Taylor, Louis J; Strebel, Klaus

2017-01-07

Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts. Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes. Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface ( http://louiejtaylor.github.io/pyViKO/ ), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
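The design problem described in this abstract can be illustrated with a small sketch: find a single-base change that introduces a premature stop codon in a target reading frame while leaving the protein encoded in an overlapping frame unchanged. This is not Pyviko's code or API. The function names, the assumption that both genes lie on the same strand, the use of the standard genetic code, and the toy sequence are all hypothetical choices made for the example.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

def translate(seq, frame=0):
    """Translate seq from the given frame offset, ignoring a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))

def find_knockouts(seq, target_frame, other_frame, max_codons=20):
    """List (position, new_base, codon_index) for single-base changes that create a
    stop codon in the target frame without altering the other frame's protein."""
    other_protein = translate(seq, other_frame)
    hits = []
    for codon_index in range(max_codons):
        start = target_frame + 3 * codon_index
        codon = seq[start:start + 3]
        if len(codon) < 3 or codon in STOPS:
            break  # ran off the sequence, or the target frame already ends here
        for offset in range(3):
            for base in "ACGT":
                if base == codon[offset]:
                    continue
                new_codon = codon[:offset] + base + codon[offset + 1:]
                if new_codon not in STOPS:
                    continue
                pos = start + offset
                mutant = seq[:pos] + base + seq[pos + 1:]
                if translate(mutant, other_frame) == other_protein:
                    hits.append((pos, base, codon_index))
    return hits

# Toy sequence: target gene read in frame 0, overlapping gene read in frame 1.
toy = "ATGGCCCAAGGTACC"
print(find_knockouts(toy, target_frame=0, other_frame=1))
print(find_knockouts(toy, target_frame=0, other_frame=2))
```

In the toy sequence, changing C to T at index 6 turns the frame-0 codon CAA into TAA while the frame-1 codon changes from CCC to CCT, both proline. The same change alters the protein read in frame 2, so no knockout is reported for that pairing, which reflects why such designs are not always possible.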
Using phylogenetically-informed annotation (PIA) to search for light-interacting genes in transcriptomes from non-model organisms.

PubMed

Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H

2014-11-19

Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 
Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yang, Shihui; Vera, Jessica M.; Grass, Jeff

Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion. 
The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Furthermore, plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.
Complete genome sequence and the expression pattern of plasmids of the model ethanologen Zymomonas mobilis ZM4 and its xylose-utilizing derivatives 8b and 2032

DOE PAGES

Yang, Shihui; Vera, Jessica M.; Grass, Jeff; ...

2018-05-02

Zymomonas mobilis is a natural ethanologen being developed and deployed as an industrial biofuel producer. To date, eight Z. mobilis strains have been completely sequenced and found to contain 2-8 native plasmids. However, systematic verification of predicted Z. mobilis plasmid genes and their contribution to cell fitness has not been hitherto addressed. Moreover, the precise number and identities of plasmids in Z. mobilis model strain ZM4 have been unclear. The lack of functional information about plasmid genes in ZM4 impedes ongoing studies for this model biofuel-producing strain. In this study, we determined the complete chromosome and plasmid sequences of ZM4 and its engineered xylose-utilizing derivatives 2032 and 8b. Compared to previously published and revised ZM4 chromosome sequences, the ZM4 chromosome sequence reported here contains 65 nucleotide sequence variations as well as a 2400-bp insertion. Four plasmids were identified in all three strains, with 150 plasmid genes predicted in strain ZM4 and 2032, and 153 plasmid genes predicted in strain 8b due to the insertion of heterologous DNA for expanded substrate utilization. Plasmid genes were then annotated using Blast2GO, InterProScan, and systems biology data analyses, and most genes were found to have apparent orthologs in other organisms or identifiable conserved domains. To verify plasmid gene prediction, RNA-Seq was used to map transcripts and also compare relative gene expression under various growth conditions, including anaerobic and aerobic conditions, or growth in different concentrations of biomass hydrolysates. Overall, plasmid genes were more responsive to varying hydrolysate concentrations than to oxygen availability. Additionally, our results indicated that although all plasmids were present in low copy number (about 1-2 per cell), the copy number of some plasmids varied under specific growth conditions or due to heterologous gene insertion. 
The complete genome of ZM4 and two xylose-utilizing derivatives is reported in this study, with an emphasis on identifying and characterizing plasmid genes. Furthermore, plasmid gene annotation, validation, expression levels at growth conditions of interest, and contribution to host fitness are reported for the first time.
Characteristics of the Lotus japonicus gene repertoire deduced from large-scale expressed sequence tag (EST) analysis.

PubMed

Asamizu, Erika; Nakamura, Yasukazu; Sato, Shusei; Tabata, Satoshi

2004-02-01

To perform a comprehensive analysis of genes expressed in a model legume, Lotus japonicus, a total of 74472 3'-end expressed sequence tags (EST) were generated from cDNA libraries produced from six different organs. Clustering of sequences was performed with an identity criterion of 95% for 50 bases, and a total of 20457 non-redundant sequences, 8503 contigs and 11954 singletons were generated. EST sequence coverage was analyzed by using the annotated L. japonicus genomic sequence and 1093 of the 1889 predicted protein-encoding genes (57.9%) were hit by the EST sequence(s). Gene content was compared to several plant species. Among the 8503 contigs, 471 were identified as sequences conserved only in leguminous species and these included several disease resistance-related genes. This suggested that in legumes, these genes may have evolved specifically to resist pathogen attack. The rate of gene sequence divergence was assessed by comparing similarity level and functional category based on the Gene Ontology (GO) annotation of Arabidopsis genes. This revealed that genes encoding ribosomal proteins, as well as those related to translation, photosynthesis, and cellular structure were more abundantly represented in the highly conserved class, and that genes encoding transcription factors and receptor protein kinases were abundantly represented in the less conserved class. To make the sequence information and the cDNA clones available to the research community, a Web database with useful services was created at http://www.kazusa.or.jp/en/plant/lotus/EST/.
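The clustering criterion quoted above (95% identity over 50 bases) can be illustrated with a minimal sketch. The abstract does not describe the clustering software or how sequences were aligned, so the function below is an assumption-laden simplification: it takes two sequences that are already aligned at the same offset, ignores gaps, and slides a 50-base window looking for at least 95% matching positions. The function name and parameters are hypothetical.

```python
def passes_identity_criterion(a: str, b: str, window: int = 50,
                              min_identity: float = 0.95) -> bool:
    """Return True if some ungapped window of `window` bases, taken at the
    same offset in both sequences, reaches `min_identity`.

    Simplification (assumption): the sequences are pre-aligned and gap-free;
    a real EST clustering tool would align them first.
    """
    n = min(len(a), len(b))
    if n < window:
        return False
    # Per-position match flags over the shared length.
    matches = [x == y for x, y in zip(a[:n].upper(), b[:n].upper())]
    needed = min_identity * window
    count = sum(matches[:window])  # matches in the first window
    if count >= needed:
        return True
    # Slide the window one base at a time, updating the running count.
    for i in range(window, n):
        count += matches[i] - matches[i - window]
        if count >= needed:
            return True
    return False
```

With the default thresholds, a window needs at least 48 matching bases out of 50 to pass, since 47 of 50 is only 94%.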
Evolution of meiotic recombination genes in maize and teosinte.

PubMed

Sidhu, Gaganpreet K; Warzecha, Tomasz; Pawlowski, Wojciech P

2017-01-25

Meiotic recombination is a major source of genetic variation in eukaryotes. The role of recombination in evolution is recognized but little is known about how evolutionary forces affect the recombination pathway itself. Although the recombination pathway is fundamentally conserved across different species, genetic variation in recombination components and outcomes has been observed. Theoretical predictions and empirical studies suggest that changes in the recombination pathway are likely to provide adaptive abilities to populations experiencing directional or strong selection pressures, such as those occurring during species domestication. We hypothesized that adaptive changes in recombination may be associated with adaptive evolution patterns of genes involved in meiotic recombination. To examine how maize evolution and domestication affected meiotic recombination genes, we studied patterns of sequence polymorphism and divergence in eleven genes controlling key steps in the meiotic recombination pathway in a diverse set of maize inbred lines and several accessions of teosinte, the wild ancestor of maize. We discovered that, even though the recombination genes generally exhibited high sequence conservation expected in a pathway controlling a key cellular process, they showed substantial levels and diverse patterns of sequence polymorphism. Among others, we found differences in sequence polymorphism patterns between tropical and temperate maize germplasms. Several recombination genes displayed patterns of polymorphism indicative of adaptive evolution. Despite their ancient origin and overall sequence conservation, meiotic recombination genes can exhibit extensive and complex patterns of molecular evolution. Changes in these genes could affect the functioning of the recombination pathway, and may have contributed to the successful domestication of maize and its expansion to new cultivation areas.
Evolutionary Distance of Amino Acid Sequence Orthologs across Macaque Subspecies: Identifying Candidate Genes for SIV Resistance in Chinese Rhesus Macaques

PubMed Central

Ross, Cody T.; Roodgar, Morteza; Smith, David Glenn

2015-01-01

We use the Reciprocal Smallest Distance (RSD) algorithm to identify amino acid sequence orthologs in the Chinese and Indian rhesus macaque draft sequences and estimate the evolutionary distance between such orthologs. We then use GOanna to map gene function annotations and human gene identifiers to the rhesus macaque amino acid sequences. We conclude methodologically by cross-tabulating a list of amino acid orthologs with large divergence scores with a list of genes known to be involved in SIV or HIV pathogenesis. We find that many of the amino acid sequences with large evolutionary divergence scores, as calculated by the RSD algorithm, have been shown to be related to HIV pathogenesis in previous laboratory studies. Four of the strongest candidate genes for SIVmac resistance in Chinese rhesus macaques identified in this study are CDK9, CXCL12, TRIM21, and TRIM32. Additionally, ANKRD30A, CTSZ, GORASP2, GTF2H1, IL13RA1, MUC16, NMDAR1, Notch1, NT5M, PDCD5, RAD50, and TM9SF2 were identified as possible candidates, among others. We failed to find many laboratory experiments contrasting the effects of Indian and Chinese orthologs at these sites on SIVmac pathogenesis, but future comparative studies might hold fertile ground for research into the biological mechanisms underlying innate resistance to SIVmac in Chinese rhesus macaques. PMID:25884674
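The reciprocal step at the heart of the RSD idea can be sketched as follows. This is a simplified illustration, not the published implementation: the real algorithm derives its distances from similarity searches, alignments and maximum likelihood estimates, whereas here the distances are supplied directly as a hypothetical dictionary. The function name and the sequence identifiers are invented for the example.

```python
def reciprocal_smallest_distance(dist):
    """Pair sequences that are each other's closest partner.

    dist maps (id_in_genome_a, id_in_genome_b) -> evolutionary distance.
    Returns (a, b, distance) tuples, most divergent pair first.
    Ties are resolved in favour of the first entry seen.
    """
    best_for_a = {}  # a -> (closest b, distance)
    best_for_b = {}  # b -> (closest a, distance)
    for (a, b), d in dist.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only when the choice is mutual.
    pairs = [(a, b, d) for a, (b, d) in best_for_a.items()
             if best_for_b[b][0] == a]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical distances between sequences of two draft genomes.
example = {
    ("A1", "B1"): 0.02,
    ("A1", "B2"): 0.40,
    ("A2", "B2"): 0.15,
    ("A3", "B2"): 0.30,
}
print(reciprocal_smallest_distance(example))
```

Sorting by distance puts the most divergent ortholog pairs first, which mirrors the way the study shortlists sequences with large divergence scores before cross-tabulating them against known pathogenesis genes.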
Strain variation, based on the hemagglutinin gene, in Norwegian ISA virus isolates collected from 1987 to 2001: indications of recombination.

PubMed

Devold, M; Falk, K; Dale, B; Krossøy, B; Biering, E; Aspehaug, V; Nilsen, F; Nylund, A

2001-11-08

Infectious salmon anemia (ISA) is caused by a virus that probably belongs to the Orthomyxoviridae and was first recorded in Norway in 1984. The disease has since spread along the Norwegian coast and has later been found in Canada, Scotland, the Faroe Islands, Chile, and the USA. This study presents sequence variation of the hemagglutinin gene from 37 ISA virus isolates, viz. one isolate from Scotland, one from Canada and 35 from Norway. The hemagglutinin gene contains a highly polymorphic region (HPR), which together with the rest of the gene sequence provides a good tool for studies of epizootics. The gene shows temporal and geographical sequence variation, where certain areas are dominated by distinct groups of isolates. Evidence of transmission of ISA virus isolates within and between regions is given. It is suggested that the hemagglutinin gene from different isolates may recombine. Possible recombination sites are found within the HPR and in the 5'-end flanking region close to the HPR.
Skin Microbiome Surveys Are Strongly Influenced by Experimental Design.

PubMed

Meisel, Jacquelyn S; Hannigan, Geoffrey D; Tyldsley, Amanda S; SanMiguel, Adam J; Hodkinson, Brendan P; Zheng, Qi; Grice, Elizabeth A

2016-05-01

Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provides more precise microbial community characterizations. Most widely used protocols were developed to characterize microbiota of other habitats (i.e., gastrointestinal) and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource and cost intensive, provides evidence of a community's functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This study highlights the importance of experimental design for downstream results in skin microbiome surveys. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
De novo sequencing analysis of the Rosa roxburghii fruit transcriptome reveals putative ascorbate biosynthetic genes and EST-SSR markers.

PubMed

Yan, Xiuqin; Zhang, Xue; Lu, Min; He, Yong; An, Huaming

2015-04-25

Rosa roxburghii Tratt. is a well-known ornamental rose species native to China. In addition, the fruits of this species are valued for their nutritional and medicinal characteristics, especially their high ascorbic acid (AsA) levels. Nevertheless, AsA biosynthesis in R. roxburghii fruit has not been explored in detail because of a lack of genomic resources for this species. High-throughput transcriptomic sequencing generating large volumes of transcript sequence data can aid in gene discovery and molecular marker development. In this study, we generated more than 53 million clean reads using Illumina paired-end sequencing technology. De novo assembly yielded 106,590 unigenes, with an average length of 343 bp. On the basis of sequence similarity to known proteins, 9301 and 2393 unigenes were classified into Gene Ontology and Clusters of Orthologous Groups categories, respectively. There were 7480 unigenes assigned to 124 pathways in the Kyoto Encyclopedia of Genes and Genomes pathway database. BLASTx searches identified 498 unique putative transcripts encoding various transcription factors, some known to regulate fruit development. qRT-PCR validated the expression of most of the genes encoding the main enzymes involved in ascorbate biosynthesis. In addition, 9131 potential simple sequence repeat (SSR) loci were identified among the unigenes. One hundred and two primer pairs were synthesized and 71 pairs produced an amplification product during initial screening. Among the amplified products, 30 were polymorphic in the 16 R. roxburghii germplasms tested. Our study was the first to produce a large volume of transcriptome data from R. roxburghii. The resulting sequence collection is a valuable resource for gene discovery and marker-assisted selective breeding in this rose species. Copyright © 2015 Elsevier B.V. All rights reserved.
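Screening unigenes for SSR loci, as described above, amounts to searching for short motifs repeated in tandem. The abstract names neither the tool nor the thresholds used, so the sketch below is only an illustration with assumed settings: motifs of 2 to 6 bases repeated at least five times. The function name and the toy sequence are hypothetical.

```python
import re


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 6,
              min_repeats: int = 5):
    """Return (start, unit, repeat_count) for tandem repeats in `seq`.

    `start` is a 0-based offset. The shortest repeating unit is preferred,
    so a homopolymer run would be reported with a two-base unit such as "AA".
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


# Toy sequence containing five tandem copies of "GA".
print(find_ssrs("TTGAGAGAGAGAGCC"))  # [(2, 'GA', 5)]
```

Loci found this way are only candidates; as the abstract notes, primer pairs still have to be synthesized and tested before a locus can serve as a marker.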
Serratia marcescens outbreak in a neonatal intensive care unit (NICU): new insights from next-generation sequencing applications.

PubMed

Martineau, Christine; Li, Xuejing; Lalancette, Cindy; Perreault, Thérèse; Fournier, Eric; Tremblay, Julien; Gonzales, Milagros; Yergeau, Étienne; Quach, Caroline

2018-06-13

Serratia marcescens is an environmental bacterium commonly associated with outbreaks in neonatal intensive care units (NICU). Investigation of S. marcescens outbreaks requires efficient recovery and typing of clinical and environmental isolates. In this study, we described how the use of next-generation sequencing applications, such as bacterial whole-genome sequencing (WGS) and bacterial community profiling, could improve S. marcescens outbreak investigation. Phylogenomic links and potential antibiotic resistance genes and plasmids in S. marcescens isolates were investigated using WGS, while bacterial communities and relative abundances of Serratia in environmental samples were assessed using sequencing of bacterial phylogenetic marker genes (16S rRNA and gyrB genes). Typing results obtained using WGS for the ten S. marcescens isolates recovered during a NICU outbreak investigation were highly consistent with those from pulsed-field gel electrophoresis (PFGE), the current gold standard typing method for this bacterium. WGS also allowed for the identification of genes associated with antibiotic resistance in all isolates, while no plasmid was detected. Sequencing of the 16S rRNA and gyrB genes both showed higher relative abundances of Serratia in environmental sampling sites that were in close contact with infected babies. Much lower relative abundances of Serratia were observed following disinfection of a room, indicating that the protocol used was efficient. Variations in the bacterial community composition and structure following room disinfection and between sampling sites were also identified through 16S rRNA gene sequencing. Overall, results from this study highlight the potential for next-generation sequencing tools to improve and facilitate outbreak investigation. Copyright © 2018 American Society for Microbiology.

The phosphotransferase system-dependent sucrose utilization regulon in enteropathogenic Escherichia coli strains is located in a variable chromosomal region containing iap sequences.

PubMed

Treviño-Quintanilla, Luis Gerardo; Escalante, Adelfo; Caro, Alma Delia; Martínez, Alfredo; González, Ricardo; Puente, José Luis; Bolívar, Francisco; Gosset, Guillermo

2007-01-01

The capacity to utilize sucrose as a carbon and energy source (Scr(+) phenotype) is a highly variable trait among Escherichia coli strains. In this study, seven enteropathogenic E. coli (EPEC) strains from different sources were studied for their capacity to grow using sucrose. Liquid media cultures showed that all analyzed strains have the Scr(+) phenotype and two distinct groups were defined: one of five and another of two strains displaying doubling times of 67 and 125 min, respectively. The genes conferring the Scr(+) phenotype in one of the fast-growing strains (T19) were cloned and sequenced. Comparative sequence analysis revealed that this strain possesses the scr regulon genes scrKYABR, encoding phosphoenolpyruvate:phosphotransferase system-dependent sucrose transport and utilization activities. Transcript level quantification revealed sucrose-dependent induction of scrK and scrR genes in fast-growing strains, whereas no transcripts were detected in slow-growing strains. Sequence comparison analysis revealed that the scr genes in strain T19 are almost identical to those present in the scr regulon of prototype EPEC E2348/69 and in both strains, the scr genes are inserted in the chromosomal intergenic region of hypothetical genes ygcE and ygcF. Comparison of the ygcE-ygcF intergenic region sequence of strains MG1655, enterohemorrhagic EDL933, uropathogenic ECFT073 and EPEC T19-E2348/69 revealed that the number of extragenic highly repeated iap sequences corresponded to nine, four, two and none, respectively. These results show that the iap sequence-containing chromosomal ygcE-ygcF intergenic region is highly variable in E. coli. Copyright (c) 2007 S. Karger AG, Basel.
RNA editing: trypanosomes rewrite the genetic code.

PubMed

Stuart, K

1998-01-01

The understanding of how genetic information is stored and expressed has advanced considerably since the "central dogma" asserted that genetic information flows from the nucleotide sequence of DNA to that of messenger RNA (mRNA) which in turn specifies the amino acid sequence of a protein. It was found that genetic information can be stored as RNA (e.g. in RNA viruses) and can flow from RNA to DNA by reverse transcriptase enzyme activity. In addition, some genes contain introns, nucleotide sequences that are removed from their RNA (by RNA splicing) and thus are not represented in the resultant protein. Furthermore, alternative splicing was found to produce variant proteins from a single gene. More recently, the study of trypanosome parasites revealed an unexpected and indeed counter-intuitive genetic complexity. Genetic information for a single protein can be dispersed among several (DNA) genes in these organisms. One of these genes specifies an encrypted precursor mRNA that is converted to a functional mRNA by a process called RNA editing that inserts and deletes uridylate nucleotides. The sequence of the edited mRNA is specified by multiple small RNAs, named guide RNAs (gRNAs), each of which is encoded in a separate gene. Thus, edited mRNA sequences are assembled from multiple genes by the transfer of information from one type of RNA to another. The existence of editing was surprising but has stimulated the discovery of other types of RNA editing. The Stuart laboratory has been exploring RNA editing in trypanosomes from the time of its discovery. They found dramatic differences between the mitochondrial gene sequences and those of the corresponding mRNAs, which indicated editing by the insertion and deletion of uridylates. Some editing was modest, simply eliminating shifts in sequence register or minimally extending the protein coding sequence. However, editing of many mRNAs was startlingly extensive. 
The RNA sequence was almost entirely remodeled, so that it was more the result of editing than of the gene sequence. The identities of genes for such extensively edited RNA were not recognizable from the DNA sequence but they were readily identifiable from the edited mRNA sequence. Thus, despite the complex and extensive editing, the resultant mRNA sequence is precise. Characterization of partially edited RNAs indicated that editing proceeds in the direction opposite to that used to specify the protein, which reflects the use of the gRNAs. The numerous gRNAs that are used for editing are encoded in the DNA molecules whose role was previously a mystery. Using information gained in its earlier studies, the Stuart group developed an in vitro system that reproduces the fundamental process of editing in order to resolve the mechanism by which it occurs. They determined that editing entails a series of enzymatic steps rather than the mechanism used in RNA splicing. They also showed that chimeric gRNA-mRNA molecules are aberrant by-products of editing rather than intermediates in the process as had been proposed. Additional studies are exploring precisely how the number of added and deleted uridylates is specified by the gRNA. The Stuart laboratory showed that editing is performed by an aggregation of enzymes that catalyze the separate steps of editing. It also developed a method to purify this multimolecule complex that contains several, perhaps tens of, proteins. This will allow the study of its composition and the functions of its component parts. Indeed, the gene for one component has been identified and its detailed characterization begun. These studies are developing tools to explore related processes. An early finding in the lab was that the various mRNAs are differentially edited during the life cycle of the parasite. The pattern of this editing indicates that editing serves to regulate the alternation between two modes of energy generation. 
This regulation is coordinated with other events that are occurring during the life c
Impact of sequencing depth and read length on single cell RNA sequencing data of T cells.

PubMed

Rizzetto, Simone; Eltahla, Auda A; Lin, Peijie; Bull, Rowena; Lloyd, Andrew R; Ho, Joshua W K; Venturi, Vanessa; Luciani, Fabio

2017-10-06

Single cell RNA sequencing (scRNA-seq) provides great potential in measuring the gene expression profiles of heterogeneous cell populations. In immunology, scRNA-seq allowed the characterisation of transcript sequence diversity of functionally relevant T cell subsets, and the identification of the full length T cell receptor (TCRαβ), which defines the specificity against cognate antigens. Several factors, e.g. RNA library capture, cell quality, and sequencing output, affect the quality of scRNA-seq data. We studied the effects of read length and sequencing depth on the quality of gene expression profiles, cell type identification, and TCRαβ reconstruction, utilising 1,305 single cells from 8 publicly available scRNA-seq datasets, and simulation-based analyses. Gene expression was characterised by an increased number of unique genes identified with short read lengths (<50 bp), but these featured higher technical variability compared to profiles from longer reads. Successful TCRαβ reconstruction was achieved for 6 datasets (81% - 100%) with at least 0.25 million paired-end (PE) reads of length >50 bp, while it failed for datasets with <30 bp reads. Sufficient read length and sequencing depth can control technical noise to enable accurate identification of TCRαβ and gene expression profiles from scRNA-seq data of T cells.
Novel Mutations in pncA Gene of Pyrazinamide Resistant Clinical Isolates of Mycobacterium tuberculosis.

PubMed

Kahbazi, Manijeh; Sarmadian, Hossein; Ahmadi, Azam; Didgar, Farshideh; Sadrnia, Maryam; Poolad, Toktam; Arjomandzadegan, Mohammad

2018-04-16

In clinical isolates of Mycobacterium tuberculosis (MTB), resistance to pyrazinamide arises from mutations at any position of the pncA gene (NC_000962.3), especially at nucleotides 359 and 374. In this study we examined the pncA gene sequence in clinical isolates of MTB. Genomic DNA of 33 clinical isolates of MTB was extracted by the Chelex100 method. Polymerase chain reactions (PCR) were performed using specific primers to amplify a 744 bp amplicon comprising the coding sequence (CDS) of the pncA gene. PCR products were sequenced by an automated sequencing Bioscience system. Additionally, semi Nested-allele specific (sNASP) and polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) methods were carried out for verification of probable mutations in nucleotides 359 and 374. Sequencing results showed that of the 33 MTB clinical isolates, nine pyrazinamide-resistant isolates had mutations. Furthermore, no mutation was detected in the 24 susceptible strains across the entire 561 bp of the pncA gene. Moreover, new G→A mutations at position 3 of the pncA gene were identified in some of the resistant isolates. Results showed that the sNASP method could detect mutations in nucleotides 359 and 374 of the pncA gene, but the PCR-RFLP method using the SacII enzyme could not detect these mutations. In conclusion, the identification of new mutations in the pncA gene confirmed the probable occurrence of mutations at any nucleotide of the pncA gene sequence in resistant isolates of MTB.
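Reading point mutations off a sequenced amplicon comes down to comparing it with the reference position by position. The sketch below assumes the two sequences are already aligned and of equal length, which sidesteps the alignment step a real analysis would need. The function name and the six-base toy sequences are hypothetical and are not pncA data.

```python
def point_mutations(reference: str, isolate: str):
    """List substitutions as (1-based position, reference base, isolate base).

    Assumes both sequences are aligned and the same length; insertions and
    deletions are outside the scope of this sketch.
    """
    if len(reference) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, iso_base)
        for pos, (ref_base, iso_base)
        in enumerate(zip(reference.upper(), isolate.upper()), start=1)
        if ref_base != iso_base
    ]


# Toy example: a single G-to-A change at position 3.
print(point_mutations("ATGCGG", "ATACGG"))  # [(3, 'G', 'A')]
```

Positions are reported 1-based so that they can be compared directly with nucleotide numbering of the kind used in the abstract.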
Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sakoyama, Y.; Hong, K.J.; Byun, S.M.

To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (Cη1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10⁻⁹ substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
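The arithmetic behind a silent molecular clock can be shown with a minimal sketch. It assumes the standard pairwise relation K = 2rT, where K is the silent divergence per site between two lineages, r is the substitution rate and T is the time since they split. The abstract gives the rate but neither the formula nor the divergence values used, so the input below is a hypothetical figure chosen for illustration.

```python
# Mean rate of silent substitution reported in the abstract
# (substitutions per site per year).
SILENT_RATE = 1.56e-9


def divergence_time_years(k_silent: float, rate: float = SILENT_RATE) -> float:
    """Estimate the time since two lineages diverged.

    Assumes substitutions accumulate independently along both lineages,
    so pairwise divergence K = 2 * rate * T and T = K / (2 * rate).
    """
    if k_silent < 0 or rate <= 0:
        raise ValueError("divergence must be >= 0 and rate must be > 0")
    return k_silent / (2.0 * rate)


# Hypothetical silent divergence of 0.02 substitutions per site.
t = divergence_time_years(0.02)
print(f"{t / 1e6:.1f} million years")  # 6.4 million years
```

The factor of two reflects the fact that both lineages have been accumulating changes since the split; dropping it would double the estimated date.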
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.

PubMed

Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D

2017-01-23

The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
Application of Genotyping during an Extensive Outbreak of Waterborne Giardiasis in Bergen, Norway, during Autumn and Winter 2004

PubMed Central

Robertson, L. J.; Hermansen, L.; Gjerde, B. K.; Strand, E.; Alvsvåg, J. O.; Langeland, N.

2006-01-01

During the autumn and winter of 2004 and 2005, an extensive outbreak of waterborne giardiasis occurred in Bergen, Norway. Over 1,500 patients were diagnosed with giardiasis. Analysis of water from the implicated source revealed low numbers of Giardia cysts, but the initial contamination event probably occurred up to 10 weeks previously. While sewage leakage from a residential area is now considered to be the probable source of contamination, during the episode waste from one particular septic tank was thought to be a possible source. Genotyping of cysts from the septic tank demonstrated that they were assemblage A cysts, although the sequences were not identical to any previously published sequences. For the β-giardin gene, the closest published subgenotype was subgenotype A3; for the gdh gene, the closest published subgenotype was subgenotype A2. Genotyping of cysts from 21 patient samples revealed that they were assemblage B cysts; thus, the septic tank was unlikely to be the contamination source. Sequencing of the β-giardin and gdh genes from patient samples and a comparison of the sequences gave complex results. For the β-giardin gene, three isolates had sequences identical to subgenotype B3 sequences. However, other isolates had between one and four single-nucleotide polymorphisms (SNPs). For the gdh gene, none of the sequences were identical to the sequence published for subgenotype B3, and the sequences had between one and three SNPs. One isolate, which was identical to subgenotype B3 at the β-giardin gene, was more similar to subgenotype B2 at the gdh gene. Grouping the isolates on the basis of SNPs resulted in different groups for the two genes. The results are discussed in relation to giardiasis in Norway and to other Giardia genotyping studies. PMID:16517674
The utility of Next Generation Sequencing for molecular diagnostics in Rett syndrome.

PubMed

Vidal, Silvia; Brandi, Núria; Pacheco, Paola; Gerotina, Edgar; Blasco, Laura; Trotta, Jean-Rémi; Derdak, Sophia; Del Mar O'Callaghan, Maria; Garcia-Cazorla, Àngels; Pineda, Mercè; Armstrong, Judith

2017-09-25

Rett syndrome (RTT) is an early-onset neurodevelopmental disorder that almost exclusively affects girls and is totally disabling. Three genes have been identified that cause RTT: MECP2, CDKL5 and FOXG1. However, the etiology in some RTT patients remains unknown. Recently, next generation sequencing (NGS) has promoted genetic diagnoses because of the speed and affordability of the method. To evaluate the usefulness of NGS in genetic diagnosis, we present the genetic study of RTT-like patients using different techniques based on this technology. We studied 1577 patients with RTT-like clinical diagnoses and reviewed patients who were previously studied and thought to have RTT genes by Sanger sequencing. Genetically, 477 of 1577 patients with a RTT-like suspicion have been diagnosed. Positive results were found in 30% by Sanger sequencing, 23% with a custom panel, 24% with a commercial panel and 32% with whole exome sequencing. A genetic study using NGS allows the study of a larger number of genes associated with RTT-like symptoms simultaneously, providing genetic study of a wider group of patients as well as significantly reducing the response time and cost of the study.
Rapid Sequencing of Complete env Genes from Primary HIV-1 Samples

PubMed Central

Eren, Kemal; Ignacio, Caroline; Landais, Elise; Weaver, Steven; Phung, Pham; Ludka, Colleen; Hepler, Lance; Caballero, Gemma; Pollner, Tristan; Guo, Yan; Richman, Douglas; Poignard, Pascal; Paxinos, Ellen E.; Kosakovsky Pond, Sergei L.

2016-01-01

The ability to study rapidly evolving viral populations has been constrained by the read length of next-generation sequencing approaches and the sampling depth of single-genome amplification methods. Here, we develop and characterize a method using Pacific Biosciences’ Single Molecule, Real-Time (SMRT®) sequencing technology to sequence multiple, intact full-length human immunodeficiency virus-1 env genes amplified from viral RNA populations circulating in blood, and provide computational tools for analyzing and visualizing these data. PMID:29492273
Detection of arc genes related with the ethyl carbamate precursors in wine lactic acid bacteria.

PubMed

Araque, Isabel; Gil, Joana; Carreté, Ramon; Bordons, Albert; Reguant, Cristina

2009-03-11

Trace amounts of the carcinogen ethyl carbamate can appear in wine by the reaction of ethanol with compounds such as citrulline and carbamyl phosphate, which are produced from arginine degradation by some wine lactic acid bacteria (LAB). In this work, the presence of arc genes for the arginine-deiminase pathway was studied in several strains of different species of LAB. Their ability to degrade arginine was also studied. To detect the presence of arc genes, degenerate primers were designed from the alignment of protein sequences in already sequenced LAB. The usefulness of these degenerate primers has been proven by sequencing some of the amplified PCR fragments and searching for homologies with published sequences of the same species and related ones. Correlation was found between the presence of genes and the ability to degrade arginine. Degrading strains included all heterofermentative lactobacilli, Oenococcus oeni, Pediococcus pentosaceus, and some strains of Leuconostoc mesenteroides and Lactobacillus plantarum.
Construction of a cDNA library from female adult of Toxocara canis, and analysis of EST and immune-related genes expressions.

PubMed

Zhou, Rongqiong; Xia, Qingyou; Huang, Hancheng; Lai, Min; Wang, Zhenxin

2011-10-01

Toxocara canis is a widespread intestinal nematode parasite of dogs, which can also cause disease in humans. We employed an expressed sequence tag (EST) strategy to study the expression of genes involved in processes including development, digestion and reproduction of T. canis. ESTs provide a rapid way to identify genes, particularly in organisms for which very little molecular information is available. In this study, a cDNA library was constructed from a female adult of T. canis, and 215 high-quality ESTs from the 5'-ends of the cDNA clones, representing 79 unigenes, were obtained. The titer of the primary cDNA library was 1.83×10^6 pfu/mL with a recombination rate of 99.33%. Most of the sequences ranged from 300 to 900 bp with an average length of 656 bp. Cluster analysis of these ESTs allowed identification of 79 unique sequences comprising 28 contigs and 51 singletons. BLASTX searches revealed that 18 unigenes (22.78% of the total) or 70 ESTs (32.56% of the total) were novel genes that had no significant matches to any protein sequences in the public databases. The remaining 61 unigenes (77.22% of the total) or 145 ESTs (67.44% of the total) closely matched known genes or sequences deposited in the public databases. These genes were classified into seven groups based on their known or putative biological functions. We also confirmed the gene expression patterns of several immune-related genes using RT-PCR. This work will provide a valuable resource for further investigations of stage-, sex- and tissue-specific gene transcription or expression.
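The proportions quoted in this abstract can be checked with simple arithmetic. The short sketch below uses only the counts stated in the abstract; the variable and function names are illustrative.

```python
# Counts as reported in the abstract.
total_ests = 215
total_unigenes = 79
contigs, singletons = 28, 51
novel_unigenes, novel_ests = 18, 70
known_unigenes, known_ests = 61, 145


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Contigs plus singletons should account for every unigene.
unigene_sum = contigs + singletons

print(unigene_sum)                          # 79
print(pct(novel_unigenes, total_unigenes))  # 22.78
print(pct(novel_ests, total_ests))          # 32.56
print(pct(known_unigenes, total_unigenes))  # 77.22
print(pct(known_ests, total_ests))          # 67.44
```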
Cloning and Expression of the Erwinia carotovora subsp. carotovora Gene Encoding the Low-Molecular-Weight Bacteriocin Carocin S1

PubMed Central

Chuang, Duen-yau; Chien, Yung-chei; Wu, Huang-Pin

2007-01-01

The purpose of this study was to clone the carocin S1 gene and express it in a non-carocin-producing strain of Erwinia carotovora. A mutant, TH22-10, which produced a high-molecular-weight bacteriocin but not a low-molecular-weight bacteriocin, was obtained by Tn5 insertional mutagenesis using H-rif-8-2 (a spontaneous rifampin-resistant mutant of Erwinia carotovora subsp. carotovora 89-H-4). Using thermal asymmetric interlaced PCR, the DNA sequence from the Tn5 insertion site and the DNA sequence of the contiguous 2,280-bp region were determined. Two complete open reading frames (ORF), designated ORF2 and ORF3, were identified within the sequence fragment. ORF2 and ORF3 were identified with the carocin S1 genes, caroS1K (ORF2) and caroS1I (ORF3), which, respectively, encode a killing protein (CaroS1K) and an immunity protein (CaroS1I). These genes were homologous to the pyocin S3 gene and the pyocin AP41 gene. Carocin S1 was expressed in E. carotovora subsp. carotovora Ea1068 and replicated in TH22-10 but could not be expressed in Escherichia coli (JM101) because a consensus sequence resembling an SOS box was absent. A putative sequence similar to the consensus sequence for the E. coli cyclic AMP receptor protein binding site (−312 bp) was found upstream of the start codon. Production of this bacteriocin was also induced by glucose and lactose. The homology search results indicated that the carocin S1 gene (between bp 1078 and bp 1704) was homologous to the pyocin S3 and pyocin AP41 genes in Pseudomonas aeruginosa. These genes encode proteins with nuclease activity (domain 4). This study found that carocin S1 also has nuclease activity. PMID:17071754
Analyses of Hypomethylated Oil Palm Gene Space

PubMed Central

Jayanthi, Nagappan; Mohd-Amin, Ab Halim; Azizi, Norazah; Chan, Kuang-Lim; Maqbool, Nauman J.; Maclean, Paul; Brauning, Rudi; McCulloch, Alan; Moraga, Roger; Ong-Abdullah, Meilina; Singh, Rajinder

2014-01-01

Demand for palm oil has been increasing by an average of ∼8% over the past decade, and palm oil currently accounts for about 59% of the world's vegetable oil market. This drives the need to increase palm oil production. Nevertheless, due to the increasing need for sustainable production, it is imperative to increase productivity rather than the area cultivated. Studies on the oil palm genome are essential to help identify genes or markers that are associated with important processes or traits, such as flowering, yield and disease resistance. To achieve this, 294,115 and 150,744 sequences from the hypomethylated or gene-rich regions of the Elaeis guineensis and E. oleifera genomes were sequenced and assembled into contigs. An additional 16,427 shot-gun sequences and 176 bacterial artificial chromosomes (BAC) were also generated to check the quality of libraries constructed. Comparison of these sequences revealed that although the methylation-filtered libraries were sequenced at low coverage, they still tagged at least 66% of the RefSeq supported genes in the BAC and had a filtration power of at least 2.0. A total of 33,752 microsatellites and 40,820 high-quality single nucleotide polymorphism (SNP) markers were identified. These represent the most comprehensive collection of microsatellites and SNPs to date and would be an important resource for genetic mapping and association studies. The gene models predicted from the assembled contigs were mined for genes of interest, and 242, 65 and 14 oil palm transcription factors, resistance genes and miRNAs were identified respectively. Examples of the transcription factors tagged include those associated with floral development and tissue culture, such as homeodomain proteins, MADS, Squamosa and Apetala2. The E. guineensis and E. oleifera hypomethylated sequences provide an important resource to understand the molecular mechanisms associated with important agronomic traits in oil palm. PMID:24497974
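The abstract does not say which tool was used to identify microsatellites. As a rough illustration of the idea only, the sketch below scans a sequence for perfect tandem repeats with a regular expression; the unit lengths (2 to 6 bp), the minimum of four repeats and the example sequence are all assumptions, not parameters from the study.

```python
import re


def find_microsatellites(seq: str, min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Find perfect tandem repeats of 2-6 bp units.

    Returns (1-based start, repeat unit, number of repeats) for each hit.
    Runs of a single base (e.g. "AAAAAAAA") are skipped.
    """
    # A lazily matched 2-6 bp unit followed by at least (min_repeats - 1) more copies.
    pattern = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run, not counted here
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start() + 1, unit, repeats))
    return hits


# Hypothetical sequence: an (AT)5 run and a (CAG)4 run separated by spacers.
example = "GGC" + "AT" * 5 + "GCC" + "CAG" * 4 + "TT"
ssrs = find_microsatellites(example)
print(ssrs)
```

Dedicated SSR-mining programs typically also handle imperfect and compound repeats, which this sketch ignores.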
Partial sequence homogenization in the 5S multigene families may generate sequence chimeras and spurious results in phylogenetic reconstructions.

PubMed

Galián, José A; Rosato, Marcela; Rosselló, Josep A

2014-03-01

Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.
Comparative transcriptome analysis of microsclerotia development in Nomuraea rileyi.

PubMed

Song, Zhangyong; Yin, Youping; Jiang, Shasha; Liu, Juanjuan; Chen, Huan; Wang, Zhongkang

2013-06-19

Nomuraea rileyi is used as an environmentally friendly biopesticide. However, mass production and commercialization of this organism are limited due to its fastidious growth and sporulation requirements. When cultured in amended medium, we found that N. rileyi could produce microsclerotia bodies, replacing conidiophores as the infectious agent. However, little is known about the genes involved in microsclerotia development. In the present study, the transcriptomes were analyzed using next-generation sequencing technology to find the genes involved in microsclerotia development. A total of 4.69 Gb of clean nucleotides comprising 32,061 sequences was obtained, and 20,919 sequences were annotated (about 65%). Among the annotated sequences, only 5928 were annotated with 34 gene ontology (GO) functional categories, and 12,778 sequences were mapped to 165 pathways by searching against the Kyoto Encyclopedia of Genes and Genomes pathway (KEGG) database. Furthermore, we assessed the transcriptomic differences between cultures grown in minimal and amended medium. In total, 4808 sequences were found to be differentially expressed; 719 differentially expressed unigenes were assigned to 25 GO classes and 1888 differentially expressed unigenes were assigned to 161 KEGG pathways, including 25 enriched pathways. Subsequently, we examined the up-regulated or uniquely expressed genes following amended medium treatment that also fell within the enriched pathways, and found that most of them participated in mediating oxidative stress homeostasis. To elucidate the role of oxidative stress in microsclerotia development, we analyzed the diversification of unigenes using quantitative reverse transcription-PCR (RT-qPCR). Our findings suggest that oxidative stress occurs during microsclerotia development, along with a broad metabolic activity change. Our data provide the most comprehensive sequence resource available for the study of N. rileyi.
We believe that the transcriptome datasets will serve as an important public information platform to accelerate studies on N. rileyi microsclerotia.
Anchoring genome sequence to chromosomes of the central bearded dragon (Pogona vitticeps) enables reconstruction of ancestral squamate macrochromosomes and identifies sequence content of the Z chromosome.

PubMed

Deakin, Janine E; Edwards, Melanie J; Patel, Hardip; O'Meally, Denis; Lian, Jinmin; Stenhouse, Rachael; Ryan, Sam; Livernois, Alexandra M; Azad, Bhumika; Holleley, Clare E; Li, Qiye; Georges, Arthur

2016-06-10

Squamates (lizards and snakes) are a speciose lineage of reptiles displaying considerable karyotypic diversity, particularly among lizards. Understanding the evolution of this diversity requires comparison of genome organisation between species. Although the genomes of several squamate species have now been sequenced, only the green anole lizard has any sequence anchored to chromosomes. There is only limited gene mapping data available for five other squamates. This makes it difficult to reconstruct the events that have led to extant squamate karyotypic diversity. The purpose of this study was to anchor the recently sequenced central bearded dragon (Pogona vitticeps) genome to chromosomes to trace the evolution of squamate chromosomes. Assigning sequence to sex chromosomes was of particular interest for identifying candidate sex determining genes. By using two different approaches to map conserved blocks of genes, we were able to anchor approximately 42 % of the dragon genome sequence to chromosomes. We constructed detailed comparative maps between dragon, anole and chicken genomes, and where possible, made broader comparisons across Squamata using cytogenetic mapping information for five other species. We show that squamate macrochromosomes are relatively well conserved between species, supporting findings from previous molecular cytogenetic studies. Macrochromosome diversity between members of the Toxicofera clade has been generated by intrachromosomal, and a small number of interchromosomal, rearrangements. We reconstructed the ancestral squamate macrochromosomes by drawing upon comparative cytogenetic mapping data from seven squamate species and propose the events leading to the arrangements observed in representative species. In addition, we assigned over 8 Mbp of sequence containing 219 genes to the Z chromosome, providing a list of genes to begin testing as candidate sex determining genes. 
Anchoring of the dragon genome has provided substantial insight into the evolution of squamate genomes, enabling us to reconstruct ancestral macrochromosome arrangements at key positions in the squamate phylogeny, demonstrating that fusions between macrochromosomes or fusions of macrochromosomes and microchromosomes, have played an important role during the evolution of squamate genomes. Assigning sequence to the sex chromosomes has identified NR5A1 as a promising candidate sex determining gene in the dragon.
Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

PubMed

Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

2014-06-01

It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. 
The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

PubMed Central

Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

2004-01-01

Background: The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results: Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions: Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source for molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
Rapid bursts of androgen-binding protein (Abp) gene duplication occurred independently in diverse mammals

PubMed Central

2008-01-01

Background: The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) α, β and γ subunits. Further investigation of 14 α-like (Abpa) and 13 β- or γ-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals. Results: Here, we interrogate the latest 'finished' mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa. Conclusion: We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes' participation in chemosensation and/or sexual identification. PMID:18269759
Rapid bursts of androgen-binding protein (Abp) gene duplication occurred independently in diverse mammals.

PubMed

Laukaitis, Christina M; Heger, Andreas; Blakley, Tyler D; Munclinger, Pavel; Ponting, Chris P; Karn, Robert C

2008-02-12

The draft mouse (Mus musculus) genome sequence revealed an unexpected proliferation of gene duplicates encoding a family of secretoglobin proteins including the androgen-binding protein (ABP) alpha, beta and gamma subunits. Further investigation of 14 alpha-like (Abpa) and 13 beta- or gamma-like (Abpbg) undisrupted gene sequences revealed a rich diversity of developmental stage-, sex- and tissue-specific expression. Despite these studies, our understanding of the evolution of this gene family remains incomplete. Questions arise from imperfections in the initial mouse genome assembly and a dearth of information about the gene family structure in other rodents and mammals. Here, we interrogate the latest 'finished' mouse (Mus musculus) genome sequence assembly to show that the Abp gene repertoire is, in fact, twice as large as reported previously, with 30 Abpa and 34 Abpbg genes and pseudogenes. All of these have arisen since the last common ancestor with rat (Rattus norvegicus). We then demonstrate, by sequencing homologs from species within the Mus genus, that this burst of gene duplication occurred very recently, within the past seven million years. Finally, we survey Abp orthologs in genomes from across the mammalian clade and show that bursts of Abp gene duplications are not specific to the murid rodents; they also occurred recently in the lagomorph (rabbit, Oryctolagus cuniculus) and ruminant (cattle, Bos taurus) lineages, although not in other mammalian taxa. We conclude that Abp genes have undergone repeated bursts of gene duplication and adaptive sequence diversification driven by these genes' participation in chemosensation and/or sexual identification.

A public platform for the verification of the phenotypic effect of candidate genes for resistance to aflatoxin accumulation and Aspergillus flavus infection in maize.

PubMed

Warburton, Marilyn L; Williams, William Paul; Hawkins, Leigh; Bridges, Susan; Gresham, Cathy; Harper, Jonathan; Ozkan, Seval; Mylroie, J Erik; Shan, Xueyan

2011-07-01

A public candidate gene testing pipeline for resistance to aflatoxin accumulation or Aspergillus flavus infection in maize is presented here. The pipeline consists of steps for identifying, testing, and verifying the association of selected maize gene sequences with resistance under field conditions. Resources include a database of genetic and protein sequences associated with the reduction in aflatoxin contamination from previous studies; eight diverse inbred maize lines for polymorphism identification within any maize gene sequence; four Quantitative Trait Loci (QTL) mapping populations and one association mapping panel, all phenotyped for aflatoxin accumulation resistance and associated phenotypes; and capacity for Insertion/Deletion (InDel) and SNP genotyping in the population(s) for mapping. To date, ten genes have been identified as possible candidate genes and put through the candidate gene testing pipeline, and results are presented here to demonstrate the utility of the pipeline.
Conserved noncoding sequences conserve biological networks and influence genome evolution.

PubMed

Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

2018-05-01

Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.
Definition of Cis-Acting Elements Regulating Expression of the Drosophila melanogaster ninaE Opsin Gene by Oligonucleotide-Directed Mutagenesis

PubMed Central

Mismer, D.; Rubin, G. M.

1989-01-01

We have analyzed the cis-acting regulatory sequences of the Rh1 (ninaE) gene in Drosophila melanogaster by P-element-mediated germline transformation of indicator genes transcribed from mutant ninaE promoter sequences. We have previously shown that a 200-bp region extending from -120 to +67 relative to the transcription start site is sufficient to obtain eye-specific expression from the ninaE promoter. In the present study, 22 different 4-13-bp sequences in the -120/+67 promoter region were altered by oligonucleotide-directed mutagenesis. Several of these sequences were found to be required for proper promoter function; two of these are conserved in the promoter of the homologous gene isolated from the related species Drosophila virilis. Alteration of a conserved 9-bp sequence results in aberrant, low level expression in the body. Alteration of a separate 11-bp sequence, found in the promoter regions of several photoreceptor-specific genes of Drosophila, results in an approximately 15-fold reduction in promoter efficiency but without apparent alteration of tissue-specificity. A protein factor capable of interacting with this 11-bp sequence has been detected by DNaseI footprinting in embryonic nuclear extracts. Finally, we have further characterized two separable enhancer sequences previously shown to be required for normal levels of expression from this promoter. PMID:2521839
Legume Genome Initiative at the University of Oklahoma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bruce A. Roe

2004-02-27

The Consolidated Appropriations Resolution, 2003 Conference Report for the Department of Energy's Biological and Environmental Research (BER) program provided $481,000 for the Legume Genome Initiative at the University of Oklahoma. These funds were used to support our research that is aimed at determining the entire sequence of the gene rich regions of the genome of the legume, Medicago truncatula, by allowing us to obtain a greater degree of finished BAC sequences from the draft sequences we have already obtained through research funded by the Noble Foundation. During the funding period we increased the number of Medicago truncatula BACs with finished (Bermuda standard) sequences from 109 to 359, and the total number of BACs for which we collected sequence data from 584 to 842, 359 of which reached phase 2 (ordered and oriented contigs). We also sequenced a series of pooled BAC clones that cover additional euchromatic (gene rich) genomic regions. This work resulted in 6 refereed publications, see below. Genes whose sequence was determined during this study included multiple members of the plant disease resistance (R-gene) family as well as several genes involved in flavonoid biosynthesis, nitrogen fixation and plant-microbial symbiosis. This work also served as a prelude to obtaining NSF funding for the international collaborative effort to complete the entire sequence of the Medicago truncatula genomic euchromatic regions using a BAC based approach.
Analysis of an osmotically regulated pathogenesis-related osmotin gene promoter.

PubMed

Raghothama, K G; Liu, D; Nelson, D E; Hasegawa, P M; Bressan, R A

1993-12-01

Osmotin is a small (24 kDa), basic, pathogenesis-related protein, that accumulates during adaptation of tobacco (Nicotiana tabacum) cells to osmotic stress. There are more than 10 inducers that activate the osmotin gene in various plant tissues. The osmotin promoter contains several sequences bearing a high degree of similarity to ABRE, as-1 and E-8 cis element sequences. Gel retardation studies indicated the presence of at least two regions in the osmotin promoter that show specific interactions with nuclear factors isolated from cultured cells or leaves. The abundance of these binding factors increased in response to salt, ABA and ethylene. Nuclear factors protected a 35 bp sequence of the promoter from DNase I digestion. Different 5' deletions of the osmotin promoter cloned into a promoter-less GUSNOS plasmid (pBI 201) were used in transient expression studies with a Biolistic gun. The transient expression studies revealed the presence of three distinct regions in the osmotin promoter. The promoter sequence from -108 to -248 bp is absolutely required for reporter gene activity, followed by a long stretch (up to -1052) of enhancer-like sequence and then a sequence upstream of -1052, which appears to contain negative elements. The responses to ABA, ethylene, salt, desiccation and wounding appear to be associated with the -248 bp sequence of the promoter. This region also contains a putative ABRE (CACTGTG) core element. Activation of the osmotin gene by various inducers is discussed in view of antifungal activity of the osmotin protein.
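Locating a short core element such as the putative ABRE (CACTGTG) in a promoter is a plain string search on both strands. The sketch below shows one way to do it; the promoter fragment is invented for illustration and is not the osmotin promoter sequence, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based leftmost position, strand) for every motif occurrence.

    Positions are always given on the forward strand; "-" hits are places
    where the reverse complement of the motif appears in `seq`.
    """
    seq, motif = seq.upper(), motif.upper()
    patterns = [("+", motif)]
    rc = reverse_complement(motif)
    if rc != motif:  # avoid double-counting palindromic motifs
        patterns.append(("-", rc))
    hits = []
    for strand, pattern in patterns:
        start = seq.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Invented 25-bp fragment containing the motif once on each strand.
promoter = "TTGACACTGTGAAACCCACAGTGTT"
abre_hits = find_motif(promoter, "CACTGTG")
print(abre_hits)
```

To report positions relative to a transcription start site, as the abstract does (for example -248), the known offset of the fragment would need to be subtracted from each hit.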
Identification of nitrogen-fixing genes and gene clusters from metagenomic library of acid mine drainage.

PubMed

Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.
Identification of Nitrogen-Fixing Genes and Gene Clusters from Metagenomic Library of Acid Mine Drainage

PubMed Central

Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan

2014-01-01

Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
A New Way to Introduce Microarray Technology in a Lecture/Laboratory Setting by Studying the Evolution of This Modern Technology

ERIC Educational Resources Information Center

Rowland-Goldsmith, Melissa

2009-01-01

DNA microarray is an ordered grid containing known sequences of DNA, which represent many of the genes in a particular organism. Each DNA sequence is unique to a specific gene. This technology enables the researcher to screen many genes from cells or tissue grown in different conditions. We developed an undergraduate lecture and laboratory…
Chicken genome analysis reveals novel genes encoding biotin-binding proteins related to avidin family

PubMed Central

Niskanen, Einari A; Hytönen, Vesa P; Grapputo, Alessandro; Nordlund, Henri R; Kulomaa, Markku S; Laitinen, Olli H

2005-01-01

Background A chicken egg contains several biotin-binding proteins (BBPs), whose complete DNA and amino acid sequences are not known. In order to identify and characterise these genes and proteins we studied chicken cDNAs and genes available in the NCBI database and chicken genome database using the reported N-terminal amino acid sequences of chicken egg-yolk BBPs as search strings. Results Two separate hits showing significant homology for these N-terminal sequences were discovered. For one of these hits, the chromosomal location in the immediate proximity of the avidin gene family was found. Both of these hits encode proteins having high sequence similarity with avidin suggesting that chicken BBPs are paralogous to avidin family. In particular, almost all residues corresponding to biotin binding in avidin are conserved in these putative BBP proteins. One of the found DNA sequences, however, seems to encode a carboxy-terminal extension not present in avidin. Conclusion We describe here the predicted properties of the putative BBP genes and proteins. Our present observations link BBP genes together with avidin gene family and shed more light on the genetic arrangement and variability of this family. In addition, comparative modelling revealed the potential structural elements important for the functional and structural properties of the putative BBP proteins. PMID:15777476
Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

PubMed

Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

2000-12-15

Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To elucidate further the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation with an overall score of 74%. However, our data also show that in terms of transcribed sequences, only two of those, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as putative candidate B-CLL tumor suppressor genes.
Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

PubMed

Adriaens, M E; Bezzina, C R

2018-06-22

Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is anything but straightforward. Genomic technologies based on next-generation sequencing now offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing, known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.
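The allelic imbalance described in this abstract (a departure from the expected 1:1 expression ratio) can be illustrated with a small test. The following is a minimal sketch, not a method taken from the review: it assumes that allele-informative RNA-seq reads have already been counted per allele at a heterozygous site, and it applies an exact two-sided binomial test against a 1:1 null. The function name and the read counts are hypothetical.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial test of the null hypothesis ref:alt = 1:1."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("need at least one allele-informative read")
    # Probability of every possible reference-read count when p = 0.5.
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum all outcomes that are at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: a balanced site and a fully skewed site.
balanced = allelic_imbalance_pvalue(5, 5)
skewed = allelic_imbalance_pvalue(10, 0)
```

A real analysis would also have to account for reference-mapping bias and overdispersion, which a plain binomial model ignores.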
The Complete Plastome Sequence of an Antarctic Bryophyte Sanionia uncinata (Hedw.) Loeske

PubMed Central

Park, Mira; Park, Hyun; Lee, Hyoungseok; Lee, Byeong-ha

2018-01-01

Organellar genomes of bryophytes are poorly represented with chloroplast genomes of only four mosses, four liverworts and two hornworts having been sequenced and annotated. Moreover, while Antarctic vegetation is dominated by the bryophytes, there are few reports on the plastid genomes for the Antarctic bryophytes. Sanionia uncinata (Hedw.) Loeske is one of the most dominant moss species in the maritime Antarctic. It has been researched as an important marker for ecological studies and as an extremophile plant for studies on stress tolerance. Here, we report the complete plastome sequence of S. uncinata, which can be exploited in comparative studies to identify the lineage-specific divergence across different species. The complete plastome of S. uncinata is 124,374 bp in length with a typical quadripartite structure of 114 unique genes including 82 unique protein-coding genes, 37 tRNA genes and four rRNA genes. However, two genes encoding the α subunit of RNA polymerase (rpoA) and encoding the cytochrome b6/f complex subunit VIII (petN) were absent. We could identify nuclear genes homologous to those genes, which suggests that rpoA and petN might have been relocated from the chloroplast genome to the nuclear genome. PMID:29494552
The structure of the coding and 5'-flanking region of the type 1 iodothyronine deiodinase (dio1) gene is normal in a patient with suspected congenital dio1 deficiency.

PubMed

Toyoda, N; Kleinhaus, N; Larsen, P R

1996-06-01

We analyzed the exon-intron structure of the human type 1 deiodinase gene (dio1) and compared it with that of a patient with suspected congenital type 1 deiodinase (D1) deficiency. The hdio1 gene is identical in exon-intron arrangement to the mouse gene, with coding sequences and a selenocysteine insertion sequence (SECIS) element contained in four exons. There were no mutations in the sequences of exons 1-4 of the patient's genomic DNA. Functional studies by transient expression techniques showed no difference in basal promoter activity or T3 responsiveness between the patient's and the normal dio1 gene. A structural abnormality in the dio1 gene is not a likely explanation for this patient's D1-deficient phenotype.
Lactobacillus cypricasei Lawson et al. 2001 is a later heterotypic synonym of Lactobacillus acidipiscis Tanasupawat et al. 2000.

PubMed

Naser, Sabri M; Vancanneyt, Marc; Hoste, Bart; Snauwaert, Cindy; Swings, Jean

2006-07-01

The applicability of a multilocus sequence analysis (MLSA)-based identification system for lactobacilli was evaluated. Two housekeeping genes that code for the phenylalanyl-tRNA synthase alpha-subunit (pheS) and RNA polymerase alpha-subunit (rpoA) were sequenced and analysed for members of the Lactobacillus salivarius species group. The type strains of Lactobacillus acidipiscis and Lactobacillus cypricasei were investigated further using a third gene that encodes the alpha-subunit of ATP synthase (atpA). The MLSA data revealed close relatedness between L. acidipiscis and L. cypricasei, with 99.8-100 % pheS, rpoA and atpA gene sequence similarities. Comparison of the 16S rRNA gene sequences of the type strains of the two species confirmed the close relatedness (99.8 % gene sequence similarity) between the two taxa. Similar phenotypes and high DNA-DNA binding values in the range of 84 to 97.5 % confirmed that L. acidipiscis and L. cypricasei are synonymous species. On the basis of the present study, it is proposed that Lactobacillus cypricasei is a later heterotypic synonym of Lactobacillus acidipiscis.
Evolutionary differences in chromosomal locations of four early genes of the tryptophan pathway in fluorescent pseudomonads: DNA sequences and characterization of Pseudomonas putida trpE and trpGDC.

PubMed

Essar, D W; Eberly, L; Crawford, I P

1990-02-01

Pseudomonas putida possesses seven structural genes for enzymes of the tryptophan pathway. All but one, trpG, which encodes the small (beta) subunit of anthranilate synthase, have been mapped on the circular chromosome. This report describes the cloning and sequencing of P. putida trpE, trpG, trpD, and trpC. In P. putida and Pseudomonas aeruginosa, DNA sequence analysis as well as growth and enzyme assays of insertionally inactivated strains indicated that trpG is the first gene in a three-gene operon that also contains trpD and trpC. In P. putida, trpE is 2.2 kilobases upstream from the trpGDC cluster, whereas in P. aeruginosa, they are separated by at least 25 kilobases (T. Shinomiya, S. Shiga, and M. Kageyama, Mol. Gen. Genet., 189:382-389, 1983). The DNA sequence in P. putida shows an open reading frame on the opposite strand between trpE and trpGDC; this putative gene was not characterized. Evidence is also presented for sequence similarities in the 5' untranslated regions of trpE and trpGDC in both pseudomonads; the function of these regions is unknown, but it is possible that they play some role in regulation of these genes, since all the genes respond to repression by tryptophan. The sequences of the anthranilate synthase genes in the fluorescent pseudomonads resemble those of p-aminobenzoate synthase genes of the enteric bacteria more closely than the anthranilate synthase genes of those organisms; however, no requirement for p-aminobenzoate was found in the Pseudomonas mutants created in this study.
Role of Sequencing the Measles Virus Hemagglutinin Gene and Hypervariable Region in the Measles Outbreak Investigations in Sweden During 2013-2014.

PubMed

Harvala, Heli; Wiman, Åsa; Wallensten, Anders; Zakikhany, Katherina; Englund, Hélène; Brytting, Maria

2016-02-15

It is increasingly difficult to differentiate measles viruses (MeVs) relating to certain outbreaks on the basis of the nucleoprotein (N) gene sequence only, as the diversity of circulating MeV strains has decreased. We studied genomic regions that could provide better molecular discrimination between epidemiologically linked and unlinked MeV variants identified in Sweden during 2013-2014. The hemagglutinin (H) gene and hypervariable region between the fusion and matrix genes (MF-HVR) from 53 MeV-positive samples were amplified and sequenced. Data on phylogenetic clustering of MeVs on the basis of N, H, and MF-HVR sequences were compared to epidemiological data. MeVs were genotyped: 27 were B3, and 26 were D8. One genotype B3 cluster based on the N gene sequence contained epidemiologically unrelated viruses from 4 outbreaks, whereas analysis of H and MF-HVR sequences separated them into phylogenetic clusters consistent with the epidemiological data. Similarly, the single cluster of viruses with a genotype D8 N gene could be divided into the 5 outbreak groups on the basis of the phylogeny of MF-HVR sequences. A detailed picture of MeV circulation with more-defined links between outbreaks was obtained by sequencing the H gene and MF-HVR. Further identification and better genetic characterization of MeVs internationally is essential in identifying sources and routes of MeV spread within and beyond Europe in the elimination end game. © The Author 2015. Published by Oxford University Press for the Infectious Diseases Society of America.
Whole-exome sequencing in obsessive-compulsive disorder identifies rare mutations in immunological and neurodevelopmental pathways

PubMed Central

Cappi, C; Brentani, H; Lima, L; Sanders, S J; Zai, G; Diniz, B J; Reis, V N S; Hounie, A G; Conceição do Rosário, M; Mariani, D; Requena, G L; Puga, R; Souza-Duran, F L; Shavitt, R G; Pauls, D L; Miguel, E C; Fernandez, T V

2016-01-01

Studies of rare genetic variation have identified molecular pathways conferring risk for developmental neuropsychiatric disorders. To date, no published whole-exome sequencing studies have been reported in obsessive-compulsive disorder (OCD). We sequenced all the genome coding regions in 20 sporadic OCD cases and their unaffected parents to identify rare de novo (DN) single-nucleotide variants (SNVs). The primary aim of this pilot study was to determine whether DN variation contributes to OCD risk. To this aim, we evaluated whether there is an elevated rate of DN mutations in OCD, which would justify this approach toward gene discovery in larger studies of the disorder. Furthermore, to explore functional molecular correlations among genes with nonsynonymous DN SNVs in OCD probands, a protein–protein interaction (PPI) network was generated based on databases of direct molecular interactions. We applied Degree-Aware Disease Gene Prioritization (DADA) to rank the PPI network genes based on their relatedness to a set of OCD candidate genes from two OCD genome-wide association studies (Stewart et al., 2013; Mattheisen et al., 2014). In addition, we performed a pathway analysis with genes from the PPI network. The rate of DN SNVs in OCD was 2.51 × 10⁻⁸ per base per generation, significantly higher than a previously estimated rate in unaffected subjects using the same sequencing platform and analytic pipeline. Several genes harboring DN SNVs in OCD were highly interconnected in the PPI network and ranked high in the DADA analysis. Nearly all the DN SNVs in this study are in genes expressed in the human brain, and a pathway analysis revealed enrichment in immunological and central nervous system functioning and development.
The results of this pilot study indicate that further investigation of DN variation in larger OCD cohorts is warranted to identify specific risk genes and to confirm our preliminary finding with regard to PPI network enrichment for particular biological pathways and functions. PMID:27023170
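The per-base, per-generation de novo rate quoted in this abstract is a simple ratio, and the arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the rate is the number of DN SNVs divided by the number of haploid bases screened, that is, two copies of the callable bases per child. The function name, the variant count and the callable-base figure are hypothetical and are not taken from the study; only the number of trios (20) comes from the abstract.

```python
def de_novo_rate(n_variants: int, n_trios: int, callable_bases: int) -> float:
    """De novo SNVs per base per generation.

    The factor 2 reflects the two haploid genome copies each child
    inherits, so every callable position is screened twice per trio.
    """
    if n_trios <= 0 or callable_bases <= 0:
        raise ValueError("n_trios and callable_bases must be positive")
    return n_variants / (2 * n_trios * callable_bases)


# Hypothetical inputs: 30 DN SNVs across 20 trios, with 30 Mb of
# callable coding sequence per child (illustrative values only).
rate = de_novo_rate(n_variants=30, n_trios=20, callable_bases=30_000_000)
```

With these made-up inputs the rate works out to 2.5 × 10⁻⁸, the same order of magnitude as the value reported in the abstract.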
Transcriptome sequence analysis of an ornamental plant, Ananas comosus var. bracteatus, revealed the potential unigenes involved in terpenoid and phenylpropanoid biosynthesis.

PubMed

Ma, Jun; Kanakala, S; He, Yehua; Zhang, Junli; Zhong, Xiaolan

2015-01-01

Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. Of the 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 were annotated in the NCBI non-redundant protein database and 23,134 were annotated in the Swiss-Prot database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered unique to Ananas comosus var. bracteatus. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge, this is the first report on the de novo transcriptome sequencing of Ananas comosus var. bracteatus. Unigenes obtained in this study may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus.
Transcriptome Sequence Analysis of an Ornamental Plant, Ananas comosus var. bracteatus, Revealed the Potential Unigenes Involved in Terpenoid and Phenylpropanoid Biosynthesis

PubMed Central

Ma, Jun; Kanakala, S.; He, Yehua; Zhang, Junli; Zhong, Xiaolan

2015-01-01

Background Ananas comosus var. bracteatus (Red Pineapple) is an important ornamental plant for its colorful leaves and decorative red fruits. Because of its complex genome, it is difficult to understand the molecular mechanisms involved in the growth and development. Thus high-throughput transcriptome sequencing of Ananas comosus var. bracteatus is necessary to generate large quantities of transcript sequences for the purpose of gene discovery and functional genomic studies. Results The Ananas comosus var. bracteatus transcriptome was sequenced by the Illumina paired-end sequencing technology. We obtained a total of 23.5 million high quality sequencing reads, 1,555,808 contigs and 41,052 unigenes. Of the 41,052 unigenes of Ananas comosus var. bracteatus, 23,275 were annotated in the NCBI non-redundant protein database and 23,134 were annotated in the Swiss-Prot database. Out of these, 17,748 and 8,505 unigenes were assigned to gene ontology categories and clusters of orthologous groups, respectively. Functional annotation against the Kyoto Encyclopedia of Genes and Genomes Pathway database identified 5,825 unigenes which were mapped to 117 pathways. The assembly predicted many unigenes that were previously unknown. The annotated unigenes were compared against pineapple, rice, maize, Arabidopsis, and sorghum. Unigenes that did not match any of those five sequence datasets are considered unique to Ananas comosus var. bracteatus. We predicted unigenes encoding enzymes involved in terpenoid and phenylpropanoid biosynthesis. Conclusion The sequence data provide the most comprehensive transcriptomic resource currently available for Ananas comosus var. bracteatus. To our knowledge, this is the first report on the de novo transcriptome sequencing of Ananas comosus var. bracteatus. Unigenes obtained in this study may help improve future gene expression, genetic and genomics studies in Ananas comosus var. bracteatus. PMID:25769053
The complete mitochondrial genome of the tapeworm Cladotaenia vulturi (Cestoda: Paruterinidae): gene arrangement and phylogenetic relationships with other cestodes.

PubMed

Guo, Aijiang

2016-08-31

Tapeworms Cladotaenia spp. are among the most important wildlife pathogens in birds of prey. The genus Cladotaenia is placed in the family Paruterinidae based on morphological characteristics and hosts. However, limited molecular information is available for studying the phylogenetic position of this genus in relation to other cestodes. In this study, the complete mitochondrial (mt) genome of Cladotaenia vulturi was amplified using "Long-PCR" and then sequenced by primer walking. Sequence annotation and gene identification were performed by comparison with published flatworm mt genomes. The phylogenetic relationships of C. vulturi with other cestode species were established using the concatenated amino acid sequences of 12 protein-coding genes with Bayesian Inference and Maximum Likelihood methods. The complete mitochondrial genome of Cladotaenia vulturi is 13,411 bp in size and contains 36 genes. The gene arrangement of C. vulturi is identical to those in Anoplocephala spp. (Anoplocephalidae), Hymenolepis spp. (Hymenolepididae) and Dipylidium caninum (Dipylidiidae), but different from that in taeniids owing to the order shift between the tRNA (L1) and tRNA (S2) genes. Phylogenetic analyses based on the amino acid sequences of the concatenated 12 protein-coding genes showed that the species in the Taeniidae form a group and C. vulturi is a sister taxon to the species of the family Taeniidae. To our knowledge, the present study provides the first molecular data to support the early proposal from morphological evidence that the Taeniidae is a sister group to the family Paruterinidae. This novel mt genome sequence will be useful for further investigations into the population genetics, phylogenetics and systematics of the family Paruterinidae and inferring phylogenetic relationships among several lineages within the order Cyclophyllidea.

GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

PubMed

Schulz, Tizian; Stoye, Jens; Doerr, Daniel

2018-05-08

Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.
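The idea of carrying δ-teams over from sequences to graphs can be made concrete with a small membership check. The sketch below is not the GraphTeams algorithm and does not enumerate maximal teams. It only assumes one reading of the model: a gene set qualifies in a graph if any two of its members are linked by a chain of members whose consecutive elements lie at most δ apart in that graph, and a cluster candidate must qualify in every species' graph. The graph encoding (a dict of weighted adjacency dicts), the function names and the toy data are all hypothetical, and gene families (duplicates) are not handled.

```python
import heapq


def distances_from(graph, source):
    """Dijkstra shortest-path distances in a weighted undirected graph
    encoded as {vertex: {neighbour: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def is_delta_set(graph, genes, delta):
    """True if all of `genes` are chained together by steps of length
    at most `delta`, where step length is the distance in `graph`."""
    genes = set(genes)
    if not genes:
        return False
    start = next(iter(genes))
    reached, frontier = {start}, [start]
    while frontier:
        u = frontier.pop()
        dist = distances_from(graph, u)
        for v in genes - reached:
            if dist.get(v, float("inf")) <= delta:
                reached.add(v)
                frontier.append(v)
    return reached == genes


def is_delta_set_everywhere(graphs, genes, delta):
    """A cluster candidate must be a δ-set in every species' graph."""
    return all(is_delta_set(g, genes, delta) for g in graphs)


# Toy contact graphs (hypothetical): in "human", gene c is only
# reachable from a through the non-member vertex x.
human = {"a": {"b": 1, "x": 1}, "b": {"a": 1},
         "x": {"a": 1, "c": 1}, "c": {"x": 1}}
mouse = {"a": {"c": 1}, "c": {"a": 1, "b": 1}, "b": {"c": 1}}
```

In this toy example the set {a, b, c} is a δ-set in both graphs for δ = 2 but not for δ = 1, because a and c are two steps apart in the first graph.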
DOE Office of Scientific and Technical Information (OSTI.GOV)

Ryan, Q.C.

There are two nonallelic human γ globin genes located on the short arm of chromosome No. 11 in the order 5′-Gγ-Aγ-3′. Various modifications of the two γ genes have been reported and include: deletions, triplications, quadruplications and recently a quintuplication. These are generally created by one or more unequal crossovers in the γ globin gene regions on adjacent chromosomes. During the course of looking for a γ° thalassemia, which might be due to a crossover of the γ genes, two cases were found in the family W. Bgl II mapping studies showed a 5 kb deletion at the γ gene loci in these individuals. The Bgl II fragment from the γ gene loci of R.W. was cloned into the phage vector QR1. Phage mapping showed that two out of the three Pst I sites within the Bgl II fragment were missing, which suggested that the crossover might have occurred within the γ gene, possibly within the γ IVS II region. Sequence analysis of the cloned fragment revealed an unusual sequence which had no sequence homology with the γ gene region except for a small 264 bp region near the 3′ end. The orientation of the 264 bp fragment is inverted relative to homologous sequences in the Gγ and Aγ IVS II. The unusual sequence was computer analyzed for homology with every DNA sequence file in the EMBL database and GenBank and did not show significant homology to any of the available DNA sequences except for the 264 bp γ IVS II homology.
Peanut gene expression profiling in developing seeds at different reproduction stages during Aspergillus parasiticus infection

PubMed Central

Guo, Baozhu; Chen, Xiaoping; Dang, Phat; Scully, Brian T; Liang, Xuanqiang; Holbrook, C Corley; Yu, Jiujiang; Culbreath, Albert K

2008-01-01

Background Peanut (Arachis hypogaea L.) is an important crop economically and nutritionally, and is one of the most susceptible host crops to colonization of Aspergillus parasiticus and subsequent aflatoxin contamination. Knowledge from molecular genetic studies could help to devise strategies in alleviating this problem; however, few peanut DNA sequences are available in the public database. In order to understand the molecular basis of host resistance to aflatoxin contamination, a large-scale project was conducted to generate expressed sequence tags (ESTs) from developing seeds to identify resistance-related genes involved in defense response against Aspergillus infection and subsequent aflatoxin contamination. Results We constructed six different cDNA libraries derived from developing peanut seeds at three reproduction stages (R5, R6 and R7) from a resistant and a susceptible cultivated peanut genotypes, 'Tifrunner' (susceptible to Aspergillus infection with higher aflatoxin contamination and resistant to TSWV) and 'GT-C20' (resistant to Aspergillus with reduced aflatoxin contamination and susceptible to TSWV). The developing peanut seed tissues were challenged by A. parasiticus and drought stress in the field. A total of 24,192 randomly selected cDNA clones from six libraries were sequenced. After removing vector sequences and quality trimming, 21,777 high-quality EST sequences were generated. Sequence clustering and assembling resulted in 8,689 unique EST sequences with 1,741 tentative consensus EST sequences (TCs) and 6,948 singleton ESTs. Functional classification was performed according to MIPS functional catalogue criteria. The unique EST sequences were divided into twenty-two categories. A similarity search against the non-redundant protein database available from NCBI indicated that 84.78% of total ESTs showed significant similarity to known proteins, of which 165 genes had been previously reported in peanuts. 
There were differences in overall expression patterns in different libraries and genotypes. A number of sequences were expressed throughout all of the libraries, representing constitutive expressed sequences. In order to identify resistance-related genes with significantly differential expression, a statistical analysis to estimate the relative abundance (R) was used to compare the relative abundance of each gene transcripts in each cDNA library. Thirty six and forty seven unique EST sequences with threshold of R > 4 from libraries of 'GT-C20' and 'Tifrunner', respectively, were selected for examination of temporal gene expression patterns according to EST frequencies. Nine and eight resistance-related genes with significant up-regulation were obtained in 'GT-C20' and 'Tifrunner' libraries, respectively. Among them, three genes were common in both genotypes. Furthermore, a comparison of our EST sequences with other plant sequences in the TIGR Gene Indices libraries showed that the percentage of peanut EST matched to Arabidopsis thaliana, maize (Zea mays), Medicago truncatula, rapeseed (Brassica napus), rice (Oryza sativa), soybean (Glycine max) and wheat (Triticum aestivum) ESTs ranged from 33.84% to 79.46% with the sequence identity ≥ 80%. These results revealed that peanut ESTs are more closely related to legume species than to cereal crops, and more homologous to dicot than to monocot plant species. Conclusion The developed ESTs can be used to discover novel sequences or genes, to identify resistance-related genes and to detect the differences among alleles or markers between these resistant and susceptible peanut genotypes. Additionally, this large collection of cultivated peanut EST sequences will make it possible to construct microarrays for gene expression studies and for further characterization of host resistance mechanisms. It will be a valuable genomic resource for the peanut community. 
The 21,777 ESTs have been deposited to the NCBI GenBank database with accession numbers ES702769 to ES724546. PMID:18248674
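The abstract above does not spell out how the relative abundance statistic R is computed. One widely used choice for comparing EST counts across several cDNA libraries is the log-likelihood ratio of Stekel et al. (2000). The minimal sketch below assumes that definition, which may differ from what the authors actually ran. The function name and the example counts are hypothetical; only the R > 4 cut-off comes from the abstract.

```python
import math

def r_statistic(counts, library_sizes):
    """Log-likelihood ratio R for one gene across several EST libraries.

    counts[i]        -- ESTs of this gene observed in library i
    library_sizes[i] -- total ESTs sequenced from library i
    Assumed form (Stekel et al. 2000): R = sum_i x_i * ln(x_i / (N_i * f)),
    where f is the pooled frequency of the gene over all libraries.
    """
    total_count = sum(counts)
    total_size = sum(library_sizes)
    if total_count == 0:
        return 0.0
    pooled_freq = total_count / total_size
    r = 0.0
    for x, n in zip(counts, library_sizes):
        if x > 0:  # a zero count contributes nothing to the sum
            r += x * math.log(x / (n * pooled_freq))
    return r

# Hypothetical counts: a gene seen 10 times in one library and never in another.
skewed = r_statistic([10, 0], [1000, 1000])
even = r_statistic([5, 5], [1000, 1000])
print(skewed > 4, even > 4)
```

Under this definition a gene spread evenly across libraries scores R = 0, and larger values indicate stronger library-specific expression.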
Expressed sequence tags from Atta laevigata and identification of candidate genes for the control of pest leaf-cutting ants.

PubMed

Rodovalho, Cynara M; Ferro, Milene; Fonseca, Fernando Pp; Antonio, Erik A; Guilherme, Ivan R; Henrique-Silva, Flávio; Bacci, Maurício

2011-06-17

Leafcutters are the most highly evolved of the Neotropical ants in the tribe Attini and are model systems for studying caste formation, labor division and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide a genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with broad geographic distribution in South America. The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for high level of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; regulation of gene expression and metabolism. Based on the leafcutters' lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences as potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kinase activity. The generation and analysis of expressed sequence tags from Atta laevigata have provided an important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of a more specific and environmentally friendly method for the control of agricultural pest leafcutters.
Expressed sequence tags from Atta laevigata and identification of candidate genes for the control of pest leaf-cutting ants

PubMed Central

2011-01-01

Background Leafcutters are the most highly evolved of the Neotropical ants in the tribe Attini and are model systems for studying caste formation, labor division and symbiosis with microorganisms. Some species of leafcutters are agricultural pests controlled by chemicals which affect other animals and accumulate in the environment. Aiming to provide a genetic basis for the study of leafcutters and for the development of more specific and environmentally friendly methods for the control of pest leafcutters, we generated expressed sequence tag data from Atta laevigata, one of the pest ants with broad geographic distribution in South America. Results The analysis of the expressed sequence tags allowed us to characterize 2,006 unique sequences in Atta laevigata. Sixteen of these genes had a high number of transcripts and are likely positively selected for high level of gene expression, being responsible for three basic biological functions: energy conservation through redox reactions in mitochondria; cytoskeleton and muscle structuring; regulation of gene expression and metabolism. Based on the leafcutters' lifestyle and reports of genes involved in key processes of other social insects, we identified 146 sequences as potential targets for controlling pest leafcutters. The targets are responsible for antixenobiosis, development and longevity, immunity, resistance to pathogens, pheromone function, cell signaling, behavior, polysaccharide metabolism and arginine kinase activity. Conclusion The generation and analysis of expressed sequence tags from Atta laevigata have provided an important genetic basis for future studies on the biology of leaf-cutting ants and may contribute to the development of a more specific and environmentally friendly method for the control of agricultural pest leafcutters. PMID:21682882
Identification of SLC20A1 and SLC15A4 among other genes as potential risk factors for combined pituitary hormone deficiency.

PubMed

Simm, Franziska; Griesbeck, Anne; Choukair, Daniela; Weiß, Birgit; Paramasivam, Nagarajan; Klammt, Jürgen; Schlesner, Matthias; Wiemann, Stefan; Martinez, Cristina; Hoffmann, Georg F; Pfäffle, Roland W; Bettendorf, Markus; Rappold, Gudrun A

2017-10-26

Purpose Combined pituitary hormone deficiency (CPHD) is characterized by a malformed or underdeveloped pituitary gland resulting in an impaired pituitary hormone secretion. Several transcription factors have been described in its etiology, but defects in known genes account for only a small proportion of cases. Methods To identify novel genetic causes for congenital hypopituitarism, we performed exome-sequencing studies on 10 patients with CPHD and their unaffected parents. Two candidate genes were sequenced in further 200 patients. Genotype data of known hypopituitary genes are reviewed. Results We discovered 51 likely damaging variants in 38 genes; 12 of the 51 variants represent de novo events (24%); 11 of the 38 genes (29%) were present in the E12.5/E14.5 pituitary transcriptome. Targeted sequencing of two candidate genes, SLC20A1 and SLC15A4, of the solute carrier membrane transport protein family in 200 additional patients demonstrated two further variants predicted as damaging. We also found combinations of de novo (SLC20A1/SLC15A4) and transmitted variants (GLI2/LHX3) in the same individuals, leading to the full-blown CPHD phenotype. Conclusion These data expand the pituitary target genes repertoire for diagnostics and further functional studies. Exome sequencing has identified a combination of rare variants in different genes that might explain incomplete penetrance in CPHD. Genetics in Medicine advance online publication, 26 October 2017; doi:10.1038/gim.2017.165.
Next-Generation Sequence Analysis of the Genome of RFHVMn, the Macaque Homolog of Kaposi's Sarcoma (KS)-Associated Herpesvirus, from a KS-Like Tumor of a Pig-Tailed Macaque

PubMed Central

Bruce, A. Gregory; Ryan, Jonathan T.; Thomas, Mathew J.; Peng, Xinxia; Grundhoff, Adam; Tsai, Che-Chung

2013-01-01

The complete sequence of retroperitoneal fibromatosis-associated herpesvirus Macaca nemestrina (RFHVMn), the pig-tailed macaque homolog of Kaposi's sarcoma-associated herpesvirus (KSHV), was determined by next-generation sequence analysis of a Kaposi's sarcoma (KS)-like macaque tumor. Colinearity of genes was observed with the KSHV genome, and the core herpesvirus genes had strong sequence homology to the corresponding KSHV genes. RFHVMn lacked homologs of open reading frame 11 (ORF11) and KSHV ORFs K5 and K6, which appear to have been generated by duplication of ORFs K3 and K4 after the divergence of KSHV and RFHV. RFHVMn contained positional homologs of all other unique KSHV genes, although some showed limited sequence similarity. RFHVMn contained a number of candidate microRNA genes. Although there was little sequence similarity with KSHV microRNAs, one candidate contained the same seed sequence as the positional homolog, kshv-miR-K12-10a, suggesting functional overlap. RNA transcript splicing was highly conserved between RFHVMn and KSHV, and strong sequence conservation was noted in specific promoters and putative origins of replication, predicting important functional similarities. Sequence comparisons indicated that RFHVMn and KSHV developed in long-term synchrony with the evolution of their hosts, and both viruses phylogenetically group within the RV1 lineage of Old World primate rhadinoviruses. RFHVMn is the closest homolog of KSHV to be completely sequenced and the first sequenced RV1 rhadinovirus homolog of KSHV from a nonhuman Old World primate. The strong genetic and sequence similarity between RFHVMn and KSHV, coupled with similarities in biology and pathology, demonstrate that RFHVMn infection in macaques offers an important and relevant model for the study of KSHV in humans. PMID:24109218
The Inference of Gene Trees with Species Trees

PubMed Central

Szöllősi, Gergely J.; Tannier, Eric; Daubin, Vincent; Boussau, Bastien

2015-01-01

This article reviews the various models that have been used to describe the relationships between gene trees and species trees. Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can coexist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree–species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. PMID:25070970
Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

USDA-ARS's Scientific Manuscript database

The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...
Genome sequence analysis of predicted polyprenol reductase gene from mangrove plant Kandelia obovata

NASA Astrophysics Data System (ADS)

Basyuni, M.; Sagami, H.; Baba, S.; Oku, H.

2018-03-01

It has been previously reported that dolichols, but not polyprenols, predominate in mangrove leaves and roots. The occurrence of larger amounts of dolichol in leaves of mangrove plants therefore implies that polyprenol reductase, which is responsible for the conversion of polyprenol to dolichol, may be active in mangrove leaves. Here we report an early assessment of a probable polyprenol reductase gene from the genome sequence of the mangrove plant Kandelia obovata. The functional assignment of the gene was based on a homology search of the sequences against the non-redundant (nr) peptide database of NCBI using Blastx. The degree of sequence identity between the DNA sequence and known polyprenol reductases was confirmed using the Blastx probability E-value, total score, and identity. The genome sequence data resulted in three partial sequences, termed c23157 (700 bp), c23901 (960 bp), and c24171 (531 bp). The c23157 gene showed the highest similarity (61%) to predicted polyprenol reductase 2-like from Gossypium raimondii, with an E-value of 2e-100. The second gene, c23901, exhibited high similarity (78%) to the steroid 5-alpha-reductase Det2 from Jatropha curcas, with an E-value of 2e-140. Furthermore, the c24171 gene showed the highest similarity (79%) to the polyprenol reductase 2 isoform X1 from J. curcas, with an E-value of 7e-21. The present study suggests that the c23157, c23901, and c24171 genes may encode predicted polyprenol reductases. The c23157, c23901, and c24171 genes are therefore a new type of predicted polyprenol reductase from K. obovata.
Enterobacter xiangfangensis sp. nov., isolated from Chinese traditional sourdough, and reclassification of Enterobacter sacchari Zhu et al. 2013 as Kosakonia sacchari comb. nov.

PubMed

Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

2014-08-01

A Gram-stain-negative bacterial strain, 10-17(T), was isolated from traditional sourdough in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, RNA polymerase β subunit (rpoB) gene sequence analysis, DNA gyrase (gyrB) gene sequence analysis, initiation translation factor 2 (infB) gene sequence analysis, ATP synthase β subunit (atpD) gene sequence analysis, fatty acid methyl ester analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain 10-17(T) was phylogenetically related to Enterobacter hormaechei CIP 103441(T), Enterobacter cancerogenus LMG 2693(T), Enterobacter asburiae JCM 6051(T), Enterobacter mori LMG 25706(T), Enterobacter ludwigii EN-119(T) and Leclercia adecarboxylata LMG 2803(T), having 99.5%, 99.3%, 98.7%, 98.5%, 98.4% and 98.4% 16S rRNA gene sequence similarity, respectively. On the basis of polyphasic characterization data obtained in the present study, a novel species, Enterobacter xiangfangensis sp. nov., is proposed and the type strain is 10-17(T) ( = LMG 27195(T) = NCIMB 14836(T) = CCUG 62994(T)). Enterobacter sacchari Zhu et al. 2013 was reclassified as Kosakonia sacchari comb. nov. on the basis of 16S rRNA, rpoB, gyrB, infB and atpD gene sequence analysis and the type strain is strain SP1(T)( = CGMCC 1.12102(T) = LMG 26783(T)). © 2014 IUMS.
Candidate gene identification of ovulation-inducing genes by RNA sequencing with an in vivo assay in zebrafish.

PubMed

Klangnurak, Wanlada; Fukuyo, Taketo; Rezanujjaman, M D; Seki, Masahide; Sugano, Sumio; Suzuki, Yutaka; Tokumoto, Toshinobu

2018-01-01

We previously reported the microarray-based selection of three ovulation-related genes in zebrafish. We used a different selection method in this study, RNA sequencing analysis. An additional eight up-regulated candidates were found as specifically up-regulated genes in ovulation-induced samples. Changes in gene expression were confirmed by qPCR analysis. Furthermore, up-regulation prior to ovulation during natural spawning was verified in samples from natural pairing. Gene knock-out zebrafish strains of one of the candidates, the starmaker gene (stm), were established by CRISPR genome editing techniques. Unexpectedly, homozygous mutants were fertile and could spawn eggs. However, a high percentage of unfertilized eggs and abnormal embryos were produced from these homozygous females. The results suggest that the stm gene is necessary for fertilization. In this study, we selected additional ovulation-inducing candidate genes, and a novel function of the stm gene was investigated.
Sorting by Cuts, Joins, and Whole Chromosome Duplications.

PubMed

Zeira, Ron; Shamir, Ron

2017-02-01

Genome rearrangement problems have been extensively studied due to their importance in biology. Most studied models assumed a single copy per gene. However, in reality, duplicated genes are common, most notably in cancer. In this study, we make a step toward handling duplicated genes by considering a model that allows the atomic operations of cut, join, and whole chromosome duplication. Given two linear genomes, [Formula: see text] with one copy per gene and [Formula: see text] with two copies per gene, we give a linear time algorithm for computing a shortest sequence of operations transforming [Formula: see text] into [Formula: see text] such that all intermediate genomes are linear. We also show that computing an optimal sequence with fewest duplications is NP-hard.
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).

PubMed

Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang

2016-11-01

In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
The gene space in wheat: the complete γ-gliadin gene family from the wheat cultivar Chinese Spring.

PubMed

Anderson, Olin D; Huo, Naxin; Gu, Yong Q

2013-06-01

The complete set of unique γ-gliadin genes is described for the wheat cultivar Chinese Spring using a combination of expressed sequence tag (EST) and Roche 454 DNA sequences. Assemblies of Chinese Spring ESTs yielded 11 different γ-gliadin gene sequences. Two of the sequences encode identical polypeptides and are assumed to be the result of a recent gene duplication. One gene has a 3' coding mutation that changes the reading frame in the final eight codons. A second assembly of Chinese Spring γ-gliadin sequences was generated using Roche 454 total genomic DNA sequences. The 454 assembly confirmed the same 11 active genes as the EST assembly plus two pseudogenes not represented by ESTs. These 13 γ-gliadin sequences represent the complete unique set of γ-gliadin genes for cv Chinese Spring, although not ruled out are additional genes that are exact duplications of these 13 genes. A comparison with the ESTs of two other hexaploid cultivars (Butte 86 and Recital) finds that the most active genes are present in all three cultivars, with exceptions likely due to too few ESTs for detection in Butte 86 and Recital. A comparison of the numbers of ESTs per gene indicates differential levels of expression within the γ-gliadin gene family. Genome assignments were made for 6 of the 13 Chinese Spring γ-gliadin genes, i.e., one assignment from a match to two γ-gliadin genes found within a tetraploid wheat A genome BAC and four genes that match four distinct γ-gliadin sequences assembled from Roche 454 sequences from Aegilops tauschii, the hexaploid wheat D-genome ancestor.
Evaluating the Detection of Hydrocarbon-Degrading Bacteria in 16S rRNA Gene Sequencing Surveys

PubMed Central

Berry, David; Gutierrez, Tony

2017-01-01

Hydrocarbonoclastic bacteria (HCB) play a key role in the biodegradation of oil hydrocarbons in marine and other environments. A small number of taxa have been identified as obligate HCB, notably the Gammaproteobacterial genera Alcanivorax, Cycloclasticus, Marinobacter, Neptunomonas, Oleiphilus, Oleispira, and Thalassolituus, as well as the Alphaproteobacterial genus Thalassospira. Detection of HCB in amplicon-based sequencing surveys relies on high coverage by PCR primers and accurate taxonomic classification. In this study, we performed a phylogenetic analysis to identify 16S rRNA gene sequence regions that represent the breadth of sequence diversity within these taxa. Using validated sequences, we evaluated 449 universal 16S rRNA gene-targeted bacterial PCR primer pairs for their coverage of these taxa. The results of this analysis provide a practical framework for selection of suitable primer sets for optimal detection of HCB in sequencing surveys. PMID:28567035
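Primer-pair coverage of the kind evaluated above can be illustrated with a minimal sketch. It assumes that a pair "covers" a sequence when the forward primer and the reverse complement of the reverse primer both occur in it, with IUPAC degenerate bases expanded and no mismatches allowed by default. The study's actual criteria are not given in the abstract. All function names and the toy sequences are hypothetical, and the sketch ignores primer order and amplicon length.

```python
# IUPAC nucleotide codes mapped to the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def reverse_complement(primer):
    """Reverse complement that also handles degenerate IUPAC codes."""
    return primer.translate(COMPLEMENT)[::-1]

def primer_matches(primer, target, max_mismatches=0):
    """True if the primer anneals somewhere in the target within the mismatch budget."""
    for start in range(len(target) - len(primer) + 1):
        mismatches = 0
        for p, t in zip(primer, target[start:start + len(primer)]):
            if t not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # inner loop finished without exceeding the budget
            return True
    return False

def pair_coverage(forward, reverse, sequences, max_mismatches=0):
    """Fraction of sequences matched by both primers of a pair."""
    site = reverse_complement(reverse)
    hits = sum(
        1 for seq in sequences
        if primer_matches(forward, seq, max_mismatches)
        and primer_matches(site, seq, max_mismatches)
    )
    return hits / len(sequences)

# Toy 10-base "genes"; real 16S rRNA sequences are roughly 1,500 bases long.
toy_sequences = ["ACGTGGTCAA", "ACGAGGTCAA", "ACGCGGTCAA"]
print(pair_coverage("ACGY", "TTGA", toy_sequences))
```

Raising `max_mismatches` is one simple way to model tolerance for imperfect annealing, at the cost of overstating coverage.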
Evaluating the Detection of Hydrocarbon-Degrading Bacteria in 16S rRNA Gene Sequencing Surveys.

PubMed

Berry, David; Gutierrez, Tony

2017-01-01

Hydrocarbonoclastic bacteria (HCB) play a key role in the biodegradation of oil hydrocarbons in marine and other environments. A small number of taxa have been identified as obligate HCB, notably the Gammaproteobacterial genera Alcanivorax, Cycloclasticus, Marinobacter, Neptunomonas, Oleiphilus, Oleispira, and Thalassolituus, as well as the Alphaproteobacterial genus Thalassospira. Detection of HCB in amplicon-based sequencing surveys relies on high coverage by PCR primers and accurate taxonomic classification. In this study, we performed a phylogenetic analysis to identify 16S rRNA gene sequence regions that represent the breadth of sequence diversity within these taxa. Using validated sequences, we evaluated 449 universal 16S rRNA gene-targeted bacterial PCR primer pairs for their coverage of these taxa. The results of this analysis provide a practical framework for selection of suitable primer sets for optimal detection of HCB in sequencing surveys.
Transcriptomic analysis of Siberian ginseng (Eleutherococcus senticosus) to discover genes involved in saponin biosynthesis.

PubMed

Hwang, Hwan-Su; Lee, Hyoshin; Choi, Yong Eui

2015-03-14

Eleutherococcus senticosus, Siberian ginseng, is a highly valued woody medicinal plant belonging to the family Araliaceae. E. senticosus produces a rich variety of saponins such as oleanane-type, noroleanane-type, 29-hydroxyoleanan-type, and lupane-type saponins. Genomic or transcriptomic approaches have not been used to investigate the saponin biosynthetic pathway in this plant. In this study, de novo sequencing was performed to select candidate genes involved in the saponin biosynthetic pathway. A half-plate 454 pyrosequencing run produced 627,923 high-quality reads with an average sequence length of 422 bases. De novo assembly generated 72,811 unique sequences, including 15,217 contigs and 57,594 singletons. Approximately 48,300 (66.3%) unique sequences were annotated using BLAST similarity searches. All of the mevalonate pathway genes for saponin biosynthesis starting from acetyl-CoA were isolated. Moreover, 206 reads of cytochrome P450 (CYP) and 145 reads of uridine diphosphate glycosyltransferase (UGT) sequences were isolated. Based on methyl jasmonate (MeJA) treatment and real-time PCR (qPCR) analysis, 3 CYPs and 3 UGTs were finally selected as candidate genes involved in the saponin biosynthetic pathway. The identified sequences associated with saponin biosynthesis will facilitate the study of the functional genomics of saponin biosynthesis and genetic engineering of E. senticosus.
A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

PubMed Central

Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

2014-01-01

The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch Syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
Assembly and features of secondary metabolite biosynthetic gene clusters in Streptomyces ansochromogenes.

PubMed

Zhong, Xingyu; Tian, Yuqing; Niu, Guoqing; Tan, Huarong

2013-07-01

A draft genome sequence of Streptomyces ansochromogenes 7100 was generated using 454 sequencing technology. In combination with local BLAST searches and gap filling techniques, a comprehensive antiSMASH-based method was adopted to assemble the secondary metabolite biosynthetic gene clusters in the draft genome of S. ansochromogenes. A total of at least 35 putative gene clusters were identified and assembled. Transcriptional analysis showed that 20 of the 35 gene clusters were expressed in either or all of the three different media tested, whereas the other 15 gene clusters were silent in all three different media. This study provides a comprehensive method to identify and assemble secondary metabolite biosynthetic gene clusters in draft genomes of Streptomyces, and will significantly promote functional studies of these secondary metabolite biosynthetic gene clusters.

Molecular typing of Staphylococcus aureus based on coagulase gene.

PubMed

Javid, Faizan; Taku, Anil; Bhat, Mohd Altaf; Badroo, Gulzar Ahmad; Mudasir, Mir; Sofi, Tanveer Ahmad

2018-04-01

This study was conducted to examine the coagulase gene-based genetic diversity of Staphylococcus aureus isolated from different samples of cattle, using restriction fragment length polymorphism (RFLP) and sequence-based phylogenetic analysis. A total of 192 different samples from mastitic milk, nasal cavity, and pus from skin wounds of cattle from Military Dairy Farm, Jammu, India, were screened for the presence of S. aureus. The presumptive isolates were confirmed by nuc gene-based polymerase chain reaction (PCR). The confirmed S. aureus isolates were subjected to coagulase (coa) gene PCR. The different coa genotypes observed were subjected to RFLP using the restriction enzymes HaeIII and AluI to obtain the different restriction patterns. One isolate from each restriction pattern was sequenced. These sequences were aligned for maximum homology using the BioEdit software, and similarity among the sequences was inferred with the help of a sequence identity matrix. Of 192 different samples, 39 (20.31%) isolates of S. aureus were confirmed by targeting the nuc gene using PCR. Of 39 S. aureus isolates, 25 (64.10%) isolates carried the coa gene. Four different genotypes of the coa gene, i.e., 514 bp, 595 bp, 757 bp, and 802 bp, were obtained. Two coa genotypes, 595 bp (15 isolates) and 802 bp (4 isolates), were observed in mastitic milk. The 514 bp (2 isolates) and 757 bp (4 isolates) coa genotypes were observed from nasal cavity and pus from skin wounds, respectively. On RFLP using both restriction enzymes, four different restriction patterns P1, P2, P3, and P4 were observed. On sequencing, four different sequences having unique restriction patterns were obtained. The most similar sequences, with an identity value of 0.810, were those of isolates S. aureus 514 (nasal cavity) and S. aureus 595 (mastitic milk), which are thus the most closely related. The most distant sequences, with a value of 0.483, were those of the S. aureus 514 and S. aureus 802 isolates.
The study, being localized to only one farm, yielded different RFLP patterns as observed from different sampling sites, which indicates that different S. aureus coagulase types have a site-specific predilection. Two coa patterns were observed in mastitic milk, indicating multiple origins of infection, with the 595 bp coa genotype being predominant in mastitic milk. The coa genotypes and their restriction patterns observed in the present study are novel and have not been published earlier. The 514 bp and 595 bp coa variants of S. aureus are genetically the most closely related.
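A sequence identity matrix like the one mentioned above can be sketched in a few lines. The version below assumes identity means the fraction of aligned columns in which two sequences carry the same character, skipping columns where both have a gap. BioEdit's exact gap handling may differ, and the toy sequences are hypothetical rather than the coa sequences from the study.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of compared alignment columns in which two aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0

def identity_matrix(aligned):
    """Identity for every ordered pair of names in a {name: aligned sequence} mapping."""
    names = list(aligned)
    return {
        (p, q): pairwise_identity(aligned[p], aligned[q])
        for p in names
        for q in names
    }

# Hypothetical aligned fragments, not the sequences from the study.
toy_alignment = {
    "isolate_1": "ACGT-A",
    "isolate_2": "ACGA-A",
    "isolate_3": "AC-TTA",
}
matrix = identity_matrix(toy_alignment)
print(matrix[("isolate_1", "isolate_2")])
```

With this definition a gap opposite a base counts as a mismatch, so values such as the 0.810 and 0.483 reported above would be read as the share of identical columns between two isolates.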
Generation, annotation and analysis of ESTs from Trichoderma harzianum CECT 2413

PubMed Central

Vizcaíno, Juan Antonio; González, Francisco Javier; Suárez, M Belén; Redondo, José; Heinrich, Julian; Delgado-Jarana, Jesús; Hermosa, Rosa; Gutiérrez, Santiago; Monte, Enrique; Llobell, Antonio; Rey, Manuel

2006-01-01

Background The filamentous fungus Trichoderma harzianum is used as biological control agent of several plant-pathogenic fungi. In order to study the genome of this fungus, a functional genomics project called "TrichoEST" was developed to give insights into genes involved in biological control activities using an approach based on the generation of expressed sequence tags (ESTs). Results Eight different cDNA libraries from T. harzianum strain CECT 2413 were constructed. Different growth conditions involving mainly different nutrient conditions and/or stresses were used. We here present the analysis of the 8,710 ESTs generated. A total of 3,478 unique sequences were identified of which 81.4% had sequence similarity with GenBank entries, using the BLASTX algorithm. Using the Gene Ontology hierarchy, we performed the annotation of 51.1% of the unique sequences and compared its distribution among the gene libraries. Additionally, the InterProScan algorithm was used in order to further characterize the sequences. The identification of the putatively secreted proteins was also carried out. Later, based on the EST abundance, we examined the highly expressed genes and a hydrophobin was identified as the gene expressed at the highest level. We compared our collection of ESTs with the previous collections obtained from Trichoderma species and we also compared our sequence set with different complete eukaryotic genomes from several animals, plants and fungi. Accordingly, the presence of similar sequences in different kingdoms was also studied. Conclusion This EST collection and its annotation provide a significant resource for basic and applied research on T. harzianum, a fungus with a high biotechnological interest. PMID:16872539
Exome sequencing for simultaneous mutation screening in children with hemophagocytic lymphohistiocytosis.

PubMed

Mukda, Ekchol; Trachoo, Objoon; Pasomsub, Ekawat; Tiyasirichokchai, Rawiphorn; Iemwimangsa, Nareenart; Sosothikul, Darintr; Chantratita, Wasun; Pakakasama, Samart

2017-08-01

In the present study, we used exome sequencing to analyze PRF1, UNC13D, STX11, and STXBP2, as well as genes associated with primary immunodeficiency disease (RAB27A, LYST, AP3B1, SH2D1A, ITK, CD27, XIAP, and MAGT1) in Thai children with hemophagocytic lymphohistiocytosis (HLH). We performed mutation analysis of HLH-associated genes in 25 Thai children using an exome sequencing method. Genetic variations found within these target genes were compared to exome sequencing data from 133 healthy individuals. Variants identified with minor allele frequencies <5% and novel mutations were confirmed using Sanger sequencing. Exome sequencing data revealed 101 non-synonymous single nucleotide polymorphisms (SNPs) in all subjects. These SNPs were classified as pathogenic (n = 1), likely pathogenic (n = 16), variant of unknown significance (n = 12), or benign variant (n = 72). Homozygous, compound heterozygous, and double-gene heterozygous variants, involving mutations in PRF1 (n = 3), UNC13D (n = 2), STXBP2 (n = 3), LYST (n = 3), XIAP (n = 2), AP3B1 (n = 1), RAB27A (n = 1), and MAGT1 (n = 1), were demonstrated in 12 patients. Novel mutations were found in most patients in this study. In conclusion, exome sequencing demonstrated the ability to identify rare genetic variants in HLH patients. This method is useful in the detection of mutations in multi-gene associated diseases.
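The selection rule described above, under which variants with a minor allele frequency below 5% and novel variants go forward to Sanger confirmation, can be written as a small filter. This is a minimal sketch: the `Variant` record, its field names and the placeholder variants are hypothetical, and a missing frequency is assumed to mean "novel".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Variant:
    gene: str
    change: str            # placeholder label, not a real variant name
    maf: Optional[float]   # None is assumed to mean the variant is novel

def needs_sanger_confirmation(variant, maf_cutoff=0.05):
    """True for novel variants and for variants rarer than the cut-off."""
    return variant.maf is None or variant.maf < maf_cutoff

# Hypothetical calls in three of the genes named in the abstract.
calls = [
    Variant("PRF1", "variant_a", 0.01),
    Variant("UNC13D", "variant_b", None),
    Variant("LYST", "variant_c", 0.20),
]
selected = [v.change for v in calls if needs_sanger_confirmation(v)]
print(selected)
```

The strict `<` comparison follows the abstract's "<5%" wording, so a variant at exactly 5% is not selected.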
Optimization of Multilocus Sequence Analysis for Identification of Species in the Genus Vibrio

PubMed Central

Gabriel, Michael W.; Matsui, George Y.; Friedman, Robert

2014-01-01

Multilocus sequence analysis (MLSA) is an important method for identification of taxa that are not well differentiated by 16S rRNA gene sequences alone. In this procedure, concatenated sequences of selected genes are constructed and then analyzed. The effects that the number and the order of genes used in MLSA have on reconstruction of phylogenetic relationships were examined. The recA, rpoA, gapA, 16S rRNA gene, gyrB, and ftsZ sequences from 56 species of the genus Vibrio were used to construct molecular phylogenies, and these were evaluated individually and using various gene combinations. Phylogenies from two-gene sequences employing recA and rpoA in both possible gene orders were different. The addition of the gapA gene sequence, producing all six possible concatenated sequences, reduced the differences in phylogenies to degrees of statistical (bootstrap) support for some nodes. The overall statistical support for the phylogenetic tree, assayed on the basis of a reliability score (calculated from the number of nodes having bootstrap values of ≥80 divided by the total number of nodes) increased with increasing numbers of genes used, up to a maximum of four. No further improvement was observed from addition of the fifth gene sequence (ftsZ), and addition of the sixth gene (gyrB) resulted in lower proportions of strongly supported nodes. Reductions in the numbers of strongly supported nodes were also observed when maximum parsimony was employed for tree construction. Use of a small number of gene sequences in MLSA resulted in accurate identification of Vibrio species. PMID:24951781
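The reliability score defined above (nodes with bootstrap support of at least 80, divided by the total number of nodes) is simple enough to sketch directly. The function name and the example bootstrap values are hypothetical. The input is assumed to be one bootstrap percentage per internal node of a tree.

```python
def reliability_score(bootstrap_values, threshold=80):
    """Share of tree nodes whose bootstrap support is at or above the threshold."""
    if not bootstrap_values:
        raise ValueError("at least one node is required")
    strongly_supported = sum(1 for value in bootstrap_values if value >= threshold)
    return strongly_supported / len(bootstrap_values)

# Hypothetical bootstrap percentages for the nodes of two trees.
tree_a_nodes = [100, 85, 80, 79, 50]
tree_b_nodes = [100, 95, 90, 88, 60]
print(reliability_score(tree_a_nodes), reliability_score(tree_b_nodes))
```

Comparing the score across trees built from different gene concatenations mirrors the comparison in the abstract, where support rose as genes were added up to four.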
Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

PubMed

Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

2015-12-01

Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), Diversity (D, in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
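The following rough sketch shows why shorter reads make germline V genes harder to tell apart. It is not the authors' method: it assumes reads cover only the final `read_length` bases of each gene, uses exact identity only (no mutation or error model), and the gene names and sequences are toy values invented for illustration.

```python
from collections import defaultdict


def indistinguishable_groups(germline, read_length):
    """Group gene names whose last `read_length` bases are identical.

    `germline` maps a gene name to its nucleotide sequence. Genes that
    end up in the same group cannot be discriminated by an error-free
    read of that length (an illustrative simplification).
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    groups = defaultdict(list)
    for name, seq in germline.items():
        groups[seq[-read_length:]].append(name)
    return sorted(sorted(names) for names in groups.values())


# Toy sequences (hypothetical, not real germline V genes).
toy_germline = {
    "V1": "ACGTACGTAA",
    "V2": "ACGAACGTAA",
    "V3": "TTGTACGTCC",
}
print(indistinguishable_groups(toy_germline, 6))   # short reads merge V1 and V2
print(indistinguishable_groups(toy_germline, 10))  # full length separates all
```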
Sequence, distribution and chromosomal context of class I and class II pilin genes of Neisseria meningitidis identified in whole genome sequences

PubMed Central

2014-01-01

Background Neisseria meningitidis expresses type four pili (Tfp) which are important for colonisation and virulence. Tfp have been considered as one of the most variable structures on the bacterial surface due to high frequency gene conversion, resulting in amino acid sequence variation of the major pilin subunit (PilE). Meningococci express either a class I or a class II pilE gene and recent work has indicated that class II pilins do not undergo antigenic variation, as class II pilE genes encode conserved pilin subunits. The purpose of this work was to use whole genome sequences to further investigate the frequency and variability of the class II pilE genes in meningococcal isolate collections. Results We analysed over 600 publicly available whole genome sequences of N. meningitidis isolates to determine the sequence and genomic organization of pilE. We confirmed that meningococcal strains belonging to a limited number of clonal complexes (ccs, namely cc1, cc5, cc8, cc11 and cc174) harbour a class II pilE gene which is conserved in terms of sequence and chromosomal context. We also identified pilS cassettes in all isolates with class II pilE; however, our analysis indicates that these do not serve as donor sequences for pilE/pilS recombination. Furthermore, our work reveals that the class II pilE locus lacks the DNA sequence motifs that enable (G4) or enhance (Sma/Cla repeat) pilin antigenic variation. Finally, through analysis of pilin genes in commensal Neisseria species we found that meningococcal class II pilE genes are closely related to pilE from Neisseria lactamica and Neisseria polysaccharea, suggesting horizontal transfer among these species. Conclusions Class II pilins can be defined by their amino acid sequence and genomic context and are present in meningococcal isolates which have persisted and spread globally. The absence of G4 and Sma/Cla sequences adjacent to the class II pilE genes is consistent with the lack of pilin subunit variation in these isolates, although horizontal transfer may generate class II pilin diversity. This study supports the suggestion that high frequency antigenic variation of pilin is not universal in pathogenic Neisseria. PMID:24690385
[Phylogenetic and diversity analysis of Acidithiobacillus spp. based on 16S rRNA and RubisCO genes homologues].

PubMed

Liu, Minrui; Lin, Pengwu; Qi, Xing'e; Ni, Yongqing

2016-04-14

The purpose of the study was to reveal the geographic region-related distribution and allopatric speciation of Acidithiobacillus spp. Phylogenetic and diversity analysis was done to expand our knowledge of microbial phylogeography, diversity-maintaining mechanisms and molecular biogeography. We amplified the 16S rRNA gene and RubisCO genes to construct corresponding phylogenetic trees based on sequence homology and analyzed the genetic diversity of Acidithiobacillus spp. Thirty-five strains were isolated from three different regions in China (Yunnan, Hubei, Xinjiang). The isolates were classified into five groups. Four strains were identified as A. ferrivorans, six as A. ferridurans, YNTR4-15 as Leptospirillum ferrooxidans and HBDY3-31 as Leptospirillum ferrodiazotrophum. The remaining strains were identified as A. ferrooxidans. Analysis of the cbbL and cbbM gene sequences of 26 representative strains indicated that 19 carried two copies of the cbbL gene (cbbL1 and cbbL2) and 7 possessed only cbbL1. The cbbM gene was single copy. In nucleotide-based trees, the cbbL1 gene sequences of the strains were separated into three sequence types, and cbbL2 was similar to cbbL1, also with three types. Codon bias of RubisCO genes was not obvious in Acidithiobacillus spp. Strains isolated from three different regions in China showed great genetic diversity in Acidithiobacillus spp., and their 16S rRNA and RubisCO gene sequences differed significantly. The phylogenetic trees based on 16S rRNA genes and on RubisCO genes differed in Acidithiobacillus spp.
A 454 multiplex sequencing method for rapid and reliable genotyping of highly polymorphic genes in large-scale studies.

PubMed

Galan, Maxime; Guivier, Emmanuel; Caraux, Gilles; Charbonnel, Nathalie; Cosson, Jean-François

2010-05-11

High-throughput sequencing technologies offer new perspectives for biomedical, agronomical and evolutionary research. Promising progress is now being made in applying these technologies to large-scale studies of genetic variation. Such studies require the genotyping of high numbers of samples. This is theoretically possible using 454 pyrosequencing, which generates billions of base pairs of sequence data. However, several challenges arise: first, in the attribution of each read produced to its original sample, and second, in bioinformatic analyses to distinguish true from artifactual sequence variation. This pilot study proposes a new application for the 454 GS FLX platform, allowing the individual genotyping of thousands of samples in one run. A probabilistic model has been developed to demonstrate the reliability of this method. DNA amplicons from 1,710 rodent samples were individually barcoded using a combination of tags located in forward and reverse primers. Amplicons consisted of 222 bp fragments corresponding to DRB exon 2, a highly polymorphic gene in mammals. A total of 221,789 reads were obtained, of which 153,349 were finally assigned to original samples. Rules based on a probabilistic model and a four-step procedure were developed to validate sequences and provide a confidence level for each genotype. The method gave promising results, with the genotyping of DRB exon 2 sequences for 1,407 samples from 24 different rodent species and the sequencing of 392 variants in one half of a 454 run. Using replicates, we estimated that the reproducibility of genotyping reached 95%. This new approach is a promising alternative to classical methods involving electrophoresis-based techniques for variant separation and cloning-sequencing for sequence determination. The 454 system is less costly and time consuming and may enhance the reliability of genotypes obtained when high numbers of samples are studied. It opens up new perspectives for the study of evolutionary and functional genetics of highly polymorphic genes like major histocompatibility complex genes in vertebrates or loci regulating self-compatibility in plants. Important applications in biomedical research will include the detection of individual variation in disease susceptibility. Similarly, agronomy will benefit from this approach, through the study of genes implicated in productivity or disease susceptibility traits.
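The idea of attributing each read to its sample through a combination of forward and reverse tags can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: tags are assumed to have a fixed length, to sit at the two ends of the read in the same orientation, and to match exactly. The tag sequences and sample names are hypothetical.

```python
def assign_read(read, tag_len, sample_by_tags):
    """Return the sample for a read, or None if its tag pair is unknown.

    `sample_by_tags` maps (forward_tag, reverse_tag) to a sample name.
    Exact matching and fixed-length tags are illustrative assumptions;
    real data would also need reverse-complement and error handling.
    """
    if len(read) < 2 * tag_len:
        return None
    forward_tag = read[:tag_len]
    reverse_tag = read[-tag_len:]
    return sample_by_tags.get((forward_tag, reverse_tag))


# Hypothetical tag combinations: one shared forward tag, two reverse tags.
sample_by_tags = {
    ("ACGT", "TTGA"): "sample_A",
    ("ACGT", "CCAT"): "sample_B",
}
reads = ["ACGTGGGGCCCCTTGA", "ACGTGGGGCCCCCCAT", "TTTTGGGGCCCCTTGA"]
assigned = [assign_read(r, 4, sample_by_tags) for r in reads]
print(assigned)
```

Reads returning `None` correspond to the unassigned fraction; in the study, 153,349 of 221,789 reads (about 69%) were assigned to samples.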
Lessons learned from whole exome sequencing in multiplex families affected by a complex genetic disorder, intracranial aneurysm.

PubMed

Farlow, Janice L; Lin, Hai; Sauerbeck, Laura; Lai, Dongbing; Koller, Daniel L; Pugh, Elizabeth; Hetrick, Kurt; Ling, Hua; Kleinloog, Rachel; van der Vlies, Pieter; Deelen, Patrick; Swertz, Morris A; Verweij, Bon H; Regli, Luca; Rinkel, Gabriel J E; Ruigrok, Ynte M; Doheny, Kimberly; Liu, Yunlong; Broderick, Joseph; Foroud, Tatiana

2015-01-01

Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genomewide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored. In this study, we report the use of whole exome sequencing (WES) in seven densely-affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the genes that were expressed in IA tissue, one gene (TMEM132B) was differentially expressed in aneurysmal samples (n=44) as compared to control samples (n=16) (false discovery rate adjusted p-value=0.023). We demonstrate that sequencing of densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations for applying sequencing to complex disorders. In this study, we explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the set of variants and genes identified in this study.
Genome-wide transcriptome profiling reveals novel insights into Luffa cylindrica browning.

PubMed

Chen, Xia; Tan, Taiming; Xu, Changcheng; Huang, Shuping; Tan, Jie; Zhang, Min; Wang, Chunli; Xie, Conghua

2015-08-07

Luffa cylindrica (sponge gourd) is one of the most popular vegetables in China. Production and consumption of L. cylindrica are limited due to postharvest browning; however, little is known about the genetic regulation of the browning process. In the present study, transcriptome profiles of L. cylindrica cultivars, YLB05 (browning resistant) and XTR05 (browning sensitive), were analyzed using next-generation sequencing to clarify the genes and mechanisms associated with browning. A total of 9.1 Gb of valid data including 116,703 unigenes (>200 bp) were obtained and 39,473 sequences were annotated by alignment against five public databases. Of these, there were 27,407 genes assigned to 747 Gene Ontology functional categories; and 12,350 genes were annotated with 25 Eukaryotic Orthologous Groups (KOG) categories with 343 KOG functional terms. Additionally, by searching against the Kyoto Encyclopedia of Genes and Genomes database, 8689 unigenes were mapped to 189 pathways. Furthermore, there were 24,556 sequences found to be differentially regulated, including 4344 annotated unigenes. Several genes potentially associated with phenolic oxidation, carbohydrate and hormone metabolism were found differentially regulated between the cultivars of different browning sensitivities. Our results suggest that elements involved in enzymatic processes and other pathways might be responsible for L. cylindrica browning. The present study provides a comprehensive transcriptome sequence resource, which will facilitate further studies on gene discovery and exploiting the fruit browning mechanism of L. cylindrica. Copyright © 2015 Elsevier Inc. All rights reserved.
CRAWview: for viewing splicing variation, gene families, and polymorphism in clusters of ESTs and full-length sequences.

PubMed

Chou, A; Burke, J

1999-05-01

DNA sequence clustering has become a valuable method in support of gene discovery and gene expression analysis. Our interest lies in leveraging the sequence diversity within clusters of expressed sequence tags (ESTs) to model gene structure for the study of gene variants that arise from, among other things, alternative mRNA splicing, polymorphism, and divergence after gene duplication, fusion, and translocation events. In previous work, CRAW was developed to discover gene variants from assembled clusters of ESTs. Most importantly, novel gene features (the differing units between gene variants, for example alternative exons, polymorphisms, transposable elements, etc.) that are specialized to tissue, disease, population, or developmental states can be identified when these tools collate DNA source information with gene variant discrimination. While the goal is complete automation of novel feature and gene variant detection, current methods are far from perfect, and hence the development of effective tools for visualization and exploratory data analysis is of paramount importance in the process of sifting through candidate genes and validating targets. We present CRAWview, a Java-based visualization extension to CRAW. Features that vary between gene forms are displayed using an automatically generated color-coded index. The reporting format of CRAWview gives a brief, high-level summary report to display overlap and divergence within clusters of sequences as well as the ability to 'drill down' and see detailed information concerning regions of interest. Additionally, the alignment viewing and editing capabilities of CRAWview make it possible to interactively correct frame-shifts and otherwise edit cluster assemblies. We have implemented CRAWview as a Java application across Windows NT/95 and UNIX platforms. A beta version of CRAWview will be freely available to academic users from Pangea Systems (http://www.pangeasystems.com).
Clinical germline diagnostic exome sequencing for hereditary cancer: Findings within novel candidate genes are prevalent.

PubMed

Powis, Zöe; Espenschied, Carin R; LaDuca, Holly; Hagman, Kelly D; Paudyal, Tripti; Li, Shuwei; Inaba, Hiroto; Mauer, Ann; Nathanson, Katherine L; Knost, James; Chao, Elizabeth C; Tang, Sha

2018-08-01

Clinical diagnostic exome sequencing (DES) has been effective in diagnosing individuals with suspected genetic conditions; nevertheless, little has been described regarding its clinical utility in individuals with a personal and family history of cancer. This study aimed to assess diagnostic yield and clinical characteristics of pediatric and adult patients undergoing germline DES for hereditary cancer. We retrospectively reviewed 2171 patients referred for DES; cases with a personal and/or family history of cancer were further studied. Of 39 cancer patients, relevant alterations were found in eight individuals (21%), including one (3%) positive pathogenic alteration within a characterized gene, two (5%) uncertain findings in characterized genes, and five (13%) alterations in novel candidate genes. Two of the 5 pediatric patients undergoing testing (40%) had findings in novel candidate genes, with the remainder being negative. We include brief case studies to illustrate the variety of challenging issues related to these patients. Our observations demonstrate the utility of family-based exome sequencing in patients with suspected hereditary cancer, including familial co-segregation analysis and comprehensive medical review. DES may be particularly useful when traditional approaches do not result in a diagnosis or in families with unique phenotypes. This work also highlights the importance and complexity of analysis of uncharacterized genes in exome sequencing for hereditary cancer. Copyright © 2018 Elsevier Inc. All rights reserved.
Identification of potential platelet alloantigens in the Equidae family by comparison of gene sequences encoding major platelet membrane glycoproteins.

PubMed

Boudreaux, Mary K; Humphries, Drew M

2013-12-01

Platelet alloantigens in horses may play an important role in the development of neonatal alloimmune thrombocytopenia (NAIT). The objective of this study was to evaluate genes encoding major platelet glycoproteins within the Equidae family in an effort to identify potential alloantigens. DNA was isolated from blood samples obtained from Equidae family members, including a Holsteiner-Oldenburg cross, a Quarter horse, a donkey, and a Plains zebra (Equus burchelli). Gene sequences encoding equine platelet membrane glycoproteins IIb, IIIa (integrin subunits αIIb and β3), Ia (integrin subunit α2), and Ibα were determined using PCR. Gene sequences were compared to the equine genome available on GenBank. Polymorphisms that would be predicted to result in amino acid changes on platelet surfaces were documented and compared with known alloantigenic sites documented on human platelets. Amino acid differences were predicted based on nucleotide sequences for all 4 genes. Nine differences were documented for αIIb, 5 differences were documented for β3, 7 differences were documented for α2, and 16 differences were documented for Ibα outside the macroglycopeptide region. This study represents the first effort at identifying potential platelet alloantigens in members of the Equidae Family based on evaluation of gene sequences. The data obtained form the groundwork for identifying potential platelet alloantigens involved in transfusion reactions and neonatal alloimmune thrombocytopenia (NAIT). More work is required to determine whether the predicted amino acid differences documented in this study play a role in alloimmunity, and whether other polymorphisms not detected in this study are present that may result in alloimmunity. © 2013 American Society for Veterinary Clinical Pathology.
Comparative RNA sequencing reveals substantial genetic variation in endangered primates

PubMed Central

Perry, George H.; Melsted, Páll; Marioni, John C.; Wang, Ying; Bainer, Russell; Pickrell, Joseph K.; Michelini, Katelyn; Zehr, Sarah; Yoder, Anne D.; Stephens, Matthew; Pritchard, Jonathan K.; Gilad, Yoav

2012-01-01

Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success. PMID:22207615
Whole genome sequences of Japanese porcine species C rotaviruses reveal a high diversity of genotypes of individual genes and will contribute to a comprehensive, generally accepted classification system.

PubMed

Niira, Kazutaka; Ito, Mika; Masuda, Tsuneyuki; Saitou, Toshiya; Abe, Tadatsugu; Komoto, Satoshi; Sato, Mitsuo; Yamasato, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Tuchiaka, Shinobu; Okada, Takashi; Omatsu, Tsutomu; Furuya, Tetsuya; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Shirai, Junsuke; Taniguchi, Koki; Mizutani, Tetsuya; Nagai, Makoto

2016-10-01

Porcine rotavirus C (RVC) is distributed throughout the world and is thought to be a pathogenic agent of diarrhea in piglets. Although the VP7, VP4, and VP6 gene sequences of Japanese porcine RVCs are currently available, no whole-genome sequence data are available for Japanese RVC. Furthermore, only one to three sequences are available for porcine RVC VP1-VP3 and NSP1-NSP3 genes. Therefore, we determined nearly full-length whole-genome sequences of nine Japanese porcine RVCs from seven piglets with diarrhea and two healthy pigs and compared them with published RVC sequences from a database. The VP7 genes of two Japanese RVCs from healthy pigs were highly divergent from other known RVC strains and were provisionally classified as G12 and G13 based on the 86% nucleotide identity cut-off value. Pairwise sequence identity calculations and phylogenetic analyses revealed candidate novel genotypes of Japanese porcine RVC in the NSP1-, NSP2- and NSP3-encoding genes. Furthermore, VP3 of Japanese porcine RVCs was shown to be closely related to human RVCs, suggesting a gene reassortment event between porcine and human RVCs and past interspecies transmission. The present study demonstrated that porcine RVCs show greater genetic diversity among strains than human and bovine RVCs. Copyright © 2016 Elsevier B.V. All rights reserved.
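A pairwise nucleotide identity calculation with a cut-off, as mentioned above, can be sketched as follows. This is an illustration only: it assumes the sequences are already aligned to equal length, skips gapped columns, and treats a sequence below the cut-off against every reference as a candidate new genotype. The exact decision rule used by the authors is not given in the abstract, and the sequences here are toy values.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_candidate_new_genotype(query, references, cutoff=86.0):
    """True if the query falls below the cut-off against every reference.

    The 86% default mirrors the VP7 cut-off cited in the abstract; the
    all-references rule is an assumption made for this sketch.
    """
    return all(percent_identity(query, ref) < cutoff for ref in references)


# Toy aligned sequences (hypothetical).
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
print(is_candidate_new_genotype("AAAAAAAAAA", ["AAAAAAAACC", "AAAAAAACCC"]))
```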
De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

PubMed Central

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species, is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only a few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight into this plant, high-throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single-end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of the transcriptome. A total of 101,141 assembled transcripts were obtained, with a coverage size of 22.42 Mb and an average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. A few genes of alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were found. We identified simple sequence repeat (SSR) motifs in transcripts, with an abundance of di-nucleotide SSR markers (43.1%). Large-scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, ESTs and unique sequences from this study provide an important resource for the scientific community interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
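The RPKM measure named in this abstract normalizes a transcript's read count by its length in kilobases and by the total number of mapped reads in millions. A minimal sketch of that arithmetic follows; the function name and the example counts are hypothetical and are not taken from the study.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per Kilobase per Million mapped reads.

    RPKM = reads / ((length_bp / 1e3) * (total_mapped_reads / 1e6))
         = reads * 1e9 / (length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp transcript,
# with 10 million mapped reads in the library.
print(rpkm(500, 2000, 10_000_000))
```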
Molecular Diagnosis of Usher Syndrome: Application of Two Different Next Generation Sequencing-Based Procedures

PubMed Central

Licastro, Danilo; Mutarelli, Margherita; Peluso, Ivana; Neveling, Kornelia; Wieskamp, Nienke; Rispoli, Rossella; Vozzi, Diego; Athanasakis, Emmanouil; D'Eustacchio, Angela; Pizzo, Mariateresa; D'Amico, Francesca; Ziviello, Carmela; Simonelli, Francesca; Fabretto, Antonella; Scheffer, Hans; Gasparini, Paolo; Banfi, Sandro; Nigro, Vincenzo

2012-01-01

Usher syndrome (USH) is a clinically and genetically heterogeneous disorder characterized by visual and hearing impairments. Clinically, it is subdivided into three subclasses with nine genes identified so far. In the present study, we investigated whether the currently available Next Generation Sequencing (NGS) technologies are already suitable for molecular diagnostics of USH. We analyzed a total of 12 patients, most of whom were negative for previously described mutations in known USH genes upon primer extension-based microarray genotyping. We enriched the NGS template either by whole exome capture or by Long-PCR of the known USH genes. The main NGS sequencing platforms were used: SOLiD for whole exome sequencing, Illumina (Genome Analyzer II) and Roche 454 (GS FLX) for the Long-PCR sequencing. Long-PCR targeting was more efficient, with up to 94% of USH gene regions displaying an overall coverage higher than 25×, whereas whole exome sequencing yielded a similar coverage for only 50% of those regions. Overall, this integrated analysis led to the identification of 11 novel sequence variations in USH genes (2 homozygous and 9 heterozygous) out of 18 detected. However, at least two cases were not genetically solved. Our result highlights the current limitations in the diagnostic use of NGS for USH patients. The limit for whole exome sequencing is linked to the need for strong coverage and for correct interpretation of sequence variations with a non-obvious pathogenic role, whereas the targeted approach suffers from the high genetic heterogeneity of USH, which may also be caused by the presence of additional causative genes yet to be identified. PMID:22952768
Genetic diversity of merozoite surface antigens in Babesia bovis detected from Sri Lankan cattle.

PubMed

Sivakumar, Thillaiampalam; Okubo, Kazuhiro; Igarashi, Ikuo; de Silva, Weligodage Kumarawansa; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Vimalakumar, Singarayar Caniciyas; Meewewa, Asela Sanjeewa; Yokoyama, Naoaki

2013-10-01

Babesia bovis, the causative agent of severe bovine babesiosis, is endemic in Sri Lanka. The live attenuated vaccine (K-strain), which was introduced in the early 1990s, has been used to immunize cattle populations in endemic areas of the country. The present study was undertaken to determine the genetic diversity of merozoite surface antigens (MSAs) in B. bovis isolates from Sri Lankan cattle, and to compare the gene sequences obtained from such isolates against those of the K-strain. Forty-four bovine blood samples isolated from different geographical regions of Sri Lanka and judged to be B. bovis-positive by PCR screening were used to amplify MSAs (MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b), AMA-1, and 12D3 genes from parasite DNA. Although the AMA-1 and 12D3 gene sequences were highly conserved among the Sri Lankan isolates, the MSA gene sequences from the same isolates were highly diverse. Sri Lankan MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b sequences clustered within 5, 2, 4, 1, and 9 different clades in the gene phylograms, respectively, while the minimum similarity values among the deduced amino acid sequences of these genes were 36.8%, 68.7%, 80.3%, 100%, and 68.3%, respectively. In the phylograms, none of the Sri Lankan sequences fell within clades containing the respective K-strain sequences. Additionally, the similarity values for MSA-1 and MSA-2c were 40-61.8% and 90.9-93.2% between the Sri Lankan isolates and the K-strain, respectively, while the K-strain MSA-2a/b sequence shared 64.5-69.8%, 69.3%, and 70.5-80.3% similarities with the Sri Lankan MSA-2a1, MSA-2a2, and MSA-2b sequences, respectively. The present study has shown that genetic diversity among MSAs of Sri Lankan B. bovis isolates is very high, and that the sequences of field isolates diverged genetically from the K-strain. Copyright © 2013 Elsevier B.V. All rights reserved.
Rapid Gene Turnover as a Significant Source of Genetic Variation in a Recently Seeded Population of a Healthcare-Associated Pathogen

PubMed Central

Graña-Miraglia, Lucía; Lozano, Luis F.; Velázquez, Consuelo; Volkow-Fernández, Patricia; Pérez-Oseguera, Ángeles; Cevallos, Miguel A.; Castillo-Ramírez, Santiago

2017-01-01

Genome sequencing has been useful to gain an understanding of bacterial evolution. It has been used for studying the phylogeography and/or the impact of mutation and recombination on bacterial populations. However, it has rarely been used to study gene turnover at microevolutionary scales. Here, we sequenced Mexican strains of the human pathogen Acinetobacter baumannii sampled from the same locale over a 3 year period to obtain insights into the microevolutionary dynamics of gene content variability. We found that the Mexican A. baumannii population was recently founded and has been emerging due to a rapid clonal expansion. Furthermore, we noticed that on average the Mexican strains differed from each other by over 300 genes and, notably, this gene content variation has accrued more frequently and faster than the accumulation of mutations. Moreover, due to its rapid pace, gene content variation reflects the phylogeny only at very short periods of time. Additionally, we found that the external branches of the phylogeny had almost 100 more genes than the internal branches. All in all, these results show that rapid gene turnover has been of paramount importance in producing genetic variation within this population and demonstrate the utility of genome sequencing to study alternative forms of genetic variation. PMID:28979253
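One simple way to express how many genes two strains "differ by" is the size of the symmetric difference between their gene sets, averaged over all strain pairs. The sketch below assumes that definition for illustration; the abstract does not state how the authors computed their figure, and the strain and gene names are hypothetical.

```python
from itertools import combinations


def mean_pairwise_gene_difference(gene_sets):
    """Average number of genes present in exactly one strain of each pair.

    `gene_sets` maps a strain name to the set of gene identifiers it carries.
    Using the symmetric difference is an assumption made for this sketch.
    """
    pairs = list(combinations(gene_sets.values(), 2))
    if not pairs:
        raise ValueError("at least two strains are required")
    return sum(len(a ^ b) for a, b in pairs) / len(pairs)


# Hypothetical gene content for three strains.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneE"},
}
print(mean_pairwise_gene_difference(toy_strains))
```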

Preparing and Analyzing Expressed Sequence Tags (ESTs) Library for the Mammary Tissue of Local Turkish Kivircik Sheep

PubMed Central

Omeroglu Ulu, Zehra; Ulu, Salih; Un, Cemal; Ozdem Oztabak, Kemal; Altunatmaz, Kemal

2017-01-01

The Kivircik is an important local Turkish sheep breed, valued for its meat quality and milk productivity. The aim of this study was to analyze gene expression profiles at both the prenatal and postnatal stages in Kivircik sheep. To this end, two cDNA libraries were constructed from mammary gland tissue of the same Kivircik sheep at the prenatal and postnatal stages. A total of 3072 colonies, randomly selected from the two libraries, were sequenced to develop a sheep EST collection. We used the Phred/Phrap programs to analyze the raw ESTs, and readable EST sequences were assembled with the CAP3 software. Putative functions of all unique sequences were assigned, and statistical analyses performed, with the Geneious software. A total of 422 ESTs showed over 80% similarity to known sequences of other organisms in NCBI and were classified by Gene Ontology (GO) category using the Panther database. By comparing gene expression profiles, we observed some putative genes that may be related to reproductive performance or play important roles in milk synthesis and secretion. A total of 2414 ESTs have been deposited in the NCBI GenBank database (GW996847–GW999260). The EST data from this study provide a new source of information for functional genome studies of sheep. PMID:28239610
Novel sequence variants in the TMIE gene in families with autosomal recessive nonsyndromic hearing impairment

PubMed Central

Santos, Regie Lyn P.; El-Shanti, Hatem; Sikandar, Shaheen; Lee, Kwanghyuk; Bhatti, Attya; Yan, Kai; Chahrour, Maria H.; McArthur, Nathan; Pham, Thanh L.; Mahasneh, Amjad Abdullah; Ahmad, Wasim

2010-01-01

To date, 37 genes have been identified for nonsyndromic hearing impairment (NSHI). Identifying the functional sequence variants within these genes and knowing their population-specific frequencies is of public health value, in particular for genetic screening for NSHI. To determine putatively functional sequence variants in the transmembrane inner ear (TMIE) gene in Pakistani and Jordanian families with autosomal recessive (AR) NSHI, four Jordanian and 168 Pakistani families with ARNSHI that is not due to GJB2 (CX26) were submitted to a genome scan. Two-point and multipoint parametric linkage analyses were performed, and families with logarithm of odds (LOD) scores of 1.0 or greater within the TMIE region underwent further DNA sequencing. The evolutionary conservation and location in predicted protein domains of amino acid residues where sequence variants occurred were studied to elucidate the possible effects of these sequence variants on function. Of seven families that were screened for TMIE, putatively functional sequence variants were found to segregate with hearing impairment in four families but were not seen in at least 110 ethnically matched control chromosomes. The previously reported c.241C>T (p.R81C) variant was observed in two Pakistani families. Two novel variants, c.92A>G (p.E31G) and the splice site mutation c.212–2A>C, were identified in one Pakistani and one Jordanian family, respectively. The c.92A>G (p.E31G) variant occurred at a residue that is conserved in the mouse and is predicted to be extracellular. Conservation and potential functionality of previously published mutations were also examined. The prevalence of functional TMIE variants in Pakistani families is 1.7% [95% confidence interval (CI) 0.3–4.8]. Further studies on the spectrum, prevalence rates, and functional effect of sequence variants in the TMIE gene in other populations should demonstrate the true importance of this gene as a cause of hearing impairment. PMID:16389551
Complete mitochondrial DNA sequence of oyster Crassostrea hongkongensis-a case of "Tandem duplication-random loss" for genome rearrangement in Crassostrea?

PubMed Central

Yu, Ziniu; Wei, Zhengpeng; Kong, Xiaoyu; Shi, Wei

2008-01-01

Background: Mitochondrial DNA sequences are extensively used as genetic markers not only for studies of population or ecological genetics, but also for phylogenetic and evolutionary analyses. Complete mt-sequences can reveal information about gene order and its variation, as well as gene and genome evolution when sequences from multiple phyla are compared. Mitochondrial gene order is highly variable among mollusks, with bivalves exhibiting the most variability. Of the 41 complete mt genomes sequenced so far, 12 are from bivalves. We determined, in the current study, the complete mitochondrial DNA sequence of Crassostrea hongkongensis. We present here an analysis of features of its gene content and genome organization in comparison with two other Crassostrea species to assess the variation within bivalves and among main groups of mollusks. Results: The complete mitochondrial genome of C. hongkongensis was determined using long PCR and a primer walking sequencing strategy with genus-specific primers. The genome is 16,475 bp in length and contains 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 22 transfer RNA (tRNA) genes (including a suppressor tRNA gene), and 2 ribosomal RNA genes, all of which appear to be transcribed from the same strand. A striking finding of this study is that a DNA segment containing four tRNA genes (trnK1, trnC, trnQ1 and trnN) and two duplicated or split rRNA genes (rrnL5' and rrnS) is absent from the genome when compared with those of two other extant Crassostrea species, very likely as a consequence of the loss of a single genomic region present in the ancestor of C. hongkongensis. This indicates that the region seems to be a "hot spot" of genomic rearrangements in Crassostrea mt-genomes. The arrangement of protein-coding genes in C. hongkongensis is identical to that of Crassostrea gigas and Crassostrea virginica, but higher amino acid sequence identities are shared between C. hongkongensis and C. gigas than between other pairs. There is significant codon bias, favoring codons ending in A or T and disfavoring those ending in C. Pairwise analysis of genome rearrangements showed that the rearrangement distance is great between C. gigas-C. hongkongensis and C. virginica, indicating a high degree of rearrangement within Crassostrea. The determination of the complete mt-genome of C. hongkongensis has yielded useful insight into features of gene order, variation, and evolution of Crassostrea and bivalve mt-genomes. Conclusion: The mt-genome of C. hongkongensis shares some similarity with, and interesting differences to, other Crassostrea species and bivalves. The absence of trnC and trnN genes and duplicated or split rRNA genes from the C. hongkongensis genome is a completely novel feature not previously reported in Crassostrea species. The phenomenon is likely due to the loss of a segment that is present in other Crassostrea species and was present in the ancestor of C. hongkongensis, and is thus a case of "tandem duplication-random loss (TDRL)". The mt-genome and new feature presented here reveal and underline the high level of variation in gene order and gene content in Crassostrea and bivalves, inspiring more research into the mechanisms underlying gene and genome evolution in bivalves and mollusks. PMID:18847502
Evaluation of the cationic trypsinogen gene for potential mutations in miniature schnauzers with pancreatitis

PubMed Central

2004-01-01

The purpose of this study was to evaluate the cationic trypsinogen gene in miniature schnauzers for possible mutations. Genetic mutations have been linked with hereditary pancreatitis in humans. Four miniature schnauzers were selected on the basis of a clinical history of pancreatitis. One healthy miniature schnauzer and 1 healthy mixed breed canine were enrolled as controls. DNA was extracted from these canines using a commercial kit. Primers were designed to amplify the entire canine cationic trypsinogen cDNA sequence. A polymerase chain reaction (PCR) was performed and products were purified and sequenced. All sequences were then compared. The healthy control canine, a healthy miniature schnauzer, and the 4 miniature schnauzers with pancreatitis showed identical sequences of the cationic trypsinogen gene to the published sequence. We conclude that, in contrast to humans with hereditary pancreatitis, mutations of the cationic trypsinogen gene do not play a major role in the genesis of pancreatitis in the miniature schnauzer. PMID:15581228
Evaluation of the cationic trypsinogen gene for potential mutations in miniature schnauzers with pancreatitis.

PubMed

Bishop, Micah A; Steiner, Jörg M; Moore, Lisa E; Williams, David A

2004-10-01

The purpose of this study was to evaluate the cationic trypsinogen gene in miniature schnauzers for possible mutations. Genetic mutations have been linked with hereditary pancreatitis in humans. Four miniature schnauzers were selected on the basis of a clinical history of pancreatitis. One healthy miniature schnauzer and 1 healthy mixed breed canine were enrolled as controls. DNA was extracted from these canines using a commercial kit. Primers were designed to amplify the entire canine cationic trypsinogen cDNA sequence. A polymerase chain reaction (PCR) was performed and products were purified and sequenced. All sequences were then compared. The healthy control canine, a healthy miniature schnauzer, and the 4 miniature schnauzers with pancreatitis showed identical sequences of the cationic trypsinogen gene to the published sequence. We conclude that, in contrast to humans with hereditary pancreatitis, mutations of the cationic trypsinogen gene do not play a major role in the genesis of pancreatitis in the miniature schnauzer.
Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

PubMed Central

Goldsmith, Dawn B.; Parsons, Rachel J.; Beyene, Damitu; Salamon, Peter

2015-01-01

Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years. PMID:26157645
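The grouping of sequences into OTUs at 97% identity mentioned above can be illustrated with a minimal sketch. This is not the pipeline used by the authors, which the abstract does not describe. It assumes the sequences are already aligned to equal length, and it uses a simple greedy, first-match centroid rule. The function names `identity` and `greedy_otus` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first OTU whose centroid it matches at
    or above `threshold`; otherwise start a new OTU with it as centroid."""
    centroids = []  # one representative sequence per OTU
    otus = []       # list of member lists, parallel to `centroids`
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                otus[i].append(s)
                break
        else:
            # No existing centroid was close enough: open a new OTU.
            centroids.append(s)
            otus.append([s])
    return otus
```

Sorting the resulting OTUs by size would show the kind of pattern the abstract reports: a few large, common OTUs and many small, rare ones. Greedy clustering depends on input order, so dedicated clustering tools typically sort sequences by abundance or length first.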
RapGene: a fast and accurate strategy for synthetic gene assembly in Escherichia coli

PubMed Central

Zampini, Massimiliano; Stevens, Pauline Rees; Pachebat, Justin A.; Kingston-Smith, Alison; Mur, Luis A. J.; Hayes, Finbarr

2015-01-01

The ability to assemble DNA sequences de novo through efficient and powerful DNA fabrication methods is one of the foundational technologies of synthetic biology. Gene synthesis, in particular, has been considered the main driver for the emergence of this new scientific discipline. Here we describe RapGene, a rapid gene assembly technique which was successfully tested for the synthesis and cloning of both prokaryotic and eukaryotic genes through a ligation-independent approach. The method developed in this study is a complete bacterial gene synthesis platform for the quick, accurate and cost-effective fabrication and cloning of gene-length sequences that employs the widely used host Escherichia coli. PMID:26062748
Combined sequence and sequence-structure-based methods for analyzing RAAS gene SNPs: a computational approach.

PubMed

Singh, Kh Dhanachandra; Karthikeyan, Muthusamy

2014-12-01

The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS play a significant role in genetic susceptibility to hypertension and have been intensively scrutinized. The identification of such probably causal mutations not only provides insight into the RAAS but may also yield antihypertensive therapeutic targets and diagnostic markers. Analyzing a huge dataset of SNPs that contains both functional and neutral SNPs is challenging, because testing every SNP experimentally to determine its biological significance is impractical. To explore the functional significance of genetic mutations (SNPs), we adopted a combined sequence and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region and the remainder in the non-coding region. In this study, we report a coding-region SNP as deleterious only when three or more tools predict it to be deleterious and it shows a high RMSD from the native structure. Based on these analyses, two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene in the coding region were found to be deleterious. This type of study should help reduce the cost and time needed to identify potential SNPs, and help in selecting candidate SNPs from the SNP pool for experimental study.
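The selection rule described in this abstract (flag a coding SNP only when at least three prediction tools agree and the mutant structure deviates strongly from the native one) can be sketched as a simple filter. The abstract names neither the tools nor the RMSD cutoff, so the tool labels, the default cutoff of 2.0 Å, and the names `SnpCall`, `is_deleterious` and `filter_deleterious` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SnpCall:
    snp_id: str                   # placeholder identifier, not a real dbSNP ID
    predictions: Dict[str, bool]  # tool name -> True if predicted deleterious
    rmsd: float                   # RMSD of mutant model vs. native structure


def is_deleterious(call: SnpCall, min_tools: int = 3, rmsd_cutoff: float = 2.0) -> bool:
    """Consensus rule: enough tools agree AND the structural deviation is high.
    Both thresholds are assumptions; the abstract gives only 'three or more'."""
    votes = sum(call.predictions.values())
    return votes >= min_tools and call.rmsd >= rmsd_cutoff


def filter_deleterious(calls: List[SnpCall], min_tools: int = 3,
                       rmsd_cutoff: float = 2.0) -> List[str]:
    """Return the IDs of SNPs that satisfy the consensus rule."""
    return [c.snp_id for c in calls if is_deleterious(c, min_tools, rmsd_cutoff)]
```

Requiring agreement from both sequence-based votes and a structural criterion trades sensitivity for specificity, which suits the stated goal of shortlisting a small number of SNPs for experimental follow-up.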
Novel compound heterozygous mutations in MYO7A gene associated with autosomal recessive sensorineural hearing loss in a Chinese family.

PubMed

Ma, Yalin; Xiao, Yun; Zhang, Fengguo; Han, Yuechen; Li, Jianfeng; Xu, Lei; Bai, Xiaohui; Wang, Haibo

2016-04-01

Mutations in the MYO7A gene have been reported to be associated with Usher syndrome type 1B (USH1B) and nonsyndromic hearing loss (DFNB2, DFNA11). Most mutations in MYO7A cause USH1B, whereas only a few reported mutations lead to DFNB2 and DFNA11. The current study was designed to investigate the mutations in a Chinese family with autosomal recessive hearing loss. We present the clinical, genetic and molecular characteristics of this family. Targeted capture of 127 known deafness genes and next-generation sequencing were employed to study the genetic causes of hearing loss in two siblings in the family. Sanger sequencing was employed to examine the candidate variants in the members of this family and in ethnicity-matched controls. We identified novel compound heterozygous mutant alleles of the MYO7A gene: a novel missense mutation, c.3671C>A (p.A1224D), and a previously reported insertion mutation, c.390_391insC (p.P131PfsX9). The variants were further confirmed by Sanger sequencing. These two compound heterozygous variants co-segregated with the autosomal recessive hearing loss phenotype. Gene mutation analysis and protein sequence alignment further supported the pathogenicity of the novel compound heterozygous mutations. The novel compound heterozygous mutations (c.3671C>A and c.390_391insC) in the MYO7A gene identified in this study were responsible for the autosomal recessive sensorineural hearing loss in this Chinese family. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
First Transcriptome and Digital Gene Expression Analysis in Neuroptera with an Emphasis on Chemoreception Genes in Chrysopa pallens (Rambur)

PubMed Central

Li, Zhao-Qun; Zhang, Shuai; Ma, Yan; Luo, Jun-Yu; Wang, Chun-Yi; Lv, Li-Min; Dong, Shuang-Lin; Cui, Jin-Jie

2013-01-01

Background: Chrysopa pallens (Rambur) are the most important natural enemies and predators of various agricultural pests. Understanding the sophisticated olfactory system in insect antennae is crucial for studying the physiological bases of olfaction and also could lead to effective applications of C. pallens in integrated pest management. However, no transcriptome information is available for Neuroptera, and sequence data for C. pallens are scarce, so obtaining more sequence data is a priority for researchers on this species. Results: To facilitate identifying sets of genes involved in olfaction, a normalized transcriptome of C. pallens was sequenced. A total of 104,603 contigs were obtained and assembled into 10,662 clusters and 39,734 singletons; 20,524 were annotated based on BLASTX analyses. A large number of candidate chemosensory genes were identified, including 14 odorant-binding proteins (OBPs), 22 chemosensory proteins (CSPs), 16 ionotropic receptors, 14 odorant receptors, and genes potentially involved in olfactory modulation. To better understand the OBPs, CSPs and cytochrome P450s, phylogenetic trees were constructed. In addition, 10 digital gene expression libraries of different tissues were constructed and gene expression profiles were compared among different tissues in males and females. Conclusions: Our results provide a basis for exploring the mechanisms of chemoreception in C. pallens, as well as other insects. The evolutionary analyses in our study provide new insights into the differentiation and evolution of insect OBPs and CSPs. Our study provided large-scale sequence information for further studies in C. pallens. PMID:23826220
Identification of Alternative Splicing and Fusion Transcripts in Non-Small Cell Lung Cancer by RNA Sequencing.

PubMed

Hong, Yoonki; Kim, Woo Jin; Bang, Chi Young; Lee, Jae Cheol; Oh, Yeon-Mok

2016-04-01

Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression have an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA-sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using deFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript sequencing and validated using quantitative real-time polymerase chain reaction. RNA-seq data identified the oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using deFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes showing significant tumor-specific splice variants. In conclusion, RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.
Interspecific and intraspecific gene variability in a 1-Mb region containing the highest density of NBS-LRR genes found in the melon genome.

PubMed

González, Víctor M; Aventín, Núria; Centeno, Emilio; Puigdomènech, Pere

2014-12-17

Plant NBS-LRR resistance genes tend to be found in clusters, which have been shown to be hot spots of genome variability. In melon, half of the 81 predicted NBS-LRR genes group in nine clusters, and a 1 Mb region on linkage group V contains the highest density of R-genes and presence/absence gene polymorphisms found in the melon genome. This region is known to contain the locus of Vat, an agronomically important gene that confers resistance to aphids. However, the presence of duplications makes the sequencing and annotation of R-gene clusters difficult, usually resulting in multi-gapped sequences with higher than average errors. A 1-Mb sequence that contains the largest NBS-LRR gene cluster found in melon was improved using a strategy that combines Illumina paired-end mapping and PCR-based gap closing. Unknown sequence was decreased by 70% while about 3,000 SNPs and small indels were corrected. As a result, the annotations of 18 of a total of 23 NBS-LRR genes found in this region were modified, including additional coding sequences, amino acid changes, correction of splicing boundaries, or fusion of ORFs in common transcription units. A phylogeny analysis of the R-genes and their comparison with syntenic sequences in other cucurbits point to a pattern of local gene amplifications since the diversification of cucurbits from other families, and through speciation within the family. A candidate Vat gene is proposed based on the sequence similarity between a reported Vat gene from a Korean melon cultivar and a sequence fragment previously absent in the unrefined sequence. A sequence refinement strategy allowed substantial improvement of a 1 Mb fragment of the melon genome and the re-annotation of the largest cluster of NBS-LRR gene homologues found in melon. Analysis of the cluster revealed that resistance genes have been produced by sequence duplication in adjacent genome locations since the divergence of cucurbits from other close families, and through the process of speciation within the family. A candidate Vat gene was also identified using sequence previously unavailable, which demonstrates the advantages of genome assembly refinements when analyzing complex regions such as those containing clusters of highly similar genes.
Genomic resources for songbird research and their use in characterizing gene expression during brain development

PubMed Central

Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry

2007-01-01

Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146
Function-Based Algorithms for Biological Sequences

ERIC Educational Resources Information Center

Mohanty, Pragyan Sheela P.

2015-01-01

Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…
Deep sequencing with intronic capture enables identification of an APC exon 10 inversion in a patient with polyposis.

PubMed

Shirts, Brian H; Salipante, Stephen J; Casadei, Silvia; Ryan, Shawnia; Martin, Judith; Jacobson, Angela; Vlaskin, Tatyana; Koehler, Karen; Livingston, Robert J; King, Mary-Claire; Walsh, Tom; Pritchard, Colin C

2014-10-01

Single-exon inversions have rarely been described in clinical syndromes and are challenging to detect using Sanger sequencing. We report the case of a 40-year-old woman with adenomatous colon polyps too numerous to count and who had a complex inversion spanning the entire exon 10 in APC (the gene encoding adenomatous polyposis coli), causing exon skipping and resulting in a frameshift and premature protein truncation. In this study, we employed complete APC gene sequencing using high-coverage next-generation sequencing by ColoSeq, analysis with BreakDancer and SLOPE software, and confirmatory transcript analysis. ColoSeq identified a complex small genomic rearrangement consisting of an inversion that results in translational skipping of exon 10 in the APC gene. This mutation would not have been detected by traditional sequencing or gene-dosage methods. We report a case of adenomatous polyposis resulting from a complex single-exon inversion. Our report highlights the benefits of large-scale sequencing methods that capture intronic sequences with sufficiently high depth of coverage, together with the use of informatics tools, to enable detection of small pathogenic structural rearrangements.
Alt a 1 allergen homologs from Alternaria and related taxa: analysis of phylogenetic content and secondary structure.

PubMed

Hong, Soon Gyu; Cramer, Robert A; Lawrence, Christopher B; Pryor, Barry M

2005-02-01

A gene for the Alternaria major allergen, Alt a 1, was amplified from 52 species of Alternaria and related genera, and sequence information was used for phylogenetic study. Alt a 1 gene sequences evolved 3.8 times faster and contained 3.5 times more parsimony-informative sites than glyceraldehyde-3-phosphate dehydrogenase (gpd) sequences. Analyses of Alt a 1 gene and gpd exon sequences strongly supported grouping of Alternaria spp. and related taxa into several species-groups described in previous studies, especially the infectoria, alternata, porri, brassicicola, and radicina species-groups and the Embellisia group. The sonchi species-group was newly suggested in this study. Monophyly of the Nimbya group was moderately supported, and monophyly of the Ulocladium group was weakly supported. Relationships among species-groups and among closely related species of the same species-group were not fully resolved. However, higher resolution could be obtained using Alt a 1 sequences or a combined dataset than using gpd sequences alone. Despite high levels of variation in amino acid sequences, results of in silico prediction of protein secondary structure for Alt a 1 demonstrated a high degree of structural similarity for most of the species suggesting a conservation of function.
Genome-wide association analysis of milk yield traits in Nordic Red Cattle using imputed whole genome sequence variants.

PubMed

Iso-Touru, T; Sahana, G; Guldbrandtsen, B; Lund, M S; Vilkki, J

2016-03-22

The Nordic Red Cattle, consisting of three different populations from Finland, Sweden and Denmark, are under a joint breeding value estimation system. The long history of recording production and health traits offers a great opportunity to study production traits and identify the causal variants behind them. In this study, we used whole genome sequence level data from 4280 progeny-tested Nordic Red Cattle bulls to scan the genome for loci affecting milk, fat and protein yields. Using a genome-wide significance threshold, regions on Bos taurus chromosomes 5, 14, 23, 25 and 26 were associated with fat yield. Regions on chromosomes 5, 14, 16, 19, 20 and 25 were associated with milk yield, and chromosomes 5, 14 and 25 had regions associated with protein yield. Significantly associated variations were found in 227 genes for fat yield, 72 genes for milk yield and 30 genes for protein yield. Ingenuity Pathway Analysis was used to identify networks connecting the genes displaying significant hits. When compared to previously mapped genomic regions associated with fertility, significantly associated variations were found in 5 genes common to fat yield and fertility, thus linking these two traits via biological networks. This is the first time that whole genome sequence data have been used to study genomic regions affecting milk production in the Nordic Red Cattle population. Sequence level data offer the possibility to study quantitative traits in detail but still cannot unambiguously reveal which of the associated variations is causative. Linkage disequilibrium makes it difficult to pinpoint the causative genes and variations. One way to overcome these difficulties is to identify the functional gene networks and pathways, revealing important interacting genes as candidates for the observed effects. This information on target genomic regions may be exploited to improve genomic prediction.
Generation and analysis of expression sequence tags from haustoria of the wheat stripe rust fungus Puccinia striiformis f. sp. tritici

PubMed Central

2009-01-01

Background: Stripe rust, caused by Puccinia striiformis f. sp. tritici (Pst), is one of the most destructive diseases of wheat (Triticum aestivum L.) worldwide. In spite of its agricultural importance, the genomics and genetics of the pathogen are poorly characterized. Pst transcripts from urediniospores and germinated urediniospores have been examined previously, but little is known about genes expressed during host infection. Some genes involved in virulence in other rust fungi have been found to be specifically expressed in haustoria. Therefore, the objective of this study was to generate a cDNA library to characterize genes expressed in haustoria of Pst. Results: A total of 5,126 EST sequences of high quality were generated from haustoria of Pst, from which 287 contigs and 847 singletons were derived. Approximately 10% and 26% of the 1,134 unique sequences were homologous to proteins with known functions and hypothetical proteins, respectively. The remaining 64% of the unique sequences had no significant similarities in GenBank. Fifteen genes were predicted to be proteins secreted from Pst haustoria. Analysis of ten genes, including six secreted protein genes, using quantitative RT-PCR revealed changes in transcript levels in different developmental and infection stages of the pathogen. Conclusions: The haustorial cDNA library was useful in identifying genes of the stripe rust fungus expressed during the infection process. From the library, we identified 15 genes encoding putative secreted proteins and six genes induced during the infection process. These genes are candidates for further studies to determine their functions in wheat-Pst interactions. PMID:20028560
The complete mitochondrial genome of Sika deer Cervus nippon hortulorum (Artiodactyla: Cervidae) and phylogenetic studies.

PubMed

Liu, Yan-Hua; Liu, Xin-Xin; Zhang, Ming-Hai

2016-07-01

Sika deer (Cervus nippon Temminck 1836) are classified in the order Artiodactyla, family Cervidae, subfamily Cervinae. At present, the phylogenetic studies of C. nippon are problematic. In this study, we first determined and described the complete mitochondrial sequence of the wild C. nippon hortulorum. The complete mitogenome sequence is 16 566 bp in length, including 13 protein-coding genes, two rRNA genes, 22 tRNA genes, a putative control region (CR) and a light-strand replication origin (OL). The overall base composition was 33.4% A, 28.6% T, 24.5% C, 13.5% G, with a 62.0% AT bias. The 13 protein-coding genes encode 3782 amino acids in total. To further validate the newly determined sequence and the phylogeny of Sika deer, phylogenetic trees involving the 15 most closely related species available in the GenBank database were constructed. These results are expected to provide useful molecular data for deer species identification and further phylogenetic studies of Artiodactyla.
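
The base-composition figures in the record above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the authors' method: the function names and the toy sequence are hypothetical, and only the four reported percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, C and G in seq (other symbols are ignored)."""
    counts = Counter(base for base in seq.upper() if base in "ATCG")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/T/C/G bases in sequence")
    return {base: 100.0 * counts[base] / total for base in "ATCG"}


def at_content(composition):
    """AT content is the sum of the A and T percentages, rounded to one decimal."""
    return round(composition["A"] + composition["T"], 1)


# Toy sequence, for illustration only (not real mitogenome data).
toy_composition = base_composition("AATTCGCG")

# Percentages as reported in the abstract.
reported = {"A": 33.4, "T": 28.6, "C": 24.5, "G": 13.5}
reported_at = at_content(reported)
```

Under these assumptions, the reported A and T percentages sum to the stated AT content, and the four percentages sum to 100%.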

Annotation of differentially expressed genes in the somatic embryogenesis of Musa and their location in the banana genome.

PubMed

Maldonado-Borges, Josefina Ines; Ku-Cauich, José Roberto; Escobedo-Graciamedrano, Rosa Maria

2013-01-01

Analysis of cDNA-AFLP was used to study the genes expressed in zygotic and somatic embryogenesis of Musa acuminata Colla ssp. malaccensis, and a comparison was made between their differentially expressed transcript-derived fragments (TDFs) and the sequenced genome of the double haploid- (DH-) Pahang of the malaccensis subspecies, which is available online. A total of 253 TDFs with an apparent size of 100-4000 bp were detected using 5 pairs of AFLP primers, of which 21 were differentially expressed during the different stages of banana embryogenesis; 15 of the sequences matched DH-Pahang chromosomes, with 7 of them being homologous to gene sequences encoding either known or putative protein domains of higher plants. Four TDF sequences were located in all Musa chromosomes, while the rest were located in one or two chromosomes. Their putative individual function is briefly reviewed based on published information, and the potential roles of these genes in embryo development are discussed. Thus the availability of the Musa genome and the TDF sequence information presented here open new possibilities for in-depth molecular and biochemical study of zygotic and somatic embryogenesis of Musa.
Structure-Function Analysis of Chloroplast Proteins via Random Mutagenesis Using Error-Prone PCR.

PubMed

Dumas, Louis; Zito, Francesca; Auroy, Pascaline; Johnson, Xenie; Peltier, Gilles; Alric, Jean

2018-06-01

Site-directed mutagenesis of chloroplast genes was developed three decades ago and has greatly advanced the field of photosynthesis research. Here, we describe a new approach for generating random chloroplast gene mutants that combines error-prone polymerase chain reaction of a gene of interest with chloroplast complementation of the knockout Chlamydomonas reinhardtii mutant. As a proof of concept, we targeted a 300-bp sequence of the petD gene that encodes subunit IV of the thylakoid membrane-bound cytochrome b6f complex. By sequencing chloroplast transformants, we revealed 149 mutations in the 300-bp target petD sequence that resulted in 92 amino acid substitutions in the 100-residue target subunit IV sequence. Our results show that this method is suited to the study of highly hydrophobic, multisubunit, and chloroplast-encoded proteins containing cofactors such as hemes, iron-sulfur clusters, and chlorophyll pigments. Moreover, we show that mutant screening and sequencing can be used to study photosynthetic mechanisms or to probe the mutational robustness of chloroplast-encoded proteins, and we propose that this method is a valuable tool for the directed evolution of enzymes in the chloroplast. © 2018 American Society of Plant Biologists. All rights reserved.
Microbial Characterization of Qatari Barchan Sand Dunes

PubMed Central

Chatziefthimiou, Aspassia D.; Nguyen, Hanh; Richer, Renee; Louge, Michel; Sultan, Ali A.; Schloss, Patrick; Hay, Anthony G.

2016-01-01

This study represents the first characterization of sand microbiota in migrating barchan sand dunes. Bacterial communities were studied through direct counts and cultivation, as well as 16S rRNA gene and metagenomic sequence analysis to gain an understanding of microbial abundance, diversity, and potential metabolic capabilities. Direct on-grain cell counts gave an average of 5.3 ± 0.4 × 10^5 cells g^-1 of sand. Cultured isolates (N = 64) selected for 16S rRNA gene sequencing belonged to the phyla Actinobacteria (58%), Firmicutes (27%) and Proteobacteria (15%). Deep-sequencing of 16S rRNA gene amplicons from 18 dunes demonstrated a high relative abundance of Proteobacteria, particularly enteric bacteria, and a dune-specific pattern of bacterial community composition that correlated with dune size. Shotgun metagenome sequences of two representative dunes were analyzed and found to have similar relative bacterial abundance, though the relative abundances of eukaryotic, viral and enterobacterial sequences were greater in sand from the dune closer to a camel-pen. Functional analysis revealed patterns similar to those observed in desert soils; however, the increased relative abundance of genes encoding sporulation and dormancy are consistent with the dune microbiome being well-adapted to the exceptionally hyper-arid Qatari desert. PMID:27655399
GeneChip Resequencing of the Smallpox Virus Genome Can Identify Novel Strains: a Biodefense Application

PubMed Central

Sulaiman, Irshad M.; Tang, Kevin; Osborne, John; Sammons, Scott; Wohlhueter, Robert M.

2007-01-01

We developed a set of seven resequencing GeneChips, based on the complete genome sequences of 24 strains of smallpox virus (variola virus), for rapid characterization of this human-pathogenic virus. Each GeneChip was designed to analyze a divergent segment of approximately 30,000 bases of the smallpox virus genome. This study includes the hybridization results of 14 smallpox virus strains. Of the 14 smallpox virus strains hybridized, only 7 had sequence information included in the design of the smallpox virus resequencing GeneChips; similar information for the remaining strains was not tiled as a reference in these GeneChips. By use of variola virus-specific primers and long-range PCR, 22 overlapping amplicons were amplified to cover nearly the complete genome and hybridized with the smallpox virus resequencing GeneChip set. These GeneChips were successful in generating nucleotide sequences for all 14 of the smallpox virus strains hybridized. Analysis of the data indicated that the GeneChip resequencing by hybridization was fast and reproducible and that the smallpox virus resequencing GeneChips could differentiate the 14 smallpox virus strains characterized. This study also suggests that high-density resequencing GeneChips have potential biodefense applications and may be used as an alternate tool for rapid identification of smallpox virus in the future. PMID:17182757
Delimiting regulatory sequences of the Drosophila melanogaster Ddc gene.

PubMed Central

Hirsh, J; Morgan, B A; Scholnick, S B

1986-01-01

We delimited sequences necessary for in vivo expression of the Drosophila melanogaster dopa decarboxylase gene Ddc. The expression of in vitro-altered genes was assayed following germ line integration via P-element vectors. Sequences between -209 and -24 were necessary for normally regulated expression, although genes lacking these sequences could be expressed at 10 to 50% of wild-type levels at specific developmental times. These genes showed components of normal developmental expression, which suggests that they retain some regulatory elements. All Ddc genes lacking the normal immediate 5'-flanking sequences were grossly deficient in larval central nervous system expression. Thus, this upstream region must contain at least one element necessary for this expression. A mutated Ddc gene without a normal TATA box-like sequence used the normal RNA start points, indicating that this sequence is not required for start point specificity. PMID:3099170
Characterization of epidemic IncI1-Iγ plasmids harboring ambler class A and C genes in Escherichia coli and Salmonella enterica from animals and humans.

PubMed

Smith, Hilde; Bossers, Alex; Harders, Frank; Wu, Guanghui; Woodford, Neil; Schwarz, Stefan; Guerra, Beatriz; Rodríguez, Irene; van Essen-Zandbergen, Alieda; Brouwer, Michael; Mevius, Dik

2015-09-01

The aim of the study was to identify the plasmid-encoded factors contributing to the emergence and spread of epidemic IncI1-Iγ plasmids obtained from Escherichia coli and Salmonella enterica isolates from animal and human reservoirs. For this, 251 IncI1-Iγ plasmids carrying various extended-spectrum β-lactamase (ESBL) or AmpC β-lactamase genes were compared using plasmid multilocus sequence typing (pMLST). Thirty-two of these plasmids belonging to different pMLST types were sequenced using Roche 454 and Illumina platforms. Epidemic IncI1-Iγ plasmids could be assigned to various dominant clades, whereas rarely detected plasmids clustered together as a distinct clade. Similar phylogenetic trees were obtained using only the plasmid backbone sequences, showing that the differences observed between the plasmids belonging to distinct clades resulted mainly from differences between their backbone sequences. Plasmids belonging to the various clades differed particularly in the presence/absence of genes encoding partitioning and addiction systems, which contribute to stable inheritance during cell division and plasmid maintenance. Despite this, plasmids belonging to the various phylogenetic clades also showed marked resistance gene associations, indicating the circulation of successful plasmid-gene combinations. The variation in traY and excA genes found in IncI1-Iγ plasmids is conserved within pMLST sequence types and plays a role in incompatibility, although functional study is needed to elucidate the role of these genes in plasmid epidemiology. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Complete mitochondrial genome sequence of Melipona scutellaris, a Brazilian stingless bee.

PubMed

Pereira, Ulisses de Padua; Bonetti, Ana Maria; Goulart, Luiz Ricardo; Santos, Anderson Rodrigues Dos; Oliveira, Guilherme Correa de; Cuadros-Orellana, Sara; Ueira-Vieira, Carlos

2016-09-01

Melipona scutellaris is a Brazilian stingless bee species and a highly important native pollinator besides its use in rational rearing for honey production. In this study, we present the whole mitochondrial DNA sequence of M. scutellaris from a haploid male. The mitogenome has a size of 14,862 bp and harbors 13 protein-coding genes (PCGs), 2 rRNA genes and 21 tRNA genes.
Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

PubMed Central

Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

2007-01-01

Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127
Genomic insights into the taxonomic status of the Bacillus cereus group

PubMed Central

Liu, Yang; Lai, Qiliang; Göker, Markus; Meier-Kolthoff, Jan P.; Wang, Meng; Sun, Yamin; Wang, Lei; Shao, Zongze

2015-01-01

The identification and phylogenetic relationships of bacteria within the Bacillus cereus group are controversial. This study aimed at determining the taxonomic affiliations of these strains using the whole-genome sequence-based Genome BLAST Distance Phylogeny (GBDP) approach. The GBDP analysis clearly separated 224 strains into 30 clusters, representing eleven known, partially merged species and accordingly 19–20 putative novel species. Additionally, 16S rRNA gene analysis, a novel variant of multi-locus sequence analysis (nMLSA) and screening of virulence genes were performed. The 16S rRNA gene sequence was not sufficient to differentiate the bacteria within this group due to its high conservation. The nMLSA results were consistent with GBDP. Moreover, a fast typing method was proposed using the pycA gene, and where necessary, the ccpA gene. The pXO plasmids and cry genes were widely distributed, suggesting little correlation with the phylogenetic positions of the host bacteria. This might explain why classifications based on virulence characteristics proved unsatisfactory in the past. In summary, this is the first large-scale and systematic study of the taxonomic status of the bacteria within the B. cereus group using whole-genome sequences, and is likely to contribute to further insights into their pathogenicity, phylogeny and adaptation to diverse environments. PMID:26373441
Protein and gene structure of a blue laccase from Pleurotus ostreatus.

PubMed Central

Giardina, P; Palmieri, G; Scaloni, A; Fontanella, B; Faraco, V; Cennamo, G; Sannia, G

1999-01-01

A new laccase isoenzyme (POXA1b, where POX is phenol oxidase), produced by Pleurotus ostreatus in cultures supplemented with copper sulphate, has been purified and fully characterized. The main characteristics of this protein (molecular mass in native and denaturing conditions, pI and catalytic properties) are almost identical to the previously studied laccase POXA1w. However, POXA1b contains four copper atoms per molecule instead of one copper, two zinc and one iron atom per molecule of POXA1w. Furthermore, POXA1b shows an unusually high stability at alkaline pH. The gene and cDNA coding for POXA1b have been cloned and sequenced. The gene coding sequence contains 1599 bp, interrupted by 15 introns. Comparison of the structure of the poxa1b gene with the two previously studied P. ostreatus laccase genes (pox1 and poxc) suggests that these genes belong to two different subfamilies. The amino acid sequence of POXA1b deduced from the cDNA sequence has been almost completely verified by means of matrix-assisted laser desorption ionization MS. It has been demonstrated that three out of six putative glycosylation sites are post-translationally modified and the structure of the bound glycosidic moieties has been determined, whereas two other putative glycosylation sites are unmodified. PMID:10417329
Sequence analysis of the GP, NP, VP40 and VP24 genes of Ebola virus isolated from deceased, surviving and asymptomatically infected individuals during the 1996 outbreak in Gabon: comparative studies and phylogenetic characterization.

PubMed

Leroy, Eric M; Baize, Sylvain; Mavoungou, Elie; Apetrei, Cristian

2002-01-01

The aims of this study were to determine if the clinical outcome of Ebola virus (EBOV) infection is associated with virus genetic structure and to document the genetic changes in the Gabon strains of EBOV by sequencing the GP, NP, VP40 and VP24 genes from deceased and surviving symptomatic and asymptomatic individuals. GP and NP sequences were identical in the three groups of patients and only one silent substitution occurred in the VP40 and VP24 genes in asymptomatic individuals. A strain from an asymptomatic individual had a reverse substitution to the Gabon-94 sequence, indicating that minor virus variants may cocirculate during an outbreak. These results suggest that the different clinical outcomes of EBOV infection do not result from virus mutations. Phylogenetic analysis confirmed that Gabon-96 belonged to the Zaire subtype of EBOV and revealed that synonymous substitution rates were higher than nonsynonymous substitution rates in the GP, VP40 and VP24 genes. In contrast, nonsynonymous substitutions predominated over synonymous substitutions in the NP gene of the two Gabon strains, pointing to divergent evolution of these strains and to selective pressures on this gene.
The genome sequence of the model ascomycete fungus Podospora anserina.

PubMed

Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

2008-01-01

The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.
Regularized rare variant enrichment analysis for case-control exome sequencing data.

PubMed

Larson, Nicholas B; Schaid, Daniel J

2014-02-01

Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
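
The sparse group LASSO mentioned in the record above combines an L1 penalty on individual coefficients with a group-wise L2 penalty. The following minimal Python sketch evaluates that penalty in its commonly used textbook form; it is an illustration under stated assumptions, not the authors' implementation. The function name, the `lam` and `alpha` parameters, the square-root group-size weighting, and the toy coefficients are all assumptions made here, with genes taken as groups.

```python
import math


def sparse_group_lasso_penalty(beta_by_gene, lam, alpha):
    """Evaluate lam * (alpha * L1 + (1 - alpha) * weighted group L2).

    beta_by_gene maps a gene name to the list of coefficients for the
    features (for example, exon-level aggregates) grouped under that gene.
    alpha = 1 gives the ordinary LASSO penalty; alpha = 0 gives the group LASSO.
    """
    l1_term = sum(abs(b) for betas in beta_by_gene.values() for b in betas)
    group_term = sum(
        math.sqrt(len(betas)) * math.sqrt(sum(b * b for b in betas))
        for betas in beta_by_gene.values()
    )
    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


# Hypothetical coefficients: GENE_B is shrunk to zero as a whole group.
toy_betas = {"GENE_A": [3.0, 4.0], "GENE_B": [0.0, 0.0]}
penalty = sparse_group_lasso_penalty(toy_betas, lam=1.0, alpha=0.5)
```

A group whose coefficients are all zero contributes nothing to either term, which is how this kind of penalty can drop whole genes while still selecting individual features within the genes that are retained.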
Effect of long-term exposure to mobile phone radiation on alpha-Int1 gene sequence of Candida albicans

PubMed Central

Shahin-jafari, Ariyo; Bayat, Mansour; Shahhosseiny, Mohammad Hassan; Tajik, Parviz; Roudbar-mohammadi, Shahla

2015-01-01

Over the last decade, communication industries have witnessed a tremendous expansion, while the biological effects of electromagnetic waves have not been fully elucidated. The current study aimed at evaluating the mutagenic effect of long-term exposure to 900-MHz radiation on alpha-Int1 gene sequences of Candida albicans. A standard 900 MHz radiation generator was used for radiation. 10 ml volumes from a stock suspension of C. albicans were transferred into 10 polystyrene tubes. Five tubes were exposed at 4 °C to a fixed magnitude of radiation for different time periods of 10, 70, 210, 350 and 490 h. The other 5 tubes were kept far enough from radiation. The samples underwent genomic DNA extraction. PCR amplification of the alpha-Int1 gene sequence was done using one set of primers. PCR products were resolved using agarose gel electrophoresis and the nucleotide sequences were determined. All samples showed a clear electrophoretic band around 441 bp and further sequencing revealed that the amplified DNA segments are related to the alpha-Int1 gene of the yeast. No mutations in the gene were seen in radiation-exposed samples. Long-term exposure of the yeast to mobile phone radiation under the above-mentioned conditions had no mutagenic effect on the alpha-Int1 gene sequence. PMID:27081370
Effect of long-term exposure to mobile phone radiation on alpha-Int1 gene sequence of Candida albicans.

PubMed

Shahin-Jafari, Ariyo; Bayat, Mansour; Shahhosseiny, Mohammad Hassan; Tajik, Parviz; Roudbar-Mohammadi, Shahla

2016-05-01

Over the last decade, communication industries have witnessed a tremendous expansion, while the biological effects of electromagnetic waves have not been fully elucidated. The current study aimed at evaluating the mutagenic effect of long-term exposure to 900-MHz radiation on alpha-Int1 gene sequences of Candida albicans. A standard 900 MHz radiation generator was used for radiation. 10 ml volumes from a stock suspension of C. albicans were transferred into 10 polystyrene tubes. Five tubes were exposed at 4 °C to a fixed magnitude of radiation for different time periods of 10, 70, 210, 350 and 490 h. The other 5 tubes were kept far enough from radiation. The samples underwent genomic DNA extraction. PCR amplification of the alpha-Int1 gene sequence was done using one set of primers. PCR products were resolved using agarose gel electrophoresis and the nucleotide sequences were determined. All samples showed a clear electrophoretic band around 441 bp and further sequencing revealed that the amplified DNA segments are related to the alpha-Int1 gene of the yeast. No mutations in the gene were seen in radiation-exposed samples. Long-term exposure of the yeast to mobile phone radiation under the above-mentioned conditions had no mutagenic effect on the alpha-Int1 gene sequence.
The Rickettsia Endosymbiont of Ixodes pacificus Contains All the Genes of De Novo Folate Biosynthesis

PubMed Central

Bodnar, James; Mortazavi, Bobak; Laurent, Timothy; Deason, Jeff; Thephavongsa, Khanhkeo; Zhong, Jianmin

2015-01-01

Ticks and other arthropods often are hosts to nutrient providing bacterial endosymbionts, which contribute to their host’s fitness by supplying nutrients such as vitamins and amino acids. It has been detected, in our lab, that Ixodes pacificus is host to Rickettsia species phylotype G021. This endosymbiont is predominantly present, and 100% maternally transmitted in I. pacificus. To study roles of phylotype G021 in I. pacificus, bioinformatic and molecular approaches were carried out. MUMmer genome alignments of whole genome sequence of I. scapularis, a close relative to I. pacificus, against completely sequenced genomes of R. bellii OSU85-389, R. conorii, and R. felis, identified 8,190 unique sequences that are homologous to Rickettsia sequences in the NCBI Trace Archive. MetaCyc metabolic reconstructions revealed that all folate gene orthologues (folA, folC, folE, folKP, ptpS) required for de novo folate biosynthesis are present in the genome of Rickettsia buchneri in I. scapularis. To examine the metabolic capability of phylotype G021 in I. pacificus, genes of the folate biosynthesis pathway of the bacterium were PCR amplified using degenerate primers. BLAST searches identified that nucleotide sequences of the folA, folC, folE, folKP, and ptpS genes possess 98.6%, 98.8%, 98.9%, 98.5% and 99.0% identity respectively to the corresponding genes of Rickettsia buchneri. Phylogenetic tree constructions show that the folate genes of phylotype G021 and homologous genes from various Rickettsia species are monophyletic. This study has shown that all folate genes exist in the genome of Rickettsia species phylotype G021 and that this bacterium has the genetic capability for de novo folate synthesis. PMID:26650541
Detection and analysis of hemolysin genes in Aeromonas hydrophila isolated from Gouramy (Osphronemus gouramy) by polymerase chain reaction (PCR)

NASA Astrophysics Data System (ADS)

Rozi; Rahayu, K.; Daruti, D. N.

2018-04-01

The goal of this study was to detect Aeromonas hydrophila carrying the hlyA gene in gouramy by PCR assay. A total of 5 A. hydrophila strains were isolated from gouramy from different locations, and all A. hydrophila strains were then genotyped by PCR assay for the 16S rRNA gene. The primers used in the PCR targeted a 592-bp fragment of the hlyA gene coding for hemolysin. In particular, hlyA genes are responsible for haemolysin toxin production in this genus. After gel electrophoresis, the amplicons from representative strains of A. hydrophila were purified using an extraction kit and were subjected to DNA sequencing analysis. The results showed that: (i) the 592-bp amplicon of the hlyA gene was detected in 5/6 of the A. hydrophila; (ii) the nucleotide BLAST results of hemolysin gene sequences of the strains of A. hydrophila revealed a high homology of 90-97 % with published sequences; and (iii) the protein BLAST showed 95-98 % homology when compared to the published sequences. The PCR clearly identified the haemolysin-producing strains of A. hydrophila by detection of hlyA genes and may have application as a rapid species-specific virulence test.
Identification of a novel MYO7A mutation in Usher syndrome type 1.

PubMed

Cheng, Ling; Yu, Hongsong; Jiang, Yan; He, Juan; Pu, Sisi; Li, Xin; Zhang, Li

2018-01-05

Usher syndrome (USH) is an autosomal recessive disease characterized by deafness and retinitis pigmentosa. In view of the high phenotypic and genetic heterogeneity in USH, performing genetic screening with traditional methods is impractical. In the present study, we carried out targeted next-generation sequencing (NGS) to uncover the underlying gene in an USH family (2 USH patients and 15 unaffected relatives). One hundred and thirty-five genes associated with inherited retinal degeneration were selected for deep exome sequencing. Subsequently, variant analysis, Sanger validation and segregation tests were utilized to identify the disease-causing mutations in this family. All affected individuals had a classic USH type I (USH1) phenotype which included deafness, vestibular dysfunction and retinitis pigmentosa. Targeted NGS and Sanger sequencing validation suggested that USH1 patients carried an unreported splice site mutation, c.5168+1G>A, as a compound heterozygous mutation with c.6070C>T (p.R2024X) in the MYO7A gene. A functional study revealed decreased expression of the MYO7A gene in the individuals carrying heterozygous mutations. In conclusion, targeted next-generation sequencing provided a comprehensive and efficient diagnosis for USH1. This study revealed the genetic defects in the MYO7A gene and expanded the spectrum of clinical phenotypes associated with USH1 mutations.
De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis

PubMed Central

Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael

2010-01-01

Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology. PMID:20386741
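
The N50 statistic quoted in the record above is the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following minimal Python sketch illustrates that definition; the function name and the toy contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are sorted from longest to shortest and accumulated; the N50 is
    the length of the contig at which the running total first reaches at
    least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy contig lengths (arbitrary units), for illustration only.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

Because N50 depends only on the length distribution, joining contigs into longer scaffolds raises it without adding sequence, which is consistent with the increase reported after comparative analysis with Neurospora genomes.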
De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

PubMed

Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

2010-04-08

Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. 
Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative studies to address basic questions of fungal biology.
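
The N50 values quoted in this abstract (117 kb for the Velvet draft, 498 kb after comparison with Neurospora genomes) summarize assembly contiguity. As a minimal sketch of how such a figure is usually computed, the Python below uses the common definition of N50; the function name and the toy contig lengths are hypothetical and are not taken from the S. macrospora assembly.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest and accumulated until they
    cover at least half of all assembled bases; the length of the contig
    that crosses that threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), not real S. macrospora contigs.
draft = [100, 80, 60, 40, 20]
# The same bases after joining the two longest contigs, as a
# scaffolding step guided by a related genome might do.
scaffolded = [180, 60, 40, 20]

print(n50(draft))
print(n50(scaffolded))
```

On the toy data the N50 rises from 80 to 180 although the total assembled length is unchanged, which is the kind of improvement the abstract reports after the comparative step.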

[Cloning and sequence analysis of full-length cDNA of secoisolariciresinol dehydrogenase of Dysosma versipellis].

PubMed

Xu, Li; Ding, Zhi-Shan; Zhou, Yun-Kai; Tao, Xue-Fen

2009-06-01

To obtain the full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis by RACE PCR, and then to investigate the characteristics of the gene. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene was obtained from Dysosma versipellis by 3'-RACE and 5'-RACE. We report for the first time the full cDNA sequence of Secoisolariciresinol Dehydrogenase in Dysosma versipellis. The acquired cDNA was 991 bp in full length, including a 5' untranslated region of 42 bp and a 3' untranslated region of 112 bp with a poly(A) tail. The open reading frame (ORF) encodes 278 amino acids, with a molecular weight of 29253.3 Daltons and an isoelectric point of 6.328. The GenBank accession number of the nucleotide sequence is EU573789. Semi-quantitative RT-PCR analysis revealed that the Secoisolariciresinol Dehydrogenase gene was highly expressed in the stem. Alignment of the amino acid sequences of Secoisolariciresinol Dehydrogenase indicated that there may be significant amino acid sequence differences among species. The full-length cDNA sequence of the Secoisolariciresinol Dehydrogenase gene from Dysosma versipellis was obtained.
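
The lengths reported in this abstract can be checked against each other with simple arithmetic. The sketch below assumes that the 5' UTR, ORF and 3' UTR lengths add up directly to the 991 bp total and that the ORF length includes the stop codon; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
# Values as reported in the abstract.
CDNA_BP = 991          # full-length cDNA
UTR5_BP = 42           # 5' untranslated region
UTR3_BP = 112          # 3' untranslated region
PROTEIN_AA = 278       # encoded amino acids
PROTEIN_DA = 29253.3   # predicted molecular weight in Daltons


def orf_codons(cdna_bp, utr5_bp, utr3_bp):
    """Return the number of codons left after removing both UTRs."""
    orf_bp = cdna_bp - utr5_bp - utr3_bp
    if orf_bp % 3 != 0:
        raise ValueError(f"ORF of {orf_bp} bp is not a multiple of 3")
    return orf_bp // 3


codons = orf_codons(CDNA_BP, UTR5_BP, UTR3_BP)
# Assumption: the final codon is the stop codon and encodes no residue.
amino_acids = codons - 1
mean_residue_da = PROTEIN_DA / PROTEIN_AA

print(codons, amino_acids, round(mean_residue_da, 1))
```

Under these assumptions the figures are mutually consistent: 837 bp of ORF gives 279 codons, that is, 278 amino acids plus a stop codon.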
Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

PubMed

Timmermans, M J T N; Dodsworth, S; Culverwell, C L; Bocak, L; Ahrens, D; Littlewood, D T J; Pons, J; Vogler, A P

2010-11-01

Mitochondrial genome sequences are important markers for phylogenetics but taxon sampling remains sporadic because of the great effort and cost required to acquire full-length sequences. Here, we demonstrate a simple, cost-effective way to sequence the full complement of protein coding mitochondrial genes from pooled samples using the 454/Roche platform. Multiplexing was achieved without the need for expensive indexing tags ('barcodes'). The method was trialled with a set of long-range polymerase chain reaction (PCR) fragments from 30 species of Coleoptera (beetles) sequenced in a 1/16th sector of a sequencing plate. Long contigs were produced from the pooled sequences with sequencing depths ranging from ∼10 to 100× per contig. Species identity of individual contigs was established via three 'bait' sequences matching disparate parts of the mitochondrial genome obtained by conventional PCR and Sanger sequencing. This proved that assembly of contigs from the sequencing pool was correct. Our study produced sequences for 21 nearly complete and seven partial sets of protein coding mitochondrial genes. Combined with existing sequences for 25 taxa, an improved estimate of basal relationships in Coleoptera was obtained. The procedure could be employed routinely for mitochondrial genome sequencing at the species level, to provide improved species 'barcodes' that currently use the cox1 gene only.
Structure and expression of the attacin genes in Hyalophora cecropia.

PubMed

Sun, S C; Lindström, I; Lee, J Y; Faye, I

1991-02-26

To study the regulation of the immune genes in insects, we have cloned and sequenced the attacin gene locus of the giant silk moth Hyalophora cecropia. The locus contains one acidic and one basic attacin gene as well as two pseudogenes, which are remnants of basic attacin genes. A small insertion element was found within the locus. The two functional attacin genes are transcribed in opposite directions and have two introns inserted at homologous positions. A common sequence, GGGGATTCCT, is found at nucleotide position -48 in the acidic gene and at nucleotide position -58 in the basic gene. Interestingly, this decanucleotide is similar to the consensus of the NF-κB-binding site. Expression studies revealed that both attacins are strongly induced by phorbol 12-myristate 13-acetate, lipopolysaccharide and bacteria. However, only the acidic attacin gene showed a clear response to injury.
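
The similarity between the decanucleotide GGGGATTCCT and an NF-κB-binding consensus can be made concrete by a position-by-position comparison. The sketch below assumes the widely quoted κB consensus GGGRNNYYCC (R = A or G, Y = C or T, N = any base); the abstract does not say which consensus the authors compared against, so that choice and the helper name are assumptions.

```python
# Subset of IUPAC nucleotide codes needed for this comparison.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}

SITE = "GGGGATTCCT"       # decanucleotide reported in the abstract
CONSENSUS = "GGGRNNYYCC"  # assumed NF-kB consensus, not from the abstract


def compare_to_consensus(site, consensus):
    """Return a list of booleans, one per position, True where the base
    in `site` is allowed by the IUPAC code in `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [base in IUPAC[code] for base, code in zip(site, consensus)]


flags = compare_to_consensus(SITE, CONSENSUS)
mismatches = [i + 1 for i, ok in enumerate(flags) if not ok]  # 1-based

print(sum(flags), "of", len(flags), "positions match; mismatches at", mismatches)
```

Against this assumed consensus, nine of the ten positions agree and only the last base differs, which fits the abstract's wording that the element is "similar to" rather than identical with the consensus.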
A survey of the sorghum transcriptome using single-molecule long reads

DOE PAGES

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; ...

2016-06-24

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. Lastly, these results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism.
A survey of the sorghum transcriptome using single-molecule long reads

PubMed Central

Abdel-Ghany, Salah E.; Hamilton, Michael; Jacobi, Jennifer L.; Ngam, Peter; Devitt, Nicholas; Schilkey, Faye; Ben-Hur, Asa; Reddy, Anireddy S. N.

2016-01-01

Alternative splicing and alternative polyadenylation (APA) of pre-mRNAs greatly contribute to transcriptome diversity, coding capacity of a genome and gene regulatory mechanisms in eukaryotes. Second-generation sequencing technologies have been extensively used to analyse transcriptomes. However, a major limitation of short-read data is that it is difficult to accurately predict full-length splice isoforms. Here we sequenced the sorghum transcriptome using Pacific Biosciences single-molecule real-time long-read isoform sequencing and developed a pipeline called TAPIS (Transcriptome Analysis Pipeline for Isoform Sequencing) to identify full-length splice isoforms and APA sites. Our analysis reveals transcriptome-wide full-length isoforms at an unprecedented scale with over 11,000 novel splice isoforms. Additionally, we uncover APA of ∼11,000 expressed genes and more than 2,100 novel genes. These results greatly enhance sorghum gene annotations and aid in studying gene regulation in this important bioenergy crop. The TAPIS pipeline will serve as a useful tool to analyse Iso-Seq data from any organism. PMID:27339290
A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels.

PubMed

Al-Bustan, Suzanne A; Al-Serri, Ahmad; Annice, Babitha G; Alnaqeeb, Majed A; Al-Kandari, Wafa Y; Dashti, Mohammed

2018-01-01

The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel "rare" variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004-0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001-0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia.
A novel LPL intronic variant: g.18704C>A identified by re-sequencing Kuwaiti Arab samples is associated with high-density lipoprotein, very low-density lipoprotein and triglyceride lipid levels

PubMed Central

Al-Serri, Ahmad; Annice, Babitha G.; Alnaqeeb, Majed A.; Al-Kandari, Wafa Y.; Dashti, Mohammed

2018-01-01

The role interethnic genetic differences play in plasma lipid level variation across populations is a global health concern. Several genes involved in lipid metabolism and transport are strong candidates for genetic association with lipid level variation, especially lipoprotein lipase (LPL). The objective of this study was to re-sequence the full LPL gene in Kuwaiti Arabs, analyse the sequence variation and identify variants that could contribute to variation in plasma lipid levels for further genetic association. Samples (n = 100) of an Arab ethnic group from Kuwait were analysed for sequence variation by Sanger sequencing across the 30 Kb LPL gene and its flanking sequences. A total of 293 variants including 252 single nucleotide polymorphisms (SNPs) and 39 insertions/deletions (InDels) were identified among which 47 variants (32 SNPs and 15 InDels) were novel to Kuwaiti Arabs. This study is the first to report sequence data and analysis of frequencies of variants at the LPL gene locus in an Arab ethnic group with a novel “rare” variant (LPL:g.18704C>A) significantly associated with HDL (B = -0.181; 95% CI (-0.357, -0.006); p = 0.043), TG (B = 0.134; 95% CI (0.004–0.263); p = 0.044) and VLDL (B = 0.131; 95% CI (-0.001–0.263); p = 0.043) levels. Sequence variation in Kuwaiti Arabs was compared to other populations and was found to be similar with regard to the number of SNPs, InDels and distribution of the number of variants across the LPL gene locus and minor allele frequency (MAF). Moreover, comparison of the identified variants and their MAF with other reports provided a list of 46 potential variants across the LPL gene to be considered for future genetic association studies. The findings warrant further investigation into the association of g.18704C>A with lipid levels in other ethnic groups and with clinical manifestations of dyslipidemia. PMID:29438437
The Complete Genome Phylogeny of Geographically Distinct Dengue Virus Serotype 2 Isolates (1944-2013) Supports Further Groupings within the Cosmopolitan Genotype

PubMed Central

Ali, Akhtar; Ali, Ijaz

2015-01-01

Dengue virus serotype 2 (DENV-2) isolates have been implicated in deadly outbreaks of dengue fever (DF) and dengue hemorrhagic fever (DHF) in several regions of the world. Phylogenetic analysis of DENV-2 isolates collected from particular countries has been performed using partial or individual genes but only a few studies have examined complete whole-genome sequences collected worldwide. Herein, 50 complete genome sequences of DENV-2 isolates, reported over the past 70 years from 19 different countries, were downloaded from GenBank. Phylogenetic analysis was conducted and evolutionary distances of the 50 DENV-2 isolates were determined using maximum likelihood (ML) trees or Bayesian phylogenetic analysis created from complete genome nucleotide (nt) and amino acid (aa) sequences or individual gene sequences. The results showed that all DENV-2 isolates fell into seven main groups containing five previously defined genotypes. A Cosmopolitan genotype showed further division into three groups (C-I, C-II, and C-III) with the C-I group containing two subgroups (C-IA and C-IB). Comparison of the aa sequences showed specific mutations among the various groups of DENV-2 isolates. A maximum number of aa mutations was observed in the NS5 gene, followed by the NS2A, NS3 and NS1 genes, while the smallest number of aa substitutions was recorded in the capsid gene, followed by the PrM/M, NS4A, and NS4B genes. Maximum evolutionary distances were found in the NS2A gene, followed by the NS4A and NS4B genes. Based on these results, we propose that genotyping of DENV-2 isolates in future studies should be performed on entire genome sequences in order to gain a complete understanding of the evolution of various isolates reported from different geographical locations around the world. PMID:26414178
Genetic diversity in Trypanosoma theileri from Sri Lankan cattle and water buffaloes.

PubMed

Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Fukushi, Shintaro; Tattiyapong, Muncharee; Tuvshintulga, Bumduuren; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Igarashi, Ikuo; Inoue, Noboru

2015-01-30

Trypanosoma theileri is a hemoprotozoan parasite that infects various ruminant species. We investigated the epidemiology of this parasite among cattle and water buffalo populations bred in Sri Lanka, using a diagnostic PCR assay based on the cathepsin L-like protein (CATL) gene. Blood DNA samples sourced from cattle (n=316) and water buffaloes (n=320) bred in different geographical areas of Sri Lanka were PCR screened for T. theileri. Parasite DNA was detected in cattle and water buffaloes alike in all the sampling locations. The overall T. theileri-positive rate was higher in water buffaloes (15.9%) than in cattle (7.6%). Subsequently, PCR amplicons were sequenced and the partial CATL sequences were phylogenetically analyzed. The identity values for the CATL gene were 89.6-99.7% among the cattle-derived sequences, compared with values of 90.7-100% for the buffalo-derived sequences. However, the cattle-derived sequences shared 88.2-100% identity values with those from buffaloes. In the phylogenetic tree, the Sri Lankan CATL gene sequences fell into two major clades (TthI and TthII), both of which contain CATL sequences from several other countries. Although most of the CATL sequences from Sri Lankan cattle and buffaloes clustered independently, two buffalo-derived sequences were observed to be closely related to those of the Sri Lankan cattle. Furthermore, a Sri Lankan buffalo sequence clustered with CATL gene sequences from Brazilian buffalo and Thai cattle. In addition to reporting the first PCR-based survey of T. theileri among Sri Lankan-bred cattle and water buffaloes, the present study found that some of the CATL gene fragments sourced from water buffaloes shared similarity with those determined from cattle in this country. Copyright © 2014 Elsevier B.V. All rights reserved.
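
The identity ranges quoted in this abstract (for example 88.2-100% between cattle- and buffalo-derived CATL sequences) are pairwise percent identities. The sketch below shows one simple way to compute such a value from two already aligned sequences; the abstract does not state how gaps were handled, so skipping gapped columns is an assumption, and the toy sequences are hypothetical rather than real CATL fragments.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap are skipped (an assumed
    convention; other tools count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not real T. theileri CATL sequences.
cattle_like = "ACGTACGTAC"
buffalo_like = "ACGTTCGTAC"

print(percent_identity(cattle_like, buffalo_like))
```

Applying such a function to every pair of sequences within and between host groups, and reporting the minimum and maximum, would give ranges of the kind listed in the abstract.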
Enlightenment of Yeast Mitochondrial Homoplasmy: Diversified Roles of Gene Conversion

PubMed Central

Ling, Feng; Mikawa, Tsutomu; Shibata, Takehiko

2011-01-01

Mitochondria have their own genomic DNA. Unlike the nuclear genome, each cell contains hundreds to thousands of copies of mitochondrial DNA (mtDNA). The copies of mtDNA tend to have heterogeneous sequences, due to the high frequency of mutagenesis, but are quickly homogenized within a cell (“homoplasmy”) during vegetative cell growth or through a few sexual generations. Heteroplasmy is strongly associated with mitochondrial diseases, diabetes and aging. Recent studies revealed that the yeast cell has the machinery to homogenize mtDNA, using a common DNA processing pathway with gene conversion; i.e., both genetic events are initiated by a double-stranded break, which is processed into 3′ single-stranded tails. One of the tails is base-paired with the complementary sequence of the recipient double-stranded DNA to form a D-loop (homologous pairing), in which repair DNA synthesis is initiated to restore the sequence lost by the breakage. Gene conversion generates sequence diversity, depending on the divergence between the donor and recipient sequences, especially when it occurs among a number of copies of a DNA sequence family with some sequence variations, such as in immunoglobulin diversification in chicken. MtDNA can be regarded as a sequence family, in which the members tend to be diversified by a high frequency of spontaneous mutagenesis. Thus, it would be interesting to determine why and how double-stranded breakage and D-loop formation induce sequence homogenization in mitochondria and sequence diversification in nuclear DNA. We will review the mechanisms and roles of mtDNA homoplasmy, in contrast to nuclear gene conversion, which diversifies gene and genome sequences, to provide clues toward understanding how the common DNA processing pathway results in such divergent outcomes. PMID:24710143
Sequence Evolution and Expression Regulation of Stress-Responsive Genes in Natural Populations of Wild Tomato

PubMed Central

Fischer, Iris; Steige, Kim A.; Stephan, Wolfgang; Mboup, Mamadou

2013-01-01

The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives. PMID:24205149
Gene and translation initiation site prediction in metagenomic sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

2012-01-01

Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
Informatic and genomic analysis of melanocyte cDNA libraries as a resource for the study of melanocyte development and function.

PubMed

Baxter, Laura L; Hsu, Benjamin J; Umayam, Lowell; Wolfsberg, Tyra G; Larson, Denise M; Frith, Martin C; Kawai, Jun; Hayashizaki, Yoshihide; Carninci, Piero; Pavan, William J

2007-06-01

As part of the RIKEN mouse encyclopedia project, two cDNA libraries were prepared from melanocyte-derived cell lines, using techniques of full-length clone selection and subtraction/normalization to enrich for rare transcripts. End sequencing showed that these libraries display over 83% complete coding sequence at the 5' end and 96-97% complete coding sequence at the 3' end. Evaluation of the libraries, derived from B16F10Y tumor cells and melan-c cells, revealed that they contain clones for a majority of the genes previously demonstrated to function in melanocyte biology. Analysis of genomic locations for transcripts revealed that the distribution of melanocyte genes is non-random throughout the genome. Three genomic regions identified that showed significant clustering of melanocyte-expressed genes contain one or more genes previously shown to regulate melanocyte development or function. A catalog of genes expressed in these libraries is presented, providing a valuable resource of cDNA clones and sequence information that can be used for identification of new genes important for melanocyte development, function, and disease.
Comparison and phylogenetic analysis of the ISS gene in two predominant avian pathogenic E. coli serogroups isolated from avian colibacillosis in Iran.

PubMed

Zahraei Salehi, Taghi; Derakhshandeh, Abdollah; Tadjbakhsh, Hasan; Karimi, Vahid

2013-02-01

The ISS (increased serum survival) gene and its protein product (ISS) of avian pathogenic Escherichia coli (APEC) are important characteristics of resistance to the complement system. The aims of this study were to clone, sequence and characterize sequence diversity of the ISS gene between two predominant serogroups in Iran and among those previously deposited in GenBank. The ISS gene of 309 bp from the APEC χ1390 strain was amplified by PCR, cloned and sequenced using pTZ57R/T vector. The ISS gene from the χ1390 strain has 100% identity among different serogroups of APEC in different geographical regions throughout the world. Phylogenetic analysis shows two different phylogenetic groups among the different strains. Strong association of nucleotide sequences among different E. coli strains suggests that it may be a conserved gene and could be a suitable antigen to control and detect avian pathogenic E. coli, at least in our region. Currently, our group is working on the ISS protein as a candidate vaccine in SPF poultry. Copyright © 2012 Elsevier Ltd. All rights reserved.
High-Throughput Sequence Analysis of Turbot (Scophthalmus maximus) Transcriptome Using 454-Pyrosequencing for the Discovery of Antiviral Immune Genes

PubMed Central

Pereiro, Patricia; Balseiro, Pablo; Romero, Alejandro; Dios, Sonia; Forn-Cuni, Gabriel; Fuste, Berta; Planas, Josep V.; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio

2012-01-01

Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced proteins were not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this species and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 ESTs and fewer than 1,500 nucleotide sequences for S. maximus in the NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. 
Immune stimulation in turbot was very effective, obtaining an enormous variety of sequences belonging to genes involved in the defense mechanisms. PMID:22629298
The Unique hmuY Gene Sequence as a Specific Marker of Porphyromonas gingivalis

PubMed Central

Mackiewicz, Paweł; Radwan-Oczko, Małgorzata; Kantorowicz, Małgorzata; Chomyszyn-Gajewska, Maria; Frąszczak, Magdalena; Bielecki, Marcin; Olczak, Mariusz; Olczak, Teresa

2013-01-01

Porphyromonas gingivalis, a major etiological agent of chronic periodontitis, acquires heme from host hemoproteins using the HmuY hemophore. The aim of this study was to develop a specific P. gingivalis marker based on a hmuY gene sequence. Subgingival samples were collected from 66 patients with chronic periodontitis and 40 healthy subjects and the entire hmuY gene was analyzed in positive samples. Phylogenetic analyses demonstrated that both the amino acid sequence of the HmuY protein and the nucleotide sequence of the hmuY gene are unique among P. gingivalis strains/isolates and show low identity to sequences found in other species (below 50 and 56%, respectively). In agreement with these findings, a set of hmuY gene-based primers and standard/real-time PCR with SYBR Green chemistry allowed us to specifically detect P. gingivalis in patients with chronic periodontitis (77.3%) and healthy subjects (20%), the latter possessing lower number of P. gingivalis cells and total bacterial cells. Isolates from healthy subjects possess the hmuY gene-based nucleotide sequence pattern occurring in W83/W50/A7436 (n = 4), 381/ATCC 33277 (n = 3) or TDC60 (n = 1) strains, whereas those from patients typically have TDC60 (n = 21), W83/W50/A7436 (n = 17) and 381/ATCC 33277 (n = 13) strains. We observed a significant correlation between periodontal index of risk of infectiousness (PIRI) and the presence/absence of P. gingivalis (regardless of the hmuY gene-based sequence pattern of the isolate identified [r = 0.43; P = 0.0002] and considering particular isolate pattern [r = 0.38; P = 0.0012]). In conclusion, we demonstrated that the hmuY gene sequence or its fragments may be used as one of the molecular markers of P. gingivalis. PMID:23844074
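
The detection rates in this abstract can be reproduced from the strain-pattern counts it lists. The sketch below assumes that each P. gingivalis-positive subject contributed exactly one isolate pattern, which the abstract implies but does not state; the variable and function names are hypothetical.

```python
# Group sizes and isolate-pattern counts as reported in the abstract.
PATIENTS = 66
HEALTHY = 40
patient_patterns = {"TDC60": 21, "W83/W50/A7436": 17, "381/ATCC 33277": 13}
healthy_patterns = {"W83/W50/A7436": 4, "381/ATCC 33277": 3, "TDC60": 1}


def detection_rate(pattern_counts, group_size):
    """Return (number of positives, percent positive rounded to 0.1).

    Assumes one isolate pattern per positive subject.
    """
    positives = sum(pattern_counts.values())
    return positives, round(100 * positives / group_size, 1)


patient_result = detection_rate(patient_patterns, PATIENTS)
healthy_result = detection_rate(healthy_patterns, HEALTHY)

print(patient_result)
print(healthy_result)
```

Under that assumption the counts give 51 of 66 patients and 8 of 40 healthy subjects, matching the reported 77.3% and 20%.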
Genomic organization of human fetal specific P-450IIIA7 (cytochrome P-450HFLa)-related gene(s) and interaction of transcriptional regulatory factor with its DNA element in the 5' flanking region.

PubMed

Itoh, S; Yanagimoto, T; Tagawa, S; Hashimoto, H; Kitamura, R; Nakajima, Y; Okochi, T; Fujimoto, S; Uchino, J; Kamataki, T

1992-03-24

P-450IIIA7 is a form of cytochrome P-450 which was isolated from human fetal livers and termed P-450HFLa. This form has been shown to be expressed specifically during fetal life (Komori, M., Nishio, K., Kitada, M., Shiramatsu, K., Muroya, K., Soma, M., Nagashima, K. and Kamataki, T. (1990) Biochemistry 29, 4430-4433). In the present study, we isolated five independent clones which probably corresponded to the human P-450IIIA7 gene. All exons, exon-intron junctions and the 5' flanking region from the cap site to -869 of these clones were completely sequenced. Although the sequences in the coding region were completely identical to P-450IIIA7, it is possible that genomic fragments sequenced in this study encode portions of other P-450IIIA7-related genes since we could not obtain a complete overlapping set of genomic clones. Within its 5' flanking sequence, the putative binding sites of several transcriptional regulatory factors existed. Among them, it was shown that a basic transcription element binding factor (BTEB) actually interacted with the 5' flanking region of this gene.
Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

PubMed Central

2011-01-01

Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. 
Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms. PMID:21791039
Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle

PubMed Central

Selvan, A. Sakthivel; Gupta, I. D.; Verma, A.; Chaudhari, M. V.; Magotra, A.

2016-01-01

Aim: The present study was undertaken to characterize the cluster of differentiation 14 (CD14) gene and to analyze its combined genotypes in order to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Materials and Methods: Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4) restriction enzymes (RE). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ2) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Results: Two types of KF sequences were obtained: one of 2630 bp having one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp having only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus sequence (EU148610.1) revealed 24 nt changes (SNPs).
Cows were also screened using PCR-RFLP with Hpy188I (contig 2) and HinfI (contig 4) RE, which revealed three genotypes each that differed significantly regarding mastitis incidence. The two loci can yield at most nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence (15.96%). Conclusion: AACD-typed cows were found to be least susceptible to mastitis compared with the other combined genotypes. PMID:27536026
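The combined-genotype bookkeeping in this abstract can be illustrated with a short sketch. It assumes that the three genotypes at each locus are AA/AB/BB (contig 2, Hpy188I) and CC/CD/DD (contig 4, HinfI), which is inferred from the listed patterns and not stated explicitly; the function and variable names are hypothetical.

```python
from itertools import product

# Genotypes per locus, inferred from the combined patterns listed in the abstract.
LOCUS_2 = ["AA", "AB", "BB"]  # contig 2 (Hpy188I), assumed labels
LOCUS_4 = ["CC", "CD", "DD"]  # contig 4 (HinfI), assumed labels

# Combined genotypes reported as observed in the 94 cows.
OBSERVED = {"AACC", "AACD", "AADD", "ABCD", "ABDD", "BBCC", "BBCD", "BBDD"}


def possible_combined_genotypes(locus_a, locus_b):
    """Return every pairwise combination of genotypes at two loci."""
    return [a + b for a, b in product(locus_a, locus_b)]


possible = possible_combined_genotypes(LOCUS_2, LOCUS_4)
missing = sorted(set(possible) - OBSERVED)
print(len(possible), missing)  # 9 ['ABCC']
```

Three genotypes at each of two loci give 3 x 3 = 9 possible patterns, and the only one absent from the observed set is ABCC, in agreement with the abstract.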
Molecular characterization and combined genotype association study of bovine cluster of differentiation 14 gene with clinical mastitis in crossbred dairy cattle.

PubMed

Selvan, A Sakthivel; Gupta, I D; Verma, A; Chaudhari, M V; Magotra, A

2016-07-01

The present study was undertaken to characterize the cluster of differentiation 14 (CD14) gene and to analyze its combined genotypes in order to explore their association with clinical mastitis in Karan Fries (KF) cows maintained in the National Dairy Research Institute herd, Karnal. Genomic DNA was extracted from the blood of 94 randomly selected lactating KF cattle by the phenol-chloroform method. After checking its quality and quantity, polymerase chain reaction (PCR) was carried out using six sets of reported gene-specific primers to amplify the complete KF CD14 gene. The forward and reverse sequences for each PCR fragment were assembled to form the complete sequence for the respective region of the KF CD14 gene. Multiple sequence alignments of the edited sequence with the reported Bos taurus reference sequence (EU148610.1) were performed with ClustalW software to identify single nucleotide polymorphisms (SNPs). Basic Local Alignment Search Tool analysis was performed to compare the sequence identity of the KF CD14 gene with other species. Restriction fragment length polymorphism (RFLP) analysis was carried out in all KF cows using the Helicobacter pylori 188I (Hpy188I) (contig 2) and Haemophilus influenzae I (HinfI) (contig 4) restriction enzymes (RE). Cows were assigned genotypes obtained by PCR-RFLP analysis, and the association study was done using the Chi-square (χ2) test. The genotypes of contigs (loci) 2 and 4 were combined for each animal to construct combined genotype patterns. Two types of KF sequences were obtained: one of 2630 bp having one insertion at nucleotide (nt) position 616 and one deletion at nt position 1117, and another of 2629 bp having only one deletion at nt position 615. ClustalW multiple alignment of the KF CD14 gene sequence with the B. taurus sequence (EU148610.1) revealed 24 nt changes (SNPs).
Cows were also screened using PCR-RFLP with Hpy188I (contig 2) and HinfI (contig 4) RE, which revealed three genotypes each that differed significantly regarding mastitis incidence. The two loci can yield at most nine combined genotype patterns, of which only eight were observed: AACC, AACD, AADD, ABCD, ABDD, BBCC, BBCD, and BBDD. The combined genotype ABCC was not observed in the studied population of KF cows. Out of 94 animals, those with the AACD combined genotype (10.63%) were found not to be affected by mastitis, and those with the ABDD combined genotype had the highest mastitis incidence (15.96%). AACD-typed cows were found to be least susceptible to mastitis compared with the other combined genotypes.

PTGBase: an integrated database to study tandem duplicated genes in plants.

PubMed

Yu, Jingyin; Ke, Tao; Tehrim, Sadia; Sun, Fengming; Liao, Boshou; Hua, Wei

2015-01-01

Tandem duplication is a wide-spread phenomenon in plant genomes and plays significant roles in evolution and adaptation to changing environments. Tandem duplicated genes related to certain functions will lead to the expansion of gene families and bring increase of gene dosage in the form of gene cluster arrays. Many tandem duplication events have been studied in plant genomes; yet, there is a surprising shortage of efforts to systematically present the integration of large amounts of information about publicly deposited tandem duplicated gene data across the plant kingdom. To address this shortcoming, we developed the first plant tandem duplicated genes database, PTGBase. It delivers the most comprehensive resource available to date, spanning 39 plant genomes, including model species and newly sequenced species alike. Across these genomes, 54 130 tandem duplicated gene clusters (129 652 genes) are presented in the database. Each tandem array, as well as its member genes, is characterized in complete detail. Tandem duplicated genes in PTGBase can be explored through browsing or searching by identifiers or keywords of functional annotation and sequence similarity. Users can download tandem duplicated gene arrays easily to any scale, up to the complete annotation data set for an entire plant genome. PTGBase will be updated regularly with newly sequenced plant species as they become available. © The Author(s) 2015. Published by Oxford University Press.
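As a rough illustration of what a tandem array is, the sketch below groups genes that are adjacent on a chromosome and share a gene-family label. This is only an assumption-laden toy: the abstract does not state the criteria PTGBase uses (for example, similarity thresholds or how many intervening genes are tolerated), and the gene identifiers, family labels and function name are hypothetical.

```python
from itertools import groupby


def tandem_arrays(ordered_genes, min_size=2):
    """Group consecutive genes that share a family label into arrays.

    ordered_genes: list of (gene_id, family) tuples in chromosomal order.
    Assumes strict adjacency with no intervening genes (a simplification).
    """
    arrays = []
    for _family, run in groupby(ordered_genes, key=lambda gene: gene[1]):
        members = [gene_id for gene_id, _ in run]
        if len(members) >= min_size:
            arrays.append(members)
    return arrays


# Hypothetical chromosome: gene order with made-up family assignments.
chromosome = [
    ("g1", "F1"), ("g2", "F1"), ("g3", "F2"),
    ("g4", "F1"), ("g5", "F3"), ("g6", "F3"), ("g7", "F3"),
]
arrays = tandem_arrays(chromosome)
print(arrays)  # [['g1', 'g2'], ['g5', 'g6', 'g7']]
```

Gene g4 belongs to family F1 but is not adjacent to g1 and g2, so under this strict-adjacency assumption it is not part of their array.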
Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals

USGS Publications Warehouse

Naidu, Ashwin; Fitak, Robert R.; Munguia-Vega, Adrian; Culver, Melanie

2011-01-01

Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.
Diversity of Babesia bovis merozoite surface antigen genes in the Philippines.

PubMed

Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Ybanez, Adrian Patalinghug; Ybanez, Rochelle Haidee Daclan; Perez, Zandro Obligado; Guswanto, Azirwan; Igarashi, Ikuo; Yokoyama, Naoaki

2014-02-01

Babesia bovis is the causative agent of fatal babesiosis in cattle. In the present study, we investigated the genetic diversity of B. bovis among Philippine cattle, based on the genes that encode merozoite surface antigens (MSAs). Forty-one B. bovis-positive blood DNA samples from cattle were used to amplify the msa-1, msa-2b, and msa-2c genes. In phylogenetic analyses, the msa-1, msa-2b, and msa-2c gene sequences generated from Philippine B. bovis-positive DNA samples were found in six, three, and four different clades, respectively. All of the msa-1 and most of the msa-2b sequences were found in clades that were formed only by Philippine msa sequences in the respective phylograms. While all the msa-1 sequences from the Philippines showed similarity to those formed by Australian msa-1 sequences, the msa-2b sequences showed similarity to either Australian or Mexican msa-2b sequences. In contrast, msa-2c sequences from the Philippines were distributed across all the clades of the phylogram, although one clade was formed exclusively by Philippine msa-2c sequences. Similarities among the deduced amino acid sequences of MSA-1, MSA-2b, and MSA-2c from the Philippines were 62.2-100, 73.1-100, and 67.3-100%, respectively. The present findings demonstrate that B. bovis populations are genetically diverse in the Philippines. This information will provide a good foundation for the future design and implementation of improved immunological preventive methodologies against bovine babesiosis in the Philippines. The study has also generated a set of data that will be useful for further understanding of the global genetic diversity of this important parasite. © 2013.
Fast, accurate and easy-to-pipeline methods for amplicon sequence processing

NASA Astrophysics Data System (ADS)

Antonielli, Livio; Sessitsch, Angela

2016-04-01

Next generation sequencing (NGS) technologies have been established for years as an essential resource in microbiology. While metagenomic studies can benefit from the continuously increasing throughput of the Illumina (Solexa) technology, the spread of third generation sequencing technologies (PacBio, Oxford Nanopore) is taking whole genome sequencing beyond the assembly of fragmented draft genomes, making it now possible to finish bacterial genomes even without short read correction. Besides (meta)genomic analysis, next-gen amplicon sequencing is still fundamental for microbial studies. Amplicon sequencing of the 16S rRNA gene and ITS (Internal Transcribed Spacer) remains a well-established, widespread method for a multitude of purposes concerning the identification and comparison of archaeal/bacterial (16S rRNA gene) and fungal (ITS) communities occurring in diverse environments. Numerous pipelines have been developed to process NGS-derived amplicon sequences, among which Mothur, QIIME and USEARCH are the best known and most cited. The entire process from initial raw sequence data through read error correction, paired-end read assembly, primer stripping, quality filtering, clustering, OTU taxonomic classification and BIOM table rarefaction, as well as alternative "normalization" methods, will be addressed. An effective and accurate strategy will be presented using state-of-the-art bioinformatic tools, and an example of a straightforward one-script pipeline for 16S rRNA gene or ITS MiSeq amplicon sequencing will be provided. Finally, instructions on how to automatically retrieve nucleotide sequences from NCBI and thereby apply the pipeline to targets other than the 16S rRNA gene (Greengenes, SILVA) and ITS (UNITE) will be discussed.
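Two of the steps named in this abstract, primer stripping and quality filtering, can be sketched in a few lines. This is not the authors' pipeline, nor the behaviour of Mothur, QIIME or USEARCH; it is a minimal illustration that assumes Phred+33 quality strings, an exact (non-degenerate) forward primer, and a mean-quality threshold of 20, all of which are choices made here for the example.

```python
def mean_phred(quality):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(char) - 33 for char in quality) / len(quality)


def strip_primer(seq, qual, primer):
    """Remove an exact 5' primer match; return None if the primer is absent."""
    if seq.startswith(primer):
        return seq[len(primer):], qual[len(primer):]
    return None


def process_reads(reads, primer, min_mean_q=20):
    """Keep primer-stripped reads whose mean quality meets the threshold."""
    kept = []
    for seq, qual in reads:
        stripped = strip_primer(seq, qual, primer)
        if stripped is None:
            continue  # read does not start with the expected primer
        insert_seq, insert_qual = stripped
        if insert_seq and mean_phred(insert_qual) >= min_mean_q:
            kept.append((insert_seq, insert_qual))
    return kept


# Hypothetical reads: (sequence, Phred+33 quality string).
reads = [
    ("ACGTGGGCCC", "IIIIIIIIII"),  # primer present, high quality
    ("ACGTGGGCCC", "IIII######"),  # primer present, low-quality insert
    ("TTTTGGGCCC", "IIIIIIIIII"),  # primer absent
]
kept = process_reads(reads, primer="ACGT")
print(kept)  # [('GGGCCC', 'IIIIII')]
```

Real amplicon primers usually contain degenerate bases and reads may carry sequencing errors, so production tools match primers with mismatch tolerance instead of the exact prefix test used here.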
CLINICAL PROGRESS IN INHERITED RETINAL DEGENERATIONS: GENE THERAPY CLINICAL TRIALS AND ADVANCES IN GENETIC SEQUENCING.

PubMed

Hafler, Brian P

2017-03-01

Inherited retinal dystrophies are a significant cause of vision loss and are characterized by the loss of photoreceptors and the retinal pigment epithelium (RPE). Mutations in approximately 250 genes cause inherited retinal degenerations with a high degree of genetic heterogeneity. New techniques in next-generation sequencing are allowing the comprehensive analysis of all retinal disease genes, thus changing the approach to the molecular diagnosis of inherited retinal dystrophies. This review serves to analyze clinical progress in genetic diagnostic testing and implications for retinal gene therapy. A literature search of PubMed and OMIM was conducted to identify relevant articles on inherited retinal dystrophies. Next-generation genetic sequencing allows the simultaneous analysis of all the approximately 250 genes that cause inherited retinal dystrophies. Reported diagnostic rates are high, ranging from 51% to 57%. These new sequencing tools are highly accurate, with sensitivities of 97.9% and specificities of 100%. Retinal gene therapy clinical trials are underway for multiple genes including RPE65, ABCA4, CHM, RS1, MYO7A, CNGA3, CNGB3, ND4, and MERTK for which a molecular diagnosis may be beneficial for patients. Comprehensive next-generation genetic sequencing of all retinal dystrophy genes is changing the paradigm for how retinal specialists perform genetic testing for inherited retinal degenerations. Not only are high diagnostic yields obtained, but mutations in genes with novel clinical phenotypes are also identified. In the era of retinal gene therapy clinical trials, identifying specific genetic defects will increasingly be of use to identify patients who may enroll in clinical studies and benefit from novel therapies.
Pervasive sequence patents cover the entire human genome.

PubMed

Rosenfeld, Jeffrey A; Mason, Christopher E

2013-01-01

The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays.
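The idea of counting how many other genes share at least one 15mer with a given gene can be sketched as follows. This is a simplified illustration, not the analysis performed in the paper: it considers only the forward strand, uses exact matching, and runs on made-up toy sequences; the function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=15):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genes_sharing_kmers(genes, k=15):
    """Map each gene name to the set of other genes sharing >= 1 k-mer."""
    index = defaultdict(set)  # k-mer -> names of genes containing it
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    shared = {name: set() for name in genes}
    for names in index.values():
        for name in names:
            shared[name] |= names - {name}
    return shared


# Toy sequences (not real genes); gene_a and gene_b share one 15mer.
common = "ATGGCCATTGTAATG"  # exactly 15 nucleotides
genes = {
    "gene_a": "CCCCC" + common + "GGGGG",
    "gene_b": "TTTTT" + common + "AAAAA",
    "gene_c": "AC" * 11,
}
shared = genes_sharing_kmers(genes)
print({name: sorted(others) for name, others in shared.items()})
# {'gene_a': ['gene_b'], 'gene_b': ['gene_a'], 'gene_c': []}
```

Averaging the sizes of these sets over all genes gives the kind of statistic quoted in the abstract (the average gene matching 364 other genes), although a genome-scale analysis would need a far more memory-efficient index.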
Transcriptomic analysis of rice aleurone cells identified a novel abscisic acid response element.

PubMed

Watanabe, Kenneth A; Homayouni, Arielle; Gu, Lingkun; Huang, Kuan-Ying; Ho, Tuan-Hua David; Shen, Qingxi J

2017-09-01

Seeds serve as a great model to study plant responses to drought stress, which is largely mediated by abscisic acid (ABA). The ABA responsive element (ABRE) is a key cis-regulatory element in ABA signalling. However, its consensus sequence (ACGTG(G/T)C) is present in the promoters of only about 40% of ABA-induced genes in rice aleurone cells, suggesting other ABREs may exist. To identify novel ABREs, RNA sequencing was performed on aleurone cells of rice seeds treated with 20 μM ABA. Gibbs sampling was used to identify enriched elements, and particle bombardment-mediated transient expression studies were performed to verify the function. Gene ontology analysis was performed to predict the roles of genes containing the novel ABREs. This study revealed 2443 ABA-inducible genes and a novel ABRE, designated as ABREN, which was experimentally verified to mediate ABA signalling in rice aleurone cells. Many of the ABREN-containing genes are predicted to be involved in stress responses and transcription. Analysis of other species suggests that the ABREN may be monocot specific. This study also revealed interesting expression patterns of genes involved in ABA metabolism and signalling. Collectively, this study advanced our understanding of diverse cis-regulatory sequences and the transcriptomes underlying ABA responses in rice aleurone cells. © 2017 John Wiley & Sons Ltd.
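The ABRE consensus quoted in this abstract, ACGTG(G/T)C, maps directly onto a regular expression, so the fraction of promoters that contain it can be estimated with a simple scan. The sketch below is illustrative only: it searches the forward strand, and the promoter sequences are invented for the example and are not data from the study.

```python
import re

# ACGTG(G/T)C written as a regular expression; [GT] encodes the G/T position.
ABRE = re.compile(r"ACGTG[GT]C")


def abre_positions(promoter):
    """Return 0-based start positions of ABRE matches on the forward strand."""
    return [match.start() for match in ABRE.finditer(promoter.upper())]


def fraction_with_abre(promoters):
    """Fraction of promoter sequences containing at least one ABRE."""
    hits = sum(1 for seq in promoters.values() if abre_positions(seq))
    return hits / len(promoters)


# Hypothetical promoter fragments.
promoters = {
    "p1": "TTACGTGGCAA",  # contains ACGTGGC
    "p2": "GGACGTGTCTT",  # contains ACGTGTC
    "p3": "ACGTGACAAA",   # ACGTGAC does not fit the consensus
    "p4": "TTTTTTTTTT",
}
print(abre_positions(promoters["p1"]))  # [2]
print(fraction_with_abre(promoters))    # 0.5
```

A fuller scan would also search the reverse complement of each promoter, since cis-regulatory elements can occur on either strand.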
An automatic and efficient pipeline for disease gene identification through utilizing family-based sequencing data.

PubMed

Song, Dandan; Li, Ning; Liao, Lejian

2015-01-01

Because they generate enormous amounts of data at lower cost and in less time, whole-exome sequencing technologies provide dramatic opportunities for identifying disease genes implicated in Mendelian disorders. Since upwards of thousands of genomic variants can be sequenced in each exome, it is challenging to filter pathogenic variants in protein coding regions while reducing the number of true variants that are missed. Therefore, an automatic and efficient pipeline for finding disease variants in Mendelian disorders is designed, combining several variant filtering steps to analyze family-based exome sequencing data. Recent studies on the Freeman-Sheldon disease are revisited and show that the proposed method outperforms other existing candidate gene identification methods.
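The abstract does not list the individual filtering steps, so the sketch below shows only one generic family-based filter as an assumption: keep variants carried by every affected relative and by no unaffected relative (a simple dominant-model heuristic). The variant identifiers, sample names and function name are hypothetical and are not taken from the described pipeline.

```python
def candidate_variants(calls, affected, unaffected):
    """Variants present in all affected samples and absent from all unaffected.

    calls: dict mapping sample name -> set of variant identifiers.
    """
    if not affected:
        raise ValueError("at least one affected sample is required")
    shared_by_affected = set.intersection(*(calls[s] for s in affected))
    seen_in_unaffected = set().union(*(calls[s] for s in unaffected))
    return shared_by_affected - seen_in_unaffected


# Hypothetical trio: the child and father are affected, the mother is not.
calls = {
    "child": {"chr1:100A>G", "chr2:200C>T", "chr3:300G>A"},
    "father": {"chr1:100A>G", "chr2:200C>T"},
    "mother": {"chr2:200C>T", "chr4:400T>C"},
}
candidates = candidate_variants(calls, affected=["child", "father"], unaffected=["mother"])
print(candidates)  # {'chr1:100A>G'}
```

In practice such a segregation filter would be combined with others, for example population allele frequency and predicted functional impact, and a recessive or de novo model would require different set logic.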
Expression profiling and cross-species RNA interference (RNAi) of desiccation-induced transcripts in the anhydrobiotic nematode Aphelenchus avenae

PubMed Central

2010-01-01

Background Some organisms can survive extreme desiccation by entering a state of suspended animation known as anhydrobiosis. The free-living mycophagous nematode Aphelenchus avenae can be induced to enter anhydrobiosis by pre-exposure to moderate reductions in relative humidity (RH) prior to extreme desiccation. This preconditioning phase is thought to allow modification of the transcriptome by activation of genes required for desiccation tolerance. Results To identify such genes, a panel of expressed sequence tags (ESTs) enriched for sequences upregulated in A. avenae during preconditioning was created. A subset of 30 genes with significant matches in databases, together with a number of apparently novel sequences, were chosen for further study. Several of the recognisable genes are associated with water stress, encoding, for example, two new hydrophilic proteins related to the late embryogenesis abundant (LEA) protein family. Expression studies confirmed EST panel members to be upregulated by evaporative water loss, and the majority of genes was also induced by osmotic stress and cold, but rather fewer by heat. We attempted to use RNA interference (RNAi) to demonstrate the importance of this gene set for anhydrobiosis, but found A. avenae to be recalcitrant with the techniques used. Instead, therefore, we developed a cross-species RNAi procedure using A. avenae sequences in another anhydrobiotic nematode, Panagrolaimus superbus, which is amenable to gene silencing. Of 20 A. avenae ESTs screened, a significant reduction in survival of desiccation in treated P. superbus populations was observed with two sequences, one of which was novel, while the other encoded a glutathione peroxidase. To confirm a role for glutathione peroxidases in anhydrobiosis, RNAi with cognate sequences from P. superbus was performed and was also shown to reduce desiccation tolerance in this species. 
Conclusions This study has identified and characterised the expression profiles of members of the anhydrobiotic gene set in A. avenae. It also demonstrates the potential of RNAi for the analysis of anhydrobiosis and provides the first genetic data to underline the importance of effective antioxidant systems in metazoan desiccation tolerance. PMID:20085654
Nucleotide sequence analysis of the recA gene and discrimination of the three isolates of urease-positive thermophilic Campylobacter (UPTC) isolated from seagulls (Larus spp.) in Northern Ireland.

PubMed

Matsuda, M; Tai, K; Moore, J E; Millar, B C; Murayama, O

2004-01-01

Nucleotide sequencing after TA cloning of the amplicon of the almost-full length recA gene from three strains of UPTC (A1, A2, and A3) isolated from seagulls in Northern Ireland, the phenotypical and genotypical characteristics of which have been demonstrated to be indistinguishable, clarified nucleotide differences at three nucleotide positions among the three strains. In conclusion, the nucleotide sequences of the recA gene were found to discriminate among the three strains of UPTC, A1, A2, and A3, which are indistinguishable phenotypically and genotypically. Thus, the present study strongly suggests that nucleotide sequence data of the amplicon of a suitable gene or region could aid in discriminating among isolates of the UPTC group, which are indistinguishable phenotypically and genotypically. Copyright 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Cloning and sequence analysis of sucrose phosphate synthase gene from varieties of Pennisetum species.

PubMed

Li, H C; Lu, H B; Yang, F Y; Liu, S J; Bai, C J; Zhang, Y W

2015-03-31

Sucrose phosphate synthase (SPS) is an enzyme used by higher plants for sucrose synthesis. In this study, three primer sets were designed on the basis of known SPS sequences from maize (GenBank: NM_001112224.1) and sugarcane (GenBank: JN584485.1), and five novel SPS genes were identified by RT-PCR from the genomes of Pennisetum spp (the hybrid P. americanum x P. purpureum, P. purpureum Schum., P. purpureum Schum. cv. Red, P. purpureum Schum. cv. Taiwan, and P. purpureum Schum. cv. Mott). The cloned sequences showed 99.9% identity and 80-88% similarity to the SPS sequences of other plants. The SPS gene of hybrid Pennisetum had one nucleotide and four amino acid polymorphisms compared to the other four germplasms, and cluster analysis was performed to assess genetic diversity in this species. Additional characterization of the SPS gene product can potentially allow Pennisetum to be exploited as a biofuel source.
Genome sequence of the model medicinal mushroom Ganoderma lucidum

PubMed Central

Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao

2012-01-01

Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi. PMID:22735441
Molecular Characterization of Staphylococcus aureus Isolates Transmitted between Patients with Buruli Ulcer.

PubMed

Amissah, Nana Ama; Chlebowicz, Monika A; Ablordey, Anthony; Sabat, Artur J; Tetteh, Caitlin S; Prah, Isaac; van der Werf, Tjip S; Friedrich, Alex W; van Dijl, Jan Maarten; Rossen, John W; Stienstra, Ymkje

2015-01-01

Buruli ulcer (BU) is a skin infection caused by Mycobacterium ulcerans. The wounds of most BU patients are colonized with different microorganisms, including Staphylococcus aureus. This study investigated possible patient-to-patient transmission events of S. aureus during wound care in a health care center. S. aureus isolates from different BU patients with overlapping visits to the clinic were whole-genome sequenced and analyzed by a gene-by-gene approach using SeqSphere(+) software. In addition, sequence data were screened for the presence of genes that conferred antibiotic resistance. SeqSphere(+) analysis of whole-genome sequence data confirmed transmission of methicillin resistant S. aureus (MRSA) and methicillin susceptible S. aureus among patients that took place during wound care. Interestingly, our sequence data show that the investigated MRSA isolates carry a novel allele of the fexB gene conferring chloramphenicol resistance, which had thus far not been observed in S. aureus.
Nucleotide sequences of the tet(M) genes from the American and Dutch type tetracycline resistance plasmids of Neisseria gonorrhoeae.

PubMed

Gascoyne-Binzi, D M; Heritage, J; Hawkey, P M

1993-11-01

High-level tetracycline-resistant Neisseria gonorrhoeae (TRNG) has been associated with the presence of a plasmid approximately 25.2 MDa in size which carries a Tet M tetracycline resistance determinant. Two different plasmid types, American and Dutch, have previously been described, based on the restriction endonuclease digestion pattern. In this study, the tet(M) genes from the two plasmid types have been amplified by the polymerase chain reaction (PCR) and then sequenced. The gene sequences from the two plasmids shared 96.8% identity, and showed similarities with different segments of the tet(M) gene sequences from Tn1545, Tn916 and Ureaplasma urealyticum. The data suggest that it is highly likely that the Tet M determinant found in the American type plasmid has a different origin from that present in the Dutch plasmid.
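A figure such as the 96.8% identity reported here is computed from a pairwise alignment. The sketch below shows one common convention, assumed here because the abstract does not say which was used: identical columns divided by the number of columns in which neither sequence has a gap. The aligned sequences are toy data, not the tet(M) genes.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        matches += base_a == base_b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 9 ungapped columns, 8 of them identical.
identity = percent_identity("ACGTACGT-A", "ACGTTCGTGA")
print(round(identity, 1))  # 88.9
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives slightly different percentages for the same alignment.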
Clinical evaluation of panel testing by next-generation sequencing (NGS) for gene mutations in myeloid neoplasms.

PubMed

Au, Chun Hang; Wa, Anna; Ho, Dona N; Chan, Tsun Leung; Ma, Edmond S K

2016-01-22

Genomic techniques in recent years have allowed the identification of many mutated genes important in the pathogenesis of acute myeloid leukemia (AML). Together with cytogenetic aberrations, these gene mutations are powerful prognostic markers in AML and can be used to guide patient management, for example selection of optimal post-remission therapy. The mutated genes also hold promise as therapeutic targets themselves. We evaluated the applicability of a gene panel for the detection of AML mutations in a diagnostic molecular pathology laboratory. Fifty patient samples comprising 46 AML and 4 other myeloid neoplasms were accrued for the study. They consisted of 19 males and 31 females at a median age of 60 years (range: 18-88 years). A total of 54 genes (full coding exons of 15 genes and exonic hotspots of 39 genes) were targeted by 568 amplicons that ranged from 225 to 275 bp. The combined coverage was 141 kb in sequence length. Amplicon libraries were prepared by TruSight myeloid sequencing panel (Illumina, CA) and paired-end sequencing runs were performed on a MiSeq (Illumina) genome sequencer. Sequences obtained were analyzed by in-house bioinformatics pipeline, namely BWA-MEM, Samtools, GATK, Pindel, Ensembl Variant Effect Predictor and a novel algorithm ITDseek. The mean count of sequencing reads obtained per sample was 3.81 million and the mean sequencing depth was over 3000X. Seventy-seven mutations in 24 genes were detected in 37 of 50 samples (74 %). On average, 2 mutations (range 1-5) were detected per positive sample. TP53 gene mutations were found in 3 out of 4 patients with complex and unfavorable cytogenetics. Comparing NGS results with that of conventional molecular testing showed a concordance rate of 95.5 %. After further resolution and application of a novel bioinformatics algorithm ITDseek to aid the detection of FLT3 internal tandem duplication (ITD), the concordance rate was revised to 98.2 %. 
Gene panel testing by NGS approach was applicable for sensitive and accurate detection of actionable AML gene mutations in the clinical laboratory to individualize patient management. A novel algorithm ITDseek was presented that improved the detection of FLT3-ITD of varying length, position and at low allelic burden.
DNA Sequence Analysis of a Complementary DNA for Cold-Regulated Arabidopsis Gene cor15 and Characterization of the COR 15 Polypeptide

PubMed Central

Lin, Chentao; Thomashow, Michael F.

1992-01-01

Previous studies have indicated that changes in gene expression occur in Arabidopsis thaliana L. (Heyn) during cold acclimation and that certain of the cor (cold-regulated) genes encode polypeptides that share the unusual property of remaining soluble upon boiling in aqueous solution. Here, we identify a cDNA clone for a cold-regulated gene encoding one of the “boiling-stable” polypeptides, COR15. DNA sequence analysis indicated that the gene, designated cor15, encodes a 14.7-kilodalton hydrophilic polypeptide having an N-terminal amino acid sequence that closely resembles transit peptides that target proteins to the stromal compartment of chloroplasts. Immunological studies indicated that COR15 is processed in vivo and that the mature polypeptide, COR 15m, is present in the soluble fraction of chloroplasts. Possible functions of COR 15m are discussed. PMID:16668917
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

PubMed

Shen, Jinhui; Cong, Qian; Grishin, Nick V

2015-09-01

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.
The Sequences of 1504 Mutants in the Model Rice Variety Kitaake Facilitate Rapid Functional Genomic Studies

PubMed Central

Pham, Nikki T.; Wei, Tong; Schackwitz, Wendy S.; Lipzen, Anna M.; Duong, Phat Q.; Jones, Kyle C.; Ruan, Deling; Bauer, Diane; Peng, Yi; Schmutz, Jeremy

2017-01-01

The availability of a whole-genome sequenced mutant population and the cataloging of mutations of each line at a single-nucleotide resolution facilitate functional genomic analysis. To this end, we generated and sequenced a fast-neutron-induced mutant population in the model rice cultivar Kitaake (Oryza sativa ssp japonica), which completes its life cycle in 9 weeks. We sequenced 1504 mutant lines at 45-fold coverage and identified 91,513 mutations affecting 32,307 genes, i.e., 58% of all rice genes. We detected an average of 61 mutations per line. Mutation types include single-base substitutions, deletions, insertions, inversions, translocations, and tandem duplications. We observed a high proportion of loss-of-function mutations. We identified an inversion affecting a single gene as the causative mutation for the short-grain phenotype in one mutant line. This result reveals the usefulness of the resource for efficient, cost-effective identification of genes conferring specific phenotypes. To facilitate public access to this genetic resource, we established an open access database called KitBase that provides access to sequence data and seed stocks. This population complements other available mutant collections and gene-editing technologies. This work demonstrates how inexpensive next-generation sequencing can be applied to generate a high-density catalog of mutations. PMID:28576844
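The summary figures in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the abstract; the variable names are illustrative, and the implied total gene count is approximate because the 58% figure is rounded.

```python
# Figures quoted in the abstract above.
mutations = 91_513         # mutations identified across the population
mutant_lines = 1_504       # sequenced mutant lines
genes_affected = 32_307    # genes carrying at least one mutation
fraction_of_genes = 0.58   # reported share of all rice genes (rounded)

# Average number of mutations per line.
mean_per_line = mutations / mutant_lines

# Total gene count implied by the rounded 58% figure (approximate).
implied_total_genes = genes_affected / fraction_of_genes

print(f"mean mutations per line: {mean_per_line:.1f}")
print(f"implied total genes: about {implied_total_genes:,.0f}")
```

The mean rounds to the 61 mutations per line reported in the abstract.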
A phylogenetic comparison of urease-positive thermophilic Campylobacter (UPTC) and urease-negative (UN) C. lari.

PubMed

Hirayama, Junichi; Tazumi, Akihiro; Hayashi, Kyohei; Tasaki, Erina; Kuribayashi, Takashi; Moore, John E; Millar, Beverley C; Matsuda, Motoo

2011-06-01

In the present study, the reliability of full-length gene sequence information for several genes, including 16S rRNA, was examined for the discrimination of the two representative Campylobacter lari taxa, namely urease-negative (UN) C. lari and urease-positive thermophilic Campylobacter (UPTC). As previously described, 16S rRNA gene sequences are not reliable for the molecular discrimination of UN C. lari from UPTC organisms using either the unweighted pair group method using arithmetic means analysis (UPGMA) or the neighbor joining (NJ) method. In addition, three composite full-length gene sequences (ciaB, flaC and vacJ) out of seven gene loci examined were reliable for discrimination employing dendrograms constructed by the UPGMA method. In contrast, none of the dendrograms of the NJ phylogenetic trees constructed from the information for the nine genes was reliable for the discrimination. Three composite full-length gene sequences (ciaB, flaC and vacJ) were reliable for the molecular discrimination between UN C. lari and UPTC organisms employing the UPGMA method, as well as among four thermophilic Campylobacter species. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Literature and patent analysis of the cloning and identification of human functional genes in China.

PubMed

Xia, Yan; Tang, LiSha; Yao, Lei; Wan, Bo; Yang, XianMei; Yu, Long

2012-03-01

The Human Genome Project was launched at the end of the 1980s. Since then, the cloning and identification of functional genes has been a major focus of research across the world. In China too, the potentially profound impact of such studies on the life sciences and on human health was realized, and relevant studies were initiated in the 1990s. To advance China's involvement in the Human Genome Project, in the mid-1990s the Committee of Experts in Biology of the National High Technology Research and Development Program of China (863 Program) proposed the "two 1%" goal. This goal envisaged China contributing 1% of the total sequencing work, and cloning and identifying 1% of the total human functional genes. Over the past 20 years, Chinese scientists have achieved a great deal. It is well known that scientists in China finished the 1% of sequencing work of the Human Genome Project, but there has been no comprehensive report on whether China has finished cloning and identifying 1% of human functional genes. In the present study, the GenBank database at the National Center for Biotechnology Information, the PubMed search tool, and the patent database of the State Intellectual Property Office, China, were used to retrieve entries based on two screening standards: (i) Were the newly cloned and identified genes first reported by Chinese scientists? (ii) Were the Chinese scientists awarded the gene sequence patent? Entries were retrieved from the databases up to the cut-off date of 30 June 2011 and the obtained data were analyzed further. The results showed that 589 new human functional genes were first reported by Chinese scientists and 159 gene sequences were patented (http://gene.fudan.sh.cn/introduction/database/chinagene/chinagene.html). This study systematically summarizes China's contributions to human functional genomics research and answers the question "has China finished cloning and identifying 1% of human functional genes?" in the affirmative.

Whole-exome sequencing identified a variant in EFTUD2 gene in establishing a genetic diagnosis.

PubMed

Rengasamy Venugopalan, S; Farrow, E G; Lypka, M

2017-06-01

Craniofacial anomalies are complex and have overlapping phenotypes. Mandibulofacial Dysostosis and Oculo-Auriculo-Vertebral Spectrum are conditions that share a common craniofacial phenotype and present a challenge in arriving at a diagnosis. In this report, we present the case of a female proband who was given a differential diagnosis of Treacher Collins syndrome or Hemifacial Microsomia without certainty. Prior genetic testing for 22q deletion and FGFR screening was negative. The objective of this study was to demonstrate the critical role of whole-exome sequencing in establishing a genetic diagnosis of the proband. The participants were a 14½-year-old affected female proband and her parents, enrolled in the study as a proband/parent trio. A surgical tissue sample from the proband and parental blood samples were collected and prepared for whole-exome sequencing. An Illumina HiSeq 2500 instrument was used for sequencing (125 nucleotide reads/84X coverage). Analyses of variants were performed using custom-developed software, RUNES and VIKING. Variant analyses following whole-exome sequencing identified a heterozygous de novo pathogenic variant, c.259C>T (p.Gln87*), in the EFTUD2 (NM_004247.3) gene in the proband. Previous studies have reported that variants in the EFTUD2 gene are associated with Mandibulofacial Dysostosis with Microcephaly. Patients with facial asymmetry, micrognathia, choanal atresia and microcephaly should be analyzed for variants in the EFTUD2 gene. Next-generation sequencing techniques, such as whole-exome sequencing, offer great promise to improve the understanding of etiologies of sporadic genetic diseases. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Genome sequence and comparative analysis of a putative entomopathogenic Serratia isolated from Caenorhabditis briggsae.

PubMed

Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley

2015-07-18

Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabditis with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations were results of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 MB chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 KB plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in the virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are enriched in putative functions that are biologically relevant to an entomopathogenic lifestyle, including non-ribosomal peptide synthetases, bacteriocins, fimbrial biogenesis, ushering proteins, toxins, secondary metabolite secretion and multiple drug resistance/efflux systems. By revealing the early stages of adaptation to this lifestyle, the Serratia sp. SCBI genome underscores the fact that in EPN formation the composite end result (killing, bioconversion, cadaver protection and recolonization) can be achieved by dissimilar mechanisms. This genome sequence will enable further study of the evolution of entomopathogenic nematode-bacteria complexes.
The mitochondrial genomes of the human hookworms, Ancylostoma duodenale and Necator americanus (Nematoda: Secernentea).

PubMed

Hu, Min; Chilton, Neil B; Gasser, Robin B

2002-02-01

The complete mitochondrial genome sequences were determined for two species of human hookworms, Ancylostoma duodenale (13,721 bp) and Necator americanus (13,604 bp). The circular hookworm genomes are amongst the smallest reported to date for any metazoan organism. Their relatively small size relates mainly to a reduced length in the AT-rich region. Both hookworm genomes encode 12 protein, two ribosomal RNA and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with three other species of Secernentea studied to date. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. For both hookworm species, genes were arranged in the same order as for Caenorhabditis elegans, except for the presence of a non-coding region between genes nad3 and nad5. In A. duodenale, this non-coding region is predicted to form a stem-and-loop structure which is not present in N. americanus. The mitochondrial genome structure for both hookworms differs from Ascaris suum only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, including four gene or gene-block translocations and the positions of some transfer RNA genes and the AT-rich region. Based on genome organisation and amino acid sequence identity, A. duodenale and N. americanus were more closely related to C. elegans than to A. suum or O. volvulus (all secernentean nematodes), consistent with a previous phylogenetic study using ribosomal DNA sequence data. Determination of the complete mitochondrial genome sequences for two human hookworms (the first members of the order Strongylida ever sequenced) provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

PubMed Central

2010-01-01

Background Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Gene Composer: database software for protein construct design, codon engineering, and gene synthesis

PubMed Central

Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

2009-01-01

Background To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. Results An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
Conclusion We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies. PMID:19383142
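The protein-to-gene step described in the abstract above (back-translation against a codon usage table with a minimal usage threshold) can be illustrated with a minimal sketch. This is not Gene Composer's algorithm: the codon table values, the function names and the "pick the most frequent allowed codon" rule are all assumptions made for illustration.

```python
# Hypothetical codon usage table: fraction of use per codon for each residue.
# Values are illustrative only and are not taken from Gene Composer or from
# any particular expression host.
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}


def back_translate(protein, usage=CODON_USAGE, min_fraction=0.10):
    """Back-translate a protein, choosing for each residue the most frequent
    codon whose usage fraction is at or above min_fraction."""
    codons = []
    for residue in protein:
        allowed = {c: f for c, f in usage[residue].items() if f >= min_fraction}
        if not allowed:
            raise ValueError(f"no codon for {residue!r} passes the threshold")
        codons.append(max(allowed, key=allowed.get))
    return "".join(codons)


def gc_percent(dna):
    """Percentage of G and C bases in a DNA string."""
    return 100.0 * sum(base in "GC" for base in dna) / len(dna)


gene = back_translate("MKL*")
print(gene, gc_percent(gene))
```

A tool that also targets a defined G:C content, as the abstract describes, would need to choose among the allowed synonymous codons rather than always taking the most frequent one; that refinement is left out here.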
Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

PubMed

Lorimer, Don; Raymond, Amy; Walchli, John; Mixon, Mark; Barrow, Adrienne; Wallace, Ellen; Grice, Rena; Burgin, Alex; Stewart, Lance

2009-04-21

To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering. An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies. 
We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.
Mitochondrial genome of the African lion Panthera leo leo.

PubMed

Ma, Yue-ping; Wang, Shuo

2015-01-01

In this study, the complete mitochondrial genome sequence of the African lion P. leo leo was reported. The total length of the mitogenome was 17,054 bp. It contained the typical mitochondrial structure, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region; 21 of the tRNA genes folded into typical cloverleaf secondary structure except for tRNASe. The overall composition of the mitogenome was A (32.0%), G (14.5%), C (26.5%) and T (27.0%). The new sequence will provide molecular genetic information for conservation genetics study of this important large carnivore.
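Base composition figures like those in the abstract above are simple per-base percentages. The sketch below shows one way to compute them; the toy sequence and the function name are assumptions for illustration, not the authors' data or method.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence,
    ignoring any other symbols (for example N)."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence, not real mitogenome data.
print(base_composition("AATTGCATAT"))

# The composition reported in the abstract sums to 100% and gives the A+T share.
reported = {"A": 32.0, "G": 14.5, "C": 26.5, "T": 27.0}
print(sum(reported.values()), reported["A"] + reported["T"])
```

With the reported values, A+T comes to 59.0%.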
Limnobacter spp. as newly detected phenol-degraders among Baltic Sea surface water bacteria characterised by comparative analysis of catabolic genes.

PubMed

Vedler, Eve; Heinaru, Eeva; Jutkina, Jekaterina; Viggor, Signe; Koressaar, Triinu; Remm, Maido; Heinaru, Ain

2013-12-01

A set of phenol-degrading strains of a collection of bacteria isolated from Baltic Sea surface water was screened for the presence of two key catabolic genes coding for phenol hydroxylases and catechol 2,3-dioxygenases. The multicomponent phenol hydroxylase (LmPH) gene was detected in 70 out of 92 strains studied, and 41 strains among these LmPH(+) phenol-degraders were found to exhibit catechol 2,3-dioxygenase (C23O) activity. Comparative phylogenetic analyses of LmPH and C23O sequences from 56 representative strains were performed. The studied strains were mostly affiliated to the genera Pseudomonas and Acinetobacter. However, the study also widened the range of phenol-degraders by including the genus Limnobacter. Furthermore, using a next generation sequencing approach, the LmPH genes of Limnobacter strains were found to be the most prevalent ones in the microbial community of the Baltic Sea surface water. Four different Limnobacter strains having almost identical 16S rRNA gene sequences (99%) and similar physiological properties formed separate phylogenetic clusters of LmPH and C23O genes in the respective phylogenetic trees. Copyright © 2013 Elsevier GmbH. All rights reserved.
Characterization and mapping of the human rhodopsin kinase gene and screening of the gene for mutations in patients with retinitis pigmentosa

DOE Office of Scientific and Technical Information (OSTI.GOV)

Khani, S.C.; Lin, D.; Magovcevic, I.

1994-09-01

Rhodopsin kinase (RK) is a cytosolic enzyme in rod photoreceptors that initiates the deactivation of the phototransduction cascade by phosphorylating photoactivated rhodopsin. Although the cDNA sequence of bovine RK has been determined previously, no human cDNA or genomic sequence has thus far been available for genetic studies. In order to investigate the possible role of this candidate gene in retinitis pigmentosa (RP) and allied diseases, we have isolated and characterized human cDNA and genomic clones derived from the RK locus. The coding sequence of the human gene is 1692 nucleotides in length and is split into seven exons. The human and the bovine sequences show 84% identity at the nucleotide level and 92% identity at the amino acid level. Thus far, the intronic sequences flanking each exon except for one have been determined. We have also mapped the human RK gene to chromosome 13q34 using fluorescence in situ hybridization. To our knowledge, no RP gene has as yet been linked to this region. However, since the substrate for RK (rhodopsin) and other members of the phototransduction cascade have been implicated in the pathogenesis of RP, it is conceivable that defects in RK can also cause some forms of this disease. We are evaluating this possibility by screening DNA from 173 patients with autosomal recessive RP and 190 patients with autosomal dominant RP. So far, we have found 11 patients with variant bands. In one patient with autosomal dominant RP we discovered the missense change Ser536Leu. Cosegregation studies and further sequencing of the variant bands are currently underway.
Whole-Exome Sequencing to Identify Novel Biological Pathways Associated With Infertility After Pelvic Inflammatory Disease.

PubMed

Taylor, Brandie D; Zheng, Xiaojing; Darville, Toni; Zhong, Wujuan; Konganti, Kranti; Abiodun-Ojo, Olayinka; Ness, Roberta B; O'Connell, Catherine M; Haggerty, Catherine L

2017-01-01

Ideal management of sexually transmitted infections (STI) may require risk markers for pathology or vaccine development. Previously, we identified common genetic variants associated with chlamydial pelvic inflammatory disease (PID) and reduced fecundity. As this explains only a proportion of the long-term morbidity risk, we used whole-exome sequencing to identify biological pathways that may be associated with STI-related infertility. We obtained stored DNA from 43 non-Hispanic black women with PID from the PID Evaluation and Clinical Health Study. Infertility was assessed at a mean of 84 months. Principal component analysis revealed no population stratification. Potential covariates did not significantly differ between groups. Sequencing kernel association test was used to examine associations between aggregates of variants on a single gene and infertility. The results from the sequencing kernel association test were used to choose "focus genes" (P < 0.01; n = 150) for subsequent Ingenuity Pathway Analysis to identify "gene sets" that are enriched in biologically relevant pathways. Pathway analysis revealed that focus genes were enriched in canonical pathways including, IL-1 signaling, P2Y purinergic receptor signaling, and bone morphogenic protein signaling. Focus genes were enriched in pathways that impact innate and adaptive immunity, protein kinase A activity, cellular growth, and DNA repair. These may alter host resistance or immunopathology after infection. Targeted sequencing of biological pathways identified in this study may provide insight into STI-related infertility.
Sequence determination and analysis of the NSs genes of two tospoviruses.

PubMed

Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

2012-03-01

The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
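Pairwise identity values like those in the abstract above are computed from an alignment of the two sequences. The sketch below assumes the sequences are already aligned to equal length with "-" for gaps and counts identity over columns where neither sequence has a gap; other denominators are also in common use, and the abstract does not say which the authors chose. The toy sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must be pre-aligned strings of equal length, with '-' as the
    gap symbol. The choice of denominator is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned peptides, not real NSs sequences: 4 matches over 6 ungapped columns.
print(round(percent_identity("MKT-LLV", "MKSALIV"), 1))
```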
Spindle Epithelial Tumor with Thymus-Like Differentiation (SETTLE): A Next-Generation Sequencing Study.

PubMed

Stevens, Todd M; Morlote, Diana; Swensen, Jeff; Ellis, Michelle; Harada, Shuko; Spencer, Sharon; Prieto-Granada, Carlos N; Folpe, Andrew L; Gatalica, Zoran

2018-05-07

Spindle epithelial tumor with thymus-like differentiation (SETTLE) is a malignant biphasic neoplasm of the thyroid or neck with propensity for late metastasis. Unlike synovial sarcoma, its main morphologic mimic, SETTLE lacks synovial sarcoma-associated translocations. A single case of SETTLE has shown a KRAS mutation but to date no comprehensive next generation sequencing studies of this rare neoplasm have been undertaken. Herein, we subjected 5 well defined cases of SETTLE to direct sequence analysis of 592 genes and fusion gene analysis of 52 genes frequently rearranged in human cancers. We identified one case with two pathogenic variants in the KMT2D gene, one being in an intron splice site (c.674-1A>G) and the other being a frameshift variant (p.M2829fs). This same case also had a pathogenic nonsense variant in the KMT2C gene (p.R1237*). A second case of SETTLE carried a pathogenic NRAS missense variant, Q61R. No other molecular alterations, microsatellite instability, gene fusions or amplifications were identified.
MicroRNAs form triplexes with double stranded DNA at sequence-specific binding sites; a eukaryotic mechanism via which microRNAs could directly alter gene expression

DOE PAGES

Paugh, Steven W.; Coss, David R.; Bao, Ju; ...

2016-02-04

MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA). Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence that microRNAs form triple-helical structures with duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. This work has thus revealed a new mechanism by which microRNAs can interact with gene promoter regions to modify gene transcription.
Molecular Genetics of the Usher Syndrome in Lebanon: Identification of 11 Novel Protein Truncating Mutations by Whole Exome Sequencing

PubMed Central

Reddy, Ramesh; Fahiminiya, Somayyeh; El Zir, Elie; Mansour, Ahmad; Megarbane, Andre; Majewski, Jacek; Slim, Rima

2014-01-01

Background Usher syndrome (USH) is a genetically heterogeneous condition with ten disease-causing genes. The spectrum of genes and mutations causing USH in the Lebanese and Middle Eastern populations has not been described. Consequently, diagnostic approaches designed to screen for previously reported mutations were unlikely to identify the mutations in 11 unrelated families, eight of Lebanese and three of Middle Eastern origins. In addition, six of the ten USH genes consist of more than 20 exons, each, which made mutational analysis by Sanger sequencing of PCR-amplified exons from genomic DNA tedious and costly. The study was aimed at the identification of USH causing genes and mutations in 11 unrelated families with USH type I or II. Methods Whole exome sequencing followed by expanded familial validation by Sanger sequencing. Results We identified disease-causing mutations in all the analyzed patients in four USH genes, MYO7A, USH2A, GPR98 and CDH23. Eleven of the mutations were novel and protein truncating, including a complex rearrangement in GPR98. Conclusion Our data highlight the genetic diversity of Usher syndrome in the Lebanese population and the time and cost-effectiveness of whole exome sequencing approach for mutation analysis of genetically heterogeneous conditions caused by large genes. PMID:25211151
Molecular genetics of the Usher syndrome in Lebanon: identification of 11 novel protein truncating mutations by whole exome sequencing.

PubMed

Reddy, Ramesh; Fahiminiya, Somayyeh; El Zir, Elie; Mansour, Ahmad; Megarbane, Andre; Majewski, Jacek; Slim, Rima

2014-01-01

Usher syndrome (USH) is a genetically heterogeneous condition with ten disease-causing genes. The spectrum of genes and mutations causing USH in the Lebanese and Middle Eastern populations has not been described. Consequently, diagnostic approaches designed to screen for previously reported mutations were unlikely to identify the mutations in 11 unrelated families, eight of Lebanese and three of Middle Eastern origins. In addition, six of the ten USH genes consist of more than 20 exons, each, which made mutational analysis by Sanger sequencing of PCR-amplified exons from genomic DNA tedious and costly. The study was aimed at the identification of USH causing genes and mutations in 11 unrelated families with USH type I or II. Whole exome sequencing followed by expanded familial validation by Sanger sequencing. We identified disease-causing mutations in all the analyzed patients in four USH genes, MYO7A, USH2A, GPR98 and CDH23. Eleven of the mutations were novel and protein truncating, including a complex rearrangement in GPR98. Our data highlight the genetic diversity of Usher syndrome in the Lebanese population and the time and cost-effectiveness of whole exome sequencing approach for mutation analysis of genetically heterogeneous conditions caused by large genes.
Draft genome of the red harvester ant Pogonomyrmex barbatus.

PubMed

Smith, Chris R; Smith, Christopher D; Robertson, Hugh M; Helmkampf, Martin; Zimin, Aleksey; Yandell, Mark; Holt, Carson; Hu, Hao; Abouheif, Ehab; Benton, Richard; Cash, Elizabeth; Croset, Vincent; Currie, Cameron R; Elhaik, Eran; Elsik, Christine G; Favé, Marie-Julie; Fernandes, Vilaiwan; Gibson, Joshua D; Graur, Dan; Gronenberg, Wulfila; Grubbs, Kirk J; Hagen, Darren E; Viniegra, Ana Sofia Ibarraran; Johnson, Brian R; Johnson, Reed M; Khila, Abderrahman; Kim, Jay W; Mathis, Kaitlyn A; Munoz-Torres, Monica C; Murphy, Marguerite C; Mustard, Julie A; Nakamura, Rin; Niehuis, Oliver; Nigam, Surabhi; Overson, Rick P; Placek, Jennifer E; Rajakumar, Rajendhran; Reese, Justin T; Suen, Garret; Tao, Shu; Torres, Candice W; Tsutsui, Neil D; Viljakainen, Lumi; Wolschin, Florian; Gadau, Jürgen

2011-04-05

We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
MicroRNAs form triplexes with double stranded DNA at sequence-specific binding sites; a eukaryotic mechanism via which microRNAs could directly alter gene expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paugh, Steven W.; Coss, David R.; Bao, Ju

MicroRNAs are important regulators of gene expression, acting primarily by binding to sequence-specific locations on already transcribed messenger RNAs (mRNA). Recent studies indicate that microRNAs may also play a role in up-regulating mRNA transcription levels, although a definitive mechanism has not been established. Double-helical DNA is capable of forming triple-helical structures through Hoogsteen and reverse Hoogsteen interactions in the major groove of the duplex, and we show physical evidence that microRNAs form triple-helical structures with duplex DNA, and identify microRNA sequences that favor triplex formation. We developed an algorithm (Trident) to search genome-wide for potential triplex-forming sites and show that several mammalian and non-mammalian genomes are enriched for strong microRNA triplex binding sites. We show that those genes containing sequences favoring microRNA triplex formation are markedly enriched (3.3 fold, p < 2.2 × 10^-16) for genes whose expression is positively correlated with expression of microRNAs targeting triplex binding sequences. As a result, this work has thus revealed a new mechanism by which microRNAs can interact with gene promoter regions to modify gene transcription.
A comparison of complete mitochondrial genomes of silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis: Implications for their taxonomic relationship and phylogeny

USGS Publications Warehouse

Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chen, Q.; Chapman, D.C.; Lu, G.

2009-01-01

Based upon morphological characters, silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis (or Aristichthys nobilis) have been classified into either the same genus or two distinct genera. Consequently, the taxonomic relationship of the two species at the generic level remains equivocal. This issue is addressed by sequencing complete mitochondrial genomes of H. molitrix and H. nobilis, comparing their mitogenome organization, structure and sequence similarity, and conducting a comprehensive phylogenetic analysis of cyprinid species. As with other cyprinid fishes, the mitogenomes of the two species were structurally conserved, containing 37 genes including 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA (tRNA) genes and a putative control region (D-loop). Sequence similarity between the two mitogenomes varied in different genes or regions, being highest in the tRNA genes (98.8%), lowest in the control region (89.4%) and intermediate in the protein-coding genes (94.2%). Analyses of the sequence comparison and phylogeny using concatenated protein sequences support the view that the two species belong to the genus Hypophthalmichthys. Further studies using nuclear markers and involving more closely related species, and the systematic combination of traditional biology and molecular biology are needed in order to confirm this conclusion. © 2009 The Fisheries Society of the British Isles.
Environmental genomics of "Haloquadratum walsbyi" in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species

PubMed Central

Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane

2006-01-01

Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allow comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. In particular, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057
The complete mitochondrial genome sequence of the Datong yak (Bos grunniens).

PubMed

Wu, Xiaoyun; Chu, Min; Liang, Chunnian; Ding, Xuezhi; Guo, Xian; Bao, Pengjia; Yan, Ping

2016-01-01

Datong yak is a famous artificially cultivated breed in China. In the present work, we report the complete mitochondrial genome sequence of Datong yak for the first time. The mitogenome is 16,323 bp long, containing 13 protein-coding genes, 22 tRNA genes, two rRNA genes and one non-coding region (D-loop region). The gene order of the Datong yak mitogenome is identical to that observed in most other vertebrates. The overall base composition is 33.71% A, 25.80% C, 13.21% G and 27.27% T, with an A + T content of 60.98%. The complete mitogenome sequence information of Datong yak can provide useful data for further studies on molecular breeding and taxonomic status.

Characterization of the complete mitochondrial genome sequence of Gannan yak (Bos grunniens).

PubMed

Wu, Xiaoyun; Ding, Xuezhi; Chu, Min; Guo, Xian; Bao, Pengjia; Liang, Chunnian; Yan, Ping

2016-01-01

Gannan yak is the native breed of Gansu province in China. In this work, the complete mitochondrial genome sequence of Gannan yak was determined for the first time. The mitogenome is 16,322 bp long, with a base composition of 33.74% A, 25.84% T, 13.18% C, and 27.24% G. It contains 13 protein-coding genes, 22 tRNA genes, two rRNA genes and one non-coding region (D-loop region). The gene order of the Gannan yak mitogenome is identical to that observed in most other vertebrates. The complete mitogenome sequence information of Gannan yak can provide useful data for further studies on protection of genetic resources and phylogenetic relationships within Bos grunniens.
The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum).

PubMed

Zeng, Fan-chun; Gao, Cheng-wen; Gao, Li-zhi

2016-01-01

The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum) is reported and characterized in this study. The genome size is 156,612 bp, containing a pair of inverted repeats (IRs) of 25,776 bp separated by a large single-copy region of 87,213 bp and a small single-copy region of 17,851 bp. The chloroplast genome harbors 130 known genes, including 89 protein-coding genes, 8 ribosomal RNA genes, and 37 tRNA genes. A total of 18 of these genes are duplicated in the inverted repeat regions, 16 genes contain 1 intron, and 2 genes and one ycf have 2 introns.
LINE-1 retrotransposons: from 'parasite' sequences to functional elements.

PubMed

Paço, Ana; Adega, Filomena; Chaves, Raquel

2015-02-01

Long interspersed nuclear elements-1 (LINE-1) are the most abundant and active retrotransposons in mammalian genomes. Traditionally, the occurrence of LINE-1 sequences in the genome of mammals has been explained by the selfish DNA hypothesis. Nevertheless, it has recently also been argued that these sequences could play important roles in these genomes, such as in the regulation of gene expression, genome modelling and X-chromosome inactivation. The non-random chromosomal distribution is a striking feature of these retroelements that somehow reflects their functionality. In the present study, we have isolated and analysed a fraction of the open reading frame 2 (ORF2) LINE-1 sequence from three rodent species, Cricetus cricetus, Peromyscus eremicus and Praomys tullbergi. Physical mapping of the isolated sequences revealed an interspersed longitudinal AT pattern of distribution along all the chromosomes of the complement in the three genomes. A detailed analysis shows that these sequences are preferentially located in the euchromatic regions, although some signals could be detected in the heterochromatin. In addition, a coincidence between the location of imprinted gene regions (such as the Xist and Tsix gene regions) and the LINE-1 retroelements was also observed. According to these results, we propose an involvement of LINE-1 sequences in different genomic events such as gene imprinting, X-chromosome inactivation and evolution of repetitive sequences located at the heterochromatic regions (e.g. satellite DNA sequences) of the rodents' genomes analysed.
Analysis of expressed sequence tags from a NaHCO3-treated alkali-tolerant plant, Chloris virgata.

PubMed

Nishiuchi, Shunsaku; Fujihara, Kazumasa; Liu, Shenkui; Takano, Tetsuo

2010-04-01

Chloris virgata Swartz (C. virgata) is a gramineous wild plant that can survive in saline-alkali areas in northeast China. To examine the tolerance mechanisms of C. virgata, we constructed a cDNA library from whole plants of C. virgata that had been treated with 100 mM NaHCO3 for 24 h and sequenced 3168 randomly selected clones. Most (2590) of the expressed sequence tags (ESTs) showed significant similarity to sequences in the NCBI database. Of the 2590 genes, 1893 were unique. Gene Ontology (GO) Slim annotations were obtained for 1081 ESTs by BLAST2GO, and 75 of them were annotated with the GO terms "response to stress", "response to abiotic stimulus", and "response to biotic stimulus", indicating that these genes were likely to function in the tolerance mechanism of C. virgata. In a separate experiment, 24 genes that are known from previous studies to be associated with abiotic stress tolerance were further examined by real-time RT-PCR to see how their expression was affected by NaHCO3 stress. NaHCO3 treatment up-regulated the expression of a pathogenesis-related gene (DC998527), Win1 precursor gene (DC998617), catalase gene (DC999385), ribosome inactivating protein 1 (DC999555), Na+/H+ antiporter gene (DC998043), and two-component regulator gene (DC998236). Copyright 2010 Elsevier Masson SAS. All rights reserved.
The master regulator PhoP coordinates phosphate and nitrogen metabolism, respiration, cell differentiation and antibiotic biosynthesis: comparison in Streptomyces coelicolor and Streptomyces avermitilis.

PubMed

Martín, Juan F; Rodríguez-García, Antonio; Liras, Paloma

2017-05-01

Phosphate limitation is important for production of antibiotics and other secondary metabolites in Streptomyces. Phosphate control is mediated by the two-component system PhoR-PhoP. Following phosphate depletion, PhoP stimulates expression of genes involved in scavenging, transport and mobilization of phosphate, and represses the utilization of nitrogen sources. PhoP reduces expression of genes for aerobic respiration and activates nitrate respiration genes. PhoP activates genes for teichuronic acid formation and reduces expression of genes for phosphate-rich teichoic acid biosynthesis. In Streptomyces coelicolor, PhoP repressed several differentiation and pleiotropic regulatory genes, which affects development and indirectly antibiotic biosynthesis. A new bioinformatics analysis of the putative PhoP-binding sequences in Streptomyces avermitilis was made. Many sequences in the S. avermitilis genome showed high weight values and were classified according to the available genetic information. These genes encode phosphate scavenging proteins, phosphate transporters and nitrogen metabolism proteins. Among the genes highlighted in the new studies were aveR, located in the avermectin gene cluster and encoding a LAL-type regulator, and afsS, which is regulated by PhoP and AfsR. The sequence logo for S. avermitilis PHO boxes is similar to that of S. coelicolor, with differences in the weight value for specific nucleotides in the sequence.
Wide distribution of O157-antigen biosynthesis gene clusters in Escherichia coli.

PubMed

Iguchi, Atsushi; Shirai, Hiroki; Seto, Kazuko; Ooka, Tadasuke; Ogura, Yoshitoshi; Hayashi, Tetsuya; Osawa, Kayo; Osawa, Ro

2011-01-01

Most Escherichia coli O157-serogroup strains are classified as enterohemorrhagic E. coli (EHEC), which is known as an important food-borne pathogen for humans. They usually produce Shiga toxin (Stx) 1 and/or Stx2, and express H7-flagella antigen (or are nonmotile). However, O157 strains that do not produce Stxs and express H antigens different from H7 are sometimes isolated from clinical and other sources. Multilocus sequence analysis revealed that the 21 O157:non-H7 strains tested in this study belong to multiple evolutionary lineages different from that of EHEC O157:H7 strains, suggesting a wide distribution of the gene set encoding the O157-antigen biosynthesis in multiple lineages. To gain insight into the gene organization and the sequence similarity of the O157-antigen biosynthesis gene clusters, we conducted genomic comparisons of the chromosomal regions (about 59 kb in each strain) covering the O-antigen gene cluster and its flanking regions between six O157:H7/non-H7 strains. Gene organization of the O157-antigen gene cluster was identical among O157:H7/non-H7 strains, but was divided into two distinct types at the nucleotide sequence level. Interestingly, distribution of the two types did not clearly follow the evolutionary lineages of the strains, suggesting that horizontal gene transfer of both types of O157-antigen gene clusters has occurred independently among E. coli strains. Additionally, detailed sequence comparison revealed that some positions of the repetitive extragenic palindromic (REP) sequences in the regions flanking the O-antigen gene clusters were coincident with possible recombination points. From these results, we conclude that the horizontal transfer of the O157-antigen gene clusters induced the emergence of multiple O157 lineages within E. coli and speculate that REP sequences may be one of the driving forces for exchange and evolution of O-antigen loci.
Suppressive subtractive hybridization approach revealed differential expression of hypersensitive response and reactive oxygen species production genes in tea (Camellia sinensis (L.) O. Kuntze) leaves during Pestalotiopsis thea infection.

PubMed

Senthilkumar, Palanisamy; Thirugnanasambantham, Krishnaraj; Mandal, Abul Kalam Azad

2012-12-01

Tea (Camellia sinensis (L.) O. Kuntze) is an economically important plant cultivated for its leaves. Infection of leaves by Pestalotiopsis theae causes gray blight disease and enormous losses to the tea industry. We used the suppressive subtractive hybridization (SSH) technique to unravel the differential gene expression pattern during gray blight disease development in tea. Complementary DNA from P. theae-infected and uninfected leaves of the disease-tolerant cultivar UPASI-10 was used as tester and driver populations, respectively. Subtraction efficiency was confirmed by comparing the abundance of the β-actin gene. A total of 377 and 720 clones with insert size >250 bp from the forward and reverse libraries, respectively, were sequenced and analyzed. Basic Local Alignment Search Tool analysis revealed that 17 sequences in the forward SSH library have a high degree of similarity with disease and hypersensitive response related genes and 20 sequences with hypothetical proteins, while in the reverse SSH library, 23 sequences have a high degree of similarity with disease and stress response-related genes and 15 sequences with hypothetical proteins. Functional analysis indicated unknown (61 and 59 %) or hypothetical functions (23 and 18 %) for most of the differentially regulated genes in the forward and reverse SSH libraries, respectively, while others have important roles in different cellular activities. The majority of the upregulated genes are related to hypersensitive response and reactive oxygen species production. Based on these expressed sequence tag data, the putative roles of differentially expressed genes are discussed in relation to disease. We also demonstrated the efficiency of SSH as a tool in enriching gray blight disease related up- and downregulated genes in tea. The present study revealed that many genes related to disease resistance were suppressed during P. theae infection and that enhancing these genes by the application of inducers may impart better disease tolerance to the plants.
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds.

PubMed

Dean, Rebecca; Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Mank, Judith E

2015-10-01

The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
The complete mitochondrial genome of black-footed ferret, Mustela nigripes (Mustela, Mustelinae).

PubMed

Zhao, Ren-Bin; Zhou, Chao-Yang; Lu, Zhi-Xiang; Hu, Peng; Liu, Jian-Qiong; Tan, Wei-Wei; Yang, Tong-Hua

2016-05-01

In this study, the complete mitochondrial genome sequence of the black-footed ferret, Mustela nigripes, is determined for the first time. This mitogenome is 16,556 bp in length and contains 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 control region (D-loop). The overall base composition is A (32.9%), C (26.1%), G (13.8%), and T (27.2%), so the percentage of A and T (60.1%) is higher than that of G and C. Most of the genes are encoded on the H-strand, except for the ND6 subunit gene and six tRNA genes. The complete mitochondrial genome sequence reported here would be useful for further phylogenetic analysis and conservation genetic studies in M. nigripes.
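
The A + T figure quoted above follows directly from the reported base composition. A minimal Python sketch of that arithmetic is given here; the percentages are the ones reported in the abstract, while the helper name `at_content` and the ten-base toy sequence are illustrative assumptions and not part of the study.

```python
from collections import Counter

# Base composition reported in the abstract (percent of the 16,556 bp mitogenome).
reported = {"A": 32.9, "C": 26.1, "G": 13.8, "T": 27.2}

# Round to one decimal place to avoid floating-point noise in the sums.
at_percent = round(reported["A"] + reported["T"], 1)
gc_percent = round(reported["G"] + reported["C"], 1)


def at_content(sequence: str) -> float:
    """Return the A+T percentage of a DNA sequence (hypothetical helper)."""
    counts = Counter(sequence.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (counts["A"] + counts["T"]) / total


# Toy sequence for illustration only; this is not M. nigripes data.
toy_sequence = "ATGCATTAAC"
print(at_percent, gc_percent, at_content(toy_sequence))
```

The same helper could be applied to a full mitogenome sequence read from a FASTA file, counting only unambiguous bases.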
Nuclear counterparts of the cytoplasmic mitochondrial 12S rRNA gene: a problem of ancient DNA and molecular phylogenies.

PubMed

van der Kuyl, A C; Kuiken, C L; Dekker, J T; Perizonius, W R; Goudsmit, J

1995-06-01

Monkey mummy bones and teeth originating from the North Saqqara Baboon Galleries (Egypt), soft tissue from a mummified baboon in a museum collection, and nineteenth/twentieth-century skin fragments from mangabeys were used for DNA extraction and PCR amplification of part of the mitochondrial 12S rRNA gene. Sequences aligning with the 12S rRNA gene were recovered but were only distantly related to contemporary monkey mitochondrial 12S rRNA sequences. However, many of these sequences were identical or closely related to human nuclear DNA sequences resembling mitochondrial 12S rRNA (isolated from a cell line depleted in mitochondria) and therefore have to be considered contamination. Subsequently, in a separate study, we were able to recover genuine mitochondrial 12S rRNA sequences from many extant species of nonhuman Old World primates and sequences closely resembling the human nuclear integrations. Analysis of all sequences by the neighbor-joining (NJ) method indicated that mitochondrial DNA sequences and their nuclear counterparts can be divided into two distinct clusters. One cluster contained all contemporary cytoplasmic mitochondrial DNA sequences and approximately half of the monkey nuclear mitochondrial-like sequences. A second cluster contained most human nuclear sequences and the other half of monkey nuclear sequences with a separate branch leading to human and gorilla mitochondrial and nuclear sequences. Sequences recovered from ancient materials were equally divided between the two clusters. These results constitute a warning when working with ancient DNA or performing phylogenetic analysis using mitochondrial DNA as a target sequence: Nuclear counterparts of mitochondrial genes may lead to faulty interpretation of results.
Mutation Analysis in Classical Phenylketonuria Patients Followed by Detecting Haplotypes Linked to Some PAH Mutations.

PubMed

Dehghanian, Fatemeh; Silawi, Mohammad; Tabei, Seyed M B

2017-02-01

Deficiency of phenylalanine hydroxylase (PAH) enzyme and elevation of phenylalanine in body fluids cause phenylketonuria (PKU). The gold standard for confirming PKU and PAH deficiency is detecting causal mutations by direct sequencing of the coding exons and splicing-involved sequences of the PAH gene. Furthermore, haplotype analysis could be considered as an auxiliary approach for detecting PKU causative mutations before direct sequencing of the PAH gene by making comparisons between previously detected mutation-linked haplotypes and new PKU case haplotypes with undetermined mutations. Thirteen unrelated classical PKU patients took part in this study to detect causative mutations. Mutations were identified by polymerase chain reaction (PCR) and direct sequencing in all patients. After that, haplotype analysis was performed by studying VNTR and PAHSTR markers (linked genetic markers of the PAH gene) through application of PCR and capillary electrophoresis (CE). Mutation analysis was performed successfully and the detected mutations were as follows: c.782G>A, c.754C>T, c.842C>G, c.113-115delTCT, c.688G>A, and c.696A>G. Additionally, PAHSTR/VNTR haplotypes were detected to discover haplotypes linked to each mutation. Mutation detection is the best approach for confirming PAH enzyme deficiency in PKU patients. Due to the relatively large size of the PAH gene and the high cost of direct sequencing in developing countries, haplotype analysis could be used before DNA sequencing and mutation detection as a faster and cheaper way of identifying probable mutated exons.
Quantitative analysis of a deeply sequenced marine microbial metatranscriptome.

PubMed

Gifford, Scott M; Sharma, Shalabh; Rinta-Kanto, Johanna M; Moran, Mary Ann

2011-03-01

The potential of metatranscriptomic sequencing to provide insights into the environmental factors that regulate microbial activities depends on how fully the sequence libraries capture community expression (that is, sample-sequencing depth and coverage depth), and the sensitivity with which expression differences between communities can be detected (that is, statistical power for hypothesis testing). In this study, we use an internal standard approach to make absolute (per liter) estimates of transcript numbers, a significant advantage over proportional estimates that can be biased by expression changes in unrelated genes. Coastal waters of the southeastern United States contain 1 × 10^12 bacterioplankton mRNA molecules per liter of seawater (~200 mRNA molecules per bacterial cell). Even for the large bacterioplankton libraries obtained in this study (~500,000 possible protein-encoding sequences in each of two libraries after discarding rRNAs and small RNAs from >1 million 454 FLX pyrosequencing reads), sample-sequencing depth was only 0.00001%. Expression levels of 82 genes diagnostic for transformations in the marine nitrogen, phosphorus and sulfur cycles ranged from below detection (<1 × 10^6 transcripts per liter) for 36 genes (for example, phosphonate metabolism gene phnH, dissimilatory nitrate reductase subunit napA) to >2.7 × 10^9 transcripts per liter (ammonia transporter amt and ammonia monooxygenase subunit amoC). Half of the categories for which expression was detected, however, had too few copy numbers for robust statistical resolution, as would be required for comparative (experimental or time-series) expression studies. By representing whole community gene abundance and expression in absolute units (per volume or mass of environment), 'omics' data can be better leveraged to improve understanding of microbially mediated processes in the ocean.
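
The internal-standard idea described above, scaling read counts by the recovery of a spiked-in standard to reach per-liter units, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that a known number of standard molecules is added to a known sample volume and that standard and native transcripts are recovered with equal efficiency. The function name and all inputs to the first calculation are hypothetical; only the per-liter and per-cell figures in the second calculation are taken from the abstract.

```python
def transcripts_per_liter(gene_reads, standard_reads, standard_added, liters_filtered):
    """Scale read counts to absolute transcript abundance (hypothetical helper).

    Assumes standard and native transcripts are recovered equally well, so
    each read stands for (standard_added / standard_reads) molecules.
    """
    if standard_reads <= 0 or liters_filtered <= 0:
        raise ValueError("standard_reads and liters_filtered must be positive")
    molecules_per_read = standard_added / standard_reads
    return gene_reads * molecules_per_read / liters_filtered


# Hypothetical inputs, for illustration only (not values from the study).
estimate = transcripts_per_liter(
    gene_reads=50, standard_reads=1_000, standard_added=2.0e9, liters_filtered=5.0
)
print(f"{estimate:.1e} transcripts per liter")

# Figures quoted in the abstract imply an approximate bacterial cell density.
mrna_per_liter = 1e12   # reported mRNA molecules per liter
mrna_per_cell = 200     # reported approximate mRNA molecules per cell
cells_per_liter = mrna_per_liter / mrna_per_cell
print(f"{cells_per_liter:.1e} cells per liter")
```

Unlike a proportional estimate, the per-liter value for one gene here does not change when expression of unrelated genes shifts, because it is scaled by the recovery of the standard rather than by the total library size.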
Gene discovery in an invasive tephritid model pest species, the Mediterranean fruit fly, Ceratitis capitata

PubMed Central

Gomulski, Ludvik M; Dimopoulos, George; Xi, Zhiyong; Soares, Marcelo B; Bonaldo, Maria F; Malacrida, Anna R; Gasperi, Giuliano

2008-01-01

Background The medfly, Ceratitis capitata, is a highly invasive agricultural pest that has become a model insect for the development of biological control programs. Despite research into the behavior and classical and population genetics of this organism, the quantity of sequence data available is limited. We have utilized an expressed sequence tag (EST) approach to obtain detailed information on transcriptome signatures that relate to a variety of physiological systems in the medfly; this information emphasizes reproduction, sex determination, and chemosensory perception, since the study was based on normalized cDNA libraries from embryos and adult heads. Results A total of 21,253 high-quality ESTs were obtained from the embryo and head libraries. Clustering analyses performed separately for each library resulted in 5201 embryo and 6684 head transcripts. Considering an estimated 19% overlap in the transcriptomes of the two libraries, they represent about 9614 unique transcripts involved in a wide range of biological processes and molecular functions. Of particular interest are the sequences that share homology with Drosophila genes involved in sex determination, olfaction, and reproductive behavior. The medfly transformer2 (tra2) homolog was identified among the embryonic sequences, and its genomic organization and expression were characterized. Conclusion The sequences obtained in this study represent the first major dataset of expressed genes in a tephritid species of agricultural importance. This resource provides essential information to support the investigation of numerous questions regarding the biology of the medfly and other related species and also constitutes an invaluable tool for the annotation of complete genome sequences.
Our study has revealed intriguing findings regarding the transcript regulation of tra2 and other sex determination genes, as well as insights into the comparative genomics of genes implicated in chemosensory reception and reproduction. PMID:18500975
Targeted next generation sequencing of the entire vitamin D receptor gene reveals polymorphisms correlated with vitamin D deficiency among older Filipino women with and without fragility fracture.

PubMed

Zumaraga, Mark Pretzel; Medina, Paul Julius; Recto, Juan Miguel; Abrahan, Lauro; Azurin, Edelyn; Tanchoco, Celeste C; Jimeno, Cecilia A; Palmes-Saloma, Cynthia

2017-03-01

This study aimed to discover genetic variants in the entire 101-kb vitamin D receptor (VDR) gene for vitamin D deficiency in a group of postmenopausal Filipino women using a targeted next generation sequencing (TNGS) approach in a case-control study design. A total of 50 women with and without osteoporotic fracture seen at the Philippine Orthopedic Center were included. Blood samples were collected for determination of serum vitamin D, calcium, phosphorus, glucose, blood urea nitrogen, creatinine, aspartate aminotransferase, alanine aminotransferase and as the primary source for targeted VDR gene sequencing using the Ion Torrent Personal Genome Machine. The variant calling was based on the GATK best practice workflow and annotated using the ANNOVAR tool. A total of 1496 unique variants in the whole 101-kb VDR gene were identified. Novel sequence variations not registered in the dbSNP database were found among cases and controls at a rate of 23.1% and 16.6% of total discovered variants, respectively. One disease-associated enhancer showed a statistically significant association with low serum 25-hydroxy vitamin D levels (Pearson chi-square P-value=0.009). The transcription factor binding site prediction program PROMO predicted the disruption of three transcription factor binding sites in this enhancer region. These findings show the power of TNGS in identifying sequence variations in a very large gene, and the surprising results obtained in this study greatly expand the catalog of known VDR sequence variants that may represent an important clue in the emergence of vitamin D deficiency. Such information will also provide the additional guidance necessary toward personalized nutritional advice to reach sufficient vitamin D status. Copyright © 2016 Elsevier Inc. All rights reserved.
Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

PubMed

Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

2015-03-27

Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. Copyright © 2015, American Association for the Advancement of Science.
Gene Structures, Evolution and Transcriptional Profiling of the WRKY Gene Family in Castor Bean (Ricinus communis L.).

PubMed

Zou, Zhi; Yang, Lifu; Wang, Danhua; Huang, Qixing; Mo, Yeyong; Xie, Guishui

2016-01-01

WRKY proteins comprise one of the largest transcription factor families in plants and form key regulators of many plant processes. This study presents the characterization of 58 WRKY genes from the castor bean (Ricinus communis L., Euphorbiaceae) genome. Compared with the automatic genome annotation, one more WRKY-encoding locus was identified and 20 out of the 57 predicted gene models were manually corrected. All RcWRKY genes were shown to contain at least one intron in their coding sequences. According to the structural features of the present WRKY domains, the identified RcWRKY genes were assigned to three previously defined groups (I-III). Although castor bean underwent no recent whole-genome duplication event like physic nut (Jatropha curcas L., Euphorbiaceae), comparative genomics analysis indicated that one gene loss, one intron loss and one recent proximal duplication occurred in the RcWRKY gene family. The expression of all 58 RcWRKY genes was supported by ESTs and/or RNA sequencing reads derived from roots, leaves, flowers, seeds and endosperms. Further global expression profiles with RNA sequencing data revealed diverse expression patterns among various tissues. Results obtained from this study not only provide valuable information for future functional analysis and utilization of the castor bean WRKY genes, but also provide a useful reference to investigate the gene family expansion and evolution in Euphorbiaceous plants.
HuMiChip: Development of a Functional Gene Array for the Study of Human Microbiomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tu, Q.; Deng, Ye; Lin, Lu

Microbiomes play very important roles in terms of nutrition, health and disease by interacting with their hosts. Based on sequence data currently available in public domains, we have developed a functional gene array to monitor both organismal and functional gene profiles of normal microbiota in human and mouse hosts, and such an array is called human and mouse microbiota array, HMM-Chip. First, seed sequences were identified from KEGG databases, and used to construct a seed database (seedDB) containing 136 gene families in 19 metabolic pathways closely related to human and mouse microbiomes. Second, a mother database (motherDB) was constructed with 81 genomes of bacterial strains with 54 from gut and 27 from oral environments, and 16 metagenomes, and used for selection of genes and probe design. Gene prediction was performed by Glimmer3 for bacterial genomes, and by the Metagene program for metagenomes. In total, 228,240 and 801,599 genes were identified for bacterial genomes and metagenomes, respectively. Then the motherDB was searched against the seedDB using the HMMer program, and gene sequences in the motherDB that were highly homologous with seed sequences in the seedDB were used for probe design by the CommOligo software. Different degrees of specific probes, including gene-specific, inclusive and exclusive group-specific probes were selected. All candidate probes were checked against the motherDB and NCBI databases for specificity. Finally, 7,763 probes covering 91.2% (12,601 out of 13,814) of HMMer-confirmed sequences from 75 bacterial genomes and 16 metagenomes were selected. This developed HMM-Chip is able to detect the diversity and abundance of functional genes, the gene expression of microbial communities, and potentially, the interactions of microorganisms and their hosts.
The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

PubMed

Smith, Adam Alexander Thil; Belda, Eugeni; Viari, Alain; Medigue, Claudine; Vallenet, David

2012-05-01

Of all biochemically characterized metabolic reactions formalized by the IUBMB, over one out of four have yet to be associated with a nucleic or protein sequence, i.e. are sequence-orphan enzymatic activities. Few bioinformatics annotation tools are able to propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible and propose result integration across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding proteins catalyzing reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful for aiding bioanalysts to visualize relevant metabolic data. In the second step, they are used to generate candidate associations between un-annotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations by several scores. In the final step, these scores are used to rank members of gene families which are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found over 60,000 genomic metabolons in more than 1,000 prokaryote organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
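The idea behind the first CanOE step (grouping co-localized genes whose reactions share metabolites) can be illustrated with a small sketch. This is not the CanOE implementation: the toy gene positions, reaction names, the `max_gap` threshold and the `find_metabolons` function are all hypothetical, chosen only to show the general technique of linking nearby genes through shared metabolites and collecting the connected groups.

```python
from itertools import combinations

# Hypothetical toy data: gene -> (position index on the genome, annotated reaction or None)
genes = {
    "g1": (1, "R1"),
    "g2": (2, "R2"),
    "g3": (3, None),   # un-annotated gene, ignored in this step
    "g4": (10, "R3"),
}
# Hypothetical reaction -> set of metabolites it involves
reactions = {
    "R1": {"A", "B"},
    "R2": {"B", "C"},
    "R3": {"C", "D"},
}

def find_metabolons(genes, reactions, max_gap=2):
    """Return groups of annotated genes that are close on the genome
    and whose reactions are linked by at least one shared metabolite."""
    annotated = [g for g, (_, rxn) in genes.items() if rxn is not None]
    neighbours = {g: set() for g in annotated}
    for a, b in combinations(annotated, 2):
        pos_a, rxn_a = genes[a]
        pos_b, rxn_b = genes[b]
        close = abs(pos_a - pos_b) <= max_gap                # co-localization (assumed criterion)
        linked = bool(reactions[rxn_a] & reactions[rxn_b])   # shared metabolite
        if close and linked:
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Collect connected components containing at least two genes.
    seen, groups = set(), []
    for start in annotated:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        if len(component) >= 2:
            groups.append(sorted(component))
    return groups

print(find_metabolons(genes, reactions))
```

With the default threshold only g1 and g2 are grouped; widening `max_gap` lets the more distant g4 join through the metabolite it shares with g2.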
Phylogenetic analysis of Rutaceous plants based on single nucleotide polymorphism in chloroplast and nuclear gene sequences

USDA-ARS's Scientific Manuscript database

The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...
Targeted next-generation sequencing identification of mutations in disease resistance gene analogs (RGAs) in wild and cultivated beets

USDA-ARS's Scientific Manuscript database

Resistance gene analogs (RGAs) were searched bioinformatically in the sugar beet (Beta vulgaris L.) genome as potential candidates for improving resistance against different diseases. In the present study, Ion Torrent sequencing technology was used to identify mutations in 21 RGAs. The DNA samples o...
