Science.gov

Sample records for agglutinin-like sequence gene

  1. Candida albicans Agglutinin-Like Sequence (Als) Family Vignettes: A Review of Als Protein Structure and Function

    PubMed Central

    Hoyer, Lois L.; Cota, Ernesto

    2016-01-01

    Approximately two decades have passed since the description of the first gene in the Candida albicans ALS (agglutinin-like sequence) family. Since that time, much has been learned about the composition of the family and the function of its encoded cell-surface glycoproteins. Solution of the structure of the Als adhesive domain provides the opportunity to evaluate the molecular basis for protein function. This review article is formatted as a series of fundamental questions and explores the diversity of the Als proteins, as well as their role in ligand binding, aggregative effects, and attachment to abiotic surfaces. Interaction of Als proteins with each other, their functional equivalence, and the effects of protein abundance on phenotypic conclusions are also examined. Structural features of Als proteins that may facilitate invasive function are considered. Conclusions that are firmly supported by the literature are presented while highlighting areas that require additional investigation to reveal basic features of the Als proteins, their relatedness to each other, and their roles in C. albicans biology. PMID:27014205

  2. Dynamics of Agglutinin-Like Sequence (ALS) Protein Localization on the Surface of Candida Albicans

    ERIC Educational Resources Information Center

    Coleman, David Andrew

    2009-01-01

    The ALS gene family encodes large cell-surface glycoproteins associated with "C. albicans" pathogenesis. Als proteins are thought to act as adhesin molecules binding to host tissues. Wide variation in expression levels among the ALS genes exists and is related to cell morphology and environmental conditions. "ALS1," "ALS3," and "ALS4" are three of…

  3. Nemertean toxin genes revealed through transcriptome sequencing.

    PubMed

    Whelan, Nathan V; Kocot, Kevin M; Santos, Scott R; Halanych, Kenneth M

    2014-11-27

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63-74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins.

  4. Cloning and Sequencing the First HLA Gene

    PubMed Central

    Jordan, Bertrand R.

    2010-01-01

    This Perspectives article recounts the isolation and sequencing of the first human histocompatibility gene (HLA) in 1980–1981. At the time, general knowledge of the molecules of the immune system was already fairly extensive, and gene rearrangements in the immunoglobulin complex (discovered in 1976) had generated much excitement: HLA was quite obviously the next frontier. The author was able to use a homologous murine H-2 cDNA to identify putative human HLA genomic clones in a λ-phage library and thus to isolate and sequence the first human histocompatibility gene. This personal account relates the steps that led to this result, describes the highly competitive international environment, and highlights the role of location, connections, and sheer luck in such an achievement. It also puts this work in perspective with a short description of the current knowledge of histocompatibility genes and, finally, presents some reflections on the meaning of “discovery.” PMID:20457890

  5. Nemertean Toxin Genes Revealed through Transcriptome Sequencing

    PubMed Central

    Whelan, Nathan V.; Kocot, Kevin M.; Santos, Scott R.; Halanych, Kenneth M.

    2014-01-01

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63–74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. PMID:25432940

  6. The first determination of DNA sequence of a specific gene.

    PubMed

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).

  7. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
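
    To make the difficulty described above concrete, the following is a minimal sketch of a naive open-reading-frame scan on a short fragment. It is not the MetaProdigal algorithm, which the abstract does not detail; the function name `find_orfs`, the `min_codons` parameter, and the use of the standard genetic code with ATG as the only start codon are assumptions made for illustration.

```python
# Illustrative only: a naive forward-strand ORF scan, NOT MetaProdigal.
# Assumes the standard genetic code and ATG as the sole start codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    `end` is exclusive and includes the stop codon; `min_codons`
    counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATGA"))
```

    A fragment shorter than a gene often contains a start codon with no in-frame stop, or the reverse, which a scan like this simply misses; that is one reason dedicated metagenomic predictors are needed.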

  8. Bioinformatic Identification of Conserved Cis-Sequences in Coregulated Genes.

    PubMed

    Bülow, Lorenz; Hehl, Reinhard

    2016-01-01

    Bioinformatics tools can be employed to identify conserved cis-sequences in sets of coregulated plant genes because more and more gene expression and genomic sequence data are becoming available. Knowledge of the specific cis-sequences, their enrichment and arrangement within promoters, facilitates the design of functional synthetic plant promoters that are responsive to specific stresses. The present chapter illustrates an example of the bioinformatic identification of conserved Arabidopsis thaliana cis-sequences enriched in drought stress-responsive genes. This workflow can be applied for the identification of cis-sequences in any set of coregulated genes. The workflow includes detailed protocols to determine sets of coregulated genes, to extract the corresponding promoter sequences, and to install and run a software package to identify overrepresented motifs. Further bioinformatic analyses that can be performed with the results are discussed. PMID:27557771

  9. Coding sequences of functioning human genes derived entirely from mobile element sequences.

    PubMed

    Britten, Roy J

    2004-11-30

    Among all of the many examples of mobile elements or "parasitic sequences" that affect the function of the human genome, this paper describes several examples of functioning genes whose sequences have been almost completely derived from mobile elements. There are many examples where the synthetic coding sequences of observed mRNA sequences are made up of mobile element sequences, to an extent of 80% or more of the length of the coding sequences. In the examples described here, the genes have named functions, and some of these functions have been studied. It appears that each of the functioning genes was originally formed from mobile elements and that in some process of molecular evolution a coding sequence was derived that could be translated into a protein that is of some importance to human biology. In one case (AD7C), the coding sequence is 99% made up of a cluster of Alu sequences. In another example, the gene BNIP3 coding sequence is 97% made up of sequences from an apparent human endogenous retrovirus. The Syncytin gene coding sequence appears to be made from an endogenous retrovirus envelope gene. PMID:15546984

  10. Sequence determinants of prokaryotic gene expression level under heat stress.

    PubMed

    Xiong, Heng; Yang, Yi; Hu, Xiao-Pan; He, Yi-Ming; Ma, Bin-Guang

    2014-11-01

    Prokaryotic gene expression is environment-dependent, and temperature plays an important role in shaping the gene expression profile. Revealing the regulation mechanisms of gene expression pertaining to temperature has attracted tremendous effort in recent years, particularly owing to the transcriptome and proteome data yielded by high-throughput techniques. However, most previous work concentrated on characterizing the gene expression profile of individual organisms, and little effort has been made to disclose the commonality among organisms, especially for gene sequence features. In this report, we collected transcriptome and proteome data measured under heat stress conditions from recently published literature and studied the sequence determinants of the expression level of heat-responsive genes on multiple layers. Our results showed that common and consistent patterns of sequence features do exist among organisms for the genes differentially expressed under heat stress. Some features are attributed to the requirement of thermostability, while others are dominated by gene function. The revealed sequence determinants of bacterial gene expression level under heat stress complement the knowledge about the regulation factors of prokaryotic gene expression responding to changes in environmental conditions. Furthermore, comparisons to thermophilic adaptation were performed to reveal the similarities and dissimilarities between the sequence determinants for the response to heat stress and for adaptation to high habitat temperature, which elucidates the complex landscape of gene expression related to the same physical factor, temperature.

  11. Nucleotide sequence of SHV-2 beta-lactamase gene

    SciTech Connect

    Garbarg-Chenon, A.; Godard, V.; Labia, R.; Nicolas, J.C.

    1990-07-01

    The nucleotide sequence of plasmid-mediated beta-lactamase SHV-2 from Salmonella typhimurium (SHV-2pHT1) was determined. The gene was very similar to chromosomally encoded beta-lactamase LEN-1 of Klebsiella pneumoniae. Compared with the sequence of the Escherichia coli SHV-2 enzyme (SHV-2E.coli) obtained by protein sequencing, the deduced amino acid sequence of SHV-2pHT1 differed by three amino acid substitutions.

  12. Phase-defined complete sequencing of the HLA genes by next-generation sequencing

    PubMed Central

    2013-01-01

    Background The human leukocyte antigen (HLA) region, the 3.8-Mb segment of the human genome at 6p21, has been associated with more than 100 different diseases, mostly autoimmune diseases. Due to the complex nature of HLA genes, there are difficulties in elucidating complete HLA gene sequences, especially HLA gene haplotype structures, by the conventional sequencing method. We propose a novel, accurate, and cost-effective method for generating phase-defined complete sequencing of HLA genes by using indexed multiplex next generation sequencing. Results A total of 33 HLA homozygous samples, 11 HLA heterozygous samples, and 3 parent-child families were subjected to phase-defined HLA gene sequencing. We applied long-range PCR to amplify six HLA genes (HLA-A, -C, -B, -DRB1, -DQB1, and -DPB1) followed by transposase-based library construction and multiplex sequencing with the MiSeq sequencer. Paired-end reads (2 × 250 bp) derived from the sequencer were aligned to the six HLA gene segments of UCSC hg19 allowing at most 80 mismatched bases. For HLA homozygous samples, the six amplicons of an individual were pooled and simultaneously sequenced and mapped as an individual-tagging method. The paired-end reads were aligned to corresponding genes of UCSC hg19 and unambiguous, continuous sequences were obtained. For HLA heterozygous samples, each amplicon was separately sequenced and mapped as a gene-tagging method. After alignments, we detected informative paired-end reads harboring SNVs on both forward and reverse reads that are used to separate two chromosomes and to generate two phase-defined sequences in an individual. Consequently, we were able to determine the phase-defined HLA gene sequences from promoter to 3′-UTR and assign up to 8-digit HLA allele numbers, regardless of whether the alleles are rare or novel. Parent–child trio-based sequencing validated our sequencing and phasing methods. Conclusions Our protocol generated phase-defined sequences of the entire

  13. Structure and sequence divergence of two archaebacterial genes.

    PubMed Central

    Cue, D; Beckler, G S; Reeve, J N; Konisky, J

    1985-01-01

    The DNA sequences of a region that includes the hisA gene of two related methanogenic archaebacteria, Methanococcus voltae and Methanococcus vannielii, have been compared. Both organisms show a similar genome organization in this region, displaying three open reading frames (ORFs) separated by regions of very high A + T content. Two of the ORFs, including ORFHisA, show significant DNA sequence homology. As might be expected for organisms having a genome that is A + T-rich, there is a high preference for A and U as the third base in codons. Although the regions upstream of the structural genes contain prokaryotic-like promoter sequences, it is not known whether they are recognized as promoters in these archaebacterial cells. A ribosome binding site, G-G-T-G, is located 6 base pairs preceding the ATG translation initiation sequence of both hisA genes. The sequences upstream of the two hisA genes show only limited sequence homology. The M. voltae intergenic region contains four tandemly arranged repetitions of an 11-base-pair sequence, whereas the M. vannielii sequence contains both direct and inverted repetitive sequences. Based on the degree of hisA sequence homology, we conclude that M. voltae and M. vannielii are less closely related taxonomically than are members of the enteric group of eubacteria. PMID:3923489
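
    The two composition measures mentioned above (overall A+T content and base preference at the third codon position) can be sketched as follows. This is a minimal illustration, not the authors' analysis; the function names are hypothetical, and the input is assumed to be an in-frame DNA coding sequence, so the U reported for mRNA appears as T.

```python
from collections import Counter


def at_content(seq):
    """Fraction of bases in a DNA sequence that are A or T."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "AT") / len(seq)


def third_base_counts(cds):
    """Count the base found at the third position of each complete codon.

    Assumes `cds` starts in frame; trailing bases that do not form a
    complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i + 2] for i in range(0, usable, 3))


print(at_content("ATGAAATTT"))
print(third_base_counts("ATGAAATTT"))
```

    In an A+T-rich genome one would expect the A and T entries of the third-position counter to dominate, since many third-position changes are synonymous.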

  14. Single molecule targeted sequencing for cancer gene mutation detection.

    PubMed

    Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui

    2016-01-01

    With the rapid decline in the cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although fast sample preparation methods are available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using artificially synthesized DNA samples. SMTS has several potential advantages, including simple sample preparation, so that no biases or errors are introduced by PCR. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis. PMID:27193446

  15. Single molecule targeted sequencing for cancer gene mutation detection

    PubMed Central

    Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W.; He, Jiankui

    2016-01-01

    With the rapid decline in the cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although fast sample preparation methods are available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using artificially synthesized DNA samples. SMTS has several potential advantages, including simple sample preparation, so that no biases or errors are introduced by PCR. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis. PMID:27193446

  16. Evolution of gene sequence in response to chromosomal location.

    PubMed

    Díaz-Castillo, Carlos; Golic, Kent G

    2007-09-01

    Evolutionary forces acting on the repetitive DNA of heterochromatin are not constrained by the same considerations that apply to protein-coding genes. Consequently, such sequences are subject to rapid evolutionary change. By examining the Troponin C gene family of Drosophila melanogaster, which has euchromatic and heterochromatic members, we find that protein-coding genes also evolve in response to their chromosomal location. The heterochromatic members of the family show a reduced CG content and increased variation in DNA sequence. We show that the CG reduction applies broadly to the protein-coding sequences of genes located at the heterochromatin:euchromatin interface, with a very strong correlation between CG content and the distance from centric heterochromatin. We also observe a similar trend in the transition from telomeric heterochromatin to euchromatin. We propose that the methylation of DNA is one of the forces driving this sequence evolution.

  17. Patterns of sequence conservation in presynaptic neural genes

    PubMed Central

    Hadley, Dexter; Murphy, Tara; Valladares, Otto; Hannenhalli, Sridhar; Ungar, Lyle; Kim, Junhyong; Bućan, Maja

    2006-01-01

    Background The neuronal synapse is a fundamental functional unit in the central nervous system of animals. Because synaptic function is evolutionarily conserved, we reasoned that functional sequences of genes and related genomic elements known to play important roles in neurotransmitter release would also be conserved. Results Evolutionary rate analysis revealed that presynaptic proteins evolve slowly, although some members of large gene families exhibit accelerated evolutionary rates relative to other family members. Comparative sequence analysis of 46 megabases spanning 150 presynaptic genes identified more than 26,000 elements that are highly conserved in eight vertebrate species, as well as a small subset of sequences (6%) that are shared among unrelated presynaptic genes. Analysis of large gene families revealed that upstream and intronic regions of closely related family members are extremely divergent. We also identified 504 exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity between human and other mammals) in intergenic and intronic regions of presynaptic genes. Many of these elements form a highly stable stem-loop RNA structure and consequently are candidates for novel regulatory elements, whereas some conserved noncoding elements are shown to correlate with specific gene expression profiles. The SynapseDB online database integrates these findings and other functional genomic resources for synaptic genes. Conclusion Highly conserved elements in nonprotein coding regions of 150 presynaptic genes represent sequences that may be involved in the transcriptional or post-transcriptional regulation of these genes. Furthermore, comparative sequence analysis will facilitate selection of genes and noncoding sequences for future functional studies and analysis of variation studies in neurodevelopmental and psychiatric disorders. PMID:17096848

  18. The myxoma virus thymidine kinase gene: sequence and transcriptional mapping.

    PubMed

    Jackson, R J; Bults, H G

    1992-02-01

    The myxoma virus thymidine kinase (TK) gene is encoded on a 1.6 kb SacI-SalI restriction fragment located between 57.7 and 59.3 kb on the 163 kb genomic map. The nucleotide sequence of this fragment as well as 228 bp from the adjacent SalI-AA2 fragment was determined and found to encode four major open reading frames (ORFs). Three of these ORFs are similar in nucleotide sequence to ORFs L5R and J1R, and the TK gene of vaccinia virus (VV). The fourth ORF, MF8a, shows similarity to the ORFs found in the same position relative to the TK genes of Shope fibroma virus, Kenya sheep-1 virus and swine-pox virus. A search of the complete VV nucleotide sequence for regions of similarity to MF8a identified the host specificity gene C7L. Northern blot analysis of early viral RNA identified transcripts of approximately 700 nucleotides for both the TK gene and ORF MF8a. The 5' ends of the TK gene and ORF MF8a early mRNAs were mapped by primer extension to initiation sites 13 nucleotides downstream of sequences with similarity to the VV early promoter consensus. The sizes of the TK and MF8a mRNAs are consistent with transcription termination and polyadenylation occurring downstream of the sequence TTTTTNT, which is identical to the consensus sequence for the VV transcription termination signal.
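
    Locating occurrences of the termination signal TTTTTNT quoted above is a simple pattern search, sketched here with a regular expression. The function name is hypothetical, N is assumed to mean any of A, C, G or T, and a lookahead is used so that overlapping matches are all reported.

```python
import re

# TTTTTNT with N = any unambiguous base; the lookahead lets
# overlapping occurrences be found as well.
TERMINATION_SIGNAL = re.compile(r"(?=(T{5}[ACGT]T))")


def find_termination_signals(seq):
    """Return 0-based start positions of TTTTTNT on the given strand."""
    return [m.start() for m in TERMINATION_SIGNAL.finditer(seq.upper())]


print(find_termination_signals("ACGTTTTTATGG"))
```

    This searches only the strand supplied; scanning the complementary strand would require reverse-complementing the sequence first.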

  19. Flagellin gene sequence variation in the genus Pseudomonas.

    PubMed

    Bellingham, N F; Morgan, J A; Saunders, J R; Winstanley, C

    2001-07-01

    Flagellin gene (fliC) sequences from 18 strains of Pseudomonas sensu stricto representing 8 different species, and 9 representative fliC sequences from other members of the gamma sub-division of proteobacteria, were compared. Analysis was performed on N-terminal, C-terminal and whole fliC sequences. The fliC analyses confirmed the inferred relationship between P. mendocina, P. oleovorans and P. aeruginosa based on 16S rRNA sequence comparisons. In addition, the analyses indicated that P. putida PRS2000 was closely related to P. fluorescens SBW25 and P. fluorescens NCIMB 9046T, but suggested that P. putida PaW8 and P. putida PRS2000 were more closely related to other Pseudomonas spp. than they were to each other. There were a number of inconsistencies in inferred evolutionary relationships between strains, depending on the analysis performed. In particular, whole flagellin gene comparisons often differed from those obtained using N- and C-terminal sequences. However, there were also inconsistencies between the terminal region analyses, suggesting that phylogenetic relationships inferred on the basis of fliC sequence should be treated with caution. Although the central domain of fliC is highly variable between Pseudomonas strains, there was evidence of sequence similarities between the central domains of different Pseudomonas fliC sequences. This indicates the possibility of recombination in the central domain of fliC genes within Pseudomonas species, and between these genes and those from other bacteria. PMID:11518318

  20. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).

  1. Structure and sequence divergence of two archaebacterial genes

    SciTech Connect

    Cue, D.; Beckler, G.S.; Reeve, J.N.; Konisky, J.

    1985-06-01

    The DNA sequences of a region that includes the hisA gene of two related methanogenic archaebacteria, Methanococcus voltae and Methanococcus vannielii, have been compared. Both organisms show a similar genome organization in this region, displaying three open reading frames (ORFs) separated by regions of very high A+T content. Two of the ORFs, including ORFHisA, show significant DNA sequence homology. As might be expected for organisms having a genome that is A+T-rich, there is a high preference for A and U as the third base in codons. A ribosome binding site, G-G-T-G, is located 6 base pairs preceding the ATG translation initiation sequence of both hisA genes. The sequences upstream of the two hisA genes show only limited sequence homology. The M. voltae intergenic region contains four tandemly arranged repetitions of an 11-base-pair sequence, whereas the M. vannielii sequence contains both direct and inverted repetitive sequences. Based on the degree of hisA sequence homology, the authors conclude that M. voltae and M. vannielii are less closely related taxonomically than are members of the enteric group of eubacteria.

  2. Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data

    PubMed Central

    Gilchrist, Michael J; Christensen, Mikkel B; Harland, Richard; Pollet, Nicolas; Smith, James C; Ueno, Naoto; Papalopulu, Nancy

    2008-01-01

    Background Non-sequence gene data (images, literature, etc.) can be found in many different public databases. Access to these data is mostly by text based methods using gene names; however, gene annotation is neither complete, nor fully systematic between organisms, and is also not generally stable over time. This provides some challenges for text based access, especially for cross-species searches. We propose a method for non-sequence data retrieval based on sequence similarity, which removes dependence on annotation and text searches. This work was motivated by the need to provide better access to large numbers of in situ images, and the observation that such image data were usually associated with a specific gene sequence. Sequence similarity searches are found in existing gene oriented databases, but mostly give indirect access to non-sequence data via navigational links. Results Three applications were built to explore the proposed method: accessing image data, literature and gene names. Searches are initiated with the sequence of the user's gene of interest, which is searched against a database of sequences associated with the target data. The matching (non-sequence) target data are returned directly to the user's browser, organised by sequence similarity. The method worked well for the intended application in image data management. Comparison with text based searches of the image data set showed the accuracy of the method. Applied to literature searches it facilitated retrieval of mostly high relevance references. Applied to gene name data it provided a useful analysis of name variation of related genes within and between species. Conclusion This method makes a powerful and useful addition to existing methods for searching gene data based on text retrieval or curated gene lists. In particular the method facilitates cross-species comparisons, and enables the handling of novel or otherwise un-annotated genes. Applications using the method are quick and easy to

  3. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both (3'-³²P)-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  4. Mechanism of Gene Amplification via Yeast Autonomously Replicating Sequences

    PubMed Central

    Dhar, M. K.

    2015-01-01

    The present investigation was aimed at understanding the molecular mechanism of gene amplification. The interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and humans led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and the presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that the mechanism of gene amplification is that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification. PMID:25685838

  5. Honey bee promoter sequences for targeted gene expression.

    PubMed

    Schulte, C; Leboulle, G; Otte, M; Grünewald, B; Gehne, N; Beye, M

    2013-08-01

    The honey bee, Apis mellifera, displays a rich behavioural repertoire, social organization and caste differentiation, and has an interesting mode of sex determination, but we still know little about its underlying genetic programs. We lack stable transgenic tools in honey bees that would allow genetic control of gene activity in stable transgenic lines. As an initial step towards a transgenic method, we identified promoter sequences in the honey bee that can drive constitutive, tissue-specific and cold shock-induced gene expression. We identified the promoter sequences of Am-actin5c, elp2l, Am-hsp83 and Am-hsp70 and showed that, except for the elp2l sequence, the identified sequences were able to drive reporter gene expression in Sf21 cells. We further demonstrated through electroporation experiments that the putative neuron-specific elp2l promoter sequence can direct gene expression in the honey bee brain. The identification of these promoter sequences is an important initial step in studying the function of genes with transgenic experiments in the honey bee, an organism with a rich set of interesting phenotypes. PMID:23668189

  6. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes.

  7. Cloning and sequencing of the gene for human β-casein

    SciTech Connect

    Loennerdal, B.; Bergstroem, S.; Andersson, Y.; Hialmarsson, K.; Sundgyist, A.; Hernell, O.

    1990-02-26

    Human β-casein is a major protein in human milk. This protein is part of the casein micelle and has been suggested to have several physiological functions in the newborn. Since there is limited information on β-casein and the factors that affect its concentration in human milk, the authors have isolated and sequenced the gene for this protein. A human mammary gland cDNA library (Clontech) in λgt11 was screened by plaque hybridization using a 42-mer synthetic ³²P-labelled oligonucleotide. Positive clones were identified and isolated, DNA was prepared and the gene isolated by cleavage with EcoRI. Following subcloning (pUC18), restriction mapping and Southern blotting, DNA for sequencing was prepared. The gene was sequenced by the dideoxy method. Human β-casein has 212 amino acids and the amino acid sequence deduced from the nucleotide sequence is 91% identical to the published sequence for human β-casein show a high degree of conservation at the leader peptide and the highly phosphorylated sequences, but also deletions and divergence at several positions. These results provide insight into the structure of the human β-casein gene and will facilitate studies on factors affecting its expression.

  8. Structure and sequence of the gene encoding human keratocan.

    PubMed

    Tasheva, E S; Funderburgh, J L; Funderburgh, M L; Corpuz, L M; Conrad, G W

    1999-01-01

    Keratocan is one of the three major keratan sulfate proteoglycans characteristically expressed in cornea. We have isolated cDNA and genomic clones and determined the sequence of the entire human keratocan (Kera) gene. The gene is spread over 7.65 kb of DNA and contains three exons. An open reading frame starting at the beginning of the second exon encodes a protein of 352 aa. The amino acid sequence of keratocan shows high identity among mammalian species. This evolutionary conservation between the keratocan proteins as well as the restricted expression of Kera gene in cornea suggests that this molecule might be important in developing and maintaining corneal transparency.

  9. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  10. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, affecting the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the sxtA gene is a starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The sxtA sequence obtained was then analyzed in order to identify the presence of the sxtA gene in Alexandrium minutum.

  11. Analysis of simple sequence repeats in mammalian cell cycle genes.

    PubMed

    Trivedi, Seema; Wills, Christopher; Metzgar, David

    2014-01-01

    Simple sequence repeats (SSRs), or microsatellites, are hyper-mutable and can lead to disorders. Here we explore SSR distribution in cell cycle-associated genes [grouped into: checkpoint; regulation; replication, repair, and recombination (RRR); and transition] in humans and orthologues of eight mammals. Among the gene groups studied, transition genes have the highest SSR density. Trinucleotide repeats are not abundant and introns have higher repeat density than exons. Many repeats in human genes are conserved; however, CG motifs are conserved only in regulation genes. SSR variability in cell cycle genes represents a genetic Achilles' heel, yet SSRs are common in all groups of genes. This tolerance may be due to i) positions in introns where they do not disrupt gene function, ii) essential roles in regulation, iii) specific value of adaptability, and/or iv) lack of negative selection pressure. The present study may be useful for further exploration of their medical relevance and potential functionality.
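
    As an illustration of what locating SSRs involves, the sketch below scans a DNA string for tandem repeats of short motifs with a regular expression. It is not the method used in the study: the unit lengths (1 to 6 bp), the minimum of three repeats, and the example sequence are all assumptions chosen for demonstration, and dedicated repeat finders apply more careful thresholds.

```python
import re


def find_ssrs(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, repeat_count) for tandem repeats of short motifs.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so 'ATATAT' is reported as (AT)3 rather than a longer unit.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group()) // len(unit)))
    return hits


def ssr_density(seq, **kwargs):
    """Fraction of bases in seq that fall inside a detected repeat."""
    covered = sum(len(unit) * n for _, unit, n in find_ssrs(seq, **kwargs))
    return covered / len(seq) if seq else 0.0


# Hypothetical example: a (CAG)4 tract and an (AT)5 tract in a short sequence.
example = "GGT" + "CAG" * 4 + "TTACG" + "AT" * 5 + "CC"
print(find_ssrs(example))
print(round(ssr_density(example), 3))
```

    Running the same scan separately over exon and intron coordinates would give the kind of per-region density comparison the abstract reports.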

  12. Data on meq gene sequence analysis of Ludhiana MDV isolates.

    PubMed

    Gupta, Mridula; Deka, Dipak; Ramneek

    2016-12-01

    The data described are related to the article entitled "Sequence Analysis of Meq oncogene among Indian isolates of Marek's Disease Herpesvirus" (M. Gupta, D. Deka, Ramneek, 2016). Seven meq genes of Ludhiana Marek's disease virus (MDV) field isolates were PCR amplified using the proofreading Platinum Pfx DNA polymerase, sequenced, and then analyzed for distinct polymorphisms and point mutations. The sequences were named LDH 1758, LDH 2003, LDH 2483, LDH 2614, LDH 2700, LDH 2929 and LDH 3262. Their deduced Meq amino acid sequences were then compared with the deduced amino acid sequences of meq genes from around the world already available in GenBank to study their identity and similarity. PMID:27656677

  13. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H.influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and thus are of significant biological importance. Increased DUS density is expected to enhance DNA uptake and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717

  14. Sequence Variability in Staphylococcal Enterotoxin Genes seb, sec, and sed

    PubMed Central

    Johler, Sophia; Sihto, Henna-Maria; Macori, Guerrino; Stephan, Roger

    2016-01-01

    Ingestion of staphylococcal enterotoxins preformed by Staphylococcus aureus in food leads to staphylococcal food poisoning, the most prevalent foodborne intoxication worldwide. There are five major staphylococcal enterotoxins: SEA, SEB, SEC, SED, and SEE. While variants of these toxins have been described and were linked to specific hosts or levels of enterotoxin production, data on sequence variation are still limited. In this study, we aim to extend the knowledge on promoter and gene variants of the major enterotoxins SEB, SEC, and SED. To this end, we determined seb, sec, and sed promoter and gene sequences of a well-characterized set of enterotoxigenic Staphylococcus aureus strains originating from foodborne outbreaks, human infections, human nasal colonization, rabbits, and cattle. New nucleotide sequence variants were detected for all three enterotoxins and a novel amino acid sequence variant of SED was detected in a strain associated with human nasal colonization. While the seb promoter and gene sequences exhibited a high degree of variability, the sec and sed promoter and gene sequences were more conserved. Interestingly, a truncated variant of sed was detected in all tested sed-harboring rabbit strains. The generated data represent a further step towards improved understanding of strain-specific differences in enterotoxin expression and host-specific variation in enterotoxin sequences. PMID:27258311

  15. Sequence Variability in Staphylococcal Enterotoxin Genes seb, sec, and sed.

    PubMed

    Johler, Sophia; Sihto, Henna-Maria; Macori, Guerrino; Stephan, Roger

    2016-01-01

    Ingestion of staphylococcal enterotoxins preformed by Staphylococcus aureus in food leads to staphylococcal food poisoning, the most prevalent foodborne intoxication worldwide. There are five major staphylococcal enterotoxins: SEA, SEB, SEC, SED, and SEE. While variants of these toxins have been described and were linked to specific hosts or levels of enterotoxin production, data on sequence variation are still limited. In this study, we aim to extend the knowledge on promoter and gene variants of the major enterotoxins SEB, SEC, and SED. To this end, we determined seb, sec, and sed promoter and gene sequences of a well-characterized set of enterotoxigenic Staphylococcus aureus strains originating from foodborne outbreaks, human infections, human nasal colonization, rabbits, and cattle. New nucleotide sequence variants were detected for all three enterotoxins and a novel amino acid sequence variant of SED was detected in a strain associated with human nasal colonization. While the seb promoter and gene sequences exhibited a high degree of variability, the sec and sed promoter and gene sequences were more conserved. Interestingly, a truncated variant of sed was detected in all tested sed-harboring rabbit strains. The generated data represent a further step towards improved understanding of strain-specific differences in enterotoxin expression and host-specific variation in enterotoxin sequences.

  16. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.
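
    The segmentation half of this idea can be sketched with dynamic programming. The code below is an illustration under stated assumptions, not the author's algorithm: the dictionary and its word probabilities are fixed and hypothetical, whereas the abstract describes building the dictionary concurrently. It computes the partition function (the sum over all segmentations of the product of word probabilities) and the single most probable segmentation.

```python
import math


def partition_function(seq, dictionary):
    """Sum over all segmentations of seq of the product of word probabilities."""
    z = [0.0] * (len(seq) + 1)
    z[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for end in range(1, len(seq) + 1):
        for word, p in dictionary.items():
            start = end - len(word)
            if start >= 0 and seq[start:end] == word:
                z[end] += z[start] * p
    return z[-1]


def best_segmentation(seq, dictionary):
    """Most probable segmentation (Viterbi), or None if none exists."""
    best = [(-math.inf, None)] * (len(seq) + 1)
    best[0] = (0.0, None)
    for end in range(1, len(seq) + 1):
        for word, p in dictionary.items():
            start = end - len(word)
            if start >= 0 and seq[start:end] == word and best[start][0] > -math.inf:
                score = best[start][0] + math.log(p)
                if score > best[end][0]:
                    best[end] = (score, word)
    if best[-1][0] == -math.inf:
        return None
    words, pos = [], len(seq)
    while pos > 0:  # walk back through the stored last words
        word = best[pos][1]
        words.append(word)
        pos -= len(word)
    return words[::-1]


# Hypothetical dictionary: single letters plus two longer "words".
DICTIONARY = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.1, "GCGC": 0.1}

print(best_segmentation("ATATAC", DICTIONARY))
print(partition_function("ATATAC", DICTIONARY))
```

    In a full treatment, the word probabilities would be re-estimated from the segmentation statistics and new candidate words added, repeating until the dictionary stabilises; that learning loop is omitted here.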

  17. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and biochemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that a high fractal dimension Rubisco sequence would support a high carbon dioxide rate via the Michaelis-Menten coefficient, with implications for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus the radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequences are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. A similar photosynthesis process that could utilize host star radiation would not compete with the radiation resistance process from the biology dogma perspective in environments such as Mars and exoplanets.

  18. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    PubMed

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  19. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250

  20. Sequence and gene expression evolution of paralogous genes in willows.

    PubMed

    Harikrishnan, Srilakshmy L; Pucholt, Pascal; Berlin, Sofia

    2015-12-22

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive, hence, functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows.

  1. Sequence and gene expression evolution of paralogous genes in willows

    PubMed Central

    Harikrishnan, Srilakshmy L.; Pucholt, Pascal; Berlin, Sofia

    2015-01-01

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive, hence, functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows. PMID:26689951

  2. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications.

    PubMed

    Herzog, M; Maroteaux, L

    1986-11-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage.

  3. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    PubMed Central

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  4. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications.

    PubMed

    Herzog, M; Maroteaux, L

    1986-11-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  5. Thermodynamics-based models of transcriptional regulation with gene sequence.

    PubMed

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous-time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity on the basis of statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model gives results that make more biological sense.
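
    To make the general shape of such a model concrete, the sketch below combines an equilibrium (Boltzmann-weighted) site occupancy with a one-variable differential equation for mRNA. It is a textbook-style simplification and not the model proposed in the paper: a single activator binding a single site is assumed, and the parameter names and values (alpha, delta, binding_energy) are hypothetical.

```python
import math


def occupancy(tf_conc, binding_energy, kT=1.0):
    """Equilibrium probability that a single site is bound.

    binding_energy is expressed in units of kT; more negative means
    tighter binding. tf_conc is in arbitrary (dimensionless) units.
    """
    weight = tf_conc * math.exp(-binding_energy / kT)
    return weight / (1.0 + weight)


def simulate_mrna(tf_conc, binding_energy, alpha=10.0, delta=0.5,
                  steps=4000, dt=0.01):
    """Euler-integrate dm/dt = alpha * occupancy - delta * m from m(0) = 0."""
    p = occupancy(tf_conc, binding_energy)
    m = 0.0
    for _ in range(steps):
        m += dt * (alpha * p - delta * m)
    return m


# With equal bound and unbound weights the site is occupied half the time,
# so mRNA approaches alpha * 0.5 / delta.
print(occupancy(1.0, 0.0))
print(simulate_mrna(1.0, 0.0))
```

    In a sequence-based model the binding energy would itself be computed from the promoter sequence (for example from a position weight matrix), which is the step the abstract attributes to statistical molecular thermodynamics.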

  6. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome. PMID:27502218
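
    One simple way spike-ins at known concentrations can yield a between-sample scaling factor is to fit measured abundance against input concentration on a log-log scale in each sample and compare the fits. The sketch below illustrates that idea only; it is not the sequin analysis procedure, and the concentrations and abundances are made-up numbers.

```python
import math


def fit_loglog(known_conc, measured):
    """Least-squares fit of log2(measured) = slope * log2(known) + intercept."""
    xs = [math.log2(c) for c in known_conc]
    ys = [math.log2(m) for m in measured]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scaling_factor(fit_ref, fit_other):
    """Factor that rescales the other sample onto the reference sample.

    Assumes both fits have similar slopes, so the intercept difference
    captures the overall difference in sensitivity between samples.
    """
    return 2.0 ** (fit_ref[1] - fit_other[1])


# Hypothetical spike-in ladder and measured abundances in two samples.
known = [1, 2, 4, 8, 16]
sample_a = [10, 20, 40, 80, 160]
sample_b = [20, 40, 80, 160, 320]

fit_a = fit_loglog(known, sample_a)
fit_b = fit_loglog(known, sample_b)
print(fit_a, fit_b, scaling_factor(fit_a, fit_b))
```

    A slope well below 1, or scatter at the low end of the ladder, would indicate where quantification stops being reliable, which is the kind of limit the abstract says the standards are used to determine.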

  7. Sequence Validation of Candidates for Selectively Important Genes in Sunflower

    PubMed Central

    Chapman, Mark A.; Mandel, Jennifer R.; Burke, John M.

    2013-01-01

    Analyses aimed at identifying genes that have been targeted by past selection provide a powerful means for investigating the molecular basis of adaptive differentiation. In the case of crop plants, such studies have the potential to not only shed light on important evolutionary processes, but also to identify genes of agronomic interest. In this study, we test for evidence of positive selection at the DNA sequence level in a set of candidate genes previously identified in a genome-wide scan for genotypic evidence of selection during the evolution of cultivated sunflower. In the majority of cases, we were able to confirm the effects of selection in shaping diversity at these loci. Notably, the genes that were found to be under selection via our sequence-based analyses were devoid of variation in the cultivated sunflower gene pool. This result confirms a possible strategy for streamlining the search for adaptively important loci by pre-screening the derived population to identify the strongest candidates before sequencing them in the ancestral population. PMID:23991009

  8. Nucleotide sequence of the vaccinia virus hemagglutinin gene.

    PubMed

    Shida, H

    1986-04-30

    Vaccinia virus hemagglutinin (HA) is expressed late in the infection cycle, and it is nonessential for virus growth. The HA structural gene was located at the right terminus of the HindIII A fragment by hybrid-arrested and hybrid-selected translation methods. The position of the HA gene was confirmed by the production of the complete HA protein in cells transfected with the plasmid containing that region. Examination of this nucleotide sequence revealed the positions of cleavage sites for a number of restriction endonucleases. The deduced amino acid sequence revealed that the HA protein is a typical surface membrane glycoprotein. Comparison of the nucleotide sequence upstream of the HA coding region with the corresponding regions of other late genes suggested the existence of the consensus decanucleotide TTCATTTa/tGT between 34 and 18 bp upstream of the initiation codon, followed by a cluster of A or T, a unique feature of the late genes of vaccinia virus. These results, in conjunction with the ease of isolating HA- mutants, provide a basis for a new site suitable for inserting foreign genes.
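
    The consensus decanucleotide reported in this abstract, TTCATTTa/tGT, can be written as a pattern and searched for in the sequence upstream of an initiation codon. The Python sketch below is illustrative only: the function name `find_late_motif` and the example sequence are hypothetical and do not come from the paper, and the 34 to 18 bp window is not enforced.

```python
import re

# Consensus decanucleotide from the abstract: TTCATTT(A or T)GT.
# A lookahead is used so that overlapping occurrences are also reported.
LATE_MOTIF = re.compile(r"(?=(TTCATTT[AT]GT))")


def find_late_motif(upstream):
    """Return 0-based start positions of the motif in an upstream sequence."""
    return [m.start() for m in LATE_MOTIF.finditer(upstream.upper())]


# Hypothetical upstream sequence, invented for illustration only.
example = "ggaa" + "TTCATTTAGT" + "aaataaat" + "ATG"
positions = find_late_motif(example)
print(positions)
```

    A real scan would take the upstream region from an annotated genome and would probably also check the distance between each hit and the initiation codon.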

  9. Glycoprotein Gene Sequence Variation in Rhesus Monkey Rhadinovirus

    PubMed Central

    Shin, Young C.; Jones, Leandro R.; Manrique, Julieta; Lauer, William; Carville, Angela; Mansfield, Keith G.; Desrosiers, Ronald C.

    2010-01-01

    Gene sequences for seven glycoproteins from 20 independent isolates of rhesus monkey rhadinovirus (RRV) and of the corresponding seven glycoprotein genes from nine strains of the Kaposi’s sarcoma-associated herpesvirus (KSHV) were obtained and analyzed. Phylogenetic analysis revealed two discrete groupings of RRV gH sequences, two discrete groupings of RRV gL sequences and two discrete groupings of RRV gB sequences. We called these phylogenetic groupings gHa, gHb, gLa, gLb, gBa and gBb. gHa was always paired with gLa and gHb was always paired with gLb for any individual RRV isolate. Since gH and gL are known to be interacting partners, these results suggest the need of matching sequence types for function of these cooperating proteins. gB phylogenetic grouping was not associated with gH/gL phylogenetic grouping. Our results demonstrate two distinct, distantly-related phylogenetic groupings of gH and gL of RRV despite a remarkable degree of sequence conservation within each individual phylogenetic group. PMID:20172576

  10. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2014-01-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy, and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:24510847

  11. Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    PubMed Central

    Harper, Marc A.; Chen, Zugen; Toy, Traci; Machado, Iara M. P.; Nelson, Stanley F.; Liao, James C.; Lee, Christopher J.

    2011-01-01

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that 1. causal genes can be identified with high probability from even a modest number of mutant genomes; 2. costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340. PMID:21364744
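
    The core idea, that a causal gene should be mutated in many independent mutants while passenger mutations scatter across the genome, can be sketched as a simple tally. This is a simplified illustration and not the authors' statistical method; the function name `rank_genes`, the strain labels, and the genes `geneX` and `geneY` are hypothetical. The final line reproduces the reagent-cost ratio quoted in the abstract.

```python
from collections import Counter


def rank_genes(mutant_hits):
    """Rank genes by the number of independent mutants carrying a mutation in them.

    mutant_hits maps a strain name to the gene names mutated in that strain.
    """
    counts = Counter()
    for genes in mutant_hits.values():
        counts.update(set(genes))  # count each gene at most once per strain
    return counts.most_common()


# Toy data, invented for illustration; only acrB and marC are named in the abstract.
toy_hits = {
    "mutant1": ["acrB", "geneX"],
    "mutant2": ["acrB", "marC"],
    "mutant3": ["acrB", "marC", "geneY", "acrB"],
}
ranked = rank_genes(toy_hits)
print(ranked)

# Cost figures quoted in the abstract: $7,200 conventional versus $1,200 pooled.
fold_saving = 7200 / 1200
print(fold_saving)
```

    A realistic analysis would also have to account for gene length and the background mutation rate, since long genes collect more random mutations by chance.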

  12. Cloning and sequence of the human adrenodoxin reductase gene.

    PubMed Central

    Lin, D; Shi, Y F; Miller, W L

    1990-01-01

    Adrenodoxin reductase (ferredoxin:NADP+ oxidoreductase, EC 1.18.1.2) is a flavoprotein mediating electron transport to all mitochondrial forms of cytochrome P450. We cloned the human adrenodoxin reductase gene and characterized it by restriction endonuclease mapping and DNA sequencing. The entire gene is approximately 12 kilobases long and consists of 12 exons. The first exon encodes the first 26 of the 32 amino acids of the signal peptide, and the second exon encodes the remainder of signal peptide and the apparent FAD binding site. The remaining 10 exons are clustered in a region of only 4.3 kilobases, separated from the first two exons by a large intron of about 5.6 kilobases. Two forms of human adrenodoxin reductase mRNA, differing by the presence or absence of 18 bases in the middle of the sequence, arise from alternate splicing at the 5' end of exon 7. This alternately spliced region is directly adjacent to the NADPH binding site, which is entirely contained in exon 6. The immediate 5' flanking region lacks TATA and CAAT boxes; however, this region is rich in G + C and contains six copies of the sequence GGGCGGG, resembling promoter sequences of "housekeeping" genes. RNase protection experiments show that transcription is initiated from multiple sites in the 5' flanking region, located about 21-91 base pairs upstream from the AUG translational initiation codon. PMID:2236061

  13. Nucleotide sequence of the tobacco (Nicotiana tabacum) anionic peroxidase gene

    SciTech Connect

    Diaz-De-Leon, F.; Klotz, K.L.; Lagrimini, L.M.

    1993-03-01

    Peroxidases have been implicated in numerous physiological processes including lignification (Grisebach, 1981), wound-healing (Espelie et al., 1986), phenol oxidation (Lagrimini, 1991), pathogen defense (Ye et al., 1990), and the regulation of cell elongation through the formation of interchain covalent bonds between various cell wall polymers (Fry, 1986; Goldberg et al., 1986; Bradley et al., 1992). However, a complete description of peroxidase action in vivo is not available because of the vast number of potential substrates and the existence of multiple isoenzymes. The tobacco anionic peroxidase is one of the better-characterized isoenzymes. This enzyme has been shown to oxidize a number of significant plant secondary compounds in vitro including cinnamyl alcohols, phenolic acids, and indole-3-acetic acid (Maeder, 1980; Lagrimini, 1991). A cDNA encoding the enzyme has been obtained, and this enzyme was shown to be expressed at the highest levels in lignifying tissues (xylem and tracheary elements) and also in epidermal tissue (Lagrimini et al., 1987). It was shown at this time that there were four distinct copies of the anionic peroxidase gene in tobacco (Nicotiana tabacum). A tobacco genomic DNA library was constructed in the λ phage EMBL3, from which two unique peroxidase genes were sequenced. One of these clones, λPOD1, was designated as a pseudogene when the exonic sequences were found to differ from the cDNA sequences by 1%, and several frame shifts in the coding sequences indicated a dysfunctional gene (the authors' unpublished results). The other clone, λPOD3, described in this manuscript, was designated as the functional tobacco anionic peroxidase gene because of 100% homology with the cDNA. Significant structural elements include an AS-2 box indicated in shoot-specific expression (Lam and Chua, 1989), a TATA box, and two intervening sequences. 10 refs., 1 tab.

  14. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-12-21

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main-function domain between 47 and 329 amino acids was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2.
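
    The identity percentages quoted in this abstract are pairwise comparisons of aligned sequences. A minimal Python sketch of such a calculation follows; it assumes the two sequences are already aligned to equal length with `-` marking gaps, and the function name `percent_identity` and the example sequences are hypothetical. It is not the software used in the study.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns (an assumption; conventions differ)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical 10-base aligned fragments differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity)
```

    A matrix of such pairwise values, converted to distances, is the usual input to a clustering method such as the unweighted pair group method with arithmetic mean mentioned in the abstract.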

  15. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-01-01

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main-function domain between 47 and 329 amino acids was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2. PMID:26782391

  16. The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence.

    PubMed

    Fizames, Cécile; Muños, Stéphane; Cazettes, Céline; Nacry, Philippe; Boucherez, Jossia; Gaymard, Frédéric; Piquemal, David; Delorme, Valérie; Commes, Thérèse; Doumas, Patrick; Cooke, Richard; Marti, Jacques; Sentenac, Hervé; Gojon, Alain

    2004-01-01

    Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warrants the identification of the genes corresponding to the majority of the tags found experimentally, with a high level of reliability, and provides a reference database for SAGE studies in Arabidopsis. This new resource allowed us to characterize the expression of more than 3,000 genes, for which there is no expressed sequence tag (EST) or cDNA in the databases. Moreover, 85% of the tags were specific for one gene. To illustrate this advantage of SAGE for functional genomics, we show that our data allow an unambiguous analysis of most of the individual genes belonging to 12 different ion transporter multigene families. These results indicate that, compared with EST-based tag to gene assignment, the use of the annotated genome sequence greatly improves gene identification in SAGE studies. However, more than 6,000 different tags remained with no gene match, suggesting that a significant proportion of transcripts present in the roots originate from yet unknown or wrongly annotated genes. The root transcriptome characterized in this study markedly differs from those obtained in other organs, and provides a unique resource for investigating the functional specificities of the root system. As an example of the use of SAGE for transcript profiling in Arabidopsis, we report here the identification of 270 genes differentially expressed between roots of plants grown either with NO3- or NH4NO3 as N source.
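
    Tag-to-gene assignment of the kind described in this abstract can be sketched by deriving one "virtual" tag per annotated transcript and looking observed tags up in the resulting index. The Python sketch below rests on assumptions not stated in the abstract: that the tag is the 10 bases following the 3'-most CATG site (the usual SAGE convention with NlaIII as anchoring enzyme), and that one transcript per gene is considered. All names and sequences are hypothetical.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # assumed tag length in bases


def virtual_tag(transcript):
    """Return the tag that follows the 3'-most anchor site, or None if unavailable."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    tag = seq[pos + len(ANCHOR): pos + len(ANCHOR) + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def build_index(transcripts):
    """Map each virtual tag to the set of genes that produce it."""
    index = defaultdict(set)
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].add(gene)
    return index


def classify(observed_tags, index):
    """Count observed tags matching exactly one gene, several genes, or none."""
    result = {"unique": 0, "ambiguous": 0, "unmatched": 0}
    for tag in observed_tags:
        genes = index.get(tag, set())
        if len(genes) == 1:
            result["unique"] += 1
        elif len(genes) > 1:
            result["ambiguous"] += 1
        else:
            result["unmatched"] += 1
    return result


# Hypothetical transcripts, invented for illustration only.
toy_transcripts = {
    "geneA": "AAA" + "CATG" + "ACGTACGTAA" + "GG",
    "geneB": "TT" + "CATG" + "T" * 10 + "CC",
    "geneC": "GG" + "CATG" + "T" * 10,
}
index = build_index(toy_transcripts)
summary = classify(["ACGTACGTAA", "TTTTTTTTTT", "GGGGGGGGGG"], index)
print(summary)
```

    In these terms, the "unique" count corresponds to tags specific for one gene and the "unmatched" count to tags with no gene match.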

  17. Telencephalic embryonic subtractive sequences: a unique collection of neurodevelopmental genes.

    PubMed

    Bulfone, Alessandro; Carotenuto, Pietro; Faedo, Andrea; Aglio, Veruska; Garzia, Livia; Bello, Anna Maria; Basile, Andrea; Andrè, Alessandra; Cocchia, Massimo; Guardiola, Ombretta; Ballabio, Andrea; Rubenstein, John L R; Zollo, Massimo

    2005-08-17

    The vertebrate telencephalon is composed of many architectonically and functionally distinct areas and structures, with billions of neurons that are precisely connected. This complexity is fine-tuned during development by numerous genes. To identify genes involved in the regulation of telencephalic development, a specific subset of differentially expressed genes was characterized. Here, we describe a set of cDNAs encoded by genes preferentially expressed during development of the mouse telencephalon that was identified through a functional genomics approach. Of 832 distinct transcripts found, 223 (27%) are known genes. Of the remaining, 228 (27%) correspond to expressed sequence tags of unknown function, 58 (7%) are homologs or orthologs of known genes, and 323 (39%) correspond to novel rare transcripts, including 48 (14%) new putative noncoding RNAs. As an example of this latter group of novel precursor transcripts of micro-RNAs, telencephalic embryonic subtractive sequence (TESS) 24.E3 was functionally characterized, and one of its targets was identified: the zinc finger transcription factor ZFP9. The TESS transcriptome has been annotated, mapped for chromosome loci, and arrayed for its gene expression profiles during neural development and differentiation (in Neuro2a and neural stem cells). Within this collection, 188 genes were also characterized on embryonic and postnatal tissue by in situ hybridization, demonstrating that most are specifically expressed in the embryonic CNS. The full information has been organized into a searchable database linked to other genomic resources, allowing easy access to those who are interested in the dissection of the molecular basis of telencephalic development.

  18. The uteroglobin gene region: hormonal regulation, repetitive elements and complete nucleotide sequence of the gene.

    PubMed Central

    Suske, G; Wenz, M; Cato, A C; Beato, M

    1983-01-01

    Differential uteroglobin induction represents an appropriate model for the molecular analysis of the mechanism by which steroid hormones control gene expression in mammals. We have analyzed the structure and hormonal regulation of a 35 Kb region of genomic DNA in which the uteroglobin gene is located. The complete sequence of 3,700 nucleotides including the uteroglobin gene and its flanking regions has been determined, and the limits of the gene established by S1 nuclease mapping. Several regions containing repeated sequences were mapped by blot hybridization, one of which is located within the large intron in the uteroglobin gene. Analysis of the RNAs extracted from endometrium, lung and liver, after treatment with estrogen and/or progesterone shows that within the 35 Kb region, the uteroglobin gene is the only DNA segment whose transcription into stable RNA is induced by progesterone. PMID:6304644

  19. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of the DNA of higher organisms codes for proteins by means of the classical triplet code. The rest of the DNA sequences are largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, the sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  20. From expression cloning to gene modeling: The development of Xenopus gene sequence resources

    PubMed Central

    Gilchrist, Michael J

    2012-01-01

    The Xenopus community has made concerted efforts over the last 10–12 years systematically to improve the available sequence information for this amphibian model organism ideally suited to the study of early development in vertebrates. Here I review progress in the collection of both sequence data and physical clone reagents for protein coding genes. I conclude that we have cDNA sequences for around 50% and full-length clones for about 35% of the genes in Xenopus tropicalis, and similar numbers but a smaller proportion for Xenopus laevis. In addition, I demonstrate that the gaps in the current genome assembly create problems for the computational elucidation of gene sequences, and suggest some ways to ameliorate the effects of this. genesis 50:143–154, 2012. © 2012 Wiley Periodicals, Inc. PMID:22344767

  1. DNA sequence of the Escherichia coli tonB gene.

    PubMed Central

    Postle, K; Good, R F

    1983-01-01

    The nucleotide sequence of a cloned section of the Escherichia coli chromosome containing the tonB gene has been determined. Transcription initiation and termination sites for tonB RNA have been determined by S1 nuclease mapping. The tonB promoter and terminator resemble other E. coli promoters and terminators; the sequence of the tonB terminator region suggests that it may function bidirectionally. The DNA sequence specifies an open translation reading frame between the 5' and 3' RNA termini whose location is consistent with the position of previously isolated tonB::IS1 mutations. The DNA sequence predicts a proline-rich protein with a calculated size of 26.1-26.6 kilodaltons (239-244 amino acids), depending on which of three potential initiation codons is utilized. The predicted NH2 terminus of tonB protein resembles the proteolytically cleaved signal sequences of E. coli periplasmic and outer membrane proteins; the overall hydrophilic character of the protein sequence suggests that the bulk of the tonB protein is not embedded within the inner or outer membrane. A significant discrepancy exists between the calculated size of tonB protein and the apparent size of 36 kilodaltons determined by sodium dodecyl sulfate/polyacrylamide gel electrophoresis. PMID:6310567

  2. Nucleotide sequence of Bacillus phage Nf terminal protein gene.

    PubMed Central

    Leavitt, M C; Ito, J

    1987-01-01

    The nucleotide sequence of Bacillus phage Nf gene E has been determined. Gene E codes for phage terminal protein which is the primer necessary for the initiation of DNA replication. The deduced amino acid sequence of Nf terminal protein is approximately 66% homologous with the terminal proteins of Bacillus phages PZA and φ29, and shows similar hydropathy and secondary structure predictions. A serine which has been identified as the residue which covalently links the protein to the 5' end of the genome in φ29, is conserved in all three phages. The hydropathic and secondary structural environment of this serine is similar in these phage terminal proteins and also similar to the linking serine of adenovirus terminal protein. PMID:3601672

  3. Full-Length Minor Ampullate Spidroin Gene Sequence

    PubMed Central

    Chen, Gefei; Liu, Xiangqin; Zhang, Yunlong; Lin, Senzhu; Yang, Zijiang; Johansson, Jan; Rising, Anna; Meng, Qing

    2012-01-01

    Spider silk includes seven protein based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level. PMID:23251707

  4. Understanding mechanisms underlying human gene expression variation with RNA sequencing

    PubMed Central

    Pickrell, Joseph K.; Marioni, John C.; Pai, Athma A.; Degner, Jacob F.; Engelhardt, Barbara E.; Nkadori, Everlyne; Veyrieras, Jean-Baptiste; Stephens, Matthew; Gilad, Yoav; Pritchard, Jonathan K.

    2011-01-01

    Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals. PMID:20220758

  5. Full-length minor ampullate spidroin gene sequence.

    PubMed

    Chen, Gefei; Liu, Xiangqin; Zhang, Yunlong; Lin, Senzhu; Yang, Zijiang; Johansson, Jan; Rising, Anna; Meng, Qing

    2012-01-01

    Spider silk includes seven protein based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level. PMID:23251707

  6. Cloning, nucleotide sequence, and expression of Achromobacter protease I gene.

    PubMed

    Ohara, T; Makino, K; Shinagawa, H; Nakata, A; Norioka, S; Sakiyama, F

    1989-12-01

    Achromobacter protease I (API) is a lysine-specific serine protease which hydrolyzes specifically the lysyl peptide bond. A gene coding for API was cloned from Achromobacter lyticus M497-1. Nucleotide sequence of the cloned DNA fragment revealed that the gene coded for a single polypeptide chain of 653 amino acids. The N-terminal 205 amino acids, including signal peptide and the threonine/serine-rich C-terminal 180 amino acids are flanking the 268 amino acid-mature protein which was identified by protein sequencing. Escherichia coli carrying a plasmid containing the cloned API gene overproduced and secreted a protein of Mr 50,000 (API') into the periplasm. This protein exhibited a distinct endopeptidase activity specific for lysyl bonds as well. The N-terminal amino acid sequence of API' was the same as mature API, suggesting that the enzyme retained the C-terminal extended peptide chain. The present experiments indicate that API, an extracellular protease produced by gram-negative bacteria, is synthesized in vivo as a precursor protein bearing long extended peptide chains at both N and C termini. PMID:2684982

  7. Human retinoblastoma susceptibility gene: cloning, identification, and sequence.

    PubMed

    Lee, W H; Bookstein, R; Hong, F; Young, L J; Shew, J Y; Lee, E Y

    1987-03-13

    Recent evidence indicates the existence of a genetic locus in chromosome region 13q14 that confers susceptibility to retinoblastoma, a cancer of the eye in children. A gene encoding a messenger RNA (mRNA) of 4.6 kilobases (kb), located in the proximity of esterase D, was identified as the retinoblastoma susceptibility (RB) gene on the basis of chromosomal location, homozygous deletion, and tumor-specific alterations in expression. Transcription of this gene was abnormal in six of six retinoblastomas examined: in two tumors, RB mRNA was not detectable, while four others expressed variable quantities of RB mRNA with decreased molecular size of about 4.0 kb. In contrast, full-length RB mRNA was present in human fetal retina and placenta, and in other tumors such as neuroblastoma and medulloblastoma. DNA from retinoblastoma cells had a homozygous gene deletion in one case and hemizygous deletion in another case, while the remainder were not grossly different from normal human control DNA. The gene contains at least 12 exons distributed in a region of over 100 kb. Sequence analysis of complementary DNA clones yielded a single long open reading frame that could encode a hypothetical protein of 816 amino acids. A computer-assisted search of a protein sequence database revealed no closely related proteins. Features of the predicted amino acid sequence include potential metal-binding domains similar to those found in nucleic acid-binding proteins. These results provide a framework for further study of recessive genetic mechanisms in human cancers.

  8. Second-generation sequencing for gene discovery in the Brassicaceae.

    PubMed

    Hayward, Alice; Vighnesh, Guru; Delay, Christina; Samian, Mohd Rafizan; Manoli, Sahana; Stiller, Jiri; McKenzie, Megan; Edwards, David; Batley, Jacqueline

    2012-08-01

    The Brassicaceae contains the most diverse collection of agriculturally important crop species of all plant families. Yet, this is one of the few families that do not form functional symbiotic associations with mycorrhizal fungi in the soil for improved nutrient acquisition. The genes involved in this symbiosis were more recently recruited by legumes for symbiotic association with nitrogen-fixing rhizobia bacteria. This study applied second-generation sequencing (SGS) and analysis tools to discover that two such genes, NSP1 (Nodulation Signalling Pathway 1) and NSP2, remain conserved in diverse members of the Brassicaceae despite the absence of these symbioses. We demonstrate the utility of SGS data for the discovery of putative gene homologs and their analysis in complex polyploid crop genomes with little prior sequence information. Furthermore, we show how this data can be applied to enhance downstream reverse genetics analyses. We hypothesize that Brassica NSP genes may function in the root in other plant-microbe interaction pathways that were recruited for mycorrhizal and rhizobial symbioses during evolution.

  9. Second-generation sequencing for gene discovery in the Brassicaceae.

    PubMed

    Hayward, Alice; Vighnesh, Guru; Delay, Christina; Samian, Mohd Rafizan; Manoli, Sahana; Stiller, Jiri; McKenzie, Megan; Edwards, David; Batley, Jacqueline

    2012-08-01

    The Brassicaceae contains the most diverse collection of agriculturally important crop species of all plant families. Yet, this is one of the few families that do not form functional symbiotic associations with mycorrhizal fungi in the soil for improved nutrient acquisition. The genes involved in this symbiosis were more recently recruited by legumes for symbiotic association with nitrogen-fixing rhizobia bacteria. This study applied second-generation sequencing (SGS) and analysis tools to discover that two such genes, NSP1 (Nodulation Signalling Pathway 1) and NSP2, remain conserved in diverse members of the Brassicaceae despite the absence of these symbioses. We demonstrate the utility of SGS data for the discovery of putative gene homologs and their analysis in complex polyploid crop genomes with little prior sequence information. Furthermore, we show how this data can be applied to enhance downstream reverse genetics analyses. We hypothesize that Brassica NSP genes may function in the root in other plant-microbe interaction pathways that were recruited for mycorrhizal and rhizobial symbioses during evolution. PMID:22765874

  10. Cloning and sequencing the genes encoding goldfish and carp ependymin.

    PubMed

    Adams, D S; Shashoua, V E

    1994-04-20

    Ependymins (EPNs) are brain glycoproteins thought to function in optic nerve regeneration and long-term memory consolidation. To date, epn genes have been characterized in two orders of teleost fish. In this study, polymerase chain reactions (PCR) were used to amplify the complete 1.6-kb epn genes, gf-I and cc-I, from genomic DNA of Cypriniformes, goldfish and carp, respectively. Amplified bands were cloned and sequenced. Each gene consists of six exons and five introns. The exon portion of gf-I encodes a predicted 215-amino-acid (aa) protein previously characterized as GF-I, while cc-I encodes a predicted 215-aa protein 95% homologous to GF-I.

  11. Nucleotide sequence and temporal expression of a baculovirus regulatory gene.

    PubMed

    Guarino, L A; Summers, M D

    1987-07-01

    The nucleotide sequence of a trans-activating regulatory gene (IE-1) of the baculovirus Autographa californica nuclear polyhedrosis virus has been determined. This gene encodes a protein of 581 amino acids with a predicted molecular weight of 66,856. A DNA fragment containing the entire coding sequence of IE-1 was inserted downstream of an RNA promoter. Subsequent cell-free transcription and translation directed the synthesis of a single peptide with an apparent molecular weight of 70,000. Quantitative S1 nuclease analysis indicated that IE-1 was maximally synthesized during a 1-h virus adsorption period and that steady-state levels of IE-1 message were maintained during the first 24 h of infection. Northern blot hybridization indicated that several late transcripts which overlap the IE-1 gene were transcribed from both strands. The precise locations of the 5' and 3' ends of these overlapping transcripts were mapped using S1 nuclease. The overlapping transcripts were grouped in two transcriptional units. One unit was composed of IE-1 and overlapping gamma transcripts which initiated upstream of IE-1 and terminated downstream of IE-1. The other unit, transcribed from the opposite strand, consisted of gamma transcripts with coterminal 5' ends and extended 3' ends. The shorter, more abundant transcripts in this unit overlapped 30 to 40 bases of IE-1 at the 3' end, while the longer transcripts overlapped the entire IE-1 gene. Transcription of several early A. californica nuclear polyhedrosis virus genes, in addition to 39K, was shown to be trans-activated by IE-1, indicating that IE-1 may have a central role in the regulation of beta-gene expression. PMID:16789264

  12. araB Gene and nucleotide sequence of the araC gene of Erwinia carotovora.

    PubMed Central

    Lei, S P; Lin, H C; Heffernan, L; Wilcox, G

    1985-01-01

    The araB and araC genes of Erwinia carotovora were expressed in Escherichia coli and Salmonella typhimurium. The araB and araC genes in E. coli, E. carotovora, and S. typhimurium were transcribed in divergent directions. In E. carotovora, the araB and araC genes were separated by 3.5 kilobase pairs, whereas in E. coli and S. typhimurium they were separated by 147 base pairs. The nucleotide sequence of the E. carotovora araC gene was determined. The predicted sequence of AraC protein of E. carotovora was 18 and 29 amino acids longer than that of AraC protein of E. coli and S. typhimurium, respectively. The DNA sequence of the araC gene of E. carotovora was 58% homologous to that of E. coli and 59% homologous to that of S. typhimurium, with respect to the common region they share. The predicted amino acid sequence of AraC protein was 57% homologous to that of E. coli and 58% homologous to that of S. typhimurium. The 5' noncoding regions of the araB and araC genes of E. carotovora had little homology to either of the other two species. PMID:3902795

  13. Detection and sequence analysis of accessory gene regulator genes of Staphylococcus pseudintermedius isolates

    PubMed Central

    Chitra, M. Ananda; Jayanthy, C.; Nagarajan, B.

    2015-01-01

    Background: Staphylococcus pseudintermedius (SP) is the major pathogenic species of dogs involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr) locus of Staphylococcus aureus has been extensively studied, and it influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objective of this study was to detect and sequence-analyze the AgrA, B, and D of SP isolated from canine skin infections. Materials and Methods: In this study, we isolated and identified SP from canine pyoderma and otitis cases by polymerase chain reaction (PCR) and confirmed the identification by PCR-restriction fragment length polymorphism. Primers for SP agrA and agrBD genes were designed using online primer designing software and BLAST-searched for specificity. Amplification of the agr genes was carried out for 53 isolates of SP by PCR, and sequencing of agrA, B, and D was carried out for five isolates and analyzed using DNAstar and Mega5.2 software. Results: A total of 53 (59%) SP isolates were obtained from 90 samples. 15 isolates (28%) were confirmed to be methicillin-resistant SP (MRSP) with the detection of the mecA gene. Accessory gene regulator A, B, and D genes were detected in all the SP isolates. Complete nucleotide sequences of the above three genes for five isolates were submitted to GenBank, and their accession numbers are from KJ133557 to KJ133571. AgrA amino acid sequence analysis showed that it is mainly made of alpha-helices and is hydrophilic in nature. AgrB is a transmembrane protein, and AgrD encodes the precursor of the autoinducing peptide (AIP). Sequencing of the agrD gene revealed that the 5 canine SP strains tested could be divided into three Agr specificity groups (RIPTSTGFF, KIPTSTGFF, and RIPISTGFF) based on the putative AIP produced by each strain. The AIP of

  14. Technology development for gene discovery and full-length sequencing

    SciTech Connect

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  15. DNA sequence of a gene encoding a BALB/c mouse Ld transplantation antigen.

    PubMed

    Moore, K W; Sher, B T; Sun, Y H; Eakle, K A; Hood, L

    1982-02-01

    The sequence of a gene, denoted 27.5, encoding a transplantation antigen for the BALB/c mouse has been determined. Gene transfer studies and comparison of the translated sequence with the partial amino acid sequence of the Ld transplantation antigen establish that gene 27.5 encodes an Ld polypeptide. A comparison of the gene 27.5 sequence with several complementary DNA sequences suggests that the BALB/c mouse may contain a number of closely related L-like genes. Gene 27.5 has eight exons that correlate with the structural domains of the transplantation antigen. PMID:7058332

  16. Divergence of human α-chain constant region gene sequences: A novel recombinant α2 gene

    SciTech Connect

    Chintalacharuvu, K. R.; Morrison, S.L.; Raines, M.

    1994-06-01

    IgA is the major Ig synthesized in humans and provides the first line of defense at the mucosal surfaces. The constant region of IgA heavy chain is encoded by the α gene on chromosome 14. Previous studies have indicated the presence of two α genes, α1 and α2 existing in two allotypic forms, α2 m(1) and α2 m(2). Here the authors report the cloning and complete nucleotide sequence determination of a novel human α gene. Nucleotide sequence comparison with the published α sequences suggests that the gene arose as a consequence of recombination or gene conversion between the two α2 alleles. The authors have expressed the gene as a chimeric protein in myeloma cells indicating that it encodes a functional protein. The novel IgA resembles IgA2 m(2) in that disulfide bonds link H and L chains. This novel recombinant gene provides insights into the mechanisms of generation of different constant regions and suggests that within human populations, multiple alleles of α may be present providing IgAs of different structures.

  17. Next generation sequencing in predicting gene function in podophyllotoxin biosynthesis.

    PubMed

    Marques, Joaquim V; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A; May, Gregory D; Crow, John A; Davin, Laurence B; Lewis, Norman G

    2013-01-01

    Podophyllum species are sources of (-)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (-)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (-)-matairesinol into (-)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (-)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways.

  18. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis*

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  19. Cloning, sequencing, gene organization, and localization of the human ribosomal protein RPL23A gene

    SciTech Connect

    Fan, Wufang; Christensen, M.; Eichler, E.

    1997-12-01

    The intron-containing gene for human ribosomal protein RPL23A has been cloned, sequenced, and localized. The gene is approximately 4.0 kb in length and contains five exons and four introns. All splice sites exactly match the AG/GT consensus rule. The transcript is about 0.6 kb and is detected in all tissues examined. In adult tissues, the RPL23A transcript is dramatically more abundant in pancreas, skeletal muscle, and heart, while much less abundant in kidney, brain, placenta, lung, and liver. A full-length cDNA clone of 576 nt was identified, and the nucleotide sequence was found to match the exon sequence precisely. The open reading frame encodes a polypeptide of 156 amino acids, which is absolutely conserved with the rat RPL23A protein. In the 5' flanking region of the gene, a canonical TATA sequence and a defined CAAT box were found for the first time in a mammalian ribosomal protein gene. The intron-containing RPL23A gene was mapped to cytogenetic band 17q11 by fluorescence in situ hybridization. 33 refs., 4 figs.

  20. Intervening sequences in an Archaea DNA polymerase gene.

    PubMed

    Perler, F B; Comb, D G; Jack, W E; Moran, L S; Qiang, B; Kucera, R B; Benner, J; Slatko, B E; Nwankwo, D O; Hempstead, S K

    1992-06-15

    The DNA polymerase gene from the Archaea Thermococcus litoralis has been cloned and expressed in Escherichia coli. It is split by two intervening sequences (IVSs) that form one continuous open reading frame with the three polymerase exons. To our knowledge, neither IVS is similar to previously described introns. However, the deduced amino acid sequences of both IVSs are similar to open reading frames present in mobile group I introns. The second IVS (IVS2) encodes an endonuclease, I-Tli I, that cleaves at the exon 2-exon 3 junction after IVS2 has been deleted. IVS2 self-splices in E. coli to yield active polymerase, but processing is abolished if the IVS2 reading frame is disrupted. Silent changes in the DNA sequence at the exon 2-IVS2 junction that maintain the original protein sequence do not inhibit splicing. These data suggest that protein rather than mRNA splicing may be responsible for production of the mature polymerase. PMID:1608969

  1. A special-purpose processor for gene sequence analysis.

    PubMed

    Fagin, B; Watt, J G; Gross, R

    1993-04-01

    Advances in computational biology have occurred primarily in the areas of software and algorithm development; new designs of hardware to support biological computing are extremely scarce. This is due, we believe, to the presence of a non-trivial knowledge gap between molecular biologists and computer designers. The existence of this gap is unfortunate, as it has long been known that for certain problems, special-purpose computers can achieve significant cost/performance gains over general-purpose machines. We describe one such computer here: a custom accelerator for gene sequence analysis. The accelerator implements a version of the Needleman-Wunsch algorithm for nucleotide sequence alignment. Sequence lengths are constrained only by available memory; the product of sequence lengths in the current implementation can be up to 2^22. The machine is implemented as two NuBus boards connected to a Mac IIf/x, using a mixture of TTL and FPGA technology clocked at 10 MHz. The boards are completely functional, and yield a 15-fold performance improvement over an unassisted host.
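
    The record above names the Needleman-Wunsch algorithm but does not describe the accelerator's scoring scheme. As a minimal software sketch only, the following computes a global alignment score by dynamic programming. The function name and the match, mismatch, and gap values are illustrative assumptions and are not taken from the paper.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative defaults (an assumption),
    not the parameters used by the hardware described in the record.
    """
    # prev[j] is the best score for aligning the first i-1 symbols of a
    # with the first j symbols of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

    The full dynamic-programming table has one cell per pair of positions, so the work grows with the product of the two sequence lengths, which is why the record states its capacity limit in those terms. This sketch keeps only two rows in memory because it returns the score without a traceback.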

  2. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  3. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.

  4. Excision of plastid marker genes using directly repeated DNA sequences.

    PubMed

    Mudd, Elisabeth A; Madesis, Panagiotis; Avila, Elena Martin; Day, Anil

    2014-01-01

    Excision of marker genes using DNA direct repeats makes use of the predominant homologous recombination pathways present in the plastids of algae and plants. The method is simple, efficient, and widely applicable to plants and microalgae. Marker excision frequency is dependent on the length and number of directly repeated sequences. When two repeats are used a repeat size of greater than 600 bp promotes efficient excision of the marker gene. A wide variety of sequences can be used to make the direct repeats. Only a single round of transformation is required, and there is no requirement to introduce site-specific recombinases by retransformation or sexual crosses. Selection is used to maintain the marker and ensure homoplasmy of transgenic plastid genomes. Release of selection allows the accumulation of marker-free plastid genomes generated by marker excision, which is spontaneous, random, and a unidirectional process. Positive selection is provided by linking marker excision to restoration of the coding region of an herbicide resistance gene from two overlapping but incomplete coding regions. Cytoplasmic sorting allows the segregation of cells with marker-free transgenic plastids. The marker-free shoots resulting from direct repeat-mediated excision of marker genes have been isolated by vegetative propagation of shoots in the T0 generation. Alternatively, accumulation of marker-free plastid genomes during growth, development and flowering of T0 plants allows the collection of seeds that give rise to a high proportion of marker-free T1 seedlings. The simplicity and convenience of direct repeat excision facilitates its widespread use to isolate marker-free crops. PMID:24599849

  5. Modeling DNA sequence-based cis-regulatory gene networks.

    PubMed

    Bolouri, Hamid; Davidson, Eric H

    2002-06-01

    Gene network analysis requires computationally based models which represent the functional architecture of regulatory interactions, and which provide directly testable predictions. The type of model that is useful is constrained by the particular features of developmentally active cis-regulatory systems. These systems function by processing diverse regulatory inputs, generating novel regulatory outputs. A computational model which explicitly accommodates this basic concept was developed earlier for the cis-regulatory system of the endo16 gene of the sea urchin. This model represents the genetically mandated logic functions that the system executes, but also shows how time-varying kinetic inputs are processed in different circumstances into particular kinetic outputs. The same basic design features can be utilized to construct models that connect the large number of cis-regulatory elements constituting developmental gene networks. The ultimate aim of the network models discussed here is to represent the regulatory relationships among the genomic control systems of the genes in the network, and to state their functional meaning. The target site sequences of the cis-regulatory elements of these genes constitute the physical basis of the network architecture. Useful models for developmental regulatory networks must represent the genetic logic by which the system operates, but must also be capable of explaining the real time dynamics of cis-regulatory response as kinetic input and output data become available. Most importantly, however, such models must display in a direct and transparent manner fundamental network design features such as intra- and intercellular feedback circuitry; the sources of parallel inputs into each cis-regulatory element; gene battery organization; and use of repressive spatial inputs in specification and boundary formation. Successful network models lead to direct tests of key architectural features by targeted cis-regulatory analysis. PMID

  6. Solving the molecular diagnostic testing conundrum for Mendelian disorders in the era of next-generation sequencing: single-gene, gene panel, or exome/genome sequencing.

    PubMed

    Xue, Yuan; Ankala, Arunkanth; Wilcox, William R; Hegde, Madhuri R

    2015-06-01

    Next-generation sequencing is changing the paradigm of clinical genetic testing. Today there are numerous molecular tests available, including single-gene tests, gene panels, and exome sequencing or genome sequencing. As a result, ordering physicians face the conundrum of selecting the best diagnostic tool for their patients with genetic conditions. Single-gene testing is often most appropriate for conditions with distinctive clinical features and minimal locus heterogeneity. Next-generation sequencing-based gene panel testing, which can be complemented with array comparative genomic hybridization and other ancillary methods, provides a comprehensive and feasible approach for heterogeneous disorders. Exome sequencing and genome sequencing have the advantage of being unbiased regarding what set of genes is analyzed, enabling parallel interrogation of most of the genes in the human genome. However, current limitations of next-generation sequencing technology and our variant interpretation capabilities caution us against offering exome sequencing or genome sequencing as either stand-alone or first-choice diagnostic approaches. A growing interest in personalized medicine calls for the application of genome sequencing in clinical diagnostics, but major challenges must be addressed before its full potential can be realized. Here, we propose a testing algorithm to help clinicians opt for the most appropriate molecular diagnostic tool for each scenario.

  7. Microdiversity of extracellular enzyme genes among sequenced prokaryotic genomes

    PubMed Central

    Zimmerman, Amy E; Martiny, Adam C; Allison, Steven D

    2013-01-01

    Understanding the relationship between prokaryotic traits and phylogeny is important for predicting and modeling ecological processes. Microbial extracellular enzymes have a pivotal role in nutrient cycling and the decomposition of organic matter, yet little is known about the phylogenetic distribution of genes encoding these enzymes. In this study, we analyzed 3058 annotated prokaryotic genomes to determine which taxa have the genetic potential to produce alkaline phosphatase, chitinase and β-N-acetyl-glucosaminidase enzymes. We then evaluated the relationship between the genetic potential for enzyme production and 16S rRNA phylogeny using the consenTRAIT algorithm, which calculated the phylogenetic depth and corresponding 16S rRNA sequence identity of clades of potential enzyme producers. Nearly half (49.2%) of the genomes analyzed were found to be capable of extracellular enzyme production, and these were non-randomly distributed across most prokaryotic phyla. On average, clades of potential enzyme-producing organisms had a maximum phylogenetic depth of 0.008004–0.009780, though individual clades varied broadly in both size and depth. These values correspond to a minimum 16S rRNA sequence identity of 98.04–98.40%. The distribution pattern we found is an indication of microdiversity, the occurrence of ecologically or physiologically distinct populations within phylogenetically related groups. Additionally, we found positive correlations among the genes encoding different extracellular enzymes. Our results suggest that the capacity to produce extracellular enzymes varies at relatively fine-scale phylogenetic resolution. This variation is consistent with other traits that require a small number of genes and provides insight into the relationship between taxonomy and traits that may be useful for predicting ecological function. PMID:23303371
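
    The two ranges quoted in the record above (maximum phylogenetic depth of 0.008004–0.009780 and minimum 16S rRNA identity of 98.04–98.40%) are numerically consistent with a simple conversion in which the distance between two tips of a clade is twice the clade depth. That relationship is an assumption inferred from the reported numbers, not a formula stated in the abstract, and the function name below is hypothetical. A short arithmetic check:

```python
def min_identity_percent(depth: float) -> float:
    """Convert a clade's maximum phylogenetic depth to a minimum percent identity.

    Assumption (not stated in the abstract): two tips of a clade are
    separated by at most twice the clade depth, and identity is
    one minus that distance.
    """
    return 100.0 * (1.0 - 2.0 * depth)


if __name__ == "__main__":
    for depth in (0.008004, 0.009780):
        print(f"depth {depth:.6f} -> {min_identity_percent(depth):.2f}% identity")
```

    Under this assumption the shallower depth gives the higher identity bound and the deeper one gives the lower bound, which matches the ordering of the two ranges in the record.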

  8. Identification of novel hereditary cancer genes by whole exome sequencing.

    PubMed

    Sokolenko, Anna P; Suspitsin, Evgeny N; Kuligina, Ekatherina Sh; Bizin, Ilya V; Frishman, Dmitrij; Imyanitov, Evgeny N

    2015-12-28

    Whole exome sequencing (WES) provides a powerful tool for medical genetic research. Several dozen WES studies involving patients with hereditary cancer syndromes have already been reported. WES has led to breakthroughs in understanding the genetic basis of some exceptionally rare syndromes; for example, identification of germ-line SMARCA4 mutations in patients with ovarian hypercalcemic small cell carcinomas indeed explains a noticeable share of familial aggregation of this disease. However, studies on common cancer types turned out to be more difficult. In particular, there are almost a dozen reports describing WES analysis of breast cancer patients, but none of them has yet succeeded in revealing a gene responsible for a significant share of missing heritability. Virtually all components of WES studies require substantial improvement, e.g. technical performance of WES, interpretation of WES results, mode of patient selection, etc. Most contemporary investigations focus on genes with an autosomal dominant mechanism of inheritance; however, recessive and oligogenic models of transmission of cancer susceptibility also need to be considered. It is expected that the list of medically relevant tumor-predisposing genes will expand rapidly in the next few years. PMID:26427841

  9. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
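
    The record above reports an N50 contig length alongside the mean length. For readers unfamiliar with the statistic, here is a small sketch of the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are illustrative; the paper's contig lengths are not reproduced here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the cumulative length first reaches at least
    half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one contig length")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths
    print(n50(toy_contigs))
```

    Because N50 is weighted toward long contigs, it is typically larger than the mean contig length, as is the case for the two figures quoted in the record.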

  10. Analysis on the preference for sequence matching between mRNA sequences and the corresponding introns in ribosomal protein genes.

    PubMed

    Zhang, Qiang; Li, Hong; Zhao, Xiaoqing; Zheng, Yan; Meng, Hu; Jia, Yun; Xue, Hui; Bo, Sulin

    2016-03-01

    Introns continue to play an important role after splicing. They can contribute to gene expression and regulation by interacting with the corresponding mRNA sequences. Using local alignment based on the Smith-Waterman method, we obtained the optimal matched segments between intron sequences and mRNA sequences. Analyzing the distribution of the optimal matching regions on the mRNA sequences of ribosomal protein genes from 27 species, we find a strong interaction between UTR sequences and introns. There are many optimal matching regions as well as low-matching ones, and the latter are presumed to be the binding regions of protein complexes. The optimal matching frequency distributions differ markedly near mRNA functional sites such as translation initiation and termination sites, exon-exon junctions, and EJC regions. This finding indicates that intron sequences and mature mRNA sequences have co-evolved and interact to perform their functions. PMID:26707402
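
    The Smith-Waterman local alignment mentioned in this record can be sketched as follows. This is a generic textbook version, not the authors' implementation: the scoring values (match, mismatch, gap) are assumptions, a simple linear gap penalty is used, and sequences are compared by identity rather than by base-pairing complementarity.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring parameters are illustrative assumptions. Cells never drop
    below zero, which is what makes the alignment local.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    seg_a, seg_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            seg_a.append(a[i - 1])
            seg_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            seg_a.append(a[i - 1])
            seg_b.append("-")
            i -= 1
        else:
            seg_a.append("-")
            seg_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(seg_a)), "".join(reversed(seg_b))


# Toy sequences (hypothetical), sharing only the segment "ACGT".
print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))
```

    To study intron-mRNA pairing rather than sequence identity, one of the two inputs would presumably be reverse-complemented first; the abstract does not specify this detail.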

  11. Sequencing, characterization, and gene expression analysis of the histidine decarboxylase gene cluster of Morganella morganii.

    PubMed

    Ferrario, Chiara; Borgo, Francesca; de Las Rivas, Blanca; Muñoz, Rosario; Ricci, Giovanni; Fortina, Maria Grazia

    2014-03-01

    The histidine decarboxylase gene cluster of Morganella morganii DSM30146(T) was sequenced, and four open reading frames, named hdcT1, hdc, hdcT2, and hisRS, were identified. Two putative histidine/histamine antiporters (hdcT1 and hdcT2) were located upstream and downstream of the hdc gene, which encodes a pyridoxal-P-dependent histidine decarboxylase, and were followed by the hisRS gene, which encodes a histidyl-tRNA synthetase. This organization was comparable with the gene clusters of other known Gram-negative bacteria, particularly that of Klebsiella oxytoca. Recombinant Escherichia coli strains harboring plasmids carrying the M. morganii hdc gene were shown to overproduce histidine decarboxylase after IPTG induction at 37 °C for 4 h. Quantitative RT-PCR experiments revealed that the hdc and hisRS genes were highly induced under acidic and histidine-rich conditions. This work represents the first description and identification of the hdc-related genes in M. morganii. The results support the hypothesis that the histidine decarboxylation reaction in this prolific histamine-producing species may play a role in acid survival. Knowledge of the role and regulation of the genes involved in histidine decarboxylation should improve the design of rational strategies to avoid toxic histamine production in foods.

  12. Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The hemoglobin-beta gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed and amplified. The sequence of the channel catfish hemoglobin-beta gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...

  13. Next generation sequencing in synovial sarcoma reveals novel gene mutations

    PubMed Central

    Vlenterie, Myrella; Hillebrandt-Roeffen, Melissa H.S.; Flucke, Uta E.; Groenen, Patricia J.T.A.; Tops, Bastiaan B.J.; Kamping, Eveline J.; Pfundt, Rolph; de Bruijn, Diederik R.H.; van Kessel, Ad H.M. Geurts; van Krieken, Han J.H.J.M.; van der Graaf, Winette T.A.; Versleijen-Jonkers, Yvonne M.H.

    2015-01-01

    Over 95% of all synovial sarcomas (SS) share a unique translocation, t(X;18); however, they show heterogeneous clinical behavior. We analyzed multiple SS to reveal additional genetic alterations besides the translocation. Twenty-six SS from 22 patients were sequenced for 409 cancer-related genes using the Comprehensive Cancer Panel (Life Technologies, USA) on an Ion Torrent platform. The detected variants were verified by Sanger sequencing and compared to matched normal DNAs. Copy number variation was assessed in six tumors using the Oncoscan array (Affymetrix, USA). In total, eight somatic mutations were detected in eight samples. These mutations have not been reported previously in SS. Two of these, in KRAS and CCND1, represent known oncogenic mutations in other malignancies. Additional mutations were detected in RNF213, SEPT9, KDR, CSMD3, MLH1 and ERBB4. DNA alterations occurred more often in adult tumors. A distinctive loss of 6q was found in a metastatic lesion progressing under pazopanib, but not in the responding lesion. Our results emphasize t(X;18) as a single initiating event in SS and as the main oncogenic driver. Our results also show that additional genetic events, either mutations or chromosomal aberrations, occur more frequently in SS with an onset in adults. PMID:26415226

  14. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    PubMed

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-06-20

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc.

  15. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    PubMed

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc. PMID:27322403

  16. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.
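
    The criterion of "perfect sequence complementarity of at least 21 nucleotides" can be made concrete with a small sketch. It is a hypothetical illustration, not the study's pipeline: the function names and the toy sequences are assumptions, only canonical Watson-Crick pairs in the RNA alphabet are considered, and a single strand is scanned.

```python
# Translation table for canonical RNA base pairs (A-U, C-G).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def perfect_complementary_sites(dsrna_strand, target_mrna, k=21):
    """Return start positions in target_mrna of k-mers that are perfectly
    complementary to some k-mer of dsrna_strand.

    A window of the target is a hit when its reverse complement occurs
    verbatim in the dsRNA strand.
    """
    kmers = {dsrna_strand[i:i + k] for i in range(len(dsrna_strand) - k + 1)}
    hits = []
    for j in range(len(target_mrna) - k + 1):
        if reverse_complement(target_mrna[j:j + k]) in kmers:
            hits.append(j)
    return hits


# Toy example: a made-up 21-nt site embedded in a made-up target.
site = "AUGGCUAGCUUAGGCAUCGAU"
strand = reverse_complement(site)
print(perfect_complementary_sites(strand, "GG" + site + "CC"))
```

    Storing the k-mers of one sequence in a set keeps the scan linear in the target length, which matters when whole transcriptomes are compared.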

  17. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    SciTech Connect

    Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J. R.; Amaral-Zettler, L.; Gilbert, J. A.

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences - the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  18. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    PubMed

    Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; San Gil, Inigo; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  19. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    PubMed Central

    Jenior, Matthew L.; Koumpouras, Charles C.; Westcott, Sarah L.; Highlander, Sarah K.

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806

  20. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    PubMed

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806
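
    The error rates quoted in this record (0.69% before curation, 0.027% after) can be put in context with a short sketch. The per-base error rate below is computed in the simplest possible way, as mismatching columns over aligned columns against a known mock-community reference; the function name and the toy alignments are hypothetical, and the study's actual curation method is not reproduced here.

```python
def error_rate(aligned_pairs):
    """Return the per-base error rate over (read, reference) pairs.

    Each pair must already be aligned to equal length, with '-' marking
    gaps. Every column where read and reference differ counts as one
    error (substitution, insertion, or deletion).
    """
    errors = 0
    total = 0
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("aligned sequences must have equal length")
        errors += sum(1 for r, t in zip(read, ref) if r != t)
        total += len(ref)
    return errors / total


# Toy alignments (made up): 2 differing columns out of 8.
toy = [("ACGT", "ACGA"), ("AC-T", "ACGT")]
print(error_rate(toy))

# Fold reduction implied by the percentages quoted in the abstract.
fold_reduction = 0.69 / 0.027
print(round(fold_reduction, 1))
```

    On the quoted figures, the curation step corresponds to roughly a 25-fold reduction in the observed error rate.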

  2. Genome-wide gene-gene interaction analysis for next-generation sequencing.

    PubMed

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-03-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study.
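
    The multiple-testing burden described in this record, and the benefit of moving from SNP pairs to gene pairs, can be illustrated with simple arithmetic. The SNP and gene counts below are hypothetical round numbers chosen for illustration, not values from the study; the Bonferroni threshold is the standard significance level divided by the number of tests.

```python
from math import comb


def n_pairwise_tests(n_units):
    """Number of unordered pairs among n_units (SNPs or genes)."""
    return comb(n_units, 2)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical scales: one million SNPs versus twenty thousand genes.
snp_tests = n_pairwise_tests(1_000_000)
gene_tests = n_pairwise_tests(20_000)

print(snp_tests, bonferroni_threshold(0.05, snp_tests))
print(gene_tests, bonferroni_threshold(0.05, gene_tests))
```

    Under these assumed counts, testing gene pairs instead of SNP pairs cuts the number of tests by a factor of about 2,500, which relaxes the corrected threshold by the same factor.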

  3. Intervening sequence of DNA identified in the structural portion of a mouse beta-globin gene.

    PubMed Central

    Tilghman, S M; Tiemeier, D C; Seidman, J G; Peterlin, B M; Sullivan, M; Maizel, J V; Leder, P

    1978-01-01

    The unusual electron microscopic appearance of a hybrid formed between 9S mouse beta-globin mRNA and its corresponding cloned gene segment is caused by at least one, and possibly two, intervening sequences of DNA that interrupt the mouse beta-globin gene. Such an interpretation is consistent with a paradoxical restriction site pattern previously noted in this gene and with the nucleotide sequence of that portion of the gene that spans both structural and intervening sequences. The large intervening sequence, approximately 550 base pairs in length, occurs in the structural globin sequence and immediately follows the beta-globin codon corresponding to amino acid 104. A smaller, putative intervening sequence is located close to the 5' end of the beta-globin-coding sequence but may reside beyond its initiation codon. The beta-globin gene thus appears to be encoded in two, and possibly three, discontinuous segments. PMID:273235

  4. Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition

    PubMed Central

    Ulpinnis, Chris; Scholz, Uwe; Altmann, Thomas

    2015-01-01

    A major goal of maize genomic research is to identify sequence polymorphisms responsible for phenotypic variation in traits of economic importance. Large-scale detection of sequence variation is critical for linking genes, or genomic regions, to phenotypes. However, due to its size and complexity, it remains expensive to generate whole genome sequences of sufficient coverage for divergent maize lines, even with access to next generation sequencing (NGS) technology. Because methods involving reduction of genome complexity, such as genotyping-by-sequencing (GBS), assess only a limited fraction of sequence variation, targeted sequencing of selected genomic loci offers an attractive alternative. We therefore designed a sequence capture assay to target 29 Mb genomic regions and surveyed a total of 4,648 genes possibly affecting biomass production in 21 diverse inbred maize lines (7 flints, 14 dents). Captured and enriched genomic DNA was sequenced using the 454 NGS platform to 19.6-fold average depth coverage, and a broad evaluation of read alignment and variant calling methods was performed to select optimal procedures for variant discovery. Sequence alignment with the B73 reference and de novo assembly identified 383,145 putative single nucleotide polymorphisms (SNPs), of which 42,685 were non-synonymous alterations and 7,139 caused frameshifts. Presence/absence variation (PAV) of genes was also detected. We found that substantial sequence variation exists among genomic regions targeted in this study, which was particularly evident within coding regions. This diversification has the potential to broaden functional diversity and generate phenotypic variation that may lead to new adaptations and the modification of important agronomic traits. Further, annotated SNPs identified here will serve as useful genetic tools and as candidates in searches for phenotype-altering DNA variation. 
    In summary, we demonstrated that sequencing of captured DNA is a powerful approach for...

  5. Sequence of the dog immunoglobulin alpha and epsilon constant region genes

    SciTech Connect

    Patel, M.; Selinger, D.; Mark, G.E.; Hollis, G.F.; Hickey, G.J.

    1995-03-01

    The immunoglobulin alpha (IGHAC) and epsilon (IGHEC) germline constant region genes were isolated from a dog liver genomic DNA library. Sequence analysis indicates that the dog IGHEC gene is encoded by four exons spread out over 1.7 kilobases (kb). The IGHAC sequence encompasses 1.5 kb and includes all three constant region coding exons. The complete exon/intron sequence of these genes is described. 28 refs., 2 figs., 2 tabs.

  6. Cloning and sequencing of the alcohol dehydrogenase II gene from Zymomonas mobilis

    DOEpatents

    Ingram, Lonnie O.; Conway, Tyrrell

    1992-01-01

    The alcohol dehydrogenase II gene from Zymomonas mobilis has been cloned and sequenced. This gene can be expressed at high levels in other organisms to produce acetaldehyde or to convert acetaldehyde to ethanol.

  7. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

    PubMed Central

    Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

    2012-01-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244

  8. Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations.

    PubMed

    Martin, Dorrelyn P; Miya, Jharna; Reeser, Julie W; Roychowdhury, Sameek

    2016-01-01

    RNA sequencing (RNAseq) is a versatile method that can be utilized to detect and characterize gene expression, mutations, gene fusions, and noncoding RNAs. Standard RNAseq requires 30 - 100 million sequencing reads and can include multiple RNA products such as mRNA and noncoding RNAs. We demonstrate how targeted RNAseq (capture) permits a focused study on selected RNA products using a desktop sequencer. RNAseq capture can characterize unannotated, low, or transiently expressed transcripts that may otherwise be missed using traditional RNAseq methods. Here we describe the extraction of RNA from cell lines, ribosomal RNA depletion, cDNA synthesis, preparation of barcoded libraries, hybridization and capture of targeted transcripts and multiplex sequencing on a desktop sequencer. We also outline the computational analysis pipeline, which includes quality control assessment, alignment, fusion detection, gene expression quantification and identification of single nucleotide variants. This assay allows for targeted transcript sequencing to characterize gene expression, gene fusions, and mutations. PMID:27585245

  9. Short DNA sequences inserted for gene targeting can accidentally interfere with off-target gene expression.

    PubMed

    Meier, Ingo D; Bernreuther, Christian; Tilling, Thomas; Neidhardt, John; Wong, Yong Wee; Schulze, Christian; Streichert, Thomas; Schachner, Melitta

    2010-06-01

    Targeting of genes in mice, a key approach to study development and disease, often leaves a neo cassette, loxP, or FRT sites inserted in the mouse genome. Insertion of neo can influence the expression of neighboring genes, but similar effects have not been reported for loxP sites. We therefore performed microarray analyses of mice in which the Ncam or the Tnr gene were targeted either by insertion of neo or loxP/FRT sites. In the case of Ncam, neo, but not loxP/FRT insertion, led to a 2-fold reduction in mRNA levels of 3 genes located at distances between 0.2 and 3.1 Mb from the target. In contrast, after introduction of loxP/FRT sites into introns of Tnr, we observed a 2.5- to 4-fold reduction in the transcript level of the Gas5 gene, 1.1 Mb away from Tnr, most probably due to disruption of a conserved regulatory element in Tnr. Insertion of short DNA sequences such as loxP/FRT can thus influence off-target mRNA levels if these sites are accidentally placed into regulatory elements. Our results imply that conditional knockout mice should be analyzed for genomic positional side effects that may influence the animals' phenotypes. PMID:20110269

  10. Gene sequence, localization, and evolutionary conservation of DAZLA, a candidate male sterility gene.

    PubMed

    Seboun, E; Barbaux, S; Bourgeron, T; Nishi, S; Agulnik, A; Egashira, M; Nikkawa, N; Bishop, C; Fellous, M; McElreavey, K; Kasahara, M; Algonik, A

    1997-04-15

    We have isolated the human homologue of the mouse germ cell-specific transcript Tpx2, which we had previously mapped to mouse chromosome 17. Sequence analysis shows that the human gene is part of the DAZ (Deleted in Azoospermia) family, represents the human homologue of the mouse Dazla and Drosophila boule genes, and is termed DAZLA. Like Dazla and boule, DAZLA is single copy and maps to 3p25. This defines a new region of synteny between mouse chromosome 17 and human chromosome 3. Unlike DAZ, which has multiple DAZ repeats, DAZLA encodes a putative RNA-binding protein with a single RNA-binding motif and a single DAZ repeat. DAZLA is more closely related to Dazla in the mouse than to the Y-linked homologue DAZ (88% identity overall with mouse Dazla compared to 76% identity with the human DAZ protein sequence). Southern blot analysis showed that DAZLA is autosomal in all mammals tested and that DAZ has been recently translocated to the Y chromosome, sometime after the divergence of Old World and New World primates. To investigate the evolutionary relatedness of DAZLA and DAZ further, their partial genomic structures were obtained and compared. This revealed that the genomic organization of both genes in the 5' region is highly conserved. DAZLA is a new member of the DAZ family of genes, which is associated with spermatogenesis and male sterility. Familial cases of male infertility in humans show an autosomal recessive mode of inheritance. It is possible that some of these families may carry mutations in the DAZLA gene.
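
    The identity figures in this record (88% with mouse Dazla, 76% with human DAZ) are percent-identity values between aligned protein sequences. A minimal sketch of one common way to compute such a value follows; the abstract does not state which denominator was used, so the choice here (columns where neither sequence has a gap) is an assumption, and the toy peptides are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Return percent identity between two pre-aligned sequences.

    Columns containing a gap ('-') in either sequence are ignored, so
    the denominator is the number of residue-to-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b)
                if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable columns")
    matches = sum(1 for x, y in compared if x == y)
    return 100 * matches / len(compared)


# Toy aligned peptides (hypothetical): 4 identical residues out of 5 compared.
print(percent_identity("MKV-LA", "MKIALA"))
```

    Other conventions divide by the full alignment length or by the shorter sequence, which can shift the reported percentage by several points.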

  11. Towards Experimental Annotation of Genes by High Throughput Sequencing

    SciTech Connect

    Bradbury, Andrew

    2010-06-03

    Andrew Bradbury of Los Alamos National Laboratory discusses turning annotation into a sequencing pipeline on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  12. Tetrathiobacter kashmirensis Strain CA-1 16S rRNA gene complete sequence.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This study used 1326 base pair 16S rRNA gene sequence methods to confirm the identification of a bacterium as Tetrathiobacter kashmirensis. Morphological, biochemical characteristics, and fatty acid profiles are consistent with the 16S rRNA gene sequence identification of the bacterium. The isolate...

  13. Cloning and nucleotide sequence of the aroA gene of Bordetella pertussis.

    PubMed Central

    Maskell, D J; Morrissey, P; Dougan, G

    1988-01-01

    The aroA locus of Bordetella pertussis, encoding 5-enolpyruvylshikimate 3-phosphate synthase, has been cloned into Escherichia coli by using a cosmid vector. The gene is expressed in E. coli and complemented an E. coli aroA mutant. The nucleotide sequence of the B. pertussis aroA gene was determined and contains an open reading frame encoding 442 amino acids, with a calculated molecular weight for 5-enolpyruvylshikimate 3-phosphate synthase of 46,688. The amino acid sequence derived from the nucleotide sequence shows homology with the published amino acid sequences of aroA gene products of other microorganisms. PMID:2897356

  14. Sequences and evolution of human and squirrel monkey blue opsin genes.

    PubMed

    Shimmin, L C; Mai, P; Li, W H

    1997-04-01

    The sequences of the entire blue opsin gene in the squirrel monkey (Saimiri boliviensis) and the five introns of the human blue opsin gene were obtained. Intron 3 of these genes contains an Alu sequence and intron 4 contains a partial mer13 sequence. A comparison of the squirrel monkey opsin sequence with published mammalian opsin sequences shows that features believed to be functionally critical are all conserved. However, the blue opsin has evolved twice as fast as rhodopsin and is only as conservative as the beta globin, which has evolved at the average rate of mammalian proteins. Interestingly, the interhelical loops are, on average, actually more conservative than the transmembrane alpha helical regions. The introns of the blue opsin gene have evolved at the average rate of introns in primate genes.

  15. Nucleotide sequence of the alpha-amylase-pullulanase gene from Clostridium thermohydrosulfuricum.

    PubMed

    Melasniemi, H; Paloheimo, M; Hemiö, L

    1990-03-01

    The nucleotide sequence of the gene (apu) encoding the thermostable alpha-amylase-pullulanase of Clostridium thermohydrosulfuricum was determined. An open reading frame of 4425 bp was present. The deduced polypeptide (Mr 165,600), including a 31 amino acid putative signal sequence, comprised 1475 amino acids, with no cysteine residues. The structural gene was preceded by the consensus promoter sequence TTGACA TATAAT, a putative regulatory sequence and a putative ribosome-binding sequence AAAGGGGG. The codon usage resembled that of Bacillus genes. The deduced sequence of the mature apu product showed similarities to various amylolytic enzymes, especially the neopullulanase of Bacillus stearothermophilus, whereas the signal sequence showed similarity to those of the alpha-amylases of B. stearothermophilus and B. subtilis. Three regions thought to be highly conserved in the primary structure of alpha-amylases could also be distinguished in the apu product, two being partly 'duplicated' in this alpha-1,4/alpha-1,6-active enzyme.
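
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, the numbers are copied from the abstract, and it assumes that the putative signal sequence is removed to give the mature product. The abstract does not say whether the stop codon is counted in the 4425 bp.

```python
# Values copied from the abstract; names are illustrative, not from the paper.
ORF_BP = 4425            # open reading frame length in base pairs
SIGNAL_PEPTIDE_AA = 31   # putative signal sequence length in amino acids
REPORTED_AA = 1475       # deduced polypeptide length in amino acids


def codons_in(orf_bp: int) -> int:
    """Return the number of complete codons; reject lengths not divisible by 3."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_bp // 3


codons = codons_in(ORF_BP)
# Assumption: the mature product is the deduced polypeptide minus the signal sequence.
mature_aa = REPORTED_AA - SIGNAL_PEPTIDE_AA
print(codons, mature_aa)
```

    Under these assumptions the codon count matches the reported polypeptide length, and the mature product would comprise 1444 residues.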

  16. [Analysis of full-length gene sequence of rabies vaccine virus aG strain].

    PubMed

    Li, Jia; Cao, Shou-Chun; Shi, Lei-Tai; Wu, Xiao-Hong; Liu, Jing-Hua; Wang, Yun-Peng; Tang, Jian-Rong; Yu, Yong-Xin; Dong, Guan-Mu

    2013-06-01

    The aim was to sequence and analyze the full-length gene sequence of rabies vaccine virus aG strain. The full-length gene sequence of the aG strain was amplified by RT-PCR in 8 fragments; each PCR product was cloned into the vector pGEM-T, sequenced, and assembled. The 5' leader sequence was determined by 5' RACE. The homology between aG and other rabies vaccine viruses was analyzed using DNAstar and Mega 4.0 software. The aG strain was 11 925 nt in length (GenBank accession number: JN234411) and belonged to genotype I. Bioinformatic analysis revealed that the degree of homology differed among the rabies vaccine viruses compared. The full-length gene sequence of rabies vaccine virus aG strain provides support for improving the quality-control standard for virus strains used to produce rabies vaccine for human use in China.

  17. Complete Sequence Construction of the Highly Repetitive Ribosomal RNA Gene Repeats in Eukaryotes Using Whole Genome Sequence Data.

    PubMed

    Agrawal, Saumya; Ganley, Austen R D

    2016-01-01

    The ribosomal RNA genes (rDNA) encode the major rRNA species of the ribosome, and thus are essential across life. These genes are highly repetitive in most eukaryotes, forming blocks of tandem repeats that form the core of nucleoli. The primary role of the rDNA in encoding rRNA has been long understood, but more recently the rDNA has been implicated in a number of other important biological phenomena, including genome stability, cell cycle, and epigenetic silencing. Noncoding elements, primarily located in the intergenic spacer region, appear to mediate many of these phenomena. Although sequence information is available for the genomes of many organisms, in almost all cases rDNA repeat sequences are lacking, primarily due to problems in assembling these intriguing regions during whole genome assemblies. Here, we present a method to obtain complete rDNA repeat unit sequences from whole genome assemblies. Limitations of next generation sequencing (NGS) data make them unsuitable for assembling complete rDNA unit sequences; therefore, the method we present relies on the use of Sanger whole genome sequence data. Our method makes use of the Arachne assembler, which can assemble highly repetitive regions such as the rDNA in a memory-efficient way. We provide a detailed step-by-step protocol for generating rDNA sequences from whole genome Sanger sequence data using Arachne, for refining complete rDNA unit sequences, and for validating the sequences obtained. In principle, our method will work for any species where the rDNA is organized into tandem repeats. This will help researchers working on species without a complete rDNA sequence, those working on evolutionary aspects of the rDNA, and those interested in conducting phylogenetic footprinting studies with the rDNA. PMID:27576718

  18. Functional analysis and nucleotide sequence of the promoter region of the murine hck gene.

    PubMed Central

    Lock, P; Stanley, E; Holtzman, D A; Dunn, A R

    1990-01-01

    The structure and function of the promoter region and exon 1 of the murine hck gene have been characterized in detail. RNase protection analysis has established that hck transcripts initiate from heterogeneous start sites located within the hck gene. Fusion gene constructs containing hck 5'-flanking sequences and the bacterial Neor gene have been introduced into the hematopoietic cell lines FDC-P1 and WEHI-265 by using a self-inactivating retroviral vector. The transcriptional start sites of the fusion gene are essentially identical to those of the endogenous hck gene. Analysis of infected WEHI-265 cell lines treated with bacterial lipopolysaccharide (LPS) reveals a 3- to 5-fold elevation in the levels of endogenous hck mRNA and a 1.4- to 2.6-fold increase in the level of Neor fusion gene transcripts, indicating that hck 5'-flanking sequences are capable of conferring LPS responsiveness on the Neor gene. The 5'-flanking region of the hck gene contains sequences similar to an element which is thought to be involved in the LPS responsiveness of the class II major histocompatibility gene A alpha k. A subset of these sequences are also found in the 5'-flanking regions of other LPS-responsive genes. Moreover, this motif is related to the consensus binding sequence of NF-kappa B, a transcription factor which is known to be regulated by LPS. PMID:2388619

  19. Complete nucleotide sequence of a 16S ribosomal RNA gene from Escherichia coli.

    PubMed Central

    Brosius, J; Palmer, M L; Kennedy, P J; Noller, H F

    1978-01-01

    The complete nucleotide sequence of the 16S RNA gene from the rrnB cistron of Escherichia coli has been determined by using three rapid DNA sequencing methods. Nearly all of the structure has been confirmed by two to six independent sequence determinations on both DNA strands. The length of the 16S rRNA chain inferred from the DNA sequence is 1541 nucleotides, in close agreement with previous estimates. We note discrepancies between this sequence and the most recent version of it reported from direct RNA sequencing [Ehresmann, C., Stiegler, P., Carbon, P. & Ebel, J.P. (1977) FEBS Lett. 84, 337-341]. A few of these may be explained by heterogeneity among 16S rRNA sequences from different cistrons. No nucleotide sequences were found in the 16S rRNA gene that cannot be reconciled with RNase digestion products of mature 16S rRNA. PMID:368799

  20. Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi.

    PubMed

    Waterhouse, Robert M; Zdobnov, Evgeny M; Kriventseva, Evgenia V

    2011-01-01

    Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multicopy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features, it also confirms that essential genes are highly retained and conclusively supports the "knockout-rate prediction" of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multicopy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes. The previously underappreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of "single-copy control" versus "multicopy license" may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space.

  1. Contiguous Genomic DNA Sequence Comprising the 19-kD Zein Gene Family from Maize

    PubMed Central

    Song, Rentao; Messing, Joachim

    2002-01-01

    A new approach has been undertaken to analyze the sequences and linear organization of the 19-kD zein genes in maize (Zea mays). A high-coverage, large-insert genomic library of the inbred line B73 based on bacterial artificial chromosomes was used to isolate a redundant set of clones containing members of the 19-kD zein gene family, which previously had been estimated to consist of 50 members. The redundant set of clones was used to create bins of overlapping clones that represented five distinct genomic regions. Representative clones containing the entire set of 19-kD zein genes were chosen from each region and sequenced. Seven bacterial artificial chromosome clones yielded 1,160 kb of genomic DNA. Three of them formed a contiguous sequence of 478 kb, the longest contiguous sequenced region of the maize genome. Altogether, these DNA sequences provide the linear organization of 25 19-kD zein genes, one-half the number previously estimated. It is suggested that the difference is because of haplotypes exhibiting different degrees of gene amplification in the zein multigene family. About one-half the genes present in B73 appear to be expressed. Because some active genes have only been duplicated recently, they are so conserved in their sequence that previous cDNA sequence analysis resulted in “unigenes” that were actually derived from different gene copies. This analysis also shows that the 22- and 19-kD zein gene families shared a common ancestor. Although both ancestral genes had the same incremental gene amplification, the 19-kD zein branch exhibited a greater degree of far-distance gene translocations than the 22-kD zein gene family. PMID:12481046

  2. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant. PMID:26252423

  4. Compilation of 5S rRNA and 5S rRNA gene sequences

    PubMed Central

    Specht, Thomas; Wolters, Jörn; Erdmann, Volker A.

    1990-01-01

    The BERLIN RNA DATABANK as of December 31, 1989, contains a total of 667 sequences of 5S rRNAs or their genes, which is an increase of 114 new sequence entries over the last compilation (1). It covers sequences from 44 archaebacteria, 267 eubacteria, 20 plastids, 6 mitochondria, 319 eukaryotes and 11 eukaryotic pseudogenes. The hardcopy shows only the list (Table 1) of those organisms whose sequences have been determined. The BERLIN RNA DATABANK uses the format of the EMBL Nucleotide Sequence Data Library complemented by a Sequence Alignment (SA) field including secondary structure information. PMID:1692116
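
    As a quick consistency check, the per-group counts listed in this abstract can be summed and compared with the stated total. The sketch below uses only numbers quoted in the abstract; the dictionary and variable names are illustrative assumptions.

```python
# Counts copied from the abstract; names are illustrative.
entries_by_group = {
    "archaebacteria": 44,
    "eubacteria": 267,
    "plastids": 20,
    "mitochondria": 6,
    "eukaryotes": 319,
    "eukaryotic pseudogenes": 11,
}
REPORTED_TOTAL = 667   # total sequences stated in the abstract
NEW_ENTRIES = 114      # increase over the previous compilation

total = sum(entries_by_group.values())
# Size of the previous compilation implied by the stated increase.
previous_total = REPORTED_TOTAL - NEW_ENTRIES
print(total, previous_total)
```

    The group counts add up to the reported total, and the stated increase implies 553 entries in the previous compilation.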

  5. Characterization of promoter sequence of toll-like receptor genes in Vechur cattle

    PubMed Central

    Lakshmi, R.; Jayavardhanan, K. K.; Aravindakshan, T. V.

    2016-01-01

    Aim: To compare the promoter sequences of toll-like receptor (TLR) genes in Vechur cattle, an indigenous breed of Kerala, with those of Bos taurus and to assess differences that could be attributed to innate immune responses against bovine mastitis. Materials and Methods: Blood samples were collected from the jugular vein of Vechur cattle maintained at the Vechur cattle conservation center of Kerala Veterinary and Animal Sciences University, using an acid-citrate-dextrose anticoagulant. Genomic DNA was extracted, and polymerase chain reaction was carried out to amplify the promoter regions of the TLRs. The amplified products of the TLR2, 4, and 9 promoter regions were sequenced by the Sanger enzymatic DNA sequencing technique. Results: The promoter region of TLR2 of Vechur cattle showed 98% similarity to the B. taurus sequence in GenBank and revealed variants for four sequence motifs. The promoter region of TLR4 showed 99% similarity to the B. taurus sequence but revealed no significant variants in motif regions; however, two heterozygous loci were observed in the chromatogram. The promoter sequence of the TLR9 gene also showed 99% similarity to the B. taurus sequence and revealed variants for four sequence motifs. Conclusion: The results of this study indicate significant variation in the promoters of the TLR2 and 9 genes in the Vechur cattle breed, which may potentially influence the innate immune response against mastitis. PMID:27397987

  6. Mouse mammary tumor virus-like gene sequences are present in lung patient specimens

    PubMed Central

    2011-01-01

    Background: Previous studies have reported on the presence of Murine Mammary Tumor Virus (MMTV)-like gene sequences in human cancer tissue specimens. Here, we search for MMTV-like gene sequences in lung disease specimens, including carcinomas, from a Mexican population. This study was based on our previous study reporting that the INER51 lung cancer cell line, from a pleural effusion of a Mexican patient, contains MMTV-like env gene sequences. Results: The MMTV-like env gene sequences were detected in three out of 18 specimens studied, by PCR using a specific set of MMTV-like primers. The three identified MMTV-like gene sequences, which were designated INER6, HZ101, and HZ14, were 99%, 98%, and 97% homologous, respectively, to GenBank sequence accession number AY161347. The INER6 and HZ101 samples were isolated from lung cancer specimens, and HZ14 was isolated from an acute inflammatory lung infiltrate sample. Two of the env sequences exhibited disruption of the reading frame due to mutations. Conclusion: In summary, we identified MMTV-like gene sequences in 2 out of 11 (18%) of the lung carcinomas and 1 out of 7 (14%) of the acute inflammatory lung infiltrate specimens studied from a Mexican population. PMID:21943279

  7. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    PubMed Central

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2005-01-01

    We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila. PMID:15632085

  8. Study of carD gene sequence in clinical isolates of Mycobacterium tuberculosis.

    PubMed

    Sarmadian, Hossein; Nazari, Razieh; Zolfaghari, Mohammad Reza; Pirayandeh, Mina; Sadrnia, Maryam; Arjomandzadegan, Mohammad; Titov, Leonid P; Rajabi, Fariba; Ahmadi, Azam; Shojapoor, Mana

    2014-03-01

    Mycobacterium tuberculosis growth rate is closely coupled to rRNA transcription, which is regulated through the carD gene. The aim of this study was to determine the sequence of the carD gene in drug-susceptible and drug-resistant clinical isolates of M. tuberculosis and to design a PCR assay based on the carD sequence for rapid detection of this bacterium. Specific primers for amplification of the carD gene were designed so that the whole sequence of the gene could be amplified; the primers were therefore positioned upstream (in the promoter of this gene and the ispD gene) and downstream (in the ispD gene). DNA from 41 clinical isolates of M. tuberculosis with different patterns of drug resistance was used in the study. PCR conditions and annealing temperature were designed by means of online programs. PCR products were sequenced on an ABI system. The PCR product of the carD gene was a 524 bp fragment. This method could detect all resistant and susceptible strains of M. tuberculosis. The size of the amplified fragment was similar in all investigated samples. Sequence analysis showed that the sequence was similar in all of our isolates; this gene is therefore probably conserved. Translation of the nucleotide sequence into amino acids showed that the TRCF domain in the N-terminal region of the CarD protein was fully conserved. This is the first study on the sequence of the carD gene in clinical isolates of M. tuberculosis. This conserved gene is recommended as a target for designing suitable inhibitors as anti-tuberculosis drugs because of its importance for the life of MTB. In addition, a PCR method based on detection of the carD gene is recommended for rapid detection in routine testing.

  9. Identification of a conserved sequence in the non-coding regions of many human genes.

    PubMed Central

    Donehower, L A; Slagle, B L; Wilde, M; Darlington, G; Butel, J S

    1989-01-01

    We have analyzed a sequence of approximately 70 base pairs (bp) that shows a high degree of similarity to sequences present in the non-coding regions of a number of human and other mammalian genes. The sequence was discovered in a fragment of human genomic DNA adjacent to an integrated hepatitis B virus genome in cells derived from human hepatocellular carcinoma tissue. When one of the viral flanking sequences was compared to nucleotide sequences in GenBank, more than thirty human genes were identified that contained a similar sequence in their non-coding regions. The sequence element was usually found once or twice in a gene, either in an intron or in the 5' or 3' flanking regions. It did not share any similarities with known short interspersed nucleotide elements (SINEs) or presently known gene regulatory elements. This element was highly conserved at the same position within the corresponding human and mouse genes for myoglobin and N-myc, indicating evolutionary conservation and possible functional importance. Preliminary DNase I footprinting data suggested that the element or its adjacent sequences may bind nuclear factors to generate specific DNase I hypersensitive sites. The size, structure, and evolutionary conservation of this sequence indicates that it is distinct from other types of short interspersed repetitive elements. It is possible that the element may have a cis-acting functional role in the genome. PMID:2536922

  10. Molecular cloning and sequencing of the gene encoding the fimbrial subunit protein of Bacteroides gingivalis.

    PubMed Central

    Dickinson, D P; Kubiniec, M A; Yoshimura, F; Genco, R J

    1988-01-01

    The gene encoding the fimbrial subunit protein of Bacteroides gingivalis 381, fimbrilin, has been cloned and sequenced. The gene was present as a single copy on the bacterial chromosome, and the codon usage in the gene conformed closely to that expected for an abundant protein. The predicted size of the mature protein was 35,924 daltons, and the secretory form may have had a 10-amino-acid, hydrophilic leader sequence similar to the leader sequences of the MePhe fimbriae family. The protein sequence had no marked similarity to known fimbrial sequences, and no homologous sequences could be found in other black-pigmented Bacteroides species, suggesting that fimbrillin represents a class of fimbrial subunit protein of limited distribution. PMID:2895100

  11. Gene expression profile in the anterior regeneration of the earthworm using expressed sequence tags.

    PubMed

    Cho, Sung-Jin; Lee, Myung Sik; Tak, Eun Sik; Lee, Eun; Koh, Ki Seok; Ahn, Chi Hyun; Park, Soon Cheol

    2009-01-01

    In order to gain insight into the gene expression profiles associated with anterior regeneration of the earthworm, Perionyx excavatus, we analyzed 1,159 expressed sequence tags (ESTs) derived from a cDNA library of early anterior regenerated tissue. Among the 1,159 ESTs analyzed, 622 (53.7%) ESTs showed significant similarity to known genes and represented 338 genes, of which 233 ESTs were singletons and 105 ESTs manifested as two or more ESTs. While 663 ESTs (57.2%) were sequenced only once, 308 ESTs (26.6%) appeared 2 to 5 times, and 188 ESTs (16.2%) were sequenced more than 5 times. A total of 803 genes were categorized into 15 groups according to their biological functions. Among the 1,159 ESTs sequenced, we found several genes encoding signaling molecules, such as Notch and Distal-less. The ESTs used in this study should provide a resource for future research in earthworm regeneration. PMID:19129665
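
    The percentages reported in this abstract follow directly from the EST counts. A minimal sketch of that arithmetic is shown below; the counts are copied from the abstract, while the helper name `percent_of_total` and the dictionary keys are illustrative assumptions.

```python
# Counts copied from the abstract; names are illustrative.
TOTAL_ESTS = 1159

est_counts = {
    "similar to known genes": 622,
    "sequenced once": 663,
    "sequenced 2 to 5 times": 308,
    "sequenced more than 5 times": 188,
}


def percent_of_total(count: int, total: int = TOTAL_ESTS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percent_of_total(n) for label, n in est_counts.items()}

# The three redundancy classes should partition the full EST set.
redundancy_sum = (
    est_counts["sequenced once"]
    + est_counts["sequenced 2 to 5 times"]
    + est_counts["sequenced more than 5 times"]
)
print(percentages, redundancy_sum)
```

    The computed values agree with the 53.7%, 57.2%, 26.6%, and 16.2% quoted in the abstract, and the three redundancy classes sum to the 1,159 ESTs analyzed.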

  12. Analysis of the regions flanking the human insulin gene and sequence of an Alu family member.

    PubMed Central

    Bell, G I; Pictet, R; Rutter, W J

    1980-01-01

    The regions around the human insulin gene have been studied by heteroduplex, hybridization and sequence analysis. These studies indicated that there is a region of heterogeneous length located approximately 700 bp before the 5' end of the gene; and that the 19 kb of cloned DNA which includes the 1430 bp insulin gene as well as 5650 bp before and 11,500 bp after the gene is single copy sequence except for 500 bp located 6000 bp from the 3' end of the gene. This 500 bp segment contains a member of the Alu family of dispersed middle repetitive sequences as well as another less highly repeated homopolymeric segment. The sequence of this region was determined. This Alu repeat is bordered by 19 bp direct repeats and also contains an 83 bp sequence which is present twice. The regions flanking the human and rat I insulin genes were compared by heteroduplex analysis to localize homologous sequences in the flanking regions which could be involved in the regulation of insulin biosynthesis. The homology between the two genes is restricted to the region encoding preproinsulin and a short region of approximately 60 bp flanking the 5' side of the genes. PMID:6253909

  13. Murine candidate bleomycin induced pulmonary fibrosis susceptibility genes identified by gene expression and sequence analysis of linkage regions

    PubMed Central

    Haston, C; Tomko, T; Godin, N; Kerckhoff, L; Hallett, M

    2005-01-01

    Background: Pulmonary fibrosis is a complex disease for which the predisposing genetic variants remain unknown. In a prior study, susceptibility to bleomycin induced pulmonary fibrosis was mapped to loci Blmpf1 and Blmpf2 on chromosomes 17 and 11, respectively, in a C57BL/6J (B6, susceptible) and C3Hf/KAM (C3H, resistant) mouse cross. Methods: Herein, the genetic basis of bleomycin induced pulmonary fibrosis was investigated in an approach combining gene expression and sequencing data with previously mapped linkage intervals. Results: In this study, gene expression analysis with microarrays revealed 1892 genes or ESTs (expressed sequence tags) to be differentially expressed between bleomycin treated B6 and C3H mice and 67 of these genetic elements map to Blmpf1 or Blmpf2. This group included genes involved in an oxidative stress response, in apoptosis, and in immune regulation. A comparison of the B6 and C3H sequence, for Blmpf1 and Blmpf2, made using the NCBI database and available C3H sequence, revealed approximately 35% of the genes in these regions contain non-synonymous coding sequence changes. An assessment of genotype/phenotype correlation among other inbred strains revealed 36% of these B6/C3H sequence variations predict for the known bleomycin induced fibrosis susceptibility of the DBA (susceptible) and A/J (resistant) mouse strains. Conclusions: Combining genomics approaches of differential gene expression and sequence variation potentially identifies approximately 5% of the linked genes as fibrosis susceptibility candidate genes in this mouse cross. PMID:15937080

  14. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  15. Genomic sequence and organization of two members of a human lectin gene family

    SciTech Connect

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family.

  16. The essential helicase gene RAD3 suppresses short-sequence recombination in Saccharomyces cerevisiae.

    PubMed Central

    Bailis, A M; Maines, S; Negritto, M T

    1995-01-01

    We have isolated an allele of the essential DNA repair and transcription gene RAD3 that relaxes the restriction against recombination between short DNA sequences in Saccharomyces cerevisiae. Double-strand break repair and gene replacement events requiring recombination between short identical or mismatched sequences were stimulated in the rad3-G595R mutant cells. We also observed an increase in the physical stability of double-strand breaks in the rad3-G595R mutant cells. These results suggest that the RAD3 gene suppresses recombination involving short homologous sequences by promoting the degradation of the ends of broken DNA molecules. PMID:7623796

  17. Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

    PubMed Central

    Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

    2012-01-01

    Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
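
    The relationship between sequencing yield and coverage reported in this abstract can be made concrete with a short calculation. The sketch below is illustrative only: it uses the two figures stated in the abstract (195,313,589 bp and 0.43× coverage), and the Poisson estimate of the fraction of bases sampled assumes uniformly random read placement (the idealised Lander-Waterman model). That is an assumption made here, not a method the abstract describes, and it is not how the 95% gene-identification probability was necessarily derived.

```python
import math

# Figures stated in the abstract; names are illustrative.
sequenced_bp = 195_313_589   # total sequence yield
coverage = 0.43              # reported sequence coverage of the 1RS arm

# Coverage is yield divided by target size, so the implied arm size is:
implied_arm_size_bp = sequenced_bp / coverage

# Under an idealised Poisson model with uniformly random reads (an
# assumption), the expected fraction of bases hit at least once is
# 1 - exp(-coverage).
fraction_of_bases_covered = 1 - math.exp(-coverage)

print(f"Implied 1RS size: {implied_arm_size_bp / 1e6:.0f} Mb")
print(f"Expected fraction of bases sampled: {fraction_of_bases_covered:.2f}")
```

    With these inputs the implied arm size is roughly 454 Mb, and about a third of individual bases would be expected to be sampled at least once under the stated model.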

  18. Cloning and sequencing of human lambda immunoglobulin genes by the polymerase chain reaction.

    PubMed

    Songsivilai, S; Bye, J M; Marks, J D; Hughes-Jones, N C

    1990-12-01

    Universal oligonucleotide primers, designed for amplifying and sequencing genes encoding the rearranged human lambda immunoglobulin variable region, were validated by amplification of the lambda light chain genes from four human heterohybridoma cell lines and in the generation of a cDNA library of human V lambda sequences from Epstein-Barr virus-transformed human peripheral blood lymphocytes. This technique allows rapid cloning and sequencing of human immunoglobulin genes, and has potential applications in the rescue of unstable human antibody-producing cell lines and in the production of human monoclonal antibodies.

  19. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  20. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  1. Two sequence classes of kinetoplastid 5S ribosomal RNA gene revealed among bodonid spliced leader RNA gene arrays.

    PubMed

    Santana, D M; Lukes, J; Sturm, N R; Campbell, D A

    2001-11-13

    The spliced leader RNA genes of Bodo saltans, Cryptobia helicis and Dimastigella trypaniformis were analyzed as molecular markers for additional taxa within the suborder Bodonina. The non-transcribed spacer regions were distinctive for each organism, and 5S rRNA genes were present in Bodo and Dimastigella but not in C. helicis. Two sequence classes of 5S rRNA were evident from analysis of the bodonid genes. The two classes of 5S rRNA genes were found in other Kinetoplastids independent of co-localization with the spliced leader RNA gene.

  2. Sequences of CAZ-3 and CTX-2 extended-spectrum beta-lactamase genes.

    PubMed Central

    Chanal, C; Sirot, D; Malaure, H; Poupart, M C; Sirot, J

    1994-01-01

    The nucleotide sequences of blaTEM genes coding for the extended-spectrum beta-lactamases CAZ-3 and CTX-2 were determined. The gene for CAZ-3 is identical to blaTEM-12b. The gene for CTX-2 differs from characterized blaTEM genes for extended-spectrum beta-lactamases by a new combination of already known mutations. We propose for CTX-2 the designation TEM-25. PMID:7840586

  3. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    SciTech Connect

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

  4. Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

    SciTech Connect

    Onda, M.; Kudo, S.; Fukuda, M.; Rearden, A.; Mattei, G.M.

    1993-08-01

    Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE gene arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.

  5. Exome sequencing and arrayCGH detection of gene sequence and copy number variation between ILS and ISS mouse strains.

    PubMed

    Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M

    2014-06-01

    It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to

  6. Complex repetitive arrangements of gene sequence in the candidate region of the spinal muscular atrophy gene in 5q13

    SciTech Connect

    Theodosiou, A.M.; Nesbit, A.M.; Daniels, R.J.; Campbell, L.; Francis, M.J.; Christodoulou, Z.; Morrison, K.E.; Davies, K.E.

    1994-12-01

    Childhood-onset proximal spinal muscular atrophy (SMA) is a heritable neurological disorder, which has been mapped by genetic linkage analysis to chromosome 5q13, in the interval between markers D5S435 and D5S557. Here, we present gene sequences that have been isolated from this interval, several of which show sequence homologies to exons of β-glucuronidase. These gene sequences are repeated several times across the candidate region and are also present on chromosome 5p. The arrangement of these repetitive gene motifs is polymorphic between individuals. The high degree of variability observed may have some influence on the expression of the genes in the region. Since SMA is not inherited as a classical autosomal recessive disease, novel genomic rearrangements arising from aberrant recombination events between the complex repeats may be associated with the phenotype observed.

  7. Signature, a web server for taxonomic characterization of sequence samples using signature genes.

    PubMed

    Dutilh, Bas E; He, Ying; Hekkelman, Maarten L; Huynen, Martijn A

    2008-07-01

    Signature genes are genes that are unique to a taxonomic clade and are common within it. They contain a wealth of information about clade-specific processes and hold a strong evolutionary signal that can be used to phylogenetically characterize a set of sequences, such as a metagenomics sample. As signature genes are based on gene content, they provide a means to assess the taxonomic origin of a sequence sample that is complementary to sequence-based analyses. Here, we introduce Signature (http://www.cmbi.ru.nl/signature), a web server that identifies the signature genes in a set of query sequences, and therewith phylogenetically characterizes it. The server produces a list of taxonomic clades that share signature genes with the set of query sequences, along with an insightful image of the tree of life, in which the clades are color coded based on the number of signature genes present. This allows the user to quickly see from which part(s) of the taxonomy the query sequences likely originate.

  8. Comparative sequence analysis of a gene-dense region among closely related species of Drosophila melanogaster.

    PubMed

    Kawahara, Yoshihiro; Matsuo, Takashi; Nozawa, Masafumi; Shin-I, Tadasu; Kohara, Yuji; Aigaki, Toshiro

    2004-12-01

    Comparative sequence analysis among closely related species is essential for investigating the evolution of non-coding sequences, which evolve more rapidly than protein-coding sequences. We sequenced the cytogenetic map 56F10-16, a gene-dense region of D. simulans and D. sechellia, species closely related to D. melanogaster. About 57 kb of the genomic sequences containing 19 genes were annotated from each species according to the corresponding region of the D. melanogaster genome. The order and orientation of genes were perfectly conserved among the three species, and no transposable elements were found. The rate of nucleotide substitutions in the non-coding sequences was lower than that at the fourfold-degenerate sites, implying functional constraints in the non-coding regions. The sequence information from three closely related species allowed us to estimate the insertions and the deletions that may have occurred in the lineages of D. simulans and D. sechellia using the D. melanogaster sequence as an outgroup. The number of deletions was twice that of insertions for the introns of D. simulans. More remarkably, deletions outnumbered insertions 7.5-fold for the intergenic sequences of D. sechellia. These results suggest that the non-coding sequences have been shortened by deletion biases. However, the deletion bias was lower than that previously estimated for pseudogenes, suggesting that the non-coding sequences are already rich in functional elements, possibly involved in the regulation of gene expression including transcription and pre-mRNA processing. These features of non-coding sequences may be common to other gene-dense regions contributing to the compactness of the Drosophila genome.

  9. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  10. Comparison of MALDI-TOF MS, Housekeeping Gene Sequencing, and 16S rRNA Gene Sequencing for Identification of Aeromonas Clinical Isolates

    PubMed Central

    Shin, Hee Bong; Yoon, Jihoon; Lee, Yangsoon; Kim, Myung Sook

    2015-01-01

    Purpose The genus Aeromonas is a pathogen that is well known to cause severe clinical illnesses, ranging from gastroenteritis to sepsis. Accurate identification of A. hydrophila, A. caviae, and A. veronii is important for the care of patients. However, species identification remains difficult using conventional methods. The aim of this study was to compare the accuracy of different methods of identifying Aeromonas at the species level: a biochemical method, matrix-assisted laser desorption ionization mass spectrometry-time of flight (MALDI-TOF MS), 16S rRNA sequencing, and housekeeping gene sequencing (gyrB, rpoB). Materials and Methods We analyzed 65 Aeromonas isolates recovered from patients at a university hospital in Korea between 1996 and 2012. The isolates were recovered from frozen states and tested using the following four methods: a conventional biochemical method, 16S rRNA sequencing, housekeeping gene sequencing with phylogenetic analysis, and MALDI-TOF MS. Results The conventional biochemical method and 16S rRNA sequencing identified Aeromonas at the genus level very accurately, although species level identification was unsatisfactory. MALDI-TOF MS system correctly identified 60 (92.3%) isolates at the species level and an additional four (6.2%) at the genus level. Overall, housekeeping gene sequencing with phylogenetic analysis was found to be the most accurate in identifying Aeromonas at the species level. Conclusion The most accurate method of identification of Aeromonas to species level is by housekeeping gene sequencing, although high cost and technical difficulty hinder its usage in clinical settings. An easy-to-use identification method is needed for clinical laboratories, for which MALDI-TOF MS could be a strong candidate. PMID:25684008
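
    The identification rates quoted for MALDI-TOF MS follow directly from the isolate counts. The sketch below is illustrative only: the names are hypothetical and the inputs are the three counts stated in the abstract. It also makes explicit that the two reported figures together account for 64 of the 65 isolates.

```python
# Counts stated in the abstract; names are illustrative.
total_isolates = 65
species_level = 60   # correctly identified at the species level
genus_only = 4       # additionally identified at the genus level only


def rate(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


species_rate = rate(species_level, total_isolates)
genus_only_rate = rate(genus_only, total_isolates)

# Isolates not covered by either reported figure.
unaccounted = total_isolates - species_level - genus_only

print(f"Species level: {species_rate}%")
print(f"Genus level only: {genus_only_rate}%")
print(f"Not covered by either figure: {unaccounted}")
```

    The abstract does not say what happened to the remaining isolate, so the sketch reports it only as unaccounted for.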

  11. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library

    PubMed Central

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Aim Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse the complete gene sequences of the PLE family by screening a Rongchang pig BAC library and applying third-generation PacBio gene sequencing. Methods After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Results Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found that, not only did the PLE exon sequences of the four genes show a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse-complement sequences that may have an important function in the regulation of PLE gene expression. Significance This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes

  12. Ancient conserved regions in new gene sequences and the protein databases

    SciTech Connect

    Green, P.; Hillier, L.; Waterston, R.; Lipman, D.; States, D.; Claverie, J.M.

    1993-03-19

    Sets of new gene sequences from human, nematode, and yeast were compared with each other and with a set of Escherichia coli genes in order to detect ancient evolutionarily conserved regions (ACRs) in the encoded proteins. Nearly all of the ACRs so identified were found to be homologous to sequences in the protein databases. This suggests that currently known proteins may already include representatives of most ACRs and that new sequences not similar to any database sequence are unlikely to contain ACRs. Preliminary analyses indicate that moderately expressed genes may be more likely to contain ACRs than rarely expressed genes. It is estimated that there are fewer than 900 ACRs in all. 20 refs., 2 figs., 4 tabs.

  13. Transcriptome sequencing of Hydrangea macrophylla to uncover genes related to reblooming and powdery mildew resistance

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Massively parallel pyrosequencing technology has been used extensively on agronomic crops and model plants. Transcriptome sequencing is a useful first step in functional genomic studies, microarray and gene expression studies, single nucleotide polymorphism (SNP) surveys, quantitative trait loci (QT...

  14. A Neurospora crassa ribosomal protein gene, homologous to yeast CRY1, contains sequences potentially coordinating its transcription with rRNA genes.

    PubMed Central

    Tyler, B M; Harrison, K

    1990-01-01

    We have isolated and sequenced a Neurospora crassa ribosomal protein gene (designated crp-2) strongly homologous to the rp59 gene (CRY1) of yeast and the S14 ribosomal protein gene of mammals. The inferred sequence of the crp-2 protein is more homologous (83%) to the mammalian S14 sequence than to the yeast rp59 sequence (69%). The gene has three intervening sequences (IVSs) two of which are offset 7 bp from the position of IVSs in the mammalian genes. None correspond to the position of the IVS in the yeast gene. Crp-2 was mapped by RFLP analysis to the right arm of linkage group III. The 5' region of the gene contains three copies of a sequence, the Ribo box, previously shown to be required for transcription of both 5S and 40S rRNA genes. We speculate that the Ribo box may coordinate ribosomal protein and rRNA gene transcription. PMID:1977135

  15. Targeting of AID-mediated sequence diversification to immunoglobulin genes.

    PubMed

    Kothapalli, Naga Rama; Fugmann, Sebastian D

    2011-04-01

    Activation-induced cytidine deaminase (AID) is a key enzyme for antibody-mediated immune responses. Antibodies are encoded by the immunoglobulin genes and AID acts as a transcription-dependent DNA mutator on these genes to improve antibody affinity and effector functions. An emerging theme in the field is that many transcribed genes are potential targets of AID, presenting an obvious danger to genomic integrity. Thus there are mechanisms in place to ensure that mutagenic outcomes of AID activity are specifically restricted to the immunoglobulin loci. Cis-regulatory targeting elements mediate this effect and their mode of action is probably a combination of immunoglobulin gene specific activation of AID and a perversion of faithful DNA repair towards error-prone outcomes.

  16. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  17. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago.

  18. Analysis of mammalian gene batteries reveals both stable ancestral cores and highly dynamic regulatory sequences

    PubMed Central

    Ettwiller, Laurence; Budd, Aidan; Spitz, François; Wittbrodt, Joachim

    2008-01-01

    Background: Changes in gene regulation are suspected to comprise one of the driving forces for evolution. To address the extent of cis-regulatory changes and how they impact on gene regulatory networks across eukaryotes, we systematically analyzed the evolutionary dynamics of target gene batteries controlled by 16 different transcription factors. Results: We found that gene batteries show variable conservation within vertebrates, with slow and fast evolving modules. Hence, while a key gene battery associated with the cell cycle is conserved throughout metazoans, the POU5F1 (Oct4) and SOX2 batteries in embryonic stem cells show strong conservation within mammals, with the striking exception of rodents. Within the genes composing a given gene battery, we could identify a conserved core that likely reflects the ancestral function of the corresponding transcription factor. Interestingly, we show that the association between a transcription factor and its target genes is conserved even when we exclude conserved sequence similarities of their promoter regions from our analysis. This supports the idea that turnover, either of the transcription factor binding site or its direct neighboring sequence, is a pervasive feature of proximal regulatory sequences. Conclusions: Our study reveals the dynamics of evolutionary changes within metazoan gene networks, including both the composition of gene batteries and the architecture of target gene promoters. This variation provides the playground required for evolutionary innovation around conserved ancestral core functions. PMID:19087242

  19. Nucleotide sequence of the pnd gene in plasmid R483 and role of the pnd gene product in plasmolysis.

    PubMed

    Ono, K; Akimoto, S; Ohnishi, Y

    1987-01-01

    The pnd gene of R plasmid R483, like the srnB gene of the F plasmid, increases the degradation of stable RNA in Escherichia coli. The nucleotide sequence of the pnd locus was determined and compared with that of the srnB locus. The genes have open reading frames that are 54% homologous, and both have an upstream inverted repeat sequence. The pnd gene expression seems to decrease the osmotic barrier of the cytoplasmic membrane, since no plasmolytic vacuoles were formed in the cells carrying the gene when the cells were exposed to hypertonic sucrose solution. This result suggests that RNase I in the periplasm passes through the altered membrane to degrade stable RNA in the cytoplasm.

  20. [Research on constructing phylogenetics trees of ruminants basing on the database of milk protein gene sequences].

    PubMed

    Fan, B L; Li, N; Wu, C X

    2000-01-01

    Primers designed according to the sequences of four milk protein genes of cow Bos taurus (alpha-lactalbumin, beta-lactoglobulin, beta- and kappa-casein) were used to amplify the full length gene of alpha-lactalbumin in yak Bos grunniens (2999 bp), water buffalo Bubalus arnee bubalis (278 bp), partial sequence of this gene in red deer Cervus elaphus xanthopygus (1582 bp), 5' and 3' flanking region of beta-lactoglobulin gene (2167 bp and 1096 bp in length respectively), 5'-flanking region and exon VIII to exon IX of beta-casein gene (987 bp and 1096 bp in length respectively), exon IV of kappa-casein gene (780 bp). All the amplified DNA fragments were cloned and the nucleotide sequences were determined. A phylogenetic tree containing 20 species (or subspecies) of the suborder Ruminantia was constructed according to the partial sequence of kappa-casein gene exon IV (363 bp in length), which shows good monophyly of the Bovidae. Trees constructed according to the other milk protein genes indicate that all the milk protein genes have good features for drawing phylogenetic trees, at least among species belonging to different subfamilies.

  1. Nonessential region of bacteriophage P4: DNA sequence, transcription, gene products, and functions.

    PubMed Central

    Ghisotti, D; Finkel, S; Halling, C; Dehò, G; Sironi, G; Calendar, R

    1990-01-01

    We sequenced the leftmost 2,640 base pairs of bacteriophage P4 DNA, thus completing the sequence of the 11,627-base-pair P4 genome. The newly sequenced region encodes three nonessential genes, which are called gop, beta, and cII (in order, from left to right). The gop gene product kills Escherichia coli when the beta protein is absent; the gop and beta genes are transcribed rightward from the same promoter. The cII gene is transcribed leftward to a rho-independent terminator. Mutation of this terminator creates a temperature-sensitive phenotype, presumably owing to a defect in expression of the beta gene. PMID:2403440

  2. Sequence and evolution of HLA-DR7- and -DRw53-associated beta-chain genes.

    PubMed

    Young, J A; Wilkinson, D; Bodmer, W F; Trowsdale, J

    1987-07-01

    cDNA clones representing products of the DR7 and DRw53 beta-chain genes were isolated from the human B-lymphoblastoid cell line MANN (DR7,DRw53,DQw2, DPw2). The DRw53 beta sequence was identical to a DRw53 beta sequence derived from cells with a DR4 haplotype. In contrast, the DR7 beta sequence was as unrelated to DR4 beta sequence as it was to other DR beta-related genes, except at the 3'-untranslated region. These results suggest that the DR7 and DR4 haplotypes may have been derived relatively recently from a common ancestral haplotype and that the DR4 and DR7 beta-chain genes have undergone more rapid diversification in their beta 1 domains, most probably as a result of natural selection, than have the DRw53 beta-chain genes. Short tracts of sequence within the DR7 and DRw53 beta 1 domains were shared with other DR beta sequences, indicating that exchanges of genetic information between beta 1 domains of DR beta-related genes have played a part in their evolution. Serological analysis of mouse L-cell transfectants expressing surface HLA-DR7 molecules, confirmed by antibody binding and allelic sequence comparisons, identified amino acid residues that may be critical to the binding of a monomorphic DR- and DP-specific monoclonal antibody.

  3. Bioinformatic tools for DNA/protein sequence analysis, functional assignment of genes and protein classification.

    PubMed

    Rehm, B H

    2001-12-01

    The development of efficient DNA sequencing methods has led to the achievement of the DNA sequence of entire genomes from (to date) 55 prokaryotes, 5 eukaryotic organisms and 10 eukaryotic chromosomes. Thus, an enormous amount of DNA sequence data is available and even more will be forthcoming in the near future. Analysis of this overwhelming amount of data requires bioinformatic tools in order to identify genes that encode functional proteins or RNA. This is an important task, considering that even in the well-studied Escherichia coli more than 30% of the identified open reading frames are hypothetical genes. Future challenges of genome sequence analysis will include the understanding of gene regulation and metabolic pathway reconstruction including DNA chip technology, which holds tremendous potential for biomedicine and the biotechnological production of valuable compounds. The overwhelming volume of information often confuses scientists. This review intends to provide a guide to choosing the most efficient way to analyze a new sequence or to collect information on a gene or protein of interest by applying current publicly available databases and Web services. Recently developed tools that allow functional assignment of genes, mainly based on sequence similarity of the deduced amino acid sequence, using the currently available and increasing biological databases will be discussed.

  4. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 × 10^4 recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. PMID:2066334

  5. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-02-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039
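
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes that the reported 1,182-nucleotide open reading frame includes its stop codon; that assumption is not stated in the abstract, and the function name is hypothetical.

```python
STOP_CODON_NT = 3  # one stop codon occupies three nucleotides


def residues_from_orf(orf_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and does not contribute a residue.
    """
    coding_nt = orf_nt - STOP_CODON_NT if includes_stop else orf_nt
    if coding_nt % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return coding_nt // 3


# Values taken from the abstract: 1,182-nt ORF, 43,210-dalton polypeptide.
residues = residues_from_orf(1182)
mean_residue_mass = 43210 / residues

print(residues)                     # 393, matching the reported 393 amino acids
print(round(mean_residue_mass, 1))  # 109.9 daltons per residue on average
```

    Under that assumption the ORF length, the residue count, and the polypeptide mass are mutually consistent.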

  6. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed Central

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-01-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039

  7. AGenDA: gene prediction by cross-species sequence comparison.

    PubMed

    Taher, Leila; Rinner, Oliver; Garg, Saurabh; Sczyrba, Alexander; Morgenstern, Burkhard

    2004-07-01

    Automatic gene prediction is one of the major challenges in computational sequence analysis. Traditional approaches to gene finding rely on statistical models derived from previously known genes. By contrast, a new class of comparative methods relies on comparing genomic sequences from evolutionarily related organisms to each other. These methods are based on the concept of phylogenetic footprinting: they exploit the fact that functionally important regions in genomic sequences are usually more conserved than non-functional regions. We created a WWW-based software program for homology-based gene prediction at BiBiServ (Bielefeld Bioinformatics Server). Our tool takes pairs of evolutionarily related genomic sequences as input data, e.g. from human and mouse. The server runs CHAOS and DIALIGN to create an alignment of the input sequences and subsequently searches for conserved splicing signals and start/stop codons near regions of local sequence conservation. Genes are predicted based on local homology information and splice signals. The server returns predicted genes together with a graphical representation of the underlying alignment. The program is available at http://bibiserv.TechFak.Uni-Bielefeld.DE/agenda/.
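
    The idea behind phylogenetic footprinting, that conserved stretches of an alignment are candidates for functional regions, can be illustrated with a minimal sliding-window sketch. This is not the AGenDA algorithm, which builds on CHAOS and DIALIGN alignments and on splice-signal searches; the function name, window size, identity threshold, and toy sequences are all assumptions made for illustration.

```python
from typing import List, Tuple


def conserved_windows(aln_a: str, aln_b: str, window: int = 10,
                      min_identity: float = 0.8) -> List[Tuple[int, int]]:
    """Return merged (start, end) alignment intervals (end exclusive) in which
    every sliding window of the given size has identity >= min_identity."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # 1 where both rows hold the same non-gap character, otherwise 0
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions: List[Tuple[int, int]] = []
    for start in range(len(match) - window + 1):
        if sum(match[start:start + window]) / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # overlaps or touches the previous region: extend it
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Toy pairwise alignment: 12 identical columns followed by 10 diverged columns.
seq_a = "ACGTACGTACGT" + "TTTTTTTTTT"
seq_b = "ACGTACGTACGT" + "GGGGG-----"
print(conserved_windows(seq_a, seq_b, window=6, min_identity=1.0))  # [(0, 12)]
```

    A real comparative gene finder would then look for splice signals and start/stop codons near such intervals, as the abstract describes.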

  8. The nucleotide sequence of the amiE gene of Pseudomonas aeruginosa.

    PubMed

    Brammar, W J; Charles, I G; Matfield, M; Liu, C P; Drew, R E; Clarke, P H

    1987-05-11

    The nucleotide sequence of the amiE gene, encoding the aliphatic amidase of Pseudomonas aeruginosa, has been determined. The sequence of 1038 nucleotides shows a strong bias in favour of codons with G or C in the third position, and only 44 different codons are utilised.
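
    The two statistics reported here, the share of codons ending in G or C and the number of distinct codons used, are simple to compute from a coding sequence. The sketch below is a minimal illustration on a short made-up sequence, not the amiE gene; the function name is hypothetical.

```python
from typing import Tuple


def codon_usage_summary(cds: str) -> Tuple[float, int]:
    """Return (GC3 fraction, number of distinct codons) for an in-frame
    coding sequence whose length is a multiple of three."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a non-empty multiple of 3 nt")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # GC3: fraction of codons whose third position is G or C
    gc3 = sum(codon[2] in "GC" for codon in codons) / len(codons)
    return gc3, len(set(codons))


# Made-up example: ATG GCC GCG AAG TAA
gc3, distinct = codon_usage_summary("ATGGCCGCGAAGTAA")
print(gc3, distinct)  # 0.8 5
```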

  9. Sequence divergence and chromosomal rearrangements during the evolution of human pseudoautosomal genes and their mouse homologs

    SciTech Connect

    Ellison, J.; Li, X.; Francke, U.

    1994-09-01

    The pseudoautosomal region (PAR) is an area of sequence identity between the X and Y chromosomes and is important for mediating X-Y pairing during male meiosis. Of the seven genes assigned to the human PAR, none of the mouse homologs have been isolated by a cross-hybridization strategy. Two of these homologs, Csfgmra and Il3ra, have been isolated using a functional assay for the gene products. These genes are quite different in sequence from their human homologs, showing only 60-70% sequence similarity. The Csfgmra gene has been found to further differ from its human homolog in being located not on the sex chromosomes, but on a mouse autosome (chromosome 19). Using a mouse-hamster somatic cell hybrid mapping panel, we have mapped the Il3ra gene to yet another mouse autosome, chromosome 14. Attempts to clone the mouse homolog of the ANT3 locus resulted in the isolation of two related genes, Ant1 and Ant2, but failed to yield the Ant3 gene. Southern blot analysis of the ANT/Ant genes showed the Ant1 and Ant2 sequences to be well-conserved among all of a dozen mammals tested. In contrast, the ANT3 gene only showed hybridization to non-rodent mammals, suggesting it is either greatly divergent or has been deleted in the rodent lineage. Similar experiments with other human pseudoautosomal probes likewise showed a lack of hybridization to rodent sequences. The results show a definite trend of extensive divergence of pseudoautosomal sequences in addition to chromosomal rearrangements involving X;autosome translocations and perhaps gene deletions. Such observations have interesting implications regarding the evolution of this important region of the sex chromosomes.

  10. A 5.8S nuclear ribosomal RNA gene sequence database: applications to ecology and evolution

    NASA Technical Reports Server (NTRS)

    Cullings, K. W.; Vogler, D. R.

    1998-01-01

    We compiled a 5.8S nuclear ribosomal gene sequence database for animals, plants, and fungi using both newly generated and GenBank sequences. We demonstrate the utility of this database as an internal check to determine whether the target organism and not a contaminant has been sequenced, as a diagnostic tool for ecologists and evolutionary biologists to determine the placement of asexual fungi within larger taxonomic groups, and as a tool to help identify fungi that form ectomycorrhizae.

  11. Nucleotide sequence of the DNA packaging and capsid synthesis genes of bacteriophage P2.

    PubMed Central

    Linderoth, N A; Ziermann, R; Haggård-Ljungquist, E; Christie, G E; Calendar, R

    1991-01-01

    Overlapping DNA fragments containing the DNA packaging and capsid synthesis gene region of bacteriophage P2 were cloned and sequenced. In this report we present the complete nucleotide sequence of this 6550 bp region. Each of six open reading frames found in the interval was assigned to one of the essential genes (Q, P, O, N, M and L) by correlating genetic, physical and mutational data with DNA and protein sequence information. Polypeptides predicted were: a capsid completion protein, gpL; the major capsid precursor, gpN; the presumed capsid scaffolding protein, gpO; the ATPase and proposed endonuclease subunits of terminase, gpP and gpM, respectively; and a candidate for the portal protein, gpQ. These gene and protein sequences exhibited no homology to analogous genes or proteins of other bacteriophages. Expression of gene Q in E. coli from a plasmid caused production of a Mr 39,000 Da protein that restored Qam34 growth. This sequence analysis found only genes previously known from analysis of conditional-lethal mutations. No new capsid genes were found. PMID:1837355

  12. Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

    SciTech Connect

    Fields, C.A.

    1996-06-01

    The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.

  13. Insertion of a Telomere Repeat Sequence into a Mammalian Gene Causes Chromosome Instability

    PubMed Central

    Kilburn, April E.; Shea, Martin J.; Sargent, R. Geoffrey; Wilson, John H.

    2001-01-01

    Telomere repeat sequences cap the ends of eucaryotic chromosomes and help stabilize them. At interstitial sites, however, they may destabilize chromosomes, as suggested by cytogenetic studies in mammalian cells that correlate interstitial telomere sequence with sites of spontaneous and radiation-induced chromosome rearrangements. In no instance is the length, purity, or orientation of the telomere repeats at these potentially destabilizing interstitial sites known. To determine the effects of a defined interstitial telomere sequence on chromosome instability, as well as other aspects of DNA metabolism, we deposited 800 bp of the functional vertebrate telomere repeat, TTAGGG, in two orientations in the second intron of the adenine phosphoribosyltransferase (APRT) gene in Chinese hamster ovary cells. In one orientation, the deposited telomere sequence did not interfere with expression of the APRT gene, whereas in the other it reduced mRNA levels slightly. The telomere sequence did not induce chromosome truncation and the seeding of a new telomere at a frequency above the limits of detection. Similarly, the telomere sequence did not alter the rate or distribution of homologous recombination events. The interstitial telomere repeat sequence in both orientations, however, dramatically increased gene rearrangements some 30-fold. Analysis of individual rearrangements confirmed the involvement of the telomere sequence. These studies define the telomere repeat sequence as a destabilizing element in the interior of chromosomes in mammalian cells. PMID:11113187

  14. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    PubMed

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selectively pooled mutant DNA, genetic map construction through sequencing of individuals in a population, and sequencing of the transcriptome or a partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that mapping by sequencing can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  15. The mouse collagen X gene: complete nucleotide sequence, exon structure and expression pattern.

    PubMed Central

    Elima, K; Eerola, I; Rosati, R; Metsäranta, M; Garofalo, S; Perälä, M; De Crombrugghe, B; Vuorio, E

    1993-01-01

    Overlapping genomic clones covering the 7.2 kb mouse alpha 1(X) collagen gene, 0.86 kb of promoter and 1.25 kb of 3'-flanking sequences were isolated from two genomic libraries and characterized by nucleotide sequencing. Typical features of the gene include a unique three-exon structure, similar to that in the chick gene, with the entire triple-helical domain of 463 amino acids coded by a single large exon. The highest degree of amino acid and nucleotide sequence conservation was seen in the coding region for the collagenous and C-terminal non-collagenous domains between the mouse and known chick, bovine and human collagen type X sequences. More divergence between the sequences occurred in the N-terminal non-collagenous domain. Similarity between the mammalian collagen X sequences extended into the 3'-untranslated sequence, particularly near the polyadenylation site. The promoter of the mouse collagen X gene was found to contain two TATAA boxes 159 bp apart; primer extension analyses of the transcription start site revealed that both were functional. The promoter has an unusual structure with a very low G + C content of 28% between positions -220 and -1 of the upstream transcription start site. Northern and in situ hybridization analyses confirmed that the expression of the alpha 1(X) collagen gene is restricted to hypertrophic chondrocytes in tissues undergoing endochondral calcification. The detailed sequence information of the gene is useful for studies on the promoter activity of the gene and for generation of transgenic mice. PMID:8424763

  16. RNA Sequencing Revealed Numerous Polyketide Synthase Genes in the Harmful Dinoflagellate Karenia mikimotoi

    PubMed Central

    Kimura, Kei; Okuda, Shujiro; Nakayama, Kei; Shikata, Tomoyuki; Takahashi, Fumio; Yamaguchi, Haruo; Skamoto, Setsuko; Yamaguchi, Mineo; Tomaru, Yuji

    2015-01-01

    The dinoflagellate Karenia mikimotoi forms blooms in the coastal waters of temperate regions and occasionally causes massive fish and invertebrate mortality. This study aimed to elucidate the toxic effect of K. mikimotoi on marine organisms by using the genomics approach; RNA-sequence libraries were constructed, and data were analyzed to identify toxin-related genes. Next-generation sequencing produced 153,406 transcript contigs from the axenic culture of K. mikimotoi. BLASTX analysis against all assembled contigs revealed that 208 contigs were polyketide synthase (PKS) sequences. Thus, K. mikimotoi was thought to have several genes encoding PKS metabolites and to likely produce toxin-like polyketide molecules. Of all the sequences, approximately 30 encoded eight PKS genes, which were remarkably similar to those of Karenia brevis. Our phylogenetic analyses showed that these genes belonged to a new group of PKS type-I genes. Phylogenetic and active domain analyses showed that the amino acid sequence of four among eight Karenia PKS genes was not similar to any of the reported PKS genes. These PKS genes might possibly be associated with the synthesis of polyketide toxins produced by Karenia species. Further, a homology search revealed 10 contigs that were similar to a toxin gene responsible for the synthesis of saxitoxin (sxtA) in the toxic dinoflagellate Alexandrium fundyense. These contigs encoded A1–A3 domains of sxtA genes. Thus, this study identified some transcripts in K. mikimotoi that might be associated with several putative toxin-related genes. The findings of this study might help understand the mechanism of toxicity of K. mikimotoi and other dinoflagellates. PMID:26561394

  17. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westerman, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  18. RNA Sequencing Revealed Numerous Polyketide Synthase Genes in the Harmful Dinoflagellate Karenia mikimotoi.

    PubMed

    Kimura, Kei; Okuda, Shujiro; Nakayama, Kei; Shikata, Tomoyuki; Takahashi, Fumio; Yamaguchi, Haruo; Skamoto, Setsuko; Yamaguchi, Mineo; Tomaru, Yuji

    2015-01-01

    The dinoflagellate Karenia mikimotoi forms blooms in the coastal waters of temperate regions and occasionally causes massive fish and invertebrate mortality. This study aimed to elucidate the toxic effect of K. mikimotoi on marine organisms by using the genomics approach; RNA-sequence libraries were constructed, and data were analyzed to identify toxin-related genes. Next-generation sequencing produced 153,406 transcript contigs from the axenic culture of K. mikimotoi. BLASTX analysis against all assembled contigs revealed that 208 contigs were polyketide synthase (PKS) sequences. Thus, K. mikimotoi was thought to have several genes encoding PKS metabolites and to likely produce toxin-like polyketide molecules. Of all the sequences, approximately 30 encoded eight PKS genes, which were remarkably similar to those of Karenia brevis. Our phylogenetic analyses showed that these genes belonged to a new group of PKS type-I genes. Phylogenetic and active domain analyses showed that the amino acid sequence of four among eight Karenia PKS genes was not similar to any of the reported PKS genes. These PKS genes might possibly be associated with the synthesis of polyketide toxins produced by Karenia species. Further, a homology search revealed 10 contigs that were similar to a toxin gene responsible for the synthesis of saxitoxin (sxtA) in the toxic dinoflagellate Alexandrium fundyense. These contigs encoded A1-A3 domains of sxtA genes. Thus, this study identified some transcripts in K. mikimotoi that might be associated with several putative toxin-related genes. The findings of this study might help understand the mechanism of toxicity of K. mikimotoi and other dinoflagellates. PMID:26561394

  19. Sequence-based model of gap gene regulatory network

    PubMed Central

    2014-01-01

    Background: The detailed analysis of transcriptional regulation is crucially important for understanding biological processes. The gap gene network in Drosophila attracts great interest among researchers studying mechanisms of transcriptional regulation. It implements the most upstream regulatory layer of the segmentation gene network. The knowledge of molecular mechanisms involved in gap gene regulation is far less complete than that of genetics of the system. Mathematical modeling goes beyond insights gained by genetics and molecular approaches. It allows us to reconstruct wild-type gene expression patterns in silico, infer the underlying regulatory mechanism and prove its sufficiency. Results: We developed a new model that provides a dynamical description of gap gene regulatory systems, using detailed DNA-based information, as well as spatial transcription factor concentration data at varying time points. We showed that this model correctly reproduces gap gene expression patterns in wild type embryos and is able to predict gap expression patterns in Kr mutants and four reporter constructs. We used a four-fold cross-validation test and fitting to a random dataset to validate the model and prove its sufficiency in data description. The identifiability analysis showed that most model parameters are well identifiable. We reconstructed the gap gene network topology and studied the impact of individual transcription factor binding sites on the model output. We measured this impact by calculating the site regulatory weight as a normalized difference between the residual sum of squares error for the set of all annotated sites and for the set with the site of interest excluded. Conclusions: The reconstructed topology of the gap gene network is in agreement with previous modeling results and data from literature. We showed that 1) the regulatory weights of transcription factor binding sites show very weak correlation with their PWM score; 2) sites with low regulatory weight are
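The site regulatory weight described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact normalization or sign convention, so the choice here (the increase in residual sum of squares when a site is removed, divided by the residual sum of squares for the full site set) is an assumption. The `predict` callable and the function names are hypothetical stand-ins for the authors' model, not their code.

```python
def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


def site_regulatory_weights(observed, predict, sites):
    """Leave-one-site-out weights.

    observed: list of measured expression values.
    predict:  hypothetical model; takes a list of sites and returns
              predicted expression values aligned with `observed`.
    sites:    list of annotated binding-site identifiers.

    Assumed normalization: (RSS without site - RSS with all sites) / RSS with all sites.
    """
    rss_all = residual_sum_of_squares(observed, predict(sites))
    if rss_all == 0:
        raise ValueError("RSS for the full site set is zero; cannot normalize")
    weights = {}
    for site in sites:
        reduced = [s for s in sites if s != site]  # exclude the site of interest
        rss_without = residual_sum_of_squares(observed, predict(reduced))
        weights[site] = (rss_without - rss_all) / rss_all
    return weights
```

Under this convention, a site whose removal barely changes the fit gets a weight near zero, and a site the model depends on gets a large positive weight.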

  20. Molecular evolution of homologous gene sequences in germline-limited and somatic chromosomes of Acricotopus.

    PubMed

    Staiber, Wolfgang

    2004-08-01

    The origin of germline-limited chromosomes (Ks) as descendants of somatic chromosomes (Ss) and their structural evolution was recently elucidated in the chironomid Acricotopus. The Ks consist of large S-homologous sections and of heterochromatic segments containing germline-specific, highly repetitive DNA sequences. Less is known about the molecular evolution and features of the sequences in the S-homologous K sections. More information about this was obtained by comparing homologous gene sequences of Ks and Ss. Genes for 5.8S, 18S, 28S, and 5S ribosomal RNA were chosen for the comparison and therefore isolated first by PCR from somatic DNA of Acricotopus and sequenced. Specific K DNA was collected by microdissection of monopolar moving K complements from differential gonial mitoses and was then amplified by degenerate oligonucleotide primer (DOP)-PCR. With the sequence data of the somatic rDNAs, the homologous 5.8S and 5S rDNA sequences were isolated by PCR from the DOP-PCR sequence pool of the Ks. In addition, a number of K DOP-PCR sequences were directly cloned and analysed. One K clone contained a section of a putative N-acetyltransferase gene. Compared with its homolog from the Ss, the sequence exhibited few nucleotide substitutions (99.2% sequence identity). The same was true for the 5.8S and 5S sequences from Ss and Ks (97.5%-100% identity). This supports the idea that the S-homologous K sequences may be conserved and do not evolve independently from their somatic homologs. Possible mechanisms effecting such conservation of S-derived sequences in the Ks are discussed.

  1. Next-generation Sequencing of 16S Ribosomal RNA Gene Amplicons

    PubMed Central

    Sanschagrin, Sylvie; Yergeau, Etienne

    2014-01-01

    One of the major questions in microbial ecology is “who is there?” This question can be answered using various tools, but one of the long-lasting gold standards is to sequence 16S ribosomal RNA (rRNA) gene amplicons generated by domain-level PCR reactions amplifying from genomic DNA. Traditionally, this was performed by cloning and Sanger (capillary electrophoresis) sequencing of PCR amplicons. The advent of next-generation sequencing has tremendously simplified and increased the sequencing depth for 16S rRNA gene sequencing. The introduction of benchtop sequencers now allows small labs to perform their 16S rRNA sequencing in-house in a matter of days. Here, an approach for 16S rRNA gene amplicon sequencing using a benchtop next-generation sequencer is detailed. The environmental DNA is first amplified by PCR using primers that contain sequencing adapters and barcodes. They are then coupled to spherical particles via emulsion PCR. The particles are loaded on a disposable chip and the chip is inserted in the sequencing machine after which the sequencing is performed. The sequences are retrieved in fastq format, filtered and the barcodes are used to establish the sample membership of the reads. The filtered and binned reads are then further analyzed using publicly available tools. An example analysis where the reads were classified with a taxonomy-finding algorithm within the software package Mothur is given. The method outlined here is simple, inexpensive and straightforward and should help smaller labs to take advantage of the ongoing genomic revolution. PMID:25226019
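The filtering and barcode-binning step mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes well-formed four-line FASTQ records, Phred+33 quality encoding, barcodes at the very start of each read, and exact barcode matches. The function names and the `min_mean_q` threshold are hypothetical.

```python
from collections import defaultdict


def parse_fastq(lines):
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(records) - 3, 4):
        # records[i + 2] is the '+' separator line and is ignored
        yield records[i], records[i + 1], records[i + 3]


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def demultiplex(lines, barcodes, min_mean_q=20):
    """Bin reads by leading barcode after a simple mean-quality filter.

    barcodes: dict mapping barcode sequence -> sample name.
    Returns a dict mapping sample name -> list of reads with the barcode trimmed.
    Reads that fail the filter or match no barcode are dropped.
    """
    bins = defaultdict(list)
    for _header, seq, qual in parse_fastq(lines):
        if not qual or mean_quality(qual) < min_mean_q:
            continue
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                bins[sample].append(seq[len(barcode):])
                break
    return dict(bins)
```

A real workflow would usually also tolerate barcode mismatches and trim primers; those steps are left out to keep the sketch short.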

  2. Draft Genome Sequence and Gene Annotation of the Uropathogenic Bacterium Proteus mirabilis Pr2921

    PubMed Central

    Giorello, F. M.; Romero, V.; Farias, J.; Scavone, P.; Umpiérrez, A.; Zunino, P.

    2016-01-01

    Here, we report the genome sequence of Proteus mirabilis Pr2921, a uropathogenic bacterium that can cause severe complicated urinary tract infections. After gene annotation, we identified two additional copies of ucaA, one of the most studied fimbrial protein genes, and other fimbria-related proteins that are not present in P. mirabilis HI4320. PMID:27340058

  3. Bacterial metabarcoding by 16S rRNA gene ion torrent amplicon sequencing.

    PubMed

    Fantini, Elio; Gianese, Giulio; Giuliano, Giovanni; Fiore, Alessia

    2015-01-01

    Ion Torrent is a next generation sequencing technology based on the detection of hydrogen ions produced during DNA chain elongation; this technology allows analyzing and characterizing genomes, genes, and species. Here, we describe an Ion Torrent procedure applied to the metagenomic analysis of 16S rRNA gene amplicons to study the bacterial diversity in food and environmental samples. PMID:25343859

  5. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    PubMed Central

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies. PMID:25614560

  6. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    PubMed Central

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study for Toxoplasma gondii, Plasmodium falciparum, Sarcocystis neurona, Eimeria tenella, and Neospora caninum have been submitted to the dbEST division of GenBank.] PMID:12618375

  7. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders.

    PubMed

    O'Roak, Brian J; Vives, Laura; Fu, Wenqing; Egertson, Jarrett D; Stanaway, Ian B; Phelps, Ian G; Carvill, Gemma; Kumar, Akash; Lee, Choli; Ankenman, Katy; Munson, Jeff; Hiatt, Joseph B; Turner, Emily H; Levy, Roie; O'Day, Diana R; Krumm, Niklas; Coe, Bradley P; Martin, Beth K; Borenstein, Elhanan; Nickerson, Deborah A; Mefford, Heather C; Doherty, Dan; Akey, Joshua M; Bernier, Raphael; Eichler, Evan E; Shendure, Jay

    2012-12-21

    Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin-chromatin-remodeling network to ASD etiology. PMID:23160955

  8. Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    NASA Technical Reports Server (NTRS)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and an as yet unidentified black cyanobacterium, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran polymerase chain reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used to identify different bacterial species based on its sequence. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference and for classification of other Dichothrix sp.

  9. Detection of novel sequence heterogeneity and haplotypic diversity of HLA class II genes.

    PubMed

    Santamaria, P; Boyce-Jacino, M T; Lindstrom, A L; Barbosa, J J; Faras, A J; Rich, S S

    1991-01-01

    Nucleic acid sequences of the second exons of HLA-DRB1, -DRB3/4/5, -DQB1, and -DQA1 genes were determined from 43 homozygous cell lines, representing each of the known class II haplotypes, and from 30 unrelated Caucasian subjects, comprising 60 haplotypes. This systematic sequence analysis was undertaken in order to a) determine the existence of sequence microheterogeneity among cell lines which type as identical by methods other than sequencing; b) determine whether direct sequencing of class II genes will identify the presence of more extensive sequence polymorphism at the population level than that identified with other typing methods; c) accurately determine the molecular composition of the known class II haplotypes; and d) study their evolutionary relatedness by maximum parsimony analysis. The identification of seven previously unidentified haplotypes carrying five new allelic amino acid sequences suggests that sequence microheterogeneity at the population level may be more frequent than previously thought. Maximum parsimony analysis of these haplotypes allowed their evolutionary classification and indicates that the higher mutation rate at DRB1 compared to DQB1 loci in most haplotypic groups is reversed in specific haplotype lineages. Furthermore, the extent and localization of gene conversions and point mutations at class II loci in the evolution of these haplotypes are significantly different at each locus. Identification of additional HLA class II molecular microheterogeneity suggests that direct sequence analysis of class II HLA genes can uncover new allelic sequences in the population and may represent a useful alternative to current typing methodologies to study the effects of sequence allelism in organ transplantation.

  10. Identification of the gene defect responsible for severe hypercholesterolaemia using whole-exome sequencing

    PubMed Central

    Sun, Li-Yuan; Zhang, Yong-Biao; Jiang, Long; Wan, Ning; Wu, Wen-Feng; Pan, Xiao-Dong; Yu, Jun; Zhang, Feng; Wang, Lu-Ya

    2015-01-01

    Familial hypercholesterolaemia (FH) is a serious genetic metabolic disease. We identified a family in which the proband had the typical homozygous FH phenotype, but no mutations could be detected in the usual pathogenic genes by traditional sequencing. This study is the first attempt to use whole exome sequencing (WES) to identify the pathogenic genes in Chinese FH. Routine examinations were performed on all family members, and WES on 5 members. We used bioinformatics methods to splice and filter out the pathogenic gene. Finally, Sanger sequencing and cDNA sequencing were used to verify the candidate genes. Half of the family members had hypercholesterolaemia. WES identified LDLR IVS8[−10] as a candidate mutation from 222,267 variations. Sanger sequencing showed that the proband had a homozygous mutation inherited from his parents, and this locus cosegregated with the FH phenotype. cDNA sequencing revealed that this mutation caused abnormal splicing. This mutation was first identified in Chinese patients, and this homozygous mutation is a new genetic type of FH. This is the first time that WES was used in Chinese FH patients. We detected a novel genetic type of LDLR homozygous mutation. WES is a powerful tool to identify specific FH families with potentially pathogenic gene mutations. PMID:26077743

  11. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. The predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  12. Molecular Phylogenetics of the Genus Trichosporon Inferred from Mitochondrial Cytochrome b Gene Sequences

    PubMed Central

    Biswas, Swarajit Kumar; Wang, Li; Yokoyama, Koji; Nishimura, Kazuko

    2005-01-01

    Mitochondrial cytochrome b (cyt b) genes of 42 strains representing 23 species of the genus Trichosporon were partially sequenced to determine their molecular phylogenetic relationships. Almost half of the 22 strains investigated (from 11 different species) contained introns in their sequences. Analysis of a 396-bp coding sequence from each strain of Trichosporon under investigation showed a total of 141 (35.6%) variable nucleotide sites. A phylogenetic tree based on the cyt b gene sequences revealed that all species of Trichosporon except Trichosporon domesticum and Trichosporon montevideense had species-specific cyt b genes. Trichosporon sp. strain CBS 5581 was identified as Trichosporon pullulans, and one clinical isolate, IFM 48794, was identified as Trichosporon faecale. Analysis of the 132-residue deduced amino acid sequences showed a total of 34 (25.75%) variable amino acid sites. T. domesticum and T. montevideense, Trichosporon asahii and Trichosporon asteroides, and Trichosporon gracile and Trichosporon guehoae had identical amino acid sequences. In a phylogenetic tree constructed with the ascomycetes Saccharomyces douglasii and Candida glabrata taken as outgroup species and including representative species from closely related genera, species of Trichosporon clustered with other basidiomycetous yeasts that contain xylose in their cell wall compositions. These results indicate the effectiveness of mitochondrial cyt b gene sequences for both species identification and the phylogenetic analysis of Trichosporon species. PMID:16207980
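The variable-site counts reported in this abstract (for example, 141 of 396 nucleotide positions, about 35.6%) are the kind of figure obtained by scanning the columns of a multiple alignment. A minimal Python sketch follows; it assumes the sequences are already aligned to equal length and treats any gap character as an ordinary symbol. The function name is hypothetical, and the abstract does not state how the authors actually handled gaps or ambiguous bases.

```python
def variable_sites(aligned_seqs):
    """Count alignment columns containing more than one distinct symbol.

    aligned_seqs: list of equal-length strings (nucleotides or amino acids).
    Returns (number_of_variable_columns, percentage_of_columns).
    """
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*...) walks the alignment column by column
    variable = sum(1 for column in zip(*aligned_seqs) if len(set(column)) > 1)
    return variable, 100.0 * variable / length
```

The same function applies unchanged to deduced amino acid alignments, since it only compares symbols within each column.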

  13. Presence and Expression of Microbial Genes Regulating Soil Nitrogen Dynamics Along the Tanana River Successional Sequence

    NASA Astrophysics Data System (ADS)

    Boone, R. D.; Rogers, S. L.

    2004-12-01

    We report on work to assess the functional gene sequences for soil microbiota that control nitrogen cycle pathways along the successional sequence (willow, alder, poplar, white spruce, black spruce) on the Tanana River floodplain, Interior Alaska. Microbial DNA and mRNA were extracted from soils (0-10 cm depth) for amoA (ammonia monooxygenase), nifH (nitrogenase reductase), napA (nitrate reductase), and nirS and nirK (nitrite reductase) genes. Gene presence was determined by amplification of a conserved sequence of each gene employing sequence-specific oligonucleotide primers and polymerase chain reaction (PCR). Expression of the genes was measured via nested reverse transcriptase PCR amplification of the extracted mRNA. Amplified PCR products were visualized on agarose electrophoresis gels. All five successional stages show evidence for the presence and expression of microbial genes that regulate N fixation (free-living), nitrification, and nitrate reduction. We detected (1) nifH, napA, and nirK presence and amoA expression (mRNA production) for all five successional stages and (2) nirS and amoA presence and nifH, nirK, and napA expression for early successional stages (willow, alder, poplar). The results highlight that the existing body of previous process-level work has not sufficiently considered the microbial potential for a nitrate economy and free-living N fixation along the complete floodplain successional sequence.

  14. Molecular cloning, expression, and sequence of the pilin gene from nontypeable Haemophilus influenzae M37.

    PubMed Central

    Coleman, T; Grass, S; Munson, R

    1991-01-01

    Nontypeable Haemophilus influenzae M37 adheres to human buccal epithelial cells and exhibits mannose-resistant hemagglutination of human erythrocytes. An isogenic variant of this strain which was deficient in hemagglutination was isolated. A protein with an apparent molecular weight of 22,000 was present in the sodium dodecyl sulfate-polyacrylamide gel profile of sarcosyl-insoluble proteins from the hemagglutination-proficient strain but was absent from the profile of the isogenic hemagglutination-deficient variant. A monoclonal antibody which reacts with the hemagglutination-proficient isolate but not with the hemagglutination-deficient isolate has been characterized. This monoclonal antibody was employed in an affinity column for purification of the protein as well as to screen a genomic library for recombinant clones expressing the gene. Several clones which contained overlapping genomic fragments were identified by reaction with the monoclonal antibody. The gene for the 22-kDa protein was subcloned and sequenced. The gene for the type b pilin from H. influenzae type b strain MinnA was also cloned and sequenced. The DNA sequence of the strain MinnA gene was identical to that reported previously for two other type b strains. The DNA sequence of the strain M37 gene is 77% identical to that of the type b pilin gene, and the derived amino acid sequence is 68% identical to that of the type b pilin. PMID:1673447
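Percent-identity figures such as the 77% (DNA) and 68% (amino acid) values in this abstract are computed from a pairwise alignment. The Python sketch below shows one common convention: matches divided by the number of aligned columns in which neither sequence has a gap. Other conventions exist (for instance, dividing by the full alignment length), and the abstract does not say which one the authors used, so this choice and the function name are assumptions.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

The input must already be aligned; producing the alignment itself (for example with a dynamic-programming aligner) is outside the scope of this sketch.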

  15. Delta sequences in the 5' non-coding region of yeast tRNA genes

    PubMed Central

    Gafner, Jürg; De Robertis, Eddy M.; Philippsen, Peter

    1983-01-01

    Two so far undetected tRNA genes were found close to delta (δ) sequences at the sup4 locus on chromosome X in the genome of Saccharomyces cerevisiae. The two genes were identified from their abundant transcription products in frog oocytes. Hybridisation experiments allowed the mapping of the transcripts in cloned DNA and DNA sequence analysis revealed the presence of one tRNA-Arg(AGG) and one tRNA-Asp(GAC) gene. tRNA-Asp genes with sequences similar or identical to tRNA-Asp(GAC) exist in 14-16 copies per haploid yeast genome, whereas only one copy was detected for tRNA-Arg(AGG). In vivo labelling of total yeast tRNA with 32P followed by hybridisation revealed that the unique tRNA-Arg(AGG) gene is transcribed in S. cerevisiae. δ sequences are present 120 bp upstream from the first coding nucleotide in the case of tRNA-Arg(AGG), 80 bp in the case of tRNA-Asp(GAC) and 405 bp in the case of the known tRNA-Tyr(UAC) (sup4) gene. δ sequences, as part of Ty elements or alone, were also found by other investigators at similar distances upstream of the mRNA start in mutant alleles of protein-coding yeast genes. Although protein-coding genes are transcribed by RNA polymerase II and tRNA genes by RNA polymerase III, the 5' non-coding region of both types of genes could conceivably have a peculiar DNA or chromatin structure used as preferred landing sites by transposable elements. PMID:16453444

  16. The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study.

    PubMed

    Dalquen, Daniel A; Altenhoff, Adrian M; Gonnet, Gaston H; Dessimoz, Christophe

    2013-01-01

    The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
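Of the generic approaches named in this abstract, bidirectional best hit is simple enough to sketch in a few lines of Python. The version below assumes that pairwise alignment scores between two genomes have already been computed (for example, from an all-against-all search) and are supplied as dictionaries; the input layout and function names are hypothetical, and ties are resolved arbitrarily by keeping the first best-scoring target seen. It is an illustration of the general idea, not the implementation benchmarked in the study.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> alignment score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _score) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit.

    a_vs_b: scores for genome A genes searched against genome B.
    b_vs_a: scores for genome B genes searched against genome A.
    """
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Because each gene can appear in at most one pair, a duplicated gene whose paralog scores higher is dropped rather than grouped with its co-orthologs.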

  17. Sequence and tissue-specific expression of a putative peroxidase gene from wheat (Triticum aestivum L.).

    PubMed

    Hertig, C; Rebmann, G; Bull, J; Mauch, F; Dudler, R

    1991-01-01

    We have used a cDNA clone encoding a pathogen-induced putative wheat peroxidase to screen a genomic library of wheat (Triticum aestivum L. cv. Cheyenne) and isolated one positive clone, lambda POX1. Sequence analysis revealed that this clone contains a gene encoding a putative peroxidase with a calculated pI of 8.1 which exhibits 58% and 83% sequence identity to the amino acid sequence of the turnip (Brassica rapa) peroxidase and a pathogen-induced putative wheat peroxidase, respectively. The two introns in the wheat gene are at the same positions as introns in the peroxidase genes of tomato and horseradish. Results of S1-mapping experiments suggest that this gene is neither pathogen- nor wound-induced in leaves but is constitutively expressed in roots.

  18. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

    PubMed

    Weirather, Jason L; Afshar, Pegah Tootoonchi; Clark, Tyson A; Tseng, Elizabeth; Powers, Linda S; Underwood, Jason G; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

    2015-10-15

    We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. PMID:26040699

  19. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing

    PubMed Central

    Weirather, Jason L.; Afshar, Pegah Tootoonchi; Clark, Tyson A.; Tseng, Elizabeth; Powers, Linda S.; Underwood, Jason G.; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

    2015-01-01

    We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. PMID:26040699

  20. Sequence and comparative analysis of the MIP gene in Chinese straw mushroom, Volvariella volvacea.

    PubMed

    Chen, Bing-Zhi; Gui, Fu; Xie, Bao-Gui; Zou, Feng; Jiang, Yu-Ji; Deng, You-Jin

    2012-09-01

    The mitochondrial intermediate peptidase (MIP) gene is conserved in fungi. It is linked closely with the mating-type A (mtA) gene. In this study, a fragment of the MIP gene in Volvariella volvacea (Bull. ex Fr.) Singer was first cloned by homologue-based cloning technology. Subsequently, the entire MIP DNA sequence (PYd21-MIP) was obtained after the fragment was compared with the genomic data through BLAST analysis. The PYd21-MIP sequence appeared to be homologous with the MIP gene in other fungi. Phylogenetic analysis of PYd21-MIP and other MIP sequences from diverse fungi agreed with the current organism phylogeny. Analysis of protein domains by InterProScan software and motif searching demonstrated that PYd21-MIP encodes a homologous MIP protein. These data support the hypothesis that the PYd21-MIP protein is a Hog-MIP protein homologue from V. volvacea.

  1. Cloning and sequence analysis of the major outer membrane protein genes of two Chlamydia psittaci strains.

    PubMed

    Zhang, Y X; Morrison, S G; Caldwell, H D; Baehr, W

    1989-05-01

    We cloned and sequenced the gene encoding the major outer membrane protein (MOMP) of two Chlamydia psittaci strains, guinea pig inclusion conjunctivitis (GPIC) strain 1, and meningopneumonitis (Mn) strain Cal-10. Intraspecies alignment of the two C. psittaci MOMP genes revealed 80.6% similarity, and interspecies comparison of C. trachomatis and C. psittaci MOMP genes yielded about 68% similarity. As found previously for C. trachomatis MOMP sequences, stretches of predominantly conserved sequences of GPIC and Mn MOMPs were interrupted by four variable domains whose locations were identical to those of C. trachomatis MOMPs. Seven of eight cysteine residues were found at precisely the same positions in GPIC, Mn, and C. trachomatis MOMPs, emphasizing their importance in structure and function of the protein. Collectively, these results indicate that C. psittaci and C. trachomatis MOMP genes diverged from a common ancestor.

  2. Nucleotide sequence of the transcriptional initiation region of the yeast GAL7 gene.

    PubMed Central

    Nogi, Y; Fukasawa, T

    1983-01-01

    The GAL7 gene of Saccharomyces cerevisiae encodes Gal-1-P uridylyl transferase, the second enzyme of the Leloir pathway for galactose catabolism. We have determined the sequence of 1003 base pairs surrounding and upstream of the transcriptional initiation site of the GAL7 gene. The region sequenced also encompasses the 3' end of the GAL10 gene. The 5' end of GAL7 mRNA was located on the DNA sequence by S1 nuclease and exonuclease VII mapping; it lies 21 to 22 base pairs upstream from the translation-initiating ATG codon. The primary structure of the GAL7 5' flanking region has many features in common with those of multicellular eukaryotic genes. The 3' end of GAL10 mRNA was also determined, by mapping with single-strand-specific nucleases, to be about 600 base pairs upstream from the 5' end of GAL7 mRNA. PMID:6324089

  3. Sequence and tissue-specific expression of a putative peroxidase gene from wheat (Triticum aestivum L.).

    PubMed

    Hertig, C; Rebmann, G; Bull, J; Mauch, F; Dudler, R

    1991-01-01

    We have used a cDNA clone encoding a pathogen-induced putative wheat peroxidase to screen a genomic library of wheat (Triticum aestivum L. cv. Cheyenne) and isolated one positive clone, lambda POX1. Sequence analysis revealed that this clone contains a gene encoding a putative peroxidase with a calculated pI of 8.1 which exhibits 58% and 83% sequence identity to the amino acid sequence of the turnip (Brassica rapa) peroxidase and a pathogen-induced putative wheat peroxidase, respectively. The two introns in the wheat gene are at the same positions as introns in the peroxidase genes of tomato and horseradish. Results of S1-mapping experiments suggest that this gene is neither pathogen- nor wound-induced in leaves but is constitutively expressed in roots. PMID:1653627

  4. Sequence and secondary structure of the mitochondrial 16S ribosomal RNA gene of Ixodes scapularis.

    PubMed

    Krakowetz, Chantel N; Chilton, Neil B

    2015-02-01

    The complete DNA sequences and secondary structure of the mitochondrial (mt) 16S ribosomal (r) RNA gene were determined for six Ixodes scapularis adults. There were 44 variable nucleotide positions in the 1252 bp sequence alignment. Most (95%) nucleotide alterations did not affect the integrity of the secondary structure of the gene because they either occurred at unpaired positions or represented compensatory changes that maintained the base pairing in helices. A large proportion (75%) of the intraspecific variation in DNA sequence occurred within Domains I, II and VI of the 16S gene. Therefore, several regions within this gene may be highly informative for studies of the population genetics and phylogeography of I. scapularis, a major vector of pathogens of humans and domestic animals in North America.
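    As a rough illustration of the variable-site count quoted above (44 variable positions in a 1252 bp alignment, about 3.5%), the following sketch counts alignment columns that contain more than one character state. The function name and the toy alignment are hypothetical and are not taken from the study; gaps are treated as an ordinary character here, which is a simplifying assumption.

```python
def variable_positions(aligned):
    """Return 0-based column indices where the aligned sequences differ.

    Assumes all sequences are already aligned to the same length.
    """
    lengths = {len(seq) for seq in aligned}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*aligned) walks the alignment column by column.
    return [i for i, column in enumerate(zip(*aligned)) if len(set(column)) > 1]


# Toy alignment (hypothetical data, not the I. scapularis sequences).
toy_alignment = ["ACGTAC", "ACGTAC", "ACCTAT"]
toy_variable = variable_positions(toy_alignment)

# Proportion reported in the abstract: 44 variable sites out of 1252 bp.
reported_percent = round(100 * 44 / 1252, 1)
print(toy_variable, reported_percent)
```

    The same column-wise comparison scales to a real alignment read from a FASTA file, provided the sequences have been aligned beforehand.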

  5. Cloning and sequencing of the rDNA gene family of the water buffalo (Bubalus bubalis).

    PubMed

    Pang, C Y; Deng, T X; Tang, D S; Yang, C Y; Jiang, H; Yang, B Z; Liang, X W

    2012-01-01

    The rDNA genes coding for ribosomal RNA in animals are complicated repeat sequences with high GC content. We amplified water buffalo rDNA gene sequences with the long and accurate (LA) PCR method, using LA Taq DNA polymerase and GC buffer, based on bioinformatic analysis of related organisms. The rDNA genes were found to consist of 9016 nucleotides, including three rRNA genes and two internal transcribed spacers (ITS), which we named 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 28S rRNA. We tested and optimized conditions for cloning these complicated rDNA sequences, including specific rules of primer design, improvements in the reaction system, and selection of the DNA polymerase.

  6. Analysis of hepatitis B virus genotyping and drug resistance gene mutations based on massively parallel sequencing.

    PubMed

    Han, Yingxin; Zhang, Yinxin; Mei, Yanhua; Wang, Yuqi; Liu, Tao; Guan, Yanfang; Tan, Deming; Liang, Yu; Yang, Ling; Yi, Xin

    2013-11-01

    Drug resistance to nucleoside analogs is a serious problem worldwide. Both drug resistance gene mutation detection and HBV genotyping are helpful for guiding clinical treatment. Total HBV DNA from 395 patients who were treated with single or multiple drugs including Lamivudine, Adefovir, Entecavir, Telbivudine, Tenofovir and Emtricitabine were sequenced using the HiSeq 2000 sequencing system and validated using the 3730 sequencing system. In addition, a mixed sample of HBV plasmid DNA was used to determine the cutoff value for HiSeq-sequencing, and 52 of the 395 samples were sequenced three times to evaluate the repeatability and stability of this technology. Of the 395 samples sequenced using both HiSeq and 3730 sequencing, the results from 346 were consistent, and the results from 49 were inconsistent. Among the 49 inconsistent results, 13 samples were detected as drug-resistance-positive using HiSeq but negative using 3730, and the other 36 samples showed a higher number of drug-resistance-positive gene mutations using HiSeq 2000 than using 3730. Gene mutations had an apparent frequency of 1% as assessed by the plasmid testing. Therefore, a 1% cutoff value was adopted. Furthermore, the experiment was repeated three times, and the same results were obtained in 49/52 samples using the HiSeq sequencing system. HiSeq sequencing can be used to analyze HBV gene mutations with high sensitivity, high fidelity, high throughput and automation and is a potential method for hepatitis B virus gene mutation detection and genotyping.
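    The agreement figures in this abstract can be checked with simple arithmetic. The sketch below uses only the counts stated above; the variable names and the helper function are illustrative assumptions, not part of the authors' pipeline.

```python
# Counts taken directly from the abstract; names are illustrative.
total_samples = 395
consistent = 346
inconsistent = 49
hiseq_only_positive = 13      # positive by HiSeq, negative by 3730
hiseq_more_mutations = 36     # more resistance mutations found by HiSeq
repeat_samples = 52
repeat_identical = 49


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Internal consistency of the reported breakdown.
assert consistent + inconsistent == total_samples
assert hiseq_only_positive + hiseq_more_mutations == inconsistent

concordance = percent(consistent, total_samples)
repeatability = percent(repeat_identical, repeat_samples)
print(concordance, repeatability)
```

    On these counts the two platforms agree for roughly 88% of samples, and the triplicate HiSeq runs agree for roughly 94% of the repeated samples.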

  7. Sequence Diversities of Serine-Aspartate Repeat Genes among Staphylococcus aureus Isolates from Different Hosts Presumably by Horizontal Gene Transfer

    PubMed Central

    Xue, Huping; Lu, Hong; Zhao, Xin

    2011-01-01

    Background: Horizontal gene transfer (HGT) is recognized as one of the major forces for bacterial genome evolution. Many clinically important bacteria may acquire virulence factors and antibiotic resistance through HGT. The comparative genomic analysis has become an important tool for identifying HGT in emerging pathogens. In this study, the Serine-Aspartate Repeat (Sdr) family has been compared among different sources of Staphylococcus aureus (S. aureus) to discover sequence diversities within their genomes. Methodology/Principal Findings: Four sdr genes were analyzed for 21 different S. aureus strains and 218 mastitis-associated S. aureus isolates from Canada. Comparative genomic analyses revealed that S. aureus strains from bovine mastitis (RF122 and mastitis isolates in this study), ovine mastitis (ED133), pig (ST398), chicken (ED98), and human methicillin-resistant S. aureus (MRSA) (TCH130, MRSA252, Mu3, Mu50, N315, 04-02981, JH1 and JH9) were highly associated with one another, presumably due to HGT. In addition, several types of insertion and deletion were found in sdr genes of many isolates. A new insertion sequence was found in mastitis isolates, which was presumably responsible for the HGT of sdrC gene among different strains. Moreover, the sdr genes could be used to type S. aureus. Regional difference of sdr genes distribution was also indicated among the tested S. aureus isolates. Finally, certain associations were found between sdr genes and subclinical or clinical mastitis isolates. Conclusions: Certain sdr gene sequences were shared in S. aureus strains and isolates from different species presumably due to HGT. Our results also suggest that the distributional assay of virulence factors should detect the full sequences or full functional regions of these factors. The traditional assay using short conserved regions may not be accurate or credible. These findings have important implications with regard to animal husbandry practices that may inadvertently

  8. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  9. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ).

    PubMed

    An, Qing-Ming; Zhou, Hui-Tong; Hu, Jiang; Luo, Yu-Zhu; Hickford, Jon G H

    2015-01-01

    The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A₁-D₁, A₂-D₂) were detected in region-1 and region-2, respectively, with seven and six SNPs being revealed. In region-3, three different patterns (A₃-C₃) and three SNPs were observed. Two patterns (A₄-B₄, A₅-B₅) and two and one SNPs were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, with five of them in the coding region and two (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A₁, A₂ and A₃ were the most common, although variant frequencies differed in the eight breeds. Across region-1 and region-3, nine haplotypes were identified and haplotypes A₁-A₃, A₁-C₃, B₁-A₃ and B₁-C₃ were most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to see if the variation in the gene is associated with animal production traits. PMID:26610572
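    The correspondence between the coding-sequence positions and the amino acid positions named above (c.46 with residue 16, c.515 with residue 172) follows from reading the coding sequence in triplets. The sketch below shows that arithmetic, together with the SNP tally across the five regions; the function name is hypothetical and the code is an illustration, not the authors' analysis.

```python
def codon_of(cdna_pos):
    """Map a 1-based coding-sequence position to
    (1-based codon number, 1-based position within that codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions are 1-based")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


first_change = codon_of(46)    # c.46T/C  -> codon 16, first base
second_change = codon_of(515)  # c.515G/A -> codon 172, second base

# SNPs reported per region (regions 1 to 5) in the abstract.
snps_per_region = [7, 6, 3, 2, 1]
total_snps = sum(snps_per_region)
print(first_change, second_change, total_snps)
```

    Both substitutions fall in the codon the abstract names, and the per-region counts add up to the nineteen SNPs reported.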

  10. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria Coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  11. Comparative Sequence Analysis of the Sorghum Rph Region and the Maize Rp1 Resistance Gene Complex

    PubMed Central

    Ramakrishna, Wusirika; Emberton, John; SanMiguel, Phillip; Ogden, Matthew; Llaca, Victor; Messing, Joachim; Bennetzen, Jeffrey L.

    2002-01-01

    A 268-kb chromosomal segment containing sorghum (Sorghum bicolor) genes that are orthologous to the maize (Zea mays) Rp1 disease resistance (R) gene complex was sequenced. A region of approximately 27 kb in sorghum was found to contain five Rp1 homologs, but most have structures indicating that they are not functional. In contrast, maize inbred B73 has 15 Rp1 homologs in two nearby clusters of 250 and 300 kb. As at maize Rp1, the cluster of R gene homologs is interrupted by the presence of several genes that appear to have no resistance role, but these genes were different from the ones found within the maize Rp1 complex. More than 200 kb of DNA downstream from the sorghum Rp1-orthologous R gene cluster was sequenced and found to contain many duplicated and/or truncated genes. None of the duplications currently exist as simple tandem events, suggesting that numerous rearrangements were required to generate the current genomic structure. Four truncated genes were observed, including one gene that appears to have both 5′ and 3′ deletions. The maize Rp1 region is also unusually enriched in truncated genes. Hence, the orthologous maize and sorghum regions share numerous structural features, but all involve events that occurred independently in each species. The data suggest that complex R gene clusters are unusually prone to frequent internal and adjacent chromosomal rearrangements of several types. PMID:12481055

  12. The nucleotide sequence of the uvrD gene of E. coli.

    PubMed Central

    Finch, P W; Emmerson, P T

    1984-01-01

    The nucleotide sequence of a cloned section of the E. coli chromosome containing the uvrD gene has been determined. The coding region for the UvrD protein consists of 2,160 nucleotides which would direct the synthesis of a polypeptide 720 amino acids long with a calculated molecular weight of 82 kd. The predicted amino acid sequence of the UvrD protein has been compared with the amino acid sequences of other known adenine nucleotide binding proteins and a common sequence has been identified, thought to contribute towards adenine nucleotide binding. PMID:6379604
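    The relation between the coding-region length and the polypeptide length stated above is a division by three, since each codon specifies one amino acid. The sketch below also applies a common rule of thumb of about 110 Da per residue; that average is an assumption introduced here for illustration and is not how the abstract's 82 kd figure was calculated, which would have used the actual amino acid composition.

```python
CODING_NUCLEOTIDES = 2160     # from the abstract
AVG_RESIDUE_MASS_DA = 110     # assumption: rule-of-thumb average residue mass


def codons_in(nucleotides):
    """Number of codons in a coding region whose length is a multiple of 3."""
    if nucleotides % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    return nucleotides // 3


residues = codons_in(CODING_NUCLEOTIDES)
approx_mass_kda = residues * AVG_RESIDUE_MASS_DA / 1000
print(residues, approx_mass_kda)
```

    The rule-of-thumb estimate lands a little below the reported 82 kd, which is the size of discrepancy one would expect from a composition-independent approximation.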

  13. Sequence analysis of the ERCC2 gene regions in human, mouse, and hamster reveals three linked genes

    SciTech Connect

    Lamerdin, J.E.; Stilwagen, S.A.; Ramirez, M.H.

    1996-06-15

    The ERCC2 (excision repair cross-complementing rodent repair group 2) gene product is involved in transcription-coupled repair as an integral member of the basal transcription factor BTF2/TFIIH complex. Defects in this gene can result in three distinct human disorders, namely the cancer-prone syndrome xeroderma pigmentosum complementation group D, trichothiodystrophy, and Cockayne syndrome. We report the comparative analysis of 91.6 kb of new sequence including 54.3 kb encompassing the human ERCC2 locus, the syntenic region in the mouse (32.6 kb), and a further 4.7 kb of sequence 3′ of the previously reported ERCC2 region in the hamster. In addition to ERCC2, our analysis revealed the presence of two previously undescribed genes in all three species. The first is centromeric (in the human) to ERCC2 and is most similar to the kinesin light chain gene in sea urchin. The second gene is telomeric (in the human) to ERCC2 and contains a motif found in ankyrins, some cell proteins, and transcription factors. Multiple EST matches to this putative new gene indicate that it is expressed in several human tissues, including breast. The identification and description of two new genes provides potential candidate genes for disorders mapping to this region of 19q13.2. 42 refs., 6 figs., 3 tabs.

  14. Host range selection of vaccinia recombinants containing insertions of foreign genes into non-coding sequences.

    PubMed

    Smith, K A; Stallard, V; Roos, J M; Hart, C; Cormier, N; Cohen, L K; Roberts, B E; Payne, L G

    1993-01-01

    A simple yet powerful selection system was developed for the insertion of foreign genes in vaccinia virus. The selection system utilizes the vaccinia virus K1L (29K) host range gene which is located in HindIII M. This gene is necessary for growth in RK-13 cells but not in BSC40 or CV-1 cells. A vaccinia mutant (vAbT33) unable to grow on RK-13 cells was constructed having sequences at the 3' end of the K1L gene and the adjacent M2L gene deleted and replaced with the beta-galactosidase gene regulated by the BamHI F (F7L) promoter. A recombination plasmid containing the hepatitis B surface (HBs) antigen gene regulated by the M2L promoter and the complete sequence of the K1L gene was used to insert the HBs gene into vAbT33. The M2L negative K1L positive recombinant was easily isolated in two rounds of plaque purification by plating the virus on RK-13 cell monolayers. The K1L gene selection system allows the isolation of recombinants arising at frequencies as low as 1/100,000. It was noted that recombinants containing vaccinia sequence duplications (promoters) resulted in intragenomic recombinations that eliminated all sequences between the duplications. A second recombination plasmid was constructed that allowed insertion into the vaccinia genome without the loss of vaccinia coding sequences. This was achieved by insertion of the pseudorabies virus GIII gene regulated by the vaccinia H5R (40K) promoter between the translation and transcription stop signals at the 3' end of the K1L gene. The K1L gene transcription stop signal thus became the stop signal for the inserted GIII gene and an upstream transcription stop signal present in the H5R promoter fragment provided the stop signal for the K1L gene. This manipulation of the vaccinia genome had no effect on the accumulation or 5' end of the M2L gene transcripts. Although the insertion lengthened the 3' end and lowered the accumulation of K1L transcripts it altered neither the virulence nor the immunogenicity of the

  15. Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease

    SciTech Connect

    Su, Y.; Zhang, H.; Madrid, R.

    1994-09-01

    Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and, when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.

  16. The qa repressor gene of Neurospora crassa: wild-type and mutant nucleotide sequences.

    PubMed Central

    Huiet, L; Giles, N H

    1986-01-01

    The qa-1S gene, one of two regulatory genes in the qa gene cluster of Neurospora crassa, encodes the qa repressor. The qa-1S gene together with the qa-1F gene, which encodes the qa activator protein, control the expression of all seven qa genes, including those encoding the inducible enzymes responsible for the utilization of quinic acid as a carbon source. The nucleotide sequence of the qa-1S gene and its flanking regions has been determined. The deduced coding sequence for the qa-1S protein encodes 918 amino acids with a calculated molecular weight of 100,650 and is interrupted by a single 66-base-pair intervening sequence. Both constitutive and noninducible mutants occur in the qa-1S gene and two different mutations of each type have been cloned and sequenced. All four mutations occur within the predicted coding region of the qa-1S gene. This result strongly supports the hypothesis that the qa-1S gene encodes a repressor. All four mutations are located within codons for the last 300 amino acids of the qa-1S protein. The mutations in three of the mutants involve amino acid substitutions, while the fourth mutant, which has a constitutive phenotype, contains a frameshift mutation. The two constitutive mutations occur in the most distal region of the gene, possibly implicating the COOH-terminal region of the qa repressor in binding to its target. The two noninducible mutations occur in a region proximal to the constitutive mutations, possibly implicating this region of the qa repressor in binding the inducer. PMID:3010294

  17. Insights into corn genes derived from large-scale cDNA sequencing.

    PubMed

    Alexandrov, Nickolai N; Brover, Vyacheslav V; Freidin, Stanislav; Troukhan, Maxim E; Tatarinova, Tatiana V; Zhang, Hongyu; Swaller, Timothy J; Lu, Yu-Ping; Bouck, John; Flavell, Richard B; Feldmann, Kenneth A

    2009-01-01

    We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST). PMID:18937034
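    The two gene classes described above are distinguished by GC content at the third codon position, often abbreviated GC3. A minimal sketch of that measure follows, assuming an in-frame coding sequence given as a plain string; the function name and the example sequences are hypothetical and are not drawn from the corn data set.

```python
def gc3_fraction(cds):
    """Fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (length a multiple of 3).
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty sequence with length divisible by 3")
    third_bases = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Hypothetical toy sequences: ATG GCC AAG TAA and ATG GCT AAA TAA.
high_gc3 = gc3_fraction("ATGGCCAAGTAA")
low_gc3 = gc3_fraction("ATGGCTAAATAA")
print(high_gc3, low_gc3)
```

    Computing this value per transcript and plotting its distribution is one way a bimodal split such as the one described in the abstract could be visualised; any threshold between the two classes would have to come from the data.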

  18. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.

    PubMed

    Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko

    2011-06-01

    Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read length of the IGA is only 100 bases, whereas at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained with all proposed forward primers. Various bioinformatic optimizations of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample. PMID:21361774

  19. Sequence analysis of the Alcaligenes eutrophus chromosomally encoded ribulose bisphosphate carboxylase large and small subunit genes and their gene products.

    PubMed Central

    Andersen, K; Caton, J

    1987-01-01

    The nucleotide sequence of the chromosomally encoded ribulose bisphosphate carboxylase/oxygenase (RuBPCase) large (rbcL) and small (rbcS) subunit genes of the hydrogen bacterium Alcaligenes eutrophus ATCC 17707 was determined. We found that the two coding regions are separated by a 47-base-pair intergenic region, and both genes are preceded by plausible ribosome-binding sites. Cotranscription of the rbcL and rbcS genes has been demonstrated previously. The rbcL and rbcS genes encode polypeptides of 487 and 135 amino acids, respectively. Both genes exhibited similar codon usage which was highly biased and different from that of other organisms. The N-terminal amino acid sequence of both subunit proteins was determined by Edman degradation. No processing of the rbcS protein was detected, while the rbcL protein underwent a posttranslational loss of formylmethionyl. The A. eutrophus rbcL and rbcS proteins exhibited 56.8 to 58.3% and 35.6 to 38.5% amino acid sequence homology, respectively, with the corresponding proteins from cyanobacteria, eucaryotic algae, and plants. The A. eutrophus and Rhodospirillum rubrum rbcL proteins were only about 32% homologous. The N- and C-terminal sequences of both the rbcL and the rbcS proteins were among the most divergent regions. Known or proposed active site residues in other rbcL proteins, including Lys, His, Arg, and Asp residues, were conserved in the A. eutrophus enzyme. The A. eutrophus rbcS protein, like those of cyanobacteria, lacks a 12-residue internal sequence that is found in plant RuBPCase. Comparison of hydropathy profiles and secondary structure predictions by the method described by Chou and Fasman (P. Y. Chou and G. D. Fasman, Adv. Enzymol. 47:45-148, 1978) revealed striking similarities between A. eutrophus RuBPCase and other hexadecameric enzymes. This suggests that folding of the polypeptide chains is similar. 
The observed sequence homologies were consistent with the notion that both the rbcL and rbcS genes of the

  20. Multiple Cis-Acting Sequences Contribute to Evolved Regulatory Variation for Drosophila Adh Genes

    PubMed Central

    Fang, X. M.; Brennan, M. D.

    1992-01-01

    Drosophila affinidisjuncta and Drosophila hawaiiensis are closely related species that display distinct tissue-specific expression patterns for their homologous alcohol dehydrogenase genes (Adh genes). In Drosophila melanogaster transformants, both genes are expressed at high levels in the larval and adult fat bodies, but the D. affinidisjuncta gene is expressed 10-50-fold more strongly in the larval and adult midguts and Malpighian tubules. The present study reports the mapping of cis-acting sequences contributing to the regulatory differences between these two genes in transformants. Chimeric genes were constructed and introduced into the germ line of D. melanogaster. Stage- and tissue-specific expression patterns were determined by measuring steady-state RNA levels in larvae and adults. Three portions of the promoter region make distinct contributions to the tissue-specific regulatory differences between the native genes. Sequences immediately upstream of the distal promoter have a strong effect in the adult Malpighian tubules, while sequences between the two promoters are relatively important in the larval Malpighian tubules. A third gene segment, immediately upstream of the proximal promoter, influences levels of the proximal Adh transcript in all tissues and developmental stages examined, and largely accounts for the regulatory difference in the larval and adult midguts. However, these as well as other sequences make smaller contributions to various aspects of the tissue-specific regulatory differences. In addition, some chimeric genes display aberrant RNA levels for the whole organism, suggesting close physical association between sequences involved in tissue-specific regulatory differences and those important for Adh expression in the larval and adult fat bodies. PMID:1644276

  1. Diversity of Frankia in soil assessed by Illumina sequencing of nifH gene fragments.

    PubMed

    Rodriguez, David; Guerra, Trina M; Forstner, Michael R J; Hahn, Dittmar

    2016-09-01

    Targeted Illumina sequencing of nitrogenase reductase (nifH) gene fragments and analyses of paired-end reads through a modified QIIME pipeline were used to assess the diversity of the actinomycetous genus Frankia in three soils. Soils were vegetated with host or non-host plants, and included locations in Illinois (ABA, host), Colorado (CoMt, non-host), and Wisconsin (FMWI, non-host). After filtering, seven unique sequences were recovered for soil ABA, six for CoMt, and four sequences for FMWI. These sequences were included in a Bayesian topology anchored by published sequence data from pure cultures of Frankia. Sequences from all three soils showed affinities to Frankia strains from both the Alnus and Elaeagnus host infection groups. Reads representing Casuarina-infective strains were not detected. Four sequences from soil CoMt and five sequences from soil ABA did not cluster, at 97% similarity, into a shared OTU that contained a cultured relative. These results demonstrate that targeted Illumina sequencing provides an efficient and economical method for assessing haplotype diversity of ecofunctional genes (e.g. nifH) at the genus level in microorganisms that perform important ecosystem functions. PMID:27485903

  2. Impact of Pre-Analytical Variables on Cancer Targeted Gene Sequencing Efficiency.

    PubMed

    Araujo, Luiz H; Timmers, Cynthia; Shilo, Konstantin; Zhao, Weiqiang; Zhang, Jianying; Yu, Lianbo; Natarajan, Thanemozhi G; Miller, Clinton J; Yilmaz, Ayse Selen; Liu, Tom; Amann, Joseph; Lapa E Silva, José Roberto; Ferreira, Carlos Gil; Carbone, David P

    2015-01-01

    Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next generation sequencing platform. A PCR-based quality control (QC) assay was utilized to determine DNA quality, and a ratio was generated in comparison to control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated to most parameters of sequencing efficiency including depth of coverage, alignment rate, insert size, and read quality. A combined score using the three parameters was generated and proved highly accurate to predict sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content like in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples with poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are utilized. PMID:26605948

  3. Intervening sequences in ribosomal RNA genes and bobbed phenotype in Drosophila hydei.

    PubMed

    Franz, G; Kunz, W

    1981-08-13

    The "bobbed" (bb) mutation in Drosophila is represented phenotypically by shortened and abnormally thin scutellar bristles and by delayed development. There is a direct correlation between bristle size and ribosomal RNA (rRNA) synthesis, and the bb mutation was at first explained as a deficiency of rRNA genes (rDNA). However, the bb phenotype can occur in Drosophila melanogaster and Drosophila hydei with high rDNA content, while phenotypically wild-type flies are known with few rRNA genes, suggesting that what matters is not the number of rRNA genes but their transcriptional activity. In D. melanogaster, it has recently emerged that rRNA genes interrupted by an intervening sequence are not transcribed. We now report that in D. hydei, the length of the scutellar bristle is directly proportional to the number of rRNA genes without this intervening sequence.

  4. Biologic: Gene circuits and feedback in an introductory physics sequence for biology and premedical students

    NASA Astrophysics Data System (ADS)

    Cahn, S. B.; Mochrie, S. G. J.

    2014-05-01

    We describe an educational module on feedback and gene circuits that constitutes the final topic in a new year-long introductory physics sequence aimed at biology and premedical students at Yale University. The overall goals of this sequence are threefold: first, to demonstrate the application of physics and mathematics in the life sciences; second, to introduce biological science majors to mathematical and physical tools, principles, and experiences; and third, to seed an enduring appreciation of quantitative approaches in biology and medicine. Here, we present a module on feedback and gene circuits that focuses on a genetic toggle switch and a repressilator. The genetic toggle switch consists of two genes, each of whose protein products represses the other's expression, while the repressilator consists of three genes, each of whose protein products represses the next gene's expression. Analytic, numerical, and electronic treatments of the genetic toggle switch show bistability. A similar treatment of the repressilator reveals sustained oscillations.
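
    The bistability mentioned in this record can be illustrated with a minimal numerical sketch. The abstract does not give the module's equations, so the model below is an assumption: a commonly used dimensionless two-gene mutual-repression model with Hill-type repression, integrated with a simple forward-Euler step. The function names, the parameter values (alpha = 10, n = 2), and the step size are all hypothetical choices made for illustration only.

```python
# Sketch of a two-gene toggle switch (assumed model, not taken from the record):
#   du/dt = alpha / (1 + v**n) - u
#   dv/dt = alpha / (1 + u**n) - v
# u and v are the dimensionless levels of the two mutually repressing proteins.


def hill_repression(x, alpha, n):
    """Production rate of a gene repressed by a protein at level x."""
    return alpha / (1.0 + x ** n)


def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Integrate the toggle-switch equations with forward Euler.

    Returns the final (u, v) after steps * dt time units.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = hill_repression(v, alpha, n) - u
        dv = hill_repression(u, alpha, n) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


if __name__ == "__main__":
    # Two initial conditions on opposite sides of the u = v diagonal
    # settle into two different steady states (bistability).
    print("start u > v:", simulate_toggle(2.0, 1.0))
    print("start v > u:", simulate_toggle(1.0, 2.0))
```

    With these assumed parameters, whichever protein starts higher ends up dominant, which is the memory-like behaviour a toggle switch is meant to show.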

  5. Sequence, expression, and polymorphism of the Peromyscus leucopus Mhc class Ib gene, M4.

    PubMed

    Crew, Mark D; Bates, Linda M

    2003-05-01

    The H2 M region harbors about 20 class I genes or gene fragments, the functions of which are largely obscure. The rat Mhc (RT1) appears to contain several orthologs of H2 M region genes, although orthologs in more distantly related species have yet to be clearly identified. In this report, the sequence of a genomic clone containing a Peromyscus leucopus Mhc (Pele) class I gene is presented and, based on sequence similarity, was found to be the Pele ortholog of H2-M4. Unlike H2-M4, which is a pseudogene, PeleM4 appeared to be an intact Mhc class Ib gene. Appropriately spliced PeleM4 mRNA transcripts were detected in the liver, lung, and thymus. Polymorphism of PeleM4 was examined by sequencing exons 2 and 3 of the PeleM4 gene from seven different Pele haplotypes, and six PeleM4 alleles were identified. These results suggest that the existence of some H2 M region class Ib genes predates the divergence of the Peromyscus and Mus genera, which occurred 40-60 million years ago, and provide an example of unique pathways in the evolution of Mhc class Ib genes.

  6. Performant Mutation Identification Using Targeted Next-Generation Sequencing of 14 Thoracic Aortic Aneurysm Genes.

    PubMed

    Proost, Dorien; Vandeweyer, Geert; Meester, Josephina A N; Salemink, Simone; Kempers, Marlies; Ingram, Christie; Peeters, Nils; Saenen, Johan; Vrints, Christiaan; Lacro, Ronald V; Roden, Dan; Wuyts, Wim; Dietz, Harry C; Mortier, Geert; Loeys, Bart L; Van Laer, Lut

    2015-08-01

    At least 14 causative genes have been identified for both syndromic and nonsyndromic forms of thoracic aortic aneurysm/dissection (TAA), an important cause of death in the industrialized world. Molecular confirmation of the diagnosis is increasingly important for gene-tailored patient management but consecutive, conventional molecular TAA gene screening is expensive and labor-intensive. To circumvent these problems, we developed a TAA gene panel for next-generation sequencing of 14 TAA genes. After validation, we applied the assay to 100 Marfan patients. We identified 90 FBN1 mutations, 44 of which were novel. In addition, multiplex ligation-dependent probe amplification identified large deletions in six of the remaining samples, whereas false-negative results were excluded by Sanger sequencing of FBN1, TGFBR1, and TGFBR2 in the last four samples. Subsequently, we screened 55 syndromic and nonsyndromic TAA patients. We identified causal mutations in 15 patients (27%), one in each of the six following genes: ACTA2, COL3A1, TGFBR1, MYLK, SMAD3, SLC2A10 (homozygous), two in NOTCH1, and seven in FBN1. We conclude that our approach for TAA genetic testing overcomes the intrinsic hurdles of consecutive Sanger sequencing of all candidate genes and provides a powerful tool for the elaboration of clinical phenotypes assigned to different genes. PMID:25907466

  7. Transcriptome sequencing of transgenic poplar (Populus × euramericana 'Guariento') expressing multiple resistance genes

    PubMed Central

    2014-01-01

    Background: Transgenic poplar (Populus × euramericana 'Guariento') plants harboring five exogenous, stress-related genes exhibit increased tolerance to multiple stresses including drought, salt, waterlogging, and insect feeding, but the complex mechanisms underlying stress tolerance in these plants have not been elucidated. Here, we analyzed the differences in the transcriptomes of the transgenic poplar line D5-20 and the non-transgenic line D5-0 using high-throughput transcriptome sequencing techniques and elucidated the functions of the differentially expressed genes using various functional annotation methods. Results: We generated 11.80 Gb of sequencing data containing 63,430,901 sequences, with an average length of 200 bp. The processed sequences were mapped to reference genome sequences of Populus trichocarpa. An average of 62.30% and 61.48% sequences could be aligned with the reference genomes for D5-20 and D5-0, respectively. We detected 11,352 (D5-20) and 11,372 expressed genes (D5-0), 7,624 (56.61%; D5-20) and 7,453 (65.54%; D5-0) of which could be functionally annotated. A total of 782 differentially expressed genes in D5-20 were identified compared with D5-0, including 628 up-regulated and 154 down-regulated genes. In addition, 196 genes with putative functions related to stress responses were also annotated. Gene Ontology (GO) analysis revealed that 346 differentially expressed genes are mainly involved in 67 biological functions, such as DNA binding and nucleus. KEGG annotation revealed that 36 genes (21 up-regulated and 15 down-regulated) were enriched in 51 biological pathways, 9 of which are linked to glucose metabolism. KOG functional classification revealed that 475 genes were enriched in 23 types of KOG functions. Conclusion: These results suggest that the transferred exogenous genes altered the expression of stress (biotic and abiotic) response genes, which were distributed in different metabolic pathways and were linked to some extent. Our

  8. Detection of a novel intragenic rearrangement in the creatine transporter gene by next generation sequencing.

    PubMed

    Yu, Hui; van Karnebeek, Clara; Sinclair, Graham; Hill, Alan; Cui, Hong; Zhang, Victor Wei; Wong, Lee-Jun

    2013-12-01

    Deficiency caused by mutations in the creatine transporter gene (SLC6A8/CT1) is an X-linked form of intellectual disability. The presence of highly homologous pseudogenes and high GC content of SLC6A8 genomic sequence complicates the molecular diagnosis of this disorder. To minimize the pseudogene interference, exons 2 to 13 of SLC6A8 were amplified as a single PCR product using gene-specific long-range PCR (LR-PCR) primers. The GC-rich exon 1 and its flanking intronic sequences were amplified separately in a short fragment under GC-rich conditions and a touchdown PCR program. Traditional Sanger sequence analysis of all coding exons of SLC6A8 from a 3-year-old boy with creatine transporter deficiency did not detect deleterious mutations. The long-range PCR product was used as template followed by massively parallel sequencing (MPS) on HiSeq2000. We were able to detect a tandem duplication involving part of exons 11 and 12 in the SLC6A8 gene. The deduced c.1592_1639dup133 mutation was confirmed to be a hemizygous insertion by targeted genomic DNA and cDNA Sanger sequencing. Combination of deep sequencing technology with long-range PCR revealed a novel intragenic duplication in the SLC6A8 gene, providing a definitive molecular diagnosis of creatine transporter deficiency in a male patient.

  9. Phylogenetic analysis of Mexican Babesia bovis isolates using msa and ssrRNA gene sequences.

    PubMed

    Genis, Alma D; Mosqueda, Juan J; Borgonio, Verónica M; Falcón, Alfonso; Alvarez, Antonio; Camacho, Minerva; de Lourdes Muñoz, Maria; Figueroa, Julio V

    2008-12-01

    Variable merozoite surface antigens of Babesia bovis are exposed glycoproteins having a role in erythrocyte invasion. Members of this gene family include msa-1 and msa-2 (msa-2c, msa-2a(1), msa-2a(2), and msa-2b). Small subunit ribosomal (ssr)RNA gene is subject to evolutive pressure and has been used in phylogenetic studies. To determine the phylogenetic relationship among B. bovis Mexican isolates using different genetic markers, PCR amplicons, corresponding to msa-1, msa-2c, msa-2b, and ssrRNA genes, were cloned and plasmids carrying the corresponding inserts were sequenced. Comparative analysis of nucleotide and deduced amino acid sequences revealed distinct degrees of variability and identity among the coding gene sequences obtained from 12 geographically different B. bovis isolates and a reference strain. Overall sequence identities of 47.7%, 72.3%, 87.7%, and 94% were determined for msa-1, msa-2b, msa-2c, and ssrRNA, respectively. A robust phylogenetic tree was obtained with msa-2b sequences. The phylogenetic analysis suggests that Mexican B. bovis isolates group in clades not concordant with the Mexican geography. However, the Mexican isolates group together in an American clade separated from the Australian clade. Sequence heterogeneity in msa-1, msa-2b, and msa-2c coding regions of Mexican B. bovis isolates present in different geographical regions can be a result of either differential evolutive pressure or cattle movement from commercial trade.

  10. The human myelin oligodendrocyte glycoprotein (MOG) gene: Complete nucleotide sequence and structural characterization

    SciTech Connect

    Paule Roth, M.; Malfroy, L.; Offer, C.; Sevin, J.; Enault, G.; Borot, N.; Pontarotti, P.; Coppin, H.

    1995-07-20

    Human myelin oligodendrocyte glycoprotein (MOG), a myelin component of the central nervous system, is a candidate target antigen for autoimmune-mediated demyelination. We have isolated and sequenced part of a cosmid clone that contains the entire human MOG gene. The primary nuclear transcript, extending from the putative start of transcription to the site of poly(A) addition, is 15,561 nucleotides in length. The human MOG gene contains 8 exons, separated by 7 introns; canonical intron/exon boundary sites are observed at each junction. The introns vary in size from 242 to 6484 bp and contain numerous repetitive DNA elements, including 14 Alu sequences within 3 introns. Another Alu element is located in the 3′-untranslated region of the gene. Alu sequences were classified with respect to subfamily assignment. Seven hundred sixty-three nucleotides 5′ of the transcription start and 1214 nucleotides 3′ of the poly(A) addition sites were also sequenced. The 5′-flanking region revealed the presence of several consensus sequences that could be relevant in the transcription of the MOG gene, in particular binding sites in common with other myelin gene promoters. Two polymorphic intragenic dinucleotide (CA)n and tetranucleotide (TAAA)n repeats were identified and may provide genetic marker tools for association and linkage studies. 50 refs., 3 figs., 3 tabs.

  11. Sequence characterization and comparative analysis of the gastrotropin gene in buffalo (Bubalus bubalis).

    PubMed

    Stafuzza, N B; Borges, M M; Amaral-Trusty, M E J

    2014-01-01

    In this study, we compared the complete sequence of the FABP6 gene from an animal representing the Murrah breed of the river buffalo (Bubalus bubalis) with the gene sequence from different mammals. The buffalo FABP6 gene is 6105 bp in length and is organized into four exons (67, 176, 90, and 54 bp), three introns (1167, 1737, and 2649 bp), a 5′UTR (93 bp), and a 3′UTR (72 bp). A total of 22 repetitive elements were identified at the intronic level, and four of these (L1MC, L1M5, MIRb, and Charlie4z) were identified as being exclusive to buffalo. Comparative analysis between the FABP6 gene coding sequence and the amino acid sequence with its homologues from other mammalian species showed a percentage of identity varying from 79 to 98% at the DNA coding level and 70 to 96% at the amino acid level. In addition, the alignment of the gene sequence between the Murrah and the Mediterranean breeds revealed 20 potential single nucleotide polymorphisms, which could be candidates for validation in commercial buffalo populations. PMID:25526214
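
    The segment lengths reported in this record can be cross-checked against the stated gene length with a few lines of arithmetic. The sketch below assumes that the four exon figures exclude the UTRs (the abstract lists the UTRs separately, which suggests this reading but does not state it); the variable and function names are illustrative only.

```python
# Segment lengths in bp, copied from the abstract above.
EXONS = [67, 176, 90, 54]
INTRONS = [1167, 1737, 2649]
UTR_5_PRIME = 93
UTR_3_PRIME = 72
REPORTED_GENE_LENGTH = 6105


def total_gene_length(exons, introns, utr5, utr3):
    """Sum all segments, assuming the exon figures exclude the UTRs."""
    return sum(exons) + sum(introns) + utr5 + utr3


if __name__ == "__main__":
    total = total_gene_length(EXONS, INTRONS, UTR_5_PRIME, UTR_3_PRIME)
    print("sum of segments:", total)
    print("matches reported length:", total == REPORTED_GENE_LENGTH)
    # Under the same assumption the exons sum to a whole number of codons.
    print("exonic bp:", sum(EXONS), "codons:", sum(EXONS) // 3)
```

    Under that assumption the segments add up exactly to the reported 6105 bp, and the exonic total is divisible by three.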

  12. Detection of viral sequences by internally calibrated gene amplification.

    PubMed

    Cheung, R K; Hui, M F; Dosch, H M; Ewart, T E

    1993-05-01

    Inherent pitfalls of the polymerase chain reaction (PCR) can become serious difficulties when transferring research applications to high-volume routine procedures such as biofermentation process control and clinical diagnostics. Difficulties include 1) the danger of accidental sample contamination with positive control templates; 2) variable amplification due to positional effects in thermocycler blocks and unequal primer efficiency for sense/anti-sense strands; and 3) the need for reliable controls, which provide confidence for reporting negative reactions. Using the PCR detection system for Epstein-Barr virus as a model, we have developed a quick process to generate mutant internal co-amplification templates. These can be used for titration of amplification sensitivity. More importantly, single tube co-amplification without titrations allows determination of the minimum sensitivity achieved in each individual reaction; critical information when reporting negative diagnostic results. Mutant and native fragments are easy to distinguish by size, and sample cross contamination can be readily identified. The system should be easily adaptable to gene amplification procedures, which aim to routinely detect the presence of a given gene fragment in a controlled fashion.

  13. Identification and nucleotide sequence of the thymidine kinase gene of Shope fibroma virus

    SciTech Connect

    Upton, C.; McFadden, G.

    1986-12-01

    The thymidine kinase (TK) gene of Shope fibroma virus (SFV), a tumorigenic leporipoxvirus, was localized within the viral genome with degenerate oligonucleotide probes. These probes were constructed to two regions of high sequence conservation between the vaccinia virus TK gene and those of several known eucaryotic cellular TK genes, including human, mouse, hamster, and chicken TK genes. The oligonucleotide probes initially localized the SFV TK gene 50 kilobases (kb) from the right terminus of the 160-kb SFV genome within the 9.5-kb BamHI-HindIII fragment E. Fine-mapping analysis indicated that the TK gene was within a 1.2-kb AvaI-HaeIII fragment, and DNA sequencing of this region revealed an open reading frame capable of encoding a polypeptide of 187 amino acids possessing considerable homology to the TK genes of the vaccinia, variola, and monkeypox orthopoxviruses and also to a variety of cellular TK genes. Homology matrix analysis and homology scores suggest that the SFV TK gene has diverged significantly from its counterpart members in the orthopoxvirus genus. Nevertheless, the presence of conserved upstream open reading frames on the 5' side of all of the poxvirus TK genes indicates a similarity of functional organization between the orthopoxviruses and leporipoxviruses. These data suggest a common ancestral origin for at least some of the unique internal regions of the leporipoxviruses and orthopoxviruses as exemplified by SFV and vaccinia virus, respectively.

  14. Interference in transcription of overexpressed genes by promoter-proximal downstream sequences

    PubMed Central

    Turchinovich, A.; Surowy, H. M.; Tonevitsky, A. G.; Burwinkel, B.

    2016-01-01

    Despite a high sequence homology among the four human RNAi-effector Argonaute proteins and their coding sequences, the efficiency of ectopic overexpression of AGO3 and AGO4 coding sequences in human cells is greatly reduced as compared to AGO1 and AGO2. While investigating this phenomenon, we documented the existence of a previously uncharacterized mechanism of gene expression regulation, which is manifested in greatly varying basal transcription levels from the RNApolII promoters depending on the promoter-proximal downstream sequences. Specifically, we show that the distinct overexpression of Argonaute coding sequences cannot be explained by mRNA degradation in the cytoplasm or nucleus, and is exhibited at the transcriptional level. Furthermore, the first 1000–2000 nt located immediately downstream of the promoter had the most critical influence on ectopic gene overexpression. The transcription-inhibiting effect associated with those downstream sequences subsided with increasing distance to the promoter and positively correlated with promoter strength. We hypothesize that the same mechanism, which we named promoter proximal inhibition (PPI), could generally contribute to basal transcription levels of genes, and could be mainly responsible for the essence of difficult-to-express recombinant proteins. Finally, our data reveal that expression of recombinant proteins in human cells can be greatly enhanced by using more permissive promoter-adjacent downstream sequences. PMID:27485701

  15. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  16. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards--the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  18. Identifying significant associations of orthologous simple sequence repeats with gene ontologies.

    PubMed

    Chen, Chien-Ming; Pai, Tun-Wen; Chuang, Chia-Sheng; Huang, Jhen-Li; Tzou, Wen-Shyong; Hu, Chin-Hua

    2014-01-01

    Simple Sequence Repeats (SSRs), also known as microsatellites, regulate gene functions. SSR mutations in a disease gene may cause various genetic disorders. To identify putative functional SSRs, a web-based system, Gene Ontology SSR Hierarchy (GOSH), was developed to facilitate discovery of significant associations between SSRs and Gene Ontology (GO) terms. Using the GO hierarchy term structure, GOSH assists users with selecting functional or biological gene subsets. Significant SSR patterns are retrieved and identified via comprehensive overrepresentation analysis within a target gene subset and by comparing results with orthologous genes. Pattern relationships between different biological subsets or supersets can be observed by using the GO hierarchy structure directly. GOSH also supports GO searching through identified significant SSR patterns and all GO terms possessing such patterns are listed for consultation. GOSH is the first comprehensive and efficient online mining tool for discovering significant orthologous SSR patterns in GO terms and is available at http://gosh.cs.ntou.edu.tw/.

  19. Versatile Cosmid Vectors for the Isolation, Expression, and Rescue of Gene Sequences: Studies with the Human α-globin Gene Cluster

    NASA Astrophysics Data System (ADS)

    Lau, Yun-Fai; Kan, Yuet Wai

    1983-09-01

    We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers (SV2-gpt, SV2-DHFR, and SV2-neo) in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α-globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α-globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α-globin genes were more actively expressed than the embryonic ζ-globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.

  20. Complexity of genetic sequences modified by horizontal gene transfer and degraded-DNA uptake

    NASA Astrophysics Data System (ADS)

    Tremberger, George; Dehipawala, S.; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    Horizontal gene transfer has been a major vehicle for efficient transfer of genetic materials among living species and could be one of the sources for noncoding DNA incorporation into a genome. Our previous study of lncRNA sequence complexity in terms of fractal dimension and information entropy shows a tight regulation among the studied genes in numerous diseases. The role of sequence complexity in horizontally transferred genes was investigated with Mealybug in symbiotic relation with a 139K genome microbe and Deinococcus radiodurans as examples. The fractal dimension and entropy showed correlation R-sq of 0.82 (N = 6) for the studied Deinococcus radiodurans sequences. For comparison, the Deinococcus radiodurans oxidative stress tolerant catalase and superoxide dismutase genes under extracellular dGMP growth condition showed R-sq ~ 0.42 (N = 6); and the studied arsenate reductase horizontally transferred genes for toxicity survival in several microorganisms showed no correlation. Simulation results showed that R-sq < 0.4 would be improbable at less than one percent chance, suggestive of additional selection pressure when compared to the R-sq ~ 0.29 (N = 21) in the studied transferred genes in Mealybug. The mild correlation of R-sq ~ 0.5 for fractal dimension versus transcription level in the studied Deinococcus radiodurans sequences upon extracellular dGMP growth condition would suggest that lower fractal dimension with less electron density fluctuation favors higher transcription level.
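
    As a rough illustration of the "information entropy" of a sequence mentioned in this record, the sketch below computes the Shannon entropy of single-nucleotide frequencies. This is an assumption about the general idea only: the abstract does not say how entropy (or fractal dimension) was actually computed, and the function name and example strings are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(sequence):
    """Shannon entropy (bits per symbol) of the symbol frequencies in a sequence.

    Assumed definition: H = sum over symbols of p * log2(1 / p),
    where p is the relative frequency of each symbol.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # A uniform mix of four bases reaches the 2-bit maximum;
    # a homopolymer carries no information per symbol.
    for seq in ("ACGT", "AAAA", "AACCGGTTAA"):
        print(seq, shannon_entropy(seq))
```

    Per-sequence values of this kind could then be regressed against another per-sequence measure to obtain an R-sq figure, although the record does not describe the regression procedure used.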

  1. Cloning, sequencing, and mapping of the human chromosome 14 heat shock protein gene (HSPA2)

    SciTech Connect

    Bonnycastle, L.L.C.; Chang-En Yu; Schellenberg, G.D.

    1994-09-01

    A genomic clone for the human heat shock protein (HSP) 70 gene located on chromosome 14 was isolated and sequenced. The gene, designated HSPA2, has a single open reading frame of 1917 bp that encodes a 639-amino acid protein with a predicted molecular weight of 70,030 Da. Analysis of the sequence indicates that HSPA2 is the human homologue of the murine Hsp 70-2 gene with 91.7% identity in the nucleotide coding sequence and 98.2% in the corresponding amino acid sequence. HSPA2 has less amino acid homology to other members of the human HSP70 gene family, 83.3% to the heat-inducible HSP70-1 gene and 86.1% with the human heat shock cognate gene HSC70. HSPA2 is constitutively expressed in most tissues, with very high levels in testis and skeletal muscle. Significant but lower levels are also expressed in ovary, small intestine, colon, brain, placenta, and kidney. A yeast artificial chromosome (YAC) clone containing HSPA2 (YAC741H4) that also contained the polymorphic marker D14S63 was identified. This 670-kb YAC was mapped to 14q24.1 by fluorescence in situ hybridization (FISH). Subsequent two-color FISH and genetic mapping placed HSPA2/D14S63 proximal to the markers D14S57 and D14S77. 50 refs., 3 figs., 1 tab.

  2. A flexible and economical barcoding approach for highly multiplexed amplicon sequencing of diverse target genes.

    PubMed

    Herbold, Craig W; Pelikan, Claus; Kuzyk, Orest; Hausmann, Bela; Angel, Roey; Berry, David; Loy, Alexander

    2015-01-01

    High throughput sequencing of phylogenetic and functional gene amplicons provides tremendous insight into the structure and functional potential of complex microbial communities. Here, we introduce a highly adaptable and economical PCR approach to barcoding and pooling libraries of numerous target genes. In this approach, we replace gene- and sequencing platform-specific fusion primers with general, interchangeable barcoding primers, enabling nearly limitless customized barcode-primer combinations. Compared to barcoding with long fusion primers, our multiple-target gene approach is more economical because it overall requires lower number of primers and is based on short primers with generally lower synthesis and purification costs. To highlight our approach, we pooled over 900 different small-subunit rRNA and functional gene amplicon libraries obtained from various environmental or host-associated microbial community samples into a single, paired-end Illumina MiSeq run. Although the amplicon regions ranged in size from approximately 290 to 720 bp, we found no significant systematic sequencing bias related to amplicon length or gene target. Our results indicate that this flexible multiplexing approach produces large, diverse, and high quality sets of amplicon sequence data for modern studies in microbial ecology. PMID:26236305

  3. Candida famata (Debaryomyces hansenii) DNA sequences containing genes involved in riboflavin synthesis.

    PubMed

    Voronovsky, Andriy Y; Abbas, Charles A; Dmytruk, Kostyantyn V; Ishchuk, Olena P; Kshanovska, Barbara V; Sybirna, Kateryna A; Gaillardin, Claude; Sibirny, Andriy A

    2004-11-01

    Previously cloned Candida famata (Debaryomyces hansenii) strain VKM Y-9 genomic DNA fragments containing genes RIB1 (codes for GTP cyclohydrolase II), RIB2 (encodes specific reductase), RIB5 (codes for dimethylribityllumazine synthase), RIB6 (encodes dihydroxybutanone phosphate synthase) and RIB7 (codes for riboflavin synthase) were sequenced. The derived amino acid sequences of C. famata RIB genes showed extensive homology to the corresponding sequences of riboflavin synthesis enzymes of other yeast species. The highest identity was observed to homologues of D. hansenii CBS767, as C. famata is the anamorph of this hemiascomycetous yeast. The D. hansenii CBS767 RIB3 gene encoding specific deaminase was cloned. This gene successfully complemented riboflavin auxotrophy of the rib3 mutant of flavinogenic yeast, Pichia guilliermondii. Putative iron-responsive elements (potential sites for binding of the transcription factors Fep1p or Aft1p and Aft2p) were found in the upstream regions of some C. famata and D. hansenii RIB genes. The sequences of C. famata RIB genes have been submitted to the EMBL data library under Accession Nos AJ810169-AJ810173. PMID:15543522

  5. Identification of expressed resistance gene analogs from peanut (Arachis hypogaea L.) expressed sequence tags.

    PubMed

    Liu, Zhanji; Feng, Suping; Pandey, Manish K; Chen, Xiaoping; Culbreath, Albert K; Varshney, Rajeev K; Guo, Baozhu

    2013-05-01

    Low genetic diversity makes peanut (Arachis hypogaea L.) very vulnerable to plant pathogens, causing severe yield loss and reduced seed quality. Several hundred partial genomic DNA sequences have been identified as nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance (R) genes, but only a small portion has been found to have expressed transcripts. We aimed to identify resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs) and to develop polymorphic markers. The protein sequences of 54 known R genes were used to identify homologs from peanut ESTs from public databases. A total of 1,053 ESTs corresponding to six different classes of known R genes were recovered and assembled into 156 contigs and 229 singletons as peanut-expressed RGAs. There were 69 that encoded NBS-LRR proteins, 191 that encoded protein kinases, 82 that encoded LRR-PK/transmembrane proteins, 28 that encoded Toxin reductases, 11 that encoded LRR-domain containing proteins and four that encoded TM-domain containing proteins. Twenty-eight simple sequence repeats (SSRs) were identified from 25 peanut expressed RGAs. One SSR polymorphic marker (RGA121) was identified. Two polymerase chain reaction-based markers (Ahsw-1 and Ahsw-2) developed from RGA013 were homologous to the Tomato Spotted Wilt Virus (TSWV) resistance gene. All three markers were mapped on the same linkage group AhIV. These expressed RGAs are the source for RGA-tagged marker development and identification of peanut resistance genes.

  6. SGP-1: prediction and validation of homologous genes based on sequence alignments.

    PubMed

    Wiehe, T; Gebauer-Jung, S; Mitchell-Olds, T; Guigó, R

    2001-09-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

  7. Cloning and nucleotide sequence of the Salmonella typhimurium LT2 metF gene and its homology with the corresponding sequence of Escherichia coli.

    PubMed

    Stauffer, G V; Stauffer, L T

    1988-05-01

    The Salmonella typhimurium LT2 metF gene, encoding 5,10-methylenetetrahydrofolate reductase, has been cloned. Strains with multicopy plasmids carrying the metF gene overproduce the enzyme 44-fold. The nucleotide sequence of the metF gene was determined, and an open reading frame of 888 nucleotides was identified. The polypeptide deduced from the DNA sequence contains 296 amino acids and has a molecular weight of 33,135 daltons. Mung bean nuclease mapping experiments located the transcription start point and possible transcription termination region for the gene. There is a 25 bp nucleotide sequence between the translation termination site and the possible transcription termination region. This region possesses a GC-rich sequence that could form a stable stem and loop structure once transcribed (delta G = -9 kcal/mol), followed by an AT-rich sequence, both of which are characteristic of rho-independent transcription terminators. The nucleotide and deduced amino acid sequences of the S. typhimurium metF gene are compared with the corresponding sequences of the Escherichia coli metF gene. The nucleotide sequences show 85% homology. Most of the nucleotide differences found do not alter the amino acid sequences, which show 95% homology. The results also show that a change has occurred in the metF region of the S. typhimurium chromosome as compared to the E. coli chromosome.
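    The arithmetic linking the reported reading frame to the reported polypeptide can be checked with a minimal sketch. It assumes the 888-nucleotide figure excludes the stop codon, which the abstract does not state explicitly but which is consistent with 296 residues; the function names are illustrative only.

```python
def residues_from_orf(orf_nt: int, includes_stop: bool = False) -> int:
    """Number of amino acids encoded by an open reading frame of orf_nt nucleotides.

    Assumption: the frame is a whole number of codons. Set includes_stop=True
    if the quoted length counts the terminating stop codon.
    """
    if orf_nt <= 0 or orf_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


def mean_residue_mass(protein_da: float, residues: int) -> float:
    """Average mass per residue in daltons, a quick plausibility check."""
    return protein_da / residues


# Figures quoted in the abstract: 888 nt, 296 amino acids, 33,135 daltons.
n_residues = residues_from_orf(888)
print(n_residues)                                  # 296
print(mean_residue_mass(33135, n_residues))        # roughly 112 Da per residue
```

    A mean of roughly 110 to 115 Da per residue is typical for proteins, so the three reported numbers are mutually consistent.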

  8. Whole exome sequencing reveals concomitant mutations of multiple FA genes in individual Fanconi anemia patients

    PubMed Central

    2014-01-01

    Background Fanconi anemia (FA) is a rare inherited genetic syndrome with highly variable clinical manifestations. Fifteen genetic subtypes of FA have been identified. Traditional complementation tests for grouping studies have been used generally in FA patients and in stepwise methods to identify the FA type, which can result in incomplete genetic information from FA patients. Methods We diagnosed five pediatric patients with FA based on clinical manifestations, and we performed exome sequencing of peripheral blood specimens from these patients and their family members. The related sequencing data were then analyzed by bioinformatics, and the FANC gene mutations identified by exome sequencing were confirmed by PCR re-sequencing. Results Homozygous and compound heterozygous mutations of FANC genes were identified in all of the patients. The FA subtypes of the patients included FANCA, FANCM and FANCD2. Interestingly, four FA patients harbored multiple mutations in at least two FA genes, and some of these mutations have not been previously reported. These patients’ clinical manifestations were vastly different from each other, as were their treatment responses to androstanazol and prednisone. This finding suggests that heterozygous mutation(s) in FA genes could also have diverse biological and/or pathophysiological effects on FA patients or FA gene carriers. Interestingly, we were not able to identify de novo mutations in the genes implicated in DNA repair pathways when the sequencing data of patients were compared with those of their parents. Conclusions Our results indicate that Chinese FA patients and carriers might have higher and more complex mutation rates in FANC genes than have been conventionally recognized. Testing of the fifteen FANC genes in FA patients and their family members should be a regular clinical practice to determine the optimal care for the individual patient, to counsel the family and to obtain a better understanding of FA pathophysiology.

  9. Rapid Evolution of the Sequences and Gene Repertoires of Secreted Proteins in Bacteria

    PubMed Central

    Rocha, Eduardo P. C.

    2012-01-01

    Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation. PMID:23189144

  10. Identification of Differential Gene Expression in Brassica rapa Nectaries through Expressed Sequence Tag Analysis

    PubMed Central

    Hampton, Marshall; Xu, Wayne W.; Kram, Brian W.; Chambers, Emily M.; Ehrnriter, Jerad S.; Gralewski, Jonathan H.; Joyal, Teresa; Carter, Clay J.

    2010-01-01

    Background Nectaries are the floral organs responsible for the synthesis and secretion of nectar. Despite their central roles in pollination biology, very little is understood about the molecular mechanisms underlying nectar production. This project was undertaken to identify genes potentially involved in mediating nectary form and function in Brassica rapa. Methodology and Principal Findings Four cDNA libraries were created using RNA isolated from the median and lateral nectaries of B. rapa flowers, with one normalized and one non-normalized library being generated from each tissue. Approximately 3,000 clones from each library were randomly sequenced from the 5′ end to generate a total of 11,101 high quality expressed sequence tags (ESTs). Sequence assembly of all ESTs together allowed the identification of 1,453 contigs and 4,403 singleton sequences, with the Basic Local Alignment Search Tool (BLAST) being used to identify 4,138 presumptive orthologs to Arabidopsis thaliana genes. Several genes differentially expressed between median and lateral nectaries were initially identified based upon the number of BLAST hits represented by independent ESTs, and later confirmed via reverse transcription polymerase chain reaction (RT PCR). RT PCR was also used to verify the expression patterns of eight putative orthologs to known Arabidopsis nectary-enriched genes. Conclusions/Significance This work provided a snapshot of gene expression in actively secreting B. rapa nectaries, and also allowed the identification of differential gene expression between median and lateral nectaries. Moreover, 207 orthologs to known nectary-enriched genes from Arabidopsis were identified through this analysis. The results suggest that genes involved in nectar production are conserved amongst the Brassicaceae, and also supply clones and sequence information that can be used to probe nectary function in B. rapa. PMID:20098697

  11. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  12. Alu sequence involvement in transcriptional insulation of the keratin 18 gene in transgenic mice.

    PubMed Central

    Thorey, I S; Ceceña, G; Reynolds, W; Oshima, R G

    1993-01-01

    The human keratin 18 (K18) gene is expressed in a variety of adult simple epithelial tissues, including liver, intestine, lung, and kidney, but is not normally found in skin, muscle, heart, spleen, or most of the brain. Transgenic animals derived from the cloned K18 gene express the transgene in appropriate tissues at levels directly proportional to the copy number and independently of the sites of integration. We have investigated in transgenic mice the dependence of K18 gene expression on the distal 5' and 3' flanking sequences and upon the RNA polymerase III promoter of an Alu repetitive DNA transcription unit immediately upstream of the K18 promoter. Integration site-independent expression of tandemly duplicated K18 transgenes requires the presence of either an 825-bp fragment of the 5' flanking sequence or the 3.5-kb 3' flanking sequence. Mutation of the RNA polymerase III promoter of the Alu element within the 825-bp fragment abolishes copy number-dependent expression in kidney but does not abolish integration site-independent expression when assayed in the absence of the 3' flanking sequence of the K18 gene. The characteristics of integration site-independent expression and copy number-dependent expression are separable. In addition, the formation of the chromatin state of the K18 gene, which likely restricts the tissue-specific expression of this gene, is not dependent upon the distal flanking sequences of the 10-kb K18 gene but rather may depend on internal regulatory regions of the gene. PMID:7692231

  13. Sequence and regulation of a gene encoding a human 89-kilodalton heat shock protein.

    PubMed Central

    Hickey, E; Brandon, S E; Smale, G; Lloyd, D; Weber, L A

    1989-01-01

    Vertebrate cells synthesize two forms of the 82- to 90-kilodalton heat shock protein that are encoded by distinct gene families. In HeLa cells, both proteins (hsp89 alpha and hsp89 beta) are abundant under normal growth conditions and are synthesized at increased rates in response to heat stress. Only the larger form, hsp89 alpha, is induced by the adenovirus E1A gene product (M. C. Simon, K. Kitchener, H. T. Kao, E. Hickey, L. Weber, R. Voellmy, N. Heintz, and J. R. Nevins, Mol. Cell. Biol. 7:2884-2890, 1987). We have isolated a human hsp89 alpha gene that shows complete sequence identity with heat- and E1A-inducible cDNA used as a hybridization probe. The 5'-flanking region contained overlapping and inverted consensus heat shock control elements that can confer heat-inducible expression on a beta-globin reporter gene. The gene contained 10 intervening sequences. The first intron was located adjacent to the translation start codon, an arrangement also found in the Drosophila hsp82 gene. The spliced mRNA sequence contained a single open reading frame encoding an 84,564-dalton polypeptide showing high homology with the hsp82 to hsp90 proteins of other organisms. The deduced hsp89 alpha protein sequence differed from the human hsp89 beta sequence reported elsewhere (N. F. Rebbe, J. Ware, R. M. Bertina, P. Modrich, and D. W. Stafford, Gene 53:235-245, 1987) in at least 99 out of the 732 amino acids. Transcription of the hsp89 alpha gene was induced by serum during normal cell growth, but expression did not appear to be restricted to a particular stage of the cell cycle. hsp89 alpha mRNA was considerably more stable than the mRNA encoding hsp70, which can account for the higher constitutive rate of hsp89 synthesis in unstressed cells. PMID:2527334

  14. Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

    PubMed Central

    2011-01-01

    Background Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results A total of 386 isolates, collected over 2008 - June 2011 from 378 animals (amphibians, reptiles, birds, and mammals) underwent PCR and sequencing of a ~ 711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
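    The ≥ 98% similarity rule described above can be illustrated with a minimal sketch. The abstract does not say how similarity was computed (alignment method, gap handling), so the ungapped position-by-position comparison, the function names, and the toy sequences below are all assumptions made for illustration.

```python
def percent_identity(query: str, reference: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption: sequences are already aligned and contain no gaps.
    """
    if not query or len(query) != len(reference):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == r for q, r in zip(query.upper(), reference.upper()))
    return 100.0 * matches / len(query)


def assignment_level(identity_pct: float, threshold: float = 98.0) -> str:
    """'species' at or above the threshold, otherwise fall back to 'clade'."""
    return "species" if identity_pct >= threshold else "clade"


def with_mismatches(seq: str, n: int) -> str:
    """Toy helper: alter the first n bases so each differs from the original."""
    changed = ["T" if base != "T" else "A" for base in seq[:n]]
    return "".join(changed) + seq[n:]


# Hypothetical 711-base reference, matching the amplicon length in the abstract.
reference = "ACGT" * 177 + "ACG"
for n in (14, 15):
    pct = percent_identity(with_mismatches(reference, n), reference)
    print(n, assignment_level(pct))
```

    Under these assumptions, a 711-base read tolerates at most 14 mismatches (697/711) before falling below 98% and being placed by tree position instead.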

  15. Complete nucleotide sequences of two adjacent early vaccinia virus genes located within the inverted terminal repetition.

    PubMed

    Venkatesan, S; Gershowitz, A; Moss, B

    1982-11-01

    The proximal part of the 10,000-base pair (bp) inverted terminal repetition of vaccinia virus DNA encodes at least three early mRNAs. A 2,236-bp segment of the repetition was sequenced to characterize two of the genes. This task was facilitated by constructing a series of recombinants containing overlapping deletions; oligonucleotide linkers with synthetic restriction sites provided points for radioactive labeling before sequencing by the chemical degradation method of Maxam and Gilbert (Methods Enzymol. 65:499-560, 1980). The ends of the transcripts were mapped by hybridizing labeled DNA fragments to early viral RNA and resolving nuclease S1-protected fragments in sequencing gels, by sequencing cDNA clones, and from the lengths of the RNAs. The nucleotide sequences for at least 60 bp upstream of both transcriptional initiation sites are more than 80% adenine-thymine rich and contain long runs of adenines and thymines with some homology to procaryotic and eucaryotic consensus sequences. The gene transcribed in the rightward direction encodes an RNA of approximately 530 nucleotides with a single open reading frame of 420 nucleotides. Preceding the first AUG, there is a heptanucleotide that can hybridize to the 3' end of 18S rRNA with only one mismatch. The derived amino acid sequence of the protein indicated a molecular weight of 15,500. The gene transcribed in the leftward direction encodes an RNA 1,000 to 1,100 nucleotides long with an open reading frame of 996 nucleotides and a leader sequence of only 5 to 6 nucleotides. The derived amino acid sequence of this protein indicated a molecular weight of 38,500. The 3' ends of the two transcripts were located within 100 bp of each other. Although there are adenine-thymine-rich clusters near the putative transcriptional termination sites, specific AATAAA polyadenylic acid signal sequences are absent.

  16. Molecular cloning, sequence characterization, and gene expression profiling of a novel water buffalo (Bubalus bubalis) gene, AGPAT6.

    PubMed

    Song, S; Huo, J L; Li, D L; Yuan, Y Y; Yuan, F; Miao, Y W

    2013-01-01

    Several 1-acylglycerol-3-phosphate-O-acyltransferases (AGPATs) can acylate lysophosphatidic acid to produce phosphatidic acid. Of the eight AGPAT isoforms, AGPAT6 is a crucial enzyme for glycerolipids and triacylglycerol biosynthesis in some mammalian tissues. We amplified and identified the complete coding sequence (CDS) of the water buffalo AGPAT6 gene by using the reverse transcription-polymerase chain reaction, based on the conserved sequence information of the cattle or expressed sequence tags of other Bovidae species. This novel gene was deposited in the NCBI database (accession No. JX518941). Sequence analysis revealed that the CDS of this AGPAT6 encodes a 456-amino acid enzyme (molecular mass = 52 kDa; pI = 9.34). Water buffalo AGPAT6 contains three hydrophobic transmembrane regions and a 37-amino acid signal peptide, localized in the cytoplasm. The deduced amino acid sequences share 99, 98, 98, 97, 98, 98, 97 and 95% identity with their homologous sequences from cattle, horse, human, mouse, orangutan, pig, rat, and chicken, respectively. The phylogenetic tree analysis based on the AGPAT6 CDS showed that water buffalo has a closer genetic relationship with cattle than with other species. Tissue expression profile analysis shows that this gene is highly expressed in the mammary gland, moderately expressed in the heart, muscle, liver, and brain; weakly expressed in the pituitary gland, spleen, and lung; and almost silently expressed in the small intestine, skin, kidney, and adipose tissues. Four predicted microRNA target sites are found in the water buffalo AGPAT6 CDS. These results will establish a foundation for further insights into this novel water buffalo gene. PMID:24114207

  17. Molecular cloning, sequence characterization, and gene expression profiling of a novel water buffalo (Bubalus bubalis) gene, AGPAT6.

    PubMed

    Song, S; Huo, J L; Li, D L; Yuan, Y Y; Yuan, F; Miao, Y W

    2013-10-01

    Several 1-acylglycerol-3-phosphate-O-acyltransferases (AGPATs) can acylate lysophosphatidic acid to produce phosphatidic acid. Of the eight AGPAT isoforms, AGPAT6 is a crucial enzyme for glycerolipids and triacylglycerol biosynthesis in some mammalian tissues. We amplified and identified the complete coding sequence (CDS) of the water buffalo AGPAT6 gene by using the reverse transcription-polymerase chain reaction, based on the conserved sequence information of the cattle or expressed sequence tags of other Bovidae species. This novel gene was deposited in the NCBI database (accession No. JX518941). Sequence analysis revealed that the CDS of this AGPAT6 encodes a 456-amino acid enzyme (molecular mass = 52 kDa; pI = 9.34). Water buffalo AGPAT6 contains three hydrophobic transmembrane regions and a 37-amino acid signal peptide, localized in the cytoplasm. The deduced amino acid sequences share 99, 98, 98, 97, 98, 98, 97 and 95% identity with their homologous sequences from cattle, horse, human, mouse, orangutan, pig, rat, and chicken, respectively. The phylogenetic tree analysis based on the AGPAT6 CDS showed that water buffalo has a closer genetic relationship with cattle than with other species. Tissue expression profile analysis shows that this gene is highly expressed in the mammary gland, moderately expressed in the heart, muscle, liver, and brain; weakly expressed in the pituitary gland, spleen, and lung; and almost silently expressed in the small intestine, skin, kidney, and adipose tissues. Four predicted microRNA target sites are found in the water buffalo AGPAT6 CDS. These results will establish a foundation for further insights into this novel water buffalo gene.

  18. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
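    The idea of bundling germline V genes that cannot be told apart at a given read length can be sketched in a few lines. This is not the authors' tool: it assumes reads cover only the last read_length bases of each V gene, ignores sequencing errors and somatic hypermutation, and uses made-up gene names and toy sequences.

```python
from collections import defaultdict


def indistinguishable_groups(germlines: dict[str, str], read_length: int) -> list[list[str]]:
    """Group germline genes whose last read_length bases are identical.

    Assumption: a partial read covers only the 3' end of the V gene, so genes
    sharing that suffix cannot be uniquely assigned at this read length.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    by_suffix = defaultdict(list)
    for name, sequence in germlines.items():
        by_suffix[sequence[-read_length:]].append(name)
    return sorted(sorted(names) for names in by_suffix.values())


# Hypothetical toy germline set (names and sequences are illustrative only).
toy_germlines = {
    "V1": "AAACCCGGGTTT",
    "V2": "AATCCCGGGTTT",  # differs from V1 only near the 5' end
    "V3": "AAACCCGGGTTA",  # differs from V1 only at the last base
}

print(indistinguishable_groups(toy_germlines, 6))   # [['V1', 'V2'], ['V3']]
print(indistinguishable_groups(toy_germlines, 12))  # [['V1'], ['V2'], ['V3']]
```

    In this toy case, short reads bundle V1 with V2, while full-length reads separate all three, which mirrors the dependence on sequencing length described in the abstract.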

  19. Putative and unique gene sequence utilization for the design of species specific probes as modeled by Lactobacillus plantarum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...

  20. Purification of the gam gene-product of bacteriophage Mu and determination of the nucleotide sequence of the gam gene.

    PubMed Central

    Akroyd, J E; Clayson, E; Higgins, N P

    1986-01-01

    The gam gene of bacteriophage Mu encodes a protein which protects linear double stranded DNA from exonuclease degradation in vitro and in vivo. We purified the Mu gam gene product to apparent homogeneity from cells in which it is over-produced from a plasmid clone. The purified protein is a dimer of identical subunits of 18.9 kd. It can aggregate DNA into large, rapidly sedimenting complexes and is a potent exonuclease inhibitor when bound to DNA. The N-terminal amino acid sequence of the purified protein was determined by automated degradation and the nucleotide sequence of the Mu gam gene is presented to accurately map its position in the Mu genome. PMID:2945162

  1. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
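
    Since the abstract uses N50 as its assembly-quality measure, a minimal sketch of how it is usually computed may help. The function name and the toy contig lengths are hypothetical; the coverage line only reuses the two figures quoted above (181 Gbp of bases at roughly 60× coverage).

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the length L of the contig at which the running total,
    taken from the longest contig downwards, first reaches half of
    the total assembly length."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths, invented for illustration only.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])

# Genome size implied by the figures in the abstract:
# bases sequenced / coverage, in Gbp.
implied_genome_gbp = 181 / 60
```

    A larger N50 means that half of the assembled bases sit in longer contigs, which is why a doubling is reported as an improvement.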

  2. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  3. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    SciTech Connect

    D'Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.
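
    To make the notion of a degenerate primer concrete, the sketch below expands IUPAC ambiguity codes into a regular expression and counts how many distinct oligonucleotides one primer represents. The primer and template strings are hypothetical placeholders, not the primers used in the study.

```python
import re
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def primer_regex(primer: str) -> str:
    """Translate a degenerate primer into a regular expression."""
    return "".join(
        IUPAC[base] if len(IUPAC[base]) == 1 else f"[{IUPAC[base]}]"
        for base in primer.upper()
    )


def find_sites(primer: str, template: str) -> list[int]:
    """Start positions (0-based) where the primer matches the template;
    a lookahead is used so overlapping sites are also reported."""
    pattern = re.compile(f"(?=({primer_regex(primer)}))")
    return [m.start() for m in pattern.finditer(template.upper())]


# Hypothetical primer and template, for illustration only.
example_primer = "CAYTGGCAYGG"
example_sites = find_sites(example_primer, "TTCACTGGCATGGAA")
```

    Here each Y stands for C or T, so the example primer is a mixture of four oligonucleotides; exact matching like this ignores the mismatch tolerance of real PCR.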

  4. Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi

    PubMed Central

    Cacho, Ralph A.; Tang, Yi; Chooi, Yit-Heng

    2015-01-01

    Genomics has revolutionized the research on fungal secondary metabolite (SM) biosynthesis. To elucidate the molecular and enzymatic mechanisms underlying the biosynthesis of a specific SM compound, the important first step is often to find the genes that are responsible for its synthesis. The accessibility of fungal genome sequences allows the bypass of the cumbersome traditional library construction and screening approach. Advances in next-generation sequencing (NGS) technologies have further improved the speed and reduced the cost of microbial genome sequencing in the past few years, which has accelerated the research in this field. Here, we will present an example workflow for identifying the gene cluster encoding the biosynthesis of SMs of interest using an NGS approach. We will also review the different strategies that can be employed to pinpoint the targeted gene clusters rapidly by giving several examples stemming from our work. PMID:25642215

  5. Comparison of the aflR gene sequences of strains in Aspergillus section Flavi.

    PubMed

    Lee, Chao-Zong; Liou, Guey-Yuh; Yuan, Gwo-Fang

    2006-01-01

    Aflatoxins are polyketide-derived secondary metabolites produced by Aspergillus parasiticus, Aspergillus flavus, Aspergillus nomius and a few other species. The toxic effects of aflatoxins have adverse consequences for human health and agricultural economics. The aflR gene, a regulatory gene for aflatoxin biosynthesis, encodes a protein containing a zinc-finger DNA-binding motif. Although Aspergillus oryzae and Aspergillus sojae, which are used in fermented foods and in ingredient manufacture, have no record of producing aflatoxin, they have been shown to possess an aflR gene. This study examined 34 strains of Aspergillus section Flavi. The aflR gene of 23 of these strains was successfully amplified and sequenced. No aflR PCR products were found in five A. sojae strains or six strains of A. oryzae. These PCR results suggested that the aflR gene is absent or significantly different in some A. sojae and A. oryzae strains. The sequenced aflR genes from the 23 positive strains had greater than 96.6% similarity, which was particularly conserved in the zinc-finger DNA-binding domain. The aflR gene of A. sojae has two obvious characteristics: an extra CTCATG sequence fragment and a C to T transition that causes premature termination of AFLR protein synthesis. Differences between A. parasiticus/A. sojae and A. flavus/A. oryzae aflR genes were also identified. Some strains of A. flavus as well as A. flavus var. viridis, A. oryzae var. viridis and A. oryzae var. effusus have an A. oryzae-type aflR gene. For all strains with the A. oryzae-type aflR gene, there was no evidence of aflatoxin production. It is suggested that for safety reasons, the aflR gene could be examined to assess possible aflatoxin production by Aspergillus section Flavi strains.

  6. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  7. Whole Exome Sequencing in Females with Autism Implicates Novel and Candidate Genes

    PubMed Central

    Butler, Merlin G.; Rafi, Syed K.; Hossain, Waheeda; Stephan, Dietrich A.; Manzardo, Ann M.

    2015-01-01

    Classical autism or autistic disorder belongs to a group of genetically heterogeneous conditions known as Autism Spectrum Disorders (ASD). Heritability is estimated as high as 90% for ASD with a recently reported compilation of 629 clinically relevant candidate and known genes. We chose to undertake a descriptive next generation whole exome sequencing case study of 30 well-characterized Caucasian females with autism (average age, 7.7 ± 2.6 years; age range, 5 to 16 years) from multiplex families. Genomic DNA was used for whole exome sequencing via paired-end next generation sequencing approach and X chromosome inactivation status. The list of putative disease causing genes was developed from primary selection criteria using machine learning-derived classification score and other predictive parameters (GERP2, PolyPhen2, and SIFT). We narrowed the variant list to 10 to 20 genes and screened for biological significance including neural development, function and known neurological disorders. Seventy-eight genes identified met selection criteria ranging from 1 to 9 filtered variants per female. Five females presented with functional variants of X-linked genes (IL1RAPL1, PIR, GABRQ, GPRASP2, SYTL4) with cadherin, protocadherin and ankyrin repeat gene families most commonly altered (e.g., CDH6, FAT2, PCDH8, CTNNA3, ANKRD11). Other genes related to neurogenesis and neuronal migration (e.g., SEMA3F, MIDN), were also identified. PMID:25574603

  8. Candidate Resistant Genes of Sand Pear (Pyrus pyrifolia Nakai) to Alternaria alternata Revealed by Transcriptome Sequencing.

    PubMed

    Yang, Xiaoping; Hu, Hongju; Yu, Dazhao; Sun, Zhonghai; He, Xiujuan; Zhang, Jingguo; Chen, Qiliang; Tian, Rui; Fan, Jing

    2015-01-01

    Pear black spot (PBS) disease, which is caused by Alternaria alternata (Aa), is one of the most serious diseases affecting sand pear (Pyrus pyrifolia Nakai) cultivation worldwide. To investigate the defense mechanisms of sand pear in response to Aa, the transcriptome of a sand pear germplasm with differential resistance to Aa was analyzed using Illumina paired-end sequencing. Four libraries derived from PBS-resistant and PBS-susceptible sand pear leaves were characterized through inoculation or mock-inoculation. In total, 20.5 Gbp of sequence data and 101,632,565 reads were generated, representing 44717 genes. Approximately 66% of the genes or sequenced reads could be aligned to the pear reference genome. A large number (5213) of differentially expressed genes related to PBS resistance were obtained; 34 microsatellites were detected in these genes, and 28 genes were found to be closely related to PBS resistance. Using a transcriptome analysis in response to PBS inoculation and comparison analysis to the PHI database, 4 genes (Pbr039001, Pbr001627, Pbr025080 and Pbr023112) were considered to be promising candidates for sand pear resistance to PBS. This study provides insight into changes in the transcriptome of sand pear in response to PBS infection, and the findings have improved our understanding of the resistance mechanism of sand pear to PBS and will facilitate future gene discovery and functional genome studies of sand pear.

  9. A Synthesis Method of Gene Networks Having Cyclic Expression Pattern Sequences by Network Learning

    NASA Astrophysics Data System (ADS)

    Mori, Yoshihiro; Kuroe, Yasuaki

    Recently, synthesis of gene networks having desired functions has become of interest to many researchers because it is a complementary approach to understanding gene networks, and it could be the first step in controlling living cells. There exist several periodic phenomena in cells, e.g. circadian rhythm. These phenomena are considered to be generated by gene networks. We have already proposed a synthesis method of gene networks based on gene expression. The method is applicable to synthesizing gene networks possessing the desired cyclic expression pattern sequences. It ensures that the realized expression pattern sequences are periodic; however, it does not ensure that their corresponding solution trajectories are periodic, so their oscillations may not be persistent. In this paper, in order to resolve this problem, we propose a synthesis method of gene networks possessing the desired cyclic expression pattern sequences together with periodic corresponding solution trajectories. In the proposed method, the persistent oscillations of the solution trajectories are realized by specifying points through which the trajectories pass.

  10. Candidate Resistant Genes of Sand Pear (Pyrus pyrifolia Nakai) to Alternaria alternata Revealed by Transcriptome Sequencing

    PubMed Central

    Yang, Xiaoping; Hu, Hongju; Yu, Dazhao; Sun, Zhonghai; He, Xiujuan; Zhang, Jingguo; Chen, Qiliang; Tian, Rui; Fan, Jing

    2015-01-01

    Pear black spot (PBS) disease, which is caused by Alternaria alternata (Aa), is one of the most serious diseases affecting sand pear (Pyrus pyrifolia Nakai) cultivation worldwide. To investigate the defense mechanisms of sand pear in response to Aa, the transcriptome of a sand pear germplasm with differential resistance to Aa was analyzed using Illumina paired-end sequencing. Four libraries derived from PBS-resistant and PBS-susceptible sand pear leaves were characterized through inoculation or mock-inoculation. In total, 20.5 Gbp of sequence data and 101,632,565 reads were generated, representing 44717 genes. Approximately 66% of the genes or sequenced reads could be aligned to the pear reference genome. A large number (5213) of differentially expressed genes related to PBS resistance were obtained; 34 microsatellites were detected in these genes, and 28 genes were found to be closely related to PBS resistance. Using a transcriptome analysis in response to PBS inoculation and comparison analysis to the PHI database, 4 genes (Pbr039001, Pbr001627, Pbr025080 and Pbr023112) were considered to be promising candidates for sand pear resistance to PBS. This study provides insight into changes in the transcriptome of sand pear in response to PBS infection, and the findings have improved our understanding of the resistance mechanism of sand pear to PBS and will facilitate future gene discovery and functional genome studies of sand pear. PMID:26292286

  11. Transcriptome sequencing uncovers the Avr5 avirulence gene of the tomato leaf mold pathogen Cladosporium fulvum.

    PubMed

    Mesarich, Carl H; Griffiths, Scott A; van der Burgt, Ate; Okmen, Bilal; Beenen, Henriek G; Etalo, Desalegn W; Joosten, Matthieu H A J; de Wit, Pierre J G M

    2014-08-01

    The Cf-5 gene of tomato confers resistance to strains of the fungal pathogen Cladosporium fulvum carrying the avirulence gene Avr5. Although Cf-5 has been cloned, Avr5 has remained elusive. We report the cloning of Avr5 using a combined bioinformatic and transcriptome sequencing approach. RNA-Seq was performed on the sequenced race 0 strain (0WU; carrying Avr5), as well as a race 5 strain (IPO 1979; lacking a functional Avr5 gene) during infection of susceptible tomato. Forty-four in planta-induced C. fulvum candidate effector (CfCE) genes of 0WU were identified that putatively encode a secreted, small cysteine-rich protein. An expressed transcript sequence comparison between strains revealed two polymorphic CfCE genes in IPO 1979. One of these conferred avirulence to IPO 1979 on Cf-5 tomato following complementation with the corresponding 0WU allele, confirming identification of Avr5. Complementation also led to increased fungal biomass during infection of susceptible tomato, signifying a role for Avr5 in virulence. Seven of eight race 5 strains investigated escape Cf-5-mediated resistance through deletion of the Avr5 gene. Avr5 is heavily flanked by repetitive elements, suggesting that repeat instability, in combination with Cf-5-mediated selection pressure, has led to the emergence of race 5 strains deleted for the Avr5 gene.

  12. Extraordinary sequence divergence at Tsga8, an X-linked gene involved in mouse spermiogenesis.

    PubMed

    Good, Jeffrey M; Vanderpool, Dan; Smith, Kimberly L; Nachman, Michael W

    2011-05-01

    The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion-deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5' and 3' ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189

  13. Gene Sequence Variability of the Three Surface Proteins of Human Respiratory Syncytial Virus (HRSV) in Texas

    PubMed Central

    Tapia, Lorena I.; Shaw, Chad A.; Aideyan, Letisha O.; Jewell, Alan M.; Dawson, Brian C.; Haq, Taha R.; Piedra, Pedro A.

    2014-01-01

    Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion (F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of HRSV strains collected from 1987 to 2005 from one center. Sixty original clinical isolates and 5 prototype strains were analyzed. Sequences containing SH, F and G genes were generated, and multiple alignments and phylogenetic trees were analyzed. Genetic variability by protein domains comparing virus genotypes was assessed. Complete sequences of the SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. In group A strains, genotypes GA5 and GA2 were predominant. For HRSV-B strains, the genotype GB4 was predominant from 1992 to 1994 and only genotype BA viruses were detected in 2004–2005. Different genetic variability at nucleotide level was detected between the genes, with G gene being the most variable and the highest variability detected in the 270-nt G fragment that is frequently used to genotype the virus. High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of HRSV A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in hypervariable domains of G protein, the signal peptide of the F protein, a not previously defined domain in the F protein, and the antigenic site Ø in the pre-fusion F. Divergent trends were observed between HRSV -A and -B groups for some functional domains. A diverse population of HRSV -A and -B genotypes circulated in Houston during an 18 year period. We hypothesize that diverse sequence variation of the surface protein genes provide HRSV strains a survival advantage in a partially immune-protected community. PMID:24625544

  14. Complete sequence of human vinculin and assignment of the gene to chromosome 10

    SciTech Connect

    Weller, P.A.; Corben, E.B.; Patel, B.; Price, G.J.; Critchley, D.R.; Ogryzko, E.P.; Zhidkova, N.I.; Koteliansky, V.E.; Spurr, N.K.

    1990-08-01

    The authors have determined the complete sequence of human vinculin, a cytoskeletal protein associated with cell-cell and cell-matrix junctions. Comparison of human and chicken embryo vinculin sequences shows that both proteins contain 1,066 amino acids and exhibit a high level of sequence identity (>95%). The region of greatest divergence falls within three 112-amino acid repeats spanning residues 259-589. Interestingly, nematode vinculin lacks one of these central repeats. The regions of human vinculin that are N- and C-terminal to the repeats show 54% and 61% sequence identity, respectively, to nematode vinculin. Southern blots of human genomic DNA hybridized with short vinculin cDNA fragments indicate that there is a single vinculin gene. By using a panel of human-rodent somatic cell hybrids, the human vinculin gene was mapped to chromosome 10q11.2-qter.

  15. Operator Sequence Alters Gene Expression Independently of Transcription Factor Occupancy in Bacteria

    PubMed Central

    Garcia, Hernan G.; Sanchez, Alvaro; Boedicker, James Q.; Osborne, Melisa; Gelles, Jeff; Kondev, Jane; Phillips, Rob

    2012-01-01

    SUMMARY A canonical quantitative view of transcriptional regulation holds that the only role of operator sequence is to set the probability of transcription factor binding, with operator occupancy determining the level of gene expression. In this work, we test this idea by characterizing repression in vivo and the binding of RNA polymerase in vitro in experiments where operators of various sequences were placed either upstream or downstream from the promoter in Escherichia coli. Surprisingly, we find that operators with a weaker binding affinity can yield higher repression levels than stronger operators. Repressor bound to upstream operators modulates promoter escape, and the magnitude of this modulation is not correlated with the repressor-operator binding affinity. This suggests that operator sequences may modulate transcription by altering the nature of the interaction of the bound transcription factor with the transcriptional machinery, implying a new layer of sequence dependence that must be confronted in the quantitative understanding of gene expression. PMID:22840405
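
    The canonical view that the abstract sets out to test can be written as a very small occupancy model. The sketch below is a simplified assumption, not the authors' model: binding is treated as a single-site equilibrium with a dissociation constant, expression is taken to be proportional to the fraction of time the operator is free, and all names and numbers are hypothetical.

```python
def occupancy(repressor: float, kd: float) -> float:
    """Equilibrium probability that the operator is bound, for a
    repressor concentration and dissociation constant in the same
    (arbitrary) units."""
    return repressor / (repressor + kd)


def repression(repressor: float, kd: float) -> float:
    """Unrepressed expression divided by repressed expression, assuming
    expression is proportional to the unoccupied fraction.
    Algebraically this equals 1 + repressor / kd."""
    return 1.0 / (1.0 - occupancy(repressor, kd))


# Hypothetical values: a strong operator (small kd) and a weak one.
strong = repression(repressor=10.0, kd=1.0)
weak = repression(repressor=10.0, kd=10.0)
```

    In this picture a weaker operator can never repress more than a stronger one at the same repressor level, which is exactly the prediction that the reported measurements contradict.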

  16. Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes

    PubMed Central

    2013-01-01

    Background Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. Results In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. Conclusions These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail; then one of the two sets of promoters lost function, and the genes controlled by the disabled promoters became pseudogenes or non-coding sequences, or were even lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome. PMID:23962312
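
    For readers unfamiliar with the tandem duplication and random loss model mentioned in the Background, the following sketch shows its effect on a gene order. It illustrates that conventional model only, not the dimer-based mechanism the authors propose; the function name and the single-letter gene labels are hypothetical.

```python
def tdrl(order: list[str], start: int, end: int,
         keep_in_second_copy: set[str]) -> list[str]:
    """Tandem-duplicate order[start:end], then drop one copy of each
    duplicated gene. Genes named in `keep_in_second_copy` survive in
    the downstream copy; all others survive in the upstream copy."""
    segment = order[start:end]
    first_copy = [g for g in segment if g not in keep_in_second_copy]
    second_copy = [g for g in segment if g in keep_in_second_copy]
    return order[:start] + first_copy + second_copy + order[end:]


# Hypothetical gene order A-B-C-D-E; duplicate B-C-D and keep B only
# in the second copy.
rearranged = tdrl(list("ABCDE"), 1, 4, {"B"})
```

    In the example gene B ends up downstream of C and D, while genes retained within the same copy keep their relative order.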

  17. Dinoflagellate Phylogeny as Inferred from Heat Shock Protein 90 and Ribosomal Gene Sequences

    PubMed Central

    Hoppenrath, Mona; Leander, Brian S.

    2010-01-01

    Background Interrelationships among dinoflagellates in molecular phylogenies are largely unresolved, especially in the deepest branches. Ribosomal DNA (rDNA) sequences provide phylogenetic signals only at the tips of the dinoflagellate tree. Two reasons for the poor resolution of deep dinoflagellate relationships using rDNA sequences are (1) most sites are relatively conserved and (2) there are different evolutionary rates among sites in different lineages. Therefore, alternative molecular markers are required to address the deeper phylogenetic relationships among dinoflagellates. Preliminary evidence indicates that the heat shock protein 90 gene (Hsp90) will provide an informative marker, mainly because this gene is relatively long and appears to have relatively uniform rates of evolution in different lineages. Methodology/Principal Findings We more than doubled the previous dataset of Hsp90 sequences from dinoflagellates by generating additional sequences from 17 different species, representing seven different orders. In order to concatenate the Hsp90 data with rDNA sequences, we supplemented the Hsp90 sequences with three new SSU rDNA sequences and five new LSU rDNA sequences. The new Hsp90 sequences were generated, in part, from four additional heterotrophic dinoflagellates and the type species for six different genera. Molecular phylogenetic analyses resulted in a paraphyletic assemblage near the base of the dinoflagellate tree consisting of only athecate species. However, Noctiluca was never part of this assemblage and branched in a position that was nested within other lineages of dinokaryotes. The phylogenetic trees inferred from Hsp90 sequences were consistent with trees inferred from rDNA sequences in that the backbone of the dinoflagellate clade was largely unresolved. Conclusions/Significance The sequence conservation in both Hsp90 and rDNA sequences and the poor resolution of the deepest nodes suggests that dinoflagellates reflect an explosive

  18. Unconventional Sequence Requirement for Viral Late Gene Core Promoters of Murine Gammaherpesvirus 68

    PubMed Central

    Wong-Ho, Elaine; Davis, Zoe H.; Zhang, Bingqing; Huang, Jian; Gong, Hao; Deng, Hongyu; Liu, Fenyong; Glaunsinger, Britt; Sun, Ren

    2014-01-01

    Infection with the human gammaherpesviruses, Epstein-Barr virus (EBV) and Kaposi's sarcoma-associated herpesvirus (KSHV), is associated with several cancers. During lytic replication of herpesviruses, viral genes are expressed in an ordered cascade. However, the mechanism by which late gene expression is regulated has not been well characterized in gammaherpesviruses. In this study, we have investigated the cis element that mediates late gene expression during de novo lytic infection with murine gammaherpesvirus 68 (MHV-68). A reporter system was established and used to assess the activity of viral late gene promoters upon infection with MHV-68. It was found that the viral origin of lytic replication, orilyt, must be on the reporter plasmid to support activation of the late gene promoter. Furthermore, the DNA sequence required for the activation of late gene promoters was mapped to a core element containing a distinct TATT box and its neighboring sequences. The critical nucleotides of the TATT box region were determined by systematic mutagenesis in the reporter system, and the significance of these nucleotides was confirmed in the context of the viral genome. In addition, EBV and KSHV late gene core promoters could be activated by MHV-68 lytic replication, indicating that the mechanisms controlling late gene expression are conserved among gammaherpesviruses. Therefore, our results on MHV-68 establish a solid foundation for mechanistic studies of late gene regulation. PMID:24403583

  19. Evolutionary Analysis of Sequence Divergence and Diversity of Duplicate Genes in Aspergillus fumigatus

    PubMed Central

    Yang, Ence; Hulse, Amanda M.; Cai, James J.

    2012-01-01

    Gene duplication, as a major source of novel genetic material, plays an important role in evolution. In this study, we focus on duplicate genes in Aspergillus fumigatus, a ubiquitous filamentous fungus causing life-threatening human infections. We characterize the extent and evolutionary patterns of the duplicate genes in the genome of A. fumigatus. Our results show that A. fumigatus contains a large number of duplicate genes with pronounced sequence divergence between the two copies, and approximately 10% of them diverge asymmetrically, i.e. the two copies of a duplicate gene pair diverge at significantly different rates. We use a Bayesian approach of the McDonald-Kreitman test to infer distributions of selective coefficients γ (= 2N_e s) and find that (1) the values of γ for the two copies of duplicate genes co-vary positively and (2) the average γ for the two copies differs between genes from different gene families. This analysis highlights the usefulness of combining divergence and diversity data in studying the evolution of duplicate genes. Taken together, our results provide further support and refinement to the theories of gene duplication. Through characterizing the duplicate genes in the genome of A. fumigatus, we establish a computational framework, including parameter settings and methods, for comparative study of genetic redundancy and gene duplication between different fungal species. PMID:23225993
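
    For readers unfamiliar with the McDonald-Kreitman idea of combining divergence and diversity data, the sketch below shows the classic count-based summary statistics. It is not the Bayesian approach used in the study, and the function name and the example counts are hypothetical.

```python
def mk_summary(dn, ds, pn, ps):
    """Classic McDonald-Kreitman summary statistics (illustrative only).

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("dn, ds and ps must be non-zero")
    # Neutrality index: ratio of the polymorphism ratio to the divergence ratio.
    ni = (pn / ps) / (dn / ds)
    # alpha: a common estimate of the fraction of adaptive substitutions.
    alpha = 1.0 - ni
    return ni, alpha


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
ni, alpha = mk_summary(dn=10, ds=20, pn=5, ps=20)
print(ni, alpha)
```

    A neutrality index below 1 indicates relatively more nonsynonymous divergence than nonsynonymous polymorphism, which is usually read as a sign of positive selection; values above 1 point the other way.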

  20. Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads

    PubMed Central

    Dong, Jiaqiang; Feng, Yaping; Kumar, Dibyendu; Zhang, Wei; Zhu, Tingting; Luo, Ming-Cheng; Messing, Joachim

    2016-01-01

    Haplotype variation not only involves SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods produce reads too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41–48 gene copies of the alpha zein gene family, spread over six loci spanning between 30- and 500-kb chromosomal regions, have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines reveals, in part, dramatic differences between these two heterotic groups. PMID:27354512

  1. Purification of two chitinases from Rhizopus oligosporus and isolation and sequencing of the encoding genes.

    PubMed Central

    Yanai, K; Takaya, N; Kojima, N; Horiuchi, H; Ohta, A; Takagi, M

    1992-01-01

    Two chitinases were purified from Rhizopus oligosporus, a filamentous fungus belonging to the class Zygomycetes, and designated chitinase I and chitinase II. Their N-terminal amino acid sequences were determined, and two synthetic oligonucleotide probes corresponding to these amino acid sequences were synthesized. Southern blot analyses of the total genomic DNA from R. oligosporus with these oligonucleotides as probes indicated that one of the two genes encoding these two chitinases was contained in a 2.9-kb EcoRI fragment and in a 3.6-kb HindIII fragment and that the other one was contained in a 2.9-kb EcoRI fragment and in an 11.5-kb HindIII fragment. Two DNA fragments were isolated from the phage bank of R. oligosporus genomic DNA with the synthetic oligonucleotides as probes. The restriction enzyme analyses of these fragments coincided with the Southern blot analyses described above and the amino acid sequences deduced from their nucleotide sequences contained those identical to the determined N-terminal amino acid sequences of the purified chitinases, indicating that each of these fragments contained a gene encoding chitinase (designated chi 1 and chi 2, encoding chitinase I and II, respectively). The deduced amino acid sequences of these two genes had domain structures similar to that of the published sequence of chitinase of Saccharomyces cerevisiae, except that they had an additional C-terminal domain. Furthermore, there were significant differences between the molecular weights experimentally determined with the two purified enzymes and those deduced from the nucleotide sequences for both genes. Analysis of the N- and C-terminal amino acid sequences of both chitinases and comparison of them with the amino acid sequences deduced from the nucleotide sequences revealed posttranslational processing not only at the N-terminal signal sequences but also at the C-terminal domains. It is concluded that these chitinases are synthesized with pre- and prosequences in

  2. Nucleotide sequence variation of chitin synthase genes among ectomycorrhizal fungi and its potential use in taxonomy.

    PubMed Central

    Mehmann, B; Brunner, I; Braus, G H

    1994-01-01

    DNA sequences of single-copy genes coding for chitin synthases (UDP-N-acetyl-D-glucosamine:chitin 4-beta-N-acetylglucosaminyltransferase; EC 2.4.1.16) were used to characterize ectomycorrhizal fungi. Degenerate primers deduced from short, completely conserved amino acid stretches flanking a region of about 200 amino acids of zymogenic chitin synthases allowed the amplification of DNA fragments of several members of this gene family. Different DNA band patterns were obtained from basidiomycetes because of variation in the number and length of amplified fragments. Cloning and sequencing of the most prominent DNA fragments revealed that these differences were due to various introns at conserved positions. The presence of introns in basidiomycetous fungi therefore has a potential use in identification of genera by analyzing PCR-generated DNA fragment patterns. Analyses of the nucleotide sequences of cloned fragments revealed variations in nucleotide sequences from 4 to 45%. By comparison of the deduced amino acid sequences, the majority of the DNA fragments were identified as members of genes for chitin synthase class II. The deduced amino acid sequences from species of the same genus differed only in one amino acid residue, whereas identity between the amino acid sequences of ascomycetous and basidiomycetous fungi within the same taxonomic class was found to be approximately 43 to 66%. Phylogenetic analysis of the amino acid sequence of class II chitin synthase-encoding gene fragments by using parsimony confirmed the current taxonomic groupings. In addition, our data revealed a fourth class of putative zymogenic chitin synthesis. PMID:7944356

  3. Phylogeny of ruminants secretory ribonuclease gene sequences of pronghorn (Antilocapra americana).

    PubMed

    Beintema, Jaap J; Breukelman, Heleen J; Dubois, Jean-Yves F; Warmels, Hayo W

    2003-01-01

    Phylogenetic analyses based on primary structures of mammalian ribonucleases indicated that three homologous enzymes (pancreatic, seminal and brain ribonucleases) present in the bovine species are the results of gene duplication events, which occurred in the ancestor of the ruminants after divergence from other artiodactyls. In this paper, sequences are presented of genes encoding the pancreatic and brain-type ribonucleases of pronghorn (Antilocapra americana). The seminal-type ribonuclease gene could not be detected in this species, either by PCR amplification or by Southern blot analyses, indicating that it may be deleted completely in this species. Previously, from a study of amino acid sequences of pancreatic ribonucleases of a large number of ruminants, we demonstrated the monophyly of bovids and cervids, and that pronghorn groups with giraffe. Here we present phylogenetic analyses of nucleotide sequences of ribonucleases and other molecules from ruminant species and compare these with published data. Chevrotain (Tragulus) always groups with the other ruminants as a separate taxon from the pecora or true ruminants. Within the pecora the relationships between Bovidae, Cervidae, Giraffidae, and pronghorn (Antilocapra) cannot be decided with certainty, although in the majority of analyses Antilocapra diverges first, separately or joined with giraffe. Broad taxon sampling and investigation of specific sequence features may be as important for reliable conclusions in phylogeny as the lengths of analyzed sequences. PMID:12470934

  4. Minimum entropy decomposition: Unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences

    PubMed Central

    Eren, A Murat; Morrison, Hilary G; Lescault, Pamela J; Reveillaud, Julie; Vineis, Joseph H; Sogin, Mitchell L

    2015-01-01

    Molecular microbial ecology investigations often employ large marker gene datasets, for example, ribosomal RNAs, to represent the occurrence of single-cell genomes in microbial communities. Massively parallel DNA sequencing technologies enable extensive surveys of marker gene libraries that sometimes include nearly identical sequences. Computational approaches that rely on pairwise sequence alignments for similarity assessment and de novo clustering with de facto similarity thresholds to partition high-throughput sequencing datasets constrain fine-scale resolution descriptions of microbial communities. Minimum Entropy Decomposition (MED) provides a computationally efficient means to partition marker gene datasets into ‘MED nodes', which represent homogeneous operational taxonomic units. By employing Shannon entropy, MED uses only the information-rich nucleotide positions across reads and iteratively partitions large datasets while omitting stochastic variation. When applied to analyses of microbiomes from two deep-sea cryptic sponges Hexadella dedritifera and Hexadella cf. dedritifera, MED resolved a key Gammaproteobacteria cluster into multiple MED nodes that are specific to different sponges, and revealed that these closely related sympatric sponge species maintain distinct microbial communities. MED analysis of a previously published human oral microbiome dataset also revealed that taxa separated by less than 1% sequence variation distributed to distinct niches in the oral cavity. The information theory-guided decomposition process behind the MED algorithm enables sensitive discrimination of closely related organisms in marker gene amplicon datasets without relying on extensive computational heuristics and user supervision. PMID:25325381
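
    The entropy-guided partitioning described above can be sketched roughly as follows. This is a simplified illustration, not the published MED implementation: it assumes equal-length, pre-aligned reads, uses a single hypothetical `min_entropy` threshold, and omits the noise filtering that the real algorithm applies.

```python
import math
from collections import Counter


def column_entropy(reads, pos):
    """Shannon entropy (bits) of the characters found at one alignment position."""
    counts = Counter(read[pos] for read in reads)
    total = len(reads)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_on_max_entropy(reads, min_entropy):
    """Split reads by the character at their highest-entropy position.

    Returns a single-element list when no position exceeds min_entropy.
    """
    length = len(reads[0])
    entropies = [column_entropy(reads, i) for i in range(length)]
    best = max(range(length), key=entropies.__getitem__)
    if entropies[best] <= min_entropy:
        return [reads]
    groups = {}
    for read in reads:
        groups.setdefault(read[best], []).append(read)
    return list(groups.values())


def decompose(reads, min_entropy=0.1):
    """Iteratively partition reads until every group is (nearly) homogeneous."""
    nodes, pending = [], [list(reads)]
    while pending:
        group = pending.pop()
        parts = split_on_max_entropy(group, min_entropy)
        if len(parts) == 1:
            nodes.append(group)
        else:
            pending.extend(parts)
    return nodes


# Toy input: two variants that differ only at the third position.
toy_reads = ["ACGT"] * 3 + ["ACTT"] * 3
toy_nodes = decompose(toy_reads)
print(len(toy_nodes))
```

    Only the variable third column carries information here, so the toy reads end up in two homogeneous groups without any pairwise alignment or similarity threshold.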

  5. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    SciTech Connect

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 x 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
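
    The arithmetic behind a molecular clock of this kind can be sketched as below. The relation T = K / (2r) is the standard textbook form and is an assumption here: the abstract does not state the silent-site divergence K or confirm that the rate is defined per lineage, and the function names are hypothetical.

```python
# Calibrated rate quoted in the abstract (substitutions per silent site per year).
SILENT_RATE = 1.56e-9


def divergence_time_years(k_silent, rate=SILENT_RATE):
    """Years since two lineages split, assuming both lineages accumulate
    silent substitutions at `rate`, so pairwise distance grows at 2 * rate."""
    return k_silent / (2.0 * rate)


def expected_silent_divergence(t_years, rate=SILENT_RATE):
    """Pairwise silent-site divergence expected after t_years of separation."""
    return 2.0 * rate * t_years


# Under these assumptions, a 6.4-million-year split corresponds to a
# pairwise silent divergence of roughly 2% per site.
k_human_chimp = expected_silent_divergence(6.4e6)
print(k_human_chimp)
```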

  6. Sequence arrangement of the rRNA genes of the dipteran Sarcophaga bullata.

    PubMed

    French, C K; Fouts, D L; Manning, J E

    1981-06-11

    Velocity sedimentation studies of RNA of Sarcophaga bullata show that the major rRNA species have sedimentation values of 26S and 18S. Analysis of the rRNA under denaturing conditions indicates that there is a hidden break centrally located in the 26S rRNA species. Saturation hybridization studies using total genomic DNA and rRNA show that 0.08% of the nuclear DNA is occupied by rRNA coding sequences and that the average repetition frequency of these coding sequences is approximately 144. The arrangement of the rRNA genes and their spacer sequences on long strands of purified rDNA was determined by examining the structure of rRNA:DNA hybrids in the electron microscope. Long DNA strands contain several gene sets (18S + 26S), with one repeat unit containing the following sequences in the order given: (a) an 18S gene of length 2.12 kb, (b) an internal transcribed spacer of length 2.01 kb, which contains a short sequence that may code for a 5.8S rRNA, (c) a 26S gene of length 4.06 kb which, in 20% of the cases, contains an intron with an average length of 5.62 kb, and (d) an external spacer of average length 9.23 kb.
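
    The segment lengths quoted above can be added up to give an approximate repeat-unit length. The short sketch below does only that arithmetic; it assumes the reported average lengths are additive and that "20% of the cases" means 20% of repeat units carry the intron. The constant and function names are hypothetical.

```python
# Segment lengths in kb, as quoted in the abstract.
GENE_18S = 2.12
INTERNAL_SPACER = 2.01
GENE_26S = 4.06
EXTERNAL_SPACER = 9.23
INTRON_26S = 5.62
INTRON_FRACTION = 0.20  # share of 26S genes reported to contain the intron


def repeat_length(with_intron):
    """Length (kb) of one rDNA repeat unit, with or without the 26S intron."""
    base = GENE_18S + INTERNAL_SPACER + GENE_26S + EXTERNAL_SPACER
    return base + (INTRON_26S if with_intron else 0.0)


def mean_repeat_length():
    """Average repeat length (kb), weighting the two forms by their frequency."""
    return ((1.0 - INTRON_FRACTION) * repeat_length(False)
            + INTRON_FRACTION * repeat_length(True))


print(round(repeat_length(False), 2),
      round(repeat_length(True), 2),
      round(mean_repeat_length(), 3))
```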

  7. Molecular genotyping of human Ureaplasma species based on multiple-banded antigen (MBA) gene sequences.

    PubMed

    Kong, F; Ma, Z; James, G; Gordon, S; Gilbert, G L

    2000-09-01

    Ureaplasma urealyticum has been divided into 14 serovars. Recently, subdivision of U. urealyticum into two species has been proposed: U. parvum (previously U. urealyticum parvo biovar), comprising four serovars (1, 3, 6, 14) and U. urealyticum (previously U. urealyticum T-960 biovar), 10 serovars (2, 4, 5, 7-13). The multiple-banded antigen (MBA) genes of these species contain both species and serovar/subtype specific sequences. Based on whole sequences of the 5'-ends of MBA genes of U. parvum serovars and partial sequences of the 5'-ends of MBA genes of U. urealyticum serovars, we previously divided each of these species into three MBA genotypes. To further elucidate the relationships between serovars, we sequenced the whole 5'-ends of MBA genes of all 10 U. urealyticum serovars and partial repetitive regions of these genes from all serovars of U. parvum and U. urealyticum. For the first time, all four serovars of U. parvum were clearly differentiated from each other. In addition, the 10 serovars of U. urealyticum were divided into five MBA genotypes, as follows: MBA genotype A comprises serovars 2, 5, 8; MBA genotype B, serovar 10 only; MBA genotype C, serovars 4, 12, 13; MBA genotype D, serovar 9 only; and MBA genotype E comprises serovars 7 and 11. There were no sequence differences between members within each MBA genotype. Further work is required to identify other genes or other regions of the MBA genes that may be used to differentiate U. urealyticum serovars within MBA genotypes A, C and E. A better understanding of the molecular basis of serotype differentiation will help to improve subtyping methods for use in studies of the pathogenesis and epidemiology of these organisms.

  8. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence

    PubMed Central

    Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    The API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species because some biochemical reactions are unstable and some species, e.g. Yersinia frederiksenii and Yersinia intermedia, present the same biochemical profile; such species need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, we randomly selected five 16S rRNA gene clones from each of 768 Yersinia strains and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain; the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the ability to discriminate Yersinia species is improved using our method. PMID:26808495

  9. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    PubMed

    Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    The API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species because some biochemical reactions are unstable and some species, e.g. Yersinia frederiksenii and Yersinia intermedia, present the same biochemical profile; such species need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, we randomly selected five 16S rRNA gene clones from each of 768 Yersinia strains and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain; the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the ability to discriminate Yersinia species is improved using our method. PMID:26808495

  11. Nucleotide sequence of the gene for the b subunit of human factor XIII

    SciTech Connect

    Bottenus, R.E.; Ichinose, A.; Davie, E.W.

    1990-12-01

    Factor XIII (M_r 320 000) is a blood coagulation factor that stabilizes and strengthens the fibrin clot. It circulates in blood as a tetramer composed of two a subunits (M_r 75 000 each) and two b subunits (M_r 80 000 each). The b subunit consists of 641 amino acids and includes 10 tandem repeats of 60 amino acids known as GP-I structures, short consensus repeats (SCR), or sushi domains. In the present study, the human gene for the b subunit has been isolated from three different genomic libraries prepared in λ phage. Fifteen independent phage with inserts coding for the entire gene were isolated and characterized by restriction mapping, Southern blotting, and DNA sequencing. The gene was found to be 28 kilobases in length and consisted of 12 exons (I-XII) separated by 11 intervening sequences. The leader sequence was encoded by exon I, while the carboxyl-terminal region of the protein was encoded by exon XII. Exons II-XI each coded for a single sushi domain, suggesting that the gene evolved through exon shuffling and duplication. The 12 exons in the gene ranged in size from 64 to 222 base pairs, while the introns ranged in size from 87 to 9970 nucleotides and made up 92% of the gene. One nucleotide change was found in the coding region of the gene when its sequence was compared to that of the cDNA. This difference, however, did not result in a change in the amino acid sequence of the protein.

  12. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  13. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  14. Sequence of the PV2 gene of rice hoja blanca tenuivirus RNA-2.

    PubMed

    De Miranda, J R; Hull, R; Espinoza, A M

    1995-01-01

    Comparison of a partial sequence of rice hoja blanca tenuivirus RNA-2 with 40% similarity to rice stripe tenuivirus RNA-2 revealed regions of high local sequence homology at the 5' terminus, within the coding region (the pv2 gene), and in the intergenic region separating this gene from the other protein (pc2) encoded by this ambisense RNA. Analysis of the conserved regions of the pv2 protein identified two motifs found principally in viral membrane glycoproteins and six motifs found each in a wide variety of proteins. The possible significance of these results is discussed. PMID:8560781

  15. Cloning and nucleotide sequence of the anaerobically regulated pepT gene of Salmonella typhimurium.

    PubMed Central

    Miller, C G; Miller, J L; Bagga, D A

    1991-01-01

    The anaerobically regulated pepT gene of Salmonella typhimurium has been cloned in pBR328. Strains carrying the pepT plasmid, pJG17, overproduce peptidase T by approximately 70-fold. The nucleotide sequence of a 2.5-kb region including pepT has been determined. The sequence codes for a protein of 44,855 Da, consistent with a molecular weight of approximately 46,000 for peptidase T (as determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and gel filtration). The N-terminal amino acid sequence of peptidase T purified from a pJG17-containing strain matches that predicted by the nucleotide sequence. A plasmid carrying an anaerobically regulated pepT::lacZ transcriptional fusion contains only 165 bp 5' to the start of translation. This region contains a sequence highly homologous to that identified in Escherichia coli as the site of action of the FNR protein, a positive regulator of anaerobic gene expression. A region of the deduced amino acid sequence of peptidase T is similar to segments of Pseudomonas carboxypeptidase G2, the E. coli peptidase encoded by the iap gene, and E. coli peptidase D. PMID:1904438

  16. Cloning and sequence analysis of the major outer membrane protein gene of Chlamydia psittaci 6BC.

    PubMed

    Everett, K D; Andersen, A A; Plaunt, M; Hatch, T P

    1991-08-01

    The gene encoding the major outer membrane protein (MOMP) of the psittacine Chlamydia psittaci strain 6BC was cloned and sequenced. N-terminal protein sequencing of the mature MOMP indicated that it is posttranslationally processed at a site identical to the site previously identified in the MOMP of Chlamydia trachomatis L2. The nucleotide sequence of the C. psittaci 6BC MOMP gene was found to be 67 to 68% identical to those of human C. trachomatis strains, 73% identical to that of Chlamydia pneumoniae IOL-207, 79% identical to that of the C. psittaci guinea pig inclusion conjunctivitis strain, GPIC, and 83% identical to that of the C. psittaci ovine abortion strain S26/3. In contrast, the 6BC sequence was found to be greater than 99% identical to the sequences reported for two strains of C. psittaci, A22/M and Cal-10 meningopneumonitis, believed to be of nonpsittacine avian origin. Monoclonal antibody analysis confirmed the nonpsittacine avian origin of A22/M but identified the Cal-10 strain from which the MOMP gene was previously sequenced as a psittacine strain. These results confirm that psittacine and nonpsittacine avian strains of C. psittaci are closely related and distinct from the mammalian guinea pig inclusion conjunctivitis and ovine abortion strains of C. psittaci.
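
    Percent-identity figures such as those above are computed from aligned sequences. The sketch below shows one common convention (skipping any column that contains a gap); the abstract does not say which convention or alignment method was used, so the function and the toy sequences are illustrative assumptions only.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical positions between two pre-aligned sequences.

    Columns in which either sequence has a gap are ignored; other
    conventions (e.g. counting gaps as mismatches) give different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real MOMP sequences.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```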

  17. DNA sequence analysis of conserved genes reveals hybridization events that increase genetic diversity in Verticillium dahliae.

    PubMed

    Collado-Romero, Melania; Jiménez-Díaz, Rafael M; Mercado-Blanco, Jesús

    2010-01-01

    The hybrid origin of a Verticillium dahliae isolate belonging to vegetative compatibility group (VCG) 3 is reported in this work. Moreover, new data supporting the hybrid origin of two V. dahliae var. longisporum (VDLSP) isolates are provided, as well as information about putative parents. Isolates of VDLSP and V. dahliae VCG3 were found to harbor multiple sequences of the actin (Act), β-tubulin (β-tub), calmodulin (Cal) and histone 3 (H3) genes. Phylogenetic analysis of these sequences, of the internal transcribed sequences (ITS-1 and ITS-2) of the rRNA genes and of a V. dahliae-specific sequence provided molecular evidence for the interspecific hybrid origin of those isolates. Sequence analysis suggests that some VDLSP isolates may have resulted from hybridization events between a V. dahliae isolate of VCG1 and/or VCG4A and, probably, a taxon closely related to, but distinct from, Verticillium alboatrum. Similarly, phylogenetic analysis and PCR markers indicated that a V. dahliae VCG3 isolate might have arisen from a hybridization event between a V. dahliae VCG1B isolate and an as yet unidentified parent. This second parent probably does not belong to the genus Verticillium, given the gene sequence dissimilarities found between the VCG3 isolate and Verticillium spp. These results suggest an important role of parasexuality in diversity and evolution in the genus Verticillium and show that interspecific hybrids within this genus may not be rare in nature.

  18. Development of a Comprehensive Sequencing Assay for Inherited Cardiac Condition Genes.

    PubMed

    Pua, Chee Jian; Bhalshankar, Jaydutt; Miao, Kui; Walsh, Roddy; John, Shibu; Lim, Shi Qi; Chow, Kingsley; Buchan, Rachel; Soh, Bee Yong; Lio, Pei Min; Lim, Jaclyn; Schafer, Sebastian; Lim, Jing Quan; Tan, Patrick; Whiffin, Nicola; Barton, Paul J; Ware, James S; Cook, Stuart A

    2016-02-01

    Inherited cardiac conditions (ICCs) are characterised by marked genetic and allelic heterogeneity and require extensive sequencing for genetic characterisation. We iteratively optimised a targeted gene capture panel for ICCs that includes disease-causing, putatively pathogenic, research and phenocopy genes (n = 174 genes). We achieved high coverage of the target region on both MiSeq (>99.8% at ≥ 20× read depth, n = 12) and NextSeq (>99.9% at ≥ 20×, n = 48) platforms with 100% sensitivity and precision for single nucleotide variants and indels across the protein-coding target on the MiSeq. In the final assay, 40 out of 43 established ICC genes informative in clinical practice achieved complete coverage (100% at ≥ 20×). By comparison, whole exome sequencing (WES; ∼ 80×), deep WES (∼ 500×) and whole genome sequencing (WGS; ∼ 70×) had poorer performance (88.1, 99.2 and 99.3% respectively at ≥ 20×) across the ICC target. The assay described here delivers highly accurate and affordable sequencing of ICC genes, complemented by accessible cloud-based computation and informatics. See Editorial in this issue (DOI: 10.1007/s12265-015-9667-8). PMID:26888179
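
    The coverage metric used above (percentage of target bases at ≥ 20× read depth) can be illustrated with a minimal sketch. The function name `fraction_covered` and the per-base depth values are hypothetical and chosen only for illustration; this is not the pipeline used in the study.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of target positions whose read depth is >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return covered / len(depths)


# Hypothetical per-base read depths for a tiny 10-base target region.
example_depths = [35, 40, 22, 19, 50, 61, 20, 18, 33, 27]
coverage = fraction_covered(example_depths)
print(f"{coverage:.1%} of target bases at >= 20x")
```

    In practice the depth list would come from a per-base depth report for the capture target; the threshold is a parameter so the same function covers other cut-offs.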

  19. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools. PMID:27536341

  20. The 5'-flanking regions of three pea legumin genes: comparison of the DNA sequences.

    PubMed Central

    Lycett, G W; Croy, R R; Shirsat, A H; Richards, D M; Boulter, D

    1985-01-01

    Approximately 1200 nucleotides of sequence data from the promoter and 5'-flanking regions of each of three pea (Pisum sativum L.) legumin genes (legA, legB and legC) are presented. The promoter regions of all three genes were found to be identical including the "TATA box", the "CAAT box", and sequences showing homology to the SV40 enhancers. The legA sequence begins to diverge from the others about 300bp from the start codon, whereas the other two genes remain identical for another 550bp. The regions of partial homology exhibit deletions or insertions and some short, comparatively well conserved sequences. The significance of these features is discussed in terms of evolutionary mechanisms and their possible functional roles. The legC gene contains a region that may potentially form either of two mutually exclusive stem-loop structures, one of which has a stem 42bp long, which suggests that it could be fairly stable. We suggest that a mechanism of switching between such alternative structures may play some role in gene control or may represent the insertion of a transposable element. PMID:2997721

  1. Metazoan remaining genes for essential amino acid biosynthesis: sequence conservation and evolutionary analyses.

    PubMed

    Costa, Igor R; Thompson, Julie D; Ortega, José Miguel; Prosdocimi, Francisco

    2014-12-24

    Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the metazoan remaining genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.

  2. Cloning, nucleotide sequence, and expression of the Pasteurella haemolytica A1 glycoprotease gene.

    PubMed Central

    Abdullah, K M; Lo, R Y; Mellors, A

    1991-01-01

    Pasteurella haemolytica serotype A1 secretes a glycoprotease which is specific for O-sialoglycoproteins such as glycophorin A. The gene encoding the glycoprotease enzyme has been cloned in the recombinant plasmid pPH1, and its nucleotide sequence has been determined. The gene (designated gcp) codes for a protein of 35.2 kDa, and an active enzyme protein of this molecular mass can be observed in Escherichia coli clones carrying pPH1. In vivo labeling of plasmid-encoded proteins in E. coli maxicells demonstrated the expression of a 35-kDa protein from pPH1. The amino-terminal sequence of the heterologously expressed protein corresponds to that predicted from the nucleotide sequence. The glycoprotease is a neutral metalloprotease, and the predicted amino acid sequence of the glycoprotease contains a putative zinc-binding site. The gene shows no significant homology with the genes for other proteases of procaryotic or eucaryotic origin. However, there is substantial homology between gcp and an E. coli gene, orfX, whose product is believed to function in the regulation of macromolecule biosynthesis. PMID:1885539

  3. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exon 1, 2 and 3) as well as 311 bp of 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in the exon 2 regions. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population. PMID:26845859
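
    The segment lengths reported above can be checked with simple arithmetic, and the two coding-sequence positions can be mapped to codon numbers. The sketch below assumes that the c. positions count from the first base of the start codon (the usual HGVS convention) and that the abstract follows it; the helper name `codon_position` is hypothetical.

```python
# Segment lengths in bp, as reported in the abstract.
segments = {"5'UTR": 208, "coding region": 1128, "3'UTR": 311}
total_length = sum(segments.values())


def codon_position(cds_pos):
    """Map a 1-based coding-sequence position to (codon number, base within codon)."""
    if cds_pos < 1:
        raise ValueError("cds_pos must be a 1-based position")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


print(f"total length: {total_length} bp")
for pos in (539, 821):
    codon, base = codon_position(pos)
    print(f"c.{pos}: codon {codon}, base {base} of 3")
```

    Under that assumption both substitutions fall on the second base of their codon, a position at which every single-base change in the standard genetic code alters the encoded amino acid or creates a stop codon, which is consistent with the abstract describing both SNPs as non-synonymous.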

  4. OrthoSelect: a web server for selecting orthologous gene alignments from EST sequences.

    PubMed

    Schreiber, Fabian; Wörheide, Gert; Morgenstern, Burkhard

    2009-07-01

    In the absence of whole genome sequences for many organisms, the use of expressed sequence tags (EST) offers an affordable approach for researchers conducting phylogenetic analyses to gain insight about the evolutionary history of organisms. Reliable alignments for phylogenomic analyses are based on orthologous gene sequences from different taxa. So far, researchers have not sufficiently tackled the problem of the completely automated construction of such datasets. Existing software tools are either semi-automated, covering only part of the necessary data processing, or implemented as a pipeline, requiring the installation and configuration of a cascade of external tools, which may be time-consuming and hard to manage. To simplify data set construction for phylogenomic studies, we set up a web server that uses our recently developed OrthoSelect approach. To the best of our knowledge, our web server is the first web-based EST analysis pipeline that allows the detection of orthologous gene sequences in EST libraries and outputs orthologous gene alignments. Additionally, OrthoSelect provides the user with an extensive results section that lists and visualizes all important results, such as annotations, data matrices for each gene/taxon and orthologous gene alignments. The web server is available at http://orthoselect.gobics.de.

  5. The nucleotide sequence of the equine herpesvirus 4 gC gene homologue.

    PubMed

    Nicolson, L; Onions, D E

    1990-11-01

    The genomic position of an equine herpesvirus 4 (EHV-4) gene homologue of the herpes simplex virus 1 (HSV-1) gC gene was determined by Southern analysis and DNA sequencing. The gene lies within a 2-kbp BglII-EcoRI fragment mapping between 0.15 and 0.17 within the long unique component of the EHV-4 genome and is transcribed from right to left. Putative promoter elements were identified upstream of the 1455-bp open reading frame which encodes a 485-amino-acid protein of unglycosylated molecular weight 52,513. Computer-assisted analysis of the primary sequence predicts the protein possesses a domain structure characteristic of a type 1 integral membrane glycoprotein. Four domains were distinguished--(i) an N-terminal signal sequence, (ii) a large extracellular domain containing 11 putative N-linked glycosylation sites, (iii) a hydrophobic transmembrane domain, and (iv) a C-terminal charged domain. Comparison of the predicted amino acid sequence to that of other herpesvirus glycoproteins indicated identities of between 22 and 29% with HSV-1 gC, HSV-2 gC, VZV gpV, PRV gIII, BHV-1 gIII, and MDV A antigen and of 79% with EHV-1 gp13. A gene with no apparent homologue in HSV-1 or VZV maps immediately downstream of the EHV-4 gC gene homologue. PMID:2171212

  6. Rous sarcoma virus contains sequences which permit expression of the gag gene in Escherichia coli.

    PubMed Central

    Mermer, B; Malamy, M; Coffin, J M

    1983-01-01

    Several aspects of Rous sarcoma virus gene expression, including transcription, translation, and protein processing, can occur within Escherichia coli containing cloned viral DNA. The viral long terminal repeat contains a bacterial promoter, and viral sequences at or near the authentic viral initiation codon permit the initiation of translation. These signals can direct the synthesis in E. coli of the viral gag gene precursor Pr76 or, when fused to a portion of the lacZ gene, a gag-beta-galactosidase fusion protein. Pr76 is processed into gag structural proteins in E. coli in a process which is dependent upon the gag product p15. These observations suggest that E. coli can be used for the introduction and analysis of mutations in sequences relevant to viral gene expression. PMID:6316124

  7. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  8. Sequence and evolution of the blue cone pigment gene in old and new world primates

    SciTech Connect

    Hunt, D.M.; Cowing, J.A.; Patel, R.

    1995-06-10

    The sequences of the blue cone photopigments in the talapoin monkey (Miopithecus talapoin), an Old World primate, and in the marmoset (Callithrix jacchus), a New World monkey, are presented. Both genes are composed of 5 exons separated by 4 introns. In this respect, they are identical to the human blue gene, and intron sizes are also similar. Based on the level of amino acid identity, both monkey pigments are members of the S branch of pigments. Alignment of these sequences with the human gene requires the insertion/deletion of two separate codons in exon 1. The silent site divergence between these primate blue genes indicates a separation of the Old and New World primate lineages around 43 million years ago. 41 refs., 1 fig., 3 tabs.

  9. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention.

  10. Further Examples of Evolution by Gene Duplication Revealed through DNA Sequence Comparisons

    PubMed Central

    Ohta, T.

    1994-01-01

    To test the theory that evolution by gene duplication occurs as a result of positive Darwinian selection that accompanies the acceleration of mutant substitutions, DNA sequences of recent duplication were analyzed by estimating the numbers of synonymous and nonsynonymous substitutions. For the troponin C family, at the period of differentiation of the fast and slow isoforms, amino acid substitutions were shown to have been accelerated relative to synonymous substitutions. Comparison of the first exon of α-actin genes revealed that amino acid substitutions were accelerated when the smooth muscle, skeletal and cardiac isoforms differentiated. Analysis of members of the heat shock protein 70 gene family of mammals indicates that heat shock responsive genes including duplicated copies are evolving rapidly, contrary to the cognitive genes which have been evolutionarily conservative. For the α(1)-antitrypsin reactive center, the acceleration of amino acid substitution has been found for gene pairs of recent duplication. PMID:7896112

  11. Computational Analyses of Simple Sequence Repeats on Human Tissue Specific Genes Promoters

    NASA Astrophysics Data System (ADS)

    FeiFei, Zhao; XiuJun, Gong; XinMi, Liu; LiFeng, Dong

    The promoter region of a gene is closely related to tissue-specific expression, and SSRs (simple sequence repeats) have been shown to have a variety of effects on an organism. This paper used a heuristic method to find SSRs and compared the most frequent SSRs in the promoter regions of human tissue-specific genes and human housekeeping genes. We used kidney and testis tissue as examples to show the final results. In particular, we found that (AGG)n is a kidney-specific SSR and (GCG)n is a testis-specific SSR. We also analyzed the SSR frequency density distribution in different promoter regions of both tissue-specific genes and housekeeping genes, and we found that the density for housekeeping genes in the core-promoter region is much higher than in other promoter regions.
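
    The abstract does not describe its heuristic, so the following is only an illustrative sketch of one simple way to locate perfect SSRs such as (AGG)n or (GCG)n: a regular expression with a backreference. The function name `find_ssrs`, its parameters, and the test sequence are all hypothetical, and this is not the authors' method.

```python
import re


def find_ssrs(sequence, unit_len=3, min_repeats=3):
    """Find perfect tandem repeats of a fixed unit length.

    Returns a list of (start, unit, repeat_count) tuples, where start is the
    0-based index of the first base of the repeat run. The reported unit is
    whichever rotation the leftmost regex match happens to begin with.
    """
    if unit_len < 1 or min_repeats < 2:
        raise ValueError("unit_len must be >= 1 and min_repeats >= 2")
    # ([ACGT]{k}) captures one unit; \1{n-1,} requires at least n-1 more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        repeat_count = len(match.group(0)) // unit_len
        hits.append((match.start(), match.group(1), repeat_count))
    return hits


# Hypothetical promoter fragment containing (AGG)4 and (GCG)3.
example_sequence = "TT" + "AGG" * 4 + "CC" + "GCG" * 3 + "TA"
for start, unit, count in find_ssrs(example_sequence):
    print(f"({unit}){count} at position {start}")
```

    A fixed unit length keeps the sketch short; scanning several unit lengths would additionally need a rule for collapsing redundant hits, for example a homopolymer run matching as a trinucleotide repeat.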

  12. Plant simple sequence repeats: distribution, variation, and effects on gene expression.

    PubMed

    Sharopova, Natalya

    2008-02-01

    Genome-wide simple sequence repeat (SSR) information was analyzed together with functional annotations of Arabidopsis genes and public gene expression data for Arabidopsis and rice. Analysis of more than 15,000 Arabidopsis and more than 16,000 rice SSRs indicated that SSRs may affect the expression of hundreds of genes. Data from experiments on DNA methylation, histone acetylation, and transcript turnover suggest that SSRs may affect gene expression at transcriptional and posttranscriptional levels. Members of some functional groups were shown to be enriched with SSRs and often contained similar but non-homologous repeats within the same gene regions. In addition, the distribution of perfect and imperfect SSRs in some Arabidopsis, maize, and rice genes was used to demonstrate how two-level control of SSR variation may contribute to protein evolution.

  13. Discovery of sequence motifs related to coexpression of genes using evolutionary computation

    PubMed Central

    Fogel, Gary B.; Weekes, Dana G.; Varga, Gabor; Dow, Ernst R.; Harlow, Harry B.; Onyia, Jude E.; Su, Chen

    2004-01-01

    Transcription factors are key regulatory elements that control gene expression. Recognition of transcription factor binding site (TFBS) motifs in the upstream region of coexpressed genes is therefore critical towards a true understanding of the regulations of gene expression. The task of discovering eukaryotic TFBSs remains a challenging problem. Here, we demonstrate that evolutionary computation can be used to search for TFBSs in upstream regions of genes known to be coexpressed. Evolutionary computation was used to search for TFBSs of genes regulated by octamer-binding factor and nuclear factor kappa B. The discovered binding sites included experimentally determined known binding motifs as well as lists of putative, previously unknown TFBSs. We believe that this method to search nucleotide sequence information efficiently for similar motifs will be useful for discovering TFBSs that affect gene regulation. PMID:15266008

  14. Nucleotide sequence and revised map location of the arn gene from bacteriophage T4.

    PubMed

    Kim, B C; Kim, K; Park, E H; Lim, C J

    1997-10-31

    Non-glucosylated (Glu-) T-even phage DNAs are restricted by Escherichia coli RglA and RglB endonucleases with different specificities. RglB endonuclease activity is strongly inhibited by anti-restriction endonuclease (Arn) encoded by the bacteriophage T4 genome. The nucleotide sequence of the arn gene encoding Arn was determined. The product of the cloned arn gene was overexpressed by the T7 RNA polymerase/promoter system, and its molecular size is consistent with that predicted from the open reading frame of the arn gene. The arn gene is located between the asiA gene and motA gene in the region of 161,300-161,578 nucleotides.

  15. Complete sequence and gene organization of the mitochondrial genome of Asio flammeus (Strigiformes, strigidae).

    PubMed

    Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei

    2016-07-01

    The complete sequence of the mitochondrial genome was determined for Asio flammeus, a geographically widespread species. The complete mitochondrial genome was 18,966 bp long and contained 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were located on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny showed that A. flammeus is the sister group to A. capensis and supported the monophyly of Asio.

  16. Development of Resistance during Antimicrobial Therapy Caused by Insertion Sequence Interruption of Porin Genes

    PubMed Central

    Hernández-Allés, Santiago; Benedí, Vicente J.; Martínez-Martínez, Luis; Pascual, Álvaro; Aguilar, Alicia; Tomás, Juan M.; Albertí, Sebastián

    1999-01-01

    We have demonstrated by using an in vitro approach that interruption of the OmpK36 porin gene by insertion sequences (ISs) is a common type of mutation that causes loss of porin expression and increased resistance to cefoxitin in Klebsiella pneumoniae. This mechanism also operates in vivo: of 13 porin-deficient cefoxitin-resistant clinical isolates of K. pneumoniae, 4 presented ISs in their ompK36 gene. PMID:10103203

  17. Cloning and sequencing of the ferredoxin gene of blue-green alga Anabaena siamensis

    NASA Astrophysics Data System (ADS)

    Li, Shou-Dong; Song, Li-Rong; Liu, Yong-Ding; Zhao, Jin-Dong

    1998-03-01

    The structural gene for ferredoxin, petFI, from Anabaena siamensis has been amplified by polymerase chain reaction (PCR) and cloned into the cloning vector pGEM-3zf(+). The nucleotide sequence of petFI has been determined with the silver-staining sequencing method. There is 96.8% homology between the coding region of petFI from A. siamensis and that of petFI from A. sp. 7120. Amino acid sequences of seven strains of blue-green algae are compared.

  18. How are exons encoding transmembrane sequences distributed in the exon-intron structure of genes?

    PubMed

    Sawada, Ryusuke; Mitaku, Shigeki

    2011-01-01

    The exon-intron structure of eukaryotic genes raises a question about the distribution of transmembrane regions in membrane proteins. Were exons that encode transmembrane regions formed simply by inserting introns into preexisting genes or by some kind of exon shuffling? To answer this question, the exon-per-gene distribution was analyzed for all genes in 40 eukaryotic genomes with a particular focus on exons encoding transmembrane segments. In 21 higher multicellular eukaryotes, the percentage of multi-exon genes (those containing at least one intron) within all genes in a genome was high (>70%) and with a mean of 87%. When genes were grouped by the number of exons per gene in higher eukaryotes, good exponential distributions were obtained not only for all genes but also for the exons encoding transmembrane segments, leading to a constant ratio of membrane proteins independent of the exon-per-gene number. The positional distribution of transmembrane regions in single-pass membrane proteins showed that they are generally located in the amino or carboxyl terminal regions. This nonrandom distribution of transmembrane regions explains the constant ratio of membrane proteins to the exon-per-gene numbers because there are always two terminal (i.e., the amino and carboxyl) regions - independent of the length of sequences.

  19. Analysis of Sequences Regulating Larval Expression of the Adh Gene of Drosophila Melanogaster

    PubMed Central

    Shen, NLL.; Hotaling, E. C.; Subrahmanyam, G.; Martin, P. F.; Sofer, W.

    1991-01-01

    The effects of a series of eight, 50 base pair internal deletions in the 5' region upstream of the proximal transcription start site of the Adh gene of Drosophila melanogaster were examined in a quantitative assay. Mixtures of two plasmids, one bearing a deleted gene, the other with an intact reference gene, were injected into alcohol dehydrogenase-negative embryos. Third instar larvae of the injected generation were assayed for relative alcohol dehydrogenase enzyme activity. Quantitative analysis of the eight deletions indicated that two regions were required for any detectable enzyme activity and one region was required for appropriate tissue specificity. The remaining five deletions significantly decreased, but did not eliminate activity. When the deleted genes were placed on a plasmid with an intact reference gene, activities of all but one deletion were restored to levels equivalent to that of the intact reference gene (regardless of orientation). This restoration of activity did not occur when the regulatory region of the intact gene was replaced with the Hsp70 heat shock promoter nor when the 50-base pair deletion encompassed the region that includes the TATA sequence. The fact that seven of the eight deleted genes express activity in the presence of a reference gene on the same plasmid suggests that the deleted gene is controlled by regulatory elements in the reference gene. Further, these regulatory elements exhibit no preference for their own, more proximate, promoter. PMID:1752419

  20. Analysis of Pseudomonas putida alkane-degradation gene clusters and flanking insertion sequences: evolution and regulation of the alk genes.

    PubMed

    van Beilen, J B; Panke, S; Lucchini, S; Franchini, A G; Röthlisberger, M; Witholt, B

    2001-06-01

    The Pseudomonas putida GPo1 (commonly known as Pseudomonas oleovorans GPo1) alkBFGHJKL and alkST gene clusters, which encode proteins involved in the conversion of n-alkanes to fatty acids, are located end to end on the OCT plasmid, separated by 9.7 kb of DNA. This DNA segment encodes, amongst others, a methyl-accepting transducer protein (AlkN) that may be involved in chemotaxis to alkanes. In P. putida P1, the alkBFGHJKL and alkST gene clusters are flanked by almost identical copies of the insertion sequence ISPpu4, constituting a class 1 transposon. Other insertion sequences flank and interrupt the alk genes in both strains. Apart from the coding regions of the GPo1 and P1 alk genes (80-92% sequence identity), only the alkB and alkS promoter regions are conserved. Competition experiments suggest that highly conserved inverted repeats in the alkB and alkS promoter regions bind AlkS. PMID:11390693

  1. Gene annotation and functional analysis of a newly sequenced Synechococcus strain.

    PubMed

    Li, Y; Rao, N N; Yang, Y; Zhang, Y; Gu, Y N

    2015-10-16

    Synechococcus sp. PCC 7336 represents a newly sequenced strain, and its genome is obviously different from that of other Synechococcus strains. In this analysis, local alignment and annotation databases were constructed and combined with various bioinformatic tools to carry out gene annotation and functional analysis of this strain. From this analysis, we identified 5096 protein-coding genes and 47 RNA genes. Of these, 116 genes that were classified into 9 categories were associated with photosynthesis, and type V polymerase proteins that were identified are unique for this strain. An additional 107 genes were closely related to signal transduction pathways, which primarily comprised parts of two-component regulatory systems. Gene ontology analysis showed that 2377 genes were annotated with a total number of 9791 functional categories, and specifically that 41 genes distributed in 4 protein complexes were involved in oxidative phosphorylation. Clusters of orthologous groups classification showed that there were 1463 homologous proteins associated with 17 specific metabolic pathways, and that most of the proteins participated in primary metabolic processes such as binding and catalysis. The phylogenetic tree based on 16S rRNA sequences indicated that Synechococcus PCC 7336 is highly likely to represent a new branch.

  2. DNA sequence of immunoglobulin heavy chain variable region gene in thyroid lymphoma.

    PubMed

    Miwa, H; Takakuwa, T; Nakatsuka, S; Tomita, Y; Matsuzuka, F; Aozasa, K

    2001-10-01

    Patho-epidemiological studies have shown that thyroid lymphoma (TL) develops in thyroid affected by chronic lymphocytic thyroiditis (CLTH). CLTH is categorized as an organ-specific autoimmune disease, in which activated B-lymphocytes secrete a number of autoantibodies. Because antigenic stimulation might be involved in the pathogenesis of TL, the variable region in heavy chain (V(H)) genes was characterized in 13 cases with TL and 3 with CLTH. Clonal rearrangement of the V(H) gene was found in 11 cases of TL, and cloning study with sequencing of complementarity determining region (CDR) 3 revealed the presence of a major clone in 4. Three of the 4 cases used V(H) 3 gene, with the homologous germline gene of V3-30 in two cases and VH26 in one case. A biased usage of V(H) 3 and V(H) 4 genes with the homologous germline gene of VH26 in V(H) 3 gene was reported previously in cases with CLTH. A high level of somatic mutation (1-21%, average 12%) with non-random distribution of replacement and silent mutations was accumulated in all cases. The frequency of the occurrence of minor clones ranged from 29-44% per case, indicating the presence of on-going mutation. DNA sequencing of immunoglobulin V(H) gene suggests that TL develops among activated lymphoid cells in CLTH at the germinal center stage under antigen selection. PMID:11676854

  3. The structure and complete nucleotide sequence of the human cyclophilin 40 (PPID) gene

    SciTech Connect

    Yokoi, Haruhiko; Shimizu, Yukiko; Anazawa, Hideharu

    1996-08-01

    Cyclophilin 40 is a recently identified member of the cyclophilin family that is found in an unactivated steroid hormone receptor complex. Cyclophilin 40 possesses a region homologous to FKBP59, a member of the FK506-binding protein family that is also a component of the receptor complex. We report the isolation and sequencing of the entire human cyclophilin 40 (hCyP40) gene (human gene symbol PPID). The gene contains 10 exons (43 to 698 bp) and 9 introns encompassing 14.2 kb. The exon organization of the cyclophilin-like region is not similar to that of the human cyclophilin A gene (PPIA), suggesting their early divergence in evolution. Determination of the sequence of the 5' end of the hCyP40 mRNA by an "anchor-ligation PCR" procedure showed that transcription is initiated from a cluster of sites about 80 bp upstream from the first in-frame ATG. The immediate 5'-flanking region of the gene lacks typical TATA and CAAT boxes, but is GC-rich and contains Sp1 sites, features characteristic of promoters associated with housekeeping genes. The hCyP40 gene was mapped to chromosome 4 by PCR with genomic DNA from somatic cell hybrids. As shown by "Zoo blot" analysis, the cyclophilin 40 gene appears to be highly conserved throughout evolution. 47 refs., 4 figs., 1 tab.

  4. Sequence determination and analysis of the NSs genes of two tospoviruses.

    PubMed

    Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

    2012-03-01

    The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade. PMID:22187101
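
    The pairwise identity figures quoted above come from comparing aligned sequences column by column. A minimal sketch of that calculation follows, assuming the two sequences are already aligned and that columns containing a gap are excluded from the denominator (the abstract does not say which convention the authors used). The function name and the toy peptides are hypothetical and are not GRSV or ZLCV data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns, skipping any column with a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned peptides (hypothetical illustration only)
print(round(percent_identity("MSSGVYES-IIQ", "MSTGVYEAKIIQ"), 1))
```

    Other conventions (for example dividing by the full alignment length or by the shorter sequence) give slightly different percentages, so reported identities are only comparable when the denominator is stated.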

  5. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  6. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple samples types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  7. Sequence characterisation of deletion breakpoints in the dystrophin gene by PCR

    SciTech Connect

    Abbs, S.; Sandhu, S.; Bobrow, M.

    1994-09-01

    Partial deletions of the dystrophin gene account for 65% of cases of Duchenne muscular dystrophy. A high proportion of these structural changes are generated by new mutational events, and lie predominantly within two "hotspot" regions, yet the underlying reasons for this are not known. We are characterizing and sequencing the regions surrounding deletion breakpoints in order to: (i) investigate the mechanisms of deletion mutation, and (ii) enable the design of PCR assays to specifically amplify mutant and normal sequences, allowing us to search for the presence of somatic mosaicism in appropriate family members. Using this approach we have been able to demonstrate the presence of somatic mosaicism in a maternal grandfather of a DMD-affected male, deleted for exons 49-50. Three deletions, namely of exons 48-49, 49-50, and 50, have been characterized using a PCR approach that avoids any cloning procedures. Breakpoints were initially localized to within regions of a few kilobases using Southern blot restriction analyses with exon-specific probes and PCR amplification of exonic and intronic loci. Sequencing was performed directly on PCR products: (i) mutant sequences were obtained from long-range or inverse-PCR across the deletion junction fragments, and (ii) normal sequences were obtained from the products of standard PCR, vectorette PCR, or inverse-PCR performed on YACs. Further characterization of intronic sequences will allow us to amplify and sequence across other deletion breakpoints and increase our knowledge of the mechanisms of mutation in the dystrophin gene.

  8. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    SciTech Connect

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.; Jones, W.A.; Kirby, R.; Woods, D.R.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with [α-³²P]dCTP (3000 Ci/mmol) or [α-³⁵S]dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.

  9. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR.

    PubMed

    D'Souza, T M; Boominathan, K; Reddy, C A

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. PMID:8837429
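
    A degenerate primer is shorthand for a pool of oligonucleotides: each IUPAC ambiguity code stands for several bases, and the pool contains every combination. The sketch below expands such a primer and counts its degeneracy. The example primer is hypothetical (chosen only to show the Y = C/T code) and is not one of the primers used in the study, which the abstract does not list.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand_degenerate(primer: str) -> list[str]:
    """List every concrete sequence in the degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with two Y (C/T) positions
example = "CAYTGGCAYGG"
print(degeneracy(example))
print(expand_degenerate(example))
```

    Because the pool size is the product of the per-position choices, it grows quickly with each ambiguous position, which is one reason primers are usually designed over the most conserved residues.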

  10. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR.

    PubMed Central

    D'Souza, T M; Boominathan, K; Reddy, C A

    1996-01-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. PMID:8837429

  11. Sequencing, genomic organization, and preliminary promoter analysis of a black cherry (R)-(+)-mandelonitrile lyase gene.

    PubMed

    Hu, Z; Poulton, J E

    1997-12-01

    The flavoprotein (R)-(+)-mandelonitrile lyase (MDL; EC 4.1.2.10) plays a key role in cyanogenesis in rosaceous stone fruits. An MDL gene (mdl3) and its corresponding cDNA (MDL3) were isolated from black cherry (Prunus serotina) and characterized. The mdl3 gene contains 2292 bp of the 5' flanking region, the entire coding region, and 300 bp of the 3' flanking region. The coding region is interrupted by three short introns, of which one possesses the usual GC-AG splice junction dinucleotides. This gene encodes a polypeptide of 573 amino acids that includes a putative signal sequence, 13 potential N-glycosylation sites, and a presumptive flavin adenine dinucleotide-binding site. To determine whether the 5' flanking region of the mdl3 gene is capable of driving MDL expression, it was fused to the beta-glucuronidase reporter gene for Agrobacterium-mediated transformation into tobacco. Matching endogenous MDL expression patterns, beta-glucuronidase staining was observed in maturing embryos and seeds; it also occurred in postembryonic tissues, especially in association with vascular tissues. After developing a homologous transient transformation system to facilitate identification of putative regulatory sequences, we demonstrated that 125 bp (-107 to +18) of the 5' flanking sequence of the mdl3 gene is sufficient for MDL expression in protoplasts derived from immature black cherry embryos. PMID:9414550

  12. Sequence evolution and expression regulation of stress-responsive genes in natural populations of wild tomato.

    PubMed

    Fischer, Iris; Steige, Kim A; Stephan, Wolfgang; Mboup, Mamadou

    2013-01-01

    The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.

  13. Refined mapping of X-linked reticulate pigmentary disorder and sequencing of candidate genes

    PubMed Central

    2009-01-01

    X-linked reticulate pigmentary disorder with systemic manifestations in males (PDR) is very rare. Affected males are characterized by cutaneous and visceral symptoms suggestive of abnormally regulated inflammation. A genetic linkage study of a large Canadian kindred previously mapped the PDR gene to a greater than 40 Mb interval of Xp22–p21. The aim of this study was to identify the causative gene for PDR. The Canadian pedigree was expanded and additional PDR families recruited. Genetic linkage was performed using newer microsatellite markers. Positional and functional candidate genes were screened by PCR and sequencing of coding exons in affected males. The location of the PDR gene was narrowed to a ~4.9 Mb interval of Xp22.11–p21.3 between markers DXS1052 and DXS1061. All annotated coding exons within this interval were sequenced in one affected male from each of the three multiplex families as well as one singleton, but no causative mutation was identified. Sequencing of other X-linked genes outside of the linked interval also failed to identify the cause of PDR but revealed a novel nonsynonymous cSNP in the GRPR gene in the Maltese population. PDR is most likely due to a mutation within the linked interval not affecting currently annotated coding exons. PMID:18404279

  14. Sequencing and Transcriptional Analysis of the Biosynthesis Gene Cluster of Abscisic Acid-Producing Botrytis cinerea

    PubMed Central

    Gong, Tao; Shu, Dan; Yang, Jie; Ding, Zhong-Tao; Tan, Hong

    2014-01-01

    Botrytis cinerea is a model species with great importance as a pathogen of plants and is now used for biotechnological production of ABA. The ABA cluster of B. cinerea is composed of an open reading frame without significant similarities (bcaba3), followed by the genes (bcaba1 and bcaba2) encoding P450 monooxygenases and a gene probably coding for a short-chain dehydrogenase/reductase (bcaba4). In B. cinerea ATCC58025, targeted inactivation of the genes in the cluster suggested at least three genes responsible for the hydroxylation at carbon atom C-1' and C-4' or oxidation at C-4' of ABA. Our group has identified an ABA-overproducing strain, B. cinerea TB-3-H8. To differentiate TB-3-H8 from other B. cinerea strains with the functional ABA cluster, the DNA sequence of the 12.11-kb region containing the cluster of B. cinerea TB-3-H8 was determined. Full-length cDNAs were also isolated for bcaba1, bcaba2, bcaba3 and bcaba4 from B. cinerea TB-3-H8. Sequence comparison of the four genes and their flanking regions respectively derived from B. cinerea TB-3-H8, B05.10 and T4 revealed that major variations were located in intergenic sequences. In B. cinerea TB-3-H8, the expression profiles of the four functional genes under ABA high-yield conditions were also analyzed by real-time PCR. PMID:25268614

  15. Whole-genome sequencing and identification of Morganella morganii KT pathogenicity-related genes

    PubMed Central

    2012-01-01

    Background The opportunistic enterobacterium, Morganella morganii, which can cause bacteraemia, is the ninth most prevalent cause of clinical infections in patients at Changhua Christian Hospital, Taiwan. The KT strain of M. morganii was isolated during postoperative care of a cancer patient with a gallbladder stone who developed sepsis caused by bacteraemia. M. morganii is sometimes encountered in nosocomial settings and has been causally linked to catheter-associated bacteriuria, complex infections of the urinary and/or hepatobiliary tracts, wound infection, and septicaemia. M. morganii infection is associated with a high mortality rate, although most patients respond well to appropriate antibiotic therapy. To obtain insights into the genome biology of M. morganii and the mechanisms underlying its pathogenicity, we used Illumina technology to sequence the genome of the KT strain and compared its sequence with the genome sequences of related bacteria. Results The 3,826,919-bp sequence contained in 58 contigs has a GC content of 51.15% and includes 3,565 protein-coding sequences, 72 tRNA genes, and 10 rRNA genes. The pathogenicity-related genes encode determinants of drug resistance, fimbrial adhesins, an IgA protease, haemolysins, ureases, and insecticidal and apoptotic toxins as well as proteins found in flagellae, the iron acquisition system, a type-3 secretion system (T3SS), and several two-component systems. Comparison with 14 genome sequences from other members of Enterobacteriaceae revealed different degrees of similarity to several systems found in M. morganii. The most striking similarities were found in the IS4 family of transposases, insecticidal toxins, T3SS components, and proteins required for ethanolamine use (eut operon) and cobalamin (vitamin B12) biosynthesis. The eut operon and the gene cluster for cobalamin biosynthesis are not present in the other Proteeae genomes analysed. 
Moreover, organisation of the 19 genes of the eut operon differs from

  16. Seminal-type ribonuclease genes in ruminants, sequence conservation without protein expression?

    PubMed

    Kleineidam, R G; Jekel, P A; Beintema, J J; Situmorang, P

    1999-04-29

    Bovine seminal ribonuclease (BS-RNase) is an interesting enzyme both for functional and structural reasons. The enzyme is the product of a gene duplication that occurred in an ancestral ruminant. It is possible to demonstrate the presence of seminal-type genes in all other investigated ruminant species, but they are not expressed and show features of pseudogenes. In this paper we report the determination of two pancreatic and one seminal-type ribonuclease gene sequences of swamp-type water buffalo (Bubalus bubalis). The two pancreatic sequences encode proteins with identical amino acid sequences as previously determined for the enzymes isolated from swamp-type and river-type water buffalo, respectively. The seminal-type sequence has no pseudogene features and codes for an enzyme with no unusual features compared with the active bovine enzyme, except for the replacement of one of the cysteines which takes part in the two intersubunit disulfide bridges. However, Western blotting demonstrates the presence of only small amounts of the pancreatic enzymes in water buffalo semen, suggesting that also in this species the seminal-type sequence is not expressed. But it is still possible that the gene is expressed somewhere else in the body or during development. Reconstruction of seminal-type ribonuclease sequences in ancestors of Bovinae and Bovidae indicates no serious abnormalities in the encoded proteins and leads us to the hypothesis that the ruminant seminal-type ribonuclease gene has not come to expression during most of its evolutionary history, but did not exhibit a high evolutionary rate that is generally observed in pseudogenes.

  17. HBOC multi-gene panel testing: comparison of two sequencing centers.

    PubMed

    Schroeder, Christopher; Faust, Ulrike; Sturm, Marc; Hackmann, Karl; Grundmann, Kathrin; Harmuth, Florian; Bosse, Kristin; Kehrer, Martin; Benkert, Tanja; Klink, Barbara; Mackenroth, Luisa; Betcheva-Krajcir, Elitza; Wimberger, Pauline; Kast, Karin; Heilig, Mechthilde; Nguyen, Huu Phuc; Riess, Olaf; Schröck, Evelin; Bauer, Peter; Rump, Andreas

    2015-07-01

    Multi-gene panels are used to identify genetic causes of hereditary breast and ovarian cancer (HBOC) in large patient cohorts. This study compares the diagnostic workflow in two centers and gives valuable insights into different next-generation sequencing (NGS) strategies. Moreover, we present data from 620 patients sequenced at both centers. Both sequencing centers are part of the German consortium for hereditary breast and ovarian cancer (GC-HBOC). All 620 patients included in this study were selected following standard BRCA1/2 testing guidelines. A set of 10 sequenced genes was analyzed per patient. Twelve samples were exchanged and sequenced at both centers. NGS results were highly concordant in 12 exchanged samples (205/206 variants = 99.51 %). One non-pathogenic variant was missed at center B due to a sequencing gap (no technical coverage). The custom enrichment at center B was optimized during this study; for example, the average number of missing bases was reduced by a factor of four (vers. 1: 1939.41, vers. 4: 506.01 bp). There were no sequencing gaps at center A, but four CCDS exons were not included in the enrichment. Pathogenic mutations were found in 12.10 % (75/620) of all patients: 4.84 % (30/620) in BRCA1, 4.35 % in BRCA2 (27/620), 0.97 % in CHEK2 (6/620), 0.65 % in ATM (4/620), 0.48 % in CDH1 (3/620), 0.32 % in PALB2 (2/620), 0.32 % in NBN (2/620), and 0.16 % in TP53 (1/620). NGS diagnostics for HBOC-related genes is robust, cost effective, and the method of choice for genetic testing in large cohorts. Adding 8 genes to standard BRCA1- and BRCA2-testing increased the mutation detection rate by one-third. PMID:26022348
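
    The percentages in this abstract can be re-derived from the counts it reports. The short check below recomputes the per-gene detection rates, the variant concordance, and the relative gain from the non-BRCA genes. All counts are taken from the abstract; the variable names are ours, and the "one-third" reading (extra carriers divided by BRCA1/2 carriers) is our interpretation of the final sentence.

```python
# Pathogenic-mutation carriers per gene, as reported in the abstract
counts = {
    "BRCA1": 30, "BRCA2": 27, "CHEK2": 6, "ATM": 4,
    "CDH1": 3, "PALB2": 2, "NBN": 2, "TP53": 1,
}
N_PATIENTS = 620

total = sum(counts.values())
rates = {gene: 100 * n / N_PATIENTS for gene, n in counts.items()}

for gene, rate in rates.items():
    print(f"{gene}: {rate:.2f} %")
print(f"all genes: {100 * total / N_PATIENTS:.2f} % ({total}/{N_PATIENTS})")

# Concordance of variant calls in the exchanged samples
concordance = 100 * 205 / 206
print(f"concordance: {concordance:.2f} %")

# Relative gain from genes other than BRCA1/2 (assumed reading of "one-third")
brca = counts["BRCA1"] + counts["BRCA2"]
extra = total - brca
relative_gain = extra / brca
print(f"extra carriers: {extra}, relative gain: {relative_gain:.1%}")
```

    Under that reading, 18 additional carriers on top of 57 BRCA1/2 carriers is a gain of a little under one-third, consistent with the abstract's summary.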

  18. B5r gene based sequence analysis of Indian buffalopox virus isolates in relation to other orthopoxviruses.

    PubMed

    Singh, R K; Balamurugan, V; Hosamani, M; DE, U K; Chandra, B M; Krishnappa M P, G

    2007-01-01

    We determined the complete nucleotide sequence of the B5R gene homologue of Vaccinia virus (VACV) in five Buffalopox virus (BPXV) isolates of Indian origin. The obtained sequences were compared with one another and with the corresponding sequences of the other orthopoxviruses. Sequence analysis revealed 99.7–99.8% and 99.4–99.7% identities among the BPXV isolates for the B5R gene at the nucleotide and amino acid levels, respectively. Sequence identities of the B5R gene between BPXV and VACV isolates (98.1–99.7%) or other orthopoxviruses (95.6–99.2%) showed the highly conserved nature of this protein and a closer relationship of BPXV isolates to VACV than to other orthopoxviruses.

  19. Sequence heterogeneity, multiplicity, and genomic organization of α- and β-tubulin genes in Sea Urchins

    SciTech Connect

    Alexandraki, D.; Ruderman, J.V.

    1981-12-01

    The authors analyzed the multiplicity, heterogeneity, and organization of the genes encoding the α and β tubulins in the sea urchin Lytechinus pictus by using cloned complementary deoxyribonucleic acid (cDNA) and genomic tubulin sequences. cDNA clones were constructed by using immature spermatogenic testis polyadenylic acid-containing ribonucleic acid as a template. α- and β-tubulin clones were identified by hybrid selection and in vitro translation of the corresponding messenger ribonucleic acids, followed by immunoprecipitation and two-dimensional gel electrophoresis of the translation products. The α cDNA clone contains a sequence that encodes the 48 C-terminal amino acids of α tubulin and 104 base pairs of the 3' nontranslated portion of the messenger ribonucleic acid. The β cDNA insertion contains the coding sequence for the 100 C-terminal amino acids of β tubulin and 83 base pairs of the 3' noncoding sequence. Hybrid selections performed at different criteria demonstrated the presence of several heterogeneous, closely related tubulin messenger ribonucleic acids, suggesting the existence of heterogeneous α- and β-tubulin genes. Hybridization analyses indicated that there are at least 9 to 13 sequences for each of the two tubulin gene families per haploid genome. Hybridization of the cDNA probes to both total genomic DNA and cloned germline DNA fragments gave no evidence for close physical linkage of α-tubulin genes with β-tubulin genes at the DNA level. In contrast, these experiments indicated that some genes within the same family are clustered.

  20. Clinical Evaluation of a Multiple-Gene Sequencing Panel for Hereditary Cancer Risk Assessment

    PubMed Central

    Kurian, Allison W.; Hare, Emily E.; Mills, Meredith A.; Kingham, Kerry E.; McPherson, Lisa; Whittemore, Alice S.; McGuire, Valerie; Ladabaum, Uri; Kobayashi, Yuya; Lincoln, Stephen E.; Cargill, Michele; Ford, James M.

    2014-01-01

    Purpose Multiple-gene sequencing is entering practice, but its clinical value is unknown. We evaluated the performance of a customized germline-DNA sequencing panel for cancer-risk assessment in a representative clinical sample. Methods Patients referred for clinical BRCA1/2 testing from 2002 to 2012 were invited to donate a research blood sample. Samples were frozen at −80° C, and DNA was extracted from them after 1 to 10 years. The entire coding region, exon-intron boundaries, and all known pathogenic variants in other regions were sequenced for 42 genes that had cancer risk associations. Potentially actionable results were disclosed to participants. Results In total, 198 women participated in the study: 174 had breast cancer and 57 carried germline BRCA1/2 mutations. BRCA1/2 analysis was fully concordant with prior testing. Sixteen pathogenic variants were identified in ATM, BLM, CDH1, CDKN2A, MUTYH, MLH1, NBN, PRSS1, and SLX4 among 141 women without BRCA1/2 mutations. Fourteen participants carried 15 pathogenic variants, warranting a possible change in care; they were invited for targeted screening recommendations, enabling early detection and removal of a tubular adenoma by colonoscopy. Participants carried an average of 2.1 variants of uncertain significance among 42 genes. Conclusion Among women testing negative for BRCA1/2 mutations, multiple-gene sequencing identified 16 potentially pathogenic mutations in other genes (11.4%; 95% CI, 7.0% to 17.7%), of which 15 (10.6%; 95% CI, 6.5% to 16.9%) prompted consideration of a change in care, enabling early detection of a precancerous colon polyp. Additional studies are required to quantify the penetrance of identified mutations and determine clinical utility. However, these results suggest that multiple-gene sequencing may benefit appropriately selected patients. PMID:24733792
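
    The conclusion reports proportions with 95% confidence intervals (for example 16 of 141 women). The abstract does not say which interval method was used, so the sketch below uses the Wilson score interval purely as an illustration of how such a range can be obtained from a count and a sample size. Its output is close to, but not guaranteed to match, the published values, and the function name is ours.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts from the abstract: 16 and 15 of 141 BRCA1/2-negative women
for k in (16, 15):
    low, high = wilson_interval(k, 141)
    print(f"{k}/141: {100 * low:.1f}% to {100 * high:.1f}%")
```

    Exact (Clopper-Pearson) intervals are a common alternative and tend to be slightly wider for counts this small.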

  1. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    PubMed

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. The homology observed between unigenes 0048482, 0061770 and the Crr1 gene shared 94% nucleotide similarity. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence.

  2. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    PubMed

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. The homology observed between unigenes 0048482, 0061770 and the Crr1 gene shared 94% nucleotide similarity. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence. PMID:27525940
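
    The abstract mentions an N50 for the unigene set without giving its value. For readers unfamiliar with the statistic, the sketch below shows the usual definition: the length L such that sequences of length L or longer together contain at least half of all assembled bases. The lengths in the example are made up and have no connection to the B. napus assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty positive lengths")


# Hypothetical unigene lengths in base pairs
print(n50([200, 300, 400, 500, 600]))
```

    N50 is weighted toward long sequences, so it is usually reported alongside the total number of sequences and the total assembled length.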

  3. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples.

    PubMed

    Scolnick, Jonathan A; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C; Amorese, Douglas A

    2015-01-01

    Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells. PMID:26132974

  4. Nucleotide sequence of the Syrian hamster intracisternal A-particle gene: close evolutionary relationship of type A particle gene to types B and D oncovirus genes.

    PubMed

    Ono, M; Toh, H; Miyata, T; Awaya, T

    1985-08-01

    We determined the complete nucleotide sequence of the intracisternal A-particle gene, IAP-H18, cloned from the normal Syrian hamster liver DNA. IAP-H18 was 7,951 base pairs in length with two identical long terminal repeats of 376 base pairs at both ends. On the coding strand, imperfect open reading frames corresponding to gag and pol of the retrovirus genome were observed, whereas many stop codons were present in the region corresponding to env. The putative H18 gag gene (809 amino acids) had a sequence homologous to the N-terminal half of the mouse mammary tumor virus gag gene and locally to the Rous sarcoma virus gag gene. The putative H18 pol gene (900 residues) was homologous to the Rous sarcoma virus pol gene almost throughout the entire region. Two conserved regions among the retrovirus pol genes have been reported. One presumably corresponds to the DNA polymerase and the RNase H domain, and the other corresponds to the DNA endonuclease domain of the multifunctional protein pol. By the comparison of the deduced amino acid sequences of the putative endonuclease domain of six representative oncovirus genomes, a phylogenetic tree of the oncovirus genomes was constructed, and the intracisternal A-particle (type A) genome was found to be more closely related to the mouse mammary tumor virus (type B) and squirrel monkey retrovirus (type D) genomes.

  5. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  6. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes.

    PubMed

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-12-10

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this research, 137,229 contigs were obtained ranging from 200 to 19,214 bp with an N50 of 2331 bp through de novo assembly of leaves, stems and roots in O. officinalis using an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant-pathogen interaction and plant hormone regulation pathways involved in disease-resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistant genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed the highest expression level in stem, whereas one of the Xa26 genes was expressed dominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease-resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future.
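
    The N50 reported in this abstract (2331 bp) is an assembly statistic: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled length. The Python sketch below illustrates that definition only; the function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    # Sort from longest to shortest contig.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the abstract.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

    Applied to the real assembly, the input would be the list of all 137,229 contig lengths, which the abstract does not provide.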

  7. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides.

  8. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes

    PubMed Central

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-01-01

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this research, 137,229 contigs were obtained ranging from 200 to 19,214 bp with an N50 of 2331 bp through de novo assembly of leaves, stems and roots in O. officinalis using an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant–pathogen interaction and plant hormone regulation pathways involved in disease-resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistant genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed the highest expression level in stem, whereas one of the Xa26 genes was expressed dominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease-resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future. PMID:26690414

  9. Origin of a novel protein-coding gene family with similar signal sequence in Schistosoma japonicum

    PubMed Central

    2012-01-01

    Background: Evolution of novel protein-coding genes is the bedrock of adaptive evolution. Recently, we identified six protein-coding genes with similar signal sequence from Schistosoma japonicum egg stage mRNA using signal sequence trap (SST). To find the mechanism underlying the origination of these genes with similar core promoter regions and signal sequence, we adopted an integrated approach utilizing whole genome, transcriptome and proteome database BLAST queries, other bioinformatics tools, and molecular analyses. Results: Our data, in combination with database analyses, showed evidence of expression of these genes both at the mRNA and protein levels exclusively in all developmental stages of S. japonicum. The signal sequence motif was identified in 27 distinct S. japonicum UniGene entries with multiple mRNA transcripts, and in 34 genome contigs distributed within 18 scaffolds with evidence of genome-wide dispersion. No homolog of these genes or similar domain was found in deposited data from any other organism. We observed a preponderance of flanking repetitive elements (REs), albeit partial copies, especially of the RTE-like and Perere class at either side of the duplication source locus. The role of REs as major mediators of DNA-level recombination leading to dispersive duplication is discussed with evidence from our analyses. We also identified a stepwise pathway towards functional selection in evolving genes by alternative splicing. Equally, the possible transcription models of some protein-coding representatives of the duplicons are presented with evidence of expression in vitro. Conclusion: Our findings contribute to the accumulating evidence of the role of REs in the generation of evolutionary novelties in organisms’ genomes. PMID:22716200

  10. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification.

    PubMed

    Li, Cong-Jun; Li, Robert W; Baldwin, Ransom L; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  11. XY female with a dysgerminoma and no mutation in the coding sequence of the SRY gene.

    PubMed

    Morerio, Cristina; Calvari, Vladimiro; Rosanda, Cristina; Porta, Simona; Gambini, Claudio; Panarello, Claudio

    2002-07-01

    We report a 46,XY 11-year-old girl with pure gonadal dysgenesis who developed a dysgerminoma. The testis-determining gene SRY, a candidate for sex reversal, whose alterations seem to correlate with dysgerminoma, was analyzed and found to be normal; its coding sequence was negative for deletions and mutations. DMRT-1 gene mapping on 9p and DAX-1 on Xp21 were also normal. These results suggest the involvement of other genes in sex reversal and call into question the putative relationship between SRY alterations and dysgerminoma.

  12. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification

    PubMed Central

    Li, Cong-Jun; Li, Robert W.; Baldwin, Ransom L.; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  13. What is, mutatis mutandis, the sequence of plasmid DNAs used in gene therapy?

    PubMed

    Ratel, David; Wion, Didier

    2003-05-01

    Mutation is a fundamental biological process occurring in each living organism. Plasmid DNA which is used in gene therapy protocols or DNA vaccination passes through two different living cells which are, respectively, the producing cell (bacterial) and the target cell (eukaryotic). Hence, modifications in the nucleotide sequence of plasmids are likely to occur both in bacteria during the amplification step of plasmid DNA and in eukaryotic cells following gene transfer. In addition to these biological modifications resulting from the physical passage of the plasmid into two different living organisms, an additional source of sequence alteration resides in our mode of representation of the nucleotide sequence of plasmid DNA, which uses a four-letter code, whereas bacterial DNA is made of six different nucleosides. Indeed, the therapeutic DNA paradigm seems to have neglected the qualitative importance of these DNA sequence alterations. In this review we discuss the importance and the role of these DNA sequence modifications in the context of non-viral gene therapy approaches. PMID:12710908

  14. A multi gene sequence-based phylogeny of the Musaceae (banana) family

    PubMed Central

    2011-01-01

    Background: The classification of the Musaceae (banana) family species and their phylogenetic inter-relationships remain controversial, in part due to limited nucleotide information to complement the morphological and physiological characters. In this work the evolutionary relationships within the Musaceae family were studied using 13 species and DNA sequences obtained from a set of 19 unlinked nuclear genes. Results: The 19 gene sequences represented a sample of ~16 kb of genome sequence (~73% intronic). The sequence data were also used to obtain estimates for the divergence times of the Musaceae genera and Musa sections. Nucleotide variation within the sample confirmed the close relationship of Australimusa and Callimusa sections and showed that Eumusa and Rhodochlamys sections are not reciprocally monophyletic, which supports the previous claims for the merger between the two latter sections. Divergence time analysis supported the previous dating of the Musaceae crown age to the Cretaceous/Tertiary boundary (~69 Mya), and the evolution of Musa to ~50 Mya. The first estimates for the divergence times of the four Musa sections were also obtained. Conclusions: The gene sequence-based phylogeny presented here provides a substantial insight into the course of speciation within the Musaceae. An understanding of the main phylogenetic relationships between banana species will help to fine-tune the taxonomy of Musaceae. PMID:21496296

  15. Mitochondrial DNA sequence and gene organization in the [corrected] Australian blacklip [corrected] abalone Haliotis rubra (leach).

    PubMed

    Maynard, Ben T; Kerr, Lyndal J; McKiernan, Joanne M; Jansen, Eliza S; Hanna, Peter J

    2005-01-01

    The complete mitochondrial DNA of the blacklip abalone Haliotis rubra (Gastropoda: Mollusca) was cloned and 16,907 base pairs were sequenced. The sequence represents an estimated 99.85% of the mitochondrial genome, and contains 2 ribosomal RNA, 22 transfer RNA, and 13 protein-coding genes found in other metazoan mtDNA. An AT tandem repeat and a possible C-rich domain within the putative control region could not be fully sequenced. The H. rubra mtDNA gene order is novel for mollusks, separated from the black chiton Katharina tunicata by the individual translocations of 3 tRNAs. Compared with other mtDNA regions, sequences from the ATP8, NAD2, NAD4L, NAD6, and 12S rRNA genes, as well as the control region, are the most variable among representatives from Mollusca, Arthropoda, and Rhynchonelliformea, with similar mtDNA arrangements to H. rubra. These sequences are being evaluated as genetic markers within commercially important Haliotis species, and some applications and considerations for their use are discussed. PMID:16206015

  16. Genome-wide discovery of cis-elements in promoter sequences using gene expression.

    PubMed

    Troukhan, Maxim; Tatarinova, Tatiana; Bouck, John; Flavell, Richard B; Alexandrov, Nickolai N

    2009-04-01

    The availability of complete or nearly complete genome sequences, a large number of 5' expressed sequence tags, and significant public expression data allow for a more accurate identification of cis-elements regulating gene expression. We have implemented a global approach that takes advantage of available expression data, genomic sequences, and transcript information to predict cis-elements associated with specific expression patterns. The key components of our approach are: (1) precise identification of transcription start sites, (2) specific locations of cis-elements relative to the transcription start site, and (3) assessment of statistical significance for all sequence motifs. By applying our method to promoters of Arabidopsis thaliana and Mus musculus, we have identified motifs that affect gene expression under specific environmental conditions or in certain tissues. We also found that the presence of the TATA box is associated with increased variability of gene expression. Strong correlation between our results and experimentally determined motifs shows that the method is capable of predicting new functionally important cis-elements in promoter sequences. PMID:19231992

  17. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  18. EST sequencing and gene expression profiling of cultivated peanut (Arachis hypogaea L.).

    PubMed

    Bi, Yu-Ping; Liu, Wei; Xia, Han; Su, Lei; Zhao, Chuan-Zhi; Wan, Shu-Bo; Wang, Xing-Jun

    2010-10-01

    Peanut (Arachis hypogaea L.) is one of the most important oil crops in the world. However, biotechnology-based improvement of peanut is far behind that of many other crops. It is critical and urgent to establish the biotechnological platform for peanut germplasm innovation. In this study, a peanut seed cDNA library was constructed to establish the biotechnological platform for peanut germplasm innovation. About 17,000 expressed sequence tags (ESTs) were sequenced and used for further investigation. Of these, 12.5% were annotated as metabolism-related and 4.6% encoded transcription or post-transcription factors. ESTs encoding storage protein and enzymes related to protein degradation accounted for 28.8% and formed the largest group of the annotated ESTs. ESTs that encoded stress responsive proteins and pathogen-related proteins accounted for 5.6%. ESTs that encoded unknown proteins or showed no hit in the GenBank nr database accounted for 20.1% and 13.9%, respectively. A total of 5066 EST sequences were selected to make a cDNA microarray. Expression analysis revealed that these sequences showed diverse expression patterns in peanut seeds, leaves, stems, roots, flowers, and gynophores. We also analyzed the gene expression pattern during seed development. Genes that were upregulated (≥twofold) at 15, 25, 35, and 45 days after pegging (DAP) were found and compared with 70 DAP. The potential value of these genes and their promoters in peanut genetic engineering studies is discussed.

  19. DNA sequences that activate isocitrate lyase gene expression during late embryogenesis and during postgerminative growth.

    PubMed Central

    Zhang, J Z; Santes, C M; Engel, M L; Gasser, C S; Harada, J J

    1996-01-01

    We analyzed DNA sequences that regulate the expression of an isocitrate lyase gene from Brassica napus L. during late embryogenesis and during postgerminative growth to determine whether glyoxysomal function is induced by a common mechanism at different developmental stages. beta-Glucuronidase constructs were used both in transient expression assays in B. napus and in transgenic Arabidopsis thaliana to identify the segments of the isocitrate lyase 5' flanking region that influence promoter activity. DNA sequences that play the principal role in activating the promoter during post-germinative growth are located more than 1,200 bp upstream of the gene. Distinct DNA sequences that were sufficient for high-level expression during late embryogenesis but only low-level expression during postgerminative growth were also identified. Other parts of the 5' flanking region increased promoter activity both in developing seed and in seedlings. We conclude that a combination of elements is involved in regulating the isocitrate lyase gene and that distinct DNA sequences play primary roles in activating the gene in embryos and in seedlings. These findings suggest that different signals contribute to the induction of glyoxysomal function during these two developmental stages. We also showed that some of the constructs were expressed differently in transient expression assays and in transgenic plants. PMID:8934622

  20. Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.

    PubMed

    Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad

    2016-09-01

    Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. PMID:27307293
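
    The core idea in this abstract, implicating a gene when several independently isolated mutants each carry a different allele of it, can be sketched in a few lines of Python. This is only an illustration under stated assumptions and not the authors' pipeline: the input layout, the function name `genes_with_multiple_alleles`, the threshold `min_alleles`, and the gene and mutant names are all hypothetical.

```python
from collections import defaultdict


def genes_with_multiple_alleles(mutant_variants, min_alleles=2):
    """Return {gene: number of distinct alleles} for candidate genes.

    mutant_variants maps a mutant name to a list of
    (gene, position, change) tuples called in that mutant.
    A gene is kept only if at least `min_alleles` distinct alleles
    were seen and they come from at least `min_alleles` mutants.
    """
    alleles = defaultdict(set)   # gene -> set of (position, change)
    carriers = defaultdict(set)  # gene -> set of mutant names
    for mutant, variants in mutant_variants.items():
        for gene, position, change in variants:
            alleles[gene].add((position, change))
            carriers[gene].add(mutant)
    return {
        gene: len(found)
        for gene, found in alleles.items()
        if len(found) >= min_alleles and len(carriers[gene]) >= min_alleles
    }


# Hypothetical screen: geneA is hit by two different alleles in two mutants,
# geneB by the same allele twice, geneC only once.
screen = {
    "mutant1": [("geneA", 120, "G>A"), ("geneB", 50, "C>T")],
    "mutant2": [("geneA", 340, "C>T")],
    "mutant3": [("geneB", 50, "C>T"), ("geneC", 9, "G>A")],
}
candidates = genes_with_multiple_alleles(screen)
print(candidates)
```

    Counting distinct alleles rather than mutants keeps a single mutation shared by sibling clones from being mistaken for independent hits; a real analysis would also need variant calling and filtering steps that the abstract does not describe.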

  1. Identification of recurrent NAB2-STAT6 gene fusions in solitary fibrous tumor by integrative sequencing.

    PubMed

    Robinson, Dan R; Wu, Yi-Mi; Kalyana-Sundaram, Shanker; Cao, Xuhong; Lonigro, Robert J; Sung, Yun-Shao; Chen, Chun-Liang; Zhang, Lei; Wang, Rui; Su, Fengyun; Iyer, Matthew K; Roychowdhury, Sameek; Siddiqui, Javed; Pienta, Kenneth J; Kunju, Lakshmi P; Talpaz, Moshe; Mosquera, Juan Miguel; Singer, Samuel; Schuetze, Scott M; Antonescu, Cristina R; Chinnaiyan, Arul M

    2013-02-01

    A 44-year-old woman with recurrent solitary fibrous tumor (SFT)/hemangiopericytoma was enrolled in a clinical sequencing program including whole-exome and transcriptome sequencing. A gene fusion of the transcriptional repressor NAB2 with the transcriptional activator STAT6 was detected. Transcriptome sequencing of 27 additional SFTs identified the presence of a NAB2-STAT6 gene fusion in all tumors. Using RT-PCR and sequencing, we detected this fusion in all 51 SFTs, indicating high levels of recurrence. Expression of NAB2-STAT6 fusion proteins was confirmed in SFT, and the predicted fusion products harbor the early growth response (EGR)-binding domain of NAB2 fused to the activation domain of STAT6. Overexpression of the NAB2-STAT6 gene fusion induced proliferation in cultured cells and activated the expression of EGR-responsive genes. These studies establish NAB2-STAT6 as the defining driver mutation of SFT and provide an example of how neoplasia can be initiated by converting a transcriptional repressor of mitogenic pathways into a transcriptional activator.

  2. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  3. A sunflower helianthinin gene upstream sequence ensemble contains an enhancer and sites of nuclear protein interaction.

    PubMed Central

    Jordano, J; Almoguera, C; Thomas, T L

    1989-01-01

    Genes encoding helianthinin, the major seed protein in sunflower, are highly regulated. We have identified putative cis-acting and trans-acting elements that may function in the control of helianthinin expression. A 404-base pair DNA fragment on the sunflower helianthinin gene HaG3D, located 322 base pairs from the transcriptional start site, enhanced beta-glucuronidase expression in transgenic tobacco embryos. Sequences within this fragment were found to bind nuclear proteins present in both sunflower embryo and hypocotyl nuclear extracts. The binding site was localized by phenanthroline-copper ion footprinting experiments to A/T-rich sequences located from -705 to -654. Binding competition experiments revealed that these sunflower proteins also bind to upstream promoter sequences from another helianthinin gene (HaG3A) and two other plant embryo-specific genes, carrot DcG3 and French bean phaseolin. However, sequences of the cauliflower mosaic virus 35S promoter/enhancer complex failed to compete for its binding. Phenanthroline-copper ion footprinting experiments showed that the binding sites for the sunflower proteins in HaG3A (-1463 to -1514 and -702 to -653) and in phaseolin (-671 to -627) are also very A/T-rich, have similar sizes, and are located at similar distances from their respective promoters. PMID:2535527

  4. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 expressed sequence tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy and the Kyoto encyclopedia of genes and genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression under these stresses. Thus, the EST dataset is very important for discovering potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp.

  5. Next-generation sequencing of 28 ALS-related genes in a Japanese ALS cohort.

    PubMed

    Nakamura, Ryoichi; Sone, Jun; Atsuta, Naoki; Tohnai, Genki; Watanabe, Hazuki; Yokoi, Daichi; Nakatochi, Masahiro; Watanabe, Hirohisa; Ito, Mizuki; Senda, Jo; Katsuno, Masahisa; Tanaka, Fumiaki; Li, Yuanzhe; Izumi, Yuishin; Morita, Mitsuya; Taniguchi, Akira; Kano, Osamu; Oda, Masaya; Kuwabara, Satoshi; Abe, Koji; Aiba, Ikuko; Okamoto, Koichi; Mizoguchi, Kouichi; Hasegawa, Kazuko; Aoki, Masashi; Hattori, Nobutaka; Tsuji, Shoji; Nakashima, Kenji; Kaji, Ryuji; Sobue, Gen

    2016-03-01

    We investigated the frequency and contribution of variants of the 28 known amyotrophic lateral sclerosis (ALS)-related genes in Japanese ALS patients. We designed a multiplex, polymerase chain reaction-based primer panel to amplify the coding regions of the 28 ALS-related genes and sequenced DNA samples from 257 Japanese ALS patients using an Ion Torrent PGM sequencer. We also performed exome sequencing and identified variants of the 28 genes in an additional 251 ALS patients using an Illumina HiSeq 2000 platform. We identified the known ALS pathogenic variants and predicted the functional properties of novel nonsynonymous variants in silico. These variants were confirmed by Sanger sequencing. Known pathogenic variants were identified in 19 (48.7%) of the 39 familial ALS patients and 14 (3.0%) of the 469 sporadic ALS patients. Thirty-two sporadic ALS patients (6.8%) harbored 1 or 2 novel nonsynonymous variants of ALS-related genes that might be deleterious. This study reports the first extensive genetic screening of Japanese ALS patients. These findings are useful for developing genetic screening and counseling strategies for such patients.

  6. Identification of Legionella pneumophila serogroups and other Legionella species by mip gene sequencing.

    PubMed

    Haroon, Attiya; Koide, Michio; Higa, Futoshi; Tateyama, Masao; Fujita, Jiro

    2012-04-01

    The virulence factor known as the macrophage infectivity potentiator (mip) is responsible for the intracellular survival of Legionella species. In this study, we investigated the potential of the mip gene sequence to differentiate isolates of different species of Legionella and different serogroups of Legionella pneumophila. We used 35 clinical L. pneumophila isolates and one clinical isolate each of Legionella micdadei, Legionella longbeachae, and Legionella dumoffii (collected from hospitals all over Japan between 1980 and 2007). We used 19 environmental Legionella anisa isolates (collected in the Okinawa, Nara, Osaka, and Hyogo prefectures between 1987 and 2007) and two Legionella type strains. We extracted bacterial genomic DNA and amplified the mip gene by PCR. PCR products were purified by agarose gel electrophoresis and the mip gene was then sequenced. The L. pneumophila isolates could be divided into two groups: one group was very similar to the type strain and was composed of serogroup (SG) 1 isolates only; the second group had more sequence variations and was composed of SG1 isolates as well as SG2, SG3, SG5, and SG10 isolates. Phylogenetic analysis displayed one cluster for L. anisa isolates, while other Legionella species were present at discrete levels. Our findings show that mip gene sequencing is an effective technique for differentiating L. pneumophila strains from other Legionella species.

  7. Molecular cloning of extensive sequences of the in vitro synthesized chicken ovalbumin structural gene.

    PubMed Central

    Humphries, P; Cochet, M; Krust, A; Gerlinger, P; Kourilsky, P; Chambon, P

    1977-01-01

    Double-stranded DNA molecules complementary to chicken ovalbumin messenger RNA were synthesized in vitro and integrated into the E. coli plasmid pCR1 using an oligodG-dC tailing procedure. The resultant hybrid plasmids, amplified by transfection of E. coli, were shown by hybridization and gel electrophoresis to contain extensive DNA sequences of the ovalbumin structural gene. PMID:333389

  8. Prosthetic joint infection due to Lysobacter thermophilus diagnosed by 16S rRNA gene sequencing.

    PubMed

    Dhawan, B; Sebastian, S; Malhotra, R; Kapil, A; Gautam, D

    2016-01-01

    We report the first case of prosthetic joint infection caused by Lysobacter thermophilus which was identified by 16S rRNA gene sequencing. Removal of prosthesis followed by antibiotic treatment resulted in good clinical outcome. This case illustrates the use of molecular diagnostics to detect uncommon organisms in suspected prosthetic infections.

  9. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 Expressed Sequence Tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy, Kyoto encyclopedia of genes and genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression against these stresses. Thus, the EST dataset is very important for discovering potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp. PMID:23898551

  10. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes.

    PubMed

    Yu, Jia-Feng; Chen, Qing-Li; Ren, Jing; Yang, Yan-Ling; Wang, Ji-Hua; Sun, Xiao

    2015-07-01

    The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, but few studies have examined multi-copied protein coding genes that share 100% sequence identity. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes concentrate on the function of transposase and 86.28% of the COG-assigned multi-copied genes concentrate on the COG code 'L'. Furthermore, the impact of redundant protein coding sequences on gene prediction results is studied. The results show that the problem of protein coding sequence redundancy cannot be ignored, and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively related to the degree of sequence redundancy among the protein coding sequences in the training set.

  11. ZBTB20 is a sequence-specific transcriptional repressor of alpha-fetoprotein gene.

    PubMed

    Zhang, Hai; Cao, Dongmei; Zhou, Luting; Zhang, Ye; Guo, Xiaoqin; Li, Hui; Chen, Yuxia; Spear, Brett T; Wu, Jia-Wei; Xie, Zhifang; Zhang, Weiping J

    2015-07-15

    Alpha-fetoprotein (AFP) represents a classical model system to study developmental gene regulation in mammalian cells. We previously reported that liver ZBTB20 is developmentally regulated and plays a central role in AFP postnatal repression. Here we show that ZBTB20 is a sequence-specific transcriptional repressor of AFP. By ELISA-based DNA-protein binding assay and conventional gel shift assay, we successfully identified a ZBTB20-binding site at -104/-86 of the mouse AFP gene, flanked by two HNF1 sites and two C/EBP sites in the proximal promoter. Importantly, mutation of the core sequence in this site fully abolished its binding to ZBTB20 in vitro, as well as the repression of AFP promoter activity by ZBTB20. The unique ZBTB20 site was highly conserved in rat and human AFP genes, but absent in albumin genes. These findings help to explain the autonomous regulation of albumin and AFP genes in the liver after birth. Furthermore, we demonstrated that transcriptional repression of the AFP gene by ZBTB20 was liver-specific: ZBTB20 was dispensable for AFP silencing in tissues outside the liver. Our data define a cognate ZBTB20 site in the AFP promoter which mediates the postnatal repression of the AFP gene in the liver.

  12. Sequence analysis of the prion protein gene in Mongolian gazelles (Procapra gutturosa).

    PubMed

    Wang, Yiqin; Qin, Zhenkui; Bao, Yonggan; Qiao, Junwen; Yang, Lifeng; Zhao, Deming

    2009-10-01

    Prion diseases are a group of human and animal neurodegenerative conditions, which are caused by the deposition of an abnormal isoform prion protein (PrPSc) encoded by a single copy prion protein gene (Prnp). In sheep, genetic variations of Prnp were found to be associated with the incubation period, susceptibility, and species barrier to the scrapie disease. We investigated the sequence and polymorphisms of the prion protein gene of Mongolian gazelles (gPrnp). gPrnp gene sequence analysis of blood samples from 26 Mongolian gazelles showed high identity within species. The gPrnp gene was closely related to the Prnp genes of Thomson’s gazelle, blackbuck, and cattle with 100, 100, and 98.5% identity, respectively, whereas the gPrnp gene with a deletion was closely related to the Prnp genes of wildebeest, Western roe deer, and sheep with 99.3, 99.3, and 98.9% identity, respectively. Polymorphisms of the open reading frame of Prnp as amino acid substitutions were detected at codons 119(N --> S), 143(S --> G) or 160(Y --> H), 172(V --> A), 182(N --> S) and 221(V --> A). There was also deletion of one octapeptide repeat at the N-terminal octapeptide repeat region. The polymorphisms of gPrnp will assist the study of prion disease pathogenesis, resistance, and cross species transmission. PMID:19579063

  13. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes), for which there is no investigational approach, and the classification methods used to make the final decision. In this work, a novel Sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, it uses the amino acid sequence of the proteins, which is universally available data, to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection of four negative sets generated using a distance approach is considered. Then, a Decision Tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method.
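
    The reported precision (82.6%), recall (85.6%) and F-measure (about 84) are mutually consistent if the F-measure is taken to be the harmonic mean of precision and recall. The abstract does not state which definition was used, so the following is only a minimal check under that assumption; the function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score;
    the abstract itself does not give the formula.
    """
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
f1 = f_measure(0.826, 0.856)
print(f"{f1:.1%}")
```

    Under this definition the value rounds to roughly 84%, in line with the figure given in the abstract.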

  14. The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications

    PubMed Central

    2005-01-01

    Background: Rice is an important staple food and, with the smallest cereal genome, serves as a reference species for studies on the evolution of cereals and other grasses. Therefore, decoding its entire genome will be a prerequisite for applied and basic research on this species and all other cereals. Results: We have determined and analyzed the complete sequences of two of its chromosomes, 11 and 12, which total 55.9 Mb (14.3% of the entire genome length), based on a set of overlapping clones. A total of 5,993 non-transposable element related genes are present on these chromosomes. Among them are 289 disease resistance-like and 28 defense-response genes, a higher proportion of these categories than on any other rice chromosome. A three-Mb segment on both chromosomes resulted from a duplication 7.7 million years ago (mya), the most recent large-scale duplication in the rice genome. Paralogous gene copies within this segmental duplication can be aligned with genomic assemblies from sorghum and maize. Although these gene copies are preserved on both chromosomes, their expression patterns have diverged. When the gene order of rice chromosomes 11 and 12 was compared to wheat gene loci, significant synteny between these orthologous regions was detected, illustrating the presence of conserved genes alternating with recently evolved genes. Conclusion: Because the resistance and defense response genes, enriched on these chromosomes relative to the whole genome, also occur in clusters, they provide a preferred target for breeding durable disease resistance in rice and the isolation of their allelic variants. The recent duplication of a large chromosomal segment coupled with the high density of disease resistance gene clusters makes this the most recently evolved part of the rice genome. Based on syntenic alignments of these chromosomes, rice chromosome 11 and 12 do not appear to have resulted from a single whole-genome duplication event as previously suggested. PMID:16188032

  15. Resolution of the African hominoid trichotomy by use of a mitochondrial gene sequence

    SciTech Connect

    Ruvolo, M.; Disotell, T.R.; Allard, M.W.; Brown, W.M.; Honeycutt, R.L.

    1991-02-15

    Mitochondrial DNA sequences encoding the cytochrome oxidase subunit II gene have been determined for five primate species, siamang (Hylobates syndactylus), lowland gorilla (Gorilla gorilla), pygmy chimpanzee (Pan paniscus), crab-eating macaque (Macaca fascicularis), and green monkey (Cercopithecus aethiops), and compared with published sequences of other primate and nonprimate species. Comparisons of cytochrome oxidase subunit II gene sequences provide clear-cut evidence from the mitochondrial genome for the separation of the African ape trichotomy into two evolutionary lineages, one leading to gorillas and the other to humans and chimpanzees. Several different tree-building methods support this same phylogenetic tree topology. The comparisons also yield trees in which a substantial length separates the divergence point of gorillas from that of humans and chimpanzees, suggesting that the lineage most immediately ancestral to humans and chimpanzees may have been in existence for a relatively long time.

  16. Case-only exome sequencing and complex disease susceptibility gene discovery: study design considerations.

    PubMed

    Wu, Lang; Schaid, Daniel J; Sicotte, Hugues; Wieben, Eric D; Li, Hu; Petersen, Gloria M

    2015-01-01

    Whole exome sequencing (WES) provides an unprecedented opportunity to identify the potential aetiological role of rare functional variants in human complex diseases. Large-scale collaborations have generated germline WES data on patients with a number of diseases, especially cancer, but less often on healthy controls under the same sequencing procedures. These data can be a valuable resource for identifying new disease susceptibility loci if study designs are appropriately applied. This review describes suggested strategies and technical considerations when focusing on case-only study designs that use WES data in complex disease scenarios. These include variant filtering based on frequency and functionality, gene prioritisation, interrogation of different data types and targeted sequencing validation. We propose that if case-only WES designs were applied in an appropriate manner, new susceptibility genes containing rare variants for human complex diseases can be detected.

  17. Identification of the promoter sequences involved in the cell specific expression of the rat somatostatin gene.

    PubMed Central

    Andrisani, O M; Hayes, T E; Roos, B; Dixon, J E

    1987-01-01

    DNA sequences containing the 5' flanking region of the rat somatostatin gene were linked to the coding sequence of the bacterial chloramphenicol acetyl transferase gene. This recombinant plasmid is active in expressing CAT activity in the neuronally derived, somatostatin producing CA-77 cell line. Deletion analyses of the somatostatin promoter show that the sequences proximal to position -60, relative to the cap site are required for expression of this promoter. A 4 base pair deletion of residues -46 through -43 within the somatostatin promoter results in a down mutation in vivo suggesting the existence of an element critical for the expression of the promoter in CA-77 cells. In addition, the somatostatin recombinant and its 5' deletion constructs preferentially express CAT activity in CA-77 cells, whereas only basal level of expression is observed in HeLa, BSC40, and RIN-5F cell lines, pointing to the cell specific nature of this promoter. PMID:2886975

  18. Virtual metagenome reconstruction from 16S rRNA gene sequences.

    PubMed

    Okuda, Shujiro; Tsuchiya, Yuki; Kiriyama, Chiho; Itoh, Masumi; Morisaki, Hisao

    2012-01-01

    Microbial ecologists have investigated the roles of species richness and diversity in a wide variety of ecosystems. Recently, metagenomics has been developed to measure functions in ecosystems, but this approach is cost-intensive. Here we describe a novel method for the rapid and efficient reconstruction of a virtual metagenome in environmental microbial communities without using large-scale genomic sequencing. We demonstrate this approach using 16S rRNA gene sequences obtained from denaturing gradient gel electrophoresis analysis, mapped to fully sequenced genomes, to reconstruct virtual metagenome-like organizations. Furthermore, we validate a virtual metagenome using a published metagenome for cocoa bean fermentation samples, and show that metagenomes reconstructed from biofilm formation samples allow for the study of the gene pool dynamics that are necessary for biofilm growth.

  19. The crux and crust of ebolavirus: Analysis of genome sequences and glycoprotein gene.

    PubMed

    Mahale, Kiran Narasinha; Patole, Milind S

    2015-08-01

    The recent 2013-15 epidemic of Ebola virus disease (EVD) has initiated extensive sequencing and analysis of ebolavirus genomes. All ebolavirus genomes available until December 2014 have been collated and analyzed in this study to obtain their phylogenetic relationships and uncover the variations amongst them. The terminal 'leader' and 'trailer' nucleotide sequences of the genomes were omitted, and analysis of the intermediate region accommodating the sole seven genes (hepta-CDS region) of the virus showed relative stability of the genome, including in the genomes isolated from the current epidemic. The genome information was scrutinized to detect variation in the surface glycoprotein gene and annotate its three protein products, which result from its atypical transcription. This study should make the genomes easier to understand for those who wish to exploit the genome sequences for different investigations in EVD. PMID:26051281

  20. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    PubMed

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus is commonly known as hops, a member of the family Moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to the non-availability of reliable molecular markers. A large body of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. A total of 25,495 EST sequences were examined and assembled to get full-length sequences. The maximum frequency was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas the minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). The most frequent trinucleotide motifs code for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function.
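
    The abstract does not name the SSR-mining tool or the repeat thresholds that were used. As a rough illustration of how perfect mono- to hexanucleotide repeats can be located in a sequence, here is a regex-based sketch; the function name, the minimum repeat counts and the example sequence are all hypothetical and are not taken from the study.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Motif lengths 1-6 nt are scanned. The default minimum repeat counts
    below are hypothetical placeholders, not the study's thresholds.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-nt motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAA"),
            # since those belong to the shorter motif class.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Made-up example: six GAA repeats and seven AT repeats.
example = "TTGC" + "GAA" * 6 + "CCGG" + "AT" * 7 + "GGC"
print(find_ssrs(example))
```

    Because the search reports the leftmost match, the motif is given in the phase in which it first appears (an AT repeat preceded by a T would be reported as TA), which is one reason AT and TA are usually counted together.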

  1. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312

  2. Identification of a novel salt tolerance gene in wild soybean by whole-genome sequencing.

    PubMed

    Qi, Xinpeng; Li, Man-Wah; Xie, Min; Liu, Xin; Ni, Meng; Shao, Guihua; Song, Chi; Kay-Yuen Yim, Aldrin; Tao, Ye; Wong, Fuk-Ling; Isobe, Sachiko; Wong, Chi-Fai; Wong, Kwong-Sen; Xu, Chunyan; Li, Chunqing; Wang, Ying; Guan, Rui; Sun, Fengming; Fan, Guangyi; Xiao, Zhixia; Zhou, Feng; Phang, Tsui-Hung; Liu, Xuan; Tong, Suk-Wah; Chan, Ting-Fung; Yiu, Siu-Ming; Tabata, Satoshi; Wang, Jian; Xu, Xun; Lam, Hon-Ming

    2014-07-09

    Using a whole-genome-sequencing approach to explore germplasm resources can serve as an important strategy for crop improvement, especially in investigating wild accessions that may contain useful genetic resources that have been lost during the domestication process. Here we sequence and assemble a draft genome of wild soybean and construct a recombinant inbred population for genotyping-by-sequencing and phenotypic analyses to identify multiple QTLs relevant to traits of interest in agriculture. We use a combination of de novo sequencing data from this work and our previous germplasm re-sequencing data to identify a novel ion transporter gene, GmCHX1, and relate its sequence alterations to salt tolerance. Rapid gain-of-function tests show the protective effects of GmCHX1 towards salt stress. This combination of whole-genome de novo sequencing, high-density-marker QTL mapping by re-sequencing and functional analyses can serve as an effective strategy to unveil novel genomic information in wild soybean to facilitate crop improvement.

  3. Multilocus sequence typing and virulence gene profiles associated with Escherichia coli from human and animal sources.

    PubMed

    Manges, Amee R; Harel, Josée; Masson, Luke; Edens, Thaddeus J; Portt, Andrea; Reid-Smith, Richard J; Zhanel, George G; Kropinski, Andrew M; Boerlin, Patrick

    2015-04-01

    We investigated whether specific sequence types, and their shared virulence gene profiles, may be associated with both human and food animal reservoirs. A total of 600 Escherichia coli isolates were assembled from human (n=265) and food-animal (n=335) sources from overlapping geographic areas and time periods (2005-2010) in Canada. The entire collection was subjected to multilocus sequence typing and a subset of 286 E. coli isolates was subjected to an E. coli-specific virulence gene microarray. The most common sequence type (ST) was E. coli ST10, which was present in all human and food-animal sources, followed by ST69, ST73, ST95, ST117, and ST131. A core group of virulence genes was associated with all 10 common STs including artJ, ycfZ, csgA, csgE, fimA, fimH, gad, hlyE, ibeB, mviM, mviN, and ompA. STs 73, 92, and 95 exhibited the largest number of virulence genes, and all were exclusively identified from human infections. ST117 was found in both human and food-animal sources and shared virulence genes common in extraintestinal pathogenic E. coli lineages. Select groups of E. coli may be found in both human and food-animal reservoirs.

  4. Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes.

    PubMed

    Soh, Y Q Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G; Graves, Tina; Minx, Patrick J; Fulton, Robert S; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L; Rozen, Steve; Hughes, Jennifer F; Owens, Elaine; Womack, James E; Murphy, William J; Cao, Qing; de Jong, Pieter; Warren, Wesley C; Wilson, Richard K; Skaletsky, Helen; Page, David C

    2014-11-01

    We sequenced the MSY (male-specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only 2% of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 45 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism.

  5. Cloning, sequencing, and characterization of the Azospirillum brasilense fhuE gene.

    PubMed

    Cui, Yanhua; Tu, Ran; Guan, Yue; Ma, Luyan; Chen, Sanfeng

    2006-03-01

    The fhuE gene of Escherichia coli encodes the FhuE protein, which is a receptor protein in the coprogen-mediated siderophore iron-transport system. A fhuE gene homologue from Azospirillum brasilense, a nitrogen-fixing soil bacterium that lives in association with the roots of cereal grasses, was cloned, sequenced, and characterized. The A. brasilense fhuE encodes a protein of 802 amino acids with a predicted molecular weight of approximately 87 kDa. The deduced amino-acid sequence showed a high level of homology to the sequences of all the known fhuE gene products. The fhuE mutant was sensitive to iron starvation and defective in coprogen-mediated iron uptake. The mutant failed to express one membrane protein of approximately 78 kDa that was induced by iron starvation in the wild type. Complementation studies showed that the A. brasilense fhuE gene, when present on a low-copy number plasmid, could restore the functions of the mutant. Mutation in fhuE gene did not affect nitrogen fixation.

  6. Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes

    PubMed Central

    Soh, Y.Q. Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G.; Graves, Tina; Minx, Patrick J.; Fulton, Robert S.; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L.; Rozen, Steve; Hughes, Jennifer F.; Owens, Elaine; Womack, James E.; Murphy, William J.; Cao, Qing; de Jong, Pieter; Warren, Wesley C.; Wilson, Richard K.; Skaletsky, Helen; Page, David C.

    2014-01-01

    Summary We sequenced the MSY (Male-Specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only two percent of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 50 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs, but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism. PMID:25417157

  7. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    DOE PAGES

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. Added to current metaviromic analysis pipelines, this set of metrics can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  8. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    SciTech Connect

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  9. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes.

    PubMed

    Aziz, Ramy K; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A

    2015-01-01

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  10. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  11. An ancient repeat sequence in the ATP synthase beta-subunit gene of forcipulate sea stars.

    PubMed

    Foltz, David W

    2007-11-01

    A novel repeat sequence with a conserved secondary structure is described from two nonadjacent introns of the ATP synthase beta-subunit gene in sea stars of the order Forcipulatida (Echinodermata: Asteroidea). The repeat is present in both introns of all forcipulate sea stars examined, which suggests that it is an ancient feature of this gene (with an approximate age of 200 Mya). Both stem and loop regions show high levels of sequence constraint when compared to flanking nonrepetitive intronic regions. The repeat was also detected in (1) the family Pterasteridae, order Velatida and (2) the family Korethrasteridae, order Velatida. The repeat was not detected in (1) the family Echinasteridae, order Spinulosida, (2) the family Astropectinidae, order Paxillosida, (3) the family Solasteridae, order Velatida, or (4) the family Goniasteridae, order Valvatida. The repeat lacks similarity to published sequences in unrestricted GenBank searches, and there are no significant open reading frames in the repeat or in the flanking intron sequences. Comparison via parametric bootstrapping to a published phylogeny based on 4.2 kb of nuclear and mitochondrial sequence for a subset of these species allowed the null hypothesis of a congruent phylogeny to be rejected for each repeat, when compared separately to the published phylogeny. In contrast, the flanking nonrepetitive sequences in each intron yielded separate phylogenies that were each congruent with the published phylogeny. In four species, the repeat in one or both introns has apparently experienced gene conversion. The two introns also show a correlated pattern of nucleotide substitutions, even after excluding the putative cases of gene conversion.

  12. Escherichia coli purB gene: cloning, nucleotide sequence, and regulation by purR.

    PubMed

    He, B; Smith, J M; Zalkin, H

    1992-01-01

    Escherichia coli purB encodes adenylosuccinate lyase (ASL), the enzyme that catalyzes step 8 in the pathway for de novo synthesis of IMP and also the final reaction in the two-step sequence from IMP to AMP. Gene purB was cloned and found to encode an ASL protein of 435 amino acids having a calculated molecular weight of 49,225. E. coli ASL is homologous to the corresponding enzymes from Bacillus subtilis and chickens and also to fumarase from B. subtilis. Gene phoP is 232 bp downstream of purB. Gene purB is regulated threefold by the purine pool and purR. Transcriptional regulation of purB involves binding of the purine repressor to the 16-bp conserved pur regulon operator. The purB operator is 224 bp downstream of the transcription start site and overlaps codons 62 to 67 in the protein-coding sequence.

  13. A tool kit for quantifying eukaryotic rRNA gene sequences from human microbiome samples.

    PubMed

    Dollive, Serena; Peterfreund, Gregory L; Sherrill-Mix, Scott; Bittinger, Kyle; Sinha, Rohini; Hoffmann, Christian; Nabel, Christopher S; Hill, David A; Artis, David; Bachman, Michael A; Custers-Allen, Rebecca; Grunberg, Stephanie; Wu, Gary D; Lewis, James D; Bushman, Frederic D

    2012-07-03

    Eukaryotic microorganisms are important but understudied components of the human microbiome. Here we present a pipeline for analysis of deep sequencing data on single cell eukaryotes. We designed a new 18S rRNA gene-specific PCR primer set and compared a published rRNA gene internal transcribed spacer (ITS) gene primer set. Amplicons were tested against 24 specimens from defined eukaryotes and eight well-characterized human stool samples. A software pipeline (https://sourceforge.net/projects/brocc/) was developed for taxonomic attribution, validated against simulated data, and tested on pyrosequence data. This study provides a well-characterized tool kit for sequence-based enumeration of eukaryotic organisms in human microbiome samples.

  14. Stable intronic sequence RNAs (sisRNAs): a new layer of gene regulation.

    PubMed

    Osman, Ismail; Tay, Mandy Li-Ian; Pek, Jun Wei

    2016-09-01

    Upon splicing, introns are rapidly degraded. Hence, RNAs derived from introns are commonly deemed as junk sequences. However, the discoveries of intronic-derived small nucleolar RNAs (snoRNAs), small Cajal body associated RNAs (scaRNAs) and microRNAs (miRNAs) suggested otherwise. These non-coding RNAs are shown to play various roles in gene regulation. In this review, we highlight another class of intron-derived RNAs known as stable intronic sequence RNAs (sisRNAs). sisRNAs have been observed since the 1980s; however, we are only beginning to understand their biological significance. Recent studies have shown or suggested that sisRNAs regulate their own host's gene expression, function as molecular sinks or sponges, and regulate protein translation. We propose that sisRNAs function as an additional layer of gene regulation in the cells. PMID:27147469

  15. Nucleotide sequence of the regulatory locus controlling expression of bacterial genes for bioluminescence.

    PubMed Central

    Engebrecht, J; Silverman, M

    1987-01-01

    Production of light by the marine bacterium Vibrio fischeri and by recombinant hosts containing cloned lux genes is controlled by the density of the culture. Density-dependent regulation of lux gene expression has been shown to require a locus consisting of the luxR and luxI genes and two closely linked divergent promoters. As part of a genetic analysis to understand the regulation of bioluminescence, we have sequenced the region of DNA containing this control circuit. Open reading frames corresponding to luxR and luxI were identified; transcription start sites were defined by S1 nuclease mapping and sequences resembling promoter elements were located. PMID:3697093

  16. Cloning and sequencing of an ice nucleation active gene of Erwinia uredovora.

    PubMed

    Michigami, Y; Watabe, S; Abe, K; Obata, H; Arai, S

    1994-04-01

    An ice nucleation activity gene, named inaU, of the bacterium Erwinia uredovora KUIN-3 has been sequenced. This gene encodes a protein of 1034 amino acid residues, and its expression product, inaU protein, has an 832-amino acid residue segment consisting of 52 repeats of closely related 16-amino acid motifs (R-domain), flanked by N- and C-terminal sequences (N- and C-domains, respectively). The primary structure of the inaU protein is similar to those of the inaA, inaW, and inaZ gene products of Erwinia ananas, Pseudomonas fluorescens, and Pseudomonas syringae, respectively, but is smaller than any of these products in terms of the size of the R-domain. PMID:7764866

  17. Genome sequence of the phage-gene rich marine Phaeobacter arcticus type strain DSM 23566T

    PubMed Central

    Freese, Heike M.; Dalingault, Hajnalka; Petersen, Jörn; Pradella, Silke; Davenport, Karen; Teshima, Hazuki; Chen, Amy; Pati, Amrita; Ivanova, Natalia; Goodwin, Lynne A.; Chain, Patrick; Detter, John C.; Rohde, Manfred; Gronow, Sabine; Kyrpides, Nikos C.; Woyke, Tanja; Brinkhoff, Thorsten; Göker, Markus; Overmann, Jörg; Klenk, Hans-Peter

    2013-01-01

    Phaeobacter arcticus Zhang et al. 2008 belongs to the marine Roseobacter clade whose members are phylogenetically and physiologically diverse. In contrast to the type species of this genus, Phaeobacter gallaeciensis, which is well characterized, relatively little is known about the characteristics of P. arcticus. Here, we describe the features of this organism including the annotated high-quality draft genome sequence and highlight some particular traits. The 5,049,232 bp long genome with its 4,828 protein-coding and 81 RNA genes consists of one chromosome and five extrachromosomal elements. Prophage sequences identified via PHAST constitute nearly 5% of the bacterial chromosome and included a potential Mu-like phage as well as a gene-transfer agent (GTA). In addition, the genome of strain DSM 23566T encodes all of the genes necessary for assimilatory nitrate reduction. Phylogenetic analysis and intergenomic distances indicate that the classification of the species might need to be reconsidered. PMID:24501630

  18. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways

    PubMed Central

    Cirulli, Elizabeth T.; Lasseigne, Brittany N.; Petrovski, Slavé; Sapp, Peter C.; Dion, Patrick A.; Leblond, Claire S.; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J.; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E.; Boone, Braden E.; Wimbish, Jack R.; Waite, Lindsay L.; Jones, Angela L.; Carulli, John P.; Day-Williams, Aaron G.; Staropoli, John F.; Xin, Winnie W.; Chesi, Alessandra; Raphael, Alya R.; McKenna-Yasek, Diane; Cady, Janet; de Jong, J.M.B. Vianney; Kenna, Kevin P.; Smith, Bradley N.; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H.; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E.; Baloh, Robert H.; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M.; Gibson, Summer; Trojanowski, John Q.; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A.; Chung, Wendy K.; Ravits, John M.; Glass, Jonathan D.; Sims, Katherine B.; Van Deerlin, Vivianna M.; Maniatis, Tom; Hayes, Sebastian D.; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S.; Bedlack, Richard S.; Harper, J. Wade; Gitler, Aaron D.; Rouleau, Guy A.; Brown, Robert; Harms, Matthew B.; Cooper, Gregory M.; Harris, Tim; Myers, Richard M.; Goldstein, David B.

    2015-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. Here we report the results of a moderate-scale sequencing study aimed at identifying new genes contributing to predisposition for ALS. We performed whole exome sequencing of 2,874 ALS patients and compared them to 6,405 controls. Several known ALS genes were found to be associated, and the non-canonical IκB kinase family TANK-Binding Kinase 1 (TBK1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. PMID:25700176

  19. Using CATH-Gene3D to Analyze the Sequence, Structure, and Function of Proteins.

    PubMed

    Sillitoe, Ian; Lewis, Tony; Orengo, Christine

    2015-01-01

    The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual units of structural domains, and these domains are grouped together into superfamilies if there is sufficient evidence that they have diverged from a common ancestor during the process of evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known proteins to identify related sequences. Thus the combined CATH-Gene3D resource provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, this resource incorporates annotations from a large number of external databases such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained within the CATH-Gene3D Web pages, the downloadable data files, and the remotely accessible Web services.

  20. Exome sequencing of extended families with autism reveals genes shared across neurodevelopmental and neuropsychiatric disorders

    PubMed Central

    2014-01-01

    Background Autism spectrum disorders (ASDs) comprise a range of neurodevelopmental conditions of varying severity, characterized by marked qualitative difficulties in social relatedness, communication, and behavior. Despite overwhelming evidence of high heritability, results from genetic studies to date show that ASD etiology is extremely heterogeneous and only a fraction of autism genes have been discovered. Methods To help unravel this genetic complexity, we performed whole exome sequencing on 100 ASD individuals from 40 families with multiple distantly related affected individuals. All families contained a minimum of one pair of ASD cousins. Each individual was captured with the Agilent SureSelect Human All Exon kit, sequenced on the Illumina Hiseq 2000, and the resulting data processed and annotated with Burrows-Wheeler Aligner (BWA), Genome Analysis Toolkit (GATK), and SeattleSeq. Genotyping information on each family was utilized in order to determine genomic regions that were identical by descent (IBD). Variants identified by exome sequencing which occurred in IBD regions and were present in all affected individuals within each family were then evaluated to determine which may potentially be disease related. Nucleotide alterations that were novel and rare (minor allele frequency, MAF, less than 0.05) and predicted to be detrimental, either by altering amino acids or splicing patterns, were prioritized. Results We identified numerous potentially damaging, ASD associated risk variants in genes previously unrelated to autism. A subset of these genes has been implicated in other neurobehavioral disorders including depression (SLIT3), epilepsy (CLCN2, PRICKLE1), intellectual disability (AP4M1), schizophrenia (WDR60), and Tourette syndrome (OFCC1). Additional alterations were found in previously reported autism candidate genes, including four genes with alterations in multiple families (CEP290, CSMD1, FAT1, and STXBP5).
Compiling a list of ASD candidate genes from the

  1. DNA sequence heterogeneity of Campylobacter jejuni CJIE4 prophages and expression of prophage genes.

    PubMed

    Clark, Clifford G; Chong, Patrick M; McCorrister, Stuart J; Mabon, Philip; Walker, Matthew; Westmacott, Garrett R

    2014-01-01

    Campylobacter jejuni carry temperate bacteriophages that can affect the biology or virulence of the host bacterium. Known effects include genomic rearrangements and resistance to DNA transformation. C. jejuni prophage CJIE1 shows sequence variability and variability in the content of morons. Homologs of the CJIE1 prophage enhance both adherence to and invasion of cells in culture and increase the expression of a specific subset of bacterial genes. Other C. jejuni temperate phages have so far not been well characterized. In this study we describe investigations into the DNA sequence variability and protein expression in a second prophage, CJIE4. CJIE4 sequences were obtained de novo from DNA sequencing of five C. jejuni isolates, as well as from whole genome sequences submitted to GenBank by other research groups. These CJIE4 DNA sequences were heterogeneous, with several different insertions/deletions (indels) in different parts of the prophage genome. Two variants of a 3-4 kb region inserted within CJIE4 had different gene content that distinguished two major conserved CJIE4 prophage families. Additional indels were detected throughout the prophage. Detection of proteins in the five isolates characterized in our laboratory in isobaric Tags for Relative and Absolute Quantitation (iTRAQ) experiments indicated that prophage proteins within each of the two large indel variants were expressed during growth of the bacteria on Mueller Hinton agar plates. These proteins included the extracellular DNase associated with resistance to DNA transformation and prophage repressor proteins. Other proteins associated with known or suspected roles in prophage biology were also expressed from CJIE4, including capsid protein, the phage integrase, and MazF, a type II toxin-antitoxin system protein.
Together with the results previously obtained for the CJIE1 prophage these results demonstrate that sequence variability and expression of moron genes are both general properties of temperate

  2. Driver Gene Mutations in Stools of Colorectal Carcinoma Patients Detected by Targeted Next-Generation Sequencing.

    PubMed

    Armengol, Gemma; Sarhadi, Virinder K; Ghanbari, Reza; Doghaei-Moghaddam, Masoud; Ansari, Reza; Sotoudeh, Masoud; Puolakkainen, Pauli; Kokkola, Arto; Malekzadeh, Reza; Knuutila, Sakari

    2016-07-01

    Detection of driver gene mutations in stool DNA represents a promising noninvasive approach for screening colorectal cancer (CRC). Amplicon-based next-generation sequencing (NGS) is a good option to study mutations in many cancer genes simultaneously and from a low amount of DNA. Our aim was to assess the feasibility of identifying mutations in 22 cancer driver genes with Ion Torrent technology in stool DNA from a series of 65 CRC patients. The assay was successful in 80% of stool DNA samples. NGS results showed 83 mutations in cancer driver genes, 29 hotspot and 54 novel mutations. One to five genes were mutated in 75% of cases. TP53, KRAS, FBXW7, and SMAD4 were the top mutated genes, consistent with previous studies. Of samples with mutations, 54% presented concomitant mutations in different genes. Phosphatidylinositol 3-kinase/mitogen-activated protein kinase pathway genes were mutated in 70% of samples, with 58% having alterations in KRAS, NRAS, or BRAF. Because mutations in these genes can compromise the efficacy of epidermal growth factor receptor blockade in CRC patients, identifying mutations that confer resistance to some targeted treatments may be useful to guide therapeutic decisions. In conclusion, the data presented herein show that NGS procedures on stool DNA represent a promising tool to detect genetic mutations that could be used in the future for diagnosis, monitoring, or treating CRC. PMID:27155048

  3. Cloning, sequencing, and expression of the gene for NADH-sensitive citrate synthase of Pseudomonas aeruginosa.

    PubMed Central

    Donald, L J; Molgat, G F; Duckworth, H W

    1989-01-01

    The structural gene for the allosteric citrate synthase of Pseudomonas aeruginosa has been cloned from a genomic library by using the Escherichia coli citrate synthase gene as a hybridization probe under conditions of reduced stringency. Subcloning of portions of the original 10-kilobase-pair (kbp) clone led to isolation of the structural gene, with its promoter, within a 2,083-bp length of DNA flanked by sites for KpnI and BamHI. The nucleotide sequence of this fragment is presented; the inferred amino acid sequence was 70 and 76% identical, respectively, with the citrate synthase sequences from E. coli and Acinetobacter anitratum, two other gram-negative bacteria. DEAE-cellulose chromatography of P. aeruginosa citrate synthase from an E. coli host harboring the cloned P. aeruginosa gene gave three peaks of activity. All three enzyme peaks had subunit molecular weights of 48,000; the proteins were identical by immunological criteria and very similar in kinetics of substrate saturation and NADH inhibition. Because the cloned gene contained only one open reading frame large enough to encode a polypeptide of such a size, the three peaks must represent different forms of the same protein. A portion of the cloned P. aeruginosa gene was used as a hybridization probe under stringent conditions to identify highly homologous sequences in genomic DNA of a second strain classified as P. aeruginosa and isolates of P. putida, P. stutzeri, and P. alcaligenes. When crude extracts of each of these four isolates were mixed with antiserum raised against purified P. aeruginosa citrate synthase, however, only the P. alcaligenes extract cross-reacted. PMID:2507528

  4. MicroRNA-373 induces expression of genes with complementary promoter sequences.

    PubMed

    Place, Robert F; Li, Long-Cheng; Pookot, Deepa; Noonan, Emily J; Dahiya, Rajvir

    2008-02-01

    Recent studies have shown that microRNA (miRNA) regulates gene expression by repressing translation or directing sequence-specific degradation of complementary mRNA. Here, we report new evidence in which miRNA may also function to induce gene expression. By scanning gene promoters in silico for sequences complementary to known miRNAs, we identified a putative miR-373 target site in the promoter of E-cadherin. Transfection of miR-373 and its precursor hairpin RNA (pre-miR-373) into PC-3 cells readily induced E-cadherin expression. Knockdown experiments confirmed that induction of E-cadherin by pre-miR-373 required the miRNA maturation protein Dicer. Further analysis revealed that cold-shock domain-containing protein C2 (CSDC2), which possesses a putative miR-373 target site within its promoter, was also readily induced in response to miR-373 and pre-miR-373. Furthermore, enrichment of RNA polymerase II was detected at both E-cadherin and CSDC2 promoters after miR-373 transfection. Mismatch mutations to miR-373 indicated that gene induction was specific to the miR-373 sequence. Transfection of promoter-specific dsRNAs revealed that the concurrent induction of E-cadherin and CSDC2 by miR-373 required the miRNA target sites in both promoters. In conclusion, we have identified a miRNA that targets promoter sequences and induces gene expression. These findings reveal a new mode by which miRNAs may regulate gene expression.

  5. Targeted next generation sequencing of clinically significant gene mutations and translocations in leukemia.

    PubMed

    Duncavage, Eric J; Abel, Haley J; Szankasi, Philippe; Kelley, Todd W; Pfeifer, John D

    2012-06-01

    Leukemias are currently subclassified based on the presence of recurrent cytogenetic abnormalities and gene mutations. These molecular findings are the basis for risk-adapted therapy; however, such data are generally obtained by disparate methods in the clinical laboratory, and often rely on low-resolution techniques such as fluorescent in situ hybridization. Using targeted next generation sequencing, we demonstrate that the full spectrum of prognostically significant gene mutations including translocations, single nucleotide variants (SNVs), and insertions/deletions (indels) can be identified simultaneously in multiplexed sequence data. As proof of concept, we performed hybrid capture using a panel of 20 genes implicated in leukemia prognosis (covering a total of 1 Mbp) from five leukemia cell lines including K562, NB4, OCI-AML3, Kasumi-1, and MV4-11. Captured DNA was then sequenced in multiplex on an Illumina HiSeq. Using an analysis pipeline based on freely available software we correctly identified DNA-level translocations in three of the three cell lines where translocations were covered by our capture probes. Furthermore, we found all published gene mutations in commonly tested genes including NPM1, FLT3, and KIT. The same methodology was applied to DNA extracted from the bone marrow of a patient with acute myeloid leukemia, and identified a t(9;11) translocation with single base accuracy as well as other gene mutations. These results indicate that targeted next generation sequencing can be successfully applied in the clinical laboratory to identify a full spectrum of DNA mutations ranging from SNVs and indels to translocations. Such methods have the potential to both greatly streamline and improve the accuracy of DNA-based diagnostics.

  6. Exome sequencing identifies potential novel candidate genes in patients with unexplained colorectal adenomatous polyposis.

    PubMed

    Spier, Isabel; Kerick, Martin; Drichel, Dmitriy; Horpaopan, Sukanya; Altmüller, Janine; Laner, Andreas; Holzapfel, Stefanie; Peters, Sophia; Adam, Ronja; Zhao, Bixiao; Becker, Tim; Lifton, Richard P; Holinski-Feder, Elke; Perner, Sven; Thiele, Holger; Nöthen, Markus M; Hoffmann, Per; Timmermann, Bernd; Schweiger, Michal R; Aretz, Stefan

    2016-04-01

    In up to 30% of patients with colorectal adenomatous polyposis, no germline mutation in the known genes APC, causing familial adenomatous polyposis, MUTYH, causing MUTYH-associated polyposis, and POLE or POLD1, causing Polymerase-Proofreading-associated polyposis can be identified, although a hereditary etiology is likely. To uncover new causative genes, exome sequencing was performed using DNA from leukocytes and a total of 12 colorectal adenomas from seven unrelated patients with unexplained sporadic adenomatous polyposis. For data analysis and variant filtering, an established bioinformatics pipeline including in-house tools was applied. Variants were filtered for rare truncating point mutations and copy-number variants assuming a dominant, recessive, or tumor suppressor model of inheritance. Subsequently, targeted sequence analysis of the most promising candidate genes was performed in a validation cohort of 191 unrelated patients. All relevant variants were validated by Sanger sequencing. The analysis of exome sequencing data resulted in the identification of rare loss-of-function germline mutations in three promising candidate genes (DSC2, PIEZO1, ZSWIM7). In the validation cohort, further variants predicted to be pathogenic were identified in DSC2 and PIEZO1. According to the somatic mutation spectra, the adenomas in this patient cohort follow the classical pathways of colorectal tumorigenesis. The present study identified three candidate genes which might represent rare causes for a predisposition to colorectal adenoma formation. Especially PIEZO1 (FAM38A) and ZSWIM7 (SWS1) warrant further exploration. To evaluate the clinical relevance of these genes, investigation of larger patient cohorts and functional studies are required. PMID:26780541

  7. Nucleotide sequence analysis of a candidate gene for ataxia-telangiectasia group D (ATDC)

    SciTech Connect

    Leonhardt, E.A.; Kapp, L.N.; Young, B.R.; Murnane, J.P.

    1994-01-01

    A radioresistant cell clone (1B3) was previously isolated after transfection of an ataxia-telangiectasia (AT) group D cell line with a human cosmid library. A cosmid rescued from the integration site in 1B3 contained human DNA from chromosome position 11q23, the same region shown by both genetic linkage and chromosome transfer to contain the genes for AT complementation groups A/B, C, and D. A gene within the cosmid (ATDC) was found to produce mRNAs of different sizes. A cDNA for one of the most abundant mRNAs (3.0 kb) was isolated from a HeLa cell library. In the present study, the authors sequenced the 3.0-kb cDNA and the surrounding intron DNA in the cosmids. They used polymerase chain reaction, with primers in the introns, to confirm the number of exons and to analyze DNA from AT group D cells for mutations within this gene. Although no mutations were found, they do not rule out the possibility that mutations may be present within the regulatory sequences or coding sequences found in other mRNAs specific for this gene. From the sequence analysis, they found that the ATDC gene product is one of a group of proteins that share multiple zinc finger motifs and an adjacent leucine zipper motif. These proteins have been proposed to form homo- or hetero-dimers involved in nucleic acid binding, consistent with the fact that many of these proteins appear to be transcriptional regulatory factors involved in carcinogenesis and/or differentiation. The likelihood that the ATDC gene product is involved in transcriptional regulation could explain the pleiomorphic characteristics of AT, including abnormal cell cycle regulation. 36 refs., 5 figs., 2 tabs.

  8. Species identification using genetic tools: the value of nuclear and mitochondrial gene sequences in whale conservation.

    PubMed

    Palumbi, S R; Cipriano, F

    1998-01-01

    DNA sequence analysis is a powerful tool for identifying the source of samples thought to be derived from threatened or endangered species. Analysis of mitochondrial DNA (mtDNA) from retail whale meat markets has shown consistently that the expected baleen whale in these markets, the minke whale, makes up only about half the products analyzed. The other products are either unregulated small toothed whales like dolphins or are protected baleen whales such as humpback, Bryde's, fin, or blue whales. Independent verification of such mtDNA identifications requires analysis of nuclear genetic loci, but this is technically more difficult than standard mtDNA sequencing. In addition, evolution of species-specific sequences (i.e., fixation of sequence differences to produce reciprocally monophyletic gene trees) is slower in nuclear than in mitochondrial genes primarily because genetic drift is slower at nuclear loci. When will use of nuclear sequences allow forensic DNA identification? Comparison of neutral theories of coalescence of mitochondrial and nuclear loci suggests a simple rule of thumb. The "three-times rule" suggests that phylogenetic sorting at nuclear loci is likely to produce species-specific sequences when mitochondrial alleles are reciprocally monophyletic and the branches leading to the mtDNA sequences of a species are three times longer than the average difference observed within species. A preliminary test of the three-times rule, which depends on many assumptions about the species and genes involved, suggests that blue and fin whales should have species-specific sequences at most neutral nuclear loci, whereas humpback and fin whales should show species-specific sequences at fewer nuclear loci. Partial sequences of actin introns from these species confirm the predictions of the three-times rule and show that blue and fin whales are reciprocally monophyletic at this locus. These intron sequences are thus good tools for the identification of these species
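    The "three-times rule" described in this abstract can be written as a simple check. The sketch below is illustrative only: the function name, argument names, and the choice of a "greater than or equal to" comparison are assumptions, not taken from the paper, and the input values in any real use would have to come from an actual mtDNA analysis.

```python
def three_times_rule(branch_length: float,
                     mean_within_species_diff: float,
                     reciprocally_monophyletic: bool) -> bool:
    """Rule of thumb: expect species-specific nuclear sequences when the mtDNA
    alleles are reciprocally monophyletic and the branch leading to a species'
    mtDNA sequences is at least three times the mean within-species difference.
    (Names and the >= comparison are assumptions made for this sketch.)"""
    if mean_within_species_diff <= 0:
        raise ValueError("mean within-species difference must be positive")
    return (reciprocally_monophyletic
            and branch_length >= 3 * mean_within_species_diff)
```

    Both quantities must be expressed in the same units (for example, substitutions per site) for the comparison to be meaningful.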

  9. Sequence Analysis of the Gene Encoding Amylosucrase from Neisseria polysaccharea and Characterization of the Recombinant Enzyme

    PubMed Central

    Potocki De Montalk, G.; Remaud-Simeon, M.; Willemot, R. M.; Planchot, V.; Monsan, P.

    1999-01-01

    The Neisseria polysaccharea gene encoding amylosucrase was subcloned and expressed in Escherichia coli. Sequencing revealed that the deduced amino acid sequence differs significantly from that previously published. Comparison of the sequence with that of enzymes of the α-amylase family predicted a (β/α)8-barrel domain. Six of the eight highly conserved regions in amylolytic enzymes are present in amylosucrase. Among them, four constitute the active site in α-amylases. These sites were also conserved in the sequence of glucosyltransferases and dextransucrases. Nevertheless, the evolutionary tree does not show strong homology between them. The amylosucrase was purified by affinity chromatography between fusion protein glutathione S-transferase–amylosucrase and glutathione-Sepharose 4B. The pure enzyme linearly elongated some branched chains of glycogen, to an average degree of polymerization of 75. PMID:9882648

  10. Nucleotide sequence of the SrRNA gene and phylogenetic analysis of Trichomonas tenax.

    PubMed

    Fukura, K; Yamamoto, A; Hashimoto, T; Goto, N

    1996-01-01

    The small subunit ribosomal RNA (SrRNA) gene of Trichomonas tenax ATCC30207 was amplified by PCR and the 1.55-kb product was cloned into plasmid vector pUC18. Four clones were isolated and sequenced. The insert DNAs were 1,552 bp long and their G+C contents were 48.1%; three of them had exactly the same DNA sequences and one had only one nucleotide change. A representative SrRNA sequence was analyzed and a phylogenetic tree was estimated by the neighbor-joining (NJ) method. Among the protists examined, T. tenax was placed as the closest relative of Tritrichomonas foetus, as expected from the traditional taxonomy. The total homology between the two SrRNA sequences was 89.2%.
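    The G+C content reported in this abstract is a straightforward base count. A minimal sketch follows, assuming the sequence is available as a plain string; the function name is hypothetical and ambiguous characters are simply ignored, which may differ from how the authors handled them.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGTU"]  # skip N, gaps, whitespace
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)
```

    For a 1,552-bp insert, a G+C content of 48.1% corresponds to roughly 746 to 747 G or C bases.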

  11. Nucleotide sequence of an immediate-early frog virus 3 gene.

    PubMed

    Willis, D; Foglesong, D; Granoff, A

    1984-12-01

    We have used "gene walking" with synthetic oligonucleotides and M13 dideoxynucleotide sequencing techniques to obtain the complete coding and flanking sequences of the gene encoding a major immediate-early RNA (molecular weight, 169,000) of frog virus 3. R-loop mapping of the cloned XbaI K fragment of frog virus 3 DNA with immediate-early RNA from infected cells showed that an RNA of approximately 500 to 600 nucleotides (the right size to code for the immediate-early viral 18-kilodalton protein of unknown function) hybridized to a region within 100 base pairs of one end of the XbaI K fragment; no evidence for splicing was observed in the electron microscope or by single-strand nuclease analysis. Further restriction mapping narrowed the location of the gene to the XbaI end of a 2-kilobase-pair XbaI-BglII fragment, which was bidirectionally subcloned into the bacteriophage pair mp10 and mp11 for sequencing. Mung bean nuclease mapping was used to identify both the 5' and the 3' ends of the mRNA. The 5' end mapped within an AT-rich region 19 base pairs upstream from two in-phase AUG start codons that were immediately followed by an open reading frame of 157 amino acids. Another AT-rich sequence was found at -29 base pairs from the 5' end of the mRNA start site; this sequence may function as a TATA box. The 3' end of the message displayed considerable microheterogeneity, but clearly terminated within a third AT-rich region 50 to 60 base pairs from the translation stop codon. The eucaryotic polyadenylic acid addition signal (AATAAA) was not present, a finding to be expected since frog virus 3 mRNA is not polyadenylated. Both the single-stranded mp10 clone of the XbaI-BglII fragment and a 15-base oligonucleotide complementary to the region flanking the two AUG translation start codons inhibited translation of the immediate-early 18-kilodalton protein in vitro, confirming the identity of the sequenced gene. As the regulatory sequences of this gene did not resemble those of

  12. ORF57 Overcomes the Detrimental Sequence Bias of Kaposi's Sarcoma-Associated Herpesvirus Lytic Genes

    PubMed Central

    Vogt, Carolin; Hackmann, Christian; Rabner, Alona; Koste, Lars; Santag, Susann; Kati, Semra; Mandel-Gutfreund, Yael; Schulz, Thomas F.

    2015-01-01

    ABSTRACT Kaposi's sarcoma-associated herpesvirus (KSHV) encodes ORF57, which enhances the expression of intronless KSHV genes on multiple posttranscriptional levels. However, it remains elusive how ORF57 recognizes viral RNAs. Here, we demonstrate that ORF57 also increases the expression of the multiple intron-containing K15 gene. The nucleotide bias of the K15 cDNA revealed an unusually high AT content. Thus, we optimized the K15 cDNA by raising the frequency of GC nucleotides, yielding an ORF57-independent version. To further prove the importance of the sequence bias of ORF57-dependent RNAs, we grouped KSHV mRNAs according to their AT content and found a correlation between AT-richness and ORF57 dependency. More importantly, latent genes, which have to be expressed in the absence of ORF57, have a low AT content and are indeed ORF57 independent. The nucleotide composition of K15 resembles that of HIV gag, which cannot be expressed unless RNA export is facilitated by the HIV Rev protein. Interestingly, ORF57 can partially rescue HIV Gag expression. Thus, the KSHV target RNAs of ORF57 and HIV gag RNA may share certain motifs based on the nucleotide bias. A bioinformatic comparison between wild-type and sequence-optimized K15 revealed a higher density for hnRNP-binding motifs in the former. We speculate that binding of particular hnRNPs to KSHV lytic transcripts is the prerequisite for ORF57 to enhance their expression. IMPORTANCE The mostly intronless genes of KSHV are only expressed in the presence of the viral regulator protein ORF57, but how ORF57 recognizes viral RNAs remains elusive. We focused on the multiple intron-containing KSHV gene K15 and revealed that its expression is also increased by ORF57. Moreover, sequences in the K15 cDNA mediate this enhancement. The quest for a target sequence or a response element for ORF57 in the lytic genes was not successful. Instead, we found the nucleotide bias to be the critical determinant of ORF57 dependency. Based on

  13. The nucleotide sequence of an equine herpesvirus 4 gene homologue of the herpes simplex virus 1 glycoprotein H gene.

    PubMed

    Nicolson, L; Cullinane, A A; Onions, D E

    1990-08-01

    The equine herpesvirus 4 (EHV-4) glycoprotein H (gH) gene homologue was localized by virtue of the conserved genomic position of this gene throughout members of the herpesvirus family. The gene maps immediately downstream of the thymidine kinase gene at approximately 0.49 to 0.51 map units within genomic fragment BamHI C. The EHV-4 gH primary translation product is predicted to be a polypeptide of Mr 94,100, 855 amino acids long, which possesses features characteristic of a membrane glycoprotein, namely an N-terminal signal sequence, a large hydrophilic domain containing 11 putative N-linked glycosylation sites, a C-terminal transmembrane domain, and a charged cytoplasmic tail. Comparison to other herpesvirus glycoproteins revealed identities of 85%, 26% and 32% with the gH counterparts of the alphaherpesviruses EHV-1, herpes simplex virus 1 and varicella-zoster virus, respectively, and of 17% and 18% with those of human cytomegalovirus, herpesvirus saimiri and Epstein-Barr virus. The EHV-4 gH exhibits features previously reported to be conserved throughout the gH polypeptides of herpesviruses of all three subgroups. A region of direct repeat elements and a possible origin of DNA replication are located immediately downstream of the gH gene. PMID:2167933

  14. Characterization and sequence analysis of the human homeobox-containing gene GBX2

    SciTech Connect

    Lin, Xu; Vaccarino, F.M.; Haas, M.

    1996-02-01

    Polymerase chain reaction (PCR) was used to amplify portions of homeobox genes present in a human 11-week fetal brain cDNA library. One of these PCR products was determined by sequencing to be the Gastrulation and brain specific-2 gene (GBX2). Screening this human fetal brain cDNA library with probes specific for GBX2 led to the identification of a 2151-bp clone that encodes a protein of 347 amino acid residues. The amino acid sequence of the GBX2 homeodomain is identical (100%) to that of the homologous gene, Gbx2, expressed in the developing mouse embryo and virtually identical (97%) to a gene expressed in the developing chicken embryo, CHox7. The 5′ end of the GBX2 gene contains a CpG island in the untranslated region and a trinucleotide (CCG)8 repeat in the coding region. The amino-terminal end of the GBX2 protein is proline-rich, with 30 proline residues in one stretch of 120 by Northern analysis in the developing human CNS as well as in other tissues. The human genomic clone for GBX2 was also isolated, characterized, and mapped to 2q36(d)-q37 by somatic cell hybrid analysis and fluorescence in situ hybridization. These studies provide a framework for designing future experiments that are needed to determine the functional significance of this gene in CNS development. 38 refs., 4 figs.

  15. Challenges in identifying cancer genes by analysis of exome sequencing data

    PubMed Central

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F.; Bandyopadhyay, Sourav; Mischel, Paul S.; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13–60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679

  16. Challenges in identifying cancer genes by analysis of exome sequencing data.

    PubMed

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F; Bandyopadhyay, Sourav; Mischel, Paul S; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13-60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679

  17. Cloning and nucleotide sequence of the hemA gene of Agrobacterium radiobacter.

    PubMed

    Drolet, M; Sasarman, A

    1991-04-01

    The hemA gene of Agrobacterium radiobacter ATCC4718 was identified by hybridization with a hemA probe from Rhizobium meliloti and cloned by complementation of a hemA mutant of Escherichia coli K12. E. coli hemA transformants carrying the hemA gene of Agrobacterium showed delta-aminolevulinic acid synthetase (delta-ALAS) activity in vitro. The hemA gene was carried on a 4.4 kb EcoRI fragment which could be reduced to a 2.6 kb EcoRI-SstI fragment without affecting its complementing or delta-ALAS activity. The sequence of the hemA gene showed an open reading frame of 1215 nucleotides, which could code for a protein of 44,361 Da. This is very close to the molecular weight of the HemA protein obtained using an in vitro coupled transcription-translation system (45,000 Da). Comparison of amino acid sequences of the delta-ALAS of A. radiobacter and Bradyrhizobium japonicum showed strong homology between the two enzymes; less, but still significant, homology was observed when A. radiobacter and human delta-ALAS were compared. Primer extension experiments enabled us to identify two promoters for the hemA gene of A. radiobacter. One of these promoters shows some similarity to the first promoter of the hemA gene of R. meliloti.
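    The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that the 1215-nucleotide open reading frame includes the stop codon (the abstract does not say), and the function name is hypothetical. The resulting mean residue mass can be compared with the commonly used rule of thumb of roughly 110 Da per amino acid.

```python
def orf_summary(orf_nt: int, protein_mass_da: float, includes_stop: bool = True):
    """Return (number of residues, mean mass per residue in Da) for an ORF.
    includes_stop=True is an assumption: the last codon is treated as a stop."""
    codons, remainder = divmod(orf_nt, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of three")
    residues = codons - 1 if includes_stop else codons
    return residues, protein_mass_da / residues


residues, mean_mass = orf_summary(1215, 44361)
print(residues, round(mean_mass, 1))  # 404 residues, about 109.8 Da each
```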

  18. Network‐Informed Gene Ranking Tackles Genetic Heterogeneity in Exome‐Sequencing Studies of Monogenic Disease

    PubMed Central

    Schulz, Reiner; Weale, Michael E.; Southgate, Laura; Oakey, Rebecca J.; Simpson, Michael A.; Schlitt, Thomas

    2015-01-01

    ABSTRACT Genetic heterogeneity presents a significant challenge for the identification of monogenic disease genes. Whole‐exome sequencing generates a large number of candidate disease‐causing variants and typical analyses rely on deleterious variants being observed in the same gene across several unrelated affected individuals. This is less likely to occur for genetically heterogeneous diseases, making more advanced analysis methods necessary. To address this need, we present HetRank, a flexible gene‐ranking method that incorporates interaction network data. We first show that different genes underlying the same monogenic disease are frequently connected in protein interaction networks. This motivates the central premise of HetRank: those genes carrying potentially pathogenic variants and whose network neighbors do so in other affected individuals are strong candidates for follow‐up study. By simulating 1,000 exome sequencing studies (20,000 exomes in total), we model varying degrees of genetic heterogeneity and show that HetRank consistently prioritizes more disease‐causing genes than existing analysis methods. We also demonstrate a proof‐of‐principle application of the method to prioritize genes causing Adams‐Oliver syndrome, a genetically heterogeneous rare disease. An implementation of HetRank in R is available via the Website http://sourceforge.net/p/hetrank/. PMID:26394720

  19. Whole-exome sequencing identifies rare pathogenic variants in new predisposition genes for familial colorectal cancer

    PubMed Central

    Esteban-Jurado, Clara; Vila-Casadesús, Maria; Garre, Pilar; Lozano, Juan José; Pristoupilova, Anna; Beltran, Sergi; Muñoz, Jenifer; Ocaña, Teresa; Balaguer, Francesc; López-Cerón, Maria; Cuatrecasas, Miriam; Franch-Expósito, Sebastià; Piqué, Josep M.; Castells, Antoni; Carracedo, Angel; Ruiz-Ponte, Clara; Abulí, Anna; Bessa, Xavier; Andreu, Montserrat; Bujanda, Luis; Caldés, Trinidad; Castellví-Bel, Sergi

    2015-01-01

    Purpose: Colorectal cancer is an important cause of mortality in the developed world. Hereditary forms are due to germ-line mutations in APC, MUTYH, and the mismatch repair genes, but many cases present familial aggregation but an unknown inherited cause. The hypothesis of rare high-penetrance mutations in new genes is a likely explanation for the underlying predisposition in some of these familial cases. Methods: Exome sequencing was performed in 43 patients with colorectal cancer from 29 families with strong disease aggregation without mutations in known hereditary colorectal cancer genes. Data analysis selected only very rare variants (0–0.1%), producing a putative loss of function and located in genes with a role compatible with cancer. Variants in genes previously involved in hereditary colorectal cancer or nearby previous colorectal cancer genome-wide association study hits were also chosen. Results: Twenty-eight final candidate variants were selected and validated by Sanger sequencing. Correct family segregation and somatic studies were used to categorize the most interesting variants in CDKN1B, XRCC4, EPHX1, NFKBIZ, SMARCA4, and BARD1. Conclusion: We identified new potential colorectal cancer predisposition variants in genes that have a role in cancer predisposition and are involved in DNA repair and the cell cycle, which supports their putative involvement in germ-line predisposition to this neoplasm. PMID:25058500

  20. Major Soybean Maturity Gene Haplotypes Revealed by SNPViz Analysis of 72 Sequenced Soybean Genomes

    PubMed Central

    Langewisch, Tiffany; Zhang, Hongxin; Vincent, Ryan; Joshi, Trupti; Xu, Dong; Bilyeu, Kristin

    2014-01-01

    In this Genomics Era, vast amounts of next-generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analyses of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset and among different datasets or organisms. To facilitate the exploration of allelic variation and diversity, we have developed and deployed an in-house computer software to categorize and visualize these haplotypes. The SNPViz software enables users to analyze region-specific haplotypes from single nucleotide polymorphism (SNP) datasets for different sequenced genomes. The examination of allelic variation and diversity of important soybean [Glycine max (L.) Merr.] flowering time and maturity genes may provide additional insight into flowering time regulation and enhance researchers' ability to target soybean breeding for particular environments. For this study, we utilized two available soybean genomic datasets for a total of 72 soybean genotypes encompassing cultivars, landraces, and the wild species Glycine soja. The major soybean maturity genes E1, E2, E3, and E4 along with the Dt1 gene for plant growth architecture were analyzed in an effort to determine the number of major haplotypes for each gene, to evaluate the consistency of the haplotypes with characterized variant alleles, and to identify evidence of artificial selection. The results indicated classification of a small number of predominant haplogroups for each gene and important insights into possible allelic diversity for each gene within the context of known causative mutations. The software has both a stand-alone and web-based version and can be used to analyze other genes, examine additional soybean datasets, and view similar genome sequence and SNP datasets from other species. PMID:24727730

  1. Transcriptome Sequencing and Gene Expression Analysis of Trichoderma brevicompactum under Different Culture Conditions

    PubMed Central

    Shentu, Xu-Ping; Liu, Wei-Ping; Zhan, Xiao-Huan; Xu, Yi-Peng; Xu, Jian-Feng; Yu, Xiao-Ping; Zhang, Chuan-Xi

    2014-01-01

    Background Trichoderma brevicompactum is the Trichoderma species producing simple trichothecenes-trichodermin, a potential antifungal antibiotic and a protein synthesis inhibitor. However, the biosynthetic pathway of trichodermin in Trichoderma is not completely clarified. Therefore, transcriptome and gene expression profiling data for this species are needed as an important resource to better understand the mechanism of the trichodermin biosynthesis and provide a blueprint for further study of T. brevicompactum. Results In this study, de novo assembly of the T. brevicompactum transcriptome using the short-read sequencing technology (Illumina) was performed. In addition, two digital gene expression (DGE) libraries of T. brevicompactum under the trichodermin-producing and trichodermin-nonproducing culture conditions, respectively, were constructed to identify the differences in gene expression. A total of 23,351 unique transcripts with a mean length of 856 bp were obtained by a new Trinity de novo assembler. The variations of the gene expression under different culture conditions were also identified. The expression profiling data revealed that 3,282 unique transcripts had a significantly differential expression under the trichodermin-producing condition, as compared to the trichodermin-nonproducing condition. This study provides a large amount of transcript sequence data that will contribute to the study of the trichodermin biosynthesis in T. brevicompactum. Furthermore, quantitative real-time PCR (qRT-PCR) was found to be useful to confirm the differential expression of the unique transcripts. Conclusion Our study provides considerable gene expression information of T. brevicompactum at the transcriptional level, which will help accelerate the research on the trichodermin biosynthesis. Additionally, we have demonstrated the feasibility of using the Illumina sequencing based DGE system for gene expression profiling, and have shed new light on functional studies of

  2. Targeted enrichment of the black cottonwood (Populus trichocarpa) gene space using sequence capture

    PubMed Central

    2012-01-01

    Background High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions. Results We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or ‘poplar’). A total of 20.76Mb (5%) of the poplar genome was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250bp away from any bait) were present in the data, but on average ~80bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection. Conclusions Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies. PMID:23241106
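    The design numbers in this abstract are mutually consistent, as a short calculation shows. The derived quantities below (mean target length per bait and implied genome size) are inferences from the stated figures, not values reported by the authors, and the variable names are illustrative.

```python
# Figures stated in the abstract
target_bp = 20.76e6        # 20.76 Mb targeted
target_fraction = 0.05     # about 5% of the genome
n_baits = 173_040
samples_per_lane = 12
lanes = 4

# Derived values (inferred here, not reported in the abstract)
mean_bp_per_bait = target_bp / n_baits                 # about 120 bp
implied_genome_mb = target_bp / target_fraction / 1e6  # about 415 Mb
total_samples = samples_per_lane * lanes               # matches the 48 genotypes

print(round(mean_bp_per_bait), round(implied_genome_mb), total_samples)
```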

  3. Identification of planarian homeobox sequences indicates the antiquity of most Hox/homeotic gene subclasses.

    PubMed Central

    Balavoine, G; Telford, M J

    1995-01-01

    The homeotic gene complex (HOM-C) is a cluster of genes involved in the anteroposterior axial patterning of animal embryos. It is composed of homeobox genes belonging to the Hox/HOM superclass. Originally discovered in Drosophila, Hox/HOM genes have been identified in organisms as distantly related as arthropods, vertebrates, nematodes, and cnidarians. Data obtained in parallel from the organization of the complex, the domains of gene expression during embryogenesis, and phylogenetic relationships allow the subdivision of the Hox/HOM superclass into five classes (lab, pb/Hox3, Dfd, Antp, and Abd-B) that appeared early during metazoan evolution. We describe a search for homologues of these genes in platyhelminths, triploblast metazoans emerging as an outgroup to the great coelomate ensemble. A degenerate PCR screening for Hox/HOM homeoboxes in three species of triclad planarians has revealed 10 types of Antennapedia-like genes. The homeobox-containing sequences of these PCR fragments allowed the amplification of the homeobox-coding exons for five of these genes in the species Polycelis nigra. A phylogenetic analysis shows that two genes are clear orthologues of Drosophila labial, four others are members of a Dfd/Antp superclass, and a seventh gene, although more difficult to classify with certainty, may be related to the pb/Hox3 class. Together with previously identified Hox/HOM genes in other flatworms, our analyses demonstrate the existence of an elaborate family of Hox/HOM genes in the ancestor of all triploblast animals. PMID:7638172

  4. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species of which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and the 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed as an exact match only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all the species except frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.
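    Searching a 3'-UTR for a short signal such as ATTAAA or TTTTTAT while tolerating one or two mismatches, as described in this abstract, can be sketched with a sliding window and a mismatch count. This is not the authors' method, which the abstract does not specify; the function name and the example sequences are made up for illustration.

```python
def find_motif(seq: str, motif: str, max_mismatches: int = 0):
    """Return (position, window, mismatches) for every window of seq that
    matches motif with at most max_mismatches substitutions (0-based positions)."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Made-up sequences, for illustration only
print(find_motif("GGATTAAAGG", "ATTAAA"))          # exact match at position 2
print(find_motif("CCTTTTCATCC", "TTTTTAT", 1))     # one-mismatch match at position 2
```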

  5. Sequence analysis and expression of the human parainfluenza type 1 virus nucleoprotein gene.

    PubMed

    Matsuoka, Y; Ray, R

    1991-03-01

    The nucleotide sequence of the human parainfluenza type 1 (PI1) virus nucleoprotein (NP) gene was determined from cDNA clones of the mRNA. A cDNA clone, 4-31, containing a 1.7-kb insert, was identified as a PI1 NP-specific clone by partial nucleotide sequence analysis and sequence comparison with Sendai virus (SV) NP. Using a vaccinia virus transient expression system, a polypeptide with electrophoretic mobility similar to that of PI1 NP was synthesized from this clone, which reacted specifically with both polyclonal and monoclonal antibodies against PI1 virus and PI1 NP, respectively. The complete nucleotide sequence of this clone was determined and was found to contain a single open reading frame that can encode a protein of 524 amino acids with a predicted molecular weight of 57,547. Comparison of the amino acid sequence of PI1 NP with that of other paramyxoviruses showed that two conserved amino acid sequences found within other paramyxoviruses are also present in PI1 NP. Although PI1 and SV showed a high sequence homology, approximately 100 amino acids at the C-terminal region were highly divergent as found also among other paramyxoviruses.

  6. New Hosts of Simplicimonas similis and Trichomitus batrachorum Identified by 18S Ribosomal RNA Gene Sequences

    PubMed Central

    Dimasuay, Kris Genelyn B.; Lavilla, Orlie John Y.; Rivera, Windell L.

    2013-01-01

    Trichomonads are obligate anaerobes generally found in the digestive and genitourinary tract of domestic animals. In this study, four trichomonad isolates were obtained from carabao, dog, and pig hosts using rectal swabs. Genomic DNA was extracted using the Chelex method, and the 18S rRNA gene was successfully amplified with novel sets of primers and then sequenced. Aligned isolate sequences together with retrieved 18S rRNA gene sequences of known trichomonads were utilized to generate phylogenetic trees using maximum likelihood and neighbor-joining analyses. Two isolates from carabao were identified as Simplicimonas similis while each isolate from dog and pig was identified as Pentatrichomonas hominis and Trichomitus batrachorum, respectively. This is the first report of S. similis in carabao and the identification of T. batrachorum in pig using 18S rRNA gene sequence analysis. The generated phylogenetic tree yielded three distinct groups mostly with relatively moderate to high bootstrap support and in agreement with the most recent classification. Pathogenic potential of the trichomonads in these hosts still needs further investigation. PMID:23936631

  7. Massive parallel IGHV gene sequencing reveals a germinal center pathway in origins of human multiple myeloma.

    PubMed

    Cowan, Graeme; Weston-Bell, Nicola J; Bryant, Dean; Seckinger, Anja; Hose, Dirk; Zojer, Niklas; Sahota, Surinder S

    2015-05-30

    Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question when during maturation neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by on-going SHM. Strikingly, the data parallel IGHV clonal sequences in some monoclonal gammopathy of undetermined significance (MGUS) known to display on-going SHM imprints. Since MGUS generally precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational ICV from the same GC B-cell, arising via a distinctive pathway.

  8. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  9. Sequence Diversity of VP4 and VP7 Genes of Human Rotavirus Strains in Saudi Arabia.

    PubMed

    Abdel-Moneim, Ahmed S; Al-Malky, Mater I R; Alsulaimani, Adnan A A; Abuelsaad, Abdelaziz S A; Mohamed, Imad; Ismail, Ayman K

    2015-12-01

    Group A rotavirus is responsible for inducing severe diarrhea in young children worldwide. Rotavirus vaccines are used to control the disease in many countries. In the current study, the sequences of human rotavirus G and P types in Saudi Arabia are reported and compared to different relevant published sequences. In addition, the VP4 and VP7 genes of the G1P[8] strains are compared to different antigenic epitopes of the rotavirus vaccines. Stool samples were collected from children under 2 years suffering from severe diarrhea. Screening of the rotavirus-positive samples was performed with a rapid antigen detection kit. RNA was amplified from rotavirus-positive samples by reverse transcriptase polymerase chain reaction assay for both VP4 and VP7 genes. Direct sequencing of the VP4 and VP7 genes was conducted and the obtained sequences were compared to each other and to the rotavirus vaccines. Both G1P[8] and G1P[4] genotypes were detected. Phylogenetic analysis revealed that the detected strains belong to G1 lineage 1 and 2, P[8] lineage 3, and to P[4] lineage 5. Multiple amino acid substitutions were detected between the Saudi RVA strains and the commonly used vaccines. The current findings emphasize the importance of the continuous surveillance of the circulating rotavirus strains, which is crucial for monitoring virus evolution and helping in predicting the protection level afforded by rotavirus vaccines.

  10. Identification of gene sequences overexpressed in senescent and Werner syndrome human fibroblasts.

    PubMed

    Lecka-Czernik, B; Moerman, E J; Jones, R A; Goldstein, S

    1996-01-01

    The phenotype of replicative senescence is a dominant trait in human diploid fibroblasts (HDF). Therefore, we have sought to identify overexpressed and/or newly expressed causal genes by constructing and screening a subtracted cDNA library derived from polyA+RNA of prematurely senescent Werner syndrome (WS) HDF. We have identified 15 cDNA clones that are overexpressed in senescent and WS HDF. Among them are six known sequences coding for: acid sphingomyelinase, fibronectin, SPARC, nm23-metastasis suppressor protein, and two translation factors, eIF-2 beta and EF-1 alpha. Among the nine unknown clones are: S1-5, which encodes a secreted protein containing EGF-like domains and paradoxically stimulates DNA synthesis of young HDF in an autocrine and paracrine manner; S1-3, which encodes a protein containing "zinc finger" domains, suggesting nucleic acid binding properties; S1-15, which shows sequence similarities to human alpha 2-chimerin; and S2-6, which represents a new member of the LIM family of proteins. The other five clones do not have any significant homology to known sequences. Steady-state mRNA levels of all gene sequences thus far studied are elevated in both WS and senescent normal HDF when compared to young HDF, which suggests that senescent and WS HDF enter a final common pathway where multiple gene overexpression may generate diverse antiproliferative mechanisms and pathogenic sequelae. PMID:8706786

  11. Cloning and sequencing of the major intracellular serine protease gene of Bacillus subtilis.

    PubMed Central

    Koide, Y; Nakamura, A; Uozumi, T; Beppu, T

    1986-01-01

    A Bacillus subtilis 2.7-kilobase DNA fragment containing an intracellular protease gene was cloned into Escherichia coli. The transformants produced an intracellular protease of approximately 35,000 Mr whose activity was inhibited by both phenylmethylsulfonyl fluoride and EDTA. Introduction of the fragment on a multicopy vector, pUB110, into B. subtilis caused a marked increase in the level of the intracellular protease. The nucleotide sequence of the cloned fragment showed the presence of an open reading frame for a possible proenzyme of the major intracellular serine protease (ISP-I) of B. subtilis with an NH2-terminal 17- or 20-amino-acid extension. The total amino acid sequence of the protease deduced from the nucleotide sequence showed considerable homology with that of an extracellular serine protease, subtilisin. The transcriptional initiation site of the ISP-I gene was identified by nuclease S1 mapping. No typical conserved sequence for promoters was found upstream of the open reading frame. An ISP-I-negative mutant of B. subtilis was constructed by integration of artificially deleted gene into the chromosome. The mutant sporulated normally in a nutritionally rich medium but showed decreased sporulation in a synthetic medium. The chloramphenicol resistance determinant of a plasmid integrated at the ISP-I locus was mapped by PBS1 transduction and was found to be closely linked to metC (99.5%). PMID:3087947

  12. Comparison of exon 5 sequences from 35 class I genes of the BALB/c mouse

    PubMed Central

    1989-01-01

    DNA sequences of the fifth exon, which encodes the transmembrane domain, were determined for the BALB/c mouse class I MHC genes and used to study the relationships between them. Based on nucleotide sequence similarity, the exon 5 sequences can be divided into seven groups. Although most members within each group are at least 80% similar to each other, comparison between groups reveals that the groups share little similarity. However, in spite of the extensive variation of the fifth exon sequences, analysis of their predicted amino acid translations reveals that only four class I gene fifth exons have frameshifts or stop codons that terminate their translation and prevent them from encoding a domain that is both hydrophobic and long enough to span a lipid bilayer. Exactly 27 of the remaining fifth exons could encode a domain that is similar to those of the transplantation antigens in that it consists of a proline-rich connecting peptide, a transmembrane segment, and a cytoplasmic portion with membrane-anchoring basic residues. The conservation of this motif in the majority of the fifth exon translations in spite of extensive variation suggests that selective pressure exists for these exons to maintain their ability to encode a functional transmembrane domain, raising the possibility that many of the nonclassical class I genes encode functionally important products. PMID:2584927

  13. Molecular cloning, sequence analysis and tissue-specific expression of Akirin2 gene in Tianfu goat.

    PubMed

    Ma, Jisi; Xu, Gangyi; Wan, Lu; Wang, Nianlu

    2015-01-01

    The Akirin2 gene encodes a nuclear factor and is considered a potential functional candidate gene for meat quality. To better understand the structure and function of the Akirin2 gene, the cDNA of the Tianfu goat Akirin2 gene was cloned. Sequence analysis showed that the full coding sequence (CDS) of the Tianfu goat Akirin2 cDNA contains 579 bp that encode 192 amino acids. A phylogenetic tree of the Akirin2 protein sequence from the Tianfu goat and other species revealed that the Tianfu goat Akirin2 was closely related to cattle and sheep Akirin2. RT-qPCR analysis showed that Akirin2 was expressed in the myocardium, liver, spleen, lung, kidney, leg muscle, abdominal muscle and the longissimus dorsi muscle. In particular, high expression levels of Akirin2 were detected in the spleen, lung, and kidney, whereas lower expression levels were seen in the liver, myocardium, leg muscle, abdominal muscle and longissimus dorsi muscle. Temporal mRNA expression showed that Akirin2 expression levels in the longissimus dorsi muscle first increased and then decreased from day 1 to month 12. Western blotting results showed that the Akirin2 protein was only detected in the lung and three skeletal muscle tissues.
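
    The relationship between the reported CDS length and protein length can be checked with simple codon arithmetic. The sketch below assumes that the 579 bp figure includes the stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def protein_length_from_cds(cds_length_bp, includes_stop_codon=True):
    """Number of amino acids encoded by a CDS of the given length.

    Assumes a complete, in-frame CDS; raises if the length is not a
    multiple of three.
    """
    if cds_length_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = cds_length_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop_codon else codons


# 579 bp / 3 = 193 codons; minus the stop codon leaves 192 residues.
print(protein_length_from_cds(579))
```

    Under that assumption, the 579 bp CDS and the 192-residue protein reported in the abstract are consistent with each other.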

  14. Promoter-like sequences regulating transcriptional activity in neurexin and neuroligin genes.

    PubMed

    Runkel, Fabian; Rohlmann, Astrid; Reissner, Carsten; Brand, Stefan-Martin; Missler, Markus

    2013-10-01

    Synapse function requires the cell-adhesion molecules neurexins (Nrxn) and neuroligins (Nlgn). Although these molecules are essential for neurotransmission and prefer distinct isoform combinations for interaction, little is known about their transcriptional regulation. Here, we started to explore this important aspect because expression of Nrxn1-3 and Nlgn1-3 genes is altered in mice lacking the transcriptional regulator methyl-CpG-binding protein2 (MeCP2). Since MeCP2 can bind to methylated CpG-dinucleotides and Nrxn/Nlgn contain CpG-islands, we tested genomic sequences for transcriptional activity in reporter gene assays. We found that their influence on transcription is differentially activating or inhibiting. As we observed an activity difference between heterologous and neuronal cell lines for distinct Nrxn1 and Nlgn2 sequences, we dissected their putative promoter regions. In both genes, we identify regions in exon1 that can induce transcription, in addition to the alternative transcriptional start points in exon2. While the 5'-regions of Nrxn1 and Nlgn2 contain two CpG-rich elements that show distinct methylation frequency and binding to MeCP2, other regions may act independently of this transcriptional regulator. These data provide first insights into regulatory sequences of Nrxn and Nlgn genes that may represent an important aspect of their function at synapses in health and disease.

  15. Drosophila GRAIL: An intelligent system for gene recognition in Drosophila DNA sequences

    SciTech Connect

    Xu, Ying; Einstein, J.R.; Uberbacher, E.C.; Helt, G.; Rubin, G.

    1995-06-01

    An AI-based system for gene recognition in Drosophila DNA sequences was designed and implemented. The system consists of two main modules, one for coding exon recognition and one for single gene model construction. The exon recognition module finds a coding exon by recognition of its splice junctions (or translation start) and coding potential. The core of this module is a set of neural networks which evaluate an exon candidate for the possibility of being a true coding exon using the "recognized" splice junction (or translation start) and coding signals. The recognition process consists of four steps: generation of an exon candidate pool, elimination of improbable candidates using heuristic rules, candidate evaluation by trained neural networks, and candidate cluster resolution and final exon prediction. The gene model construction module takes as input the clustered exon candidates and builds a "best" possible single gene model using an efficient dynamic programming algorithm. 129 Drosophila sequences consisting of 441 coding exons including 216358 coding bases were extracted from GenBank and used to build statistical matrices and to train the neural networks. On this training set the system recognized 97% of the coding messages and predicted only 5% false messages. Among the "correctly" predicted exons, 68% match the actual exon exactly and 96% have at least one edge predicted correctly. On an independent test set consisting of 30 Drosophila sequences, the system recognized 96% of the coding messages and predicted 7% false messages.
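
    The abstract names only "an efficient dynamic programming algorithm" for gene model construction and gives no further detail. As a rough illustration of the general idea, and not the GRAIL method itself, the sketch below treats each exon candidate as a scored interval and picks the non-overlapping subset with the highest total score (classic weighted interval scheduling). The function name, the tuple layout, and the scores are all assumptions; a real gene model builder would also have to respect reading-frame and splice-site compatibility.

```python
from bisect import bisect_right


def best_gene_model(candidates):
    """Pick non-overlapping exon candidates with maximal total score.

    candidates: list of (start, end, score) with inclusive coordinates.
    Returns (total_score, chosen_exons_in_genomic_order).
    """
    exons = sorted(candidates, key=lambda e: e[1])  # sort by end position
    ends = [e[1] for e in exons]
    n = len(exons)
    best = [0] * (n + 1)      # best[i]: best score using the first i exons
    take = [False] * (n + 1)  # whether exon i is used in best[i]
    prev = [0] * (n + 1)      # how many earlier exons end before exon i starts
    for i, (start, _end, score) in enumerate(exons, 1):
        # Count earlier exons that end strictly before this one starts.
        prev[i] = bisect_right(ends, start - 1, 0, i - 1)
        with_exon = best[prev[i]] + score
        if with_exon > best[i - 1]:
            best[i], take[i] = with_exon, True
        else:
            best[i] = best[i - 1]
    # Trace back through the table to recover the chosen exons.
    chosen, i = [], n
    while i > 0:
        if take[i]:
            chosen.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Made-up candidates: (start, end, score), e.g. a network score scaled to 0-100.
pool = [(10, 50, 90), (40, 90, 80), (60, 120, 70), (130, 200, 60)]
print(best_gene_model(pool))
```

    Sorting by end position and using a binary search keeps the whole procedure at O(n log n), which is one plausible reading of "efficient" for a candidate pool of this kind.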

  16. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  17. Identification and characterization of rhizospheric microbial diversity by 16S ribosomal RNA gene sequencing

    PubMed Central

    Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad

    2014-01-01

    In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Identification of bacterial isolates was made by 16S rRNA gene sequence analysis and taxonomical confirmation on EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of bacterial strains with the respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological, and biochemical tests and for the glucose dehydrogenase (gdh) gene that is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root nodulating strains are positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts like field grown crops, leguminous and non-leguminous plants. It was concluded that a diverse group of bacterial populations exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant microbial interactions, and that strains QAU-63 and QAU-68 have sequence similarity of 97 and 95%, which might be declared as novel after further taxonomic characterization. PMID:25477935

  18. Identification and characterization of rhizospheric microbial diversity by 16S ribosomal RNA gene sequencing.

    PubMed

    Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad

    2014-01-01

    In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Identification of bacterial isolates was made by 16S rRNA gene sequence analysis and taxonomical confirmation on EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of bacterial strains with the respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological, and biochemical tests and for the glucose dehydrogenase (gdh) gene that is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root nodulating strains are positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts like field grown crops, leguminous and non-leguminous plants. It was concluded that a diverse group of bacterial populations exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant microbial interactions, and that strains QAU-63 and QAU-68 have sequence similarity of 97 and 95%, which might be declared as novel after further taxonomic characterization.

  19. Zooplankton diversity analysis through single-gene sequencing of a community sample

    PubMed Central

    Machida, Ryuji J; Hashiguchi, Yasuyuki; Nishida, Mutsumi; Nishida, Shuhei

    2009-01-01

    Background: Oceans cover more than 70% of the earth's surface and are critical for the homeostasis of the environment. Among the components of the ocean ecosystem, zooplankton play vital roles in energy and matter transfer through the system. Despite their importance, understanding of zooplankton biodiversity is limited because of their fragile nature, small body size, and the large number of species from various taxonomic phyla. Here we present the results of single-gene zooplankton community analysis using a method that determines a large number of mitochondrial COI gene sequences from a bulk zooplankton sample. This approach will enable us to estimate the species richness of almost the entire zooplankton community. Results: A sample was collected from a depth of 721 m to the surface in the western equatorial Pacific off Pohnpei Island, Micronesia, with a plankton net equipped with a 2-m² mouth opening. A total of 1,336 mitochondrial COI gene sequences were determined from the cDNA library made from the sample. From the determined sequences, the occurrence of 189 species of zooplankton was estimated. BLASTN search results showed high degrees of similarity (>98%) between the query and database for 10 species, including holozooplankton and merozooplankton. Conclusion: In conjunction with the Census of Marine Zooplankton and Barcode of Life projects, single-gene zooplankton community analysis will be a powerful tool for estimating the species richness of zooplankton communities. PMID:19758460

  20. Amplification of complete gag gene sequences from geographically distinct equine infectious anemia virus isolates.

    PubMed

    Boldbaatar, Bazartseren; Bazartseren, Tsevel; Koba, Ryota; Murakami, Hironobu; Oguma, Keisuke; Murakami, Kenji; Sentsui, Hiroshi

    2013-04-01

    In the current study, primers described previously and modified versions of these primers were evaluated for amplification of full-length gag genes from different equine infectious anemia virus (EIAV) strains from several countries, including the USA, Germany and Japan. Each strain was inoculated into a primary horse leukocyte culture, and the full-length gag gene was amplified by reverse transcription polymerase chain reaction. Each amplified gag gene was cloned into a plasmid vector for sequencing, and the detectable copy numbers of target DNA were determined. Use of a mixture of two forward primers and one reverse primer in the polymerase chain reaction enabled the amplification of all EIAV strains used in this study. However, further study is required to confirm these primers as universal for all EIAV strains. The nucleotide sequence of gag is considered highly conserved, as evidenced by the use of gag-encoded capsid proteins as a common antigen for the detection of EIAV in serological tests. However, significant sequence variation in the gag genes of different EIAV strains was found in the current study. PMID:23318370

  1. Large scale in silico identification of MYB family genes from wheat expressed sequence tags.

    PubMed

    Cai, Hongsheng; Tian, Shan; Dong, Hansong

    2012-10-01

    The MYB proteins constitute one of the largest transcription factor families in plants. Much research has been performed to determine their structures, functions, and evolution, especially in the model plants Arabidopsis and rice. However, this transcription factor family has been much less studied in wheat (Triticum aestivum), for which no genome sequence is yet available. Despite this, expressed sequence tags are an important resource that permits large scale gene identification. In this study, a total of 218 sequences from wheat were identified and confirmed to be putative MYB proteins, including 1RMYB, R2R3-type MYB, 3RMYB, and 4RMYB types. A total of 36 R2R3-type MYB genes with complete open reading frames were obtained. The putative orthologs were assigned in rice and Arabidopsis based on the phylogenetic tree. Tissue-specific expression pattern analyses confirmed the predicted orthologs, and this meant that gene information could be inferred from the Arabidopsis genes. Moreover, the motifs flanking the MYB domain were analyzed using the MEME web server. The distribution of motifs among wheat MYB proteins was investigated and this facilitated subfamily classification.

  2. Use of dedicated gene panel sequencing using next generation sequencing to improve the personalized care of lung cancer.

    PubMed

    Kaderbhai, Coureche Guillaume; Boidot, Romain; Beltjens, Françoise; Chevrier, Sandy; Arnould, Laurent; Favier, Laure; Lagrange, Aurélie; Coudert, Bruno; Ghiringhelli, François

    2016-04-26

    Advances in Next Generation Sequencing (NGS) technologies have improved the ability to detect potentially targetable mutations. However, the integration of NGS into clinical management in an individualized manner remains challenging. In this single-center observational study, we performed a dedicated NGS panel studying 41 cancer-related genes in 50 consecutive patients with metastatic non-small-cell lung cancer between May 2012 and October 2014. Molecular analysis could be performed in 48 patients with a good quality check. One hundred and thirty-three mutations, of which twenty-four were unique, were detected. At least one mutation was found in 46 patients. In 58% of cases, the Molecular Tumor Board (MTB) was able to recommend treatment with a targeted agent based on the evaluation of the tumor genetic profile and treatment history. Nine patients (18%) were subsequently treated with an MTB-recommended targeted therapy; four patients experienced a clinical benefit with a partial response or stabilization lasting more than 4 months. In this case series involving patients with metastatic non-small cell lung cancer, we show that including integrative clinical sequencing data in routine clinical management was feasible and could influence the therapeutic proposal made to patients.

  3. Use of dedicated gene panel sequencing using next generation sequencing to improve the personalized care of lung cancer

    PubMed Central

    Beltjens, Françoise; Chevrier, Sandy; Arnould, Laurent; Favier, Laure; Lagrange, Aurélie

    2016-01-01

    Advances in Next Generation Sequencing (NGS) technologies have improved the ability to detect potentially targetable mutations. However, the integration of NGS into clinical management in an individualized manner remains challenging. In this single-center observational study, we performed a dedicated NGS panel studying 41 cancer-related genes in 50 consecutive patients with metastatic non-small-cell lung cancer between May 2012 and October 2014. Molecular analysis could be performed in 48 patients with a good quality check. One hundred and thirty-three mutations, of which twenty-four were unique, were detected. At least one mutation was found in 46 patients. In 58% of cases, the Molecular Tumor Board (MTB) was able to recommend treatment with a targeted agent based on the evaluation of the tumor genetic profile and treatment history. Nine patients (18%) were subsequently treated with an MTB-recommended targeted therapy; four patients experienced a clinical benefit with a partial response or stabilization lasting more than 4 months. In this case series involving patients with metastatic non-small cell lung cancer, we show that including integrative clinical sequencing data in routine clinical management was feasible and could influence the therapeutic proposal made to patients. PMID:27027238

  4. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing.

    PubMed

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-05-31

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causes and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one-step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies. PMID:27025386

  5. Rapid cloning of disease-resistance genes in plants using mutagenesis and sequence capture.

    PubMed

    Steuernagel, Burkhard; Periyannan, Sambasivam K; Hernández-Pinzón, Inmaculada; Witek, Kamil; Rouse, Matthew N; Yu, Guotai; Hatta, Asyraf; Ayliffe, Mick; Bariana, Harbans; Jones, Jonathan D G; Lagudah, Evans S; Wulff, Brande B H

    2016-06-01

    Wild relatives of domesticated crop species harbor multiple, diverse, disease resistance (R) genes that could be used to engineer sustainable disease control. However, breeding R genes into crop lines often requires long breeding timelines of 5-15 years to break linkage between R genes and deleterious alleles (linkage drag). Further, when R genes are bred one at a time into crop lines, the protection that they confer is often overcome within a few seasons by pathogen evolution. If several cloned R genes were available, it would be possible to pyramid R genes in a crop, which might provide more durable resistance. We describe a three-step method (MutRenSeq) that combines chemical mutagenesis with exome capture and sequencing for rapid R gene cloning. We applied MutRenSeq to clone stem rust resistance genes Sr22 and Sr45 from hexaploid bread wheat. MutRenSeq can be applied to other commercially relevant crops and their relatives, including, for example, pea, bean, barley, oat, rye, rice and maize. PMID:27111722

  6. Gene Profiling of Bone around Orthodontic Mini-Implants by RNA-Sequencing Analysis

    PubMed Central

    Nahm, Kyung-Yen; Heo, Jung Sun; Lee, Jae-Hyung; Lee, Dong-Yeol; Chung, Kyu-Rhim; Ahn, Hyo-Won; Kim, Seong-Hun

    2015-01-01

    This study aimed to evaluate the genes that were expressed in the healing bones around SLA-treated titanium orthodontic mini-implants in a beagle at early (1-week) and late (4-week) stages with RNA-sequencing (RNA-Seq). Samples from sites of surgical defects were used as controls. Total RNA was extracted from the tissue around the implants, and an RNA-Seq analysis was performed with Illumina TruSeq. In the 1-week group, genes in the gene ontology (GO) categories of cell growth and the extracellular matrix (ECM) were upregulated, while genes in the categories of the oxidation-reduction process, intermediate filaments, and structural molecule activity were downregulated. In the 4-week group, the genes upregulated included ECM binding, stem cell fate specification, and intramembranous ossification, while genes in the oxidation-reduction process category were downregulated. GO analysis revealed an upregulation of genes that were related to significant mechanisms, including those with roles in cell proliferation, the ECM, growth factors, and osteogenic-related pathways, which are associated with bone formation. From these results, implant-induced bone formation progressed considerably during the times examined in this study. The upregulation or downregulation of selected genes was confirmed with real-time reverse transcription polymerase chain reaction. The RNA-Seq strategy was useful for defining the biological responses to orthodontic mini-implants and identifying the specific genetic networks for targeted evaluations of successful peri-implant bone remodeling. PMID:25759820

  7. Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid bread wheat

    PubMed Central

    2014-01-01

    Background: Bread wheat (Triticum aestivum) has a large, complex and hexaploid genome consisting of A, B and D homoeologous chromosome sets. Therefore each wheat gene potentially exists as a trio of A, B and D homoeoloci, each of which may contribute differentially to wheat phenotypes. We describe a novel approach combining wheat cytogenetic resources (chromosome substitution ‘nullisomic-tetrasomic’ lines) with next generation deep sequencing of gene transcripts (RNA-Seq), to directly and accurately identify homoeologue-specific single nucleotide variants and quantify the relative contribution of individual homoeoloci to gene expression. Results: We discover, based on a sample comprising ~5-10% of the total wheat gene content, that at least 45% of wheat genes are expressed from all three distinct homoeoloci. Most of these genes show strikingly biased expression patterns in which expression is dominated by a single homoeolocus. The remaining ~55% of wheat genes are expressed from either one or two homoeoloci only, through a combination of extensive transcriptional silencing and homoeolocus loss. Conclusions: We conclude that wheat is tending towards functional diploidy, through a variety of mechanisms causing single homoeoloci to become the predominant source of gene transcripts. This discovery has profound consequences for wheat breeding and our understanding of wheat evolution. PMID:24726045

  8. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing

    PubMed Central

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-01-01

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causatives and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies. PMID:27025386

  9. Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis.

    PubMed

    Awad, A; Khalil, S R; Abd-Elhakim, Y M

    2015-01-01

    Veritable identification and differentiation of avian species is a vital step in conservation, taxonomic, forensic, legal and other ornithological interventions. Therefore, this study involved the application of a molecular approach to identify some avian species, i.e. Chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), Laughing dove (Streptopelia senegalensis), and Rock pigeon (Columba livia). Genomic DNA was extracted from blood samples and a partial sequence of the mitochondrial cytochrome b gene (358 bp) was amplified and sequenced using universal primers. Sequence alignment and phylogenetic analyses were performed with the CLC Main Workbench program. The five sequences obtained were deposited in GenBank and compared with those previously registered in GenBank. The similarity percentage was 88.60% between Gallus gallus and Coturnix japonica and 80.46% between Gallus gallus and Columba livia. The percentage of identity between the studied species and GenBank species ranged from 77.20% (Columba oenas and Anas platyrhynchos) to 100% (Gallus gallus and Gallus sonneratii, Coturnix coturnix and Coturnix japonica, Meleagris gallopavo and Columba livia). Amplification of the partial sequence of the mitochondrial cytochrome b gene proved to be practical for unambiguous identification of an avian species. PMID:27175180

  10. Molecular phylogeny of some avian species using Cytochrome b gene sequence analysis

    PubMed Central

    Awad, A; Khalil, S. R; Abd-Elhakim, Y. M

    2015-01-01

    Veritable identification and differentiation of avian species is a vital step in conservation, taxonomic, forensic, legal and other ornithological interventions. Therefore, this study involved the application of a molecular approach to identify some avian species, i.e. Chicken (Gallus gallus), Muscovy duck (Cairina moschata), Japanese quail (Coturnix japonica), Laughing dove (Streptopelia senegalensis), and Rock pigeon (Columba livia). Genomic DNA was extracted from blood samples and a partial sequence of the mitochondrial cytochrome b gene (358 bp) was amplified and sequenced using universal primers. Sequence alignment and phylogenetic analyses were performed with the CLC Main Workbench program. The five sequences obtained were deposited in GenBank and compared with those previously registered in GenBank. The similarity percentage was 88.60% between Gallus gallus and Coturnix japonica and 80.46% between Gallus gallus and Columba livia. The percentage of identity between the studied species and GenBank species ranged from 77.20% (Columba oenas and Anas platyrhynchos) to 100% (Gallus gallus and Gallus sonneratii, Coturnix coturnix and Coturnix japonica, Meleagris gallopavo and Columba livia). Amplification of the partial sequence of the mitochondrial cytochrome b gene proved to be practical for unambiguous identification of an avian species. PMID:27175180

  11. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25°C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  12. Exome sequencing followed by genotyping suggests SYPL2 as a susceptibility gene for morbid obesity

    PubMed Central

    Jiao, Hong; Arner, Peter; Gerdhem, Paul; Strawbridge, Rona J; Näslund, Erik; Thorell, Anders; Hamsten, Anders; Kere, Juha; Dahlman, Ingrid

    2015-01-01

    Recently developed high-throughput sequencing technology shows power to detect low-frequency disease-causing variants by deep sequencing of all known exons. We used exome sequencing to identify variants associated with morbid obesity. DNA from 100 morbidly obese adult subjects and 100 controls were pooled (n=10/pool), subjected to exome capture, and subsequent sequencing. At least 100 million sequencing reads were obtained from each pool. After several filtering steps and comparisons of observed frequencies of variants between obese and non-obese control pools, we systematically selected 144 obesity-enriched non-synonymous, splicing site or 5′ upstream single-nucleotide variants for validation. We first genotyped 494 adult subjects with morbid obesity and 496 controls. Five obesity-associated variants (nominal P-value<0.05) were subsequently genotyped in 1425 morbidly obese and 782 controls. Out of the five variants, only rs62623713:A>G (NM_001040709:c.A296G:p.E99G) was confirmed. rs62623713 showed strong association with body mass index (beta=2.13 (1.09, 3.18), P=6.28 × 10^-5) in a joint analysis of all 3197 genotyped subjects and had an odds ratio of 1.32 for obesity association. rs62623713 is a low-frequency (2.9% minor allele frequency) non-synonymous variant (E99G) in exon 4 of the synaptophysin-like 2 (SYPL2) gene. rs62623713 was not covered by Illumina or Affymetrix genotyping arrays used in previous genome-wide association studies. Mice lacking Sypl2 have been reported to display reduced body weight. In conclusion, using exome sequencing we identified a low-frequency coding variant in the SYPL2 gene that was associated with morbid obesity. This gene may be involved in the development of excess body fat. PMID:25406998

  13. Exome sequencing followed by genotyping suggests SYPL2 as a susceptibility gene for morbid obesity.

    PubMed

    Jiao, Hong; Arner, Peter; Gerdhem, Paul; Strawbridge, Rona J; Näslund, Erik; Thorell, Anders; Hamsten, Anders; Kere, Juha; Dahlman, Ingrid

    2015-09-01

    Recently developed high-throughput sequencing technology shows power to detect low-frequency disease-causing variants by deep sequencing of all known exons. We used exome sequencing to identify variants associated with morbid obesity. DNA from 100 morbidly obese adult subjects and 100 controls were pooled (n=10/pool), subjected to exome capture, and subsequent sequencing. At least 100 million sequencing reads were obtained from each pool. After several filtering steps and comparisons of observed frequencies of variants between obese and non-obese control pools, we systematically selected 144 obesity-enriched non-synonymous, splicing site or 5' upstream single-nucleotide variants for validation. We first genotyped 494 adult subjects with morbid obesity and 496 controls. Five obesity-associated variants (nominal P-value<0.05) were subsequently genotyped in 1425 morbidly obese and 782 controls. Out of the five variants, only rs62623713:A>G (NM_001040709:c.A296G:p.E99G) was confirmed. rs62623713 showed strong association with body mass index (beta=2.13 (1.09, 3.18), P=6.28 × 10^-5) in a joint analysis of all 3197 genotyped subjects and had an odds ratio of 1.32 for obesity association. rs62623713 is a low-frequency (2.9% minor allele frequency) non-synonymous variant (E99G) in exon 4 of the synaptophysin-like 2 (SYPL2) gene. rs62623713 was not covered by Illumina or Affymetrix genotyping arrays used in previous genome-wide association studies. Mice lacking Sypl2 have been reported to display reduced body weight. In conclusion, using exome sequencing we identified a low-frequency coding variant in the SYPL2 gene that was associated with morbid obesity. This gene may be involved in the development of excess body fat. PMID:25406998

  14. Evidence on primate phylogeny from epsilon-globin gene sequences and flanking regions.

    PubMed

    Porter, C A; Sampaio, I; Schneider, H; Schneider, M P; Czelusniak, J; Goodman, M

    1995-01-01

    Phylogenetic relationships among various primate groups were examined based on sequences of epsilon-globin genes. epsilon-globin genes were sequenced from five species of strepsirhine primates. These sequences were aligned and compared with other known primate epsilon-globin sequences, including data from two additional strepsirhine species, one species of tarsier, 19 species of New World monkeys (representing all extant genera), and five species of catarrhines. In addition, a 2-kb segment upstream of the epsilon-globin gene was sequenced in two of the five strepsirhines examined. This upstream sequence was aligned with five other species of primates for which data are available in this segment. Domestic rabbit and goat were used as outgroups. This analysis supports the monophyly of order Primates but does not support the traditional prosimian grouping of tarsiers, lorisoids, and lemuroids; rather it supports the sister grouping of tarsiers and anthropoids into Haplorhini and the sister grouping of lorisoids and lemuroids into Strepsirhini. The mouse lemur (Microcebus murinus) and dwarf lemur (Cheirogaleus medius) appear to be most closely related to each other, forming a clade with the lemuroids, and are probably not closely related to the lorisoids, as suggested by some morphological studies. Analysis of the epsilon-globin data supports the hypothesis that the aye-aye (Daubentonia madagascariensis)