Science.gov

Sample records for 16s-rrna gene sequencing

  1. CHARACTERIZATION OF BACTERIAL BIOMASS IN MARINE SEDIMENTS BENEATH THE ROSS ICE SHEET, ANTARCTICA BY PHOSPHOLIPIDS ANALYSIS AND 16S RRNA GENE SEQUENCING

    NASA Astrophysics Data System (ADS)

    Carr, S. A.; Glossner, A. W.; Dunbar, R. B.; Vogel, S. W.; Brandes, J.; Sahl, J. W.; Pepe-Ranney, C.; Spear, J. R.; Naish, T.; Powell, R. D.; Mandernack, K. W.

    2009-12-01

    heterotrophic organisms dominate these sediments, with the implication that primary productivity is derived from above. Integrating structural analyses and δ13C values of phospholipids, porewater chemistry, and δ13CDIC values with 16S rRNA gene sequences provides a more comprehensive understanding of the biogeochemical influences of microbial carbon cycling that occur beneath marine sediments of Antarctica and elsewhere.

  2. PHYLOGENETIC ANALYSIS OF 16S RRNA GENE SEQUENCES REVEALS THE PREVALENCE OF MYCOBACTERIA SP., ALPHA-PROTEOBACTERIA, AND UNCULTURED BACTERIA IN DRINKING WATER MICROBIAL COMMUNITIES

    EPA Science Inventory

    Previous studies have shown that culture-based methods tend to underestimate the densities and diversity of bacterial populations inhabiting water distribution systems (WDS). In this study, the phylogenetic diversity of drinking water bacteria was assessed using sequence analysis...

  3. MULTIPLE ENZYME RESTRICTION FRAGMENT LENGTH POLYMORPHISM ANALYSIS FOR HIGH RESOLUTION DISTINCTION OF PSEUDOMONAS (SENSU STRICTO) 16S RRNA GENES

    EPA Science Inventory

    Pseudomonas specific 16S rDNA PCR amplification and multiple enzyme restriction fragment length polymorphism (MERFLP) analysis using a single digestion mixture of Alu I, Hinf I, Rsa I, and Tru 9I distinguished 150 published sequences and reference strains of authentic Pseudomonas...

  4. Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...

  5. A HIGHLY SELECTIVE PCR PROTOCOL FOR DETECTING 16S RRNA GENES OF THE GENUS PSEUDOMONAS (SENSU STRICTO) IN ENVIRONMENTAL SAMPLES

    EPA Science Inventory

    Pseudomonas species are plant, animal, and human pathogens; exhibit plant pathogen-suppressing properties useful in biological control; or express metabolic versatilities valued in biotechnology and bioremediation. Specific detection of Pseudomonas species in the environment may ...

  6. [Gene and gene sequence patenting].

    PubMed

    Bergel, S D

    1998-01-01

    According to the author, the patenting of elements isolated or copied from the human body boils down to the issue of genes and gene sequences. He describes the current situation from the comparative law standpoint (U.S. and Spanish law mainly) and then examines the biotechnology industry's position.

  7. Disease gene identification strategies for exome sequencing

    PubMed Central

    Gilissen, Christian; Hoischen, Alexander; Brunner, Han G; Veltman, Joris A

    2012-01-01

    Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major challenge, and novel variant prioritization strategies are required. The choice of these strategies depends on the availability of well-phenotyped patients and family members, the mode of inheritance, the severity of the disease and its population frequency. In this review, we discuss the current strategies for Mendelian disease gene identification by exome resequencing. We conclude that exome strategies are successful and identify new Mendelian disease genes in approximately 60% of the projects. Improvements in bioinformatics as well as in sequencing technology will likely increase the success rate even further. Exome sequencing is likely to become the most commonly used tool for Mendelian disease gene identification for the coming years. PMID:22258526

  8. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  9. Cloning and Sequencing the First HLA Gene

    PubMed Central

    Jordan, Bertrand R.

    2010-01-01

    This Perspectives article recounts the isolation and sequencing of the first human histocompatibility gene (HLA) in 1980–1981. At the time, general knowledge of the molecules of the immune system was already fairly extensive, and gene rearrangements in the immunoglobulin complex (discovered in 1976) had generated much excitement: HLA was quite obviously the next frontier. The author was able to use a homologous murine H-2 cDNA to identify putative human HLA genomic clones in a λ-phage library and thus to isolate and sequence the first human histocompatibility gene. This personal account relates the steps that led to this result, describes the highly competitive international environment, and highlights the role of location, connections, and sheer luck in such an achievement. It also puts this work in perspective with a short description of the current knowledge of histocompatibility genes and, finally, presents some reflections on the meaning of “discovery.” PMID:20457890

  10. Nemertean Toxin Genes Revealed through Transcriptome Sequencing

    PubMed Central

    Whelan, Nathan V.; Kocot, Kevin M.; Santos, Scott R.; Halanych, Kenneth M.

    2014-01-01

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63–74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. PMID:25432940

  11. The first determination of DNA sequence of a specific gene.

    PubMed

    Inouye, Masayori

    2016-05-10

    How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX 174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).

  12. Nucleotide Sequence of the Akv env Gene

    PubMed Central

    Lenz, Jack; Crowther, Robert; Straceski, Anthony; Haseltine, William

    1982-01-01

    The sequence of 2,191 nucleotides encoding the env gene of murine retrovirus Akv was determined by using a molecular clone of the Akv provirus. Deduction of the encoded amino acid sequence showed that a single open reading frame encodes a 638-amino acid precursor to gp70 and p15E. In addition, there is a typical leader sequence preceding the amino terminus of gp70. The locations of potential glycosylation sites and other structural features indicate that the entire gp70 molecule and most of p15E are located on the outer side of the membrane. Internal cleavage of the env precursor to generate gp70 and p15E occurs immediately adjacent to several basic amino acids at the carboxyl terminus of gp70. This cleavage generates a region of 42 uncharged, relatively hydrophobic amino acids at the amino terminus of p15E, which is located in a position analogous to the hydrophobic membrane fusion sequence of influenza virus hemagglutinin. The mature polypeptides are predicted to associate with the membrane via a region of 30 uncharged, mostly hydrophobic amino acids located near the carboxyl terminus of p15E. Distal to this membrane association region is a sequence of 35 amino acids at the carboxyl terminus of the env precursor, which is predicted to be located on the inner side of the membrane. By analogy to Moloney murine leukemia virus, a proteolytic cleavage in this region removes the terminal 19 amino acids, thus generating the carboxyl terminus of p15E. This leaves 15 amino acids at the carboxyl terminus of p15E on the inner side of the membrane in a position to interact with virion cores during budding. The precise location and order of the large RNase T1-resistant oligonucleotides in the env region were determined and compared with those from several leukemogenic viruses of AKR origin. This permitted a determination of how the differences in the leukemogenic viruses affect the primary structure of the env gene products. PMID:6283170

  13. Mouse Vk gene classification by nucleic acid sequence similarity.

    PubMed

    Strohal, R; Helmberg, A; Kroemer, G; Kofler, R

    1989-01-01

    Analyses of immunoglobulin (Ig) variable (V) region gene usage in the immune response, estimates of V gene germline complexity, and other nucleic acid hybridization-based studies depend on the extent to which such genes are related (i.e., sequence similarity) and their organization in gene families. While mouse Igh heavy chain V region (VH) gene families are relatively well-established, a corresponding systematic classification of Igk light chain V region (Vk) genes has not been reported. The present analysis, in the course of which we reviewed the known extent of the Vk germline gene repertoire and Vk gene usage in a variety of responses to foreign and self antigens, provides a classification of mouse Vk genes in gene families composed of members with greater than 80% overall nucleic acid sequence similarity. This classification differed in several aspects from that of VH genes: only some Vk gene families were as clearly separated (by greater than 25% sequence dissimilarity) as typical VH gene families; most Vk gene families were closely related and, in several instances, members from different families were very similar (greater than 80%) over large sequence portions; frequently, classification by nucleic acid sequence similarity diverged from existing classifications based on amino-terminal protein sequence similarity. Our data have implications for Vk gene analyses by nucleic acid hybridization and describe potentially important differences in sequence organization between VH and Vk genes.
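
The family definition in the abstract above (members sharing greater than 80% overall nucleic acid sequence similarity) can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the record: the sequences are taken to be pre-aligned to equal length, similarity is approximated as simple percent identity, and families are formed by single-linkage grouping. The function names `percent_identity` and `group_into_families` are hypothetical, and the authors' actual procedure may have differed.

```python
def percent_identity(a, b):
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def group_into_families(seqs, threshold=80.0):
    """Single-linkage grouping (an assumption): two sequences land in the same
    family if they are linked by a chain of pairs with identity > threshold."""
    names = list(seqs)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the family representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if percent_identity(seqs[a], seqs[b]) > threshold:
                parent[find(a)] = find(b)

    families = {}
    for n in names:
        families.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in families.values())
```

With single linkage, two members of one family can themselves fall below the threshold if a third sequence bridges them, which is one way closely related families can blur into each other.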

  14. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  15. Presence of STA gene sequences in brewer's yeast genome.

    PubMed

    Balogh, I; Maráz, A

    1996-06-01

    STA genes are responsible for producing extracellular glucoamylase enzymes in Saccharomyces cerevisiae var. diastaticus. These genes exist in three forms, which are located on three different chromosomes. The nucleotide sequences of the STA genes are highly homologous. A sporulation-specific glucoamylase gene called SGA1 exists in every Saccharomyces cerevisiae strain, this also having a partly homologous DNA sequence with the STA genes. In this study S. cerevisiae var. diastaticus and brewer's yeast strains were characterized by pulsed-field gel electrophoresis. In many cases chromosome length polymorphism (CLP) was found. The chromosomes were hybridized with a DNA probe which was homologous with STA genes and the SGA1 gene. Presence of the SGA1 gene was detected in each strain used. Four brewing yeasts were found to have homologous sequences with the STA3 gene on chromosome XIV despite the fact that these strains were not able to produce extracellular glucoamylase enzyme.

  16. Nucleotide sequence of the coat protein gene of canine parvovirus.

    PubMed Central

    Rhode, S L

    1985-01-01

    The nucleotide sequence of the canine parvovirus (CPV2) from map units 33 to 95 has been determined. This includes the entire coat protein gene and noncoding sequences at the 3' end of the gene, exclusive of the terminal inverted repeat. The predicted capsid protein structures are discussed and compared with those of the rodent parvoviruses H-1 and MVM. PMID:3989914

  17. Housekeeping genes tend to show reduced upstream sequence conservation

    PubMed Central

    Farré, Domènec; Bellora, Nicolás; Mularoni, Loris; Messeguer, Xavier; Albà, M Mar

    2007-01-01

    Background: Understanding the constraints that operate in mammalian gene promoter sequences is of key importance to understand the evolution of gene regulatory networks. The level of promoter conservation varies greatly across orthologous genes, denoting differences in the strength of the evolutionary constraints. Here we test the hypothesis that the number of tissues in which a gene is expressed is related in a significant manner to the extent of promoter sequence conservation. Results: We show that mammalian housekeeping genes, expressed in all or nearly all tissues, show significantly lower promoter sequence conservation, especially upstream of position -500 with respect to the transcription start site, than genes expressed in a subset of tissues. In addition, we evaluate the effect of gene function, CpG island content and protein evolutionary rate on promoter sequence conservation. Finally, we identify a subset of transcription factors that bind to motifs that are specifically over-represented in housekeeping gene promoters. Conclusion: This is the first report that shows that the promoters of housekeeping genes show reduced sequence conservation with respect to genes expressed in a more tissue-restricted manner. This is likely to be related to simpler gene expression, requiring a smaller number of functional cis-regulatory motifs. PMID:17626644

  18. Sequence determinants of prokaryotic gene expression level under heat stress.

    PubMed

    Xiong, Heng; Yang, Yi; Hu, Xiao-Pan; He, Yi-Ming; Ma, Bin-Guang

    2014-11-01

    Prokaryotic gene expression is environment-dependent and temperature plays an important role in shaping the gene expression profile. Revealing the regulation mechanisms of gene expression pertaining to temperature has attracted tremendous efforts in recent years, particularly owing to the transcriptome and proteome data produced by high-throughput techniques. However, most of the previous works concentrated on the characterization of the gene expression profile of individual organisms and little effort has been made to disclose the commonality among organisms, especially for the gene sequence features. In this report, we collected the transcriptome and proteome data measured under heat stress condition from recently published literature and studied the sequence determinants for the expression level of heat-responsive genes on multiple layers. Our results showed that there are indeed common and consistent patterns in the sequence features among organisms for the differentially expressed genes under heat stress condition. Some features are attributed to the requirement of thermostability while some are dominated by gene function. The revealed sequence determinants of bacterial gene expression level under heat stress complement the knowledge about the regulation factors of prokaryotic gene expression responding to the change of environmental conditions. Furthermore, comparisons to thermophilic adaption have been performed to reveal the similarity and dissimilarity of the sequence determinants for the response to heat stress and for the adaption to high habitat temperature, which elucidates the complex landscape of gene expression related to the same physical factor of temperature.

  19. Searching gene and protein sequence databases.

    PubMed

    Barsalou, T; Brutlag, D L

    1991-01-01

    A large-scale effort to map and sequence the human genome is now under way. Crucial to the success of this research is a group of computer programs that analyze and compare data on molecular sequences. This article describes the classic algorithms for similarity searching and sequence alignment. Because good performance of these algorithms is critical to searching very large and growing databases, we analyze the running times of the algorithms and discuss recent improvements in this area.
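
The abstract above does not name the algorithms it covers. As an illustration only, the sketch below assumes Needleman-Wunsch global alignment, one classic dynamic-programming method for comparing two sequences. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name `global_alignment_score` are arbitrary choices made for this example, not values taken from the article.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Runs in O(len(a) * len(b)) time and O(len(b)) memory, since only the
    previous row of the dynamic-programming table is kept.
    """
    # Row 0: aligning the empty prefix of `a` against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of `b` costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

The quadratic running time is what makes exhaustive alignment expensive against large databases, which is presumably why the article's discussion of running times and speed-ups matters.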

  20. Nucleotide sequence of SHV-2 beta-lactamase gene

    SciTech Connect

    Garbarg-Chenon, A.; Godard, V.; Labia, R.; Nicolas, J.C.

    1990-07-01

    The nucleotide sequence of plasmid-mediated beta-lactamase SHV-2 from Salmonella typhimurium (SHV-2pHT1) was determined. The gene was very similar to chromosomally encoded beta-lactamase LEN-1 of Klebsiella pneumoniae. Compared with the sequence of the Escherichia coli SHV-2 enzyme (SHV-2E.coli) obtained by protein sequencing, the deduced amino acid sequence of SHV-2pHT1 differed by three amino acid substitutions.

  1. Optimization of gene sequences under constant mutational pressure and selection

    NASA Astrophysics Data System (ADS)

    Kowalczuk, M.; Gierlik, A.; Mackiewicz, P.; Cebrat, S.; Dudek, M. R.

    1999-12-01

    We have analyzed the influence of constant mutational pressure and selection on the nucleotide composition of DNA sequences of various sizes, which were represented by the genes of the Borrelia burgdorferi genome. With the help of MC simulations we have found that longer DNA sequences accumulate far fewer base substitutions per sequence length than short sequences. This leads us to the conclusion that the accuracy of replication may determine the size of the genome.

  2. Single molecule targeted sequencing for cancer gene mutation detection

    PubMed Central

    Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W.; He, Jiankui

    2016-01-01

    With the rapid decline in cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although there are fast sample preparation methods available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using an artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation, so that no biases or errors are introduced by a PCR reaction. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis. PMID:27193446

  3. Structure and sequence divergence of two archaebacterial genes

    SciTech Connect

    Cue, D.; Beckler, G.S.; Reeve, J.N.; Konisky, J.

    1985-06-01

    The DNA sequences of a region that includes the hisA gene of two related methanogenic archaebacteria, Methanococcus voltae and Methanococcus vannielii, have been compared. Both organisms show a similar genome organization in this region, displaying three open reading frames (ORFs) separated by regions of very high A+T content. Two of the ORFs, including ORFHisA, show significant DNA sequence homology. As might be expected for organisms having a genome that is A+T-rich, there is a high preference for A and U as the third base in codons. A ribosome binding site, G-G-T-G, is located 6 base pairs preceding the ATG translation initiation sequence of both hisA genes. The sequences upstream of the two hisA genes show only limited sequence homology. The M. voltae intergenic region contains four tandemly arranged repetitions of an 11-base-pair sequence, whereas the M. vannielii sequence contains both direct and inverted repetitive sequences. Based on the degree of hisA sequence homology, the authors conclude that M. voltae and M. vannielii are less closely related taxonomically than are members of the enteric group of eubacteria.
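
The third-base preference noted in the abstract above can be measured with a short Python sketch. This is an illustration under assumptions, not the authors' analysis: the input is taken to be an in-frame coding sequence, U is treated as T so that DNA and mRNA input give the same result, and the function name `third_position_composition` is hypothetical.

```python
from collections import Counter


def third_position_composition(cds):
    """Fractional base composition at the third position of each complete codon.

    Assumes `cds` starts in frame; trailing bases that do not fill a codon are ignored.
    """
    seq = cds.upper().replace("U", "T")  # treat mRNA and DNA input alike
    usable = len(seq) - len(seq) % 3     # drop any incomplete final codon
    thirds = Counter(seq[i + 2] for i in range(0, usable, 3))
    total = sum(thirds.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in thirds.items()}
```

For an A+T-rich genome one would expect the A and T fractions returned here to sum well above 0.5, though how far above depends on the gene.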

  4. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).

  5. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1997-01-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  6. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags

    SciTech Connect

    Xu, Y.; Mural, R.; Uberbacher, E.

    1997-02-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  7. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both (3'-³²P)-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  8. Updated Sequence Information for TEM β-Lactamase Genes

    PubMed Central

    Goussard, Sylvie; Courvalin, Patrice

    1999-01-01

    The sequences of the promoter regions and of the structural genes for 13 penicillinase, extended-spectrum, and inhibitor-resistant TEM-type β-lactamases have been determined, and an updated blaTEM gene nomenclature is proposed. PMID:9925535

  9. (Gene sequencing by scanning molecular exciton microscopy)

    SciTech Connect

    Not Available

    1991-01-01

    This report details progress made in setting up a laboratory for optical microscopy of genes. The apparatus including a fluorescence microscope, a scanning optical microscope, various spectrometers, and supporting computers is described. Results in developing photon and exciton tips, and in preparing samples are presented. (GHH)

  10. Gene regulation: ancient microRNA target sequences in plants.

    PubMed

    Floyd, Sandra K; Bowman, John L

    2004-04-01

    MicroRNAs are an abundant class of small RNAs that are thought to regulate the expression of protein-coding genes in plants and animals. Here we show that the target sequence of two microRNAs, known to regulate genes in the class-III homeodomain-leucine zipper (HD-Zip) gene family of the flowering plant Arabidopsis, is conserved in homologous sequences from all lineages of land plants, including bryophytes, lycopods, ferns and seed plants. We also find that the messenger RNAs from these genes are cleaved within the same microRNA-binding site in representatives of each land-plant group, as they are in Arabidopsis. Our results indicate not only that microRNAs mediate gene regulation in non-flowering as well as flowering plants, but also that the regulation of this class of plant genes dates back more than 400 million years.

  11. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes.

  12. The nucleotide sequence of the human beta-globin gene.

    PubMed

    Lawn, R M; Efstratiadis, A; O'Connell, C; Maniatis, T

    1980-10-01

    We report the complete nucleotide sequence of the human beta-globin gene. The purpose of this study is to obtain information necessary to study the evolutionary relationships between members of the human beta-like globin gene family and to provide the basis for comparing normal beta-globin genes with those obtained from the DNA of individuals with genetic defects in hemoglobin expression.

  13. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  14. Noncoding RNA gene detection using comparative sequence analysis

    PubMed Central

    Rivas, Elena; Eddy, Sean R

    2001-01-01

    Background: Noncoding RNA genes produce transcripts that exert their function without ever producing proteins. Noncoding RNA gene sequences do not have strong statistical signals, unlike protein coding genes. A reliable general purpose computational genefinder for noncoding RNA genes has been elusive. Results: We describe a comparative sequence analysis algorithm for detecting novel structural RNA genes. The key idea is to test the pattern of substitutions observed in a pairwise alignment of two homologous sequences. A conserved coding region tends to show a pattern of synonymous substitutions, whereas a conserved structural RNA tends to show a pattern of compensatory mutations consistent with some base-paired secondary structure. We formalize this intuition using three probabilistic "pair-grammars": a pair stochastic context free grammar modeling alignments constrained by structural RNA evolution, a pair hidden Markov model modeling alignments constrained by coding sequence evolution, and a pair hidden Markov model modeling a null hypothesis of position-independent evolution. Given an input pairwise sequence alignment (e.g. from a BLASTN comparison of two related genomes) we classify the alignment into the coding, RNA, or null class according to the posterior probability of each class. Conclusions: We have implemented this approach as a program, QRNA, which we consider to be a prototype structural noncoding RNA genefinder. Tests suggest that this approach detects noncoding RNA genes with a fair degree of reliability. PMID:11801179
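
    The classification step described in this abstract can be sketched as a small Bayesian calculation. This is an illustrative sketch only, not QRNA's code: the function name `classify_alignment`, the uniform priors, and the example log-likelihood values are all assumptions made for the example.

```python
import math


def classify_alignment(log_likelihoods, priors=None):
    """Pick the model class with the highest posterior probability.

    log_likelihoods: dict mapping class name -> log P(alignment | class).
    priors: optional dict of prior probabilities (assumed uniform if omitted).
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}

    # Unnormalized log posterior = log likelihood + log prior.
    log_post = {n: log_likelihoods[n] + math.log(priors[n]) for n in names}

    # Normalize with log-sum-exp to avoid underflow on large negative scores.
    peak = max(log_post.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_post.values()))
    posteriors = {n: math.exp(v - log_norm) for n, v in log_post.items()}

    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical log-likelihoods for one alignment under the three models.
example_scores = {"coding": -1052.3, "rna": -1047.9, "null": -1055.0}
best_class, posteriors = classify_alignment(example_scores)
print(best_class, posteriors)
```

    With equal priors the class with the highest likelihood wins; non-uniform priors would shift the decision toward whichever class is assumed to be more common.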

  15. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, affecting the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is a starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to identify the presence of the SxtA gene in Alexandrium minutum.

  16. Nucleotide sequence of the pyruvate decarboxylase gene from Zymomonas mobilis.

    PubMed

    Neale, A D; Scopes, R K; Wettenhall, R E; Hoogenraad, N J

    1987-02-25

    Pyruvate decarboxylase (EC 4.1.1.1), the penultimate enzyme in the alcoholic fermentation pathway of Zymomonas mobilis, converts pyruvate to acetaldehyde and carbon dioxide. The complete nucleotide sequence of the structural gene encoding pyruvate decarboxylase from Zymomonas mobilis has been determined. The coding region is 1704 nucleotides long and encodes a polypeptide of 567 amino acids with a calculated subunit mass of 60,790 daltons. The amino acid sequence was confirmed by comparison with the amino acid sequence of a selection of tryptic fragments of the enzyme. The amino acid composition obtained from the nucleotide sequence is in good agreement with that obtained experimentally.
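
    The figures quoted in this abstract are mutually consistent if the 1704-nucleotide coding region is assumed to include the stop codon (an assumption; the abstract does not say so). A minimal arithmetic check, using a hypothetical helper name:

```python
def expected_protein_length(coding_nt, includes_stop=True):
    """Number of amino acids encoded by a coding region of coding_nt nucleotides.

    Assumes a standard triplet code; if includes_stop is True, the final
    codon is treated as a stop codon and does not add a residue.
    """
    if coding_nt % 3 != 0:
        raise ValueError("coding region length is not a multiple of three")
    codons = coding_nt // 3
    return codons - 1 if includes_stop else codons


# Values taken from the abstract: 1704 nt, 567 residues, 60,790 Da.
residues = expected_protein_length(1704)
mean_residue_mass = 60790 / 567  # average mass per residue in daltons

print(residues, round(mean_residue_mass, 1))
```

    The resulting average of roughly 107 Da per residue is in the range usually seen for proteins, so the reported length and mass agree with each other.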

  17. Coupling sequencing by hybridization (SBH) with gel sequencing for an inexpensive analysis of genes and genomes

    SciTech Connect

    Drmanac, S.; Labat, I.; Hauser, B.; Drmanac, R.

    1996-11-01

    The speed and cost of DNA sequencing are bottlenecks in the analysis of genes and genomes. Sequencing by hybridization (SBH) is a versatile method with several applications which can accelerate DNA screening, mapping and sequencing. Requirements, achievements and problems in the development of the SBH format 1 (DNA samples arrayed) are presented and schemes for its synergetic coupling with gel sequencing techniques are discussed. It appears that with one hybridization machine with 24 boxes and four ABI gel sequencers, 100-300 Mb of DNA sequence can be determined per year. Various genetic studies based on computer assisted analysis of large collections of partial or complete DNA sequences ('sequenetics') may be achieved in this century.

  18. Inter-specific sequence conservation and intra-individual sequence variation in a spider silk gene.

    PubMed

    Tai, Pei-Ling; Hwang, Guang-Yuh; Tso, I-Min

    2004-10-01

    Currently, studies on major ampullate spidroin 1 (MaSp1) genes of non-orb weaving spiders are few, and it is not clear whether genes of these organisms exhibit the same characteristics as those of orb-weavers. In addition, many studies have proposed that MaSp1 might be a single gene with allelic variants, but supporting evidence is still lacking. In this study, we compared partial DNA and amino acid sequences of MaSp1 cloned from different spider guilds. We also cloned partial MaSp1 sequences from genomic DNA and cDNA of the same individuals of spiders using the same primer combination to see if different molecular forms existed. In the repetitive region of partial MaSp1 sequences obtained, GGX, GA and poly-A motifs were present in all Araneomorphae and Mygalomorphae species examined. An extreme similarity in MaSp1 non-repetitive portions was found in sequences of ecribellate, cribellate and Mygalomorphae web-builders and such a result suggested that this sequence might exhibit an important function. A comparison of sequences amplified from the same individual showed that substitutions in amino acids occurred in both repetitive and non-repetitive regions, with a much higher variation in the former. These results suggest that the MaSp1 of Araneomorphae spiders exhibits several forms in an individual spider and it might be either a multiple gene or a single gene with a multiple exon/intron organization.

  19. Sequence diversity of mating-type genes in Phaeosphaeria avenaria.

    PubMed

    Ueng, Peter P; Dai, Qun; Cui, Kai-rong; Czembor, Paweł C; Cunfer, Barry M; Tsang, H; Arseniuk, Edward; Bergstrom, Gary C

    2003-05-01

    Phaeosphaeria avenaria, one of the causal agents of stagonospora leaf blotch diseases in cereals, is composed of two subspecies, P. avenaria f. sp. triticea (Pat) and P. avenaria f. sp. avenaria (Paa). The Pat subspecies was grouped into Pat1-Pat3, based on restriction fragment length polymorphism (RFLP) and ribosomal DNA (rDNA) internal transcribed spacer (ITS) sequences in previous studies. Mating-type genes and their potential use in phylogeny and molecular classification were studied by DNA hybridization and PCR amplification. The majority of Pat1 isolates reported to be homothallic and producing sexual reproduction structures on cultural media had only the MAT1-1 gene. Minor sequence variations were found in the conserved region of MAT1-1 gene in Pat1 isolates. However, both mating-type genes, MAT1-1 and MAT1-2, were identified in P. avenaria isolates represented by ATCC12277 from oats (Paa) and the Pat2 isolates from foxtail barley (Hordeum jubatum L.). Cluster analyses based on mating-type gene conserved regions revealed that cereal Phaeosphaeria is not phylogenetically closely related to other ascomycetes, including Mycosphaerella graminicola (anamorph Septoria tritici). The sequence diversity of mating-type genes in Pat and Paa supports our previous phylogenetic relationship and molecular classification based on RFLP fingerprinting and rDNA ITS sequences.

  20. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250
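
    To relate a coverage dip to the motif named in this abstract, one first needs the motif's positions in the reference sequence. The snippet below is a minimal sketch of that lookup; the function name `find_motif` and the toy sequence are invented for illustration (the toy string is not the GFP coding sequence), and only the forward strand is searched.

```python
import re


def find_motif(sequence, motif="CCCGCC"):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    seq = sequence.upper()
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [match.start() for match in pattern.finditer(seq)]


# Toy sequence made up for this example, containing the motif twice.
toy_sequence = "ATGCCCGCCGACCCCGCCTAA"
positions = find_motif(toy_sequence)
print(positions)
```

    The returned positions could then be compared against per-base read depth from an alignment, for example to see whether the lowest-coverage window overlaps a motif hit.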

  1. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and bio-chemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient; with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.

  2. Sequence variation in the Tbx4 gene in marine mammals.

    PubMed

    Onbe, Kaori; Nishida, Shin; Sone, Emi; Kanda, Naohisa; Goto, Mutsuo; Pastene, Luis A; Tanabe, Shinsuke; Koike, Hiroko

    2007-05-01

    The amino-acid sequences of the T-domain region of the Tbx4 gene, which is required for hindlimb development, are 100% identical in humans and mice. Cetaceans have lost most of their hindlimb structure, although hindlimb buds are present in very early cetacean embryos. To examine whether the Tbx4 gene has the same function in cetaceans as in other mammals, we analyzed Tbx4 sequences from cetaceans, dugong, artiodactyls and marine carnivores. A total of 39 primers were designed using human and dog Tbx4 nucleotide sequences. Exons 3, 4, 5, 6, 7, and 8 of the Tbx4 genes from cetaceans, artiodactyls, and marine carnivores were sequenced. Non-synonymous substitution sites were detected in the T-domain regions from some cetacean species, but were not detected in those from artiodactyls, the dugong, or the carnivores. The C-terminal regions contained a number of non-synonymous substitutions. Although some indels were present, they were in groups of three nucleotides and therefore did not cause frame shifts. The dN/dS values for the T-domain and C-terminal regions of the cetacean and artiodactylous Tbx4 genes were much lower than 1, indicating that the Tbx4 gene maintains its function in cetaceans, although full expression leading to hindlimb development is suppressed.
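
    Two rules of thumb underlie the reasoning in this abstract: an indel whose length is a multiple of three leaves the reading frame intact, and a dN/dS ratio well below 1 is conventionally read as purifying selection. The sketch below encodes both; the function names and the `tolerance` band around 1 are assumptions chosen for illustration, not values from the study.

```python
def preserves_reading_frame(indel_length):
    """True if an insertion or deletion of this many nucleotides keeps the frame."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive number of nucleotides")
    return indel_length % 3 == 0


def interpret_dn_ds(ratio, tolerance=0.05):
    """Conventional reading of a dN/dS ratio (tolerance is an arbitrary choice)."""
    if ratio < 0:
        raise ValueError("dN/dS cannot be negative")
    if ratio < 1 - tolerance:
        return "purifying selection"
    if ratio > 1 + tolerance:
        return "positive selection"
    return "approximately neutral"


print(preserves_reading_frame(3), preserves_reading_frame(4))
print(interpret_dn_ds(0.2))
```

    A real analysis would estimate dN and dS from codon alignments with a dedicated method; this only illustrates how the resulting ratio is interpreted.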

  3. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    PubMed Central

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  4. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications.

    PubMed

    Herzog, M; Maroteaux, L

    1986-11-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage.

  5. Sequence and gene expression evolution of paralogous genes in willows

    PubMed Central

    Harikrishnan, Srilakshmy L.; Pucholt, Pascal; Berlin, Sofia

    2015-01-01

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive, hence, functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows. PMID:26689951

  6. Sequence and gene expression evolution of paralogous genes in willows.

    PubMed

    Harikrishnan, Srilakshmy L; Pucholt, Pascal; Berlin, Sofia

    2015-12-22

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive, hence, functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows.

  7. [Nucleotide sequence of genes for alpha- and beta-subunits of luciferase from Photobacterium leiognathi].

    PubMed

    Illarionov, B A; Protopopova, M V; Karginov, V A; Mertvetsov, N P; Gitel'zon, I I

    1988-03-01

    The nucleotide sequence of the Photobacterium leiognathi DNA containing the genes for the alpha and beta subunits of luciferase has been determined. We also deduced the amino acid sequence and molecular mass of luciferase and localized the luciferase genes in the sequenced DNA fragment.

  8. Nucleotide sequence of the human N-myc gene

    SciTech Connect

    Stanton, L.W.; Schwab, M.; Bishop, J.M.

    1986-03-01

    Human neuroblastomas frequently display amplification and augmented expression of a gene known as N-myc because of its similarity to the protooncogene c-myc. It has therefore been proposed that N-myc is itself a protooncogene, and subsequent tests have shown that N-myc and c-myc have similar biological activities in cell culture. The authors have now detailed the kinship between N-myc and c-myc by determining the nucleotide sequence of human N-myc and deducing the amino acid sequence of the protein encoded by the gene. The topography of N-myc is strikingly similar to that of c-myc: both genes contain three exons of similar lengths; the coding elements of both genes are located in the second and third exons; and both genes have unusually long 5' untranslated regions in their mRNAs, with features that raise the possibility that expression of the genes may be subject to similar controls of translation. The resemblance between the proteins encoded by N-myc and c-myc sustains previous suspicions that the genes encode related functions.

  9. Searching the expressed sequence tag (EST) databases: panning for genes.

    PubMed

    Jongeneel, C V

    2000-02-01

    The genomes of living organisms contain many elements, including genes coding for proteins. The portions of the genes expressed as mature mRNA, collectively known as the transcriptome, represent only a small part of the genome. The expressed sequence tag (EST) databases contain an increasingly large part of the transcriptome of many species. For this reason, these databases are probably the most abundant source of new coding sequences available today. However, the raw data deposited in the EST databases are to a large extent unorganised, unannotated, redundant and of relatively low quality. This paper reviews some of the characteristics of the EST data, and the methods that can be used to find novel protein sequences within them. It also documents a collection of databases, software and web sites that can be useful to biologists interested in mining the EST databases over the Internet, or in establishing a local environment for such analyses.

  10. Thermodynamics-based models of transcriptional regulation with gene sequence.

    PubMed

    Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing

    2015-12-01

    Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity, based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model yields more biologically meaningful results.

  11. Cloning and sequence analysis of an actin gene in aloe.

    PubMed

    Wen, S S; He, D W; Liao, C M; Li, J; Wen, G Q; Liu, X H

    2014-07-04

    Aloe (Aloe spp), containing abundant polysaccharides and numerous bioactive ingredients, has remarkable medical, ornamental, calleidic, and edible values. In the present study, the total RNA was extracted from aloe leaf tissue. The isolated high-quality RNA was further used to clone actin gene by using reverse transcription-polymerase chain reaction (RT-PCR). The result of sequence analysis for the amplified fragment revealed that the cloned actin gene was 1012 bp in length (GenBank accession No. KC751541.1) and contained a 924-bp coding region and encoded a protein consisting of 307 amino acids. Homologous alignment showed that it shared over 80 and 96% identity with the nucleotide and amino acid sequences of actin from other plants, respectively. In addition, the cloned gene was used for phylogenetic analyses based on the deduced amino acid sequences, and the results suggested that the actin gene is highly conserved in evolution. The findings of this study will be useful for investigating the expression patterns of other genes in Aloe.

  12. Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    PubMed Central

    Harper, Marc A.; Chen, Zugen; Toy, Traci; Machado, Iara M. P.; Nelson, Stanley F.; Liao, James C.; Lee, Christopher J.

    2011-01-01

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340. PMID:21364744
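
    The cost figures quoted in this abstract translate into fold savings as worked out below. This is plain arithmetic on the reported dollar amounts; the helper names and the pool size of four strains per library are hypothetical and are not taken from the paper.

```python
import math


def fold_saving(conventional_cost, pooled_cost):
    """How many times cheaper the pooled design is than the conventional one."""
    return conventional_cost / pooled_cost


def libraries_needed(n_strains, strains_per_library):
    """Number of pooled libraries required to cover all mutant strains."""
    return math.ceil(n_strains / strains_per_library)


# Dollar amounts from the abstract.
full_design_saving = fold_saving(7200, 1200)
cheapest_saving = fold_saving(7200, 110)

# Hypothetical pooling scheme: 32 strains, 4 strains per library.
libraries = libraries_needed(32, 4)

print(full_design_saving, round(cheapest_saving, 1), libraries)
```

    On these numbers the full pooled design is six times cheaper than conventional sequencing, and the smallest experiments are cheaper still.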

  13. Sequence diversity in 36 candidate genes for cardiovascular disorders.

    PubMed Central

    Cambien, F; Poirier, O; Nicaud, V; Herrmann, S M; Mallet, C; Ricard, S; Behague, I; Hallet, V; Blanc, H; Loukaci, V; Thillet, J; Evans, A; Ruidavets, J B; Arveiler, D; Luc, G; Tiret, L

    1999-01-01

    Two strategies involving whole-genome association studies have been proposed for the identification of genes involved in complex diseases. The first one seeks to characterize all common variants of human genes and to test their association with disease. The second one seeks to develop dense maps of single-nucleotide polymorphisms (SNPs) and to detect susceptibility genes through linkage disequilibrium. We performed a molecular screening of the coding and/or flanking regions of 36 candidate genes for cardiovascular diseases. All polymorphisms identified by this screening were further genotyped in 750 subjects of European descent. In the whole set of genes, the lengths explored spanned 53.8 kb in the 5' regions, 68.4 kb in exonic regions, and 13 kb in the 3' regions. The strength of linkage disequilibrium within candidate regions suggests that genomewide maps of SNPs might be efficient ways to identify new disease-susceptibility genes, provided that the maps are sufficiently dense. However, the relatively large number of polymorphisms within coding and regulatory regions of candidate genes raises the possibility that several of them might be functional and that the pattern of genotype-phenotype association might be more complex than initially envisaged, as actually has been observed in some well-characterized genes. These results argue in favor of both genomewide association studies and detailed studies of the overall sequence variation of candidate genes, as complementary approaches. PMID:10364531

  14. GRAIL seeks out genes buried in DNA sequence

    SciTech Connect

    Roberts, L.

    1991-11-08

    When the Human Genome Project achieves its ultimate goal, supposedly around 2005, biologists will have in hand the exact sequence of all 3 billion nucleotides arrayed along the human chromosomes. But they have never been entirely sure how they will read the language of the long string of As, Gs, Ts, and Cs. How will they even be able to pick out the genes, which account for a mere 5% of the genome, from the mass of letters in between? Now Edward Uberbacher, a biophysicist-turned-computational-biologist at Oak Ridge National Laboratory, has come one step toward providing an answer: a new artificial intelligence program, called GRAIL, that can pick out the coding regions of genes in a long stretch of sequence data. So far, the Oak Ridge team has analyzed 5 million bases of DNA. One year ago, even 6 months ago, it was virtually impossible to go into human genomic sequence and find genes by computer with any reliability. Now we can go in and find 90% of the genes very quickly. GRAIL can be used on a PC, not a supercomputer, and it provides an answer almost instantly.

  15. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  16. Next-generation tag sequencing for cancer gene expression profiling.

    PubMed

    Morrissy, A Sorana; Morin, Ryan D; Delaney, Allen; Zeng, Thomas; McDonald, Helen; Jones, Steven; Zhao, Yongjun; Hirst, Martin; Marra, Marco A

    2009-10-01

    We describe a new method, Tag-seq, which employs ultra high-throughput sequencing of 21 base pair cDNA tags for sensitive and cost-effective gene expression profiling. We compared Tag-seq data to LongSAGE data and observed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the latter possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts and took advantage of the greater dynamic range of expression to identify, in cancers and normal libraries, altered expression ratios of alternative transcript isoforms. The strand-specific information of Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA targeting sites. We found that transcript abundance had a stronger GC-bias in LongSAGE than Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery, identifying >98% of genes detected by LongSAGE and profiling a distinct subset of the transcriptome characterized by AT-rich genes, which was expressed at levels below those detectable by LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence composition bias relative to LongSAGE, and allows differential expression analysis for a greater range of transcripts, including transcripts encoding important regulatory molecules.
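
    The GC-bias comparison above rests on the base composition of each 21 bp tag. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `gc_fraction` and the two example tags are hypothetical and invented purely for illustration.

```python
def gc_fraction(tag: str) -> float:
    """Return the fraction of G and C bases in a tag (case-insensitive)."""
    tag = tag.upper()
    if not tag:
        raise ValueError("empty tag")
    return sum(base in "GC" for base in tag) / len(tag)


# Hypothetical 21 bp tags, made up for illustration only.
tags = {
    "AT_rich": "ATATTAAATTTATAATTAGCA",
    "GC_rich": "GCGGCCGCGGGCCCGCGATGC",
}

for name, seq in tags.items():
    print(f"{name}: length={len(seq)} GC={gc_fraction(seq):.2f}")
```

    Binning observed tag counts by this fraction is one simple way to check whether AT-rich tags are under-represented on a given platform.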

  17. The nucleotide sequence of a nematode vitellogenin gene.

    PubMed Central

    Spieth, J; Denison, K; Zucker, E; Blumenthal, T

    1985-01-01

    The nematode, Caenorhabditis elegans, contains a family of six genes that code for vitellogenins. Here we report the complete nucleotide sequence of one of these genes, vit-5. The gene specifies a mRNA of 4869 nucleotides, including untranslated regions of 9 bases at the 5' end and 51 bases at the 3' end. Vit-5 contains four short introns totalling 218 bp. The predicted vitellogenin, yp170A, has a molecular weight of 186,430. At its N terminus it is clearly related to the vitellogenins of vertebrates. However, the vit-5-encoded protein does not contain a serine-rich sequence related to the vertebrate vitellin, phosvitin. In fact, the amino acid composition of the nematode protein is very similar to that of the vertebrate protein without phosvitin. Vit-5 has a highly asymmetric codon choice dictionary. The favored codons are different from those favored in other organisms, but are characteristic of highly expressed C. elegans genes. The strong selection against rare codons is not as great near the 5' end of the gene; rare codons are 15 times more frequent within the first 54 bp than in the next 4.8 kb. PMID:3855245
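
    The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures given above; the assumption that the span between the untranslated regions includes the stop codon is ours and is not stated in the abstract.

```python
# Figures quoted in the abstract for vit-5.
MRNA_LENGTH_NT = 4869
FIVE_PRIME_UTR_NT = 9
THREE_PRIME_UTR_NT = 51
INTRON_TOTAL_BP = 218
PROTEIN_MW_DA = 186_430

# Nucleotides left between the two untranslated regions.
coding_nt = MRNA_LENGTH_NT - FIVE_PRIME_UTR_NT - THREE_PRIME_UTR_NT
codons, remainder = divmod(coding_nt, 3)

# Assumption (not stated in the abstract): this span includes the stop
# codon, so the protein has one residue fewer than the number of codons.
residues = codons - 1
mean_residue_mass = PROTEIN_MW_DA / residues

# mRNA plus introns; ignores any sequence not counted in the mRNA length.
transcribed_bp = MRNA_LENGTH_NT + INTRON_TOTAL_BP

print(f"coding span: {coding_nt} nt = {codons} codons (remainder {remainder})")
print(f"mean residue mass under the assumption: {mean_residue_mass:.1f} Da")
print(f"mRNA plus introns: {transcribed_bp} bp")
```

    The coding span divides evenly by three, which is consistent with the reported UTR lengths.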

  18. Nucleotide Sequence of the Protective Antigen Gene of Bacillus Anthracis

    DTIC Science & Technology

    1988-02-02

    transcription and translation of the Bacillus megaterium protein C gene. J. Bacteriol. 158:809-813. 9. Friedlander, A. M. 1986. Macrophages are sensitive to...of the Protective Antigen Gene of Bacillus anthracis. S. L. Welkos, J. R. Lowe, F. Eden-McCutchan, M. Vodkin, S. M... Bacillus anthracis and the 5' and 3' flanking sequences were determined. Protective antigen is one of three proteins comprising anthrax toxin. The open

  19. Cloning and sequencing of the gene encoding cytochrome c553 from Desulfovibrio vulgaris Hildenborough

    SciTech Connect

    van Rooijen, G.J.H.; Voordouw, G.; Bruschi, M.

    1989-06-01

    The gene encoding cytochrome c553 from Desulfovibrio vulgaris Hildenborough was cloned by using two synthetic deoxyoligonucleotide probes. The amino acid sequence derived from the sequence of the gene differs from that reported by Bruschi and LeGall. Renewed protein sequencing confirmed the correctness of the DNA-derived sequence. The gene sequence indicates cytochrome c553 to be synthesized as a precursor protein with an NH2-terminal signal sequence of 24 residues.

  20. From expression cloning to gene modeling: the development of Xenopus gene sequence resources.

    PubMed

    Gilchrist, Michael J

    2012-03-01

    The Xenopus community has made concerted efforts over the last 10-12 years systematically to improve the available sequence information for this amphibian model organism ideally suited to the study of early development in vertebrates. Here I review progress in the collection of both sequence data and physical clone reagents for protein coding genes. I conclude that we have cDNA sequences for around 50% and full-length clones for about 35% of the genes in Xenopus tropicalis, and similar numbers but a smaller proportion for Xenopus laevis. In addition, I demonstrate that the gaps in the current genome assembly create problems for the computational elucidation of gene sequences, and suggest some ways to ameliorate the effects of this.

  1. Chloroplast gene sequences and the study of plant evolution.

    PubMed Central

    Clegg, M T

    1993-01-01

    A large body of sequence data has accumulated for the chloroplast-encoded gene ribulose-1,5-bisphosphate carboxylase/oxygenase (rbcL) as the result of a cooperative effort involving many laboratories. The data span all seed plants, including most major lineages from the angiosperms, and as such they provide an unprecedented opportunity to study plant evolutionary history. The full analysis of this large data set poses many problems and opportunities for plant evolutionary biologists and for biostatisticians. PMID:8421667

  2. Deciphering Poxvirus Gene Expression by RNA Sequencing and Ribosome Profiling

    PubMed Central

    Cao, Shuai; Martens, Craig A.; Porcella, Stephen F.; Xie, Zhi; Ma, Ming; Shen, Ben

    2015-01-01

    ABSTRACT The more than 200 closely spaced annotated open reading frames, extensive transcriptional read-through, and numerous unpredicted RNA start sites have made the analysis of vaccinia virus gene expression challenging. Genome-wide ribosome profiling provided an unprecedented assessment of poxvirus gene expression. By 4 h after infection, approximately 80% of the ribosome-associated mRNA was viral. Ribosome-associated mRNAs were detected for most annotated early genes at 2 h and for most intermediate and late genes at 4 and 8 h. Cluster analysis identified a subset of early mRNAs that continued to be translated at the later times. At 2 h, there was excellent correlation between the abundance of individual mRNAs and the numbers of associated ribosomes, indicating that expression was primarily transcriptionally regulated. However, extensive transcriptional read-through invalidated similar correlations at later times. The mRNAs with the highest density of ribosomes had host response, DNA replication, and transcription roles at early times and were virion components at late times. Translation inhibitors were used to map initiation sites at single-nucleotide resolution at the start of most annotated open reading frames although in some cases a downstream methionine was used instead. Additional putative translational initiation sites with AUG or alternative codons occurred mostly within open reading frames, and fewer occurred in untranslated leader sequences, antisense strands, and intergenic regions. However, most open reading frames associated with these additional translation initiation sites were short, raising questions regarding their biological roles. The data were used to construct a high-resolution genome-wide map of the vaccinia virus translatome. IMPORTANCE This report contains the first genome-wide, high-resolution analysis of poxvirus gene expression at both transcriptional and translational levels. The study was made possible by recent methodological

  3. GeneQuiz: A workbench for sequence analysis

    SciTech Connect

    Scharf, M.; Schneider, R.; Casari, G.; Bork, P.

    1994-12-31

    We present the prototype of a software system, called GeneQuiz, for large-scale biological sequence analysis. The system was designed to meet the needs that arise in computational sequence analysis and our past experience with the analysis of 171 protein sequences of yeast chromosome III. We explain the cognitive challenges associated with this particular research activity and present our model of the sequence analysis process. The prototype system consists of two parts: (i) the database update and search system (driven by perl programs and rdb, a simple relational database engine also written in perl) and (ii) the visualization and browsing system (developed under C++/ET++). The principal design requirement for the first part was the complete automation of all repetitive actions: database updates, efficient sequence similarity searches and sampling of results in a uniform fashion. The user is then presented with "hit-lists" that summarize the results from heterogeneous database searches. The expert's primary task now simply becomes the further analysis of the candidate entries, where the problem is to extract adequate information about functional characteristics of the query protein rapidly. This second task is tremendously accelerated by a simple combination of the heterogeneous output into uniform relational tables and the provision of browsing mechanisms that give access to database records, sequence entries and alignment views. Indexing of molecular sequence databases provides fast retrieval of individual entries with the use of unique identifiers as well as browsing through databases using pre-existing cross-references. The presentation here covers an overview of the architecture of the system prototype and our experiences on its applicability in sequence analysis.

  4. Full-Length Minor Ampullate Spidroin Gene Sequence

    PubMed Central

    Chen, Gefei; Liu, Xiangqin; Zhang, Yunlong; Lin, Senzhu; Yang, Zijiang; Johansson, Jan; Rising, Anna; Meng, Qing

    2012-01-01

    Spider silk includes seven protein based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level. PMID:23251707

  5. Detecting gene mutations in Japanese Alzheimer's patients by semiconductor sequencing.

    PubMed

    Yagi, Ryoichi; Miyamoto, Ryosuke; Morino, Hiroyuki; Izumi, Yuishin; Kuramochi, Masahito; Kurashige, Takashi; Maruyama, Hirofumi; Mizuno, Noriyoshi; Kurihara, Hidemi; Kawakami, Hideshi

    2014-07-01

    Alzheimer's disease (AD) is the most common form of dementia. To date, several genes have been identified as the cause of AD, including PSEN1, PSEN2, and APP. The association between APOE and late-onset AD has also been reported. We here used a benchtop next-generation sequencer, which uses an integrated semiconductor device, detects hydrogen ions, and operates at high speed using nonoptical technology. We examined 45 Japanese AD patients with positive family histories, and 29 sporadic patients with early onset (<60 years old). Causative mutations were detected in 5 patients in the familial group (11%). Three patients had a known heterozygous missense mutation in the PSEN1 gene (p.H163R). Two patients from 1 family had a novel heterozygous missense mutation in the PSEN1 gene (p.F386L). In the early onset group, 1 patient carrying homozygous APOEε4 had a novel heterozygous missense mutation in the PSEN2 gene (p.T421M). Approximately 43% of patients were APOEε4 positive in our study. This new sequencing technology is useful for detecting genetic variations in familial AD.

  6. Technology development for gene discovery and full-length sequencing

    SciTech Connect

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  7. Detection and sequence analysis of accessory gene regulator genes of Staphylococcus pseudintermedius isolates

    PubMed Central

    Chitra, M. Ananda; Jayanthy, C.; Nagarajan, B.

    2015-01-01

    Background: Staphylococcus pseudintermedius (SP) is the major pathogenic species of dogs involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr) locus of Staphylococcus aureus has been extensively studied, and it influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objective of this study was to detect and sequence-analyze the agrA, B, and D genes of SP isolated from canine skin infections. Materials and Methods: In this study, we isolated SP from canine pyoderma and otitis cases, identified it by polymerase chain reaction (PCR), and confirmed it by PCR-restriction fragment length polymorphism. Primers for the SP agrA and agrBD genes were designed using online primer design software and checked for specificity by BLAST search. Amplification of the agr genes was carried out for 53 isolates of SP by PCR, and sequencing of agrA, B, and D was carried out for five isolates and analyzed using DNAstar and Mega5.2 software. Results: A total of 53 (59%) SP isolates were obtained from 90 samples. 15 isolates (28%) were confirmed to be methicillin-resistant SP (MRSP) with the detection of the mecA gene. Accessory gene regulator A, B, and D genes were detected in all the SP isolates. Complete nucleotide sequences of the above three genes for five isolates were submitted to GenBank, and their accession numbers are from KJ133557 to KJ133571. AgrA amino acid sequence analysis showed that it is mainly made of alpha-helices and is hydrophilic in nature. AgrB is a transmembrane protein, and AgrD encodes the precursor of the autoinducing peptide (AIP). Sequencing of the agrD gene revealed that the 5 canine SP strains tested could be divided into three Agr specificity groups (RIPTSTGFF, KIPTSTGFF, and RIPISTGFF) based on the putative AIP produced by each strain. The AIP of
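
    The counts in this record, and the relationship between the three putative AIP sequences, can be checked with a short script. This is a minimal sketch using only the figures and peptides quoted above; the helper names `percent` and `hamming` are hypothetical.

```python
from itertools import combinations


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length peptides differ."""
    if len(a) != len(b):
        raise ValueError("peptides must have equal length")
    return sum(x != y for x, y in zip(a, b))


isolation_rate = percent(53, 90)  # SP isolates out of samples
mrsp_rate = percent(15, 53)       # mecA-positive isolates out of SP isolates

# Accessions KJ133557..KJ133571 inclusive, versus 3 genes x 5 isolates.
accession_count = 133571 - 133557 + 1
expected_sequences = 3 * 5

aips = ["RIPTSTGFF", "KIPTSTGFF", "RIPISTGFF"]
distances = {(a, b): hamming(a, b) for a, b in combinations(aips, 2)}

print(isolation_rate, mrsp_rate, accession_count, expected_sequences)
for pair, d in distances.items():
    print(pair, d)
```

    Each of the other two peptides differs from RIPTSTGFF at a single residue, and they differ from each other at two.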

  8. Nuclear gene sequences from a late pleistocene sloth coprolite.

    PubMed

    Poinar, Hendrik; Kuch, Melanie; McDonald, Gregory; Martin, Paul; Pääbo, Svante

    2003-07-01

    The determination of nuclear DNA sequences from ancient remains would open many novel opportunities such as the resolution of phylogenies, the sexing of hominid and animal remains, and the characterization of genes involved in phenotypic traits. However, to date, single-copy nuclear DNA sequences from fossils have been determined only from bones and teeth of woolly mammoths preserved in the permafrost. Since the best preserved ancient nucleic acids tend to stem from cold environments, this has led to the assumption that nuclear DNA would be retrievable only from frozen remains. We have previously shown that Pleistocene coprolites stemming from the extinct Shasta sloth (Nothrotheriops shastensis, Megatheriidae) contain mitochondrial (mt) DNA from the animal that produced them as well as chloroplast (cp) DNA from the ingested plants. Recent attempts to resolve the phylogeny of two families of extinct sloths by using strictly mitochondrial DNA have been inconclusive. We have prepared DNA extracts from a ground sloth coprolite from Gypsum Cave, Nevada, and quantitated the number of mtDNA copies for three different fragment lengths by using real-time PCR. We amplified one multicopy and three single-copy nuclear gene fragments and used the concatenated sequence to resolve the phylogeny. These results show that ancient single-copy nuclear DNA can be recovered from warm, arid climates. Thus, nuclear DNA preservation is not restricted to cold climates.

  9. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis*

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  10. Automated Gene Ontology annotation for anonymous sequence data.

    PubMed

    Hennig, Steffen; Groth, Detlef; Lehrach, Hans

    2003-07-01

    Gene Ontology (GO) is the most widely accepted attempt to construct a unified and structured vocabulary for the description of genes and their products in any organism. Annotation by GO terms is performed in most of the current genome projects, which besides generality has the advantage of being very convenient for computer based classification methods. However, direct use of GO in small sequencing projects is not easy, especially for species not commonly represented in public databases. We present a software package (GOblet), which performs annotation based on GO terms for anonymous cDNA or protein sequences. It uses the species independent GO structure and vocabulary together with a series of protein databases collected from various sites, to perform a detailed GO annotation by sequence similarity searches. The sensitivity and the reference protein sets can be selected by the user. GOblet runs automatically and is available as a public service on our web server. The paper also addresses the reliability of automated GO annotations by using a reference set of more than 6000 human proteins. The GOblet server is accessible at http://goblet.molgen.mpg.de.

  11. Organization of gene and non-gene sequences in micronuclear DNA of Oxytricha nova.

    PubMed Central

    Boswell, R E; Jahn, C L; Greslin, A F; Prescott, D M

    1983-01-01

    In order to study the derivation of the macronuclear genome from the micronuclear genome in Oxytricha nova, micronuclear DNA was partially digested with EcoRI, size fractionated, and then cloned in the lambda phage Charon 8. Clones were selected a) at random, b) by hybridization with macronuclear DNA, or c) by hybridization with clones of macronuclear DNA. One group of these clones contains only unique sequence DNA, and all of these had sequences that were homologous to macronuclear sequences. The number of macronuclear genes with sequences homologous to these micronuclear clones indicates that macronuclear sequences are clustered in the micronuclear genome. Many micronuclear clones contain repetitive DNA sequences and hybridize to numerous EcoRI fragments of total micronuclear DNA, yielding similar but non-identical patterns. Some micronuclear clones containing these repetitive sequences also contained unique sequence DNA that hybridized to a macronuclear sequence. These clones define a major interspersed repetitive sequence family in the micronuclear genome that is eliminated during formation of the macronuclear genome. PMID:6304639

  12. Learning gene regulatory networks from next generation sequencing data.

    PubMed

    Jia, Bochao; Xu, Suwa; Xiao, Guanghua; Lamba, Vishal; Liang, Faming

    2017-03-10

    In recent years, next generation sequencing (NGS) has gradually replaced microarray as the major platform in measuring gene expression. Compared to microarray, NGS has many advantages, such as less noise and higher throughput. However, the discreteness of NGS data also challenges the existing statistical methodology. In particular, the literature still lacks an appropriate statistical method for reconstructing gene regulatory networks using NGS data. The existing local Poisson graphical model method is not consistent and can only infer certain local structures of the network. In this article, we propose a random effect model-based transformation to continuize NGS data and then we transform the continuized data to Gaussian via a semiparametric transformation and apply an equivalent partial correlation selection method to reconstruct gene regulatory networks. The proposed method is consistent. The numerical results indicate that the proposed method can lead to much more accurate inference of gene regulatory networks than the local Poisson graphical model and other existing methods. The proposed data-continuized transformation fills the theoretical gap for how to transform discrete data to continuous data and facilitates NGS data analysis. The proposed data-continuized transformation also makes it feasible to integrate different types of data, such as microarray and RNA-seq data, in reconstruction of gene regulatory networks.

  13. Deep sequencing reveals 50 novel genes for recessive cognitive disorders.

    PubMed

    Najmabadi, Hossein; Hu, Hao; Garshasbi, Masoud; Zemojtel, Tomasz; Abedini, Seyedeh Sedigheh; Chen, Wei; Hosseini, Masoumeh; Behjati, Farkhondeh; Haas, Stefan; Jamali, Payman; Zecha, Agnes; Mohseni, Marzieh; Püttmann, Lucia; Vahid, Leyla Nouri; Jensen, Corinna; Moheb, Lia Abbasi; Bienek, Melanie; Larti, Farzaneh; Mueller, Ines; Weissmann, Robert; Darvish, Hossein; Wrogemann, Klaus; Hadavi, Valeh; Lipkowitz, Bettina; Esmaeeli-Nieh, Sahar; Wieczorek, Dagmar; Kariminejad, Roxana; Firouzabadi, Saghar Ghasemi; Cohen, Monika; Fattahi, Zohreh; Rost, Imma; Mojahedi, Faezeh; Hertzberg, Christoph; Dehghan, Atefeh; Rajab, Anna; Banavandi, Mohammad Javad Soltani; Hoffer, Julia; Falah, Masoumeh; Musante, Luciana; Kalscheuer, Vera; Ullmann, Reinhard; Kuss, Andreas Walter; Tzschach, Andreas; Kahrizi, Kimia; Ropers, H Hilger

    2011-09-21

    Common diseases are often complex because they are genetically heterogeneous, with many different genetic defects giving rise to clinically indistinguishable phenotypes. This has been amply documented for early-onset cognitive impairment, or intellectual disability, one of the most complex disorders known and a very important health care problem worldwide. More than 90 different gene defects have been identified for X-chromosome-linked intellectual disability alone, but research into the more frequent autosomal forms of intellectual disability is still in its infancy. To expedite the molecular elucidation of autosomal-recessive intellectual disability, we have now performed homozygosity mapping, exon enrichment and next-generation sequencing in 136 consanguineous families with autosomal-recessive intellectual disability from Iran and elsewhere. This study, the largest published so far, has revealed additional mutations in 23 genes previously implicated in intellectual disability or related neurological disorders, as well as single, probably disease-causing variants in 50 novel candidate genes. Proteins encoded by several of these genes interact directly with products of known intellectual disability genes, and many are involved in fundamental cellular processes such as transcription and translation, cell-cycle control, energy metabolism and fatty-acid synthesis, which seem to be pivotal for normal brain development and function.

  14. Sequence polymorphism and evolution of three cetacean MHC genes.

    PubMed

    Xu, Shi Xia; Ren, Wen Hua; Li, Shu Zhen; Wei, Fu Wen; Zhou, Kai Ya; Yang, Guang

    2009-09-01

    Sequence variability at three major histocompatibility complex (MHC) genes (DQB, DRA, and MHC-I) of cetaceans was investigated in order to get an overall understanding of cetacean MHC evolution. Little sequence variation was detected at the DRA locus, while extensive variability was found at the MHC-I and DQB loci. Phylogenetic reconstruction and sequence comparison revealed extensive sharing of identical MHC alleles among different species at the three MHC loci examined. Comparisons of phylogenetic trees for these MHC loci with the trees reconstructed only based on non-PBR sites revealed that allelic similarity/identity possibly reflected common ancestry and was not due to adaptive convergence. At the same time, trans-species evolution was also evidenced, in that the allelic diversity of the three MHC loci clearly pre-dated species divergence events according to the relaxed molecular clock. Balancing selection may be the force acting to maintain the high sequence variability and identical alleles in a trans-specific manner at the MHC-I and DQB loci.

  15. From genes to markers: exploiting gene sequence information to develop tools for plant breeding.

    PubMed

    Garcia, Melissa; Mather, Diane E

    2014-01-01

    Once the sequence is known for a gene of interest, it is usually possible to design markers to detect polymorphisms within the gene. Such markers can be particularly useful in plant breeding, especially if they detect the causal polymorphism within the gene and are diagnostic of the phenotype. In this chapter, we (1) discuss how gene sequences are obtained and aligned and how polymorphic sites can be identified or predicted; (2) explain the principles of PCR primer design and PCR amplification and provide guidelines for their application in the design and testing of markers; (3) discuss detection methods for presence/absence (dominant) polymorphisms, length polymorphisms and single nucleotide polymorphisms (SNPs); and (4) outline some of the factors that affect the utility of markers in plant breeding and explain how markers can be evaluated (validated) for use in plant breeding.

  16. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Zook, Justin M.; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S.; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B.

    2015-01-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030
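
    The idea of a variant copy ratio can be illustrated with a much simpler estimator than the Bayesian and maximum likelihood methods the study describes. The sketch below is not the authors' method: it takes the observed base frequencies at one variable position (the plain multinomial maximum likelihood estimate) and scales them to an assumed number of gene copies. The function names, the read counts, and the copy number of seven are all hypothetical inputs chosen for illustration.

```python
from collections import Counter


def base_proportions(observed_bases):
    """Observed frequency of each base at a single alignment position."""
    counts = Counter(observed_bases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return {base: count / total for base, count in counts.items()}


def nearest_copy_split(proportions, n_copies):
    """Scale proportions to whole gene copies by rounding.

    Rounding each base independently is a simplification; the rounded
    values are not guaranteed to sum to n_copies for noisy data.
    """
    return {base: round(p * n_copies) for base, p in proportions.items()}


# Hypothetical pile-up at one variable position: 600 reads with A, 100 with G.
reads_at_position = "A" * 600 + "G" * 100
proportions = base_proportions(reads_at_position)

# Assumption for illustration: the genome carries seven 16S rRNA gene copies.
split = nearest_copy_split(proportions, n_copies=7)
print(proportions)
print(split)
```

    With shallow coverage the observed frequencies fluctuate more, which is one intuitive reason why sequencing depth matters for reproducing such ratios.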

  17. Sequences of the recA gene and protein.

    PubMed

    Sancar, A; Stachelek, C; Konigsberg, W; Rupp, W D

    1980-05-01

    We have determined the nucleotide sequence of the recA gene of Escherichia coli; this permits the formulation of the primary structure for the recA protein. This structure is consistent with the amino acid composition of the tryptic peptides obtained from the recA protein. The coding region of the recA gene has 1059 base pairs, which specify 352 amino acids. The recA protein has alanine and phenylalanine as its NH2- and COOH-terminal amino acids, respectively, and has the following amino acid composition: Cys3 Asp20 Asn15 Met9 Thr17 Ser20 Glu30 Gln13 Pro10 Gly35 Ala38 Val22 Ile27 Leu31 Tyr7 Phe10 His2 Lys27 Trp2 Arg14. Of the three cysteine residues, only two can be alkylated under reducing and denaturing conditions. The molecular weight of the recA polypeptide is 37,842.
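
    The figures in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only uses the numbers quoted above; the variable names are illustrative. It shows that the listed composition sums to 352 residues and that 1059 bp corresponds to 353 codons. The abstract does not say what the one extra codon is; it would be consistent with a stop codon or with an initiator codon whose residue is absent from the mature protein, and that reading is an assumption here.

```python
# Amino acid composition of the recA protein as listed in the abstract.
composition = {
    "Cys": 3, "Asp": 20, "Asn": 15, "Met": 9, "Thr": 17,
    "Ser": 20, "Glu": 30, "Gln": 13, "Pro": 10, "Gly": 35,
    "Ala": 38, "Val": 22, "Ile": 27, "Leu": 31, "Tyr": 7,
    "Phe": 10, "His": 2, "Lys": 27, "Trp": 2, "Arg": 14,
}

residues = sum(composition.values())   # total residues in the composition
coding_bp = 1059                       # coding region length from the abstract
codons, remainder = divmod(coding_bp, 3)

print(f"residues in composition: {residues}")
print(f"codons in coding region: {codons} (remainder {remainder})")
print(f"codons not accounted for by residues: {codons - residues}")
```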

  18. How to analyze gene expression using RNA-sequencing data.

    PubMed

    Ramsköld, Daniel; Kavak, Ersen; Sandberg, Rickard

    2012-01-01

    RNA-Seq is emerging as a powerful method for transcriptome analyses that will eventually make microarrays obsolete for gene expression analyses. Improvements in high-throughput sequencing and efficient sample barcoding are now enabling tens of samples to be run in a cost-effective manner, competing with microarrays in price and excelling in performance. Still, most studies use microarrays, partly due to the ease of data analysis using programs and modules that quickly turn raw microarray data into spreadsheets of gene expression values and significantly differentially expressed genes. In contrast, RNA-Seq data analysis is still in its infancy, and researchers face new challenges and have to combine different tools to carry out an analysis. In this chapter, we provide a tutorial on RNA-Seq data analysis to enable researchers to quantify gene expression, identify splice junctions, and find novel transcripts using publicly available software. We focus on the analyses performed in organisms where a reference genome is available and discuss issues with current methodology that have to be solved before RNA-Seq data can be used to their full potential.

  19. Nucleotide sequence of the hemolysin I gene from Actinobacillus pleuropneumoniae.

    PubMed Central

    Frey, J; Meier, R; Gygi, D; Nicolet, J

    1991-01-01

    The DNA sequence of the gene encoding the structural protein of hemolysin I (HlyI) of Actinobacillus pleuropneumoniae serotype 1 strain 4074 was analyzed. The nucleotide sequence shows a 3,072-bp reading frame encoding a protein of 1,023 amino acids with a calculated molecular size of 110.1 kDa. This corresponds to the HlyI protein, which has an apparent molecular size on sodium dodecyl sulfate gels of 105 kDa. The structure of the protein derived from the DNA sequence shows three hydrophobic regions in the N-terminal part of the protein, 13 glycine-rich domains in the second half of the protein, and a hydrophilic C-terminal area, all of which are typical of the cytotoxins of the RTX (repeats in the structural toxin) toxin family. The derived amino acid sequence of HlyI shows 42% homology with the hemolysin of A. pleuropneumoniae serotype 5, 41% homology with the leukotoxin of Pasteurella haemolytica, and 56% homology with the Escherichia coli alpha-hemolysin. The 13 glycine-rich repeats and three hydrophobic areas of the HlyI sequence show more similarity to the E. coli alpha-hemolysin than to either the A. pleuropneumoniae serotype 5 hemolysin or the leukotoxin (while the last two are more similar to each other). Two types of RTX hemolysins therefore seem to be present in A. pleuropneumoniae, one (HlyI) resembling the alpha-hemolysin and a second more closely related to the leukotoxin. Ca(2+)-binding experiments using HlyI and recombinant A. pleuropneumoniae prohemolysin (HlyIA) that was produced in E. coli show that HlyI binds 45Ca2+, probably because of the 13 glycine-rich repeated domains. Activation of the prohemolysin is not required for Ca2+ binding. PMID:1879928

  20. Solving the molecular diagnostic testing conundrum for Mendelian disorders in the era of next-generation sequencing: single-gene, gene panel, or exome/genome sequencing.

    PubMed

    Xue, Yuan; Ankala, Arunkanth; Wilcox, William R; Hegde, Madhuri R

    2015-06-01

    Next-generation sequencing is changing the paradigm of clinical genetic testing. Today there are numerous molecular tests available, including single-gene tests, gene panels, and exome sequencing or genome sequencing. As a result, ordering physicians face the conundrum of selecting the best diagnostic tool for their patients with genetic conditions. Single-gene testing is often most appropriate for conditions with distinctive clinical features and minimal locus heterogeneity. Next-generation sequencing-based gene panel testing, which can be complemented with array comparative genomic hybridization and other ancillary methods, provides a comprehensive and feasible approach for heterogeneous disorders. Exome sequencing and genome sequencing have the advantage of being unbiased regarding what set of genes is analyzed, enabling parallel interrogation of most of the genes in the human genome. However, current limitations of next-generation sequencing technology and our variant interpretation capabilities caution us against offering exome sequencing or genome sequencing as either stand-alone or first-choice diagnostic approaches. A growing interest in personalized medicine calls for the application of genome sequencing in clinical diagnostics, but major challenges must be addressed before its full potential can be realized. Here, we propose a testing algorithm to help clinicians opt for the most appropriate molecular diagnostic tool for each scenario.

  1. Gene expression profile of human bone marrow stromal cells: high-throughput expressed sequence tag sequencing analysis.

    PubMed

    Jia, Libin; Young, Marian F; Powell, John; Yang, Liming; Ho, Nicola C; Hotchkiss, Robert; Robey, Pamela Gehron; Francomano, Clair A

    2002-01-01

    Human bone marrow stromal cells (HBMSC) are pluripotent cells with the potential to differentiate into osteoblasts, chondrocytes, myelosupportive stroma, and marrow adipocytes. We used high-throughput DNA sequencing analysis to generate 4258 single-pass sequencing reactions (known as expressed sequence tags, or ESTs) obtained from the 5' (97) and 3' (4161) ends of human cDNA clones from a HBMSC cDNA library. Our goal was to obtain tag sequences from the maximum number of possible genes and to deposit them in the publicly accessible database for ESTs (dbEST of the National Center for Biotechnology Information). Comparisons of our EST sequencing data with nonredundant human mRNA and protein databases showed that the ESTs represent 1860 gene clusters. The EST sequencing data analysis showed 60 novel genes found only in this cDNA library after BLAST analysis against 3.0 million ESTs in NCBI's dbEST database. The BLAST search also showed the identified ESTs that have close homology to known genes, which suggests that these may be newly recognized members of known gene families. The gene expression profile of this cell type is revealed by analyzing both the frequency with which a message is encountered and the functional categorization of expressed sequences. Comparing an EST sequence with the human genomic sequence database enables assignment of an EST to a specific chromosomal region (a process called digital gene localization) and often enables immediate partial determination of intron/exon boundaries within the genomic structure. It is expected that high-throughput EST sequencing and data mining analysis will greatly promote our understanding of gene expression in these cells and of growth and development of the skeleton.

  2. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced or be experiencing positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
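
    The N50 statistic quoted for the assembly can be computed as in the minimal sketch below. It uses the common definition (the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size); the abstract does not state which assembler or definition was used, and the contig lengths in the usage line are invented for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    ordered = sorted(contig_lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, not data from the study.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

    Because N50 is weighted by length, it is usually larger than the mean contig length, which matches the pattern reported above (N50 of 1764 bp against a mean of 941.62 bp).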

  3. Identification of novel hereditary cancer genes by whole exome sequencing.

    PubMed

    Sokolenko, Anna P; Suspitsin, Evgeny N; Kuligina, Ekatherina Sh; Bizin, Ilya V; Frishman, Dmitrij; Imyanitov, Evgeny N

    2015-12-28

    Whole exome sequencing (WES) provides a powerful tool for medical genetic research. Several dozen WES studies involving patients with hereditary cancer syndromes have already been reported. WES led to a breakthrough in understanding the genetic basis of some exceptionally rare syndromes; for example, identification of germ-line SMARCA4 mutations in patients with ovarian hypercalcemic small cell carcinomas explains a noticeable share of familial aggregation of this disease. However, studies on common cancer types turned out to be more difficult. In particular, there are almost a dozen reports describing WES analysis of breast cancer patients, but none has yet succeeded in revealing a gene responsible for a significant share of the missing heritability. Virtually all components of WES studies require substantial improvement, e.g. technical performance of WES, interpretation of WES results, mode of patient selection, etc. Most contemporary investigations focus on genes with an autosomal dominant mechanism of inheritance; however, recessive and oligogenic models of transmission of cancer susceptibility also need to be considered. It is expected that the list of medically relevant tumor-predisposing genes will expand rapidly in the next few years.

  4. IDENTIFICATION OF ACTIVE BACTERIAL COMMUNITIES IN A MODEL DRINKING WATER BIOFILM SYSTEM USING 16S RRNA-BASED CLONE LIBRARIES

    EPA Science Inventory

    Recent phylogenetic studies have used DNA as the target molecule for the development of environmental 16S rDNA clone libraries. As DNA may persist in the environment, DNA-based libraries cannot be used to identify metabolically active bacteria in water systems. In this study, a...

  5. Structural gene and complete amino acid sequence of Vibrio alginolyticus collagenase.

    PubMed Central

    Takeuchi, H; Shibano, Y; Morihara, K; Fukushima, J; Inami, S; Keil, B; Gilles, A M; Kawamoto, S; Okuda, K

    1992-01-01

    The DNA encoding the collagenase of Vibrio alginolyticus was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited both collagenase antigen and collagenase activity. The open reading frame from the ATG initiation codon was 2442 bp in length for the collagenase structural gene. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature collagenase consists of 739 amino acids with an Mr of 81875. The amino acid sequences of 20 polypeptide fragments were completely identical with the deduced amino acid sequences of the collagenase gene. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified collagenase reported previously. The analyses of both the DNA and amino acid sequences of the collagenase gene were rigorously performed, but we could not detect any significant sequence similarity to other collagenases. PMID:1311172

  6. Variation in the nucleotide sequence of a prolamin gene family in wild rice.

    PubMed

    Barbier, P; Ishihama, A

    1990-07-01

    Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.

  7. Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Hemoglobin-β gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed, and amplified. The sequence of the channel catfish hemoglobin-β gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...

  8. Mutational analysis of DBD*--a unique antileukemic gene sequence.

    PubMed

    Ji, Yan-shan; Johnson, Betty H; Webb, M Scott; Thompson, E Brad

    2002-01-01

    DBD* is a novel gene encoding an 89 amino acid peptide that is constitutively lethal to leukemic cells. DBD* was derived from the DNA binding domain of the human glucocorticoid receptor by a frameshift that replaces the final 21 C-terminal amino acids of the domain. Previous studies suggested that DBD* no longer acted as the natural DNA binding domain. To confirm and extend these results, we mutated DBD* in 29 single amino acid positions, critical for the function in the native domain or of possible functional significance in the novel 21 amino acid C-terminal sequence. Steroid-resistant leukemic ICR-27-4 cells were transiently transfected by electroporation with each of the 29 mutants. Cell kill was evaluated by trypan blue dye exclusion, a WST-1 tetrazolium-based assay for cell respiration, propidium iodide exclusion, and Hoechst 33258 staining of chromatin. Eleven of the 29 point mutants increased, whereas four decreased antileukemic activity. The remainder had no effect on activity. The nonconcordances between these effects and native DNA binding domain function strongly suggest that the lethality of DBD* is distinct from that of the glucocorticoid receptor. Transfections of fragments of DBD* showed that optimal activity localized to the sequence for its C-terminal 32 amino acids.

  9. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    PubMed

    Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; San Gil, Inigo; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spo, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  10. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    SciTech Connect

    Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J. R.; Amaral-Zettler, L.; Gilbert, J. A.

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences - the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  11. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    PubMed Central

    Jenior, Matthew L.; Koumpouras, Charles C.; Westcott, Sarah L.; Highlander, Sarah K.

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806

  12. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    PubMed

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.

  13. Computational sequence analysis of predicted long dsRNA transcriptomes of major crops reveals sequence complementarity with human genes.

    PubMed

    Jensen, Peter D; Zhang, Yuanji; Wiggins, B Elizabeth; Petrick, Jay S; Zhu, Jin; Kerstetter, Randall A; Heck, Gregory R; Ivashuta, Sergey I

    2013-01-01

    Long double-stranded RNAs (long dsRNAs) are precursors for the effector molecules of sequence-specific RNA-based gene silencing in eukaryotes. Plant cells can contain numerous endogenous long dsRNAs. This study demonstrates that such endogenous long dsRNAs in plants have sequence complementarity to human genes. Many of these complementary long dsRNAs have perfect sequence complementarity of at least 21 nucleotides to human genes; enough complementarity to potentially trigger gene silencing in targeted human cells if delivered in functional form. However, the number and diversity of long dsRNA molecules in plant tissue from crops such as lettuce, tomato, corn, soy and rice with complementarity to human genes that have a long history of safe consumption supports a conclusion that long dsRNAs do not present a significant dietary risk.

  14. Lipoprotein lipase gene sequencing and plasma lipid profile

    PubMed Central

    Pirim, Dilek; Wang, Xingbin; Radwan, Zaheda H.; Niemsiri, Vipavee; Hokanson, John E.; Hamman, Richard F.; Barmada, M. Michael; Demirci, F. Yesim; Kamboh, M. Ilyas

    2014-01-01

    Lipoprotein lipase (LPL) plays a crucial role in lipid metabolism by hydrolyzing triglyceride (TG)-rich particles and affecting HDL cholesterol (HDL-C) levels. In this study, the entire LPL gene plus flanking regions were resequenced in individuals with extreme HDL-C/TG levels (n = 95), selected from a population-based sample of 623 US non-Hispanic White (NHW) individuals. A total of 176 sequencing variants were identified, including 28 novel variants. A subset of 64 variants [common tag single nucleotide polymorphisms (tagSNP) and selected rare variants] were genotyped in the total sample, followed by association analyses with major lipid traits. A gene-based association test including all genotyped variants revealed significant association with HDL-C (P = 0.024) and TG (P = 0.006). Our single-site analysis revealed seven independent signals (P < 0.05; r² < 0.40) with either HDL-C or TG. The most significant association was for the SNP rs295 exerting opposite effects on TG and HDL-C levels with P values of 7.5 × 10−4 and 0.002, respectively. Our work highlights some common variants and haplotypes in LPL with significant associations with lipid traits; however, the analysis of rare variants using burden tests and SKAT-O method revealed negligible effects on lipid traits. Comprehensive resequencing of LPL in larger samples is warranted to further test the role of rare variants in affecting plasma lipid levels. PMID:24212298

  15. Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition.

    PubMed

    Muraya, Moses M; Schmutzer, Thomas; Ulpinnis, Chris; Scholz, Uwe; Altmann, Thomas

    2015-01-01

    A major goal of maize genomic research is to identify sequence polymorphisms responsible for phenotypic variation in traits of economic importance. Large-scale detection of sequence variation is critical for linking genes, or genomic regions, to phenotypes. However, due to its size and complexity, it remains expensive to generate whole genome sequences of sufficient coverage for divergent maize lines, even with access to next generation sequencing (NGS) technology. Because methods involving reduction of genome complexity, such as genotyping-by-sequencing (GBS), assess only a limited fraction of sequence variation, targeted sequencing of selected genomic loci offers an attractive alternative. We therefore designed a sequence capture assay to target 29 Mb genomic regions and surveyed a total of 4,648 genes possibly affecting biomass production in 21 diverse inbred maize lines (7 flints, 14 dents). Captured and enriched genomic DNA was sequenced using the 454 NGS platform to 19.6-fold average depth coverage, and a broad evaluation of read alignment and variant calling methods was performed to select optimal procedures for variant discovery. Sequence alignment with the B73 reference and de novo assembly identified 383,145 putative single nucleotide polymorphisms (SNPs), of which 42,685 were non-synonymous alterations and 7,139 caused frameshifts. Presence/absence variation (PAV) of genes was also detected. We found that substantial sequence variation exists among genomic regions targeted in this study, which was particularly evident within coding regions. This diversification has the potential to broaden functional diversity and generate phenotypic variation that may lead to new adaptations and the modification of important agronomic traits. Further, annotated SNPs identified here will serve as useful genetic tools and as candidates in searches for phenotype-altering DNA variation. In summary, we demonstrated that sequencing of captured DNA is a powerful approach for...

  16. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    PubMed

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-06-20

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc.

  17. Sequences and expression of pyruvate dehydrogenase genes from Pseudomonas aeruginosa.

    PubMed Central

    Rae, J L; Cutfield, J F; Lamont, I L

    1997-01-01

    A mutant of Pseudomonas aeruginosa, OT2100, which appeared to be defective in the production of the fluorescent yellow-green siderophore pyoverdine had been isolated previously following transposon mutagenesis (T. R. Merriman and I. L. Lamont, Gene 126:17-23, 1993). DNA from either side of the transposon insertion site was cloned, and the sequence was determined. The mutated gene had strong identity with the dihydrolipoamide acetyltransferase (E2) components of pyruvate dehydrogenase (PDH) from other bacterial species. Enzyme assays revealed that the mutant was defective in the E2 subunit of PDH, preventing assembly of a functional complex. PDH activity in OT2100 cell extracts was restored when extract from an E1 mutant was added. On the basis of this evidence, OT2100 was identified as an aceB or E2 mutant. A second gene, aceA, which is likely to encode the E1 component of PDH, was identified upstream from aceB. Transcriptional analysis revealed that aceA and aceB are expressed as a 5-kb polycistronic transcript from a promoter upstream of aceA. An intergenic region of 146 bp was located between aceA and aceB, and a 2-kb aceB transcript that originated from a promoter in the intergenic region was identified. DNA fragments upstream of aceA and aceB were shown to have promoter activities in P. aeruginosa, although only the aceA promoter was active in Escherichia coli. It is likely that the apparent pyoverdine-deficient phenotype of mutant OT2100 is a consequence of acidification of the growth medium due to accumulation of pyruvic acid in the absence of functional PDH. PMID:9171401

  18. Mutational analysis of the gene start sequences of pneumonia virus of mice.

    PubMed

    Dibben, Oliver; Easton, Andrew J

    2007-12-01

    The transcriptional start sequence of pneumonia virus of mice (PVM) is more variable than that of the other pneumoviruses, with five different nine-base gene start (GS) sequences found in the PVM genome. The sequence requirements of the PVM gene start signal, and the efficiency of transcriptional initiation of the different virus genes, were investigated using a reverse genetics approach with a minigenome construct containing two reporter genes. A series of GS mutants were created, where each of the nine bases of the gene start consensus sequence of a reporter gene was changed to every other possible base, and the resulting effect on initiation of transcription was assayed. Nucleotide positions 1, 2 and 7 were found to be most sensitive to mutation whilst positions 4, 5 and 9 were relatively insensitive. The L gene GS sequence was found to have only 20% of the activity of the consensus sequence whilst the published M2 gene start sequence was found to be non-functional. A minigenome construct in which the two reporter genes were separated by the F-M2 gene junction of PVM was used to confirm the presence of two alternative, functional, GS sequences that could both drive the transcription of the PVM M2 gene.
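
    The mutant design described in this abstract (each of nine positions changed to every other possible base) can be enumerated with a short sketch. The nine-base sequence used here is a placeholder chosen for illustration only, not the PVM gene start consensus, and the function name is hypothetical. Nine positions with three alternative bases each give 27 single-base mutants.

```python
BASES = "ACGU"


def single_base_mutants(sequence):
    """List every single-base substitution of an RNA sequence.

    Each entry is (position, original_base, new_base, mutant_sequence),
    with positions numbered from 1 as in the abstract.
    """
    mutants = []
    for position, original in enumerate(sequence, start=1):
        for base in BASES:
            if base != original:
                mutant = sequence[:position - 1] + base + sequence[position:]
                mutants.append((position, original, base, mutant))
    return mutants


# Placeholder nine-base sequence (hypothetical, not the PVM consensus).
placeholder_gs = "ACGUACGUA"
mutants = single_base_mutants(placeholder_gs)
print(len(mutants))
for position, original, base, mutant in mutants[:3]:
    print(f"position {position}: {original}->{base}  {mutant}")
```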

  19. Isolation of Hox Cluster Genes from Insects Reveals an Accelerated Sequence Evolution Rate

    PubMed Central

    Hadrys, Heike; Simon, Sabrina; Kaune, Barbara; Schmitt, Oliver; Schöner, Anja; Jakob, Wolfgang; Schierwater, Bernd

    2012-01-01

    Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution. PMID:22685537

  20. Syndromic ciliopathies: From single gene to multi gene analysis by SNP arrays and next generation sequencing.

    PubMed

    Knopp, C; Rudnik-Schöneborn, S; Eggermann, T; Bergmann, C; Begemann, M; Schoner, K; Zerres, K; Ortiz Brüchle, N

    2015-10-01

    Joubert syndrome (JS) and related disorders (JSRD), Meckel syndrome (MKS) and Bardet-Biedl syndrome (BBS) are autosomal recessive ciliopathies with a broad clinical and genetic overlap. In our multiethnic cohort of 88 MKS, 61 JS/JSRD and 66 BBS families we performed genetic analyses and were able to determine mutation frequencies and detection rates for the most frequently mutated MKS genes. On the basis of determined mutation frequencies, a next generation gene panel for JS/JSRD and MKS was established. Furthermore 35 patients from 26 unrelated consanguineous families were investigated by SNP array-based homozygosity mapping and subsequent DNA sequencing of known candidate genes according to runs of homozygosity size in descending order. This led to the identification of the causative homozygous mutation in 62% of unrelated index cases. Based on our data we discuss various strategies for diagnostic mutation detection in the syndromic ciliopathies JS/JSRD, MKS and BBS.

  1. A next-generation sequencing gene panel (MiamiOtoGenes) for comprehensive analysis of deafness genes.

    PubMed

    Tekin, Demet; Yan, Denise; Bademci, Guney; Feng, Yong; Guo, Shengru; Foster, Joseph; Blanton, Susan; Tekin, Mustafa; Liu, Xuezhong

    2016-03-01

    Extreme genetic heterogeneity along with remarkable variation in the distribution of causative variants across different ethnicities makes single gene testing inefficient for hearing loss. We developed a custom capture/next-generation sequencing gene panel of 146 known deafness genes with a total target size of approximately 1 MB. The genes were identified by searching databases including Hereditary Hearing Loss Homepage, the Human Genome Mutation Database (HGMD), Online Mendelian Inheritance in Man (OMIM) and most recent peer-reviewed publications related to the genetics of deafness. The design covered all coding exons, UTRs and 25 bases of intronic flanking sequences for each exon. To validate our panel, we used 6 positive controls with variants in known deafness genes and 8 unsolved samples from individuals with hearing loss. Mean coverage of the targeted exons was 697X. On average, each sample had 99.8%, 96.2% and 92.7% of the targeted region coverage of 1X, 50X and 100X reads, respectively. Analysis detected all known variants in nuclear genes. These results prove the accuracy and reliability of the custom capture experiment.

  2. Cloning and sequencing of the alcohol dehydrogenase II gene from Zymomonas mobilis

    DOEpatents

    Ingram, Lonnie O.; Conway, Tyrrell

    1992-01-01

    The alcohol dehydrogenase II gene from Zymomonas mobilis has been cloned and sequenced. This gene can be expressed at high levels in other organisms to produce acetaldehyde or to convert acetaldehyde to ethanol.

  3. 5' sequences are important positive and negative determinants of the longevity of Chlamydomonas chloroplast gene transcripts.

    PubMed Central

    Salvador, M L; Klein, U; Bogorad, L

    1993-01-01

    We have found that sequences in the 5' leader of the Chlamydomonas chloroplast rbcL gene, when fused 5' to foreign genes, destabilize transcripts of these chimeric genes in the chloroplast of transgenic Chlamydomonas but that 5' sequences of the rbcL structural gene prevent this destabilization. Transcripts of the chloroplast rbcL gene are about equally abundant at all times in Chlamydomonas reinhardtii growing on an alternating 12-h light/12-h dark cycle. However, Chlamydomonas chloroplast transformants, harboring chimeric genes containing the same rbcL promoter with 63 or 92 bp of the rbcL 5' leader sequence fused upstream of the Escherichia coli uidA (beta-glucuronidase, GUS) gene, accumulated GUS transcripts only in the dark. Transcripts disappeared rapidly upon illumination of the cells. The same phenomenon was exhibited by transcripts of chimeric genes in which the GUS gene coding sequence was replaced by other unrelated genes. The precipitous light-induced drop in GUS transcript abundance was found to be due to an approximately 16-fold increase in the rate of degradation of GUS transcripts in light rather than to a decrease in the rate of transcription of the GUS gene. Transcripts of a chimeric rbcL-GUS construct in which the leader sequence of the rbcL gene was replaced by 103 bp of the leader sequence of the atpB gene were stable in illuminated cells. The destabilizing effect of the rbcL 5' leader sequence was reversed by adding 257 bp of the 5' coding region of the rbcL gene. The results show that chloroplast transcript levels in illuminated Chlamydomonas cells--and perhaps in other cases--can be determined, at least to some extent, by sequences and interactions of sequences transcribed from the 5' ends of genes. PMID:8434017

  4. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications

    PubMed Central

    Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn; Knight, Rob; Cole, James R; Amaral-Zettler, Linda; Gilbert, Jack A; Karsch-Mizrachi, Ilene; Johnston, Anjanette; Cochrane, Guy; Vaughan, Robert; Hunter, Christopher; Park, Joonhong; Morrison, Norman; Rocca-Serra, Philippe; Sterk, Peter; Arumugam, Manimozhiyan; Bailey, Mark; Baumgartner, Laura; Birren, Bruce W; Blaser, Martin J; Bonazzi, Vivien; Booth, Tim; Bork, Peer; Bushman, Frederic D; Buttigieg, Pier Luigi; Chain, Patrick S G; Charlson, Emily; Costello, Elizabeth K; Huot-Creasy, Heather; Dawyndt, Peter; DeSantis, Todd; Fierer, Noah; Fuhrman, Jed A; Gallery, Rachel E; Gevers, Dirk; Gibbs, Richard A; Gil, Inigo San; Gonzalez, Antonio; Gordon, Jeffrey I; Guralnick, Robert; Hankeln, Wolfgang; Highlander, Sarah; Hugenholtz, Philip; Jansson, Janet; Kau, Andrew L; Kelley, Scott T; Kennedy, Jerry; Knights, Dan; Koren, Omry; Kuczynski, Justin; Kyrpides, Nikos; Larsen, Robert; Lauber, Christian L; Legg, Teresa; Ley, Ruth E; Lozupone, Catherine A; Ludwig, Wolfgang; Lyons, Donna; Maguire, Eamonn; Methé, Barbara A; Meyer, Folker; Muegge, Brian; Nakielny, Sara; Nelson, Karen E; Nemergut, Diana; Neufeld, Josh D; Newbold, Lindsay K; Oliver, Anna E; Pace, Norman R; Palanisamy, Giriprakash; Peplies, Jörg; Petrosino, Joseph; Proctor, Lita; Pruesse, Elmar; Quast, Christian; Raes, Jeroen; Ratnasingham, Sujeevan; Ravel, Jacques; Relman, David A; Assunta-Sansone, Susanna; Schloss, Patrick D; Schriml, Lynn; Sinha, Rohini; Smith, Michelle I; Sodergren, Erica; Spor, Aymé; Stombaugh, Jesse; Tiedje, James M; Ward, Doyle V; Weinstock, George M; Wendel, Doug; White, Owen; Whiteley, Andrew; Wilke, Andreas; Wortman, Jennifer R; Yatsunenko, Tanya; Glöckner, Frank Oliver

    2012-01-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences—the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The ‘environmental packages’ apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere. PMID:21552244

  5. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040

  6. Sequence of the indoleglycerol phosphate synthase (trpC) gene from Rhodobacter capsulatus.

    PubMed Central

    Becker-Rudzik, M; Young, D A; Marrs, B L

    1992-01-01

    We have isolated, cloned, and sequenced the indoleglycerol phosphate synthase gene (trpC) from Rhodobacter capsulatus. Normalized alignment scores comparing the trpC gene of R. capsulatus with the trpC genes of other bacterial species are reported. An unexpected degree of similarity to the trpC gene of Bacillus subtilis was found. PMID:1644778

  7. Novel primers for complete mitochondrial cytochrome b gene sequencing in mammals.

    PubMed

    Naidu, Ashwin; Fitak, Robert R; Munguia-Vega, Adrian; Culver, Melanie

    2012-03-01

    Sequence-based species identification relies on the extent and integrity of sequence data available in online databases such as GenBank. When identifying species from a sample of unknown origin, partial DNA sequences obtained from the sample are aligned against existing sequences in databases. When the sequence from the matching species is not present in the database, high-scoring alignments with closely related sequences might produce unreliable results on species identity. For species identification in mammals, the cytochrome b (cyt b) gene has been identified to be highly informative; thus, large amounts of reference sequence data from the cyt b gene are much needed. To enhance availability of cyt b gene sequence data on a large number of mammalian species in GenBank and other such publicly accessible online databases, we identified a primer pair for complete cyt b gene sequencing in mammals. Using this primer pair, we successfully PCR amplified and sequenced the complete cyt b gene from 40 of 44 mammalian species representing 10 orders of mammals. We submitted 40 complete, correctly annotated, cyt b protein coding sequences to GenBank. To our knowledge, this is the first single primer pair to amplify the complete cyt b gene in a broad range of mammalian species. This primer pair can be used for the addition of new cyt b gene sequences and to enhance data available on species represented in GenBank. The availability of novel and complete gene sequences as high-quality reference data can improve the reliability of sequence-based species identification.

  8. Virulence genes in clinical and environmental Stenotrophomonas maltophilia isolates: a genome sequencing and gene expression approach.

    PubMed

    Adamek, Martina; Linke, Burkhard; Schwartz, Thomas

    2014-01-01

    The rate of nosocomial infections with the opportunistic pathogen Stenotrophomonas maltophilia has remarkably increased in the last decade. To determine S. maltophilia virulence genes, the complete genome sequences of two S. maltophilia isolates were compared. The clinical strain SKK35 was proved virulent in an amoeba host-pathogen model, and wastewater strain RA8 was determined as non-virulent in the amoeba model. The genome sequences of three additional S. maltophilia strains, K279a (clinical, non-virulent against amoeba), R511-3 and SKA14 (both environmental, non-virulent against amoeba) were taken into account as reference strains. We were able to show that all clinical and environmental S. maltophilia strains presented comparable distribution of so far identified potential virulence genes, regardless of their virulence potential against amoebae. Aside from that, strain SKK35 was found harboring a putative, strain specific pathogenicity island, encoding two proteins from the RTX (repeats-in-toxin) family. The actual expression of the RTX genes was verified in growth experiments in different culture media containing blood or blood components and in co-cultures with amoeba.

  9. The lux genes in Photobacterium leiognathi are closely linked with genes corresponding in sequence to riboflavin synthesis genes.

    PubMed

    Lee, C Y; Meighen, E A

    1992-07-31

    Three open reading frames (ORFs) have been found in the region downstream of the luxG gene in the Photobacterium leiognathi lux operon. These genes (ORF I, II, and III) are not only closely linked to the lux operon and transcribed in the same direction but also show the same organization and code for proteins homologous in sequence to the gene products of ribB, ribA, and ribH of Bacillus subtilis, respectively. The Photobacterium leiognathi gene (ORF II) corresponding to ribA was expressed in Escherichia coli in the bacteriophage T7 promoter-RNA polymerase system and a 40 kDa 35S-labeled polypeptide has been detected on SDS-PAGE. Expression of DNA extending from luxBEG to ORF II inserted between a strong promoter and a reporter gene and transferred by conjugation into Vibrio harveyi did not affect the expression of the reporter gene. The results provide evidence that neither promoter nor terminator sites were present in the DNA between the luxG and ORF II indicating that these genes might be part of the lux operon.

  10. Cloning, sequencing and characterization of lipase genes from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...

  11. Cloning and nucleotide sequence of the aroA gene of Bordetella pertussis.

    PubMed Central

    Maskell, D J; Morrissey, P; Dougan, G

    1988-01-01

    The aroA locus of Bordetella pertussis, encoding 5-enolpyruvylshikimate 3-phosphate synthase, has been cloned into Escherichia coli by using a cosmid vector. The gene is expressed in E. coli and complemented an E. coli aroA mutant. The nucleotide sequence of the B. pertussis aroA gene was determined and contains an open reading frame encoding 442 amino acids, with a calculated molecular weight for 5-enolpyruvylshikimate 3-phosphate synthase of 46,688. The amino acid sequence derived from the nucleotide sequence shows homology with the published amino acid sequences of aroA gene products of other microorganisms. PMID:2897356

  12. The chromosomal arrangement of human alpha-like globin genes: sequence homology and alpha-globin gene deletions.

    PubMed

    Lauer, J; Shen, C K; Maniatis, T

    1980-05-01

    We report the isolation of a cluster of four alpha-like globin genes from a bacteriophage lambda library of human DNA (Lawn et al., 1978). Analysis of the cloned DNA confirms the linkage arrangement of the two adult alpha-globin genes (alpha 1 and alpha 2) previously derived from genomic blotting experiments (Orkin, 1978) and identifies two additional closely linked alpha-like genes. The nucleotide sequence of a portion of each of these alpha-like genes was determined. One of these sequences is tentatively identified as an embryonic zeta-globin gene (zeta 1) by comparison with structural data derived from purified zeta-globin protein (J. Clegg, personal communication), while the other sequence cannot be matched with any known alpha-like polypeptide sequence (we designate this sequence phi alpha 1). Localization of the four alpha-like sequences on a restriction map of the gene cluster indicates that the genes have the same transcriptional orientation and are arranged in the order 5'-zeta 1-phi alpha 1-alpha 2-alpha 1-3'. Genomic blotting experiments identified a second, nonallelic zeta-like globin gene (phi 2) located 10-12 kb 5' to the cloned zeta-globin gene. Comparison of the locations of restriction sites within alpha 1 and alpha 2 and heteroduplex studies reveal extensive sequence homology within and flanking the two genes. The homologous sequences, which are interrupted by two blocks of nonhomology, span a region of approximately 4 kb. This extensive sequence homology between two genes which are thought to be the products of an ancient duplication event suggests the existence of a mechanism for sequence matching during evolution. One consequence of this arrangement of homologous sequences is the occurrence of two types of deletions in recombinant phage DNA during propagation in E. coli. The locations and sizes of the two types of deletions are indistinguishable from those of the two types of deletions associated with alpha-thalassemia 2 (Embury et al., 1979

  13. Sequence analysis of a rainbow trout cDNA library and creation of a gene index.

    PubMed

    Rexroad, C E; Lee, Y; Keele, J W; Karamycheva, S; Brown, G; Koop, B; Gahr, S A; Palti, Y; Quackenbush, J

    2003-01-01

    Expressed sequence tag (EST) projects have produced extremely valuable resources for identifying genes affecting phenotypes of interest. A large-scale EST sequencing project for rainbow trout was initiated to identify and functionally annotate as many unique transcripts as possible. Over 45,000 5' ESTs were obtained by sequencing clones from a single normalized library constructed using mRNA from six tissues. The production of this sequence data and creation of a rainbow trout Gene Index eliminating redundancy and providing annotation for these sequences will facilitate research in this species.

  14. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α- and β-subunits of allophycocyanin (apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% of nucleotide sequence similarity and 30.4% of similarity of deduced amino acid sequence between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for α subunit and 100% for β subunit.

  15. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low recombination is particularly relevant.

  16. Sequence Analysis of Toxin Gene-Bearing Corynebacterium diphtheriae Strains, Australia.

    PubMed

    Doyle, Christine J; Mazins, Adam; Graham, Rikki M A; Fang, Ning-Xia; Smith, Helen V; Jennison, Amy V

    2017-01-01

    By conducting a molecular characterization of Corynebacterium diphtheriae strains in Australia, we identified novel sequences, nonfunctional toxin genes, and 5 recent cases of toxigenic cutaneous diphtheria. These findings highlight the importance of extrapharyngeal infections for toxin gene-bearing (functional or not) and non-toxin gene-bearing C. diphtheriae strains. Continued surveillance is recommended.

  17. Flagellar apparatus gene sequences of Aeromonas hydrophila AL09-73 isolate

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Flagellar apparatus genes of the recent outbreak Aeromonas hydrophila AL09-73 isolate were sequenced and characterized. A total of 28 flagellar genes were identified. The sizes of the genes range from 318 to 2001 nucleotides, which potentially encode different complex flagellar proteins. At nucleotide and...

  18. Compilation of 5S rRNA and 5S rRNA gene sequences

    PubMed Central

    Specht, Thomas; Wolters, Jörn; Erdmann, Volker A.

    1990-01-01

    The BERLIN RNA DATABANK as of December 31, 1989, contains a total of 667 sequences of 5S rRNAs or their genes, which is an increase of 114 new sequence entries over the last compilation (1). It covers sequences from 44 archaebacteria, 267 eubacteria, 20 plastids, 6 mitochondria, 319 eukaryotes and 11 eukaryotic pseudogenes. The hardcopy shows only the list (Table 1) of those organisms whose sequences have been determined. The BERLIN RNA DATABANK uses the format of the EMBL Nucleotide Sequence Data Library complemented by a Sequence Alignment (SA) field including secondary structure information. PMID:1692116

  19. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background: The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results: In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions: The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261

  20. Nucleotide sequences of the cylindrical inclusion protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N; Yaegashi, H

    1999-02-01

    The nucleotide sequences of the cylindrical inclusion protein (CIP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CIP genes of both isolates comprised 1902 nucleotides and encoded 634 amino acids containing consensus nucleotide binding motif. The sequence similarities between the two isolates at the nucleotide and amino acid levels were 91% and 98%, respectively. When the CIP gene sequences of the Japanese ZYMV isolates were compared with those of previously reported ZYMV isolates, the nucleotide and amino acid sequence similarities ranged between 81% and 97%, and between 95% and 97%, respectively. Phylogenetic analysis of the deduced amino acid sequences of the CIP genes indicated that the Japanese ZYMV isolates were closely related to those of other ZYMV isolates.

  1. Molecular cloning and sequencing of the gene encoding the fimbrial subunit protein of Bacteroides gingivalis.

    PubMed Central

    Dickinson, D P; Kubiniec, M A; Yoshimura, F; Genco, R J

    1988-01-01

    The gene encoding the fimbrial subunit protein of Bacteroides gingivalis 381, fimbrillin, has been cloned and sequenced. The gene was present as a single copy on the bacterial chromosome, and the codon usage in the gene conformed closely to that expected for an abundant protein. The predicted size of the mature protein was 35,924 daltons, and the secretory form may have had a 10-amino-acid, hydrophilic leader sequence similar to the leader sequences of the MePhe fimbriae family. The protein sequence had no marked similarity to known fimbrial sequences, and no homologous sequences could be found in other black-pigmented Bacteroides species, suggesting that fimbrillin represents a class of fimbrial subunit protein of limited distribution. PMID:2895100

  2. Multilocus Sequence Analysis of Housekeeping Genes and Antigenic Determinant Genes in Bordetella pertussis Strains Isolated in Korea

    PubMed Central

    Jung, Sang-Oun; Moon, Yu Mi; Kim, So-Hyeon; Sung, Hwa Young; Kwon, Seung-Jik; Kang, Yeon Ho; Yu, Jae Yon

    2011-01-01

    Objectives: To confirm genotype diversities of clinical isolates of Bordetella pertussis and to evaluate the risk of pertussis outbreak in Korea. Methods: Seven housekeeping genes and 10 antigenic determinant genes from clinical B. pertussis isolates were analyzed by multilocus sequence typing (MLST). Results: More variation was observed in the antigenic determinant genes. In particular, the PtxS1 gene was the most variable; five of the eight global genotypes were observed. In the bacterial type, the number of observed sequence types in the isolates was seven and the most frequent form was type 1 (79.6%). This major sequence type also showed a time-dependent transition pattern. Older isolates (1968 and 1975) showed type 1 and 6 in housekeeping genes and antigenic determinant genes, respectively. However, these were changed to type 2 and 1 in isolates from 1999–2008. This transition was mainly attributed to genotype change of the PtxS1 and Fim3 genes; the tendency of the genotype change was to avoid vaccine-derived genotypes. In addition, there was a second transition in 2009. In this period, only the sequence type of antigenic determinant genes was changed to type 2. Based Upon Related Sequence Types (BURST) analysis confirmed that there were two clonal complexes (ACCI and ACCII) in the Korean isolates. Moreover, the recently increased sequence type was revealed as AST2 derived from AST 3 in ACCI. Conclusions: Genotype changes in strains circulating in Korea are still progressing and there was a specific driving force in antigenic determinant genes. Therefore continuous surveillance of genotype change of the circulating strains should be performed to confirm the interrelationship of genotype change with vaccine immunity. PMID:24159461

  3. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values used to describe sequence features and the algorithm used to train the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  4. Identification of sequences regulating the transcription of a Dictyostelium gene selectively expressed in prespore cells.

    PubMed Central

    Early, A E; Williams, J G

    1989-01-01

    There has been considerable debate about the relative contributions of transcriptional and post-transcriptional mechanisms to the regulation of prespore gene expression in Dictyostelium. We have determined the DNA sequence upstream of D19, the Dictyostelium gene encoding PsA, a prespore-specific, cell surface protein of unknown function. Our analysis of gene fusions, in which D19 upstream sequences are placed adjacent to a heterologous reporter gene, indicates that transcriptional signals alone are sufficient for the correct temporal and cell-type specific expression of this gene. We also show that the 5' and 3' boundaries of the minimal sequences necessary for correct developmental regulation lie within the region 338 to 122 nucleotides upstream of the start site of transcription but that flanking sequences seem to be necessary for optimal expression. PMID:2550894

  5. Analysis of the regions flanking the human insulin gene and sequence of an Alu family member.

    PubMed Central

    Bell, G I; Pictet, R; Rutter, W J

    1980-01-01

    The regions around the human insulin gene have been studied by heteroduplex, hybridization and sequence analysis. These studies indicated that there is a region of heterogeneous length located approximately 700 bp before the 5' end of the gene; and that the 19 kb of cloned DNA which includes the 1430 bp insulin gene as well as 5650 bp before and 11,500 bp after the gene is single copy sequence except for 500 bp located 6000 bp from the 3' end of the gene. This 500 bp segment contains a member of the Alu family of dispersed middle repetitive sequences as well as another less highly repeated homopolymeric segment. The sequence of this region was determined. This Alu repeat is bordered by 19 bp direct repeats and also contains an 83 bp sequence which is present twice. The regions flanking the human and rat I insulin genes were compared by heteroduplex analysis to localize homologous sequences in the flanking regions which could be involved in the regulation of insulin biosynthesis. The homology between the two genes is restricted to the region encoding preproinsulin and a short region of approximately 60 bp flanking the 5' side of the genes. PMID:6253909

  6. IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing.

    PubMed

    Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai

    2016-11-28

    Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE are widespread, including in tumorigenesis-relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only.

  7. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.

  8. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  9. Genomic sequence and organization of two members of a human lectin gene family

    SciTech Connect

    Gitt, M.A.; Barondes, S.H.

    1991-01-01

    The authors have isolated and sequenced the genomic DNA encoding a human dimeric soluble lactose-binding lectin. The gene has four exons, and its upstream region contains sequences that suggest control by glucocorticoids, heat (environmental) shock, metals, and other factors. They have also isolated and sequenced three exons of the gene encoding another human putative lectin, the existence of which was first indicated by isolation of its cDNA. Comparisons suggest a general pattern of genomic organization of members of this lectin gene family.

  10. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  11. Genome-wide analysis reveals diverged patterns of codon bias, gene expression, and rates of sequence evolution in Picea gene families.

    PubMed

    De La Torre, Amanda R; Lin, Yao-Cheng; Van de Peer, Yves; Ingvarsson, Pär K

    2015-03-05

    The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms.

  12. Utility of rpoB Gene Sequencing for Identification of Nontuberculous Mycobacteria in the Netherlands

    PubMed Central

    de Zwaan, Rina; van Ingen, Jakko

    2014-01-01

    In the Netherlands, clinical isolation of nontuberculous mycobacteria (NTM) has increased over the past decade. Proper identification of isolates is important, as NTM species differ strongly in clinical relevance. Most of the currently applied identification methods cannot distinguish between all different Mycobacterium species and complexes within species. rpoB gene sequencing exhibits a promising level of discrimination among rapidly and slowly growing mycobacteria, including the Mycobacterium avium complex. In this study, we prospectively compared rpoB gene sequencing with our routine algorithm of reverse line blot identification combined with partial 16S rRNA gene sequencing of 455 NTM isolates. rpoB gene sequencing identified 403 isolates to species level as 45 different known species and identified 44 isolates to complex level, and eight isolates remained unidentifiable to species level. In contrast, our reference reverse line blot assay with adjunctive 16S rRNA gene sequencing identified 390 isolates to species level (30 distinct species) and identified 56 isolates to complex level, and nine isolates remained unidentified. The higher discriminatory power of rpoB gene sequencing results largely from the distinction of separate species within complexes and subspecies. Also, Mycobacterium gordonae, Mycobacterium kansasii, and Mycobacterium interjectum were separated into multiple groupings with relatively low sequence similarity (98 to 94%), suggesting that these are complexes of closely related species. We conclude that rpoB gene sequencing is a more discriminative identification technique than the combination of reverse line blot and 16S rRNA gene sequencing and could introduce a major improvement in clinical care of NTM disease and the research on the epidemiology and clinical relevance of NTM. PMID:24808238

  13. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    SciTech Connect

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

  14. Detection and sequence analysis of the DNA repair gene RAD51 in the Korean spider Callobius koreanus (Amaurobiidae).

    PubMed

    Kim, J Y; Park, Y C

    2015-11-19

    We identified a partial sequence (483 bp) of the RAD51 gene from the Korean spider Callobius koreanus. Sequence variation was found at one position during alignment with the human RAD51 gene sequence. This partial sequence included the region corresponding to exon 4 in the human RAD51 gene, which encodes 39 amino acids. These results show that the RAD51 gene is highly conserved between human and spiders.

  15. Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

    SciTech Connect

    Onda, M.; Kudo, S.; Fukuda, M.; Rearden, A.; Mattei, G.M.

    1993-08-01

    Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.

  16. Prediction of human rotavirus serotype by nucleotide sequence analysis of the VP7 protein gene.

    PubMed Central

    Green, K Y; Sears, J F; Taniguchi, K; Midthun, K; Hoshino, Y; Gorziglia, M; Nishikawa, K; Urasawa, S; Kapikian, A Z; Chanock, R M

    1988-01-01

    Human rotavirus field isolates were characterized by direct sequence analysis of the gene encoding the serotype-specific major neutralization protein (VP7). Single-stranded RNA transcripts were prepared from virus particles obtained directly from stool specimens or after two or three passages in MA-104 cells. Two regions of the gene (nucleotides 307 through 351 and 670 through 711) which had previously been shown to contain regions of sequence divergence among rotavirus serotypes were sequenced by the dideoxynucleotide method with two different synthetic oligonucleotide primers. The resulting nucleotide sequences were compared with the corresponding sequences from rotaviruses of known serotype (serotype 1, 2, 3, or 4). A total of 25 field isolates and 10 laboratory strains examined by this method exhibited marked sequence identity in both areas of the gene with the corresponding regions of 1 of the 4 reference strains. In addition, the predicted serotype from the sequence analysis correlated in each case with the serotype determined when the rotaviruses were examined by plaque reduction neutralization or reactivity with serotype-specific monoclonal antibodies. These data suggest that as a result of the high degree of sequence conservation observed among rotaviruses of the same serotype, it is possible to predict the serotype of a rotavirus isolate by direct sequence analysis of its VP7 gene. PMID:2833626

  17. Evolution of the RH gene family in vertebrates revealed by brown hagfish (Eptatretus atami) genome sequences.

    PubMed

    Suzuki, Akinori; Komata, Hidero; Iwashita, Shogo; Seto, Shotaro; Ikeya, Hironobu; Tabata, Mitsutoshi; Kitano, Takashi

    2017-02-01

    In vertebrates, there are four major genes in the RH (Rhesus) gene family, RH, RHAG, RHBG, and RHCG. These genes are thought to have been formed by the two rounds of whole-genome duplication (2R-WGD) in the common ancestor of all vertebrates. In our previous work, where we analyzed details of the gene duplication process of this gene family, three nucleotide sequences belonging to this family were identified in Far Eastern brook lamprey (Lethenteron reissneri), and the phylogenetic positions of the genes were determined. Lampreys, along with hagfishes, are cyclostomata (jawless fishes), which is a sister group of gnathostomata (jawed vertebrates). Although those results suggested that one gene was orthologous to the gnathostome RHCG genes, we did not identify clear orthologues for other genes. In this study, therefore, we identified three novel cDNA sequences that belong to the RH gene family using de novo transcriptome analysis of another cyclostome: the brown hagfish (Eptatretus atami). We also determined the nucleotide sequences for the RHBG and RHCG genes in a red stingray (Dasyatis akajei), which belongs to the cartilaginous fishes. The phylogenetic tree showed that two brown hagfish genes, which were probably duplicated in the cyclostome lineage, formed a cluster with the gnathostome RHAG genes, whereas another brown hagfish gene formed a cluster with the gnathostome RHCG genes. We estimated that the RH genes had a higher evolutionary rate than the RHAG, RHBG, and RHCG genes. Interestingly, in the RHBG genes, only the bird lineage showed a higher rate of nonsynonymous substitutions. It is likely that this higher rate was caused by a state of relaxed functional constraints rather than by positive selection or pseudogenization.

  18. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  19. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library

    PubMed Central

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Aim Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyze complete gene sequences of the PLE family by screening a Rongchang pig BAC library and by third-generation PacBio gene sequencing. Methods After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Results Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found not only that the PLE exon sequences of the four genes showed a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720 bp reverse complement sequences that may have an important function in the regulation of PLE gene expression. Significance This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes

  20. A model organism for new gene discovery by cDNA sequencing

    SciTech Connect

    El-Sayed, N.M.; Donelson, J.E.; Alarcon, C.M.

    1994-09-01

    One method of new gene discovery is single pass sequencing of cDNAs to identify expressed sequence tags (ESTs). Model organisms can have biological properties which make their use advantageous over studies with humans. One such model organism with advantages for cDNA sequencing is the African trypanosome T. brucei rhodesiense. This organism has the same 40 nucleotide sequence (splice leader sequence) on the 5' end of all mRNAs. We have constructed a 5' cDNA library by priming off the splice leader sequence and have begun sequencing this cDNA library. To date, approximately 500 such cDNA expressed sequence tags (ESTs) have been examined. Forty-three percent of the sequences sampled from the trypanosome cDNA library have significant similarities to sequences already in the protein and translated nucleic acid databases. Among these are cDNA sequences which encode previously reported T. brucei proteins such as the VSG, tubulin, calflagin, etc., and proteins previously identified in other trypanosomatids. Other cDNAs display significant similarities to genes in unrelated organisms encoding several ribosomal proteins, metabolic enzymes, GTP binding proteins, transcription factors, cyclophilin, nucleosomal histones, histone H1, and a macrophage stress protein, among others. The 57% of the cDNAs that are not similar to sequences currently in the databases likely encode both trypanosome-specific proteins and housekeeping proteins shared with other eukaryotes. These cDNA ESTs provide new avenues of research for exploring both the biochemistry and the genome organization of this parasite, as well as a resource for identifying the 5' sequence of novel genes likely to have homology to genes expressed in other organisms.

  1. Cloning and sequencing of a Candida albicans catalase gene and effects of disruption of this gene.

    PubMed

    Wysong, D R; Christin, L; Sugar, A M; Robbins, P W; Diamond, R D

    1998-05-01

    Catalase plays a key role as an antioxidant, protecting aerobic organisms from the toxic effects of hydrogen peroxide, and in some cases has been postulated to be a virulence factor. To help elucidate the function of catalase in Candida albicans, a single C. albicans-derived catalase gene, designated CAT1, was isolated and cloned. Degenerate PCR primers based on highly conserved areas of other fungal catalase genes were used to amplify a 411-bp product from genomic DNA of C. albicans ATCC 10261. By using this product as a probe, catalase clones were isolated from genomic libraries of C. albicans. Nucleotide sequence analysis revealed an open reading frame encoding a protein of 487 amino acid residues. Construction of a CAT1-deficient mutant was achieved by using the Ura-blaster technique for sequential disruption of multiple alleles by integrative transformation using URA3 as a selectable marker. Resulting mutants exhibited normal morphology and comparable growth rates of both yeast and mycelial forms. Enzymatic analysis revealed an abundance of catalase in the wild-type strain but decreasing catalase activity in heterozygous mutants and no detectable catalase in a homozygous null mutant. In vitro assays showed the mutant strains to be more sensitive to damage by both neutrophils and concentrations of exogenous peroxide that were sublethal for the parental strain. Compared to the parental strain, the homozygous null mutant strain was far less virulent for mice in an intravenous infection model of disseminated candidiasis. Definitive linkage of CAT1 with virulence would require restoration of activity by reintroduction of the gene into mutants. However, initial results in mice, taken together with the enhanced susceptibility of catalase-deficient hyphae to damage by human neutrophils, suggest that catalase may enhance the pathogenicity of C. albicans.

  2. Transcriptome sequencing of Hydrangea macrophylla to uncover genes related to reblooming and powdery mildew resistance

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Massively parallel pyrosequencing technology has been used extensively on agronomic crops and model plants. Transcriptome sequencing is a useful first step in functional genomic studies, microarray and gene expression studies, single nucleotide polymorphism (SNP) surveys, quantitative trait loci (QT...

  3. De novo transcriptome sequencing of axolotl blastema for identification of differentially expressed genes during limb regeneration

    PubMed Central

    2013-01-01

    Background Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration does not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
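
    The N50 statistic reported in the abstract above can be illustrated with a minimal sketch. It assumes the common definition (the length L such that sequences of length L or longer account for at least half of the total assembled bases); the record does not state which assembler or exact definition the authors used, and the function name `n50` and the sample lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths, not data from the study.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

    Because N50 is weighted by bases rather than by sequence count, a few long transcripts can dominate it, so it is usually read alongside the total number of assembled sequences.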

  4. Complete nucleotide sequence of the 23S rRNA gene of the Cyanobacterium, Anacystis nidulans.

    PubMed Central

    Douglas, S E; Doolittle, W F

    1984-01-01

    The nucleotide sequence of the Anacystis nidulans 23S rRNA gene, including the 5'- and 3'-flanking regions has been determined. The gene is 2876 nucleotides long and shows higher primary sequence homology to the 23S rRNAs of plastids (84.5%) than to that of E. coli (79%). The predicted rRNA transcript also shares many secondary structural features with those of plastids, reinforcing the endosymbiont hypothesis for the origin of these organelles. PMID:6326060
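
    Percent figures such as the 84.5% and 79% quoted above are typically computed from a pairwise alignment. The sketch below shows one simple convention, assuming the two sequences are already aligned to equal length and that columns containing a gap are skipped; the record does not say how the authors computed their values, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over gap-free alignment columns.

    Assumption: both inputs come from the same pairwise alignment, so
    they have equal length; columns with a gap in either are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, not real rRNA sequence.
print(percent_identity("ACGUACGU", "ACGAACGU"))
```

    Other conventions count gap columns as mismatches or divide by the shorter sequence length, so identity values are only comparable when the same convention is used throughout.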

  5. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago.

  6. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 x 10(4) recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. PMID:2066334

  7. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites

    PubMed Central

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-01-01

    To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi’an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were found by sequencing to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi’an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%–99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, in accordance with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites. PMID:23024043

  8. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites.

    PubMed

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-10-01

    To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were found by sequencing to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, in accordance with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.

  9. [cDNA cloning and sequence analysis of pluripotency genes in tree shrews (Tupaia belangeri)].

    PubMed

    Wang, Cai-Yun; Ma, Yun-Han; He, Da-Jian; Yang, Shi-Hua

    2013-04-01

    In this paper, partial sequences of the tree shrew (Tupaia belangeri) Klf4, Sox2, and c-Myc genes were cloned and sequenced; they were 382, 612, and 485 bp in length and encoded 127, 204, and 161 amino acids, respectively. Their cDNA sequence identities with the corresponding human sequences were 89%, 98%, and 89%, respectively. The phylogenetic trees of the three genes had different topologies, suggesting separate evolutionary pathways. These results can facilitate further functional studies.

  10. A 5.8S nuclear ribosomal RNA gene sequence database: applications to ecology and evolution

    NASA Technical Reports Server (NTRS)

    Cullings, K. W.; Vogler, D. R.

    1998-01-01

    We compiled a 5.8S nuclear ribosomal gene sequence database for animals, plants, and fungi using both newly generated and GenBank sequences. We demonstrate the utility of this database as an internal check to determine whether the target organism and not a contaminant has been sequenced, as a diagnostic tool for ecologists and evolutionary biologists to determine the placement of asexual fungi within larger taxonomic groups, and as a tool to help identify fungi that form ectomycorrhizae.

  11. Human DNA polymerase alpha gene: sequences controlling expression in cycling and serum-stimulated cells.

    PubMed Central

    Pearson, B E; Nasheuer, H P; Wang, T S

    1991-01-01

    We have investigated the DNA polymerase alpha promoter sequence requirements for the expression of a heterologous gene in actively cycling cells and following serum addition to serum-deprived cells. An 11.4-kb genomic clone that spans the 5' end of this gene and includes 1.62 kb of sequence upstream from the translation start site was isolated. The transcription start site was mapped at 46 +/- 1 nucleotides upstream from the translation start site. The upstream sequence is GC rich and lacks a TATA sequence but has a CCAAT sequence on the opposite strand. Analysis of a set of deletion constructs in transient transfection assays demonstrated that efficient expression of the reporter in cycling cells requires 248 bp of sequence upstream from the cap site. Clustered within these 248 nucleotides are sequences similar to consensus sequences for Sp1-, Ap1-, Ap2-, and E2F-binding sites. The CCAAT sequence and the potential E2F- and Ap1-binding sites are shown to be protected from DNase I digestion by partially purified nuclear proteins. The DNA polymerase alpha promoter can confer upon the reporter an appropriate, late response to serum addition. No single sequence element could be shown to confer serum inducibility. Rather, multiple sequence elements appear to mediate the full serum response. PMID:2005899

  12. Gene tree discordance of wild and cultivated Asian rice deciphered by genome-wide sequence comparison.

    PubMed

    Yang, Ching-chia; Sakai, Hiroaki; Numa, Hisataka; Itoh, Takeshi

    2011-05-15

    Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.

  13. Sequence divergence and chromosomal rearrangements during the evolution of human pseudoautosomal genes and their mouse homologs

    SciTech Connect

    Ellison, J.; Li, X.; Francke, U.

    1994-09-01

    The pseudoautosomal region (PAR) is an area of sequence identity between the X and Y chromosomes and is important for mediating X-Y pairing during male meiosis. Of the seven genes assigned to the human PAR, none of the mouse homologs have been isolated by a cross-hybridization strategy. Two of these homologs, Csfgmra and Il3ra, have been isolated using a functional assay for the gene products. These genes are quite different in sequence from their human homologs, showing only 60-70% sequence similarity. The Csfgmra gene has been found to further differ from its human homolog in being located not on the sex chromosomes, but on a mouse autosome (chromosome 19). Using a mouse-hamster somatic cell hybrid mapping panel, we have mapped the Il3ra gene to yet another mouse autosome, chromosome 14. Attempts to clone the mouse homolog of the ANT3 locus resulted in the isolation of two related genes, Ant1 and Ant2, but failed to yield the Ant3 gene. Southern blot analysis of the ANT/Ant genes showed the Ant1 and Ant2 sequences to be well-conserved among all of a dozen mammals tested. In contrast, the ANT3 gene only showed hybridization to non-rodent mammals, suggesting it is either greatly divergent or has been deleted in the rodent lineage. Similar experiments with other human pseudoautosomal probes likewise showed a lack of hybridization to rodent sequences. The results show a definite trend of extensive divergence of pseudoautosomal sequences in addition to chromosomal rearrangements involving X;autosome translocations and perhaps gene deletions. Such observations have interesting implications regarding the evolution of this important region of the sex chromosomes.

  14. Cloning, sequencing, and disruption of the Bacillus subtilis sigma 28 gene.

    PubMed Central

    Helmann, J D; Márquez, L M; Chamberlin, M J

    1988-01-01

    Bacillus subtilis contains multiple forms of RNA polymerase holoenzyme, distinguished by the presence of different specificity determinants known as sigma factors. The sigma 28 factor was initially purified as a unique transcriptional activity in vegetatively growing B. subtilis cells. Purification of the sigma 28 protein has allowed tryptic peptides to be prepared and sequenced. The sequence of one tryptic peptide fragment was used to prepare an oligonucleotide probe specific for the sigma 28 structural gene, and the gene was isolated from a B. subtilis subgenomic library. The complete nucleotide sequence of the sigma 28 gene was determined, and the cloned sigma 28 gene was used to construct a mutant strain which does not express the sigma 28 protein. This strain also failed to synthesize flagellin protein and grew as long filaments. The predicted sigma 28 gene product is a 254-amino-acid polypeptide with a calculated molecular weight of 29,500. The sigma 28 protein sequence was similar to that of other sequenced sigma factors and to the flbB gene product of Escherichia coli. Since the flbB gene product is a positive regulator of flagellar synthesis in E. coli, it is likely that sigma 28 functions to regulate flagellar synthesis in B. subtilis. PMID:2832368

  15. [Recent progress in gene mapping through high-throughput sequencing technology and forward genetic approaches].

    PubMed

    Lu, Cairui; Zou, Changsong; Song, Guoli

    2015-08-01

    Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selectively pooled mutant DNA, genetic map construction through sequencing of individuals in a population, and sequencing of the transcriptome and of partial genomes. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that mapping by sequencing can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.

  16. Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

    SciTech Connect

    Fields, C.A.

    1996-06-01

    The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler) has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0 has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.

  17. The mouse collagen X gene: complete nucleotide sequence, exon structure and expression pattern.

    PubMed Central

    Elima, K; Eerola, I; Rosati, R; Metsäranta, M; Garofalo, S; Perälä, M; De Crombrugghe, B; Vuorio, E

    1993-01-01

    Overlapping genomic clones covering the 7.2 kb mouse alpha 1(X) collagen gene, 0.86 kb of promoter and 1.25 kb of 3'-flanking sequences were isolated from two genomic libraries and characterized by nucleotide sequencing. Typical features of the gene include a unique three-exon structure, similar to that in the chick gene, with the entire triple-helical domain of 463 amino acids coded by a single large exon. The highest degree of amino acid and nucleotide sequence conservation was seen in the coding region for the collagenous and C-terminal non-collagenous domains between the mouse and known chick, bovine and human collagen type X sequences. More divergence between the sequences occurred in the N-terminal non-collagenous domain. Similarity between the mammalian collagen X sequences extended into the 3'-untranslated sequence, particularly near the polyadenylation site. The promoter of the mouse collagen X gene was found to contain two TATAA boxes 159 bp apart; primer extension analyses of the transcription start site revealed that both were functional. The promoter has an unusual structure with a very low G + C content of 28% between positions -220 and -1 of the upstream transcription start site. Northern and in situ hybridization analyses confirmed that the expression of the alpha 1(X) collagen gene is restricted to hypertrophic chondrocytes in tissues undergoing endochondral calcification. The detailed sequence information of the gene is useful for studies on the promoter activity of the gene and for generation of transgenic mice. PMID:8424763

  18. RNA Sequencing Revealed Numerous Polyketide Synthase Genes in the Harmful Dinoflagellate Karenia mikimotoi

    PubMed Central

    Kimura, Kei; Okuda, Shujiro; Nakayama, Kei; Shikata, Tomoyuki; Takahashi, Fumio; Yamaguchi, Haruo; Skamoto, Setsuko; Yamaguchi, Mineo; Tomaru, Yuji

    2015-01-01

    The dinoflagellate Karenia mikimotoi forms blooms in the coastal waters of temperate regions and occasionally causes massive fish and invertebrate mortality. This study aimed to elucidate the toxic effect of K. mikimotoi on marine organisms by using the genomics approach; RNA-sequence libraries were constructed, and data were analyzed to identify toxin-related genes. Next-generation sequencing produced 153,406 transcript contigs from the axenic culture of K. mikimotoi. BLASTX analysis against all assembled contigs revealed that 208 contigs were polyketide synthase (PKS) sequences. Thus, K. mikimotoi was thought to have several genes encoding PKS metabolites and to likely produce toxin-like polyketide molecules. Of all the sequences, approximately 30 encoded eight PKS genes, which were remarkably similar to those of Karenia brevis. Our phylogenetic analyses showed that these genes belonged to a new group of PKS type-I genes. Phylogenetic and active domain analyses showed that the amino acid sequence of four among eight Karenia PKS genes was not similar to any of the reported PKS genes. These PKS genes might possibly be associated with the synthesis of polyketide toxins produced by Karenia species. Further, a homology search revealed 10 contigs that were similar to a toxin gene responsible for the synthesis of saxitoxin (sxtA) in the toxic dinoflagellate Alexandrium fundyense. These contigs encoded A1–A3 domains of sxtA genes. Thus, this study identified some transcripts in K. mikimotoi that might be associated with several putative toxin-related genes. The findings of this study might help understand the mechanism of toxicity of K. mikimotoi and other dinoflagellates. PMID:26561394

  19. Genomic structure and nucleotide sequence of the p55 gene of the puffer fish Fugu rubripes

    SciTech Connect

    Elgar, G.; Rattray, F.; Greystrong, J.; Brenner, S.

    1995-06-10

    The p55 gene, which codes for a 55-kDa erythrocyte membrane protein, has been cloned and sequenced from the genome of the Japanese puffer fish Fugu rubripes (Fugu). This organism has the smallest recorded vertebrate genome and therefore provides an efficient way to sequence genes at the genomic level. The gene encoding p55 covers 5.5 kb from the beginning to the end of the coding sequence, four to six times smaller than the estimated size of the human gene, and is encoded by 12 exons. The structure of this gene has not been previously elucidated, but from this and other data we would predict a similar or identical structure in mammals. The predicted amino acid sequence of this gene in Fugu, coding for a polypeptide of 467 amino acids, is very similar to that of the human gene with the exception of the first two exons, which differ considerably. The predicted Fugu protein has a molecular weight (52.6 kDa compared with 52.3 kDa) and an isoelectric point very similar to those of human p55. In human, the p55 gene lies in the gene-dense Xq28 region, just 30 kb 3' to the Factor VIII gene, and is estimated to cover 20-30 kb. Its 5' end is associated with a CpG island, although there is no evidence that this is the case in Fugu. The small size of genes in Fugu and the high coding homology that they share with their mammalian equivalents, both in structure and sequence, make this compact vertebrate genome an ideal model for genomic studies. 23 refs., 3 figs.

  20. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  1. Complete Genome Sequence of an Acinetobacter Strain Harboring the NDM-1 Gene.

    PubMed

    Sun, Yang; Song, Yang; Song, Hongbin; Liu, Jun; Wang, Pengzhi; Qiu, Shaofu; Chen, Shuo; Zhu, Lingwei; Ji, Xue; Wang, Zhongqiang; Liu, Nan; Xia, Liliang; Chen, Weijun; Feng, Shuzhang

    2013-04-18

    The NDM-1 gene is a significant public health concern. Acinetobacter is one of the most prevalent opportunistic pathogens causing recent nosocomial infections with NDM-1, and drug-resistant strains pose serious threats to public health worldwide. Herein, we present the genomic sequence of Acinetobacter calcoaceticus subsp. anitratus XM1570, a multidrug-resistant isolate that carries the blaNDM-1 gene.

  2. [Exome sequencing: an efficient strategy for identifying the causative genes of monogenic disorders].

    PubMed

    Rebiya, Nuli; Patamu, Mohemaiti

    2011-10-01

    The development of new-generation sequencing technologies has brought new opportunities for the study of diseases. Exome sequencing has been shown to be an effective, rapid, high-performance technique that has already been used in research on inherited diseases such as monogenic disorders. It has been accepted by scientists in the field of monogenic disorder research and is expected to become widely used. This approach will accelerate discovery of the causative genes of Mendelian disorders. This article reviews some recent applications of exome sequencing in the study of gene-related diseases.

  3. Codon usage in mammalian genes is biased by sequence slippage mechanisms.

    PubMed

    Bains, W

    1993-01-01

    The codons for some conserved amino acids are found to be the same between homologous genes from different species when the statistics of codon usage would suggest that they should be different. I examine whether this 'coincidence' of codon usage could be due to genetic mechanisms homogenising the DNA around specific sites. This paper describes the further analysis of the coincident codons in 19 genes (a total of 96 homologues) for slippage. Coincident codons arise in contexts of increased sequence simplicity, and have a high chance of occurring within sequences similar to the recombination-prone minisatellite 'core' sequence. This suggests a role of genetic homogenisation in their generation.

  4. Nucleotide sequences of the coat protein genes of two Japanese zucchini yellow mosaic virus isolates.

    PubMed

    Kundu, A K; Ohshima, K; Sako, N

    1997-10-01

    The nucleotide (nt) sequences of the coat protein (CP) genes of two Japanese zucchini yellow mosaic virus (ZYMV) isolates (ZYMV-169 and ZYMV-M) were determined. The CP genes of both isolates were 837 nt long and encoded 279 amino acids (aa). The nt and deduced aa sequence similarities between the two isolates were 92% and 94.6%, respectively. The deduced aa sequences of CPs of the Japanese isolates were compared with those of previously reported ZYMV isolates by phylogenetic analysis. This comparison led us to divide all ZYMV isolates into 3 groups in which ZYMV-169 formed its own distinct group.

  5. Tobacco nuclear gene for the 31 kd chloroplast ribonucleoprotein: genomic organization, sequence analysis and expression.

    PubMed Central

    Li, Y Q; Ye, L Z; Sugita, M; Sugiura, M

    1991-01-01

    We have previously identified three chloroplast ribonucleoproteins and characterized their cDNAs. Here we present the genomic organization, sequence and expression of one of their genes. The 31 kd ribonucleoprotein (cp31) from tobacco (Nicotiana sylvestris) chloroplasts is coded for by a single-copy nuclear gene. This gene was isolated and its sequence was determined. The gene contains four exons and three introns. The position of its first intron is conserved among the genes for the maize abscisic acid-induced glycine-rich protein, the human hnRNP A1 protein and cp31. The transcription start site was determined to be 168 bp upstream from the translational initiation codon in both leaf and root tissues. No alternatively spliced transcripts were detected, suggesting that the diversity of chloroplast ribonucleoproteins is probably generated by gene amplification rather than alternative splicing. PMID:2057356

  6. [Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence]

    NASA Technical Reports Server (NTRS)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran polymerase chain reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene sequence can be used to identify different bacterial species. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  7. Sequence and diversity of rabbit T-cell receptor gamma chain genes

    SciTech Connect

    Isono, T.; Kim, C.J.; Seto, A.

    1995-03-01

    The nucleotide sequences of one constant (C), six variable (V), and two joining (J) gene segments coding for the rabbit T-cell receptor gamma chain (Tcrg) were determined by directly sequencing fragments amplified by the cassette-ligation mediated polymerase chain reaction. The Tcrg-C gene segment did not encode a cysteine residue for connection to the Tcr delta chain in the connecting region, and two variant forms of the Tcrg-C gene segment were generated by alternative splicing, like the human Tcrg-C2 gene. Five of six rabbit Tcrg-V gene segments belonged to the same family and displayed similarity to five productive human Tcrg-V1 family genes as well as the mouse Tcrg-V5 gene. The remaining rabbit Tcrg-V gene segment displayed similarity to the human Tcrg-V3 gene. Both rabbit Tcrg-J gene segments displayed similarity to the human Tcrg-J2.1 and 2.3, respectively. These findings suggested that the genomic organization of rabbit Tcrg genes is more similar to that of human than of mouse Tcrg genes. 18 refs., 4 figs., 1 tab.

  8. Inverted duplication of histone genes in chicken and disposition of regulatory sequences.

    PubMed Central

    Wang, S W; Robins, A J; d'Andrea, R; Wells, J R

    1985-01-01

    Sequence analysis of an 8.4 kb fragment containing five chicken histone genes shows that an H4-H2A gene pair is duplicated and inverted around a central H3 gene. A left and right region, each of 2.1 kb are 97% homologous and the boundaries of homology coincide with ten base pair repeats. These boundary regions also contain highly conserved gene promoter elements, suggesting that interaction of transcriptional machinery with histone genes may be connected with recombination in promoter regions, resulting in the inverted duplication structure seen in this cluster. PMID:4000938

  9. Mosaic gene conversion after a tandem duplication of mtDNA sequence in Diomedeidae (albatrosses).

    PubMed

    Eda, Masaki; Kuro-o, Masaki; Higuchi, Hiroyoshi; Hasegawa, Hiroshi; Koike, Hiroko

    2010-04-01

    Although the tandem duplication of mitochondrial (mt) sequences, especially those of the control region (CR), has been detected in metazoan species, few studies have focused on the features of the duplicated sequence itself, such as the gene conversion rate, distribution patterns of the variation, and relative rates of evolution between the copies. To investigate the features of duplicated mt sequences, we partially sequenced the mt genome of 16 Phoebastria albatrosses belonging to three species (P. albatrus, P. nigripes, and P. immutabilis). More than 2,300 base pairs of tandemly duplicated sequence were shared by all three species. The observed gene arrangement was shared in the three Phoebastria albatrosses and suggests that the duplication event occurred in the common ancestor of the three species. Most of the copies in each individual were identical or nearly identical, and were maintained through frequent gene conversions. By contrast, portions of CR domains I and III had different phylogenetic signals, suggesting that gene conversion had not occurred in those sections after the speciation of the three species. Several lines of data, including the heterogeneity of the rate of molecular evolution, nucleotide differences, and putative secondary structures, suggest that the two sequences in CR domain I are maintained through selection; however, additional studies into the mechanisms of gene conversion and mtDNA synthesis are required to confirm this hypothesis.

  10. Metazoan mitochondrial gene sequence reference datasets for taxonomic assignment of environmental samples

    PubMed Central

    Machida, Ryuji J.; Leray, Matthieu; Ho, Shian-Lei; Knowlton, Nancy

    2017-01-01

    Mitochondrial-encoded genes are increasingly targeted in studies using high-throughput sequencing approaches for characterizing metazoan communities from environmental samples (e.g., plankton, meiofauna, filtered water). Yet, unlike nuclear ribosomal RNA markers, there is to date no high-quality reference dataset available for taxonomic assignments. Here, we retrieved all metazoan mitochondrial gene sequences from GenBank, and then quality filtered and formatted the datasets for taxonomic assignments using taxonomic assignment tools. The reference datasets—‘Midori references’—are available for download at www.reference-midori.info. Two versions are provided: (I) Midori-UNIQUE that contains all unique haplotypes associated with each species and (II) Midori-LONGEST that contains a single sequence, the longest, for each species. Overall, the mitochondrial Cytochrome oxidase subunit I gene was the most sequence-rich gene. However, sequences of the mitochondrial large ribosomal subunit RNA and Cytochrome b apoenzyme genes were observed for a large number of species in some phyla. The Midori reference is compatible with some taxonomic assignment software. Therefore, automated high-throughput sequence taxonomic assignments can be particularly effective using these datasets. PMID:28291235
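    The difference between the two versions described above can be illustrated with a minimal Python sketch. The (species, sequence) record layout and the function names are hypothetical, chosen only for illustration; this is not the authors' pipeline or file format.

```python
from collections import defaultdict


def midori_unique(records):
    """UNIQUE-style reduction: keep every distinct haplotype per species."""
    seen = defaultdict(set)
    kept = []
    for species, seq in records:
        if seq not in seen[species]:
            seen[species].add(seq)
            kept.append((species, seq))
    return kept


def midori_longest(records):
    """LONGEST-style reduction: keep one sequence, the longest, per species."""
    best = {}
    for species, seq in records:
        if species not in best or len(seq) > len(best[species]):
            best[species] = seq
    return sorted(best.items())


# Hypothetical toy records (species label, nucleotide sequence).
records = [
    ("Species a", "ACGT"),
    ("Species a", "ACGT"),    # exact duplicate haplotype
    ("Species a", "ACGTTG"),  # second, longer haplotype
    ("Species b", "GGA"),
]

unique_set = midori_unique(records)
longest_set = midori_longest(records)
```

    In this toy input the UNIQUE-style set retains both haplotypes of "Species a", whereas the LONGEST-style set keeps only the six-base sequence.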

  11. Mosquito vitellogenin genes: Comparative sequence analysis, gene duplication, and the role of rare synonymous codon usage in regulating expression.

    PubMed

    Isoe, Jun; Hagedorn, Henry H

    2007-01-01

    Comparative sequence analysis of mosquito vitellogenin (Vg) genes was carried out to gain a better understanding of their evolution. The genomic clones of vitellogenin genes were isolated and sequenced from all three subfamilies of the family Culicidae including Culicinae (Aedes aegypti, Ochlerotatus atropalpus, Ae. polynesiensis, Ae. albopictus, Ochlerotatus triseriatus and Culex quinquefasciatus), Toxorhynchitinae (Toxorhynchites amboinensis), and Anophelinae (Anopheles albimanus). Genomic clones of vitellogenin genes Vg-B and Vg-C were isolated from Ae. aegypti and sequenced. A comparison of Vg-B and Vg-C, with the previously characterized vitellogenin gene, Vg-A1, suggests that Vg-A1 and Vg-B probably arose by a recent gene duplication, and Vg-C apparently diverged from the two other members of the gene family in an earlier gene duplication event. Two vitellogenin genes orthologous to Vg-C were cloned from a Cx. quinquefasciatus DNA library, one of which is truncated at the N-terminal end. Single vitellogenin genes, orthologous to Vg-C, were cloned from the An. albimanus and Tx. amboinensis libraries. Incomplete sequences orthologous to Vg-B and Vg-C were isolated from the Oc. atropalpus library. Only partial sequences were isolated from Ae. polynesiensis, Ae. albopictus and Oc. triseriatus. Inferred phylogenetic relationships based on analysis of these sequences suggest that Vg-C was the ancestral gene and that a recent gene duplication gave rise to Vg-A1 and Vg-B after the separation of the genus Aedes. The deduced amino acid composition of mosquito vitellogenin proteins exhibits higher tyrosine and phenylalanine composition than other mosquito proteins except for the hexamerin storage proteins. Analysis of vitellogenin coding sequences showed that a majority of amino acid substitutions were due to conserved and moderately conserved changes, suggesting that the vitellogenins are under moderately selective constraints to maintain tertiary structure. The

  12. The construction of Arabidopsis expressed sequence tag assemblies. A new resource to facilitate gene identification.

    PubMed Central

    Rounsley, S D; Glodek, A; Sutton, G; Adams, M D; Somerville, C R; Venter, J C; Kerlavage, A R

    1996-01-01

    The generation of large numbers of partial cDNA sequences, or expressed sequence tags (ESTs), has provided a method with which to sample a large number of genes from an organism. More than 25,000 Arabidopsis thaliana ESTs have been deposited in public databases, producing the largest collection of ESTs for any plant species. We describe here the application of a method of reducing redundancy and increasing information content in this collection by grouping overlapping ESTs representing the same gene into a "contig" or assembly. The increased information content of these assemblies allows more putative identifications to be assigned based on the results of similarity searches with nucleotide and protein databases. The results of this analysis indicate that sequence information is available for approximately 12,600 nonoverlapping ESTs from Arabidopsis. Comparison of the assemblies with 953 Arabidopsis coding sequences indicates that up to 57% of all Arabidopsis genes are represented by an EST. Clustering analysis of these sequences suggests that between 300 and 700 gene families are represented by between 700 and 2000 sequences in the EST database. A database of the assembled sequences, their putative identifications, and cellular roles is available through the World Wide Web. PMID:8938416

  13. Molecular Phylogenetics of the Genus Trichosporon Inferred from Mitochondrial Cytochrome b Gene Sequences

    PubMed Central

    Biswas, Swarajit Kumar; Wang, Li; Yokoyama, Koji; Nishimura, Kazuko

    2005-01-01

    Mitochondrial cytochrome b (cyt b) genes of 42 strains representing 23 species of the genus Trichosporon were partially sequenced to determine their molecular phylogenetic relationships. Almost half of the 42 strains investigated (22 strains from 11 different species) contained introns in their sequences. Analysis of a 396-bp coding sequence from each strain of Trichosporon under investigation showed a total of 141 (35.6%) variable nucleotide sites. A phylogenetic tree based on the cyt b gene sequences revealed that all species of Trichosporon except Trichosporon domesticum and Trichosporon montevideense had species-specific cyt b genes. Trichosporon sp. strain CBS 5581 was identified as Trichosporon pullulans, and one clinical isolate, IFM 48794, was identified as Trichosporon faecale. Analysis of the 132-residue deduced amino acid sequences showed a total of 34 (25.75%) variable amino acid sites. T. domesticum and T. montevideense, Trichosporon asahii and Trichosporon asteroides, and Trichosporon gracile and Trichosporon guehoae had identical amino acid sequences. A phylogenetic tree constructed with the ascomycetes Saccharomyces douglasii and Candida glabrata taken as outgroup species and including representative species from closely related genera showed that species of Trichosporon clustered with other basidiomycetous yeasts that contain xylose in their cell wall compositions. These results indicate the effectiveness of mitochondrial cyt b gene sequences for both species identification and the phylogenetic analysis of Trichosporon species. PMID:16207980
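    The percentages quoted above follow directly from the reported counts. The short Python check below is only an arithmetic illustration; the helper name is hypothetical, and the only inputs are the figures given in the abstract.

```python
def percent_variable(variable_sites, total_sites):
    """Return the share of variable sites as a percentage."""
    return 100.0 * variable_sites / total_sites


# A 396-bp coding sequence corresponds to 396 / 3 codons.
codons = 396 // 3

nucleotide_pct = percent_variable(141, 396)    # variable nucleotide sites
amino_acid_pct = percent_variable(34, codons)  # variable amino acid sites

print(f"{codons} codons")
print(f"nucleotide sites: {nucleotide_pct:.1f}%")
print(f"amino acid sites: {amino_acid_pct:.2f}%")
```

    The amino acid figure evaluates to 25.757...%, which the abstract gives as 25.75%.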

  14. Promoter sequences required for transcription of Xenopus laevis histone genes in injected frog oocyte nuclei.

    PubMed Central

    Heindl, L M; Weil, T S; Perry, M

    1988-01-01

    Amphibian oogenesis is accompanied by the accumulation of histone mRNA and proteins in the absence of ongoing DNA replication. To begin an analysis of the mechanisms by which histone gene expression is regulated during frog oogenesis and embryogenesis, we used oocyte injection to examine the upstream sequences required for transcription of genes encoding each of the five histone classes. We found that sequences necessary for maximal levels of transcription are located 100 to 200 base pairs upstream of the corresponding start sites. In this region, each promoter examined contains conserved sequence elements, several of which seem to be histone gene class specific, in addition to other, more common sequence elements believed to be used by general transcription factors. PMID:3221862

  15. Control sites in the sequence at the beginning of T7 gene 1.

    PubMed Central

    McConnell, D J

    1979-01-01

    The DNA sequence of the fragment Hind.30, 378 bases long, from the beginning of gene 1 of T7 is presented. It contains the C promoter, two in vitro transcriptional terminator sites and a sequence of 171 bases which probably codes for the N terminus of the T7 RNA polymerase. The sequence also codes for the RNase III cleavage site before gene 1. This site overlaps with the transcriptional terminators. The RNA transcript of the sequence about the terminators can be arranged in a set of alternative double-stranded hairpin structures. It is suggested that conversion between these structures may have a role in termination; this may be influenced by interactions with ribosomes and RNase III. The region of the C promoter between genes 0.7 and 1 thus contains several sites which may be involved in the control of transcription and translation. PMID:493111

  16. Synthetic neomycin-kanamycin phosphotransferase, type II coding sequence for gene targeting in mammalian cells.

    PubMed

    Jin, Seung-Gi; Mann, Jeffrey R

    2005-07-01

    The bacterial neomycin-kanamycin phosphotransferase, type II enzyme is encoded by the neo gene and confers resistance to aminoglycoside drugs such as neomycin and kanamycin (bacterial selection) and G418 (eukaryotic cell selection). Although widely used in gene targeting in mouse embryonic stem cells, the neo coding sequence contains numerous cryptic splice sites and has a high CpG content. At least the former can cause unwanted effects in cis at the targeted locus. We describe a synthetic sequence, sneo, which encodes the same protein as that encoded by neo. This synthetic sequence has no predicted splice sites in either strand, low CpG content, and increased mammalian codon usage. In mouse embryonic stem cells, the expressibility of sneo is similar to that of neo. The use of sneo in gene targeting experiments should substantially reduce the probability of unwanted effects in cis due to splicing, and perhaps CpG methylation, within the coding sequence of the selectable marker.

  17. Molecular cloning and sequences of lignin peroxidase genes of Phanerochaete chrysosporium.

    PubMed Central

    Schalch, H; Gaskell, J; Smith, T L; Cullen, D

    1989-01-01

    The genomic clones encoding lignin peroxidase isozyme H8 and two closely related genes were isolated from Phanerochaete chrysosporium BKM-1767, and their nucleotide sequences were determined. The positions and approximate lengths of introns were found to be highly conserved in all three clones. Analysis of homokaryotic derivatives indicated that the three clones are not alleles of the same gene(s). PMID:2761543

  18. [Tryptophan 7-halogenase from Pseudomonas aureofaciens ACN strain: gene cloning and sequencing and the enzyme expression].

    PubMed

    Burd', V N; van Pee, K H

    2004-01-01

    The tryptophan 7-halogenase gene was isolated from the Pseudomonas aureofaciens ACN strain, which produces pyrrolnitrin, a chlorine-containing antibiotic, and was sequenced. A high degree of homology (over 95%) was established for the genes and the corresponding halogenases from P. aureofaciens ACN and P. fluorescens BL915. The tryptophan 7-halogenase gene was amplified by PCR, and the corresponding enzyme was expressed in Escherichia coli cells using the pBSII SK+ vector.

  19. Changes in repeat number, sequence, and reading frame in S-antigen genes of Plasmodium falciparum.

    PubMed Central

    Saint, R B; Coppel, R L; Cowman, A F; Brown, G V; Shi, P T; Barzaga, N; Kemp, D J; Anders, R F

    1987-01-01

    The S antigens from different isolates of Plasmodium falciparum exhibit extensive size, charge, and serological diversity. We show here that the S-antigen genes behave as multiple alleles of a single locus. The size heterogeneity results from different numbers, lengths, and/or sequences of tandem repeat units encoded within the S-antigen genes. Two genes studied here encode antigenically different S antigens but nevertheless have closely related tandem repeat sequences. We show that antigenic differences can arise because repeats are translated in different reading frames. PMID:3313007

  20. Molecular cloning, expression, and sequence of the pilin gene from nontypeable Haemophilus influenzae M37.

    PubMed Central

    Coleman, T; Grass, S; Munson, R

    1991-01-01

    Nontypeable Haemophilus influenzae M37 adheres to human buccal epithelial cells and exhibits mannose-resistant hemagglutination of human erythrocytes. An isogenic variant of this strain which was deficient in hemagglutination was isolated. A protein with an apparent molecular weight of 22,000 was present in the sodium dodecyl sulfate-polyacrylamide gel profile of sarcosyl-insoluble proteins from the hemagglutination-proficient strain but was absent from the profile of the isogenic hemagglutination-deficient variant. A monoclonal antibody which reacts with the hemagglutination-proficient isolate but not with the hemagglutination-deficient isolate has been characterized. This monoclonal antibody was employed in an affinity column for purification of the protein as well as to screen a genomic library for recombinant clones expressing the gene. Several clones which contained overlapping genomic fragments were identified by reaction with the monoclonal antibody. The gene for the 22-kDa protein was subcloned and sequenced. The gene for the type b pilin from H. influenzae type b strain MinnA was also cloned and sequenced. The DNA sequence of the strain MinnA gene was identical to that reported previously for two other type b strains. The DNA sequence of the strain M37 gene is 77% identical to that of the type b pilin gene, and the derived amino acid sequence is 68% identical to that of the type b pilin. PMID:1673447

  1. High sequence turnover in the regulatory regions of the developmental gene hunchback in insects.

    PubMed

    Hancock, J M; Shaw, P J; Bonneton, F; Dover, G A

    1999-02-01

    Extensive sequence analysis of the developmental gene hunchback and its 5' and 3' regulatory regions in Drosophila melanogaster, Drosophila virilis, Musca domestica, and Tribolium castaneum, using a variety of computer algorithms, reveals regions of high sequence simplicity probably generated by slippage-like mechanisms of turnover. No regions are entirely refractory to the action of slippage, although the density and composition of simple sequence motifs vary from region to region. Interestingly, the 5' and 3' flanking regions share short repetitive motifs despite their separation by the gene itself, and the motifs are different in composition from those in the exons and introns. Furthermore, there are high levels of conservation of motifs in equivalent orthologous regions. Detailed sequence analysis of the P2 promoter and DNA footprinting assays reveal that the number, orientation, sequence, spacing, and protein-binding affinities of the BICOID-binding sites vary between species and that the 'P2' promoter, the nanos response element in the 3' untranslated region, and several conserved boxes of sequence in the gene (e.g., the two zinc-finger regions) are surrounded by cryptically simple sequence DNA. We argue that high sequence turnover and genetic redundancy permit both the general maintenance of promoter functions through the establishment of coevolutionary (compensatory) changes in cis- and trans-acting genetic elements and, at the same time, the possibility of subtle changes in the regulation of hunchback in the different species.

  2. Genomic organization and 5'-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5'-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over approximately 25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5'-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  3. Cloning and characterization of a Leishmania gene encoding a RNA spliced leader sequence.

    PubMed Central

    Miller, S I; Landfear, S M; Wirth, D F

    1986-01-01

    Recent studies on Leishmania enriettii tubulin mRNAs revealed a 35-nucleotide addition to their 5' end. The gene that codes for this 35-nucleotide leader sequence has now been cloned and sequenced. In the Leishmania genome, the spliced leader gene exists as a tandem repeat of 438 bases. There are approximately 150 copies of this gene comprising 0.1% of the parasite genome. This gene codes for an 85-nucleotide transcript that contains the spliced leader at its 5' end. The 35-nucleotide sequence and the regions immediately 5' and 3' to it are highly conserved across trypanosomatids. We have detected an RNA molecule that is a putative by-product of the processing reaction in which the 35-nucleotide spliced leader has been transferred to mRNA. We suggest that this molecule is the remnant of the spliced leader transcript after removal of the 35-nucleotide spliced leader. PMID:2429261

  4. Molecular phylogeny, population genetics, and evolution of heterocystous cyanobacteria using nifH gene sequences.

    PubMed

    Singh, Prashant; Singh, Satya Shila; Elster, Josef; Mishra, Arun Kumar

    2013-06-01

    In order to assess the phylogeny, population genetics, and an approximation of the future course of cyanobacterial evolution based on nifH gene sequences, 41 heterocystous cyanobacterial strains collected from all over India were used in the present study. NifH gene sequence analysis data confirm that the heterocystous cyanobacteria are monophyletic, while the Stigonematales show a polyphyletic origin with extensive intermixing. Further, analysis of nifH gene sequence data using intricate mathematical extrapolations revealed that the nucleotide diversity and recombination frequency are much greater in the Nostocales than in the Stigonematales. Similarly, DNA divergence studies showed significant values of divergence, with greater gene conversion tracts in the unbranched (Nostocales) than in the branched (Stigonematales) strains. Our data strongly support the origin of true branching cyanobacterial strains from the unbranched strains.

  5. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues.

    PubMed

    Lee, Je Hyuk; Daugharthy, Evan R; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C; Terry, Richard; Turczyk, Brian M; Yang, Joyce L; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M

    2015-03-01

    RNA-sequencing (RNA-seq) measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. In contrast, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq, our method enriches for context-specific transcripts over housekeeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d.

  6. Identification and sequence analyses of the granulin gene of Choristoneura fumiferana granulovirus.

    PubMed

    Bah, A; Bergeron, J; Arella, M; Lucarotti, C J; Guertin, C

    1997-01-01

    The nucleotide sequence of the granulin gene of Choristoneura fumiferana granulovirus (CfGV) was determined. The gene encodes a protein of 248 amino acids with a predicted Mr of 29.299 kDa. The granulin genes of Trichoplusia ni, Pieris brassicae and Cryptophlebia leucotreta granuloviruses showed homologies ranging from 76.7 to 80.5% for nucleotide sequences and from 84.2 to 88.3% for amino acid sequences when compared to CfGV. The secondary structure of CfGV granulin protein, including the hydrophilic (polar) and hydrophobic (basic) regions, was predicted and found to be similar to that of other granulins. A very late baculovirus promoter motif, ATAAG, was found within the putative promoter region of the CfGV granulin gene.

  7. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

    PubMed Central

    Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

    2014-01-01

    RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq, our method enriches for context-specific transcripts over housekeeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209

  8. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing.

    PubMed

    Weirather, Jason L; Afshar, Pegah Tootoonchi; Clark, Tyson A; Tseng, Elizabeth; Powers, Linda S; Underwood, Jason G; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

    2015-10-15

    We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from MCF-7 breast cancer cells. Compared with existing tools, IDP-fusion detects fusion genes with higher precision and a very low false-positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.

  9. Sequence-Modified Antibiotic Resistance Genes Provide Sustained Plasmid-Mediated Transgene Expression in Mammals.

    PubMed

    Lu, Jiamiao; Zhang, Feijie; Fire, Andrew Z; Kay, Mark A

    2017-03-30

    Conventional plasmid vectors are incapable of achieving sustained levels of transgene expression in vivo even in quiescent mammalian tissues because the transgene expression cassette is silenced. Transcriptional silencing results from the presence of the bacterial plasmid backbone or virtually any DNA sequence of >1 kb in length placed outside of the expression cassette. Here, we show that transcriptional silencing can be substantially forestalled by increasing the An/Tn sequence composition in the plasmid bacterial backbone. Increasing numbers of An/Tn sequences increased sustained transcription of both backbone sequences and adjacent expression cassettes. In order to recapitulate these expression profiles in compact and portable plasmid DNA backbones, we engineered the standard kanamycin or ampicillin antibiotic resistance genes, optimizing the number of An/Tn sequences without altering the encoded amino acids. The resulting vector backbones yield sustained transgene expression from mouse liver, providing generic DNA vectors capable of sustained transgene expression without additional genes or mammalian regulatory elements.

  10. Research Techniques Made Simple: Bacterial 16S Ribosomal RNA Gene Sequencing in Cutaneous Research.

    PubMed

    Jo, Jay-Hyun; Kennedy, Elizabeth A; Kong, Heidi H

    2016-03-01

    Skin serves as a protective barrier and also harbors numerous microorganisms collectively comprising the skin microbiome. As a result of recent advances in sequencing (next-generation sequencing), our understanding of microbial communities on skin has advanced substantially. In particular, the 16S ribosomal RNA gene sequencing technique has played an important role in efforts to identify the global communities of bacteria in healthy individuals and patients with various disorders in multiple topographical regions over the skin surface. Here, we describe basic principles, study design, and a workflow of 16S ribosomal RNA gene sequencing methodology, primarily for investigators who are not familiar with this approach. This article will also discuss some applications and challenges of 16S ribosomal RNA sequencing as well as directions for future development.

  11. Integrated next-generation sequencing analysis of whole exome and 409 cancer-related genes.

    PubMed

    Shimoda, Yuji; Nagashima, Takeshi; Urakami, Kenichi; Tanabe, Tomoe; Saito, Junko; Naruoka, Akane; Serizawa, Masakuni; Mochizuki, Tohru; Ohshima, Keiichi; Ohnami, Sumiko; Ohnami, Shumpei; Kusuhara, Masatoshi; Yamaguchi, Ken

    2016-01-01

    The use of next-generation sequencing (NGS) techniques to analyze the genomes of cancer cells has identified numerous genomic alterations, including single-base substitutions, small insertions and deletions, amplification, recombination, and epigenetic modifications. NGS contributes to the clinical management of patients as well as new discoveries that identify the mechanisms of tumorigenesis. Moreover, analysis of gene panels targeting actionable mutations enhances efforts to optimize the selection of chemotherapeutic regimens. However, whole genome sequencing takes several days and costs at least $10,000, depending on sequence coverage. Therefore, laboratories with relatively limited resources must employ a more economical approach. For this purpose, we conducted an integrated nucleotide sequence analysis of a panel of 409 cancer-related genes (409-CRG) combined with whole exome sequencing (WES). Analysis of the 409-CRG panel detected low-frequency variants with high sensitivity, and WES identified moderate- and high-frequency somatic variants as well as germline variants.

  12. Gene Slider: sequence logo interactive data-visualization for education and research.

    PubMed

    Waese, Jamie; Pasha, Asher; Wang, Ting Ting; van Weringh, Anna; Guttman, David S; Provart, Nicholas J

    2016-12-01

    Gene Slider helps visualize the conservation and entropy of orthologous DNA and protein sequences by presenting them as one long sequence logo that can be zoomed in and out of, from an overview of the entire sequence down to just a few residues at a time. A search function enables users to find motifs such as cis-elements in promoter regions by simply 'drawing' a sequence logo representation of the desired motif as a query. In addition to displaying user-supplied FASTA files, our demonstration version of Gene Slider loads and displays a rich database of 90 000+ conserved non-coding regions across the Brassicaceae indexed to the TAIR10 Col-0 Arabidopsis thaliana sequence. It also displays transcription factor binding sites, enabling easy identification of regions that are both conserved across multiple species and may contain transcription factor binding sites.

  13. The nucleotide sequence of the sheep MHC class II DNA gene

    SciTech Connect

    Wright, H.; Redmond, J.; Ballingall, K.T.; Wright, F.

    1995-01-11

    The human MHC class II DNA gene was identified and sequenced by Trowsdale and Kelly. When a molecular map of the HLA-D region became available, it was shown that the HLA-DNA gene was unusual in not having a B gene partner situated within a few kilobases (kb), the nearest B gene being HLA-DPB1. The nearest unpaired B gene is HLA-DOB, which is approximately 160 kb telomeric of HLA-DNA. More recently, the mouse MHC class II genes H-2OA and H-2OB were shown to be equivalent to the HLA-DNA and HLA-DOB genes. Moreover, the mouse genes expressed an MHC class II protein whose tissue distribution was restricted to B cells and epithelial cells of the thymic medulla. No corresponding HLA-DN protein has been reported. 21 refs., 3 figs.

  14. Strategy for microbiome analysis using 16S rRNA gene sequence analysis on the Illumina sequencing platform.

    PubMed

    Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko

    2011-06-01

    Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read length of the IGA is only 100 bases, whereas at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained with all proposed forward primers. Various bioinformatic optimizations of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample.

  15. Characterization of six type A strains of Clostridium botulinum that contain type B toxin gene sequences.

    PubMed

    Kirma, Nameer; Ferreira, Joseph L; Baumstark, Barbara R

    2004-02-16

    Six Clostridium botulinum isolates exhibiting type A toxicity as measured by the mouse bioassay were found to contain both type A and type B neurotoxin DNA sequences. The six strains were divided into three groups based on the DNA sequence of the type B neurotoxin gene. Members of each group exhibited 100% sequence identity over the 3876 bp type B toxin open reading frame. The type B toxin sequence of all groups differed at more than 60 positions when compared to the BGB control strain.

  16. Sequence variation and methylation of the flax 5S RNA genes.

    PubMed Central

    Goldsbrough, P B; Ellis, T H; Lomonossoff, G P

    1982-01-01

    The complete sequence of the flax 5S DNA repeat is presented. Length heterogeneity is the consequence of the presence or absence of a single direct repeat and the majority of single base changes are transition mutations. No sequence variation has been found in the coding sequence. The extent of methylation of cytosines has been measured at one location in the gene and one in the spacer. The relationship between the observed sequence heterogeneity and the level of methylation is discussed in the context of the operation of a correction mechanism. PMID:6290983

  17. Sequence diversity within the capsular genes of Streptococcus pneumoniae serogroup 6 and 19.

    PubMed

    Elberse, Karin; Witteveen, Sandra; van der Heide, Han; van de Pol, Ingrid; Schot, Corrie; van der Ende, Arie; Berbers, Guy; Schouls, Leo

    2011-01-01

    The main virulence factor of Streptococcus pneumoniae is the capsule. The polysaccharides comprising this capsule are encoded by approximately 15 genes and differences in these genes result in different serotypes. The aim of this study was to investigate the sequence diversity of the capsular genes of serotypes 6A, 6B, 6C, 19A and 19F and to explore a possible effect of vaccination on variation and distribution of these serotypes in the Netherlands. The complete capsular gene locus was sequenced for 25 serogroup 6 and for 20 serogroup 19 isolates. If one or more genes varied in 10 or more base pairs from the reference sequence, it was designated as a capsular subtype. Allele-specific PCRs and specific gene sequencing of highly variable capsular genes were performed on 184 serogroup 6 and 195 serogroup 19 isolates to identify capsular subtypes. This revealed the presence of 6, 3 and a single capsular subtype within serotypes 6A, 6B and 6C, respectively. The serotype 19A and 19F isolates comprised 3 and 4 capsular subtypes, respectively. For serogroup 6, the genetic background, as determined by multi locus sequence typing (MLST) and multiple-locus variable number of tandem repeat analysis (MLVA), seemed to be closely related to the capsular subtypes, but this was less pronounced for serogroup 19 isolates. The data also suggest shifts in the occurrence of capsular subtypes within serotype 6A and 19A after introduction of the 7-valent pneumococcal vaccine. The shifts within these non-vaccine serotypes might indicate that these capsular subtypes are filling the niche of the vaccine serotypes. In conclusion, there is considerable DNA sequence variation of the capsular genes within pneumococcal serogroup 6 and 19. Such changes may result in altered polysaccharides or in strains that produce more capsular polysaccharides. Consequently, these altered capsules may be less sensitive for vaccine induced immunity.

  18. Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease

    SciTech Connect

    Su, Y.; Zhang, H.; Madrid, R.

    1994-09-01

    Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and, when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.

  19. Complete sequence and gene organization of the mitochondrial genome of the land snail Albinaria coerulea.

    PubMed

    Hatzoglou, E; Rodakis, G C; Lecanidou, R

    1995-08-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date.

  20. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria Coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  1. Insights into corn genes derived from large-scale cDNA sequencing.

    PubMed

    Alexandrov, Nickolai N; Brover, Vyacheslav V; Freidin, Stanislav; Troukhan, Maxim E; Tatarinova, Tatiana V; Zhang, Hongyu; Swaller, Timothy J; Lu, Yu-Ping; Bouck, John; Flavell, Richard B; Feldmann, Kenneth A

    2009-01-01

    We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST).
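
    The two-class split described above rests on GC content at the third codon position (often abbreviated GC3). A minimal Python sketch of that calculation follows. The function names and the 0.5 cutoff are illustrative assumptions, not values reported in the study, and the input is assumed to be an in-frame coding sequence.

```python
def gc3_content(cds: str) -> float:
    """Fraction of G or C bases at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # ignore a trailing partial codon
    third_bases = cds[2:usable:3]         # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


def classify_by_gc3(cds: str, threshold: float = 0.5) -> str:
    """Label a CDS as high or low GC3; the threshold is a hypothetical cutoff."""
    return "high-GC3" if gc3_content(cds) >= threshold else "low-GC3"
```

    In practice a cutoff would more likely be chosen from the observed distribution of GC3 values across all transcripts than fixed in advance.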

  2. Sequence analysis of the ERCC2 gene regions in human, mouse, and hamster reveals three linked genes

    SciTech Connect

    Lamerdin, J.E.; Stilwagen, S.A.; Ramirez, M.H.

    1996-06-15

    The ERCC2 (excision repair cross-complementing rodent repair group 2) gene product is involved in transcription-coupled repair as an integral member of the basal transcription factor BTF2/TFIIH complex. Defects in this gene can result in three distinct human disorders, namely the cancer-prone syndrome xeroderma pigmentosum complementation group D, trichothiodystrophy, and Cockayne syndrome. We report the comparative analysis of 91.6 kb of new sequence including 54.3 kb encompassing the human ERCC2 locus, the syntenic region in the mouse (32.6 kb), and a further 4.7 kb of sequence 3' of the previously reported ERCC2 region in the hamster. In addition to ERCC2, our analysis revealed the presence of two previously undescribed genes in all three species. The first is centromeric (in the human) to ERCC2 and is most similar to the kinesin light chain gene in sea urchin. The second gene is telomeric (in the human) to ERCC2 and contains a motif found in ankyrins, some cell cycle proteins, and transcription factors. Multiple EST matches to this putative new gene indicate that it is expressed in several human tissues, including breast. The identification and description of two new genes provides potential candidate genes for disorders mapping to this region of 19q13.2. 42 refs., 6 figs., 3 tabs.

  3. Sequence analysis of the ERCC2 gene regions in human, mouse, and hamster reveals three linked genes.

    PubMed

    Lamerdin, J E; Stilwagen, S A; Ramirez, M H; Stubbs, L; Carrano, A V

    1996-06-15

    The ERCC2 (excision repair cross-complementing rodent repair group 2) gene product is involved in transcription-coupled repair as an integral member of the basal transcription factor BTF2/TFIIH complex. Defects in this gene can result in three distinct human disorders, namely the cancer-prone syndrome xeroderma pigmentosum complementation group D, trichothiodystrophy, and Cockayne syndrome. We report the comparative analysis of 91.6 kb of new sequence including 54.3 kb encompassing the human ERCC2 locus, the syntenic region in the mouse (32.6 kb), and a further 4.7 kb of sequence 3' of the previously reported ERCC2 region in the hamster. In addition to ERCC2, our analysis revealed the presence of two previously undescribed genes in all three species. The first is centromeric (in the human) to ERCC2 and is most similar to the kinesin light chain gene in sea urchin. The second gene is telomeric (in the human) to ERCC2 and contains a motif found in ankyrins, some cell cycle proteins, and transcription factors. Multiple EST matches to this putative new gene indicate that it is expressed in several human tissues, including breast. The identification and description of two new genes provides potential candidate genes for disorders mapping to this region of 19q13.2.

  4. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences

    PubMed Central

    Laslett, Dean; Canback, Bjorn

    2004-01-01

    A computer program, ARAGORN, identifies tRNA and tmRNA genes. The program employs heuristic algorithms to predict tRNA secondary structure, based on homology with recognized tRNA consensus sequences and ability to form a base-paired cloverleaf. tmRNA genes are identified using a modified version of the BRUCE program. ARAGORN achieves a detection sensitivity of 99% from a set of 1290 eubacterial, eukaryotic and archaeal tRNA genes and detects all complete tmRNA sequences in the tmRNA database, improving on the performance of the BRUCE program. Recently discovered tmRNA genes in the chloroplasts of two species from the ‘green’ algae lineage are detected. The output of the program reports the proposed tRNA secondary structure and, for tmRNA genes, the secondary structure of the tRNA domain, the tmRNA gene sequence, the tag peptide and a list of organisms with matching tmRNA peptide tags. PMID:14704338

  5. Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

    PubMed Central

    2009-01-01

    Background: One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results: An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay

  6. Cloning, sequencing and transcriptional analysis of the Choristoneura fumiferana entomopoxvirus spheroidin gene.

    PubMed

    Li, X; Barrett, J W; Yuen, L; Arif, B M

    1997-02-01

    The Choristoneura fumiferana entomopoxvirus (CfEPV) spheroidin gene was identified and localized on three XbaI restriction fragments (total size 4.73 kb). The fragments were cloned and sequenced. The spheroidin gene had an open reading frame of 2997 nucleotides encoding a putative protein with a predicted size of 115 kDa. Sequence analysis indicated that the putative protein contained 14 potential N-glycosylation sites (Asn-X-Thr; Asn-X-Ser), that are probably not used since the protein migrates on SDS-PAGE as a 115 kDa band. The protein is rich in cysteine residues (34), which explains the need for reducing agents when dissolving the occlusion bodies with alkali. The spheroidin gene sequence contains motifs characteristic of the late genes of poxviruses. These include the typical TAAATG sequence at the beginning of the coding region and two early gene termination signals (TTTTTNT) in the untranslated region of the gene. The promoter region has three TAA termination signals immediately upstream of the ATG start site. Spheroidin (SPH) appears to be conserved among different EPVs. There was 82.2% identity and 97.2% similarity at the amino acid level between the SPHs of CfEPV and Amsacta moorei EPV. Less conservation was seen with the SPH from Melolontha melolontha EPV (39.8% identity and 73.4% similarity). Transcriptional analyses of the spheroidin gene by Northern blots showed that the transcript had a size of approximately 3 kb, which is in agreement with the length of the ORF. Primer extension results, anchor PCR and sequencing confirmed that there was a poly (A)17 tract at the 5' end of the spheroidin gene transcript, a structure typical of late gene transcripts of poxviruses.
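
    The two sequence motifs named above, TAAATG and TTTTTNT (N being any base), can be located with a simple pattern scan. The Python sketch below is only an illustration of that idea; the helper names and the toy input sequence are hypothetical and are not taken from the CfEPV sequence.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
LATE_MOTIF = re.compile(r"(?=TAAATG)")
EARLY_TERMINATION = re.compile(r"(?=TTTTT[ACGT]T)")  # TTTTTNT with N = A, C, G or T


def find_motifs(seq: str, pattern: re.Pattern) -> list[int]:
    """Return 0-based start positions of every match of pattern in seq."""
    return [match.start() for match in pattern.finditer(seq.upper())]


def orf_codon_count(orf_length_nt: int) -> int:
    """Number of codons in an ORF whose length is a multiple of three."""
    if orf_length_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_nt // 3


toy_sequence = "GGTTTTTATCCTAAATGGCA"   # made-up sequence for demonstration only
late_hits = find_motifs(toy_sequence, LATE_MOTIF)
termination_hits = find_motifs(toy_sequence, EARLY_TERMINATION)
codons = orf_codon_count(2997)          # ORF length reported in the abstract
```

    The 2997-nucleotide open reading frame corresponds to 999 codons; at a rule-of-thumb average of roughly 110 Da per residue this gives a mass on the order of 110 kDa, in line with the 115 kDa reported.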

  7. Impact of Pre-Analytical Variables on Cancer Targeted Gene Sequencing Efficiency

    PubMed Central

    Araujo, Luiz H.; Timmers, Cynthia; Shilo, Konstantin; Zhao, Weiqiang; Zhang, Jianying; Yu, Lianbo; Natarajan, Thanemozhi G.; Miller, Clinton J.; Yilmaz, Ayse Selen; Liu, Tom; Amann, Joseph; Lapa e Silva, José Roberto; Ferreira, Carlos Gil; Carbone, David P.

    2015-01-01

    Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next generation sequencing platform. A PCR-based quality control (QC) assay was utilized to determine DNA quality, and a ratio was generated in comparison to control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated to most parameters of sequencing efficiency including depth of coverage, alignment rate, insert size, and read quality. A combined score using the three parameters was generated and proved highly accurate to predict sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content like in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples with poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are utilized. PMID:26605948

  8. In-depth cDNA library sequencing provides quantitative gene expression profiling in cancer biomarker discovery.

    PubMed

    Yang, Wanling; Ying, Dingge; Lau, Yu-Lung

    2009-06-01

    Quantitative gene expression analysis plays an important role in identifying differentially expressed genes in various pathological states, gene expression regulation and co-regulation, shedding light on gene functions. Although microarray is widely used as a powerful tool in this regard, it is suboptimal quantitatively and unable to detect unknown gene variants. Here we demonstrate effective detection of differential expression and co-regulation of certain genes by expressed sequence tag analysis using a selected subset of cDNA libraries. We discuss the issues of sequencing depth and library preparation, and propose that increased sequencing depth and improved preparation procedures may allow detection of many expression features for less abundant gene variants. With the reduction of sequencing cost and the emergence of new generation sequencing technology, in-depth sequencing of cDNA pools or libraries may represent a better and more powerful tool in gene expression profiling and cancer biomarker detection. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes to increase sequencing depth without affecting relative expression ratio of other genes, as transcripts from as few as 300 most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also represents a unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as detecting mutations and polymorphisms that may play important roles in disease pathogenesis.

  9. Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

    PubMed

    Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

    2008-01-01

    The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.

  10. Biologic: Gene circuits and feedback in an introductory physics sequence for biology and premedical students

    NASA Astrophysics Data System (ADS)

    Cahn, S. B.; Mochrie, S. G. J.

    2014-05-01

    We describe an educational module on feedback and gene circuits that constitutes the final topic in a new year-long introductory physics sequence aimed at biology and premedical students at Yale University. The overall goals of this sequence are threefold. First to demonstrate the application of physics and mathematics in the life sciences. Second to introduce biological science majors to mathematical and physical tools, principles, and experiences. Third to seed an enduring appreciation of quantitative approaches in biology and medicine. Here, we present a module on feedback and gene circuits that focuses on a genetic toggle switch and a repressilator. The genetic toggle switch consists of two genes, each of whose protein products represses the other's expression, while the repressilator consists of three genes, each of whose protein products represses the next gene's expression. Analytic, numerical, and electronic treatments of the genetic toggle switch show bistability. A similar treatment of the repressilator reveals sustained oscillations.
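
    The bistability of the toggle switch can be illustrated numerically. The Python sketch below assumes a commonly used dimensionless two-repressor model, du/dt = alpha/(1 + v^n) - u and dv/dt = alpha/(1 + u^n) - v, integrated with a plain forward Euler step. The equations, the parameter values (alpha = 10, n = 2) and the function name are assumptions made for illustration; the abstract does not state which equations the module uses.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Integrate an assumed two-gene mutual-repression model with forward Euler.

    u and v are dimensionless repressor concentrations; each one represses
    synthesis of the other through a Hill term with exponent n.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u   # synthesis repressed by v, minus decay
        dv = alpha / (1.0 + u ** n) - v   # synthesis repressed by u, minus decay
        u, v = u + dt * du, v + dt * dv
    return u, v


# Two nearby starting points on opposite sides of the line u = v
state_a = simulate_toggle(2.0, 1.0)   # u starts slightly ahead
state_b = simulate_toggle(1.0, 2.0)   # v starts slightly ahead
```

    With these assumed parameters the two runs settle into different steady states, one with u high and v low and the other with the roles reversed, which is the bistable behaviour the abstract refers to.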

  11. Detection of a novel intragenic rearrangement in the creatine transporter gene by next generation sequencing.

    PubMed

    Yu, Hui; van Karnebeek, Clara; Sinclair, Graham; Hill, Alan; Cui, Hong; Zhang, Victor Wei; Wong, Lee-Jun

    2013-12-01

    Deficiency caused by mutations in the creatine transporter gene (SLC6A8/CT1) is an X-linked form of intellectual disability. The presence of highly homologous pseudogenes and high GC content of SLC6A8 genomic sequence complicates the molecular diagnosis of this disorder. To minimize the pseudogene interference, exons 2 to 13 of SLC6A8 were amplified as a single PCR product using gene-specific long-range PCR (LR-PCR) primers. The GC-rich exon 1 and its flanking intronic sequences were amplified separately in a short fragment under GC-rich conditions and a touchdown PCR program. Traditional Sanger sequence analysis of all coding exons of SLC6A8 from a 3-year-old boy with creatine transporter deficiency did not detect deleterious mutations. The long-range PCR product was used as template followed by massively parallel sequencing (MPS) on HiSeq2000. We were able to detect a tandem duplication involving part of exons 11 and 12 in the SLC6A8 gene. The deduced c.1592_1639dup133 mutation was confirmed to be a hemizygous insertion by targeted genomic DNA and cDNA Sanger sequencing. Combination of deep sequencing technology with long-range PCR revealed a novel intragenic duplication in the SLC6A8 gene, providing a definitive molecular diagnosis of creatine transporter deficiency in a male patient.

  12. Detection of false positive mutations in BRCA gene by next generation sequencing.

    PubMed

    Suryavanshi, Moushumi; Kumar, Dushyant; Panigrahi, Manoj Kumar; Chowdhary, Meenakshi; Mehta, Anurag

    2016-11-15

    BRCA1 and BRCA2 genes are implicated in 20-25% of hereditary breast and ovarian cancers. New age sequencing platforms have revolutionized massively parallel sequencing in clinical practice by providing cost effective, rapid, and sensitive sequencing. This study critically evaluates the false positives in multiplex panels and suggests the need for careful analysis. We employed a multiplex PCR based BRCA1 and BRCA2 community panel with the Ion Torrent PGM machine for evaluation of these mutations. Out of all 41 samples analyzed for BRCA1 and BRCA2, five were found with 950_951 insA(Asn319fs) at Chr13:32906565 position and one sample with 1032_1033 insA(Asn346fs) at Chr13:32906647, both being frame-shift mutations in BRCA2 gene. 950_951 insA(Asn319fs) mutation is reported as pathogenic allele in NCBI dbSNP. On examination of IGV for all these samples, it was seen that both mutations had 'A' nucleotide insertion at 950, and 1032 position in exon 10 of BRCA2 gene. Sanger Sequencing did not confirm these insertions. Next-generation sequencing shows great promise by allowing rapid mutational analysis of multiple genes in human cancer but our results indicate the need for careful sequence analysis to avoid false positive results.

  13. Phylogenetic analysis of Mexican Babesia bovis isolates using msa and ssrRNA gene sequences.

    PubMed

    Genis, Alma D; Mosqueda, Juan J; Borgonio, Verónica M; Falcón, Alfonso; Alvarez, Antonio; Camacho, Minerva; de Lourdes Muñoz, Maria; Figueroa, Julio V

    2008-12-01

    Variable merozoite surface antigens of Babesia bovis are exposed glycoproteins having a role in erythrocyte invasion. Members of this gene family include msa-1 and msa-2 (msa-2c, msa-2a(1), msa-2a(2), and msa-2b). Small subunit ribosomal (ssr)RNA gene is subject to evolutive pressure and has been used in phylogenetic studies. To determine the phylogenetic relationship among B. bovis Mexican isolates using different genetic markers, PCR amplicons, corresponding to msa-1, msa-2c, msa-2b, and ssrRNA genes, were cloned and plasmids carrying the corresponding inserts were sequenced. Comparative analysis of nucleotide and deduced amino acid sequences revealed distinct degrees of variability and identity among the coding gene sequences obtained from 12 geographically different B. bovis isolates and a reference strain. Overall sequence identities of 47.7%, 72.3%, 87.7%, and 94% were determined for msa-1, msa-2b, msa-2c, and ssrRNA, respectively. A robust phylogenetic tree was obtained with msa-2b sequences. The phylogenetic analysis suggests that Mexican B. bovis isolates group in clades not concordant with the Mexican geography. However, the Mexican isolates group together in an American clade separated from the Australian clade. Sequence heterogeneity in msa-1, msa-2b, and msa-2c coding regions of Mexican B. bovis isolates present in different geographical regions can be a result of either differential evolutive pressure or cattle movement from commercial trade.

  14. Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula.

    PubMed

    Ingvarsson, Pär K

    2007-03-01

    Codon bias is generally thought to be determined by a balance between mutation, genetic drift, and natural selection on translational efficiency. However, natural selection on codon usage is considered to be a weak evolutionary force and selection on codon usage is expected to be strongest in species with large effective population sizes. In this paper, I study associations between codon usage, gene expression, and molecular evolution at synonymous and nonsynonymous sites in the long-lived, woody perennial plant Populus tremula (Salicaceae). Using expression data for 558 genes derived from expressed sequence tags (EST) libraries from 19 different tissues and developmental stages, I study how gene expression levels within single tissues as well as across tissues affect codon usage and rates of sequence evolution at synonymous and nonsynonymous sites. I show that gene expression has direct effects on both codon usage and the level of selective constraint of proteins in P. tremula, although in different ways. Codon usage is primarily determined by how highly expressed a gene is, whereas rates of sequence evolution are primarily determined by how widely expressed genes are. In addition to the effects of gene expression, protein length appears to be an important factor influencing virtually all aspects of molecular evolution in P. tremula.

  15. Organization and nucleotide sequence analysis of a ribosomal RNA gene cluster from Streptomyces ambofaciens.

    PubMed

    Pernodet, J L; Boccard, F; Alegre, M T; Gagnat, J; Guérineau, M

    1989-06-30

    The Streptomyces ambofaciens genome contains four rRNA gene clusters. These copies are called rrnA, B, C and D. The complete nucleotide (nt) sequence of rrnD has been determined. These genes possess striking similarity with other eubacterial rRNA genes. Comparison with other rRNA sequences allowed the putative localization of the sequences encoding mature rRNAs. The structural genes are arranged in the order 16S-23S-5S and are tightly linked. The mature rRNAs are predicted to contain 1528, 3120 and 120 nt, for the 16S, 23S and 5S rRNAs, respectively. The 23S rRNA is, to our knowledge, the longest of all sequenced prokaryotic 23S rRNAs. When compared to other large rRNAs it shows insertions at positions where they are also present in archaebacterial and in eukaryotic large rRNAs. Secondary structure models of S. ambofaciens rRNAs are proposed, based upon those existing for other bacterial rRNAs. Positions of putative transcription start points and of a termination signal are suggested. The corresponding putative primary transcript, containing the 16S, 23S and 5S rRNAs plus flanking regions, was folded into a secondary structure, and sequences possibly involved in rRNA maturation are described. The G + C content of the rRNA gene cluster is low (57%) compared with the overall G + C content of Streptomyces DNA (73%).

  16. AST: an automated sequence-sampling method for improving the taxonomic diversity of gene phylogenetic trees.

    PubMed

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.

  17. AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

    PubMed Central

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. PMID:24892935

  18. Distant horizontal gene transfer is rare for multiple families of prokaryotic insertion sequences.

    PubMed

    Wagner, Andreas; de la Chaux, Nicole

    2008-11-01

    Horizontal gene transfer in prokaryotes is rampant on short and intermediate evolutionary time scales. It poses a fundamental problem to our ability to reconstruct the evolutionary tree of life. Is it also frequent over long evolutionary distances? To address this question, we analyzed the evolution of 2,091 insertion sequences from all 20 major families in 438 completely sequenced prokaryotic genomes. Specifically, we mapped insertion sequence occurrence on a 16S rDNA tree of the genomes we analyzed, and we also constructed phylogenetic trees of the insertion sequence transposase coding sequences. We found only 30 cases of likely horizontal transfer among distantly related prokaryotic clades. Most of these horizontal transfer events are ancient. Only seven events are recent. Almost all of these transfer events occur between pairs of human pathogens or commensals. If true also for other, non-mobile DNA, the rarity of distant horizontal transfer increases the odds of reliable phylogenetic inference from sequence data.

  19. Nucleotide sequence and taxonomical distribution of the bacteriocin gene lin cloned from Brevibacterium linens M18.

    PubMed

    Valdes-Stauber, N; Scherer, S

    1996-04-01

    Linocin M18 is an antilisterial bacteriocin produced by the red smear cheese bacterium Brevibacterium linens M18. Oligonucleotide probes based on the N-terminal amino acid sequence were used to locate its single copy gene, lin, on the chromosomal DNA. The amino acid composition, N-terminal sequence, and molecular mass derived from the nucleotide sequence of an open reading frame of 798 nucleotides coding for 266 amino acids found on a 3-kb BamHI restriction fragment correspond closely to those obtained from the purified protein (N. Valdés-Stauber and S. Scherer, Appl. Environ. Microbiol. 60:3809-3814, 1994). No sequence homology to any protein or nucleotide sequences deposited in databases was found. Comparison of the nucleotide sequence and the N-terminal amino acid sequence derived from the protein suggests that B. linens M18 produces an N-formyl-methionyl-CAC tRNA. A wide taxonomical distribution of the gene within coryneform bacteria has been demonstrated by PCR amplification. The structural gene from linocin M18 is present at least in three Brevibacterium species, five Arthrobacter species, and five Corynebacterium species.

  20. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  1. Interference in transcription of overexpressed genes by promoter-proximal downstream sequences

    PubMed Central

    Turchinovich, A.; Surowy, H. M.; Tonevitsky, A. G.; Burwinkel, B.

    2016-01-01

    Despite a high sequence homology among the four human RNAi-effector Argonaute proteins and their coding sequences, the efficiency of ectopic overexpression of AGO3 and AGO4 coding sequences in human cells is greatly reduced as compared to AGO1 and AGO2. While investigating this phenomenon, we documented the existence of a previously uncharacterized mechanism of gene expression regulation, which is manifested in greatly varying basal transcription levels from the RNApolII promoters depending on the promoter-proximal downstream sequences. Specifically, we show that the distinct overexpression of Argonaute coding sequences cannot be explained by mRNA degradation in the cytoplasm or nucleus, and is exhibited at the transcriptional level. Furthermore, the first 1000–2000 nt located immediately downstream of the promoter had the most critical influence on ectopic gene overexpression. The transcription-inhibiting effect associated with those downstream sequences subsided with increasing distance to the promoter and positively correlated with promoter strength. We hypothesize that the same mechanism, which we named promoter proximal inhibition (PPI), could generally contribute to basal transcription levels of genes, and could be mainly responsible for the essence of difficult-to-express recombinant proteins. Finally, our data reveal that expression of recombinant proteins in human cells can be greatly enhanced by using more permissive promoter adjacent downstream sequences. PMID:27485701

  2. Differentiation of Xylella fastidiosa strains via multilocus sequence analysis of environmentally mediated genes (MLSA-E).

    PubMed

    Parker, Jennifer K; Havird, Justin C; De La Fuente, Leonardo

    2012-03-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing...

  3. Identification of expressed resistance gene analogs from peanut (Arachis hypogaea L.) expressed sequence tags.

    PubMed

    Liu, Zhanji; Feng, Suping; Pandey, Manish K; Chen, Xiaoping; Culbreath, Albert K; Varshney, Rajeev K; Guo, Baozhu

    2013-05-01

    Low genetic diversity makes peanut (Arachis hypogaea L.) very vulnerable to plant pathogens, causing severe yield loss and reduced seed quality. Several hundred partial genomic DNA sequences have been identified as nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance (R) genes, but only a small portion with expressed transcripts has been found. We aimed to identify resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs) and to develop polymorphic markers. The protein sequences of 54 known R genes were used to identify homologs from peanut ESTs from public databases. A total of 1,053 ESTs corresponding to six different classes of known R genes were recovered and assembled into 156 contigs and 229 singletons as peanut-expressed RGAs. There were 69 that encoded NBS-LRR proteins, 191 that encoded protein kinases, 82 that encoded LRR-PK/transmembrane proteins, 28 that encoded toxin reductases, 11 that encoded LRR-domain containing proteins and four that encoded TM-domain containing proteins. Twenty-eight simple sequence repeats (SSRs) were identified from 25 peanut expressed RGAs. One SSR polymorphic marker (RGA121) was identified. Two polymerase chain reaction-based markers (Ahsw-1 and Ahsw-2) developed from RGA013 were homologous to the Tomato Spotted Wilt Virus (TSWV) resistance gene. All three markers were mapped on the same linkage group AhIV. These expressed RGAs are the source for RGA-tagged marker development and identification of peanut resistance genes.
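
    The counts reported in the abstract above can be cross-checked: the six RGA classes should add up to the number of assembled sequences (contigs plus singletons). A short sketch of that check follows, using only figures from the abstract; the variable names are illustrative.

```python
# RGA counts per class, as reported in the abstract.
rga_classes = {
    "NBS-LRR": 69,
    "protein kinase": 191,
    "LRR-PK/transmembrane": 82,
    "toxin reductase": 28,
    "LRR-domain": 11,
    "TM-domain": 4,
}

# Assembly result reported in the abstract.
contigs = 156
singletons = 229

class_total = sum(rga_classes.values())
assembled_total = contigs + singletons

# Both totals come to 385, so the class breakdown accounts for
# every assembled peanut-expressed RGA.
print(class_total, assembled_total, class_total == assembled_total)
```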

  4. Transcriptome sequencing of black grouse (Tetrao tetrix) for immune gene discovery and microsatellite development

    PubMed Central

    Wang, Biao; Ekblom, Robert; Castoe, Todd A.; Jones, Eleanor P.; Kozma, Radoslav; Bongcam-Rudloff, Erik; Pollock, David D.; Höglund, Jacob

    2012-01-01

    The black grouse (Tetrao tetrix) is a galliform bird species that is important for both ecological studies and conservation genetics. Here, we report the sequencing of the spleen transcriptome of black grouse using 454 GS FLX Titanium sequencing. We performed a large-scale gene discovery analysis with a focus on genes that might be related to fitness in this species and also identified a large set of microsatellites. In total, we obtained 182 179 quality-filtered sequencing reads that we assembled into 9035 contigs. Using these contigs and 15 794 length-filtered (greater than 200 bp) singletons, we identified 7762 transcripts that appear to be homologues of chicken genes. A specific BLAST search with an emphasis on immune genes found 308 homologous chicken genes that have immune function, including ten major histocompatibility complex-related genes located on chicken chromosome 16. We also identified 1300 expressed sequence tag microsatellites and were able to design suitable flanking primers for 526 of these. A preliminary test of the polymorphism of the microsatellites found 10 polymorphic microsatellites of the 102 tested. Genomic resources generated in this study should greatly benefit future ecological, evolutionary and conservation genetic studies on this species. PMID:22724064

  5. Transcriptome sequencing of black grouse (Tetrao tetrix) for immune gene discovery and microsatellite development.

    PubMed

    Wang, Biao; Ekblom, Robert; Castoe, Todd A; Jones, Eleanor P; Kozma, Radoslav; Bongcam-Rudloff, Erik; Pollock, David D; Höglund, Jacob

    2012-04-01

    The black grouse (Tetrao tetrix) is a galliform bird species that is important for both ecological studies and conservation genetics. Here, we report the sequencing of the spleen transcriptome of black grouse using 454 GS FLX Titanium sequencing. We performed a large-scale gene discovery analysis with a focus on genes that might be related to fitness in this species and also identified a large set of microsatellites. In total, we obtained 182 179 quality-filtered sequencing reads that we assembled into 9035 contigs. Using these contigs and 15 794 length-filtered (greater than 200 bp) singletons, we identified 7762 transcripts that appear to be homologues of chicken genes. A specific BLAST search with an emphasis on immune genes found 308 homologous chicken genes that have immune function, including ten major histocompatibility complex-related genes located on chicken chromosome 16. We also identified 1300 expressed sequence tag microsatellites and were able to design suitable flanking primers for 526 of these. A preliminary test of the polymorphism of the microsatellites found 10 polymorphic microsatellites of the 102 tested. Genomic resources generated in this study should greatly benefit future ecological, evolutionary and conservation genetic studies on this species.

  6. SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

    PubMed Central

    Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

    2001-01-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202

  7. Complexity of genetic sequences modified by horizontal gene transfer and degraded-DNA uptake

    NASA Astrophysics Data System (ADS)

    Tremberger, George; Dehipawala, S.; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    Horizontal gene transfer has been a major vehicle for efficient transfer of genetic materials among living species and could be one of the sources for noncoding DNA incorporation into a genome. Our previous study of lncRNA sequence complexity in terms of fractal dimension and information entropy shows a tight regulation among the studied genes in numerous diseases. The role of sequence complexity in horizontally transferred genes was investigated with Mealybug in symbiotic relation with a 139K genome microbe and Deinococcus radiodurans as examples. The fractal dimension and entropy showed correlation R-sq of 0.82 (N = 6) for the studied Deinococcus radiodurans sequences. For comparison the Deinococcus radiodurans oxidative stress tolerant catalase and superoxide dismutase genes under extracellular dGMP growth condition showed R-sq ~ 0.42 (N = 6); and the studied horizontally transferred arsenate reductase genes for toxicity survival in several microorganisms showed no correlation. Simulation results showed that R-sq < 0.4 would be improbable at less than one percent chance, suggestive of additional selection pressure when compared to the R-sq ~ 0.29 (N = 21) in the studied transferred genes in Mealybug. The mild correlation of R-sq ~ 0.5 for fractal dimension versus transcription level in the studied Deinococcus radiodurans sequences upon extracellular dGMP growth condition would suggest that lower fractal dimension with less electron density fluctuation favors higher transcription level.

  8. Alu sequence involvement in transcriptional insulation of the keratin 18 gene in transgenic mice.

    PubMed Central

    Thorey, I S; Ceceña, G; Reynolds, W; Oshima, R G

    1993-01-01

    The human keratin 18 (K18) gene is expressed in a variety of adult simple epithelial tissues, including liver, intestine, lung, and kidney, but is not normally found in skin, muscle, heart, spleen, or most of the brain. Transgenic animals derived from the cloned K18 gene express the transgene in appropriate tissues at levels directly proportional to the copy number and independently of the sites of integration. We have investigated in transgenic mice the dependence of K18 gene expression on the distal 5' and 3' flanking sequences and upon the RNA polymerase III promoter of an Alu repetitive DNA transcription unit immediately upstream of the K18 promoter. Integration site-independent expression of tandemly duplicated K18 transgenes requires the presence of either an 825-bp fragment of the 5' flanking sequence or the 3.5-kb 3' flanking sequence. Mutation of the RNA polymerase III promoter of the Alu element within the 825-bp fragment abolishes copy number-dependent expression in kidney but does not abolish integration site-independent expression when assayed in the absence of the 3' flanking sequence of the K18 gene. The characteristics of integration site-independent expression and copy number-dependent expression are separable. In addition, the formation of the chromatin state of the K18 gene, which likely restricts the tissue-specific expression of this gene, is not dependent upon the distal flanking sequences of the 10-kb K18 gene but rather may depend on internal regulatory regions of the gene. PMID:7692231

  9. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  10. A semiautomated approach to gene discovery through expressed sequence tag data mining: discovery of new human transporter genes.

    PubMed

    Brown, Shoshana; Chang, Jean L; Sadée, Wolfgang; Babbitt, Patricia C

    2003-01-01

    Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.

  11. Versatile Cosmid Vectors for the Isolation, Expression, and Rescue of Gene Sequences: Studies with the Human α-globin Gene Cluster

    NASA Astrophysics Data System (ADS)

    Lau, Yun-Fai; Kan, Yuet Wai

    1983-09-01

    We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers (SV2-gpt, SV2-DHFR, and SV2-neo) in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α-globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α-globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α-globin genes were more actively expressed than the embryonic zeta-globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.

  12. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
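
    The abstract above uses the contig N50 as its measure of assembly quality. As a minimal sketch of the usual definition, assumed here because the abstract does not spell it out, N50 is the largest contig length L such that contigs of length L or longer together cover at least half of the total assembled bases. The contig lengths below are hypothetical and are not data from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the
    length of the contig at which the running total first reaches
    half of the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), total 300.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

    Merging short contigs into longer ones moves more of the total length into the largest contigs, which is why a resequencing effort like the one described can raise the N50.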

  13. Molecular cloning, sequence characterization, and gene expression profiling of a novel water buffalo (Bubalus bubalis) gene, AGPAT6.

    PubMed

    Song, S; Huo, J L; Li, D L; Yuan, Y Y; Yuan, F; Miao, Y W

    2013-10-01

    Several 1-acylglycerol-3-phosphate-O-acyltransferases (AGPATs) can acylate lysophosphatidic acid to produce phosphatidic acid. Of the eight AGPAT isoforms, AGPAT6 is a crucial enzyme for glycerolipid and triacylglycerol biosynthesis in some mammalian tissues. We amplified and identified the complete coding sequence (CDS) of the water buffalo AGPAT6 gene by using the reverse transcription-polymerase chain reaction, based on the conserved sequence information of the cattle or expressed sequence tags of other Bovidae species. This novel gene was deposited in the NCBI database (accession No. JX518941). Sequence analysis revealed that the CDS of this AGPAT6 encodes a 456-amino acid enzyme (molecular mass = 52 kDa; pI = 9.34). Water buffalo AGPAT6 contains three hydrophobic transmembrane regions and a 37-amino acid signal peptide, and is localized in the cytoplasm. The deduced amino acid sequences share 99, 98, 98, 97, 98, 98, 97 and 95% identity with their homologous sequences from cattle, horse, human, mouse, orangutan, pig, rat, and chicken, respectively. The phylogenetic tree analysis based on the AGPAT6 CDS showed that water buffalo has a closer genetic relationship with cattle than with other species. Tissue expression profile analysis shows that this gene is highly expressed in the mammary gland; moderately expressed in the heart, muscle, liver, and brain; weakly expressed in the pituitary gland, spleen, and lung; and almost silently expressed in the small intestine, skin, kidney, and adipose tissues. Four predicted microRNA target sites are found in the water buffalo AGPAT6 CDS. These results will establish a foundation for further insights into this novel water buffalo gene.

  14. Cloning and sequencing of the beta-glucosidase gene from Acetobacter xylinum ATCC 23769.

    PubMed

    Tajima, K; Nakajima, K; Yamashita, H; Shiba, T; Munekata, M; Takai, M

    2001-12-31

    The beta-glucosidase gene (bglxA) was cloned from the genomic DNA of Acetobacter xylinum ATCC 23769 and its nucleotide sequence (2200 bp) was determined. This bglxA gene was present downstream of the cellulose synthase operon and coded for a polypeptide of molecular mass 79 kDa. The overexpression of the beta-glucosidase in A. xylinum caused a tenfold increase in activity compared to the wild-type strain. In addition, the action pattern of the enzyme was identified as G3ase activity. The deduced amino acid sequence of the bglxA gene showed 72.3%, 49.6%, and 45.1% identity with the beta-glucosidases from A. xylinum subsp. sucrofermentans, Cellvibrio gilvus, and Mycobacterium tuberculosis, respectively. Based on amino acid sequence similarities, the beta-glucosidase (BglxA) was assigned to family 3 of the glycosyl hydrolases.

  15. Gene organization and complete sequence of the mitochondrial genome of Linwu mallard.

    PubMed

    Tian, Ke-Xiong; Liu, Li-Li; Yu, Qi-Fang; He, Shao-Ping; He, Jian-Hua

    2016-01-01

    Linwu mallard is an excellent native breed from Hunan province in China. This is the first study to determine the complete mitochondrial genome sequence of L. mallard using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail; the base composition of the L. mallard genome (16,605 bp in length) was 29.19% A, 22.19% T, 32.83% C, and 15.79% G. It contained 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of L. mallard will be useful for the phylogenetics of poultry and will be available as basic data for genetics and breeding.

  16. Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi

    PubMed Central

    Cacho, Ralph A.; Tang, Yi; Chooi, Yit-Heng

    2015-01-01

    Genomics has revolutionized the research on fungal secondary metabolite (SM) biosynthesis. To elucidate the molecular and enzymatic mechanisms underlying the biosynthesis of a specific SM compound, the important first step is often to find the genes that are responsible for its synthesis. The accessibility of fungal genome sequences allows the bypass of the cumbersome traditional library construction and screening approach. Advances in next-generation sequencing (NGS) technologies have further improved the speed and reduced the cost of microbial genome sequencing in the past few years, which has accelerated the research in this field. Here, we will present an example workflow for identifying the gene cluster encoding the biosynthesis of SMs of interest using an NGS approach. We will also review the different strategies that can be employed to pinpoint the targeted gene clusters rapidly by giving several examples stemming from our work. PMID:25642215

  17. Improved efficiency in amplification of Escherichia coli o-antigen gene clusters using genome-wide sequence comparison

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...

  18. Putative and unique gene sequence utilization for the design of species specific probes as modeled by Lactobacillus plantarum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...

  19. Dinoflagellate Phylogeny as Inferred from Heat Shock Protein 90 and Ribosomal Gene Sequences

    PubMed Central

    Hoppenrath, Mona; Leander, Brian S.

    2010-01-01

    Background Interrelationships among dinoflagellates in molecular phylogenies are largely unresolved, especially in the deepest branches. Ribosomal DNA (rDNA) sequences provide phylogenetic signals only at the tips of the dinoflagellate tree. Two reasons for the poor resolution of deep dinoflagellate relationships using rDNA sequences are (1) most sites are relatively conserved and (2) there are different evolutionary rates among sites in different lineages. Therefore, alternative molecular markers are required to address the deeper phylogenetic relationships among dinoflagellates. Preliminary evidence indicates that the heat shock protein 90 gene (Hsp90) will provide an informative marker, mainly because this gene is relatively long and appears to have relatively uniform rates of evolution in different lineages. Methodology/Principal Findings We more than doubled the previous dataset of Hsp90 sequences from dinoflagellates by generating additional sequences from 17 different species, representing seven different orders. In order to concatenate the Hsp90 data with rDNA sequences, we supplemented the Hsp90 sequences with three new SSU rDNA sequences and five new LSU rDNA sequences. The new Hsp90 sequences were generated, in part, from four additional heterotrophic dinoflagellates and the type species for six different genera. Molecular phylogenetic analyses resulted in a paraphyletic assemblage near the base of the dinoflagellate tree consisting of only athecate species. However, Noctiluca was never part of this assemblage and branched in a position that was nested within other lineages of dinokaryotes. The phylogenetic trees inferred from Hsp90 sequences were consistent with trees inferred from rDNA sequences in that the backbone of the dinoflagellate clade was largely unresolved. Conclusions/Significance The sequence conservation in both Hsp90 and rDNA sequences and the poor resolution of the deepest nodes suggests that dinoflagellates reflect an explosive

  20. Genome-Wide Sequencing for Prenatal Detection of Fetal Single-Gene Disorders.

    PubMed

    van den Veyver, Ignatia B; Eng, Christine M

    2015-08-07

    New sequencing methods capable of rapidly analyzing the genome at increasing resolution have transformed diagnosis of single-gene or oligogenic genetic disorders in pediatric and adult medicine. Targeted tests, consisting of disease-focused multigene panels and diagnostic exome sequencing to interrogate the sequence of the coding regions of nearly all genes, are now clinically offered when there is suspicion for an undiagnosed genetic disorder or cancer in children and adults. Implementation of diagnostic exome and genome sequencing tests on invasively and noninvasively obtained fetal DNA samples for prenatal genetic diagnosis is also being explored. We predict that they will become more widely integrated into prenatal care in the near future. Providers must prepare for the practical, ethical, and societal dilemmas that accompany the capacity to generate and analyze large amounts of genetic information about the fetus during pregnancy.

  1. Expression and nucleotide sequence of the Clostridium acetobutylicum beta-galactosidase gene cloned in Escherichia coli.

    PubMed Central

    Hancock, K R; Rockman, E; Young, C A; Pearce, L; Maddox, I S; Scott, D B

    1991-01-01

    A gene library for Clostridium acetobutylicum NCIB 2951 was constructed in the broad-host-range cosmid pLAFR1, and cosmids containing the beta-galactosidase gene were isolated by direct selection for enzyme activity on X-Gal (5-bromo-4-chloro-3-indolyl-beta-D-galactoside) plates after conjugal transfer of the library to a lac deletion derivative of Escherichia coli. Analysis of various pSUP202 subclones of the lac cosmids on X-Gal plates localized the beta-galactosidase gene to a 5.1-kb EcoRI fragment. Expression of the Clostridium beta-galactosidase gene in E. coli was not subject to glucose repression. By using transposon Tn5 mutagenesis, two gene loci, cbgA (locus I) and cbgR (locus II), were identified as necessary for beta-galactosidase expression in E. coli. DNA sequence analysis of the entire 5.1-kb fragment identified open reading frames of 2,691 and 303 bp, corresponding to locus I and locus II, respectively, and in addition a third truncated open reading frame of 825 bp. The predicted gene product of locus I, CbgA (molecular size, 105 kDa), showed extensive amino acid sequence homology with E. coli LacZ, E. coli EbgA, and Klebsiella pneumoniae LacZ and was in agreement with the size of a polypeptide synthesized in maxicells containing the cloned 5.1-kb fragment. The predicted gene product of locus II, CbgR (molecular size, 11 kDa) shares no significant homology with any other sequence in the current DNA and protein sequence data bases, but Tn5 insertions in this gene prevent the synthesis of CbgA. Complementation experiments indicate that the gene product of cbgR is required in cis with cbgA for expression of beta-galactosidase in E. coli. Images PMID:1850729

  2. Gene sequence variability of the three surface proteins of human respiratory syncytial virus (HRSV) in Texas.

    PubMed

    Tapia, Lorena I; Shaw, Chad A; Aideyan, Letisha O; Jewell, Alan M; Dawson, Brian C; Haq, Taha R; Piedra, Pedro A

    2014-01-01

    Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion (F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of HRSV strains collected from 1987 to 2005 from one center. Sixty original clinical isolates and 5 prototype strains were analyzed. Sequences containing SH, F and G genes were generated, and multiple alignments and phylogenetic trees were analyzed. Genetic variability by protein domains comparing virus genotypes was assessed. Complete sequences of the SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. In group A strains, genotypes GA5 and GA2 were predominant. For HRSV-B strains, the genotype GB4 was predominant from 1992 to 1994 and only genotype BA viruses were detected in 2004-2005. Different genetic variability at nucleotide level was detected between the genes, with the G gene being the most variable and the highest variability detected in the 270-nt G fragment that is frequently used to genotype the virus. High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of HRSV A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in hypervariable domains of the G protein, the signal peptide of the F protein, a previously undefined domain in the F protein, and the antigenic site Ø in the pre-fusion F. Divergent trends were observed between HRSV -A and -B groups for some functional domains. A diverse population of HRSV -A and -B genotypes circulated in Houston during an 18 year period. We hypothesize that diverse sequence variation of the surface protein genes provides HRSV strains a survival advantage in a partially immune-protected community.

  3. Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis

    PubMed Central

    Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.

    2011-01-01

    The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189

  4. Whole exome sequencing in females with autism implicates novel and candidate genes.

    PubMed

    Butler, Merlin G; Rafi, Syed K; Hossain, Waheeda; Stephan, Dietrich A; Manzardo, Ann M

    2015-01-07

    Classical autism or autistic disorder belongs to a group of genetically heterogeneous conditions known as Autism Spectrum Disorders (ASD). Heritability is estimated as high as 90% for ASD with a recently reported compilation of 629 clinically relevant candidate and known genes. We chose to undertake a descriptive next generation whole exome sequencing case study of 30 well-characterized Caucasian females with autism (average age, 7.7 ± 2.6 years; age range, 5 to 16 years) from multiplex families. Genomic DNA was used for whole exome sequencing via paired-end next generation sequencing approach and X chromosome inactivation status. The list of putative disease causing genes was developed from primary selection criteria using machine learning-derived classification score and other predictive parameters (GERP2, PolyPhen2, and SIFT). We narrowed the variant list to 10 to 20 genes and screened for biological significance including neural development, function and known neurological disorders. Seventy-eight genes identified met selection criteria ranging from 1 to 9 filtered variants per female. Five females presented with functional variants of X-linked genes (IL1RAPL1, PIR, GABRQ, GPRASP2, SYTL4) with cadherin, protocadherin and ankyrin repeat gene families most commonly altered (e.g., CDH6, FAT2, PCDH8, CTNNA3, ANKRD11). Other genes related to neurogenesis and neuronal migration (e.g., SEMA3F, MIDN), were also identified.

  5. Whole Exome Sequencing in Females with Autism Implicates Novel and Candidate Genes

    PubMed Central

    Butler, Merlin G.; Rafi, Syed K.; Hossain, Waheeda; Stephan, Dietrich A.; Manzardo, Ann M.

    2015-01-01

    Classical autism or autistic disorder belongs to a group of genetically heterogeneous conditions known as Autism Spectrum Disorders (ASD). Heritability is estimated as high as 90% for ASD with a recently reported compilation of 629 clinically relevant candidate and known genes. We chose to undertake a descriptive next generation whole exome sequencing case study of 30 well-characterized Caucasian females with autism (average age, 7.7 ± 2.6 years; age range, 5 to 16 years) from multiplex families. Genomic DNA was used for whole exome sequencing via paired-end next generation sequencing approach and X chromosome inactivation status. The list of putative disease causing genes was developed from primary selection criteria using machine learning-derived classification score and other predictive parameters (GERP2, PolyPhen2, and SIFT). We narrowed the variant list to 10 to 20 genes and screened for biological significance including neural development, function and known neurological disorders. Seventy-eight genes identified met selection criteria ranging from 1 to 9 filtered variants per female. Five females presented with functional variants of X-linked genes (IL1RAPL1, PIR, GABRQ, GPRASP2, SYTL4) with cadherin, protocadherin and ankyrin repeat gene families most commonly altered (e.g., CDH6, FAT2, PCDH8, CTNNA3, ANKRD11). Other genes related to neurogenesis and neuronal migration (e.g., SEMA3F, MIDN), were also identified. PMID:25574603

  6. Characterization and analysis of an industrial strain of Streptomyces bingchenggensis by genome sequencing and gene microarray.

    PubMed

    Wang, Xiang-Jing; Zhang, Bo; Yan, Yi-Jun; An, Jing; Zhang, Ji; Liu, Chong-Xi; Xiang, Wen-Sheng

    2013-11-01

    Streptomyces bingchenggensis is a soil bacterium that produces milbemycins, a family of macrolide antibiotics that are commercially important in crop protection and veterinary medicine. In addition, S. bingchenggensis produces many other natural products including the polyether nanchangmycin and novel cyclic pentapeptides. To identify the gene clusters involved in the biosynthesis of these compounds, and better clarify the biochemical pathways of these gene clusters, the whole genome of S. bingchenggensis was sequenced, and the transcriptome profile was subsequently investigated by microarray. In comparison with other sequenced genomes in Streptomyces, S. bingchenggensis has the largest linear chromosome consisting of 11 936 683 base pairs (bp), with an average GC content of 70.8%. The 10 023 predicted protein-coding sequences include at least 47 gene clusters correlated with the biosynthesis of known or predicted secondary metabolites. Transcriptional analysis demonstrated an extremely high expression level of the milbemycin gene cluster during the entire growth period and a moderately high expression level of the nanchangmycin gene cluster during the initial hours that subsequently decreased. However, other gene clusters appear to be silent. The genome-wide analysis of the secondary metabolite gene clusters in S. bingchenggensis, coupled with transcriptional analysis, will facilitate the rational development of high milbemycins-producing strains as well as the discovery of new natural products.

  7. A Synthesis Method of Gene Networks Having Cyclic Expression Pattern Sequences by Network Learning

    NASA Astrophysics Data System (ADS)

    Mori, Yoshihiro; Kuroe, Yasuaki

    Recently, synthesis of gene networks having desired functions has become of interest to many researchers because it is a complementary approach to understanding gene networks, and it could be the first step in controlling living cells. There exist several periodic phenomena in cells, e.g. circadian rhythm. These phenomena are considered to be generated by gene networks. We have already proposed a synthesis method of gene networks based on gene expression. The method is applicable to synthesizing gene networks possessing the desired cyclic expression pattern sequences. It ensures that realized expression pattern sequences are periodic; however, it does not ensure that their corresponding solution trajectories are periodic, which may mean that their oscillations are not persistent. In this paper, in order to resolve the problem, we propose a synthesis method of gene networks possessing the desired cyclic expression pattern sequences together with their corresponding solution trajectories being periodic. In the proposed method the persistent oscillations of the solution trajectories are realized by specifying their passing points.

  8. Candidate Resistant Genes of Sand Pear (Pyrus pyrifolia Nakai) to Alternaria alternata Revealed by Transcriptome Sequencing.

    PubMed

    Yang, Xiaoping; Hu, Hongju; Yu, Dazhao; Sun, Zhonghai; He, Xiujuan; Zhang, Jingguo; Chen, Qiliang; Tian, Rui; Fan, Jing

    2015-01-01

    Pear black spot (PBS) disease, which is caused by Alternaria alternata (Aa), is one of the most serious diseases affecting sand pear (Pyrus pyrifolia Nakai) cultivation worldwide. To investigate the defense mechanisms of sand pear in response to Aa, the transcriptome of a sand pear germplasm with differential resistance to Aa was analyzed using Illumina paired-end sequencing. Four libraries derived from PBS-resistant and PBS-susceptible sand pear leaves were characterized through inoculation or mock-inoculation. In total, 20.5 Gbp of sequence data and 101,632,565 reads were generated, representing 44717 genes. Approximately 66% of the genes or sequenced reads could be aligned to the pear reference genome. A large number (5213) of differentially expressed genes related to PBS resistance were obtained; 34 microsatellites were detected in these genes, and 28 genes were found to be closely related to PBS resistance. Using a transcriptome analysis in response to PBS inoculation and comparison analysis to the PHI database, 4 genes (Pbr039001, Pbr001627, Pbr025080 and Pbr023112) were considered to be promising candidates for sand pear resistance to PBS. This study provides insight into changes in the transcriptome of sand pear in response to PBS infection, and the findings have improved our understanding of the resistance mechanism of sand pear to PBS and will facilitate future gene discovery and functional genome studies of sand pear.

  9. Discrimination of germline V genes at different sequencing lengths and mutational burdens: A new tool for identifying and evaluating the reliability of V gene assignment.

    PubMed

    Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri

    2015-12-01

    Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
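
    The core idea in this record, that some germline V genes become indistinguishable when only part of the sequence is read, can be illustrated with a small sketch. This is not the authors' tool: the allele names and sequences are invented toy data, the function `indistinguishable_groups` is hypothetical, and the sketch assumes that reads cover the 3' end of the V gene and that only exact identity (no mutation or sequencing error) is considered.

```python
from collections import defaultdict


def indistinguishable_groups(germlines: dict[str, str], read_length: int) -> list[list[str]]:
    """Group germline sequences that are identical over the last `read_length` bases.

    Assumption: reads cover the 3' end of the V gene, so each germline is
    compared over its final `read_length` nucleotides only.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name, seq in germlines.items():
        window = seq[-read_length:]  # the part a read of this length would cover
        groups[window].append(name)
    # Sort for a stable, readable result.
    return sorted(sorted(names) for names in groups.values())


# Invented toy "germlines" (hypothetical names, not real V gene alleles).
toy_germlines = {
    "V1": "ACGTACGTAA",
    "V2": "ACCTACGTAA",  # differs from V1 only near the 5' end
    "V3": "ACGTACGTCA",  # differs from V1 near the 3' end
}

print(indistinguishable_groups(toy_germlines, read_length=6))
print(indistinguishable_groups(toy_germlines, read_length=10))
```

    With the short window, V1 and V2 fall into one group because their only difference lies outside the covered region; with the full length, all three are separated. Extending the comparison to tolerate mismatches would be one way to model mutation or error, which the abstract describes but this sketch does not attempt.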

  10. Human ribosomal RNA gene: nucleotide sequence of the transcription initiation region and comparison of three mammalian genes.

    PubMed Central

    Financsek, I; Mizumoto, K; Mishima, Y; Muramatsu, M

    1982-01-01

    The transcription initiation site of the human ribosomal RNA gene (rDNA) was located by using the single-strand specific nuclease protection method and by determining the first nucleotide of the in vitro capped 45S preribosomal RNA. The sequence of 1,211 nucleotides surrounding the initiation site was determined. The sequenced region was found to consist of 75% G and C and to contain a number of short direct and inverted repeats and palindromes. By comparison of the corresponding initiation regions of three mammalian species, several conserved sequences were found upstream and downstream from the transcription starting point. Two short A + T-rich sequences are present on human, mouse, and rat ribosomal RNA genes between the initiation site and 40 nucleotides upstream, and a C + T cluster is located at a position around -60. At and downstream from the initiation site, a common sequence, T-AG-C-T-G-A-C-A-C-G-C-T-G-T-C-C-T-CT-T, was found in the three genes from position -1 through +18. The strong conservation of these sequences suggests their functional significance in rDNA. The S1 nuclease protection experiments with cloned rDNA fragments indicated the presence in human 45S RNA of molecules several hundred nucleotides shorter than the supposed primary transcript. The first 19 nucleotides of these molecules appear identical--except for one mismatch--to the nucleotide sequence of the 5' end of a supposed early processing product of the mouse 45S RNA. Images PMID:6954460

  11. Sperm competition shapes gene expression and sequence evolution in the ocellated wrasse.

    PubMed

    Dean, Rebecca; Wright, Alison E; Marsh-Rollo, Susan E; Nugent, Bridget M; Alonzo, Suzanne H; Mank, Judith E

    2017-01-01

    Gene expression differences between males and females often underlie sexually dimorphic phenotypes, and the expression levels of genes that are differentially expressed between the sexes are thought to respond to sexual selection. Most studies on the transcriptomic response to sexual selection treat sexual selection as a single force, but postmating sexual selection in particular is expected to specifically target gonadal tissue. The three male morphs of the ocellated wrasse (Symphodus ocellatus) make it possible to test the role of postmating sexual selection in shaping the gonadal transcriptome. Nesting males hold territories and have the highest reproductive success, yet we detected feminization of their gonadal gene expression compared to satellite males. Satellite males are less brightly coloured and experience more intense sperm competition than nesting males. In line with postmating sexual selection affecting gonadal gene expression, we detected a more masculinized expression profile in satellites. Sneakers are the lowest quality males and showed both de-masculinization and de-feminization of gene expression. We also detected higher rates of gene sequence evolution of male-biased genes compared to unbiased genes, which could at least in part be explained by positive selection. Together, these results reveal the potential for postmating sexual selection to drive higher rates of gene sequence evolution and shape the gonadal transcriptome profile.

  12. Nucleotide sequence and transcriptional analysis of the type A2 neurotoxin gene cluster in Clostridium botulinum.

    PubMed

    Dineen, Sean S; Bradshaw, Marite; Karasek, Charles E; Johnson, Eric A

    2004-06-01

    The nucleotide sequences of the upstream regions of the botulinum neurotoxin type A1 (BoNT/A1) cluster of Clostridium botulinum strain NCTC 2916 and the BoNT/A2 cluster of strain Kyoto-F were determined. A novel gene, designated orfx3, was identified following the orfx2 gene in both clusters. ORF-X2 and ORF-X3 exhibit similarity to the BoNT cluster associated P-47 protein. The BoNT/A1 and BoNT/A2 clusters share a similar gene arrangement, but exhibit differences in the spacing between certain genes. Sequences with similarity to transposases were identified in these intergenic regions, suggesting that these differences arose from an ancestral insertion event. Transcriptional analysis of the BoNT/A2 cluster revealed that the genes of the cluster are primarily synthesized as three polycistronic transcripts. Two divergent polycistronic transcripts, one encoding the orfx1, orfx2, and orfx3 genes, the second encoding the p47, ntnh, and bont/a2 genes, are transcribed from conserved BoNT cluster promoters. The third polycistronic transcript, expressed at low levels, encodes the positive regulatory botR gene and the orfx genes. This is the first complete analysis of a botulinum toxin A2 cluster.

  13. Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads.

    PubMed

    Dong, Jiaqiang; Feng, Yaping; Kumar, Dibyendu; Zhang, Wei; Zhu, Tingting; Luo, Ming-Cheng; Messing, Joachim

    2016-07-19

    Haplotype variation not only involves SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods give reads that are too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41-48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines exhibits, in part, dramatic differences between these two heterotic groups.

  14. Hunting Down Frame Shifts: Ecological Analysis of Diverse Functional Gene Sequences

    PubMed Central

    Strejcek, Michal; Wang, Qiong; Ridl, Jakub; Uhlik, Ondrej

    2015-01-01

    Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frameshifts (FS). Genes encoding for alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 44% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of maximum expected error filtering and single linkage pre-clustering proved to be the most efficient read processing approach. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study or available at https://github.com/strejcem/FBdenovo. The tool was also implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/. PMID:26635739
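
    Maximum expected error filtering, named in this record as part of the most efficient read processing approach, is commonly defined from per-base Phred quality scores: a base with quality Q has error probability 10^(-Q/10), and the expected number of errors in a read is the sum of these probabilities. The sketch below shows that calculation under stated assumptions; the function names and the threshold of 1.0 are illustrative and are not taken from the study.

```python
def expected_errors(phred_scores: list[int]) -> float:
    """Sum of per-base error probabilities, where P(error) = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores: list[int], max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected error count does not exceed `max_ee`.

    The default threshold of 1.0 is an assumption made for illustration.
    """
    return expected_errors(phred_scores) <= max_ee


# Toy quality strings (already decoded to integer Phred scores).
high_quality_read = [30] * 100   # 100 bases at Q30
low_quality_read = [10] * 20     # 20 bases at Q10

for label, scores in [("high", high_quality_read), ("low", low_quality_read)]:
    print(label, round(expected_errors(scores), 3), passes_filter(scores))
```

    A read of 100 bases at Q30 carries about 0.1 expected errors and is kept, while 20 bases at Q10 carry about 2 expected errors and are discarded, even though the second read is much shorter.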

  15. Recombination sequences in plant mitochondrial genomes: diversity and homologies to known mitochondrial genes.

    PubMed Central

    Stern, D B; Palmer, J D

    1984-01-01

    Several plant mitochondrial genomes contain repeated sequences that are postulated to be sites of homologous intragenomic recombination (1-3). In this report, we have used filter hybridizations to investigate sequence relationships between the cloned mitochondrial DNA (mtDNA) recombination repeats from turnip, spinach and maize and total mtDNA isolated from thirteen species of angiosperms. We find that strong sequence homologies exist between the spinach and turnip recombination repeats and essentially all other mitochondrial genomes tested, whereas a major maize recombination repeat does not hybridize to any other mtDNA. The sequences homologous to the turnip repeat do not appear to function in recombination in any other genome, whereas the spinach repeat hybridizes to reiterated sequences within the mitochondrial genomes of wheat and two species of pokeweed that do appear to be sites of recombination. Thus, although intragenomic recombination is a widespread phenomenon in plant mitochondria, it appears that different sequences either serve as substrates for this function in different species, or else surround a relatively short common recombination site which does not cross-hybridize under our experimental conditions. Identified gene sequences from maize mtDNA were used in heterologous hybridizations to show that the repeated sequences implicated in recombination in turnip and spinach/pokeweed/wheat mitochondria include, or are closely linked to genes for subunit II of cytochrome c oxidase and 26S rRNA, respectively. Together with previous studies indicating that the 18S rRNA gene in wheat mtDNA is contained within a recombination repeat (3), these results imply an unexpectedly frequent association between recombination repeats and plant mitochondrial genes. Images PMID:6473104

  16. Characterization of microsatellites and gene contents from genome shotgun sequences of mungbean (Vigna radiata (L.) Wilczek)

    PubMed Central

    2009-01-01

    Background: Mungbean is an economically important crop in Asia. However, genomic research has lagged behind other crop species due to the lack of polymorphic DNA markers found in this crop. The objective of this work is to develop and characterize microsatellite or simple sequence repeat (SSR) markers from genome shotgun sequencing of mungbean. Results: We have generated and characterized a total of 470,024 genome shotgun sequences covering 100.5 Mb of the mungbean (Vigna radiata (L.) Wilczek) genome using 454 sequencing technology. We identified 1,493 SSR motifs that could be used as potential molecular markers. Among 192 tested primer pairs in 17 mungbean accessions, 60 loci revealed polymorphism with polymorphic information content (PIC) values ranging from 0.0555 to 0.6907 with an average of 0.2594. The majority of microsatellite markers were transferable in Vigna species, whereas transferability rates were only 22.90% and 24.43% in Phaseolus vulgaris and Glycine max, respectively. We also used 16 SSR loci to evaluate the phylogenetic relationships of 35 genotypes of the Asian Vigna group. The genome survey sequences were further analyzed to search for gene content. The evidence suggested that 1,542 gene fragments have been sequence tagged, that fell within intersected existing gene models and shared sequence homology with other proteins in the database. Furthermore, potential microRNAs that could regulate developmental stages and environmental responses were discovered from this dataset. Conclusion: In this report, we provided evidence of generating remarkable levels of diverse microsatellite markers and gene content from high throughput genome shotgun sequencing of the mungbean genomic DNA. The markers could be used in germplasm analysis, assessing genetic diversity and linkage mapping of mungbean. PMID:19930676
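
    As a rough illustration of how a polymorphic information content (PIC) value is obtained for one locus, the sketch below assumes the widely used Botstein et al. (1980) definition; the abstract does not say which formula was applied. The function name and the allele frequencies are hypothetical and are not data from the study.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphic information content for one locus, assuming the
    Botstein et al. (1980) definition:
    PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Hypothetical two-allele locus with equal frequencies.
    print(pic([0.5, 0.5]))
```

    Under this definition a two-allele locus cannot exceed 0.375, so values such as the reported maximum of 0.6907 imply loci with more than two alleles.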

  17. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  18. [16S rRNA gene sequence analysis for bacterial identification in the clinical laboratory].

    PubMed

    Matsumoto, Takehisa; Sugano, Mitsutoshi

    2013-12-01

    The traditional identification of bacteria on the basis of phenotypic characteristics is generally not as accurate as identification based on genotypic methods. For many years, sequencing of the 16S ribosomal RNA (rRNA) gene has served as an important tool for determining phylogenetic relationships between bacteria. The features of this molecular target that make it a useful phylogenetic tool also make it useful for bacterial detection and identification in the clinical laboratory. 16S rRNA gene sequence analysis can better identify poorly described, rarely isolated, or phenotypically aberrant strains, and can lead to the recognition of novel pathogens and noncultured bacteria. In clinical microbiology, molecular identification based on 16S rDNA sequencing is applied fundamentally to bacteria whose identification by means of other types of techniques is impossible or difficult. However, there are some cases in which 16S rRNA gene sequence analysis cannot differentiate closely related bacteria such as Shigella spp. and Escherichia coli at the species level. Thus, it is important to understand the advantages and disadvantages of 16S rRNA gene sequence analysis.

  19. Phylogeny of ruminants secretory ribonuclease gene sequences of pronghorn (Antilocapra americana).

    PubMed

    Beintema, Jaap J; Breukelman, Heleen J; Dubois, Jean-Yves F; Warmels, Hayo W

    2003-01-01

    Phylogenetic analyses based on primary structures of mammalian ribonucleases indicated that three homologous enzymes (pancreatic, seminal and brain ribonucleases) present in the bovine species are the results of gene duplication events, which occurred in the ancestor of the ruminants after divergence from other artiodactyls. In this paper sequences are presented of genes encoding pancreatic and brain-type ribonucleases of pronghorn (Antilocapra americana). The seminal-type ribonuclease gene could not be detected in this species, neither by PCR amplification nor by Southern blot analyses, indicating that it may be deleted completely in this species. Previously we demonstrated, in a study of amino acid sequences of pancreatic ribonucleases of a large number of ruminants, the monophyly of bovids and cervids, and that pronghorn groups with giraffe. Here we present phylogenetic analyses of nucleotide sequences of ribonucleases and other molecules from ruminant species and compare these with published data. Chevrotain (Tragulus) always groups with the other ruminants as a separate taxon from the pecora or true ruminants. Within the pecora the relationships between Bovidae, Cervidae, Giraffidae, and pronghorn (Antilocapra) cannot be decided with certainty, although in the majority of analyses Antilocapra diverges first, separately or joined with giraffe. Broad taxon sampling and investigation of specific sequence features may be as important for reliable conclusions in phylogeny as the lengths of analyzed sequences.

  20. Analysis of a cloned colicin Ib gene: complete nucleotide sequence and implications for regulation of expression.

    PubMed Central

    Varley, J M; Boulnois, G J

    1984-01-01

    The complete nucleotide sequence of a 2,971 base pair EcoRI fragment carrying the structural gene for colicin Ib has been determined. The length of the gene is 1,881 nucleotides which is predicted to produce a protein of 626 amino acids and of molecular weight 71,364. The structural gene is flanked by likely promoter and terminator signals and in between the promoter and the ribosome binding site is an inverted repeat sequence which resembles other sequences known to bind the LexA protein. Further analysis of the 5' flanking sequences revealed a second region which may act either as a second LexA binding site and/or in the binding of cyclic AMP receptor protein. Comparison of the predicted amino acid sequence of colicin Ib with that of colicins A and E1 reveals localised homology. The implications of these similarities in the proteins and of regulation of the colicin Ib structural gene are discussed. PMID:6091036
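
    The figures in this abstract are mutually consistent if the 1,881-nucleotide gene length includes the stop codon, which is an assumption inferred from the arithmetic rather than something the abstract states. A minimal sketch of that check, with a hypothetical helper name:

```python
def protein_length(orf_nt, includes_stop=True):
    """Number of amino acids encoded by an open reading frame of
    orf_nt nucleotides. The stop codon encodes no residue, so it is
    subtracted when the length is assumed to include it."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    residues = protein_length(1881)   # 627 codons, one of them a stop
    mean_residue_mass = 71364 / residues
    print(residues, mean_residue_mass)
```

    With these inputs the sketch gives 626 residues and a mean residue mass of 114 Da, in line with the reported protein size.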

  1. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    SciTech Connect

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and δ-globin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 +/- 2.6 million years and 17.3 +/- 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
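
    The molecular-clock reasoning above can be sketched with the standard relation T = K / (2r), where K is the silent-site divergence between two lineages and r is the substitution rate per site per year. The factor of 2 assumes both lineages accumulate substitutions at the same rate. The abstract does not report K, so the divergence values below are back-calculated from the stated dates purely for illustration; the function names are hypothetical.

```python
SILENT_RATE = 1.56e-9  # substitutions per site per year, from the abstract


def divergence_time_years(k_silent, rate=SILENT_RATE):
    """Time since divergence, T = K / (2r), assuming an equal rate
    along both lineages."""
    return k_silent / (2.0 * rate)


def expected_divergence(t_years, rate=SILENT_RATE):
    """Silent-site divergence expected after t_years, K = 2rT."""
    return 2.0 * rate * t_years


if __name__ == "__main__":
    # Illustrative back-calculation, not reported data.
    for label, t in (("chimpanzee", 6.4e6), ("orangutan", 17.3e6)):
        k = expected_divergence(t)
        print(label, round(k, 6), divergence_time_years(k))
```

    Under these assumptions, dates of 6.4 and 17.3 million years correspond to silent-site divergences of roughly 2% and 5.4%, respectively.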

  2. Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis.

    PubMed

    Lees, Jonathan G; Lee, David; Studer, Romain A; Dawson, Natalie L; Sillitoe, Ian; Das, Sayoni; Yeats, Corin; Dessailly, Benoit H; Rentzsch, Robert; Orengo, Christine A

    2014-01-01

    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.

  3. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  4. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    PubMed

    Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    The API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species, because some biochemical reactions are unstable and some species share the same biochemical profile, e.g. Yersinia frederiksenii and Yersinia intermedia, which need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, for each strain we randomly selected five 16S rRNA gene clones from 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain, where the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, while not affecting the overall efficiency of this species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the discrimination ability of Yersinia species is improved using our method.

  5. Sequence analysis and mapping of the Sry gene in species of the subfamily Arvicolinae (rodentia).

    PubMed

    Acosta, M J; Marchal, J A; Romero-Fernández, I; Megías-Nogales, B; Modi, W S; Sánchez Baca, Antonio

    2010-01-01

    The rodent subfamily Arvicolinae, which contains about 125 species, presents some interesting exceptions concerning Sry, the sex determining gene in mammals. In some species multiple Sry copies have been described on the Y chromosome and in the Iberian vole, Microtus cabrerae, several Sry sequences have been cloned and mapped not only on the Y but also on the X chromosome. Here we present a comparative analysis of Sry sequences from a total of 22 species. Our study demonstrates for the first time that for most North American species, as previously reported for the European species, multiple copies of the Sry gene exist on the Y chromosome. Furthermore, we have sequenced and analyzed the full sequence of Sry from several European species, showing that the sequence and structure of the gene in this group of species present the main features described for Sry in other mammals. Finally, FISH analyses on some of these species demonstrated that all Sry sequences, despite their functional status, mapped on the euchromatic short arm of the Y chromosome.

  6. Isolation and characterization of the 5'-flanking sequence of the human ocular lens MIP gene.

    PubMed

    Wang, X Y; Ohtaka-Maruyama, C; Pisano, M M; Jaworski, C J; Chepelinsky, A B

    1995-12-29

    The MIP (major intrinsic protein) gene, a member of an ancient family of membrane channel genes, encodes the predominant fiber cell membrane protein of the ocular lens. Its specific expression in the lens fibers is temporally and spatially regulated during development. To study the regulation of expression of MIP and delineate the regulatory elements underlying its tissue specificity and ontogenic profile, we have cloned 2840 bp of the human MIP 5'-flanking sequence. The human MIP 5'-flanking sequence contains three complete Alu repetitive elements in tandem at position between nt -1699 and -2684 (nt -1699/-2684). These Alu elements appear to have had a complex evolutionary history with insertions at different times. We have fused DNA fragments containing MIP 5'-flanking sequences to the bacterial cat reporter gene encoding chloramphenicol acetyltransferase and assayed them in primary cultures of chicken lens cells. We have mapped two negative regulatory regions in the human MIP 5'-flanking sequences -1564/-1696 and -948/-1000. We demonstrated that the human MIP 5'-flanking sequence -253/+42 contains a functional promoter in lens cells but is inactive in kidney epithelial cells or mouse fibroblasts, suggesting that this sequence contains regulatory elements responsible for the lens-specific expression of MIP.

  7. Development of a Comprehensive Sequencing Assay for Inherited Cardiac Condition Genes.

    PubMed

    Pua, Chee Jian; Bhalshankar, Jaydutt; Miao, Kui; Walsh, Roddy; John, Shibu; Lim, Shi Qi; Chow, Kingsley; Buchan, Rachel; Soh, Bee Yong; Lio, Pei Min; Lim, Jaclyn; Schafer, Sebastian; Lim, Jing Quan; Tan, Patrick; Whiffin, Nicola; Barton, Paul J; Ware, James S; Cook, Stuart A

    2016-02-01

    Inherited cardiac conditions (ICCs) are characterised by marked genetic and allelic heterogeneity and require extensive sequencing for genetic characterisation. We iteratively optimised a targeted gene capture panel for ICCs that includes disease-causing, putatively pathogenic, research and phenocopy genes (n = 174 genes). We achieved high coverage of the target region on both MiSeq (>99.8% at ≥ 20× read depth, n = 12) and NextSeq (>99.9% at ≥ 20×, n = 48) platforms with 100% sensitivity and precision for single nucleotide variants and indels across the protein-coding target on the MiSeq. In the final assay, 40 out of 43 established ICC genes informative in clinical practice achieved complete coverage (100 % at ≥ 20×). By comparison, whole exome sequencing (WES; ∼ 80×), deep WES (∼ 500×) and whole genome sequencing (WGS; ∼ 70×) had poorer performance (88.1, 99.2 and 99.3% respectively at ≥ 20×) across the ICC target. The assay described here delivers highly accurate and affordable sequencing of ICC genes, complemented by accessible cloud-based computation and informatics. See Editorial in this issue (DOI: 10.1007/s12265-015-9667-8 ).
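
    The performance figures quoted above (percentage of target bases at ≥ 20× depth, sensitivity, precision) follow simple definitions that can be sketched as below. This is an illustrative sketch with hypothetical function names and toy inputs, not the authors' analysis code or data.

```python
def breadth_at_depth(depths, min_depth=20):
    """Fraction of target positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("no target positions given")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def sensitivity(true_pos, false_neg):
    """Fraction of real variants that were called: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of called variants that are real: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    toy_depths = [25, 30, 19, 20]  # hypothetical per-base read depths
    print(breadth_at_depth(toy_depths))
    print(sensitivity(10, 0), precision(10, 0))
```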

  8. Homologous recombination drives both sequence diversity and gene content variation in Neisseria meningitidis.

    PubMed

    Kong, Ying; Ma, Jennifer H; Warren, Keisha; Tsang, Raymond S W; Low, Donald E; Jamieson, Frances B; Alexander, David C; Hao, Weilong

    2013-01-01

    The study of genetic and phenotypic variation is fundamental for understanding the dynamics of bacterial genome evolution and untangling the evolution and epidemiology of bacterial pathogens. Neisseria meningitidis (Nm) is among the most intriguing bacterial pathogens in genomic studies due to its dynamic population structure and complex forms of pathogenicity. Extensive genomic variation within identical clonal complexes (CCs) in Nm has been recently reported and suggested to be the result of homologous recombination, but the extent to which recombination contributes to genomic variation within identical CCs has remained unclear. In this study, we sequenced two Nm strains of identical serogroup (C) and multi-locus sequence type (ST60), and conducted a systematic analysis with an additional 34 Nm genomes. Our results revealed that all gene content variation between the two ST60 genomes was introduced by homologous recombination at the conserved flanking genes, and 94.25% or more of sequence divergence was caused by homologous recombination. Recombination was found in genes associated with virulence factors, antigenic outer membrane proteins, and vaccine targets, suggesting an important role of homologous recombination in rapidly altering the pathogenicity and antigenicity of Nm. Recombination was also evident in genes of the restriction and modification systems, which may undermine barriers to DNA exchange. In conclusion, homologous recombination can drive both gene content variation and sequence divergence in Nm. These findings shed new light on the understanding of the rapid pathoadaptive evolution of Nm and other recombinogenic bacterial pathogens.

  9. The 5'-flanking regions of three pea legumin genes: comparison of the DNA sequences.

    PubMed Central

    Lycett, G W; Croy, R R; Shirsat, A H; Richards, D M; Boulter, D

    1985-01-01

    Approximately 1200 nucleotides of sequence data from the promoter and 5'-flanking regions of each of three pea (Pisum sativum L.) legumin genes (legA, legB and legC) are presented. The promoter regions of all three genes were found to be identical, including the "TATA box", the "CAAT box", and sequences showing homology to the SV40 enhancers. The legA sequence begins to diverge from the others about 300 bp from the start codon, whereas the other two genes remain identical for another 550 bp. The regions of partial homology exhibit deletions or insertions and some short, comparatively well conserved sequences. The significance of these features is discussed in terms of evolutionary mechanisms and their possible functional roles. The legC gene contains a region that may potentially form either of two mutually exclusive stem-loop structures, one of which has a stem 42 bp long, which suggests that it could be fairly stable. We suggest that a mechanism of switching between such alternative structures may play some role in gene control or may represent the insertion of a transposable element. PMID:2997721

  10. Cloning, nucleotide sequence, and expression of the Pasteurella haemolytica A1 glycoprotease gene.

    PubMed Central

    Abdullah, K M; Lo, R Y; Mellors, A

    1991-01-01

    Pasteurella haemolytica serotype A1 secretes a glycoprotease which is specific for O-sialoglycoproteins such as glycophorin A. The gene encoding the glycoprotease enzyme has been cloned in the recombinant plasmid pPH1, and its nucleotide sequence has been determined. The gene (designated gcp) codes for a protein of 35.2 kDa, and an active enzyme protein of this molecular mass can be observed in Escherichia coli clones carrying pPH1. In vivo labeling of plasmid-encoded proteins in E. coli maxicells demonstrated the expression of a 35-kDa protein from pPH1. The amino-terminal sequence of the heterologously expressed protein corresponds to that predicted from the nucleotide sequence. The glycoprotease is a neutral metalloprotease, and the predicted amino acid sequence of the glycoprotease contains a putative zinc-binding site. The gene shows no significant homology with the genes for other proteases of procaryotic or eucaryotic origin. However, there is substantial homology between gcp and an E. coli gene, orfX, whose product is believed to function in the regulation of macromolecule biosynthesis. PMID:1885539

  11. Nucleotide sequence of the Agrobacterium tumefaciens octopine Ti plasmid-encoded tmr gene.

    PubMed Central

    Heidekamp, F; Dirkse, W G; Hille, J; van Ormondt, H

    1983-01-01

    The nucleotide sequence of the tmr gene, encoded by the octopine Ti plasmid from Agrobacterium tumefaciens (pTiAch5), was determined. The T-DNA, which encompasses this gene, is involved in tumor formation and maintenance, and probably mediates the cytokinin-independent growth of transformed plant cells. The nucleotide sequence of the tmr gene displays a continuous open reading frame specifying a polypeptide chain of 240 amino acids. The 5'-terminus of the polyadenylated tmr mRNA isolated from octopine tobacco tumor cell lines was determined by nuclease S1 mapping. The nucleotide sequence 5'-TATAAAA-3', which is identical to the canonical "TATA" box, was found 29 nucleotides upstream from the major initiation site for RNA synthesis. Two potential polyadenylation signals 5'-AATAAA-3' were found at 207 and 275 nucleotides downstream from the TAG stop codon of the tmr gene. A comparison was made of nucleotide stretches involved in transcription control of T-DNA genes. PMID:6312414

  12. OrthoSelect: a web server for selecting orthologous gene alignments from EST sequences.

    PubMed

    Schreiber, Fabian; Wörheide, Gert; Morgenstern, Burkhard

    2009-07-01

    In the absence of whole genome sequences for many organisms, the use of expressed sequence tags (EST) offers an affordable approach for researchers conducting phylogenetic analyses to gain insight about the evolutionary history of organisms. Reliable alignments for phylogenomic analyses are based on orthologous gene sequences from different taxa. So far, researchers have not sufficiently tackled the problem of the completely automated construction of such datasets. Existing software tools are either semi-automated, covering only part of the necessary data processing, or implemented as a pipeline, requiring the installation and configuration of a cascade of external tools, which may be time-consuming and hard to manage. To simplify data set construction for phylogenomic studies, we set up a web server that uses our recently developed OrthoSelect approach. To the best of our knowledge, our web server is the first web-based EST analysis pipeline that allows the detection of orthologous gene sequences in EST libraries and outputs orthologous gene alignments. Additionally, OrthoSelect provides the user with an extensive results section that lists and visualizes all important results, such as annotations, data matrices for each gene/taxon and orthologous gene alignments. The web server is available at http://orthoselect.gobics.de.

  13. Different organisms associated with heartwater as shown by analysis of 16S ribosomal RNA gene sequences.

    PubMed

    Allsopp, M; Visser, E S; du Plessis, J L; Vogel, S W; Allsopp, B A

    1997-08-01

    Cowdria ruminantium is a rickettsial parasite which causes heartwater, an economically important disease of domestic and wild ruminants in tropical and subtropical Africa and parts of the Caribbean. Because existing diagnostic methods are unreliable, we investigated the small-subunit ribosomal RNA (srRNA) gene from heartwater-infected material to characterise the organisms present and to develop specific oligonucleotide probes for polymerase chain reaction (PCR) based diagnosis. DNA was obtained from ticks and ruminants from heartwater-free and heartwater-endemic areas and from Cowdria in tissue culture. PCR was carried out using primers designed to amplify only rickettsial srRNA genes, the target region being the highly variable V1 loop. Amplicons were cloned and sequenced; 51% were C. ruminantium sequences corresponding to four genotypes, two of which were identical to previously reported C. ruminantium sequences while the other two were new. The four different Cowdria genotypes can be correlated with different phenotypes. Tissue-culture samples yielded only Cowdria genotype sequences, but an extraordinary heterogeneity of 16S sequences was obtained from field samples. In addition to Cowdria genotypes we found sequences from previously unknown Ehrlichia spp., sequences showing homology to other Rickettsiales and a variety of Pseudomonadaceae. One Ehrlichia sequence was phylogenetically closely related to Ehrlichia platys (Group II Ehrlichia) and one to Ehrlichia canis (Group III Ehrlichia). This latter sequence was from an isolate (Germishuys) made from a naturally infected sheep which, from brain smear examination and pathology, appeared to be suffering from heartwater; nevertheless no Cowdria genotype sequences were found in this isolate. In addition no Cowdria sequences were obtained from uninfected ticks. Complete 16S rRNA gene sequences were determined for two C. ruminantium genotypes and for two previously uncharacterised heartwater-associated Ehrlichia spp

  14. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention.

  15. A Bayesian Approach to Joint Modeling of Protein-DNA Binding, Gene Expression and Sequence Data

    PubMed Central

    Xie, Yang; Pan, Wei; Jeong, Kyeong S.; Xiao, Guanghua; Khodursky, Arkady B.

    2012-01-01

    The genome-wide DNA-protein binding data, DNA sequence data and gene expression data represent complementary means to deciphering global and local transcriptional regulatory circuits. Combining these different types of data can not only improve the statistical power, but also provide a more comprehensive picture of gene regulation. In this paper, we propose a novel statistical model to augment protein-DNA binding data with gene expression and DNA sequence data when available. We specify a hierarchical Bayes model and use Markov chain Monte Carlo simulations to draw inferences. Both simulation studies and an analysis of an experimental dataset show that the proposed joint modeling method can significantly improve the specificity and sensitivity of identifying target genes as compared to conventional approaches relying on a single data source. PMID:20049751

  16. Statistical analysis of nucleotide sequences of the hemagglutinin gene of human influenza A viruses.

    PubMed Central

    Ina, Y; Gojobori, T

    1994-01-01

    To examine whether positive selection operates on the hemagglutinin 1 (HA1) gene of human influenza A viruses (H1 subtype), 21 nucleotide sequences of the HA1 gene were statistically analyzed. The nucleotide sequences were divided into antigenic and nonantigenic sites. The nucleotide diversities for antigenic and nonantigenic sites of the HA1 gene were computed at synonymous and nonsynonymous sites separately. For nonantigenic sites, the nucleotide diversities were larger at synonymous sites than at nonsynonymous sites. This is consistent with the neutral theory of molecular evolution. For antigenic sites, however, the nucleotide diversities at nonsynonymous sites were larger than those at synonymous sites. These results suggest that positive selection operates on antigenic sites of the HA1 gene of human influenza A viruses (H1 subtype). PMID:8078892
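    The nucleotide diversity used in the abstract above can be illustrated with a minimal sketch. It computes the average number of pairwise differences per site for a set of aligned sequences. The function name and the toy sequences are hypothetical and are not the HA1 data; the sketch also does not separate synonymous from nonsynonymous sites, which would require a codon-aware method.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment (illustration only, not data from the study).
toy_alignment = ["ATGCATGC", "ATGCATGA", "ATGAATGC"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 4))
```

    Running the same function once on synonymous sites and once on nonsynonymous sites, within the antigenic and nonantigenic partitions, would give the four quantities the abstract compares.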

  17. Sequence and evolution of the blue cone pigment gene in old and new world primates

    SciTech Connect

    Hunt, D.M.; Cowing, J.A.; Patel, R.

    1995-06-10

    The sequences of the blue cone photopigments in the talapoin monkey (Miopithecus talapoin), an Old World primate, and in the marmoset (Callithrix jacchus), a New World monkey, are presented. Both genes are composed of 5 exons separated by 4 introns. In this respect, they are identical to the human blue gene, and intron sizes are also similar. Based on the level of amino acid identity, both monkey pigments are members of the S branch of pigments. Alignment of these sequences with the human gene requires the insertion/deletion of two separate codons in exon 1. The silent site divergence between these primate blue genes indicates a separation of the Old and New World primate lineages around 43 million years ago. 41 refs., 1 fig., 3 tabs.

  18. Cloning and sequence analysis of the cellobiohydrolase I genes from some basidiomycetes.

    PubMed

    Chukeatirote, Ekachai; Maharachchikumbura, Sajeewa S N; Wongkham, Shannaphimon; Sysouphanthong, Phongeun; Phookamsak, Rungtiwa; Hyde, Kevin D

    2012-06-01

    Genes encoding the cellobiohydrolase enzyme (CBHI), designated as cbhI, were isolated from the basidiomycetes Auricularia fuscosuccinea, Pleurotus giganteus, P. eryngii, P. ostreatus, and P. sajor-caju. Initially, the fungal genomic DNA was extracted using a modified cetyltrimethyl ammonium bromide (CTAB) protocol and used as a DNA template. The cbhI genes were then amplified and cloned using the pGEM-T Easy Vector Systems. The sizes of these PCR amplicons were between 700 and 800 bp. The DNA sequences obtained were similar, showing high identity to the cbhI gene family. These cbhI genes were partial, consisting of three coding regions and two introns. The deduced amino acid sequences exhibited significant similarity to those of fungal CBHI enzymes belonging to glycosyl hydrolase family 7.

  19. Cloning and Sequence Analysis of the Cellobiohydrolase I Genes from Some Basidiomycetes

    PubMed Central

    Maharachchikumbura, Sajeewa S. N.; Wongkham, Shannaphimon; Sysouphanthong, Phongeun; Phookamsak, Rungtiwa; Hyde, Kevin D.

    2012-01-01

    Genes encoding the cellobiohydrolase enzyme (CBHI), designated as cbhI, were isolated from the basidiomycetes Auricularia fuscosuccinea, Pleurotus giganteus, P. eryngii, P. ostreatus, and P. sajor-caju. Initially, the fungal genomic DNA was extracted using a modified cetyltrimethyl ammonium bromide (CTAB) protocol and used as a DNA template. The cbhI genes were then amplified and cloned using the pGEM-T Easy Vector Systems. The sizes of these PCR amplicons were between 700 and 800 bp. The DNA sequences obtained were similar, showing high identity to the cbhI gene family. These cbhI genes were partial, consisting of three coding regions and two introns. The deduced amino acid sequences exhibited significant similarity to those of fungal CBHI enzymes belonging to glycosyl hydrolase family 7. PMID:22870052

  20. Molecular cloning, sequence characterization and expression pattern of Rab18 gene from watermelon (Citrullus lanatus).

    PubMed

    Xinli, Xiao; Lei, Peng

    2015-03-04

    The complete mRNA sequence of watermelon Rab18 gene was amplified through the rapid amplification of cDNA ends (RACE) method. The full-length mRNA was 1010 bp containing a 645 bp open reading frame, which encodes a protein of 214 amino acids. Sequence analysis revealed that watermelon Rab18 protein shares high homology with the Rab18 of cucumber (99%), muskmelon (98%), Morus notabilis (90%), tomato (89%), wine grape (89%) and potato (88%). Phylogenetic analysis revealed that watermelon Rab18 gene has a closer genetic relationship with Rab18 gene of cucumber and muskmelon. Tissue expression profile analysis indicated that watermelon Rab18 gene was highly expressed in root, stem and leaf, moderately expressed in flower and weakly expressed in fruit.

  1. DNA sequence and expression of the 36-kilodalton outer membrane protein gene of Brucella abortus.

    PubMed Central

    Ficht, T A; Bearden, S W; Sowa, B A; Adams, L G

    1989-01-01

    The cloning of the gene(s) encoding a 36-kilodalton (kDa) cell envelope protein of Brucella abortus has been previously described (T. A. Ficht, S. W. Bearden, B. A. Sowa, and L. G. Adams, Infect. Immun. 56:2036-2046, 1988). In an attempt to define the nature of the previously described duplication at this locus we have sequenced 3,500 base pairs of genomic DNA encompassing this region. The duplication represented two similar open reading frames which shared more than 85% homology at the nucleotide level but differed primarily because of the absence of 108 nucleotides from one of the two gene copies. These two genes were read from opposite strands and potentially encoded proteins which are 96% homologous. The predicted gene products were identical over the first 100 amino acids, including 22-amino-acid-long signal sequences. The amino acid composition of the predicted proteins was similar to that obtained for the Brucella porin isolated by Verstreate et al. (D. R. Verstreate, M. T. Creasy, N. T. Caveney, C. L. Baldwin, M. W. Blab, and A. J. Winter, Infect. Immun. 35:979-989, 1982) and presumably represented two copies of the porin gene, tentatively identified as omp 2a (silent) and omp 2b (expressed). The homology between the two genes extended to and included Shine-Dalgarno sequences 7 base pairs upstream from the ATG start codons. Homology at the 3' ends extended only as far as the termination codon, but both genes had putative rho-independent transcription termination sites. Localization of the promoters proved more difficult, since the canonical procaryotic sequences could not be identified in the region upstream of either gene. Promoter activity was demonstrated by ligation to a promoterless lacZ gene in pMC1871. However, only one active promoter could be identified by using this system. A 36-kDa protein was synthesized in E. coli with the promoter in the native orientation and was identical in size to the protein produced in laboratory-grown B. abortus.
When

  2. Comparative human-horse sequence analysis of the CYP3A subfamily gene cluster.

    PubMed

    Schmitz, A; Demmel, S; Peters, L M; Leeb, T; Mevissen, M; Haase, B

    2010-12-01

    Cytochrome P450 enzymes (CYP450s) represent a superfamily of haem-thiolate proteins. CYP450s are most abundant in the liver, a major site of drug metabolism, and play key roles in the metabolism of a variety of substrates, including drugs and environmental contaminants. Interaction of two or more different drugs with the same enzyme can account for adverse effects and failure of therapy. Human CYP3A4 metabolizes about 50% of all known drugs, but little is known about the orthologous CYP450s in horses. We report here the genomic organization of the equine CYP3A gene cluster as well as a comparative analysis with the human CYP3A gene cluster. The equine CYP450 genes of the 3A family are located on ECA 13 between 6.97-7.53 Mb, in a region syntenic to HSA 7 99.05-99.35 Mb. Seven potential, closely linked equine CYP3A genes were found, in contrast to only four genes in the human genome. RNA was isolated from an equine liver sample, and the approximately 1.5-kb coding sequence of six CYP3A genes could be amplified by RT-PCR. Sequencing of the RT-PCR products revealed numerous hitherto unknown single nucleotide polymorphisms (SNPs) in these six CYP3A genes, and one 6-bp deletion compared to the reference sequence (EquCab2.0). The presence of the variants was confirmed in a sample of genomic DNA from the same horse. In conclusion, orthologous genes for the CYP3A family exist in horses, but their number differs from those of the human CYP3A gene family. CYP450 genes of the same family show high homology within and between mammalian species, but can be highly polymorphic.

  3. GOblet: a platform for Gene Ontology annotation of anonymous sequence data

    PubMed Central

    Groth, Detlef; Lehrach, Hans; Hennig, Steffen

    2004-01-01

    GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms. It uses a variety of different protein databases (human, murines, invertebrates, plants, sp-trembl) and their respective GO mappings. The user selects the appropriate database and alignment threshold and thereafter submits single or multiple nucleotide or protein sequences. Results are shown in different ways, e.g. as survey statistics for the main GO categories for all sequences or as detailed results for each single sequence that has been submitted. In its newest version, GOblet allows the batch submission of sequences and provides an improved display of results with the aid of Java applets. All output data, together with the Java applet, are packed to a downloadable archive for local installation and analysis. GOblet can be accessed freely at http://goblet.molgen.mpg.de. PMID:15215401

  4. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    SciTech Connect

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.; Jones, W.A.; Kirby, R.; Woods, D.R.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (α-³²P)dCTP (3000 Ci/mmol) or (α-³⁵S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.

  5. Genetic analysis of the PKHD1 gene with long-range PCR sequencing.

    PubMed

    Tong, Yong-Qing; Liu, Bei; Fu, Chao-Hong; Zheng, Hong-Yun; Gu, Jian; Liu, Hang; Luo, Hong-Bo; Li, Yan

    2016-10-01

    PKHD1 gene mutations are responsible for autosomal recessive polycystic kidney disease (ARPKD). However, it is inconvenient to detect the mutations by common polymerase chain reaction (PCR) because the open reading frame of PKHD1 is very long. Recently, long-range (LR) PCR followed by direct sequencing has been demonstrated to be a more sensitive mutation screening method for PKHD1. In this study, the entire PKHD1 coding region was amplified in 29 LR PCR reactions, which generated products of 1 to 7 kb and avoided the specific PCR amplification of individual exons. This method was compared to the screening method with standard direct sequencing of each individual exon of the gene by a reference laboratory in 15 patients with ARPKD. The results showed that a total of 37 genetic changes were detected with LR PCR sequencing, which included 33 variations identified by the reference laboratory with standard direct sequencing. LR PCR sequencing had 100% sensitivity, 96% specificity, and 97.0% accuracy, which were higher than those with standard direct sequencing method. In conclusion, LR PCR sequencing is a reliable method with high sensitivity, specificity and accuracy for detecting genetic variations. It also has more intronic coverage and lower cost, and is an applicable clinical method for complex genetic analyses.
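    The sensitivity, specificity and accuracy figures quoted above are standard confusion-matrix ratios. The sketch below shows how such figures are usually derived. The abstract does not report the underlying counts, so the counts used here are hypothetical and are chosen only to keep the arithmetic easy to follow; the function name is likewise an assumption.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true variants that were detected
    specificity = tn / (tn + fp)  # fraction of non-variants correctly called negative
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction of all calls that were correct
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not taken from the study).
sens, spec, acc = diagnostic_metrics(tp=9, fp=1, tn=9, fn=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

    With these made-up counts all three ratios come out at 0.90; the study's own values would follow from its own true and false call counts.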

  6. Sequence characterisation of deletion breakpoints in the dystrophin gene by PCR

    SciTech Connect

    Abbs, S.; Sandhu, S.; Bobrow, M.

    1994-09-01

    Partial deletions of the dystrophin gene account for 65% of cases of Duchenne muscular dystrophy. A high proportion of these structural changes are generated by new mutational events, and lie predominantly within two 'hotspot' regions, yet the underlying reasons for this are not known. We are characterizing and sequencing the regions surrounding deletion breakpoints in order to: (i) investigate the mechanisms of deletion mutation, and (ii) enable the design of PCR assays to specifically amplify mutant and normal sequences, allowing us to search for the presence of somatic mosaicism in appropriate family members. Using this approach we have been able to demonstrate the presence of somatic mosaicism in a maternal grandfather of a DMD-affected male, deleted for exons 49-50. Three deletions, namely of exons 48-49, 49-50, and 50, have been characterized using a PCR approach that avoids any cloning procedures. Breakpoints were initially localized to within regions of a few kilobases using Southern blot restriction analyses with exon-specific probes and PCR amplification of exonic and intronic loci. Sequencing was performed directly on PCR products: (i) mutant sequences were obtained from long-range or inverse-PCR across the deletion junction fragments, and (ii) normal sequences were obtained from the products of standard PCR, vectorette PCR, or inverse-PCR performed on YACs. Further characterization of intronic sequences will allow us to amplify and sequence across other deletion breakpoints and increase our knowledge of the mechanisms of mutation in the dystrophin gene.

  7. Iranian johnsongrass mosaic virus: the complete genome sequence, molecular and biological characterization, and comparison of coat protein gene sequences.

    PubMed

    Moradi, Zohreh; Mehrvar, Mohsen; Nazifi, Ehsan; Zakiaghl, Mohammad

    2017-02-01

    Iranian johnsongrass mosaic virus (IJMV) is one of the most prevalent viruses causing maize mosaic disease in Iran. An IJMV isolate, Maz-Bah, was obtained from the maize showing mosaic symptoms in Mazandaran, north of Iran. The complete genomic sequence of Maz-Bah is 9544 nucleotides, excluding the poly(A) tail. It contains one single open reading frame of 9165 nucleotides and encodes a large polyprotein of 3054 amino acids, flanked by a 5'-untranslated region (UTR) of 143 nucleotides and a 3'-UTR of 236 nucleotides. The entire genomic sequence of Maz-Bah isolate shares identities of 84.9 and 94.2 % with the IJMV (Shz) isolate, the lone complete genome sequence available in the GenBank at the nucleotide (nt) and deduced amino acid (aa) levels, respectively. The whole genome sequences share identities of 51.5-69.8 and 44.9-74.3 % with those of other Sugarcane mosaic virus (SCMV) subgroup potyviruses at nt and aa levels, respectively. In phylogenetic trees based on the multiple alignments of the entire nt and aa sequences, IJMV isolates formed a separate sublineage of the tree with potyviruses infecting monocotyledons of cereals, indicating that IJMV is a member of SCMV subgroup of potyviruses. IJMV is most closely related to Sorghum mosaic virus and Maize dwarf mosaic virus and less closely related to the Johnsongrass mosaic virus and Cocksfoot streak virus. To further investigate the genetic relationship of IJMV, 9 other isolates from different hosts were cloned and sequenced. The identity of IJMV CP nt and aa sequences of 11 Iranian isolates ranged from 86.4 to 99.8 % and 90.5 to 99.7 %, respectively, indicating a high nt variability in CP gene. Furthermore, in the CP-based phylogenetic tree, IJMV isolates were clustered together with a maize potyvirus described as Zea mosaic virus from Israel (with 86-89 % nt identity), indicating that both isolates probably are the strains of the same virus.
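    The genome figures in the abstract above are internally consistent, which can be checked with a short calculation. The sketch assumes that the reported open reading frame length includes the stop codon; the function and variable names are hypothetical.

```python
def polyprotein_length(orf_nt):
    """Amino acids encoded by an ORF whose length includes the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3 - 1  # the stop codon encodes no amino acid


# Values reported in the abstract for the Maz-Bah isolate.
utr5_nt, orf_nt, utr3_nt = 143, 9165, 236

genome_nt = utr5_nt + orf_nt + utr3_nt  # excludes the poly(A) tail
protein_aa = polyprotein_length(orf_nt)
print(genome_nt, protein_aa)
```

    The sum of the three regions gives the 9544-nucleotide genome, and 9165 nucleotides correspond to 3055 codons, one of which is the stop codon, giving the 3054-amino-acid polyprotein.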

  8. Characterization of Squamate Olfactory Receptor Genes and Their Transcripts by the High-Throughput Sequencing Approach

    PubMed Central

    Dehara, Yuki; Hashiguchi, Yasuyuki; Matsubara, Kazumi; Yanai, Tokuma; Kubo, Masahito; Kumazawa, Yoshinori

    2012-01-01

    The olfactory receptor (OR) genes represent the largest multigene family in the genome of terrestrial vertebrates. Here, the high-throughput next-generation sequencing (NGS) approach was applied to characterization of OR gene repertoires in the green anole lizard Anolis carolinensis and the Japanese four-lined ratsnake Elaphe quadrivirgata. Tagged polymerase chain reaction (PCR) products amplified from either genomic DNA or cDNA of the two species were used for parallel pyrosequencing, assembling, and screening for errors in PCR and pyrosequencing. Starting from the lizard genomic DNA, we accurately identified 56 of 136 OR genes that were identified from its draft genome sequence. These recovered genes were broadly distributed in the phylogenetic tree of vertebrate OR genes without severe biases toward particular OR families. Ninety-six OR genes were identified from the ratsnake genomic DNA, implying that the snake has more OR gene loci than the anole lizard in response to an increased need for the acuity of olfaction. This view is supported by the estimated number of OR genes in the Burmese python's draft genome (∼280), although squamates may generally have fewer OR genes than terrestrial mammals and amphibians. The OR gene repertoire of the python seems unique in that many class I OR genes are retained. The NGS approach also allowed us to identify candidates of highly expressed and silent OR gene copies in the lizard's olfactory epithelium. The approach will facilitate efficient and parallel characterization of considerable unbiased proportions of multigene family members and their transcripts from nonmodel organisms. PMID:22511035

  9. Harnessing gene conversion in chicken B cells to create a human antibody sequence repertoire.

    PubMed

    Schusser, Benjamin; Yi, Henry; Collarini, Ellen J; Izquierdo, Shelley Mettler; Harriman, William D; Etches, Robert J; Leighton, Philip A

    2013-01-01

    Transgenic chickens expressing human sequence antibodies would be a powerful tool to access human targets and epitopes that have been intractable in mammalian hosts because of tolerance to conserved proteins. To foster the development of the chicken platform, it is beneficial to validate transgene constructs using a rapid, cell culture-based method prior to generating fully transgenic birds. We describe a method for the expression of human immunoglobulin variable regions in the chicken DT40 B cell line and the further diversification of these genes by gene conversion. Chicken VL and VH loci were knocked out in DT40 cells and replaced with human VK and VH genes. To achieve gene conversion of human genes in chicken B cells, synthetic human pseudogene arrays were inserted upstream of the functional human VK and VH regions. Proper expression of chimeric IgM comprised of human variable regions and chicken constant regions is shown. Most importantly, sequencing of DT40 genetic variants confirmed that the human pseudogene arrays contributed to the generation of diversity through gene conversion at both the Igl and Igh loci. These data show that engineered pseudogene arrays produce a diverse pool of human antibody sequences in chicken B cells, and suggest that these constructs will express a functional repertoire of chimeric antibodies in transgenic chickens.

  10. Quantitative gene expression profiles in real time from expressed sequence tag databases.

    PubMed

    Funari, Vincent A; Voevodski, Konstantin; Leyfer, Dimitry; Yerkes, Laura; Cramer, Donald; Tolan, Dean R

    2010-01-01

    An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) has been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exists in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB's output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantization obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable.

  11. A Census of rRNA Genes and Linked Genomic Sequences within a Soil Metagenomic Library

    PubMed Central

    Liles, Mark R.; Manske, Brian F.; Bintrim, Scott B.; Handelsman, Jo; Goodman, Robert M.

    2003-01-01

    We have analyzed the diversity of microbial genomes represented in a library of metagenomic DNA from soil. A total of 24,400 bacterial artificial chromosome (BAC) clones were screened for 16S rRNA genes. The sequences obtained from BAC clones were compared with a collection generated by direct PCR amplification and cloning of 16S rRNA genes from the same soil. The results indicated that the BAC library had substantially lower representation of bacteria among the Bacillus, α-Proteobacteria, and CFB groups; greater representation among the β- and γ-Proteobacteria, and OP10 divisions; and no rRNA genes from the domains Eukaryota and Archaea. In addition to rRNA genes recovered from the bacterial divisions Proteobacteria, Verrucomicrobia, Firmicutes, Cytophagales, and OP11, we identified many rRNA genes from the BAC library affiliated with the bacterial division Acidobacterium; all of these sequences were affiliated with subdivisions that lack cultured representatives. The complete sequence of one BAC clone derived from a member of the Acidobacterium division revealed a complete rRNA operon and 20 other open reading frames, including predicted gene products involved in cell division, cell cycling, folic acid biosynthesis, substrate metabolism, amino acid uptake, DNA repair, and transcriptional regulation. This study is the first step in using genomics to reveal the physiology of as-yet-uncultured members of the Acidobacterium division. PMID:12732537

  12. Sequence evolution and expression regulation of stress-responsive genes in natural populations of wild tomato.

    PubMed

    Fischer, Iris; Steige, Kim A; Stephan, Wolfgang; Mboup, Mamadou

    2013-01-01

    The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.

  13. Refined mapping of X-linked reticulate pigmentary disorder and sequencing of candidate genes

    PubMed Central

    2009-01-01

    X-linked reticulate pigmentary disorder with systemic manifestations in males (PDR) is very rare. Affected males are characterized by cutaneous and visceral symptoms suggestive of abnormally regulated inflammation. A genetic linkage study of a large Canadian kindred previously mapped the PDR gene to a greater than 40 Mb interval of Xp22–p21. The aim of this study was to identify the causative gene for PDR. The Canadian pedigree was expanded and additional PDR families recruited. Genetic linkage was performed using newer microsatellite markers. Positional and functional candidate genes were screened by PCR and sequencing of coding exons in affected males. The location of the PDR gene was narrowed to a ~4.9 Mb interval of Xp22.11–p21.3 between markers DXS1052 and DXS1061. All annotated coding exons within this interval were sequenced in one affected male from each of the three multiplex families as well as one singleton, but no causative mutation was identified. Sequencing of other X-linked genes outside of the linked interval also failed to identify the cause of PDR but revealed a novel nonsynonymous cSNP in the GRPR gene in the Maltese population. PDR is most likely due to a mutation within the linked interval not affecting currently annotated coding exons. PMID:18404279

  14. Quantitative Gene Expression Profiles in Real Time From Expressed Sequence Tag Databases

    PubMed Central

    FUNARI, VINCENT A.; VOEVODSKI, KONSTANTIN; LEYFER, DIMITRY; YERKES, LAURA; CRAMER, DONALD; TOLAN, DEAN R.

    2010-01-01

    An accumulation of expressed sequence tag (EST) data in the public domain and the availability of bioinformatic programs have made EST gene expression profiling a common practice. However, the utility and validity of using EST databases (e.g., dbEST) has been criticized, particularly for quantitative assessment of gene expression. Problems with EST sequencing errors, library construction, EST annotation, and multiple paralogs make generation of specific and sensitive qualitative and quantitative expression profiles a concern. In addition, most EST-derived expression data exists in previously assembled databases. The Virtual Northern Blot (VNB) (http://tlab.bu.edu/vnb.html) allows generation, evaluation, and optimization of expression profiles in real time, which is especially important for alternatively spliced, novel, or poorly characterized genes. Representative gene families with variable nucleotide sequence identity, tissue specificity, and levels of expression (bcl-xl, aldoA, and cyp2d9) are used to assess the quality of VNB’s output. The profiles generated by VNB are more sensitive and specific than those constructed with ESTs listed in preindexed databases at UCSC and NCBI. Moreover, quantitative expression profiles produced by VNB are comparable to quantization obtained from Northern blots and qPCR. The VNB pipeline generates real-time gene expression profiles for single-gene queries that are both qualitatively and quantitatively reliable. PMID:20635574

  15. Whole-genome sequencing and identification of Morganella morganii KT pathogenicity-related genes

    PubMed Central

    2012-01-01

    Background: The opportunistic enterobacterium, Morganella morganii, which can cause bacteraemia, is the ninth most prevalent cause of clinical infections in patients at Changhua Christian Hospital, Taiwan. The KT strain of M. morganii was isolated during postoperative care of a cancer patient with a gallbladder stone who developed sepsis caused by bacteraemia. M. morganii is sometimes encountered in nosocomial settings and has been causally linked to catheter-associated bacteriuria, complex infections of the urinary and/or hepatobiliary tracts, wound infection, and septicaemia. M. morganii infection is associated with a high mortality rate, although most patients respond well to appropriate antibiotic therapy. To obtain insights into the genome biology of M. morganii and the mechanisms underlying its pathogenicity, we used Illumina technology to sequence the genome of the KT strain and compared its sequence with the genome sequences of related bacteria. Results: The 3,826,919-bp sequence contained in 58 contigs has a GC content of 51.15% and includes 3,565 protein-coding sequences, 72 tRNA genes, and 10 rRNA genes. The pathogenicity-related genes encode determinants of drug resistance, fimbrial adhesins, an IgA protease, haemolysins, ureases, and insecticidal and apoptotic toxins as well as proteins found in flagellae, the iron acquisition system, a type-3 secretion system (T3SS), and several two-component systems. Comparison with 14 genome sequences from other members of Enterobacteriaceae revealed different degrees of similarity to several systems found in M. morganii. The most striking similarities were found in the IS4 family of transposases, insecticidal toxins, T3SS components, and proteins required for ethanolamine use (eut operon) and cobalamin (vitamin B12) biosynthesis. The eut operon and the gene cluster for cobalamin biosynthesis are not present in the other Proteeae genomes analysed.
Moreover, organisation of the 19 genes of the eut operon differs from
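    The GC content reported for the assembled genome above is the fraction of G and C bases among the called bases. A minimal sketch of that calculation follows. The function name and the toy sequence are hypothetical, and skipping ambiguous bases such as N is an assumption about how they would be handled, not a description of the study's pipeline.

```python
def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in a sequence."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in called) / len(called)


# Hypothetical toy sequence (illustration only); the two N bases are skipped.
toy_sequence = "ATGCGCNNAT"
gc = gc_content(toy_sequence)
print(f"{gc:.2%}")
```

    For a multi-contig assembly, the same ratio would be taken over the concatenated contigs rather than averaged per contig, so that long contigs are weighted by their length.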

  16. Transcriptome Sequencing and Expression Analysis of Terpenoid Biosynthesis Genes in Litsea cubeba

    PubMed Central

    Han, Xiao-Jiao; Wang, Yang-Dong; Chen, Yi-Cun; Lin, Li-Yuan; Wu, Qing-Ke

    2013-01-01

    Background Aromatic essential oils extracted from fresh fruits of Litsea cubeba (Lour.) Pers., have diverse medical and economic values. The dominant components in these essential oils are monoterpenes and sesquiterpenes. Understanding the molecular mechanisms of terpenoid biosynthesis is essential for improving the yield and quality of terpenes. However, the 40 available L. cubeba nucleotide sequences in the public databases are insufficient for studying the molecular mechanisms. Thus, high-throughput transcriptome sequencing of L. cubeba is necessary to generate large quantities of transcript sequences for the purpose of gene discovery, especially terpenoid biosynthesis related genes. Results Using Illumina paired-end sequencing, approximately 23.5 million high-quality reads were generated. De novo assembly yielded 68,648 unigenes with an average length of 834 bp. A total of 38,439 (56%) unigenes were annotated for their functions, and 35,732 and 25,806 unigenes could be aligned to the GO and COG database, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 16,130 unigenes were assigned to 297 KEGG pathways, and 61 unigenes, which contained the mevalonate and 2-C-methyl-D-erythritol 4-phosphate pathways, could be related to terpenoid backbone biosynthesis. Of the 12,963 unigenes, 285 were annotated to the terpenoid pathways using the PlantCyc database. Additionally, 14 terpene synthase genes were identified from the transcriptome. The expression patterns of the 16 genes related to terpenoid biosynthesis were analyzed by RT-qPCR to explore their putative functions. Conclusion RNA sequencing was effective in identifying a large quantity of sequence information. To our knowledge, this study is the first exploration of the L. cubeba transcriptome, and the substantial amount of transcripts obtained will accelerate the understanding of the molecular mechanisms of essential oils biosynthesis. The results may help

  17. EXONSAMPLER: a computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing.

    PubMed

    Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon

    2014-11-01

    The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection.
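
    The user-adjustable limits described above (minimum and maximum exon length, maximum bp per gene, maximum total bp, partial sampling of very large exons) can be illustrated with a minimal sketch. This is not EXONSAMPLER's actual code: the `Exon` record, the function names, and the greedy selection order are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    gene: str
    chrom: str
    start: int  # 0-based start, BED style
    end: int    # exclusive end, BED style


def sample_exons(exons, min_len, max_len, max_bp_per_gene, max_total_bp):
    """Greedy sketch: keep exons that fit the length and bp budgets.

    `exons` is assumed to be pre-sorted in the preferred order
    (for example, 5-prime exons first).
    """
    per_gene = {}
    total = 0
    picked = []
    for ex in exons:
        length = ex.end - ex.start
        if length < min_len:
            continue  # too short to be useful
        if length > max_len:
            # Partial sampling of a very large exon: keep the first max_len bp.
            # (Strand is ignored here to keep the sketch short.)
            ex = Exon(ex.gene, ex.chrom, ex.start, ex.start + max_len)
            length = max_len
        used = per_gene.get(ex.gene, 0)
        if used + length > max_bp_per_gene or total + length > max_total_bp:
            continue  # would exceed a per-gene or overall budget
        per_gene[ex.gene] = used + length
        total += length
        picked.append(ex)
    return picked


def to_bed(exons):
    """Format the sampled exons as tab-separated BED lines."""
    return "\n".join(f"{e.chrom}\t{e.start}\t{e.end}\t{e.gene}" for e in exons)
```

    Passing the exons in a different order (for example, sorted by distance from the 3-prime end) would change which exons win the budget, which is one simple way to express the upstream/downstream preference mentioned in the abstract.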

  18. Sequence heterogeneity, multiplicity, and genomic organization of alpha- and beta-tubulin genes in sea urchins.

    PubMed Central

    Alexandraki, D; Ruderman, J V

    1981-01-01

    We analyzed the multiplicity, heterogeneity, and organization of the genes encoding the alpha and beta tubulins in the sea urchin Lytechinus pictus by using cloned complementary deoxyribonucleic acid (cDNA) and genomic tubulin sequences. cDNA clones were constructed by using immature spermatogenic testis polyadenylic acid-containing ribonucleic acid as a template. alpha- and beta-tubulin clones were identified by hybrid selection and in vitro translation of the corresponding messenger ribonucleic acids, followed by immunoprecipitation and two-dimensional gel electrophoresis of the translation products. The alpha cDNA clone contains a sequence that encodes the 48 C-terminal amino acids of alpha tubulin and 104 base pairs of the 3' nontranslated portion of the messenger ribonucleic acid. The beta cDNA insertion contains the coding sequence for the 100 C-terminal amino acids of beta tubulin and 83 base pairs of the 3' noncoding sequence. Hybrid selections performed at different criteria demonstrated the presence of several heterogeneous, closely related tubulin messenger ribonucleic acids, suggesting the existence of heterogeneous alpha- and beta-tubulin genes. Hybridization analyses indicated that there are at least 9 to 13 sequences for each of the two tubulin gene families per haploid genome. Hybridization of the cDNA probes to both total genomic DNA and cloned germline DNA fragments gave no evidence for close physical linkage of alpha-tubulin genes with beta-tubulin genes at the DNA level. In contrast, these experiments indicated that some genes within the same family are clustered. PMID:6287219

  19. Sequence heterogeneity, multiplicity, and genomic organization of alpha- and beta-tubulin genes in Sea Urchins

    SciTech Connect

    Alexandraki, D.; Ruderman, J.V.

    1981-12-01

    The authors analyzed the multiplicity, heterogeneity, and organization of the genes encoding the alpha and beta tubulins in the sea urchin Lytechinus pictus by using cloned complementary deoxyribonucleic acid (cDNA) and genomic tubulin sequences. cDNA clones were constructed by using immature spermatogenic testis polyadenylic acid-containing ribonucleic acid as a template. Alpha- and beta-tubulin clones were identified by hybrid selection and in vitro translation of the corresponding messenger ribonucleic acids, followed by immunoprecipitation and two-dimensional gel electrophoresis of the translation products. The alpha cDNA clone contains a sequence that encodes the 48 C-terminal amino acids of alpha tubulin and 104 base pairs of the 3' nontranslated portion of the messenger ribonucleic acid. The beta cDNA insertion contains the coding sequence for the 100 C-terminal amino acids of beta tubulin and 83 base pairs of the 3' noncoding sequence. Hybrid selections performed at different criteria demonstrated the presence of several heterogeneous, closely related tubulin messenger ribonucleic acids, suggesting the existence of heterogeneous alpha- and beta-tubulin genes. Hybridization analyses indicated that there are at least 9 to 13 sequences for each of the two tubulin gene families per haploid genome. Hybridization of the cDNA probes to both total genomic DNA and cloned germline DNA fragments gave no evidence for close physical linkage of alpha-tubulin genes with beta-tubulin genes at the DNA level. In contrast, these experiments indicated that some genes within the same family are clustered.

  20. DNA uptake sequences in Neisseria gonorrhoeae as intrinsic transcriptional terminators and markers of horizontal gene transfer

    PubMed Central

    Gurung, Neesha

    2016-01-01

    DNA uptake sequences are widespread throughout the Neisseria gonorrhoeae genome. These short, conserved sequences facilitate the exchange of endogenous DNA between members of the genus Neisseria. Often the DNA uptake sequences are present as inverted repeats that are able to form hairpin structures. It has been suggested previously that DNA uptake sequence inverted repeats present 3′ of genes play a role in rho-independent termination and attenuation. However, there is conflicting experimental evidence to support this role. The aim of this study was to determine the role of DNA uptake sequences in transcriptional termination. Both bioinformatics predictions, conducted using TransTermHP, and experimental evidence, from RNA-seq data, were used to determine which inverted repeat DNA uptake sequences are transcriptional terminators and in which direction. Here we show that DNA uptake sequences in the inverted repeat configuration occur in N. gonorrhoeae both where the DNA uptake sequence precedes the inverted version of the sequence and also, albeit less frequently, in reverse order. Due to their symmetrical configuration, inverted repeat DNA uptake sequences can potentially act as bi-directional terminators, therefore affecting transcription on both DNA strands. This work also provides evidence that gaps in DNA uptake sequence density in the gonococcal genome coincide with areas of DNA that are foreign in origin, such as prophage. This study differentiates for the first time, to our knowledge, between DNA uptake sequences that form intrinsic transcriptional terminators and those that do not, providing characteristic features within the flanking inverted repeat that can be identified. PMID:28348864
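
    The two inverted-repeat configurations described above (the uptake sequence followed by its inverted copy, and the reverse order) can be located with a simple scan. The sketch below is only an illustration and is not the TransTermHP or RNA-seq analysis used in the study: the function names and the `max_spacer` parameter are hypothetical, and the 10-mer used in the usage note is the commonly cited Neisseria uptake sequence, which is not stated in the abstract.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(genome, motif, max_spacer=20):
    """Find motif / reverse-complement pairs separated by a short spacer.

    Returns sorted (start, end, orientation) tuples, where orientation says
    whether the motif or its inverted copy comes first on the given strand.
    """
    rc = reverse_complement(motif)
    hits = []
    for first, second, label in (
        (motif, rc, "forward-first"),
        (rc, motif, "inverted-first"),
    ):
        # A lookahead with a capture group lets overlapping hits be reported.
        pattern = re.compile(f"(?=({first}[ACGT]{{0,{max_spacer}}}?{second}))")
        for m in pattern.finditer(genome):
            hits.append((m.start(1), m.end(1), label))
    return sorted(hits)
```

    For example, `find_inverted_repeats(genome, "GCCGTCTGAA")` reports each candidate hairpin-forming pair together with its orientation; deciding whether a pair actually terminates transcription would still need the structural and expression evidence described in the abstract.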

  1. Sense-antisense gene pairs: sequence, transcription, and structure are not conserved between human and mouse

    PubMed Central

    Wood, Emily J.; Chin-Inmanu, Kwanrutai; Jia, Hui; Lipovich, Leonard

    2013-01-01

    Previous efforts to characterize conservation between the human and mouse genomes focused largely on sequence comparisons. These studies are inherently limited because they don't account for gene structure differences, which may exist despite genomic sequence conservation. Recent high-throughput transcriptome studies have revealed widespread and extensive overlaps between genes, and transcripts, encoded on both strands of the genomic sequence. This overlapping gene organization, which produces sense-antisense (SAS) gene pairs, is capable of effecting regulatory cascades through established mechanisms. We present an evolutionary conservation assessment of SAS pairs, on three levels: genomic, transcriptomic, and structural. From a genome-wide dataset of human SAS pairs, we first identified orthologous loci in the mouse genome, then assessed their transcription in the mouse, and finally compared the genomic structures of SAS pairs expressed in both species. We found that approximately half of human SAS loci have single orthologous locations in the mouse genome; however, only half of those orthologous locations have SAS transcriptional activity in the mouse. This suggests that high human-mouse gene conservation overlooks widespread distinctions in SAS pair incidence and expression. We compared gene structures at orthologous SAS loci, finding frequent differences in gene structure between human and orthologous mouse SAS pair members. Our categorization of human SAS pairs with respect to mouse conservation of expression as well as structure points to limitations of mouse models. Gene structure differences, including at SAS loci, may account for some of the phenotypic distinctions between primates and rodents. Genes in non-conserved SAS pairs may contribute to evolutionary lineage-specific regulatory outcomes. PMID:24133500

  2. Gene sequences of the pil operon reveal relationships between symbiotic strains of Vibrio fischeri

    PubMed Central

    Browne-Silva, J.; Nishiguchi, M. K.

    2012-01-01

    Symbiosis between the bobtail squid Euprymna scolopes (Mollusca: Cephalopoda) and Vibrio fischeri bacteria has been a well-studied model for understanding the molecular mechanisms of colonization and adherence to host cells. For example, pilin expression has been observed to cause subtle variation in colonization for a number of Gram-negative bacteria with eukaryotic hosts. To investigate variation amongst pil genes of closely related strains of vibrios, we amplified pil genes A, B, C and D to determine orientation and sequence similarity to other symbiotic vibrios. The pilA gene was found to be upstream from all other pil genes, and not contiguous with the rest of the operon. The pilB, pilC and pilD loci were flanked at the 3′ end by yacE, followed by a conserved hypothetical gene. DNA sequences of each pil gene were aligned and analysed phylogenetically using parsimony for both individual and combined gene trees. Results demonstrate that certain pil loci (pilB and pilD) are conserved among strains of V. fischeri, but pilC differs in sequence between symbiotic and free-living strains. Phylogenetic analysis of all pil genes gives better resolution of Indo-west Pacific V. fischeri symbionts compared with analysis of the 16S rRNA gene. Hawaiian and Australian symbiotic strains form one monophyletic tree, supporting the hypothesis that V. fischeri strain specificity is selected by the geographical location of their hosts and is not related to specific squid species. PMID:18523167

  3. MYO7A and USH2A gene sequence variants in Italian patients with Usher syndrome

    PubMed Central

    Sodi, Andrea; Mariottini, Alessandro; Passerini, Ilaria; Murro, Vittoria; Bianchi, Benedetta; Menchini, Ugo; Torricelli, Francesca

    2014-01-01

    Purpose To analyze the spectrum of sequence variants in the MYO7A and USH2A genes in a group of Italian patients affected by Usher syndrome (USH). Methods Thirty-six Italian patients with a diagnosis of USH were recruited. They received a standard ophthalmologic examination, visual field testing, optical coherence tomography (OCT) scan, and electrophysiological tests. Fluorescein angiography and fundus autofluorescence imaging were performed in selected cases. All the patients underwent an audiologic examination for the 0.25–8,000 Hz frequencies. Vestibular function was evaluated with specific tests. DNA samples were analyzed for sequence variants of the MYO7A gene (for USH1) and the USH2A gene (for USH2) with direct sequencing techniques. A few patients were analyzed for both genes. Results In the MYO7A gene, ten missense variants were found; three patients were compound heterozygous, and two were homozygous. Thirty-four USH2A gene variants were detected, including eight missense variants, nine nonsense variants, six splicing variants, and 11 duplications/deletions; 19 patients were compound heterozygous, and three were homozygous. Four MYO7A and 17 USH2A variants have already been described in the literature. Among the novel mutations there are four USH2A large deletions, detected with multiplex ligation dependent probe amplification (MLPA) technology. Two potentially pathogenic variants were found in 27 patients (75%). Affected patients showed variable clinical pictures without a clear genotype-phenotype correlation. Conclusions Ten variants in the MYO7A gene and 34 variants in the USH2A gene were detected in Italian patients with USH at a high detection rate. A selective analysis of these genes may be valuable for molecular analysis, combining diagnostic efficiency with little time wastage and less resource consumption. PMID:25558175

  4. Using RNA sequencing for identifying gene imprinting and random monoallelic expression in human placenta

    PubMed Central

    Metsalu, Tauno; Viltrop, Triin; Tiirats, Airi; Rajashekar, Balaji; Reimann, Ene; Kõks, Sulev; Rull, Kristiina; Milani, Lili; Acharya, Ganesh; Basnet, Purusotam; Vilo, Jaak; Mägi, Reedik; Metspalu, Andres; Peters, Maire; Haller-Kikkatalo, Kadri; Salumets, Andres

    2014-01-01

    Given the possible critical importance of placental gene imprinting and random monoallelic expression on fetal and infant health, most of those genes must be identified, in order to understand the risks that the baby might meet during pregnancy and after birth. Therefore, the aim of the current study was to introduce a workflow and tools for analyzing imprinted and random monoallelic gene expression in human placenta, by applying whole-transcriptome (WT) RNA sequencing of placental tissue and genotyping of coding DNA variants in family trios. Ten family trios, each with a healthy spontaneous single-term pregnancy, were recruited. Total RNA was extracted for WT analysis, providing the full sequence information for the placental transcriptome. Parental and child blood DNA genotypes were analyzed by exome SNP genotyping microarrays, mapping the inheritance and estimating the abundance of parental expressed alleles. Imprinted genes showed consistent expression from either parental allele, as demonstrated by the SNP content of sequenced transcripts, while monoallelically expressed genes had random activity of parental alleles. We revealed 4 novel possible imprinted genes (LGALS8, LGALS14, PAPPA2 and SPTLC3) and confirmed the imprinting of 4 genes (AIM1, PEG10, RHOBTB3 and ZFAT-AS1) in human placenta. The major finding was the identification of 4 genes (ABP1, BCLAF1, IFI30 and ZFAT) with random allelic bias, expressing one of the parental alleles preferentially. The main functions of the imprinted and monoallelically expressed genes included: i) mediating cellular apoptosis and tissue development; ii) regulating inflammation and immune system; iii) facilitating metabolic processes; and iv) regulating cell cycle. PMID:25437054

  5. The nucleotide sequence of the dnaA gene and the first part of the dnaN gene of Escherichia coli K-12.

    PubMed

    Hansen, E B; Hansen, F G; von Meyenburg, K

    1982-11-25

    The nucleotide sequence of the dnaA gene and the first 10% of the dnaN gene was determined. From the nucleotide sequence the amino acid sequence of the dnaA gene product was derived. It is a basic protein of 467 amino acid residues with a molecular weight of 52.5 kD. The expression of the dnaA gene is in the counterclockwise direction like the one of the dnaN gene, for which potential startsites were found.

  6. The nucleotide sequence of the dnaA gene and the first part of the dnaN gene of Escherichia coli K-12.

    PubMed Central

    Hansen, E B; Hansen, F G; von Meyenburg, K

    1982-01-01

    The nucleotide sequence of the dnaA gene and the first 10% of the dnaN gene was determined. From the nucleotide sequence the amino acid sequence of the dnaA gene product was derived. It is a basic protein of 467 amino acid residues with a molecular weight of 52.5 kD. The expression of the dnaA gene is in the counterclockwise direction like the one of the dnaN gene, for which potential startsites were found. PMID:6296774

  7. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples.

    PubMed

    Scolnick, Jonathan A; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C; Amorese, Douglas A

    2015-01-01

    Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.

  8. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples

    PubMed Central

    Scolnick, Jonathan A.; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C.; Amorese, Douglas A.

    2015-01-01

    Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells. PMID:26132974

  9. Sequence analysis of the EF-1 alpha gene family of Mucor racemosus.

    PubMed Central

    Sundstrom, P; Lira, L M; Choi, D; Linz, J E; Sypherd, P S

    1987-01-01

    Our previous studies have shown that Mucor racemosus possesses three genes (TEF-1, -2 and -3) for EF-1 alpha, and that all three genes are transcribed. However, the level of transcription varies markedly between the three genes, with TEF-1 mRNA levels being approximately twofold higher than TEF-3 and sixfold higher than TEF-2. We have now completed the DNA sequence of both strands of all three genes and have found that these genes are highly homologous. TEF-2 and TEF-3 are more similar to each other than they are to TEF-1. The TEF-2 and the TEF-3 coding regions differ from TEF-1 at 30 and 37 positions respectively out of 1374 nucleotides. Twenty-six of these nucleotide substitutions were common to both TEF-2 and TEF-3, and the majority of the substitutions were clustered in the 5' region of the coding sequences. While the majority of these changes were silent, TEF-2 and TEF-3 differed from TEF-1 by having a lysine instead of a glutamate at amino acid position 41. In addition, TEF-2 and -3, but not TEF-1, each have an intron located near the 5' end of the coding region, although its size and sequence are not conserved between the two genes. All three genes have a conserved intron near the 3' end of the coding region. The sequence data have been analyzed with respect to the structure and function of EF-1 alpha in protein biosynthesis. PMID:3697088
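
    The substitution counts quoted above translate directly into percent identities. The short calculation below is a worked illustration rather than part of the original analysis; the TEF-2 versus TEF-3 figure additionally assumes that the two genes carry the same base at all 26 shared positions, which the abstract does not state.

```python
CODING_LENGTH = 1374  # nucleotides in the coding region, from the abstract


def percent_identity(differences, length=CODING_LENGTH):
    """Percent of positions that are identical, given a count of differences."""
    return 100.0 * (length - differences) / length


tef2_vs_tef1 = percent_identity(30)  # TEF-2 differs from TEF-1 at 30 positions
tef3_vs_tef1 = percent_identity(37)  # TEF-3 differs from TEF-1 at 37 positions

shared = 26                # substitutions common to TEF-2 and TEF-3
tef2_only = 30 - shared    # positions changed in TEF-2 only
tef3_only = 37 - shared    # positions changed in TEF-3 only

# Assumption: identical bases at the 26 shared positions, so this is an
# upper bound on the TEF-2 / TEF-3 identity.
tef2_vs_tef3_upper = percent_identity(tef2_only + tef3_only)

print(f"TEF-2 vs TEF-1: {tef2_vs_tef1:.2f}%")
print(f"TEF-3 vs TEF-1: {tef3_vs_tef1:.2f}%")
print(f"TEF-2 vs TEF-3 (upper bound): {tef2_vs_tef3_upper:.2f}%")
```

    Under that assumption TEF-2 and TEF-3 would differ at no fewer than 15 positions, consistent with the statement that they are more similar to each other than either is to TEF-1.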

  10. Characterization of ADME gene variation in 21 populations by exome sequencing

    PubMed Central

    Hovelson, Daniel H.; Xue, Zhengyu; Zawistowski, Matthew; Ehm, Margaret G.; Harris, Elizabeth C.; Stocker, Sophie L.; Gross, Annette S.; Jang, In-Jin; Ieiri, Ichiro; Lee, Jong-Eun; Cardon, Lon R.; Chissoe, Stephanie L.; Abecasis, Gonçalo

    2017-01-01

    Objective Proteins involved in absorption, distribution, metabolism, and excretion (ADME) play a critical role in drug pharmacokinetics. The type and frequency of genetic variation in the ADME genes differ among populations. The aim of this study was to systematically investigate common and rare ADME coding variation in diverse ethnic populations by exome sequencing. Materials and methods Data derived from commercial exome capture arrays and next-generation sequencing were used to characterize coding variation in 298 ADME genes in 251 Northeast Asians and 1181 individuals from the 1000 Genomes Project. Results Approximately 75% of the ADME coding sequence was captured at high quality across the joint samples harboring more than 8000 variants, with 49% of individuals carrying at least one ‘knockout’ allele. ADME genes carried 50% more nonsynonymous variation than non-ADME genes (P=8.2×10^-13) and showed significantly greater levels of population differentiation (P=7.6×10^-11). Out of the 2135 variants identified that were predicted to be deleterious, 633 were not on commercially available ADME or general-purpose genotyping arrays. Forty deleterious variants within important ADME genes, with frequencies of at least 2% in at least one population, were identified as candidates for future pharmacogenetic studies. Conclusion Exome sequencing was effective in accurately genotyping most ADME variants important for pharmacogenetic research, in addition to identifying rare or potentially de novo coding variants that may be clinically meaningful. Furthermore, as a class, ADME genes are more variable and less sensitive to purifying selection than non-ADME genes. PMID:27984508

  11. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes.

    PubMed

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-12-10

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this research, 137,229 contigs ranging from 200 to 19,214 bp, with an N50 of 2331 bp, were obtained through de novo assembly of leaf, stem and root samples of O. officinalis sequenced on an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant-pathogen interaction and plant hormone regulation pathways involved in disease resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistance genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed the highest expression level in stem, whereas one of the Xa26 genes was expressed predominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future.
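
    The N50 statistic quoted above summarizes assembly contiguity: it is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal sketch of the standard calculation follows; the function name is illustrative and the example lengths in the usage note are made up, not taken from the O. officinalis assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")
```

    For the toy lengths `[2, 2, 2, 3, 3, 4, 8, 8]` (32 bp in total), the two 8 bp contigs already cover 16 bp, so the N50 is 8.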

  12. Genome-wide discovery of cis-elements in promoter sequences using gene expression.

    PubMed

    Troukhan, Maxim; Tatarinova, Tatiana; Bouck, John; Flavell, Richard B; Alexandrov, Nickolai N

    2009-04-01

    The availability of complete or nearly complete genome sequences, a large number of 5' expressed sequence tags, and significant public expression data allow for a more accurate identification of cis-elements regulating gene expression. We have implemented a global approach that takes advantage of available expression data, genomic sequences, and transcript information to predict cis-elements associated with specific expression patterns. The key components of our approach are: (1) precise identification of transcription start sites, (2) specific locations of cis-elements relative to the transcription start site, and (3) assessment of statistical significance for all sequence motifs. By applying our method to promoters of Arabidopsis thaliana and Mus musculus, we have identified motifs that affect gene expression under specific environmental conditions or in certain tissues. We also found that the presence of the TATA box is associated with increased variability of gene expression. Strong correlation between our results and experimentally determined motifs shows that the method is capable of predicting new functionally important cis-elements in promoter sequences.
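
    Component (3) above, assessing the statistical significance of a sequence motif, can be illustrated with an over-representation test. The abstract does not say which test the authors used; the hypergeometric tail probability below is one common choice and is shown purely as a sketch, with a hypothetical function name and parameter names.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for a hypergeometric draw.

    N: promoters in the genome-wide background
    K: background promoters that contain the motif
    n: promoters in the co-expressed gene set
    k: promoters in that set that contain the motif
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

    A small value means the motif occurs in the co-expressed set more often than expected by chance; when many motifs are scanned, the resulting p-values would normally also need a multiple-testing correction.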

  13. Sequence variation in the Mc1r gene for a group of polymorphic snakes.

    PubMed

    Cox, Christian L; Rabosky, Alison R Davis; Chippindale, Paul T

    2013-01-25

    Studying the genetic factors underlying phenotypic traits can provide insight into the dynamics of selection and the molecular basis of adaptation, but this goal can be difficult for non-model organisms without extensive genomic resources. However, sequencing candidate genes for the trait of interest can facilitate the study of evolutionary genetics in natural populations. We sequenced the melanocortin-1 receptor (Mc1r) to study the genetic basis of color polymorphism in a group of snake species with variable black banding, the genera Sonora, Chilomeniscus, and Chionactis. Mc1r is an important gene in the melanin synthesis pathway and is associated with ecologically important variation in color pattern in birds, mammals, and other squamate reptiles. We found that the Mc1r nucleotide sequence was variable and that, within our focal Sonora species, there are both fixed and heterozygous nucleotide substitutions that result in an amino acid change. Selection analyses indicated that the Mc1r sequence was likely under purifying selection. However, we did not detect any statistical association with the presence or absence of black bands. Our results agree with other studies that have found no role for sequence variation in Mc1r and highlight the importance of comparative data for studying the phenotypic associations of candidate genes.

  14. Cloning, sequencing and expression of the Schwanniomyces occidentalis NADP-dependent glutamate dehydrogenase gene.

    PubMed

    De Zoysa, P A; Connerton, I F; Watson, D C; Johnston, J R

    1991-08-01

    The cloned NADP-specific glutamate dehydrogenase (GDH) genes of Aspergillus nidulans (gdhA) and Neurospora crassa (am) have been shown to hybridize under reduced stringency conditions to genomic sequences of the yeast Schwanniomyces occidentalis. Using 5' and 3' gene-specific probes, a unique 5.1 kb BclI restriction fragment that encompasses the entire Schwanniomyces sequence has been identified. A recombinant clone bearing the unique BclI fragment has been isolated from a pool of enriched clones in the yeast/E. coli shuttle vector pWH5 by colony hybridization. The identity of the plasmid clone was confirmed by functional complementation of the Saccharomyces cerevisiae gdh-1 mutation. The nucleotide sequence of the Schw. occidentalis GDH gene, which consists of 1380 nucleotides in a continuous reading frame of 459 amino acids, has been determined. The predicted amino acid sequence shows considerable homology with GDH proteins from other fungi and significant homology with all other available GDH sequences.
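
    The two sizes quoted above are consistent with each other if the 1380-nucleotide figure includes the stop codon, which is an assumption here since the abstract does not say so. A small arithmetic check, with a hypothetical helper name:

```python
def orf_length_nt(n_amino_acids, include_stop=True):
    """Nucleotides needed to encode a protein, optionally with a stop codon."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# 459 amino acids plus one stop codon gives 460 codons.
print(orf_length_nt(459))                      # coding region with stop codon
print(orf_length_nt(459, include_stop=False))  # amino-acid codons only
```

    With the stop codon counted, 459 residues require 1380 nucleotides, matching the reported reading frame; without it the figure would be 1377.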

  15. DNA sequences that activate isocitrate lyase gene expression during late embryogenesis and during postgerminative growth.

    PubMed Central

    Zhang, J Z; Santes, C M; Engel, M L; Gasser, C S; Harada, J J

    1996-01-01

    We analyzed DNA sequences that regulate the expression of an isocitrate lyase gene from Brassica napus L. during late embryogenesis and during postgerminative growth to determine whether glyoxysomal function is induced by a common mechanism at different developmental stages. beta-Glucuronidase constructs were used both in transient expression assays in B. napus and in transgenic Arabidopsis thaliana to identify the segments of the isocitrate lyase 5' flanking region that influence promoter activity. DNA sequences that play the principal role in activating the promoter during postgerminative growth are located more than 1,200 bp upstream of the gene. Distinct DNA sequences that were sufficient for high-level expression during late embryogenesis but only low-level expression during postgerminative growth were also identified. Other parts of the 5' flanking region increased promoter activity both in developing seed and in seedlings. We conclude that a combination of elements is involved in regulating the isocitrate lyase gene and that distinct DNA sequences play primary roles in activating the gene in embryos and in seedlings. These findings suggest that different signals contribute to the induction of glyoxysomal function during these two developmental stages. We also showed that some of the constructs were expressed differently in transient expression assays and in transgenic plants. PMID:8934622

  16. Genome Sequence of an Acinetobacter baumannii Strain Carrying Three Acquired Carbapenemase Genes

    PubMed Central

    Oinuma, Ken-Ichi; Suzuki, Masato; Sato, Kanako; Nakaie, Kiyotaka; Niki, Makoto; Takizawa, Etsuko; Niki, Mamiko; Shibayama, Keigo; Yamada, Koichi; Kakeya, Hiroshi

    2016-01-01

    The emergence of multiple-carbapenemase-producing Acinetobacter strains has been a serious concern during the past decade. Here, we report the draft genome sequence of an Acinetobacter baumannii strain isolated from a Japanese patient with three acquired carbapenemase genes: blaNDM-1, blaTMB-1, and blaOXA-58. PMID:27856588

  17. Identification of expressed resistance gene analogs from peanut (Arachis hypogaea L.) expressed sequence tags

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cultivated peanut is an important source of protein and oil. However, low genetic diversity makes peanut vulnerable to many diseases. Several hundred partial genomic DNA sequences targeting nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance (R) genes have been reported. Only a small...

  18. Nucleotide sequence of the Lactococcus lactis NCDO 763 (ML3) rpoD gene.

    PubMed

    Gansel, X; Hartke, A; Boutibonnes, P; Auffray, Y

    1993-10-19

    The complete nucleotide sequence of the rpoD gene from Lactococcus lactis has been determined. The nucleotide data indicate the presence of an open reading frame of 1020 base pairs encoding a polypeptide that shares the framework structure of the principal sigma factors of eubacterial strains.

  19. Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis

    PubMed Central

    Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu

    2017-01-01

    Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystem II. However, little is known about the synthesis of optically active phycoerythrin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis: the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant. PMID:28135287

  20. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 expressed sequence tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library, and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy and the Kyoto Encyclopedia of Genes and Genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression against these stresses. Thus, the EST dataset is very important for discovering potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp.

  1. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  2. In vivo generation of highly abundant sequence-specific oligonucleotides for antisense and triplex gene regulation.

    PubMed Central

    Noonberg, S B; Scott, G K; Garovoy, M R; Benz, C C; Hunt, C A

    1994-01-01

    Antisense and triplex oligonucleotides continue to demonstrate potential as mediators of gene-specific repression of protein synthesis. However, inefficient and heterogeneous cellular uptake, intracellular sequestration, and rapid intracellular and extracellular degradation represent obstacles to their eventual clinical utility. Efficient cellular delivery of targeted ribozymes can present similar problems. In this report we describe a system for circumventing these obstacles and producing large quantities of short, sequence-specific RNA oligonucleotides for use in these gene regulation strategies. The oligonucleotides are generated from a vector containing promoter, capping, and termination sequences from the human small nuclear U6 gene, surrounding a synthetic sequence incorporating the oligonucleotide of interest. In vivo, these oligonucleotides are produced constitutively and without cell type specificity in levels up to 5 × 10^6 copies per cell, reach steady-state levels of expression within 9 hours post-transfection, and are still readily detectable 7 days post-transfection. In addition, these oligonucleotides are retained in the nucleus, obtain a 5' gamma-monomethyl phosphate cap, and have an intracellular half-life of approximately one hour. This expression vector provides a novel and efficient method of intracellular delivery of antisense or triplex RNA oligonucleotides (and/or ribozymes) for gene regulation, as well as a cost-effective means of comparing the biological activity arising from a variety of different potential oligonucleotide sequences. PMID:8052538

  3. EST sequencing and gene expression profiling of cultivated peanut (Arachis hypogaea L.).

    PubMed

    Bi, Yu-Ping; Liu, Wei; Xia, Han; Su, Lei; Zhao, Chuan-Zhi; Wan, Shu-Bo; Wang, Xing-Jun

    2010-10-01

    Peanut (Arachis hypogaea L.) is one of the most important oil crops in the world. However, biotechnology-based improvement of peanut is far behind many other crops. It is critical and urgent to establish the biotechnological platform for peanut germplasm innovation. In this study, a peanut seed cDNA library was constructed to establish the biotechnological platform for peanut germplasm innovation. About 17,000 expressed sequence tags (ESTs) were sequenced and used for further investigation. Of these, 12.5% were annotated as metabolic related and 4.6% encoded transcription or post-transcription factors. ESTs encoding storage protein and enzymes related to protein degradation accounted for 28.8% and formed the largest group of the annotated ESTs. ESTs that encoded stress responsive proteins and pathogen-related proteins accounted for 5.6%. ESTs that encoded unknown proteins or showed no hit in the GenBank nr database accounted for 20.1% and 13.9%, respectively. A total of 5066 EST sequences were selected to make a cDNA microarray. Expression analysis revealed that these sequences showed diverse expression patterns in peanut seeds, leaves, stems, roots, flowers, and gynophores. We also analyzed the gene expression pattern during seed development. Genes that were upregulated (≥twofold) at 15, 25, 35, and 45 days after pegging (DAP) were found and compared with 70 DAP. The potential value of these genes and their promoters in the peanut gene engineering study is discussed.

  4. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216.

    PubMed

    Franco, Mario E E; López, Silvina; Medina, Rocio; Saparrat, Mario C N; Balatti, Pedro

    2015-09-24

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation.

  5. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  6. Phylogenetic analysis of Rutaceous plants based on single nucleotide polymorphism in chloroplast and nuclear gene sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...

  7. Identification of Legionella pneumophila serogroups and other Legionella species by mip gene sequencing.

    PubMed

    Haroon, Attiya; Koide, Michio; Higa, Futoshi; Tateyama, Masao; Fujita, Jiro

    2012-04-01

    The virulence factor known as the macrophage infectivity potentiator (mip) is responsible for the intracellular survival of Legionella species. In this study, we investigated the potential of the mip gene sequence to differentiate isolates of different species of Legionella and different serogroups of Legionella pneumophila. We used 35 clinical L. pneumophila isolates and one clinical isolate each of Legionella micdadei, Legionella longbeachae, and Legionella dumoffii (collected from hospitals all over Japan between 1980 and 2007). We used 19 environmental Legionella anisa isolates (collected in the Okinawa, Nara, Osaka, and Hyogo prefectures between 1987 and 2007) and two Legionella type strains. We extracted bacterial genomic DNA and amplified the mip gene by PCR. PCR products were purified by agarose gel electrophoresis and the mip gene was then sequenced. The L. pneumophila isolates could be divided into two groups: one group was very similar to the type strain and was composed of serogroup (SG) 1 isolates only; the second group had more sequence variations and was composed of SG1 isolates as well as SG2, SG3, SG5, and SG10 isolates. Phylogenetic analysis displayed one cluster for L. anisa isolates, while other Legionella species were present at discrete levels. Our findings show that mip gene sequencing is an effective technique for differentiating L. pneumophila strains from other Legionella species.

  8. Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium

    PubMed Central

    Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž

    2016-01-01

    Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. PMID:27307293

  9. Tsukamurella pulmonis Bloodstream Infection Identified by secA1 Gene Sequencing

    PubMed Central

    Cano, María E.; García de la Fuente, Celia; Martínez-Martínez, Luis; López, Mónica; Fernández-Mazarrasa, Carlos

    2014-01-01

    Recurrent bloodstream infections caused by a Gram-positive bacterium affected an immunocompromised child. Tsukamurella pulmonis was the microorganism identified by secA1 gene sequencing. Antibiotic treatment in combination with removal of the subcutaneous port healed the patient. PMID:25520439

  10. Phylogeny and identification of Pantoea species and typing of Pantoea agglomerans strains by multilocus gene sequencing.

    PubMed

    Delétoile, Alexis; Decré, Dominique; Courant, Stéphanie; Passet, Virginie; Audo, Jennifer; Grimont, Patrick; Arlet, Guillaume; Brisse, Sylvain

    2009-02-01

    Pantoea agglomerans and other Pantoea species cause infections in humans and are also pathogenic to plants, but the diversity of Pantoea strains and their possible association with hosts and disease remain poorly known, and identification of Pantoea species is difficult. We characterized 36 Pantoea strains, including 28 strains of diverse origins initially identified as P. agglomerans, by multilocus gene sequencing based on six protein-coding genes, by biochemical tests, and by antimicrobial susceptibility testing. Phylogenetic analysis and comparison with other species of Enterobacteriaceae revealed that the genus Pantoea is highly diverse. Most strains initially identified as P. agglomerans by use of API 20E strips belonged to a compact sequence cluster together with the type strain, but other strains belonged to diverse phylogenetic branches corresponding to other species of Pantoea or Enterobacteriaceae and to probable novel species. Biochemical characteristics such as fosfomycin resistance and utilization of d-tartrate could differentiate P. agglomerans from other Pantoea species. All 20 strains of P. agglomerans could be distinguished by multilocus sequence typing, revealing the very high discrimination power of this method for strain typing and population structure in this species, which is subdivided into two phylogenetic groups. PCR detection of the repA gene, associated with pathogenicity in plants, was positive in all clinical strains of P. agglomerans, suggesting that clinical and plant-associated strains do not form distinct populations. We provide a multilocus gene sequencing method that is a powerful tool for Pantoea species delineation and identification and for strain tracking.

  11. Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments.

    PubMed

    Short, Cindy M; Suttle, Curtis A

    2005-01-01

    Primers were designed to amplify a 592-bp region within a conserved structural gene (g20) found in some cyanophages. The goal was to use this gene as a proxy to infer genetic richness in natural cyanophage communities and to determine if sequences were more similar in similar environments. Gene products were amplified from samples from the Gulf of Mexico, the Arctic, Southern, and Northeast and Southeast Pacific Oceans, an Arctic cyanobacterial mat, a catfish production pond, lakes in Canada and Germany, and a depth of ca. 3,246 m in the Chukchi Sea. Amplicons were separated by denaturing gradient gel electrophoresis, and selected bands were sequenced. Phylogenetic analysis revealed four previously unknown groups of g20 clusters, two of which were entirely found in freshwater. Also, sequences with >99% identities were recovered from environments that differed greatly in temperature and salinity. For example, nearly identical sequences were recovered from the Gulf of Mexico, the Southern Pacific Ocean, an Arctic freshwater cyanobacterial mat, and Lake Constance, Germany. These results imply that closely related hosts and the viruses infecting them are distributed widely across environments or that horizontal gene exchange occurs among phage communities from very different environments. Moreover, the amplification of g20 products from deep in the cyanobacterium-sparse Chukchi Sea suggests that this primer set targets bacteriophages other than those infecting cyanobacteria.

  12. Network-Informed Gene Ranking Tackles Genetic Heterogeneity in Exome-Sequencing Studies of Monogenic Disease.

    PubMed

    Dand, Nick; Schulz, Reiner; Weale, Michael E; Southgate, Laura; Oakey, Rebecca J; Simpson, Michael A; Schlitt, Thomas

    2015-12-01

    Genetic heterogeneity presents a significant challenge for the identification of monogenic disease genes. Whole-exome sequencing generates a large number of candidate disease-causing variants and typical analyses rely on deleterious variants being observed in the same gene across several unrelated affected individuals. This is less likely to occur for genetically heterogeneous diseases, making more advanced analysis methods necessary. To address this need, we present HetRank, a flexible gene-ranking method that incorporates interaction network data. We first show that different genes underlying the same monogenic disease are frequently connected in protein interaction networks. This motivates the central premise of HetRank: those genes carrying potentially pathogenic variants and whose network neighbors do so in other affected individuals are strong candidates for follow-up study. By simulating 1,000 exome sequencing studies (20,000 exomes in total), we model varying degrees of genetic heterogeneity and show that HetRank consistently prioritizes more disease-causing genes than existing analysis methods. We also demonstrate a proof-of-principle application of the method to prioritize genes causing Adams-Oliver syndrome, a genetically heterogeneous rare disease. An implementation of HetRank in R is available via the Website http://sourceforge.net/p/hetrank/.
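    The central premise described above (a gene is a stronger candidate when its network neighbours carry candidate variants in other affected individuals) can be illustrated with a toy neighbour-counting score. This is only a sketch under stated assumptions, not the HetRank algorithm or its R implementation: the function name, the input dictionaries and the scoring rule are all hypothetical.

```python
def neighbor_support_scores(variants_by_individual, network):
    """Toy score (hypothetical, not HetRank): for each candidate gene, count
    how many OTHER individuals carry a candidate variant in the gene itself
    or in one of its interaction-network neighbours."""
    scores = {}
    for person, genes in variants_by_individual.items():
        for gene in genes:
            neighbourhood = {gene} | network.get(gene, set())
            support = sum(
                1
                for other, other_genes in variants_by_individual.items()
                if other != person and neighbourhood & other_genes
            )
            # Keep the best support seen for this gene across carriers.
            scores[gene] = max(scores.get(gene, 0), support)
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative inputs only: three affected individuals and a tiny network.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "X": set()}
variants = {"p1": {"A", "X"}, "p2": {"B"}, "p3": {"C", "Y"}}
ranking = neighbor_support_scores(variants, network)
print(ranking)
```

    In this made-up example no gene is mutated in more than one individual, yet gene B ranks first because its neighbours A and C are hit in the other two individuals, which is the situation a same-gene recurrence filter would miss.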

  13. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from the human genome is of great importance to improve diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes) where there is no investigational approach to find them, and the classification methods used to make the final decision. In this work, a novel Sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, the amino acid sequence of the proteins, which is universal data, has been used to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection set of four negative sets which are generated using a distance approach is considered. Then, Decision Tree (C4.5) has been applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm, and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method.
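    As a quick consistency check on the reported measures, the F-measure can be recomputed from the stated precision and recall. The sketch below assumes the abstract reports the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall from the abstract, expressed as fractions.
f1 = f_measure(0.826, 0.856)
print(f"F1 = {f1:.3f}")  # about 0.841, in line with the reported F-measure of 84
```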

  14. The role of DNA repair genes in recombination between repeated sequences in yeast.

    PubMed

    Liefshitz, B; Parket, A; Maya, R; Kupiec, M

    1995-08-01

    The presence of repeated sequences in the genome represents a potential source of karyotypic instability. Genetic control of recombination is thus important to preserve the integrity of the genome. To investigate the genetic control of recombination between repeated sequences, we have created a series of isogenic strains in which we could assess the role of genes involved in DNA repair in two types of recombination: direct repeat recombination and ectopic gene conversion. Naturally occurring (Ty elements) and artificially constructed repeats could be compared in the same cell population. We have found that direct repeat recombination and gene conversion have different genetic requirements. The role of the RAD51, RAD52, RAD54, RAD55, and RAD57 genes, which are involved in recombinational repair, was investigated. Based on the phenotypes of single and double mutants, these genes can be divided into three functional subgroups: one composed of RAD52, a second one composed of RAD51 and RAD54, and a third one that includes the RAD55 and RAD57 genes. Among seven genes involved in excision repair tested, only RAD1 and RAD10 played a role in the types of recombination studied. We did not detect a differential effect of any rad mutation on Ty elements as compared to artificially constructed repeats.

  15. Power of deep sequencing and Agilent microarray for gene expression profiling study.

    PubMed

    Feng, Lin; Liu, Hang; Liu, Yu; Lu, Zhike; Guo, Guangwu; Guo, Suping; Zheng, Hongwei; Gao, Yanning; Cheng, Shujun; Wang, Jian; Zhang, Kaitai; Zhang, Yong

    2010-06-01

    Next-generation sequencing-based Digital Gene Expression tag profiling (DGE) has been used to study the changes in gene expression profiling. To compare the quality of the data generated by microarray and DGE, we examined the gene expression profiles of an in vitro cell model with these platforms. In this study, 17,362 and 15,938 genes were detected by microarray and DGE, respectively, with 13,221 overlapping genes. The correlation coefficients between the technical replicates were >0.99 and the detection variance was <9% for both platforms. The dynamic range of microarray was fixed with four orders of magnitude, whereas that of DGE was extendable. The consistency of the two platforms was high, especially for those abundant genes. It was more difficult for the microarray to distinguish the expression variation of less abundant genes. Although microarrays might be eventually replaced by DGE or transcriptome sequencing (RNA-seq) in the near future, microarrays are still stable, practical, and feasible, which may be useful for most biological researchers.

  16. The Unique hmuY Gene Sequence as a Specific Marker of Porphyromonas gingivalis

    PubMed Central

    Mackiewicz, Paweł; Radwan-Oczko, Małgorzata; Kantorowicz, Małgorzata; Chomyszyn-Gajewska, Maria; Frąszczak, Magdalena; Bielecki, Marcin; Olczak, Mariusz; Olczak, Teresa

    2013-01-01

    Porphyromonas gingivalis, a major etiological agent of chronic periodontitis, acquires heme from host hemoproteins using the HmuY hemophore. The aim of this study was to develop a specific P. gingivalis marker based on a hmuY gene sequence. Subgingival samples were collected from 66 patients with chronic periodontitis and 40 healthy subjects and the entire hmuY gene was analyzed in positive samples. Phylogenetic analyses demonstrated that both the amino acid sequence of the HmuY protein and the nucleotide sequence of the hmuY gene are unique among P. gingivalis strains/isolates and show low identity to sequences found in other species (below 50 and 56%, respectively). In agreement with these findings, a set of hmuY gene-based primers and standard/real-time PCR with SYBR Green chemistry allowed us to specifically detect P. gingivalis in patients with chronic periodontitis (77.3%) and healthy subjects (20%), the latter possessing lower number of P. gingivalis cells and total bacterial cells. Isolates from healthy subjects possess the hmuY gene-based nucleotide sequence pattern occurring in W83/W50/A7436 (n = 4), 381/ATCC 33277 (n = 3) or TDC60 (n = 1) strains, whereas those from patients typically have TDC60 (n = 21), W83/W50/A7436 (n = 17) and 381/ATCC 33277 (n = 13) strains. We observed a significant correlation between periodontal index of risk of infectiousness (PIRI) and the presence/absence of P. gingivalis (regardless of the hmuY gene-based sequence pattern of the isolate identified [r = 0.43; P = 0.0002] and considering particular isolate pattern [r = 0.38; P = 0.0012]). In conclusion, we demonstrated that the hmuY gene sequence or its fragments may be used as one of the molecular markers of P. gingivalis. PMID:23844074

  17. Mining and gene ontology based annotation of SSR markers from expressed sequence tags of Humulus lupulus.

    PubMed

    Singh, Swati; Gupta, Sanchita; Mani, Ashutosh; Chaturvedi, Anoop

    2012-01-01

    Humulus lupulus is commonly known as hops, a member of the family Moraceae. Currently many projects are underway leading to the accumulation of voluminous genomic and expressed sequence tag sequences in public databases. The genetically characterized domains in these databases are limited due to non-availability of reliable molecular markers. A large amount of EST sequence data is available for hops. In the present study, simple sequence repeat (SSR) markers extracted from EST data are used as molecular markers for genetic characterization. 25,495 EST sequences were examined and assembled to get full-length sequences. Maximum frequency distribution was shown by mononucleotide SSR motifs, i.e. 60.44% in contigs and 62.16% in singletons, whereas minimum frequencies were observed for hexanucleotide SSRs in contigs (0.09%) and pentanucleotide SSRs in singletons (0.12%). Most trinucleotide motifs code for glutamic acid (GAA), while AT/TA were the most frequent repeats among dinucleotide SSRs. Flanking primer pairs were designed in silico for the SSR-containing sequences. Functional categorization of SSR-containing sequences was done through gene ontology terms such as biological process, cellular component and molecular function.
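    The kind of SSR scan described above can be sketched with a regular expression that looks for a short motif repeated in tandem. This is a minimal illustration, not the tool or settings used in the study: the minimum repeat counts below are assumed thresholds, the function name is hypothetical, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Minimum tandem copies per motif length (assumed thresholds for illustration).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # so that each SSR is reported once under its shortest motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Made-up sequence with a (GAA)6 trinucleotide and an (AT)7 dinucleotide repeat.
example = "CCG" + "GAA" * 6 + "TTC" + "AT" * 7 + "G"
ssrs = find_ssrs(example)
print(ssrs)
```

    Start positions are 0-based. Counting hits per motif length across all contigs and singletons would give frequency tables of the kind summarized in the abstract.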

  18. LOESS correction for length variation in gene set-based genomic sequence analysis

    PubMed Central

    Aboukhalil, Anton; Bulyk, Martha L.

    2012-01-01

    Motivation: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts. Results: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences. Availability: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/ Contact: mlbulyk@receptor.med.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22492312
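    The general idea of a LOESS length correction can be sketched as follows: fit a smooth trend of score against sequence length, then keep the residual as the length-corrected score. This is a generic local linear regression with tricube weights written for illustration, not the authors' LM_LOESS code; the function names, the `frac` parameter and the example data are assumptions.

```python
def loess_fit(x, y, frac=0.5):
    """Local linear regression with tricube weights; fitted value at each x."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth is the distance to the k-th nearest point (guard against 0).
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        # Weighted least-squares line through the neighbourhood of x0.
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def length_corrected(lengths, scores, frac=0.5):
    """Residual score after removing the smoothed score-versus-length trend."""
    trend = loess_fit(lengths, scores, frac)
    return [s - t for s, t in zip(scores, trend)]


# Made-up data: scores grow linearly with length, plus one genuine signal.
lengths = [100 * i for i in range(1, 11)]
scores = [0.01 * L + 1.0 for L in lengths]
scores[5] += 5.0  # the sequence of length 600 scores above the length trend
corrected = length_corrected(lengths, scores)
print([round(c, 2) for c in corrected])
```

    In this toy case the raw score ranks the longest sequences highly simply because they are long, whereas the corrected score singles out the one sequence that exceeds what its length alone would predict.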

  19. Case-Only Exome Sequencing and Complex Disease Susceptibility Gene Discovery: Study Design Considerations

    PubMed Central

    Wu, Lang; Schaid, Daniel J.; Sicotte, Hugues; Wieben, Eric D.; Li, Hu; Petersen, Gloria M.

    2015-01-01

    Whole exome sequencing (WES) provides an unprecedented opportunity to identify the potential etiologic role of rare functional variants in human complex diseases. Large-scale collaborations have generated germline WES data on patients with a number of diseases, especially cancer, but less often on healthy controls under the same sequencing procedures. These data can be a valuable resource for identifying new disease susceptibility loci, if study designs are appropriately applied. This review describes suggested strategies and technical considerations when focusing on case-only study designs that use WES data in complex disease scenarios. These include variant filtering based on frequency and functionality, gene prioritization, interrogation of different data types, and targeted sequencing validation. We propose that if case-only WES designs were applied in an appropriate manner, new susceptibility genes containing rare variants for human complex diseases can be detected. PMID:25371537

  20. Complete nucleotide sequence of the polymerase 3 gene of human influenza virus A/WSN/33.

    PubMed Central

    Kaptein, J S; Nayak, D P

    1982-01-01

    The complete nucleotide sequence of polymerase 3 (P3) gene of a human influenza virus (A/WSN/33) has been determined using cDNA clones except for the last 11 nucleotides which were obtained by direct RNA sequencing. The WSN P3 gene contains 2,341 nucleotides and codes for a protein of 759 amino acids (molecular weight 85,800). The WSN P3 protein, as deduced from the plus-strand DNA sequence, is basic and enriched in positively charged amino acids. In addition, it contains clusters of basic amino acids which may provide sites for the interaction of P3 protein with the capped primer, template, and/or other polymerase proteins during the transcriptive and replicative processes of influenza viral RNA. PMID:7045393

  1. Resolution of the African hominoid trichotomy by use of a mitochondrial gene sequence

    SciTech Connect

    Ruvolo, M.; Disotell, T.R.; Allard, M.W.; Brown, W.M.; Honeycutt, R.L.

    1991-02-15

    Mitochondrial DNA sequences encoding the cytochrome oxidase subunit II gene have been determined for five primate species, siamang (Hylobates syndactylus), lowland gorilla (Gorilla gorilla), pygmy chimpanzee (Pan paniscus), crab-eating macaque (Macaca fascicularis), and green monkey (Cercopithecus aethiops), and compared with published sequences of other primate and nonprimate species. Comparisons of cytochrome oxidase subunit II gene sequences provide clear-cut evidence from the mitochondrial genome for the separation of the African ape trichotomy into two evolutionary lineages, one leading to gorillas and the other to humans and chimpanzees. Several different tree-building methods support this same phylogenetic tree topology. The comparisons also yield trees in which a substantial length separates the divergence point of gorillas from that of humans and chimpanzees, suggesting that the lineage most immediately ancestral to humans and chimpanzees may have been in existence for a relatively long time.

  2. Variation in the sequence and modification state of the human insulin gene flanking regions.

    PubMed

    Ullrich, A; Dull, T J; Gray, A; Philips, J A; Peter, S

    1982-04-10

    The nucleotide sequence of a highly repetitive sequence region upstream from the human insulin gene is reported. The length of this region varies between alleles in the population, and appears to be stably transmitted to the next generation in a Mendelian fashion. There is no significant correlation between the length of this sequence and two types of diabetes mellitus. We observe variation in the cleavability of a BglI recognition site downstream from the human insulin gene, which is probably due to variable nucleotide modification. This presumed modification state appears not to be inherited, and varies between tissues within an individual and between individuals for a given tissue. Both alleles in a given tissue DNA sample are modified to the same extent.

  3. Analysis of the intestinal microbiota using SOLiD 16S rRNA gene sequencing and SOLiD shotgun sequencing

    PubMed Central

    2013-01-01

    Background Metagenomics seeks to understand microbial communities and assemblages by DNA sequencing. Technological advances in next generation sequencing technologies are fuelling a rapid growth in the number and scope of projects aiming to analyze complex microbial environments such as marine, soil or the gut. Recent improvements in longer read lengths and paired-sequencing allow better resolution in profiling microbial communities. While both 454 sequencing and Illumina sequencing have been used in numerous metagenomic studies, SOLiD sequencing is not commonly used in this area, as it is believed to be more suitable in the context of reference-guided projects. Results To investigate the performance of SOLiD sequencing in a metagenomic context, we compared taxonomic profiles of SOLiD mate-pair sequencing reads with Sanger paired reads and 454 single reads. All sequences were obtained from the bacterial 16S rRNA gene, which was amplified from microbial DNA extracted from a human fecal sample. Additionally, from the same fecal sample, complete genomic microbial DNA was extracted and shotgun sequenced using SOLiD sequencing to study the composition of the intestinal microbiota and the existing microbial metabolism. We found that the microbiota composition of 16S rRNA gene sequences obtained using Sanger, 454 and SOLiD sequencing provide results comparable to the result based on shotgun sequencing. Moreover, with SOLiD sequences we obtained more resolution down to the species level. In addition, the shotgun data allowed us to determine a functional profile using the databases SEED and KEGG. Conclusions This study shows that SOLiD mate-pair sequencing is a viable and cost-efficient option for analyzing a complex microbiome. To the best of our knowledge, this is the first time that SOLiD sequencing has been used in a human sample. PMID:24564472

  4. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    SciTech Connect

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  5. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    DOE PAGES

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; ...

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  6. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes.

    PubMed

    Aziz, Ramy K; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A

    2015-01-01

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  7. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    PubMed Central

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-01-01

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution. PMID:26005436

  8. Two lamprey Hedgehog genes share non-coding regulatory sequences and expression patterns with gnathostome Hedgehogs.

    PubMed

    Kano, Shungo; Xiao, Jin-Hua; Osório, Joana; Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-10-13

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences.

  9. Differential methylation of genes and retrotransposons facilitates shotgun sequencing of the maize genome.

    PubMed

    Rabinowicz, P D; Schutz, K; Dedhia, N; Yordan, C; Parnell, L D; Stein, L; McCombie, W R; Martienssen, R A

    1999-11-01

    The genomes of higher plants and animals are highly differentiated, and are composed of a relatively small number of genes and a large fraction of repetitive DNA. The bulk of this repetitive DNA constitutes transposable, and especially retrotransposable, elements. It has been hypothesized that most of these elements are heavily methylated relative to genes, but the evidence for this is controversial. We show here that repeat sequences in maize are largely excluded from genomic shotgun libraries by the selection of an appropriate host strain because of their sensitivity to bacterial restriction-modification systems. In contrast, unmethylated genic regions are preserved in these genetically filtered libraries if the insert size is less than the average size of genes. The representation of unique maize sequences not found in plant reference genomes is also greatly enriched. This demonstrates that repeats, and not genes, are the primary targets of methylation in maize. The use of restrictive libraries in genome shotgun sequencing in plant genomes should allow significant representation of genes, reducing the number of reactions required.

  10. Structure and nucleotide sequence of the rat intestinal vitamin D-dependent calcium binding protein gene.

    PubMed Central

    Krisinger, J; Darwish, H; Maeda, N; DeLuca, H F

    1988-01-01

    The vitamin D-dependent intestinal calcium binding protein (ICaBP, 9 kDa) is under transcriptional regulation by 1,25-dihydroxyvitamin D3 [1,25-(OH)2D3], the hormonally active form of the vitamin. To study the mechanism of gene regulation by 1,25-(OH)2D3, we isolated the rat ICaBP gene by using a cDNA probe. Its nucleotide sequence revealed 3 exons separated by 2 introns within approximately 3 kilobases. The first exon represents only noncoding sequences, while the second and third encode the two calcium binding domains of the protein. The gene contains a 15-base-pair imperfect palindrome in the first intron that shows high homology to the estrogen-responsive element. This sequence may represent the vitamin D-responsive element involved in the regulation of the ICaBP gene. The second intron shows an 84-base-pair-long simple nucleotide repeat that implicates Z-DNA formation. Genomic Southern analysis shows that the rat gene is represented as a single copy. PMID:3194402

  11. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    PubMed Central

    Yockteng, Roxana; Marthey, Sylvain; Chiapello, Hélène; Gendrault, Annie; Hood, Michael E; Rodolphe, François; Devier, Benjamin; Wincker, Patrick; Dossat, Carole; Giraud, Tatiana

    2007-01-01

    Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics. PMID:17692127

  12. Mutation Spectrum of Six Genes in Chinese Phenylketonuria Patients Obtained through Next-Generation Sequencing

    PubMed Central

    Cen, Zhong; Yu, Li; Lin, Lin; Hao, Jing; Yang, Zhigang; Peng, Jiabao; Cui, Shujian; Huang, Jian

    2014-01-01

    Background The identification of gene variants plays an important role in the diagnosis of genetic diseases. Methodology/Principal Findings To develop a rapid method for the diagnosis of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficiency, we designed a multiplex, PCR-based primer panel to amplify all the exons and flanking regions (50 bp average) of six PKU-associated genes (PAH, PTS, GCH1, QDPR, PCBD1 and GFRP). The Ion Torrent Personal Genome Machine (PGM) System was used to detect mutations in all the exons of these six genes. We tested 93 DNA samples from blood specimens from 35 patients and their parents (32 families) and 26 healthy adults. Using strict bioinformatic criteria, this sequencing data provided, on average, 99.14% coverage of the 39 exons at more than 70-fold mean depth of coverage. We found 23 previously documented variants in the PAH gene and six novel mutations in the PAH and PTS genes. A detailed analysis of the mutation spectrum of these patients is described in this study. Conclusions/Significance These results were confirmed by Sanger sequencing. In conclusion, benchtop next-generation sequencing technology can be used to detect mutations in monogenic diseases and can detect both point mutations and indels with high sensitivity, fidelity and throughput at a lower cost than conventional methods in clinical applications. PMID:24705691

  13. DNA sequences of three beta-1,4-endoglucanase genes from Thermomonospora fusca.

    PubMed Central

    Lao, G; Ghangas, G S; Jung, E D; Wilson, D B

    1991-01-01

    The DNA sequences of the Thermomonospora fusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonas fimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. PMID:1904434

  14. Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes.

    PubMed

    Soh, Y Q Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G; Graves, Tina; Minx, Patrick J; Fulton, Robert S; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L; Rozen, Steve; Hughes, Jennifer F; Owens, Elaine; Womack, James E; Murphy, William J; Cao, Qing; de Jong, Pieter; Warren, Wesley C; Wilson, Richard K; Skaletsky, Helen; Page, David C

    2014-11-06

    We sequenced the MSY (male-specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only 2% of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 45 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism.

  15. Large Gene Panel Sequencing in Clinical Diagnostics - Results from 501 Consecutive Cases.

    PubMed

    Pajusalu, Sander; Kahre, Tiina; Roomere, Hanno; Murumets, Ülle; Roht, Laura; Simenson, Kristi; Reimand, Tiia; Õunap, Katrin

    2017-04-05

    In addition to whole exomes, large gene panels of clinically associated genes are used as high-throughput sequencing tests in many clinical centers, but their clinical utility has been much less investigated. Here we report the results of the 501 first unselected cases for whom TruSight One panel (Illumina Inc.) was sequenced as a clinical diagnostic test for a variety of indications in our department. The analysis was restricted to virtual subpanels based on referral forms, where doctors were asked to list candidate genes or select one from predefined larger panels. A probable or definite pathogenic finding was reported in 26.3% of cases. In 238 samples for whom one to nine genes were requested for analysis, the diagnostic yield was significantly higher compared to other 263 cases for whom larger subpanels were requested (31.5% vs. 21.7% respectively, p = 0.016). Detected mutations included single nucleotide variants, small insertions and deletions, and larger copy number variants. Out of 157 reported mutations, 67 were previously undescribed. The clinical utility of large gene panel sequencing in the context of other genetic diagnostic tests is discussed in detail.
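
    The yield figures in this abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only: the positive counts are back-calculated from the reported percentages (an assumption, since the abstract gives no raw counts), and the Yates-corrected chi-square test is one plausible choice, because the abstract does not say which test produced p = 0.016. The function name `yates_chi2_2x2` is hypothetical.

```python
import math

def yates_chi2_2x2(a, b, c, d):
    """Yates-corrected chi-square statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = num / den
    # Survival function of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Assumption: positive counts recovered by rounding the reported percentages.
small_n, large_n = 238, 263                 # cases with 1-9 genes vs. larger subpanels
small_pos = round(0.315 * small_n)          # 75
large_pos = round(0.217 * large_n)          # 57

overall_yield = (small_pos + large_pos) / (small_n + large_n)
chi2, p = yates_chi2_2x2(small_pos, small_n - small_pos,
                         large_pos, large_n - large_pos)

print(f"overall yield: {overall_yield:.1%}")
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

    Under these assumptions the two subgroup counts add up to the overall yield reported for all 501 cases, and the p-value falls in the same range as the reported one; an exact test could give a slightly different value.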

  16. Sequence analysis of 21 genes located in the Kartagener syndrome linkage region on chromosome 15q.

    PubMed

    Geremek, Maciej; Schoenmaker, Frederieke; Zietkiewicz, Ewa; Pogorzelski, Andrzej; Diehl, Scott; Wijmenga, Cisca; Witt, Michal

    2008-06-01

    Primary ciliary dyskinesia (PCD) is a rare genetic disorder, which shows extensive genetic heterogeneity and is mostly inherited in an autosomal recessive fashion. There are four genes with a proven pathogenetic role in PCD. DNAH5 and DNAI1 are involved in 28 and 10% of PCD cases, respectively, while two other genes, DNAH11 and TXNDC3, have been identified as causal in one PCD family each. We have previously identified a 3.5 cM (2.82 Mb) region on chromosome 15q linked to Kartagener syndrome (KS), a subtype of PCD characterized by the randomization of body organ positioning. We have now refined the KS candidate region to a 1.8 Mb segment containing 18 known genes. The coding regions of these genes and three neighboring genes were subjected to sequence analysis in seven KS probands, and we were able to identify 60 single nucleotide sequence variants, 35 of which resided in mRNA coding sequences. However, none of the variations alone could explain the occurrence of the disease in these patients.

  17. Cloning and sequence analysis of the LOC339524 gene in Sprague-Dawley rats.

    PubMed

    Long, Z H; Li, H; Chen, F; Zou, L Y

    2015-12-11

    We cloned the LOC339524 gene in Sprague-Dawley (SD) rats and analyzed the structure and function of the protein encoded by it. Based on the known human LOC339524 gene sequences, the full-length coding sequence of the LOC339524 gene in SD rats was cloned and amplified by the polymerase chain reaction using the complementary DNA of SD rats as a template. Bioinformatics analysis showed that the length of the cloned LOC339524 gene (GenBank accession No. KM224520) was 831 bp and it encoded a deduced protein of 276 amino acids. Sequence analysis revealed that the coded protein was identical to that produced in humans and its functional domain was located in the 138-236 amino acid fragments, a proline-rich region. Our results suggest that the encoded protein may be a significant regulator of the inflammatory response and may provide sufficient information to justify an in-depth investigation of the role of the LOC339524 gene.
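
    The reported sequence length and protein length agree if the 831 bp coding sequence includes the stop codon. A minimal sketch of that check follows; the helper name `orf_length_bp` is illustrative and not taken from the study.

```python
def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Length in base pairs of an open reading frame encoding n_amino_acids."""
    # Each amino acid is encoded by one codon of three bases;
    # the stop codon adds one more codon when it is counted.
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons

print(orf_length_bp(276))                      # 831
print(orf_length_bp(276, include_stop=False))  # 828
```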

  18. Analysis of the multi-copied genes and the impact of the redundant protein coding sequences on gene annotation in prokaryotic genomes.

    PubMed

    Yu, Jia-Feng; Chen, Qing-Li; Ren, Jing; Yang, Yan-Ling; Wang, Ji-Hua; Sun, Xiao

    2015-07-07

    The important roles of duplicated genes in evolutionary processes have been recognized in bacteria, archaebacteria and eukaryotes, but multi-copied protein coding genes that share 100% sequence identity have received very little study. In this paper, the multi-copied protein coding genes in a number of prokaryotic genomes are first comprehensively analyzed. The results show that 0-15.93% of the protein coding genes in each genome are multi-copied genes and 0-16.49% of the protein coding genes in each genome are highly similar, with sequence identity ≥ 80%. Function and COG (Clusters of Orthologous Groups of proteins) analysis shows that 64.64% of multi-copied genes are associated with transposase function and 86.28% of the COG-assigned multi-copied genes fall under COG code 'L'. Furthermore, the impact of redundant protein coding sequences on the gene prediction results is studied. The results show that the problem of protein coding sequence redundancy cannot be ignored and that the consistency of the gene annotation results before and after excluding the redundant sequences is negatively related to the degree of sequence redundancy among the protein coding sequences in the training set.

  19. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    PubMed

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  20. Sequencing, physical organization and kinetic expression of the patulin biosynthetic gene cluster from Penicillium expansum.

    PubMed

    Tannous, Joanna; El Khoury, Rhoda; Snini, Selma P; Lippi, Yannick; El Khoury, André; Atoui, Ali; Lteif, Roger; Oswald, Isabelle P; Puel, Olivier

    2014-10-17

    Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown, with only a few characterized genes in less economically relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerate primers that were designed based on sequences from the orthologous genes available in other species. An improved genome walking approach was used in order to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60-70% identity with orthologous genes grouped differently, within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster gene expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of the mechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products.

  1. Sequencing and analysis of the gene-rich space of cowpea

    PubMed Central

    Timko, Michael P; Rushton, Paul J; Laudeman, Thomas W; Bokowiec, Marta T; Chipumuro, Edmond; Cheung, Foo; Town, Christopher D; Chen, Xianfeng

    2008-01-01

    Background Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO) with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. 
A total of 5,888 GSRs had

  2. Genome sequence of the phage-gene rich marine Phaeobacter arcticus type strain DSM 23566T

    PubMed Central

    Freese, Heike M.; Dalingault, Hajnalka; Petersen, Jörn; Pradella, Silke; Davenport, Karen; Teshima, Hazuki; Chen, Amy; Pati, Amrita; Ivanova, Natalia; Goodwin, Lynne A.; Chain, Patrick; Detter, John C.; Rohde, Manfred; Gronow, Sabine; Kyrpides, Nikos C.; Woyke, Tanja; Brinkhoff, Thorsten; Göker, Markus; Overmann, Jörg; Klenk, Hans-Peter

    2013-01-01

    Phaeobacter arcticus Zhang et al. 2008 belongs to the marine Roseobacter clade whose members are phylogenetically and physiologically diverse. In contrast to the type species of this genus, Phaeobacter gallaeciensis, which is well characterized, relatively little is known about the characteristics of P. arcticus. Here, we describe the features of this organism including the annotated high-quality draft genome sequence and highlight some particular traits. The 5,049,232 bp long genome with its 4,828 protein-coding and 81 RNA genes consists of one chromosome and five extrachromosomal elements. Prophage sequences identified via PHAST constitute nearly 5% of the bacterial chromosome and included a potential Mu-like phage as well as a gene-transfer agent (GTA). In addition, the genome of strain DSM 23566T encodes all of the genes necessary for assimilatory nitrate reduction. Phylogenetic analysis and intergenomic distances indicate that the classification of the species might need to be reconsidered. PMID:24501630

  3. Differential DNA sequence recognition is a determinant of specificity in homeotic gene action.

    PubMed Central

    Ekker, S C; von Kessler, D P; Beachy, P A

    1992-01-01

    The homeotic genes of Drosophila encode transcriptional regulatory proteins that specify distinct segment identities. Previous studies have implicated the homeodomain as a major determinant of biological specificity within these proteins, but have not established the physical basis of this specificity. We show here that the homeodomains encoded by the Ultrabithorax and Deformed homeotic genes bind optimally to distinct DNA sequences and have mapped the determinants responsible for differential recognition. We further show that relative transactivation by these two proteins in a simple in vivo system can differ by nearly two orders of magnitude. Such differences in DNA sequence recognition and target activation provide a biochemical basis for at least part of the biological specificity of homeotic gene action. PMID:1356765

  4. Coupled enhancer and coding sequence evolution of a homeobox gene shaped leaf diversity

    PubMed Central

    Vuolo, Francesco; Mentink, Remco A.; Hajheidari, Mohsen; Bailey, C. Donovan; Filatov, Dmitry A.; Tsiantis, Miltos

    2016-01-01

    Here we investigate mechanisms underlying the diversification of biological forms using crucifer leaf shape as an example. We show that evolution of an enhancer element in the homeobox gene REDUCED COMPLEXITY (RCO) altered leaf shape by changing gene expression from the distal leaf blade to its base. A single amino acid substitution evolved together with this regulatory change, which reduced RCO protein stability, preventing pleiotropic effects caused by its altered gene expression. We detected hallmarks of positive selection in these evolved regulatory and coding sequence variants and showed that modulating RCO activity can improve plant physiological performance. Therefore, interplay between enhancer and coding sequence evolution created a potentially adaptive path for morphological evolution. PMID:27852629

  5. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways

    PubMed Central

    Cirulli, Elizabeth T.; Lasseigne, Brittany N.; Petrovski, Slavé; Sapp, Peter C.; Dion, Patrick A.; Leblond, Claire S.; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J.; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E.; Boone, Braden E.; Wimbish, Jack R.; Waite, Lindsay L.; Jones, Angela L.; Carulli, John P.; Day-Williams, Aaron G.; Staropoli, John F.; Xin, Winnie W.; Chesi, Alessandra; Raphael, Alya R.; McKenna-Yasek, Diane; Cady, Janet; de Jong, J.M.B. Vianney; Kenna, Kevin P.; Smith, Bradley N.; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H.; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E.; Baloh, Robert H.; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M.; Gibson, Summer; Trojanowski, John Q.; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A.; Chung, Wendy K.; Ravits, John M.; Glass, Jonathan D.; Sims, Katherine B.; Van Deerlin, Vivianna M.; Maniatis, Tom; Hayes, Sebastian D.; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S.; Bedlack, Richard S.; Harper, J. Wade; Gitler, Aaron D.; Rouleau, Guy A.; Brown, Robert; Harms, Matthew B.; Cooper, Gregory M.; Harris, Tim; Myers, Richard M.; Goldstein, David B.

    2015-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. Here we report the results of a moderate-scale sequencing study aimed at identifying new genes contributing to predisposition for ALS. We performed whole exome sequencing of 2,874 ALS patients and compared them to 6,405 controls. Several known ALS genes were found to be associated, and the non-canonical IκB kinase family TANK-Binding Kinase 1 (TBK1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. PMID:25700176

  6. Gene activation properties of a mouse DNA sequence isolated by expression selection.

    PubMed Central

    von Hoyningen-Huene, V; Norbury, C; Griffiths, M; Fried, M

    1986-01-01

    The MES-1 element was previously isolated from restricted total mouse cellular DNA by "expression selection", the ability to reactivate expression of a test gene devoid of its 5' enhancer sequences. MES-1 has been tested in long-term transformation and short-term CAT expression assays. In both assays MES-1 is active independent of orientation and at a distance when placed 5' to the test gene. The element is active with heterologous promoters and functions efficiently in both rat and mouse cells. MES-1 activates expression by increasing transcription from the test gene's own start (cap) site. Thus the expression selection technique can be used for the isolation of DNA sequences with enhancer-like properties from total cellular DNA. PMID:3016657

  7. Exome Sequencing Reveals Cubilin Mutation as a Single-Gene Cause of Proteinuria

    PubMed Central

    Ovunc, Bugsu; Otto, Edgar A.; Vega-Warner, Virginia; Saisawat, Pawaree; Ashraf, Shazia; Ramaswami, Gokul; Fathy, Hanan M.; Schoeb, Dominik; Chernin, Gil; Lyons, Robert H.; Yilmaz, Engin

    2011-01-01

    In two siblings of consanguineous parents with intermittent nephrotic-range proteinuria, we identified a homozygous deleterious frameshift mutation in the gene CUBN, which encodes cubilin, using exome capture and massively parallel re-sequencing. The mutation segregated with affected members of this family and was absent from 92 healthy individuals, thereby identifying a recessive mutation in CUBN as the single-gene cause of proteinuria in this sibship. Cubilin mutations cause a hereditary form of megaloblastic anemia secondary to vitamin B12 deficiency, and proteinuria occurs in 50% of cases since cubilin is a coreceptor for both the intestinal vitamin B12-intrinsic factor complex and the tubular reabsorption of protein in the proximal tubule. In summary, we report successful use of exome capture and massively parallel re-sequencing to identify a rare, single-gene cause of nephropathy. PMID:21903995

  8. Exome sequencing reveals cubilin mutation as a single-gene cause of proteinuria.

    PubMed

    Ovunc, Bugsu; Otto, Edgar A; Vega-Warner, Virginia; Saisawat, Pawaree; Ashraf, Shazia; Ramaswami, Gokul; Fathy, Hanan M; Schoeb, Dominik; Chernin, Gil; Lyons, Robert H; Yilmaz, Engin; Hildebrandt, Friedhelm

    2011-10-01

    In two siblings of consanguineous parents with intermittent nephrotic-range proteinuria, we identified a homozygous deleterious frameshift mutation in the gene CUBN, which encodes cubilin, using exome capture and massively parallel re-sequencing. The mutation segregated with affected members of this family and was absent from 92 healthy individuals, thereby identifying a recessive mutation in CUBN as the single-gene cause of proteinuria in this sibship. Cubilin mutations cause a hereditary form of megaloblastic anemia secondary to vitamin B12 deficiency, and proteinuria occurs in 50% of cases since cubilin is a coreceptor for both the intestinal vitamin B12-intrinsic factor complex and the tubular reabsorption of protein in the proximal tubule. In summary, we report successful use of exome capture and massively parallel re-sequencing to identify a rare, single-gene cause of nephropathy.

  9. DNA Sequence Heterogeneity of Campylobacter jejuni CJIE4 Prophages and Expression of Prophage Genes

    PubMed Central

    Clark, Clifford G.; Chong, Patrick M.; McCorrister, Stuart J.; Mabon, Philip; Walker, Matthew; Westmacott, Garrett R.

    2014-01-01

    Campylobacter jejuni carry temperate bacteriophages that can affect the biology or virulence of the host bacterium. Known effects include genomic rearrangements and resistance to DNA transformation. C. jejuni prophage CJIE1 shows sequence variability and variability in the content of morons. Homologs of the CJIE1 prophage enhance both adherence to and invasion of cells in culture and increase the expression of a specific subset of bacterial genes. Other C. jejuni temperate phages have so far not been well characterized. In this study we describe investigations into the DNA sequence variability and protein expression in a second prophage, CJIE4. CJIE4 sequences were obtained de novo from DNA sequencing of five C. jejuni isolates, as well as from whole genome sequences submitted to GenBank by other research groups. These CJIE4 DNA sequences were heterogeneous, with several different insertions/deletions (indels) in different parts of the prophage genome. Two variants of a 3–4 kb region inserted within CJIE4 had different gene content that distinguished two major conserved CJIE4 prophage families. Additional indels were detected throughout the prophage. Detection of proteins in the five isolates characterized in our laboratory in isobaric Tags for Relative and Absolute Quantitation (iTRAQ) experiments indicated that prophage proteins within each of the two large indel variants were expressed during growth of the bacteria on Mueller Hinton agar plates. These proteins included the extracellular DNase associated with resistance to DNA transformation and prophage repressor proteins. Other proteins associated with known or suspected roles in prophage biology were also expressed from CJIE4, including capsid protein, the phage integrase, and MazF, a type II toxin-antitoxin system protein.
Together with the results previously obtained for the CJIE1 prophage these results demonstrate that sequence variability and expression of moron genes are both general properties of temperate

  10. Phylogenetic analysis of sequences from diverse bacteria with homology to the Escherichia coli rho gene.

    PubMed Central

    Opperman, T; Richardson, J P

    1994-01-01

    Genes from Pseudomonas fluorescens, Chromatium vinosum, Micrococcus luteus, Deinococcus radiodurans, and Thermotoga maritima with homology to the Escherichia coli rho gene were cloned and sequenced, and their sequences were compared with other available sequences. The species for all of the compared sequences are members of five bacterial phyla, including Thermotogales, the most deeply diverged phylum. This suggests that a rho-like gene is ubiquitous in the Bacteria and was present in their common ancestor. The comparative analysis revealed that the Rho homologs are highly conserved, exhibiting a minimum identity of 50% of their amino acid residues in pairwise comparisons. The ATP-binding domain had a particularly high degree of conservation, consisting of some blocks with sequences of residues that are very similar to segments of the alpha and beta subunits of F1-ATPase and of other blocks with sequences that are unique to Rho. The RNA-binding domain is more diverged than the ATP-binding domain. However, one of its most highly conserved segments includes an RNP1-like sequence, which is known to be involved in RNA binding. Overall, the degree of similarity is lowest in the first 50 residues (the first half of the RNA-binding domain), in the putative connector region between the RNA-binding and the ATP-binding domains, and in the last 50 residues of the polypeptide. Since functionally defective mutants for E. coli Rho exist in all three of these segments, they represent important parts of Rho that have undergone adaptive evolution. PMID:8051015

  11. Species identification using genetic tools: the value of nuclear and mitochondrial gene sequences in whale conservation.

    PubMed

    Palumbi, S R; Cipriano, F

    1998-01-01

    DNA sequence analysis is a powerful tool for identifying the source of samples thought to be derived from threatened or endangered species. Analysis of mitochondrial DNA (mtDNA) from retail whale meat markets has shown consistently that the expected baleen whale in these markets, the minke whale, makes up only about half the products analyzed. The other products are either unregulated small toothed whales like dolphins or are protected baleen whales such as humpback, Bryde's, fin, or blue whales. Independent verification of such mtDNA identifications requires analysis of nuclear genetic loci, but this is technically more difficult than standard mtDNA sequencing. In addition, evolution of species-specific sequences (i.e., fixation of sequence differences to produce reciprocally monophyletic gene trees) is slower in nuclear than in mitochondrial genes primarily because genetic drift is slower at nuclear loci. When will use of nuclear sequences allow forensic DNA identification? Comparison of neutral theories of coalescence of mitochondrial and nuclear loci suggests a simple rule of thumb. The "three-times rule" suggests that phylogenetic sorting at nuclear loci is likely to produce species-specific sequences when mitochondrial alleles are reciprocally monophyletic and the branches leading to the mtDNA sequences of a species are three times longer than the average difference observed within species. A preliminary test of the three-times rule, which depends on many assumptions about the species and genes involved, suggests that blue and fin whales should have species-specific sequences at most neutral nuclear loci, whereas humpback and fin whales should show species-specific sequences at fewer nuclear loci. Partial sequences of actin introns from these species confirm the predictions of the three-times rule and show that blue and fin whales are reciprocally monophyletic at this locus. These intron sequences are thus good tools for the identification of these species
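
    The "three-times rule" described above can be written as a small check. This is only a sketch: the function name, the use of "at least three times" as the threshold comparison, and the numeric values are illustrative assumptions, not taken from the paper.

```python
def passes_three_times_rule(branch_length: float,
                            mean_within_species_diff: float,
                            reciprocally_monophyletic: bool) -> bool:
    """Sketch of the three-times rule for mtDNA data.

    branch_length: length of the branch leading to a species' mtDNA sequences.
    mean_within_species_diff: average mtDNA difference observed within the species.
    Both must be in the same units (e.g. substitutions per site).
    Assumption: "three times longer" is treated as >= 3x.
    """
    if mean_within_species_diff < 0 or branch_length < 0:
        raise ValueError("distances must be non-negative")
    return reciprocally_monophyletic and branch_length >= 3 * mean_within_species_diff


# Hypothetical values, for illustration only.
print(passes_three_times_rule(0.06, 0.01, True))   # long branch, low diversity
print(passes_three_times_rule(0.02, 0.01, True))   # branch too short
```

    When the check returns True, the rule of thumb suggests that nuclear loci are likely to carry species-specific sequences as well; the abstract notes that this depends on many assumptions about the species and genes involved.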

  12. Sequence analysis of the phosphoprotein gene of peste des petits ruminants (PPR) virus: editing of the gene transcript.

    PubMed

    Mahapatra, Madhuchhanda; Parida, Satya; Egziabher, Berhe G; Diallo, Adama; Barrett, Tom

    2003-10-01

    The gene encoding the phosphoprotein of the vaccine strain of Peste des petits ruminants (PPR) virus (Nigeria 75/1 vaccine strain) has been cloned and its nucleotide sequence determined. This gene is 1655 nucleotides long and encodes two overlapping open reading frames (ORFs). Translation from the first AUG would produce a polypeptide of 509 amino acid residues with a predicted molecular mass of 54.9 kDa, the longest of the published morbillivirus P proteins. Translation from the second AUG would produce a protein of 177 amino acid residues with a predicted molecular mass of 20.3 kDa, analogous to the C proteins of other morbilliviruses. Evidence was found for the production of two types of P mRNA transcript, one a faithful transcript of the gene and the other with an extra G residue inserted at position 751. Translation from the first AUG of this second mRNA would produce a protein of 298 amino acids, with a predicted molecular mass of 32.3 kDa, analogous to the V protein produced by other morbilliviruses. Sequences of the predicted P, C and V proteins were compared with those of the other morbillivirus sequences available to date. The P protein was found to be the most poorly conserved of the morbillivirus proteins, the amino acid identity ranging from 54% in the case of Canine distemper virus (CDV) to 60% in the case of the Dolphin morbillivirus (DMV).

  13. Deep sequencing identifies viral and wasp genes with potential roles in replication of Microplitis demolitor Bracovirus.

    PubMed

    Burke, Gaelen R; Strand, Michael R

    2012-03-01

    Viruses in the genus Bracovirus (BV) (Polydnaviridae) are symbionts of parasitoid wasps that specifically replicate in the ovaries of females. Recent analysis of expressed sequence tags from two wasp species, Cotesia congregata and Chelonus inanitus, identified transcripts related to 24 different nudivirus genes. These results together with other data strongly indicate that BVs evolved from a nudivirus ancestor. However, it remains unclear whether BV-carrying wasps contain other nudivirus-like genes and what types of wasp genes may also be required for BV replication. Microplitis demolitor carries Microplitis demolitor bracovirus (MdBV). Here we characterized MdBV replication and performed massively parallel sequencing of M. demolitor ovary transcripts. Our results indicated that MdBV replication begins in stage 2 pupae and continues in adults. Analysis of prereplication- and active-replication-stage ovary RNAs yielded 22 Gb of sequence that assembled into 66,425 transcripts. This breadth of sampling indicated that a large percentage of genes in the M. demolitor genome were sequenced. A total of 41 nudivirus-like transcripts were identified, of which a majority were highly expressed during MdBV replication. Our results also identified a suite of wasp genes that were highly expressed during MdBV replication. Among these products were several transcripts with conserved roles in regulating locus-specific DNA amplification by eukaryotes. Overall, our data set together with prior results likely identify the majority of nudivirus-related genes that are transcriptionally functional during BV replication. Our results also suggest that amplification of proviral DNAs for packaging into BV virions may depend upon the replication machinery of wasps.

  14. An atlas of human gene expression from massively parallel signature sequencing (MPSS)

    PubMed Central

    Jongeneel, C. Victor; Delorenzi, Mauro; Iseli, Christian; Zhou, Daixing; Haudenschild, Christian D.; Khrebtukova, Irina; Kuznetsov, Dmitry; Stevenson, Brian J.; Strausberg, Robert L.; Simpson, Andrew J.G.; Vasicek, Thomas J.

    2005-01-01

    We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com. PMID:15998913

  15. Cloning, sequencing and application of the LEU2 gene from the sour dough yeast Candida milleri.

    PubMed

    Turakainen, Hilkka; Korhola, Matti

    2005-07-30

    We have cloned by complementation in Saccharomyces cerevisiae and sequenced a LEU2 gene from the sour dough yeast Candida milleri CBS 8195 and studied its chromosomal location. The LEU2 coding sequence was 1092 nt long encoding a putative beta-isopropylmalate dehydrogenase protein of 363 amino acids. The nucleotide sequence in the coding region had 71.6% identity to S. cerevisiae LEU2 sequence. On the protein level, the identity of C. milleri Leu2p to S. cerevisiae Leu2p was 84.1%. The CmLEU2 DNA probe hybridized to one to three chromosomal bands and two or three BamHI restriction fragments in C. milleri but did not give any signal to chromosomes or restriction fragments of C. albicans, S. cerevisiae, S. exiguus or Torulaspora delbrueckii. Using CmLEU2 probe for DNA hybridization makes it easy to quickly identify C. milleri among other sour dough yeasts.
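
    The reported lengths are consistent with each other if the 1092-nt coding sequence is assumed to include the stop codon (an assumption; the abstract does not say so explicitly). A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def protein_length_from_cds(cds_nt: int) -> int:
    """Amino acid count for a coding sequence whose length includes the stop codon."""
    if cds_nt < 3 or cds_nt % 3 != 0:
        raise ValueError("coding sequence length must be a positive multiple of 3")
    # One codon per amino acid, minus the terminal stop codon.
    return cds_nt // 3 - 1


print(protein_length_from_cds(1092))  # 363
```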

  16. High occurrence of functional new chimeric genes in survey of rice chromosome 3 short arm genome sequences.

    PubMed

    Zhang, Chengjun; Wang, Jun; Marowsky, Nicholas C; Long, Manyuan; Wing, Rod A; Fan, Chuanzhu

    2013-01-01

    In an effort to identify newly evolved genes in rice, we searched the genomes of Asian-cultivated rice Oryza sativa ssp. japonica and its wild progenitors, looking for lineage-specific genes. Using genome pairwise comparison of approximately 20-Mb DNA sequences from the chromosome 3 short arm (Chr3s) in six rice species, O. sativa, O. nivara, O. rufipogon, O. glaberrima, O. barthii, and O. punctata, combined with synonymous substitution rate tests and other evidence, we were able to identify potential recently duplicated genes, which evolved within the last 1 Myr. We identified 28 functional O. sativa genes, which likely originated after O. sativa diverged from O. glaberrima. These genes account for around 1% (28/3,176) of all annotated genes on O. sativa's Chr3s. Among the 28 new genes, two recently duplicated segments contained eight genes. Fourteen of the 28 new genes have a chimeric gene structure derived from one or multiple parental genes and flanking targeting sequences. Although the majority of these 28 new genes were formed by single or segmental DNA-based gene duplication and recombination, we found two genes that likely originated partially through exon shuffling. Sequence divergence tests between new genes and their putative progenitors indicated that new genes were most likely evolving under natural selection. We showed all 28 new genes appeared to be functional, as suggested by Ka/Ks analysis and the presence of RNA-seq, cDNA, expressed sequence tag, massively parallel signature sequencing, and/or small RNA data. The high rate of new gene origination and of chimeric gene formation in rice may demonstrate rice's broad diversification, domestication, its environmental adaptation, and the role of new genes in rice speciation.

  17. Sequence length polymorphisms within primate amelogenin and amelogenin-like genes: usefulness in sex determination.

    PubMed

    Morrill, Benson H; Rickords, Lee F; Schafstall, Heather J

    2008-10-01

    Sequence length polymorphisms between the amelogenin (AMELX) and the amelogenin-like (AMELY) genes both within and between several mammalian species have been identified and utilized for sex determination, species identification, and to elucidate evolutionary relationships. Sex determination via polymerase chain reaction (PCR) assays of the AMELX and AMELY genes has been successful in greater apes, prosimians, and two species of old world monkeys. To date, no sex determination PCR assay using AMELX and AMELY has been developed for new world monkeys. In this study, we present partial AMELX and AMELY sequences for five old world monkey species (Mandrillus sphinx, Macaca nemestrina, Macaca fuscata, Macaca mulatta, and Macaca fascicularis) along with primer sets that can be used for sex determination of these five species. In addition, we compare the sequences we generated with other primate AMELX and AMELY sequences available on GenBank and discuss sequence length polymorphisms and their usefulness in sex determination within primates. The mandrill and four species of macaque all share two similar deletion regions with each other, the human, and the chimpanzee in the region sequenced. These two deletion regions are 176-181 and 8 nucleotides in length. In analyzing existing primate sequences on GenBank, we also discovered that a separate six-nucleotide polymorphism located approximately 300 nucleotides upstream of the 177 nucleotide polymorphism in sequences of humans and chimps was also present in two species of new world monkeys (Saimiri boliviensis and Saimiri sciureus). We designed primers that incorporate this polymorphism, creating the first AMELX and AMELY PCR primer set that has been used successfully to generate two bands in a new world monkey species.

  18. The human ATP synthase beta subunit gene: sequence analysis, chromosome assignment, and differential expression.

    PubMed

    Neckelmann, N; Warner, C K; Chung, A; Kudoh, J; Minoshima, S; Fukuyama, R; Maekawa, M; Shimizu, Y; Shimizu, N; Liu, J D

    1989-11-01

    In humans, the functional F0F1-ATP synthase beta subunit gene is located on chromosome 12 in the p13→qter region. Other partially homologous sequences have been detected on chromosomes 2 and 17. The bona fide beta subunit gene has 10 exons encoding a leader peptide of 49 amino acids and a mature protein of 480 amino acids. Thirteen Alu family DNA repeats are found upstream from the gene and in four introns. The gene has four "CCAAT" sequences upstream and in close proximity to the transcriptional initiation site. A 13-bp motif is found in the 5' nontranscribed region of both the beta subunit gene and an ADP/ATP translocator gene that is expressed in high levels in cardiac and skeletal muscle. Analysis of the beta subunit mRNA levels reveals marked differences among tissues. The highest levels are found in heart, lower levels in skeletal muscle, and the lowest levels in liver and kidney. These findings suggest that the tissue-specific levels of ATP synthase beta subunit mRNA may be generated through transcriptional control.

  19. Maize R2R3 Myb genes: Sequence analysis reveals amplification in the higher plants.

    PubMed

    Rabinowicz, P D; Braun, E L; Wolfe, A D; Bowen, B; Grotewold, E

    1999-09-01

    Transcription factors containing the Myb-homologous DNA-binding domain are widely found in eukaryotes. In plants, R2R3 Myb-domain proteins are involved in the control of form and metabolism. The Arabidopsis genome harbors >100 R2R3 Myb genes, but few have been found in monocots, animals, and fungi. Using RT-PCR from different maize organs, we cloned 480 fragments corresponding to a 42-44 residue-long sequence spanning the region between the conserved DNA-recognition helices (Myb(BRH)) of R2R3 Myb domains. We determined that maize expresses >80 different R2R3 Myb genes, and evolutionary distances among maize Myb(BRH) sequences indicate that most of the amplification of the R2R3 Myb gene family occurred after the origin of land plants but prior to the separation of monocots and dicots. In addition, evidence is provided for the very recent duplication of particular classes of R2R3 Myb genes in the grasses. Together, these findings render a novel line of evidence for the amplification of the R2R3 Myb gene family in the early history of land plants and suggest that maize provides a possible model system to examine the hypothesis that the expansion of Myb genes is associated with the regulation of novel plant cellular functions.

  20. Challenges in identifying cancer genes by analysis of exome sequencing data

    PubMed Central

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F.; Bandyopadhyay, Sourav; Mischel, Paul S.; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13–60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679
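
    The recall figures quoted above follow the usual definition: the fraction of known cancer genes that a method recovers. A minimal sketch, with a hypothetical function name and made-up counts used only to illustrate the calculation:

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of known positives (here, known cancer genes) that were recovered."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no known positives to recover")
    return true_positives / total


# Illustrative counts only, not data from the study:
# a method recovering 13 of 100 known cancer genes has 13% recall.
print(f"{recall(13, 87):.0%}")
```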

  1. Sequence breakpoints in the aflatoxin biosynthesis gene cluster and flanking regions in nonaflatoxigenic Aspergillus flavus isolates.

    PubMed

    Chang, Perng-Kuang; Horn, Bruce W; Dorner, Joe W

    2005-11-01

    Aspergillus flavus populations are genetically diverse. Isolates that produce either, neither, or both aflatoxins and cyclopiazonic acid (CPA) are present in the field. We investigated defects in the aflatoxin gene cluster in 38 nonaflatoxigenic A. flavus isolates collected from the southern United States. PCR assays using aflatoxin-gene-specific primers grouped these isolates into eight (A-H) deletion patterns. Patterns C, E, G, and H, which contain 40 kb deletions, were examined for their sequence breakpoints. Pattern C has one breakpoint in the cypA 3' untranslated region (UTR) and another in the verA coding region. Pattern E has a breakpoint in the amdA coding region and another in the ver1 5' UTR. Pattern G contains a deletion identical to the one found in pattern C and has another deletion that extends from the cypA coding region to one end of the chromosome as suggested by the presence of telomeric sequence repeats, CCCTAATGTTGA. Pattern H has a deletion of the entire aflatoxin gene cluster from the hexA coding region in the sugar utilization gene cluster to the telomeric region. Thus, deletions in the aflatoxin gene cluster among A. flavus isolates are not rare, and the patterns appear to be diverse. Genetic drift may be a driving force that is responsible for the loss of the entire aflatoxin gene cluster in nonaflatoxigenic A. flavus isolates when aflatoxins have lost their adaptive value in nature.

  2. Characterization and sequence analysis of the human homeobox-containing gene GBX2

    SciTech Connect

    Lin, Xu; Vaccarino, F.M.; Haas, M.

    1996-02-01

    Polymerase chain reaction (PCR) was used to amplify portions of homeobox genes present in a human 11-week fetal brain cDNA library. One of these PCR products was determined by sequencing to be the Gastrulation and brain specific-2 gene (GBX2). Screening this human fetal brain cDNA library with probes specific for GBX2 led to the identification of a 2151-bp clone that encodes a protein of 347 amino acid residues. The amino acid sequence of the GBX2 homeodomain is identical (100%) to that of the homologous gene, Gbx2, expressed in the developing mouse embryo and virtually identical (97%) to a gene expressed in the developing chicken embryo, CHox7. The 5' end of the GBX2 gene contains a CpG island in the untranslated region and a trinucleotide (CCG)8 repeat in the coding region. The amino-terminal end of the GBX2 protein is proline-rich, with 30 proline residues in one stretch of 120 by Northern analysis in the developing human CNS as well as in other tissues. The human genomic clone for GBX2 was also isolated, characterized, and mapped to 2q36(d)-q37 by somatic cell hybrid analysis and fluorescence in situ hybridization. These studies provide a framework for designing future experiments that are needed to determine the functional significance of this gene in CNS development. 38 refs., 4 figs.

  3. Nucleotide Sequence and Gene Organization of the Starfish Asterina Pectinifera Mitochondrial Genome

    PubMed Central

    Asakawa, S.; Himeno, H.; Miura, K. I.; Watanabe, K.

    1995-01-01

    The 16,260-bp mitochondrial DNA (mtDNA) from the starfish Asterina pectinifera has been sequenced. The genes for 13 proteins, two rRNAs and 22 tRNAs are organized in an extremely economical fashion, similar to those of other animal mtDNAs, with some of the genes overlapping each other. The gene organization is the same as that for another echinoderm, sea urchin, except for the inversion of a 4.6-kb segment that contains genes for two proteins, 13 tRNAs and the 16S rRNA. Judging from the organization of the protein coding genes, mammalian mtDNAs resemble the sea urchin mtDNA more than that of the starfish. The region around the 3' end of the 12S rRNA gene of the starfish shows a high similarity with those for vertebrates. This region encodes a possible stem and loop structure; similar potential structures occur in this region of vertebrate mtDNAs and also in nonmitochondrial small subunit rRNA. A similar stem and loop structure is also found at the 3' end of the 16S rRNA genes in A. pectinifera, in another starfish Pisaster ochraceus, in vertebrates and in Drosophila, but not in sea urchins. The full sequence data confirm the presumption that AGA/AGG, AUA and AAA codons, respectively, code for serine, isoleucine, and asparagine in the starfish mitochondria, and that AGA/AGG codons are read by tRNA(GCU)(Ser), which possesses a truncated dihydrouridine arm, that was previously suggested from a partial mtDNA sequence. The structural characteristics of tRNAs and possible mechanisms for the change in the mitochondrial genetic code are also discussed. PMID:7672576
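
    The codon reassignments reported in this abstract can be illustrated with a short translation sketch. It starts from the standard genetic code and applies only the assignments named above (AGA/AGG to serine, AUA to isoleucine, AAA to asparagine); it is not a complete echinoderm mitochondrial table, and the example sequence and function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

standard = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Apply only the assignments stated in the abstract (illustrative, not a full table).
starfish = dict(standard)
starfish.update({"AGA": "S", "AGG": "S", "AUA": "I", "AAA": "N"})


def translate(rna, table):
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(table[rna[i:i + 3]] for i in range(0, usable, 3))


example = "AUGAGAAGGAUAAAA"  # hypothetical sequence, not taken from the paper
print(translate(example, standard))  # MRRIK
print(translate(example, starfish))  # MSSIN
```

    Note that AUA already encodes isoleucine in the standard code, so only the AGA/AGG and AAA codons change the output here.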

  4. Identification of candidate genes in rice for resistance to sheath blight disease by whole genome sequencing.

    PubMed

    Silva, James; Scheffler, Brian; Sanabria, Yamid; De Guzman, Christian; Galam, Dominique; Farmer, Andrew; Woodward, Jimmy; May, Gregory; Oard, James

    2012-01-01

    Recent advances in whole genome sequencing (WGS) have allowed identification of genes for disease susceptibility in humans. The objective of our research was to exploit whole genome sequences of 13 rice (Oryza sativa L.) inbred lines to identify non-synonymous SNPs (nsSNPs) and candidate genes for resistance to sheath blight, a disease of worldwide significance. WGS by the Illumina GA IIx platform produced an average 5× coverage with ~700 K variants detected per line when compared to the Nipponbare reference genome. Two filtering strategies were developed to identify nsSNPs between two groups of known resistant and susceptible lines. A total of 333 nsSNPs detected in the resistant lines were absent in the susceptible group. Selected variants associated with resistance were found in 11 of 12 chromosomes. More than 200 genes with selected nsSNPs were assigned to 42 categories based on gene family/gene ontology. Several candidate genes belonged to families reported in previous studies, and three new regions with novel candidates were also identified. A subset of 24 nsSNPs detected in 23 genes was selected for further study. Individual alleles of the 24 nsSNPs were evaluated by PCR whose presence or absence corresponded to known resistant or susceptible phenotypes of nine additional lines. Sanger sequencing confirmed presence of 12 selected nsSNPs in two lines. "Resistant" nsSNP alleles were detected in two accessions of O. nivara that suggests sources for resistance occur in additional Oryza sp. Results from this study provide a foundation for future basic research and marker-assisted breeding of rice for sheath blight resistance.

  5. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in the intron 1. In 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was identically observed only in ovine, and one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. Especially the octapeptide repeat region was observed in all the species but frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences.

  6. Molecular Diagnostics of Gliomas Using Next Generation Sequencing of a Glioma-Tailored Gene Panel.

    PubMed

    Zacher, Angela; Kaulich, Kerstin; Stepanow, Stefanie; Wolter, Marietta; Köhrer, Karl; Felsberg, Jörg; Malzkorn, Bastian; Reifenberger, Guido

    2017-03-01

    Current classification of gliomas is based on histological criteria according to the World Health Organization (WHO) classification of tumors of the central nervous system. Over the past years, characteristic genetic profiles have been identified in various glioma types. These can refine tumor diagnostics and provide important prognostic and predictive information. We report on the establishment and validation of gene panel next generation sequencing (NGS) for the molecular diagnostics of gliomas. We designed a glioma-tailored gene panel covering 660 amplicons derived from 20 genes frequently aberrant in different glioma types. Sensitivity and specificity of glioma gene panel NGS for detection of DNA sequence variants and copy number changes were validated by single gene analyses. NGS-based mutation detection was optimized for application on formalin-fixed paraffin-embedded tissue specimens including small stereotactic biopsy samples. NGS data obtained in a retrospective analysis of 121 gliomas allowed for their molecular classification into distinct biological groups, including (i) isocitrate dehydrogenase gene (IDH) 1 or 2 mutant astrocytic gliomas with frequent α-thalassemia/mental retardation syndrome X-linked (ATRX) and tumor protein p53 (TP53) gene mutations, (ii) IDH mutant oligodendroglial tumors with 1p/19q codeletion, telomerase reverse transcriptase (TERT) promoter mutation and frequent Drosophila homolog of capicua (CIC) gene mutation, as well as (iii) IDH wildtype glioblastomas with frequent TERT promoter mutation, phosphatase and tensin homolog (PTEN) mutation and/or epidermal growth factor receptor (EGFR) amplification. Oligoastrocytic gliomas were genetically assigned to either of these groups. Our findings implicate gene panel NGS as a promising diagnostic technique that may facilitate integrated histological and molecular glioma classification.

  7. Post-transcriptional mending of gene sequences: Looking under the hood of mitochondrial gene expression in diplonemids.

    PubMed

    Valach, Matus; Moreira, Sandrine; Faktorová, Drahomíra; Lukeš, Julius; Burger, Gertraud

    2016-12-01

    The instructions to make proteins and structural RNAs are laid down in gene sequences. Yet, in certain instances, these primary instructions need to be modified considerably during gene expression, most often at the transcript level. Here we review a case of massive post-transcriptional revisions via trans-splicing and RNA editing, a phenomenon occurring in mitochondria of a recently recognized protist group, the diplonemids. As of now, the various post-transcriptional steps have been cataloged in detail, but how these processes function is still unknown. Since genetic manipulation techniques such as gene replacement and RNA interference have not yet been established for these organisms, alternative strategies have to be deployed. Here, we discuss the experimental and bioinformatics approaches that promise to unravel the molecular machineries of trans-splicing and RNA editing in Diplonema mitochondria.

  8. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  9. Sequence Diversity of VP4 and VP7 Genes of Human Rotavirus Strains in Saudi Arabia.

    PubMed

    Abdel-Moneim, Ahmed S; Al-Malky, Mater I R; Alsulaimani, Adnan A A; Abuelsaad, Abdelaziz S A; Mohamed, Imad; Ismail, Ayman K

    2015-12-01

    Group A rotavirus is responsible for inducing severe diarrhea in young children worldwide. Rotavirus vaccines are used to control the disease in many countries. In the current study, the sequences of human rotavirus G and P types in Saudi Arabia are reported and compared to different relevant published sequences. In addition, the VP4 and VP7 genes of the G1P[8] strains are compared to different antigenic epitopes of the rotavirus vaccines. Stool samples were collected from children under 2 years suffering from severe diarrhea. Screening of the rotavirus-positive samples was performed with rapid antigen detection kit. RNA was amplified from rotavirus-positive samples by reverse transcriptase polymerase chain reaction assay for both VP4 and VP7 genes. Direct sequencing of the VP4 and VP7 genes was conducted and the obtained sequences were compared to each other and to the rotavirus vaccines. Both G1P[8] and G1P[4] genotypes were detected. Phylogenetic analysis revealed that the detected strains belong to G1 lineage 1 and 2, P[8] lineage 3, and to P[4] lineage 5. Multiple amino acid substitutions were detected between the Saudi RVA strains and the commonly used vaccines. The current findings emphasize the importance of the continuous surveillance of the circulating rotavirus strains, which is crucial for monitoring virus evolution and helping in predicting the protection level afforded by rotavirus vaccines.

  10. Sequence and analysis of the gene for bacteriophage T3 RNA polymerase.

    PubMed Central

    McGraw, N J; Bailey, J N; Cleaves, G R; Dembinski, D R; Gocke, C R; Joliffe, L K; MacWright, R S; McAllister, W T

    1985-01-01

    The RNA polymerases encoded by bacteriophages T3 and T7 have similar structures, but exhibit nearly exclusive template specificities. We have determined the nucleotide sequence of the region of T3 DNA that encodes the T3 RNA polymerase (the gene 1.0 region), and have compared this sequence with the corresponding region of T7 DNA. The predicted amino acid sequence of the T3 RNA polymerase exhibits very few changes when compared to the T7 enzyme (82% of the residues are identical). Significant differences appear to cluster in three distinct regions in the amino-terminal half of the protein. Analysis of the data from both enzymes suggests features that may be important for polymerase function. In particular, a region that differs between the T3 and T7 enzymes exhibits significant homology to the bi-helical domain that is common to many sequence-specific DNA binding proteins. The region that flanks the structural gene contains a number of regulatory elements including: a promoter for the E. coli RNA polymerase, a potential processing site for RNase III and a promoter for the T3 polymerase. The promoter for the T3 RNA polymerase is located only 12 base pairs distal to the stop codon for the structural gene. PMID:3903658

  11. Increased functional protein expression using nucleotide sequence features enriched in highly expressed genes in zebrafish.

    PubMed

    Horstick, Eric J; Jordan, Diana C; Bergeron, Sadie A; Tabor, Kathryn M; Serpe, Mihaela; Feldman, Benjamin; Burgess, Harold A

    2015-04-20

    Many genetic manipulations are limited by difficulty in obtaining adequate levels of protein expression. Bioinformatic and experimental studies have identified nucleotide sequence features that may increase expression, however it is difficult to assess the relative influence of these features. Zebrafish embryos are rapidly injected with calibrated doses of mRNA, enabling the effects of multiple sequence changes to be compared in vivo. Using RNAseq and microarray data, we identified a set of genes that are highly expressed in zebrafish embryos and systematically analyzed for enrichment of sequence features correlated with levels of protein expression. We then tested enriched features by embryo microinjection and functional tests of multiple protein reporters. Codon selection, releasing factor recognition sequence and specific introns and 3' untranslated regions each increased protein expression between 1.5- and 3-fold. These results suggested principles for increasing protein yield in zebrafish through biomolecular engineering. We implemented these principles for rational gene design in software for codon selection (CodonZ) and plasmid vectors incorporating the most active non-coding elements. Rational gene design thus significantly boosts expression in zebrafish, and a similar approach will likely elevate expression in other animal models.

  12. Comparison of exon 5 sequences from 35 class I genes of the BALB/c mouse

    PubMed Central

    1989-01-01

    DNA sequences of the fifth exon, which encodes the transmembrane domain, were determined for the BALB/c mouse class I MHC genes and used to study the relationships between them. Based on nucleotide sequence similarity, the exon 5 sequences can be divided into seven groups. Although most members within each group are at least 80% similar to each other, comparison between groups reveals that the groups share little similarity. However, in spite of the extensive variation of the fifth exon sequences, analysis of their predicted amino acid translations reveals that only four class I gene fifth exons have frameshifts or stop codons that terminate their translation and prevent them from encoding a domain that is both hydrophobic and long enough to span a lipid bilayer. Exactly 27 of the remaining fifth exons could encode a domain that is similar to those of the transplantation antigens in that it consists of a proline-rich connecting peptide, a transmembrane segment, and a cytoplasmic portion with membrane-anchoring basic residues. The conservation of this motif in the majority of the fifth exon translations in spite of extensive variation suggests that selective pressure exists for these exons to maintain their ability to encode a functional transmembrane domain, raising the possibility that many of the nonclassical class I genes encode functionally important products. PMID:2584927

  13. Identification of planarian homeobox sequences indicates the antiquity of most Hox/homeotic gene subclasses.

    PubMed Central

    Balavoine, G; Telford, M J

    1995-01-01

    The homeotic gene complex (HOM-C) is a cluster of genes involved in the anteroposterior axial patterning of animal embryos. It is composed of homeobox genes belonging to the Hox/HOM superclass. Originally discovered in Drosophila, Hox/HOM genes have been identified in organisms as distantly related as arthropods, vertebrates, nematodes, and cnidarians. Data obtained in parallel from the organization of the complex, the domains of gene expression during embryogenesis, and phylogenetic relationships allow the subdivision of the Hox/HOM superclass into five classes (lab, pb/Hox3, Dfd, Antp, and Abd-B) that appeared early during metazoan evolution. We describe a search for homologues of these genes in platyhelminths, triploblast metazoans emerging as an outgroup to the great coelomate ensemble. A degenerate PCR screening for Hox/HOM homeoboxes in three species of triclad planarians has revealed 10 types of Antennapedia-like genes. The homeobox-containing sequences of these PCR fragments allowed the amplification of the homeobox-coding exons for five of these genes in the species Polycelis nigra. A phylogenetic analysis shows that two genes are clear orthologues of Drosophila labial, four others are members of a Dfd/Antp superclass, and a seventh gene, although more difficult to classify with certainty, may be related to the pb/Hox3 class. Together with previously identified Hox/HOM genes in other flatworms, our analyses demonstrate the existence of an elaborate family of Hox/HOM genes in the ancestor of all triploblast animals. PMID:7638172

  14. Amplification of complete gag gene sequences from geographically distinct equine infectious anemia virus isolates.

    PubMed

    Boldbaatar, Bazartseren; Bazartseren, Tsevel; Koba, Ryota; Murakami, Hironobu; Oguma, Keisuke; Murakami, Kenji; Sentsui, Hiroshi

    2013-04-01

    In the current study, primers described previously and modified versions of these primers were evaluated for amplification of full-length gag genes from different equine infectious anemia virus (EIAV) strains from several countries, including the USA, Germany and Japan. Each strain was inoculated into a primary horse leukocyte culture, and the full-length gag gene was amplified by reverse transcription polymerase chain reaction. Each amplified gag gene was cloned into a plasmid vector for sequencing, and the detectable copy numbers of target DNA were determined. Use of a mixture of two forward primers and one reverse primer in the polymerase chain reaction enabled the amplification of all EIAV strains used in this study. However, further study is required to confirm these primers as universal for all EIAV strains. The nucleotide sequence of gag is considered highly conserved, as evidenced by the use of gag-encoded capsid proteins as a common antigen for the detection of EIAV in serological tests. However, significant sequence variation in the gag genes of different EIAV strains was found in the current study.

  15. Identification of sex-specific polymorphic sequences in the goat amelogenin gene for embryo sexing.

    PubMed

    Tsai, T C; Wu, S H; Chen, H L; Tung, Y T; Cheng, W T K; Huang, J C; Chen, C M

    2011-08-01

    Amelogenin (AMEL) is a conserved gene located on the sex chromosomes of mammals. It is involved in the formation of enamel, which is the hard, white material that forms the protective outer layer of each tooth. In this study, we first cloned and determined the intron sequences of the goat AMELX and AMELY genes from female and male ear tissues. The polymorphic AMEL alleles were further analyzed by PCR-based RFLP and Southern blot hybridization analyses. Results showed that the intron 5 nucleotide sequence of the goat AMELY gene contains multiple deletions/insertions and shares only 48.5% identity with intron 5 of the goat AMELX gene. Based on the polymorphic AMEL intron sequences, a set of sex-specific triplex primers was designed to PCR amplify a single fragment of 264 bp from the X chromosome of female goats and 2 fragments of 264 and 206 bp from the X and Y chromosomes, respectively, of male goats. An increased sensitivity for sex determination was reached with a single blastomere at the blastula stage isolated from goat embryos. A total of 43 goat embryos were used to estimate a 100% accuracy rate of this method confirmed by chromosomal karyotyping and live births. The embryo sexing technique has been successfully applied in different strains of goats including Alpine, Saanen, Nubian, and Taiwan goats.
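
    The band-pattern rule described in this abstract (264 bp alone for females, 264 bp plus 206 bp for males) can be written as a small decision function. This is only an illustrative sketch: the function name, the size tolerance, and the "no call" outcome for a missing X band are assumptions, not details reported by the authors.

```python
X_FRAGMENT_BP = 264  # fragment amplified from the X chromosome (from the abstract)
Y_FRAGMENT_BP = 206  # fragment amplified from the Y chromosome (from the abstract)


def call_sex(band_sizes_bp, tolerance_bp=5):
    """Call sex from observed PCR band sizes; tolerance_bp is an assumed gel-reading margin."""

    def has_band(expected_bp):
        return any(abs(size - expected_bp) <= tolerance_bp for size in band_sizes_bp)

    has_x = has_band(X_FRAGMENT_BP)
    has_y = has_band(Y_FRAGMENT_BP)
    if has_x and has_y:
        return "male"
    if has_x:
        return "female"
    # The X band is expected in both sexes, so its absence is treated as a failed reaction.
    return "no call"


print(call_sex([264]))       # female
print(call_sex([265, 207]))  # male
print(call_sex([]))          # no call
```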

  16. Mouse Cmu heavy chain immunoglobulin gene segment contains three intervening sequences separating domains.

    PubMed

    Calame, K; Rogers, J; Early, P; Davis, M; Livant, D; Wall, R; Hood, L

    1980-04-03

    The IgM molecule is composed of subunits made up of two light chain and two heavy chain (mu) polypeptides. The mu chain is encoded by several gene segments--variable (V), joining (J) and constant (Cmu). The Cmu gene segment is of particular interest for several reasons. First, the mu chain must exist in two very different environments--as an integral membrane protein in receptor IgM molecules (mum) and as soluble serum protein in IgM molecules secreted into the blood (mus). Second, the Cmu region in mus is composed of four homology units or domains (Cmu1, Cmu2, Cmu3 and Cmu4) of approximately 110 amino acid residues plus a C-terminal tail of 19 residues. We asked two questions concerning the organisation of the Cmu gene segment. (1) Are the homology units separated by intervening DNA sequences as has been reported for alpha (ref. 5), gamma 1 (ref. 6) and gamma 2b (ref. 7) heavy chain genes? (2) Is the C-terminal tail separated from the Cmu4 domain by an intervening DNA sequence? If so, DNA rearrangements or RNA splicing could generate hydrophilic and hydrophobic C-terminal tails for the mus and mum polypeptides, respectively. We demonstrate here that intervening DNA sequences separate each of the four coding regions for Cmu domains, and that the coding regions for the Cmu4 domains and the C-terminal tail are directly contiguous.

  17. Sequences homologous to ZFY, a candidate human sex-determining gene, are autosomal in marsupials.

    PubMed

    Sinclair, A H; Foster, J W; Spencer, J A; Page, D C; Palmer, M; Goodfellow, P N; Graves, J A

    Sexual differentiation in placental mammals results from the action of a testis-determining gene encoded by the Y chromosome. This gene causes the indifferent gonad to develop as a testis, thereby initiating a hormonal cascade which produces a male phenotype. Recently, a candidate for the testis-determining gene (ZFY, Y-borne zinc-finger protein) has been cloned. The ZFY probe detects a male-specific (Y-linked) sequence in DNA from a range of eutherian mammals, as well as an X-linked sequence (ZFX) which maps to the human X chromosome. In marsupials it is also the Y chromosome that seems to determine the fate of the gonad, but not all sexual dimorphisms. Using the ZFY probe we find, surprisingly, that the ZFY homologous sequences are not on either the X or the Y chromosome in marsupials, but map to the autosomes. This implies ZFY is not the primary sex-determining gene in marsupials. Either the genetic pathways of sex determination in marsupials and eutherians differ, or they are identical and ZFY is not the primary signal in human sex determination.

  18. Coxiella burnetii superoxide dismutase gene: cloning, sequencing, and expression in Escherichia coli.

    PubMed Central

    Heinzen, R A; Frazier, M E; Mallavia, L P

    1992-01-01

    A superoxide dismutase (SOD) gene from the obligate intracellular bacterium Coxiella burnetii has been cloned, and its DNA sequence has been determined and expressed in Escherichia coli. The gene was identified on pSJR50, a pHC79-derived genomic clone, by using the polymerase chain reaction with degenerate oligonucleotide primers corresponding to conserved regions of known SODs. Sequences resembling conventional E. coli ribosomal and RNA polymerase-binding sites preceded the C. burnetii 579-bp SOD open reading frame. An E. coli SOD-deficient double mutant (sodA sodB) that carried pSJR50 had growth and survival responses similar to those of the wild type when the transformant was challenged with 0.05 mM paraquat and 5 mM hydrogen peroxide, respectively. These observations indicated that the C. burnetii gene was functionally expressed in E. coli. Staining of native polyacrylamide gels for SOD activity demonstrated that pSJR50 insert DNA codes for an SOD that comigrates with an SOD found in C. burnetii cell lysates. The enzyme was inactivated by 5 mM hydrogen peroxide, which is indicative of an iron-containing SOD. Additionally, the predicted amino acid sequence was significantly more homologous to known iron-containing SODs than to manganese-containing SODs. Isolation of the C. burnetii SOD gene may provide an opportunity to examine its role in the intracellular survival of this rickettsia. PMID:1500190

  19. Cloning and sequencing of the genes from Salmonella typhimurium encoding a new bacterial ribonucleotide reductase.

    PubMed Central

    Jordan, A; Gibert, I; Barbé, J

    1994-01-01

    A plasmid library of Salmonella typhimurium was used to complement a temperature-sensitive nrdA mutant of Escherichia coli. Complementation was obtained with two different classes of plasmids, one carrying the E. coli nrdAB-like genes and the second containing an operon encoding a new bacterial ribonucleotide reductase. Plasmids harboring these new reductase genes also enable obligately anaerobic nrdB::Mud1 E. coli mutants to grow in the presence of oxygen. This operon consists of two open reading frames, which have been designated nrdE (2,145 bp) and nrdF (969 bp). The deduced amino acid sequences of the nrdE and nrdF products include the catalytically important residues conserved in ribonucleotide reductase enzymes of class I and show 25 and 28% overall identity with the R1 and R2 protein, respectively, of the aerobic ribonucleoside diphosphate reductase of E. coli. The 3' end of the sequenced 4.9-kb fragment corresponds to the upstream region of the previously published proU operon of both S. typhimurium and E. coli, indicating that the nrdEF genes are at 57 min on the chromosomal maps of these two bacterial species. Analysis of the nrdEF and proU sequences demonstrates that transcription of the nrdEF genes is in the clockwise direction on the S. typhimurium and E. coli maps. PMID:8195103

  20. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type 1 and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  1. Drosophila GRAIL: An intelligent system for gene recognition in Drosophila DNA sequences

    SciTech Connect

    Xu, Ying; Einstein, J.R.; Uberbacher, E.C.; Helt, G.; Rubin, G.

    1995-06-01

    An AI-based system for gene recognition in Drosophila DNA sequences was designed and implemented. The system consists of two main modules, one for coding exon recognition and one for single gene model construction. The exon recognition module finds a coding exon by recognition of its splice junctions (or translation start) and coding potential. The core of this module is a set of neural networks which evaluate an exon candidate for the possibility of being a true coding exon using the "recognized" splice junction (or translation start) and coding signals. The recognition process consists of four steps: generation of an exon candidate pool, elimination of improbable candidates using heuristic rules, candidate evaluation by trained neural networks, and candidate cluster resolution and final exon prediction. The gene model construction module takes as input the clustered exon candidates and builds a "best" possible single gene model using an efficient dynamic programming algorithm. 129 Drosophila sequences consisting of 441 coding exons including 216358 coding bases were extracted from GenBank and used to build statistical matrices and to train the neural networks. On this training set the system recognized 97% of the coding messages and predicted only 5% false messages. Among the "correctly" predicted exons, 68% match the actual exon exactly and 96% have at least one edge predicted correctly. On an independent test set consisting of 30 Drosophila sequences, the system recognized 96% of the coding messages and predicted 7% false messages.
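
    The abstract does not describe the dynamic programming step in detail. As a rough illustration of the general idea only, the sketch below picks the highest-scoring set of non-overlapping exon candidates (classic weighted interval scheduling). It is not the GRAIL algorithm: the function name, the integer coordinates, and the scores are hypothetical, and a real gene-model builder would presumably apply further constraints that the abstract does not spell out.

```python
from bisect import bisect_right


def best_exon_chain(candidates):
    """Return (total_score, chain) for the best set of non-overlapping candidates.

    candidates: iterable of (start, end, score) with inclusive integer coordinates.
    """
    cands = sorted(candidates, key=lambda c: c[1])  # order by end coordinate
    ends = [c[1] for c in cands]
    # best[i] holds the optimal (score, chain) using only the first i candidates.
    best = [(0, [])]
    for i, (start, end, score) in enumerate(cands):
        # j = number of earlier candidates that end strictly before this one starts.
        j = bisect_right(ends, start - 1, 0, i)
        take_score = best[j][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])
    return best[-1]


# Hypothetical exon candidates: (start, end, score from some upstream evaluator).
pool = [(10, 50, 90), (40, 90, 60), (60, 120, 80), (130, 200, 70), (100, 210, 95)]
total, chain = best_exon_chain(pool)
print(total)                          # 240
print([(s, e) for s, e, _ in chain])  # [(10, 50), (60, 120), (130, 200)]
```

    Sorting by end coordinate and using a binary search for the last compatible candidate keeps the whole procedure at O(n log n) for n candidates.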

  2. Identification and characterization of rhizospheric microbial diversity by 16S ribosomal RNA gene sequencing

    PubMed Central

    Naveed, Muhammad; Mubeen, Samavia; khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad

    2014-01-01

    In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Identification of bacterial isolates was made by 16S rRNA gene sequence analysis and taxonomical confirmation on EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of bacterial strains with the respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological and biochemical tests and for the glucose dehydrogenase (gdh) gene, which is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root-nodulating strains were positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts like field grown crops, leguminous and non-leguminous plants. It was concluded that a diverse bacterial population exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant-microbe interactions, and that strains QAU-63 and QAU-68, with sequence similarities of 97 and 95%, might be declared novel after further taxonomic characterization. PMID:25477935

  3. Large scale in silico identification of MYB family genes from wheat expressed sequence tags.

    PubMed

    Cai, Hongsheng; Tian, Shan; Dong, Hansong

    2012-10-01

    The MYB proteins constitute one of the largest transcription factor families in plants. Much research has been performed to determine their structures, functions, and evolution, especially in the model plants, Arabidopsis, and rice. However, this transcription factor family has been much less studied in wheat (Triticum aestivum), for which no genome sequence is yet available. Despite this, expressed sequence tags are an important resource that permits opportunities for large scale gene identification. In this study, a total of 218 sequences from wheat were identified and confirmed to be putative MYB proteins, including 1RMYB, R2R3-type MYB, 3RMYB, and 4RMYB types. A total of 36 R2R3-type MYB genes with complete open reading frames were obtained. The putative orthologs were assigned in rice and Arabidopsis based on the phylogenetic tree. Tissue-specific expression pattern analyses confirmed the predicted orthologs, and this meant that gene information could be inferred from the Arabidopsis genes. Moreover, the motifs flanking the MYB domain were analyzed using the MEME web server. The distribution of motifs among wheat MYB proteins was investigated and this facilitated subfamily classification.

  4. Cloning, sequencing and expression of the gene encoding the extracellular metalloprotease of Aeromonas caviae.

    PubMed

    Kawakami, K; Toma, C; Honma, Y

    2000-01-01

    A gene (apk) encoding the extracellular protease of Aeromonas caviae Ae6 has been cloned and sequenced. For cloning the gene, the DNA genomic library was screened using skim milk LB agar. One clone harboring plasmid pKK3 was selected for sequencing. Nucleotide sequencing of the 3.5 kb region of pKK3 revealed a single open reading frame (ORF) of 1,785 bp encoding 595 amino acids. The deduced polypeptide contained a putative 16-amino acid signal peptide followed by a large propeptide. The N-terminal amino acid sequence of purified recombinant protein (APK) was consistent with the DNA sequence. This result suggested a mature protein of 412 amino acids with a molecular mass of 44 kDa. However, the molecular mass of purified recombinant APK revealed 34 kDa by SDS-PAGE, suggesting that further processing at the C-terminal region took place. The 2 motifs of zinc binding sites deduced are highly conserved in the APK as well as in other zinc metalloproteases including Vibrio proteolyticus neutral protease, Emp V from Vibrio vulnificus, HA/P from Vibrio cholerae, and Pseudomonas aeruginosa elastase. Proteolytic activity was inhibited by EDTA, Zincov, 1,10-phenanthroline and tetraethylenepentamine while unaffected by the other inhibitors tested. The protease showed maximum activity at pH 7.0 and was inactivated by heating at 80 C for 15 min. These results together suggest that APK belongs to the thermolysin family of metalloendopeptidases.
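
    The sizes quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only. The average residue mass of about 110 Da is a common rule of thumb and an assumption here, not a value taken from the abstract, and the function name is hypothetical.

```python
ORF_BP = 1785           # ORF length reported in the abstract
MATURE_RESIDUES = 412   # mature protein length reported in the abstract
AVG_RESIDUE_DA = 110    # assumed rule-of-thumb average mass per residue

# An ORF of 1,785 bp should divide evenly into 3-nucleotide codons.
codons, remainder = divmod(ORF_BP, 3)


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass estimate in kDa from residue count."""
    return n_residues * AVG_RESIDUE_DA / 1000


print(codons, remainder)                  # codon count and leftover bases
print(approx_mass_kda(MATURE_RESIDUES))   # rough estimate for the mature protein
```

    The estimate lands close to the 44 kDa deduced from the sequence, and well above the 34 kDa seen by SDS-PAGE, which is the gap the authors attribute to further C-terminal processing.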

  5. Nucleotide sequence analysis of beta tubulin gene in a wide range of dermatophytes.

    PubMed

    Rezaei-Matehkolaei, Ali; Mirhendi, Hossein; Makimura, Koichi; de Hoog, G Sybren; Satoh, Kazuo; Najafzadeh, Mohammad Javad; Shidfar, Mohammad Reza

    2014-10-01

    We investigated the resolving power of the beta tubulin protein-coding gene (BT2) for systematic study of dermatophyte fungi. Initially, 144 standard and clinical strains belonging to 26 species in the genera Trichophyton, Microsporum, and Epidermophyton were identified by internal transcribed spacer (ITS) sequencing. Subsequently, BT2 was partially amplified in all strains, and sequence analysis was performed after construction of a BT2 database, which showed that length ranged from approximately 723 (T. ajelloi) to 808 nucleotides (M. persicolor) in different species. Intraspecific sequence variation was found in some species, but T. tonsurans, T. equinum, T. concentricum, T. verrucosum, T. rubrum, T. violaceum, T. eriotrephon, E. floccosum, M. canis, M. ferrugineum, and M. audouinii were invariant. The sequences were found to be relatively conserved among different strains of the same species. The species with the closest resemblance were Arthroderma benhamiae and T. concentricum and T. tonsurans and T. equinum with 100% and 99.8% identity, respectively; the most distant species were M. persicolor and M. amazonicum. The dendrogram obtained from BT2 topology was almost compatible with the species concept based on ITS sequencing, and similar clades and species were distinguished in the BT2 tree. Here, beta tubulin was characterized in a wide range of dermatophytes in order to assess intra- and interspecies variation and resolution and was found to be a taxonomically valuable gene.

  6. Evolutionary complexities of swine flu H1N1 gene sequences of 2009.

    PubMed

    Sinha, Niladri Kumar; Roy, Ayan; Das, Ballari; Das, Santasabuj; Basak, Surajit

    2009-12-18

    A recently emerged novel influenza A (H1N1) virus continues to spread globally. The pandemic caused by this new H1N1 swine influenza virus presents an opportunity to analyze the evolutionary significance of the origin of the new strain of swine flu. Our study clearly suggests that strong purifying selection is responsible for the evolution of the novel influenza A (H1N1) virus among humans. We observed that the 2009 viral sequences are evolutionarily widely different from the past few years' sequences. Rather, the 2009 sequences are evolutionarily more similar to the most ancient sequence reported in the NCBI Influenza Virus Resource Database, collected in 1918. Analysis of evolutionary rates also supports the view that all the genes in the pandemic strain of 2009 except the NA and M genes are derived from triple reassorted swine viruses. Our study demonstrates the importance of using a complete-genome approach, as more sequences become available, to investigate the evolutionary origin of the 1918 influenza A (H1N1) swine flu strain and the possibility of future reassortment events.

  7. Variation in sequence and organization of splicing regulatory elements in vertebrate genes

    PubMed Central

    Yeo, Gene; Hoon, Shawn; Venkatesh, Byrappa; Burge, Christopher B.

    2004-01-01

    Although core mechanisms and machinery of pre-mRNA splicing are conserved from yeast to human, the details of intron recognition often differ, even between closely related organisms. For example, genes from the pufferfish Fugu rubripes generally contain one or more introns that are not properly spliced in mouse cells. Exploiting available genome sequence data, a battery of sequence analysis techniques was used to reach several conclusions about the organization and evolution of splicing regulatory elements in vertebrate genes. The classical splice site and putative branch site signals are completely conserved across the vertebrates studied (human, mouse, pufferfish, and zebrafish), and exonic splicing enhancers also appear broadly conserved in vertebrates. However, another class of splicing regulatory elements, the intronic splicing enhancers, appears to differ substantially between mammals and fish, with G triples (GGG) very abundant in mammalian introns but comparatively rare in fish. Conversely, short repeats of AC and GT are predicted to function as intronic splicing enhancers in fish but are not enriched in mammalian introns. Consistent with this pattern, exonic splicing enhancer-binding SR proteins are highly conserved across all vertebrates, whereas heterogeneous nuclear ribonucleoproteins, which bind many intronic sequences, vary in domain structure and even presence/absence between mammals and fish. Exploiting differences in intronic sequence composition, a statistical model was developed to predict the splicing phenotype of Fugu introns in mammalian systems and was used to engineer the spliceability of a Fugu intron in human cells by insertion of specific sequences, thereby rescuing splicing in human cells. PMID:15505203

  8. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene.

    PubMed Central

    Heilbronn, R; Jahn, G; Bürkle, A; Freese, U K; Fleckenstein, B; zur Hausen, H

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25 degrees C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein. PMID:3023689

  9. Genomic localization, sequence analysis, and transcription of the putative human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Heilbronn, T.; Jahn, G.; Buerkle, A.; Freese, U.K.; Fleckenstein, B.; Zur Hausen, H.

    1987-01-01

    The human cytomegalovirus (HCMV)-induced DNA polymerase has been well characterized biochemically and functionally, but its genomic location has not yet been assigned. To identify the coding sequence, cross-hybridization with the herpes simplex virus type 1 (HSV-1) polymerase gene was used, as suggested by the close similarity of the herpes group virus-induced DNA polymerases to the HCMV DNA polymerase. A cosmid and plasmid library of the entire HCMV genome was screened with the BamHI Q fragment of HSV-1 at different stringency conditions. One PstI-HincII restriction fragment of 850 base pairs mapping within the EcoRI M fragment of HCMV cross-hybridized at Tm - 25 degrees C. Sequence analysis revealed one open reading frame spanning the entire sequence. The amino acid sequence showed a highly conserved domain of 133 amino acids shared with the HSV and putative Epstein-Barr virus polymerase sequences. This domain maps within the C-terminal part of the HSV polymerase gene, which has been suggested to contain part of the catalytic center of the enzyme. Transcription analysis revealed one 5.4-kilobase early transcript in the sense orientation with respect to the open reading frame identified. This transcript appears to code for the 140-kilodalton HCMV polymerase protein.

  10. Phylogenetic Relationships and the Evolution of Regulatory Gene Sequences in the Parrotfishes

    PubMed Central

    Smith, Lydia L.; Fessler, Jennifer L.; Alfaro, Michael E.; Streelman, J. Todd; Westneat, Mark W.

    2008-01-01

    Regulatory genes control the expression of other genes and are key components of developmental processes such as segmentation and embryonic construction of the skull in vertebrates. Here we examine the variability and evolution of three vertebrate regulatory genes, addressing issues of their utility for phylogenetics and comparing the rates of genetic change seen in regulatory loci to the rates seen in other genes in the parrotfishes. The parrotfishes are a diverse group of colorful fishes from coral reefs and seagrasses worldwide and have been placed phylogenetically within the family Labridae. We tested phylogenetic hypotheses among the parrotfishes, with a focus on the genera Chlorurus and Scarus, by analyzing eight gene fragments for 42 parrotfishes and eight outgroup species. We sequenced mitochondrial 12s rRNA (967 bp), 16s rRNA (577 bp), and cytochrome b (477 bp). From the nuclear genome, we sequenced part of the protein-coding genes rag2 (715 bp), tmo4c4 (485 bp), and the developmental regulatory genes otx1 (672 bp), bmp4 (488 bp), and dlx2 (522 bp). Bayesian, likelihood, and parsimony analyses on the resulting 4903 bp of DNA sequence produced similar topologies that confirm the monophyly of the scarines and provide a phylogeny at the species level for portions of the genera Scarus and Chlorurus. Four major clades of Scarus were recovered, with three distributed in the Indo-Pacific and one containing Caribbean/Atlantic taxa. Molecular rates suggest a Miocene origin of the parrotfishes (22 mya) and a recent divergence of species within Scarus and Chlorurus, within the past 5 million years. Developmentally important genes made a significant contribution to phylogenetic structure, and rates of genetic evolution were high in bmp4, similar to other coding nuclear genes, but low in otx1 and the dlx2 exons. Synonymous and nonsynonymous substitution patterns in developmental regulatory genes support the hypothesis of stabilizing selection during the history of
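
    As a quick consistency check on the figures above, the eight fragment lengths listed in the abstract can be summed and compared with the stated alignment length. This is only a sketch. The dictionary keys are labels chosen here for readability.

```python
# Fragment lengths in base pairs, as listed in the abstract.
fragment_bp = {
    "12s rRNA": 967,
    "16s rRNA": 577,
    "cytochrome b": 477,
    "rag2": 715,
    "tmo4c4": 485,
    "otx1": 672,
    "bmp4": 488,
    "dlx2": 522,
}

total_bp = sum(fragment_bp.values())
# Mitochondrial share of the concatenated alignment.
mito_bp = sum(fragment_bp[k] for k in ("12s rRNA", "16s rRNA", "cytochrome b"))

print(len(fragment_bp), total_bp, mito_bp)
```

    The total agrees with the 4903 bp of DNA sequence reported for the combined analyses.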

  11. Discovery of differentially expressed genes in cashmere goat (Capra hircus) hair follicles by RNA sequencing.

    PubMed

    Qiao, X; Wu, J H; Wu, R B; Su, R; Li, C; Zhang, Y J; Wang, R J; Zhao, Y H; Fan, Y X; Zhang, W G; Li, J Q

    2016-09-02

    The mammalian hair follicle (HF) is a unique, highly regenerative organ with a distinct developmental cycle. Cashmere goat (Capra hircus) HFs can be divided into two categories based on structure and development time: primary and secondary follicles. To identify differentially expressed genes (DEGs) in the primary and secondary HFs of cashmere goats, the RNA sequencing of six individuals from Arbas, Inner Mongolia, was performed. A total of 617 DEGs were identified; 297 were upregulated while 320 were downregulated. Gene ontology analysis revealed that the main functions of the upregulated genes were electron transport, respiratory electron transport, mitochondrial electron transport, and gene expression. The downregulated genes were mainly involved in cell autophagy, protein complexes, neutrophil aggregation, and bacterial fungal defense reactions. According to the Kyoto Encyclopedia of Genes and Genomes database, these genes are mainly involved in the metabolism of cysteine and methionine, RNA polymerization, and the MAPK signaling pathway, and were enriched in primary follicles. A microRNA-target network revealed that secondary follicles are involved in several important biological processes, such as the synthesis of keratin-associated proteins and enzymes involved in amino acid biosynthesis. In summary, these findings will increase our understanding of the complex molecular mechanisms of HF development and cycling, and provide a basis for the further study of the genes and functions of HF development.

  12. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing.

    PubMed

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-05-31

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causatives and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies.

  13. Gene profiling of bone around orthodontic mini-implants by RNA-sequencing analysis.

    PubMed

    Nahm, Kyung-Yen; Heo, Jung Sun; Lee, Jae-Hyung; Lee, Dong-Yeol; Chung, Kyu-Rhim; Ahn, Hyo-Won; Kim, Seong-Hun

    2015-01-01

    This study aimed to evaluate the genes that were expressed in the healing bones around SLA-treated titanium orthodontic mini-implants in a beagle at early (1-week) and late (4-week) stages with RNA-sequencing (RNA-Seq). Samples from sites of surgical defects were used as controls. Total RNA was extracted from the tissue around the implants, and an RNA-Seq analysis was performed with Illumina TruSeq. In the 1-week group, genes in the gene ontology (GO) categories of cell growth and the extracellular matrix (ECM) were upregulated, while genes in the categories of the oxidation-reduction process, intermediate filaments, and structural molecule activity were downregulated. In the 4-week group, the genes upregulated included ECM binding, stem cell fate specification, and intramembranous ossification, while genes in the oxidation-reduction process category were downregulated. GO analysis revealed an upregulation of genes that were related to significant mechanisms, including those with roles in cell proliferation, the ECM, growth factors, and osteogenic-related pathways, which are associated with bone formation. From these results, implant-induced bone formation progressed considerably during the times examined in this study. The upregulation or downregulation of selected genes was confirmed with real-time reverse transcription polymerase chain reaction. The RNA-Seq strategy was useful for defining the biological responses to orthodontic mini-implants and identifying the specific genetic networks for targeted evaluations of successful peri-implant bone remodeling.

  14. Whole genome sequence of Desulfovibrio magneticus strain RS-1 revealed common gene clusters in magnetotactic bacteria

    PubMed Central

    Nakazawa, Hidekazu; Arakaki, Atsushi; Narita-Yamada, Sachiko; Yashiro, Isao; Jinno, Koji; Aoki, Natsuko; Tsuruyama, Ai; Okamura, Yoshiko; Tanikawa, Satoshi; Fujita, Nobuyuki; Takeyama, Haruko; Matsunaga, Tadashi

    2009-01-01

    Magnetotactic bacteria are ubiquitous microorganisms that synthesize intracellular magnetite particles (magnetosomes) by accumulating Fe ions from aquatic environments. Recent molecular studies, including comprehensive proteomic, transcriptomic, and genomic analyses, have considerably improved our hypotheses of the magnetosome-formation mechanism. However, most of these studies have been conducted using pure-cultured bacterial strains of α-proteobacteria. Here, we report the whole-genome sequence of Desulfovibrio magneticus strain RS-1, the only isolate of magnetotactic microorganisms classified under δ-proteobacteria. Comparative genomics of the RS-1 and four α-proteobacterial strains revealed the presence of three separate gene regions (nuo and mamAB-like gene clusters, and gene region of a cryptic plasmid) conserved in all magnetotactic bacteria. The nuo gene cluster, encoding NADH dehydrogenase (complex I), was also common to the genomes of three iron-reducing bacteria exhibiting uncontrolled extracellular and/or intracellular magnetite synthesis. A cryptic plasmid, pDMC1, encodes three homologous genes that exhibit high similarities with those of other magnetotactic bacterial strains. In addition, the mamAB-like gene cluster, encoding the key components for magnetosome formation such as iron transport and magnetosome alignment, was conserved only in the genomes of magnetotactic bacteria as a similar genomic island-like structure. Our findings suggest the presence of core genetic components for magnetosome biosynthesis; these genes may have been acquired into the magnetotactic bacterial genomes by multiple gene-transfer events during proteobacterial evolution. PMID:19675025

  15. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing

    PubMed Central

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-01-01

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causatives and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies. PMID:27025386

  16. Nucleotide sequence analysis and DNA hybridization studies of the ant(4')-IIa gene from Pseudomonas aeruginosa.

    PubMed Central

    Shaw, K J; Munayyer, H; Rather, P N; Hare, R S; Miller, G H

    1993-01-01

    The ant(4')-IIa gene was previously cloned from Pseudomonas aeruginosa on a 1.6-kb DNA fragment (G. A. Jacoby, M. J. Blaser, P. Santanam, H. Hächler, F. H. Kayser, R. S. Hare, and G. H. Miller, Antimicrob. Agents Chemother. 34:2381-2386, 1990). In the current study, the ant(4')-IIa gene was localized by gamma-delta mutagenesis. A region of approximately 600 nucleotides which contained the ant(4')-IIa gene was identified, and DNA sequence analysis revealed two overlapping open reading frames (ORFs) within this region. Northern (RNA) blot analysis demonstrated expression of both ORFs in P. aeruginosa; therefore, site-directed mutagenesis was used to identify the ORF which encodes the ant(4')-IIa gene. No homology was found between ant(4')-IIa and ant(4')-Ia DNA sequences. Hybridization experiments confirmed that the ant(4')-Ia probe hybridized only to gram-positive presumptive ANT(4')-I strains and that the ant(4')-IIa probe hybridized only to gram-negative strains presumed to carry ANT(4')-II. Seven gram-negative strains which had been classified as having ANT(4')-II resistance profiles did not hybridize with probes for either ant(4')-Ia or ant(4')-IIa, suggesting that at least one additional ant(4') gene may exist. The predicted amino-terminal sequences of the ANT(4')-Ia and ANT(4')-IIa proteins showed significant sequence similarity between residues 38 and 63 of the ANT(4')-Ia protein and residues 26 and 51 of the ANT(4')-IIa protein. PMID:8494365

  17. Multiple gene sequencing for risk assessment in patients with early-onset or familial breast cancer.

    PubMed

    Lin, Po-Han; Kuo, Wen-Hung; Huang, Ai-Chu; Lu, Yen-Shen; Lin, Ching-Hung; Kuo, Sung-Hsin; Wang, Ming-Yang; Liu, Chun-Yu; Cheng, Fiona Tsui-Fen; Yeh, Ming-Hsin; Li, Huei-Ying; Yang, Yu-Hsuan; Hsu, Yu-Hua; Fan, Sheng-Chih; Li, Long-Yuan; Yu, Sung-Liang; Chang, King-Jen; Chen, Pei-Lung; Ni, Yen-Hsuan; Huang, Chiun-Sheng

    2016-02-16

    Since BRCA mutations are only responsible for 10-20% of cases of breast cancer in patients with early-onset or a family history and since next-generation sequencing technology allows the simultaneous sequencing of a large number of target genes, testing for multiple cancer-predisposing genes is now being considered, but its significance in clinical practice remains unclear. We then developed a sequencing panel containing 68 genes that had cancer risk association for patients with early-onset or familial breast cancer. A total of 133 patients were enrolled and 30 (22.6%) were found to carry germline deleterious mutations, 9 in BRCA1, 11 in BRCA2, 2 in RAD50, 2 in TP53 and one each in ATM, BRIP1, FANCI, MSH2, MUTYH, and RAD51C. Triple-negative breast cancer (TNBC) was associated with the highest mutation rate (45.5%, p = 0.025). Seven of the 9 BRCA1 mutations and the single FANCI mutation were in the TNBC group; 9 of the 11 BRCA2, 1 of the 2 RAD50 as well as BRIP1, MSH2, MUTYH, and RAD51C mutations were in the hormone receptor (HR)(+)Her2(-) group, and the other RAD50, ATM, and TP53 mutations were in the HR(+)Her2(+) group. Mutation carriers were considered as high-risk to develop malignancy and advised to receive cancer screening. Screening protocols of non-BRCA genes were based on their biologic functions; for example, patients carrying RAD51C mutation received a screening protocol similar to that for BRCA, since BRCA and RAD51C are both involved in homologous recombination. In conclusion, we consider that multiple gene sequencing in cancer risk assessment is clinically valuable.
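
    The carrier counts quoted above can be tallied to confirm the overall rate. The sketch below only re-adds numbers taken from the abstract. The variable names are illustrative.

```python
ENROLLED = 133  # patients enrolled, from the abstract

# Germline deleterious mutation carriers per gene, from the abstract.
carriers = {
    "BRCA1": 9, "BRCA2": 11, "RAD50": 2, "TP53": 2,
    "ATM": 1, "BRIP1": 1, "FANCI": 1, "MSH2": 1, "MUTYH": 1, "RAD51C": 1,
}

total_carriers = sum(carriers.values())
carrier_rate_pct = round(100 * total_carriers / ENROLLED, 1)
# Carriers whose mutation lies outside BRCA1/BRCA2.
non_brca = total_carriers - carriers["BRCA1"] - carriers["BRCA2"]

print(total_carriers, carrier_rate_pct, non_brca)
```

    By this count, 10 of the 30 carriers would have been missed by BRCA1/BRCA2 testing alone, which is the practical argument for the multi-gene panel.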

  18. Multiple gene sequencing for risk assessment in patients with early-onset or familial breast cancer

    PubMed Central

    Lin, Po-Han; Kuo, Wen-Hung; Huang, Ai-Chu; Lu, Yen-Shen; Lin, Ching-Hung; Kuo, Sung-Hsin; Wang, Ming-Yang; Liu, Chun-Yu; Cheng, Fiona Tsui-Fen; Yeh, Ming-Hsin; Li, Huei-Ying; Yang, Yu-Hsuan; Hsu, Yu-Hua; Fan, Sheng-Chih; Li, Long-Yuan; Yu, Sung-Liang; Chang, King-Jen; Chen, Pei-Lung; Ni, Yen-Hsuan; Huang, Chiun-Sheng

    2016-01-01

    Since BRCA mutations are only responsible for 10–20% of cases of breast cancer in patients with early-onset or a family history and since next-generation sequencing technology allows the simultaneous sequencing of a large number of target genes, testing for multiple cancer-predisposing genes is now being considered, but its significance in clinical practice remains unclear. We then developed a sequencing panel containing 68 genes that had cancer risk association for patients with early-onset or familial breast cancer. A total of 133 patients were enrolled and 30 (22.6%) were found to carry germline deleterious mutations, 9 in BRCA1, 11 in BRCA2, 2 in RAD50, 2 in TP53 and one each in ATM, BRIP1, FANCI, MSH2, MUTYH, and RAD51C. Triple-negative breast cancer (TNBC) was associated with the highest mutation rate (45.5%, p = 0.025). Seven of the 9 BRCA1 mutations and the single FANCI mutation were in the TNBC group; 9 of the 11 BRCA2, 1 of the 2 RAD50 as well as BRIP1, MSH2, MUTYH, and RAD51C mutations were in the hormone receptor (HR)(+)Her2(−) group, and the other RAD50, ATM, and TP53 mutations were in the HR(+)Her2(+) group. Mutation carriers were considered as high-risk to develop malignancy and advised to receive cancer screening. Screening protocols of non-BRCA genes were based on their biologic functions; for example, patients carrying RAD51C mutation received a screening protocol similar to that for BRCA, since BRCA and RAD51C are both involved in homologous recombination. In conclusion, we consider that multiple gene sequencing in cancer risk assessment is clinically valuable. PMID:26824983

  19. Draft Genome Sequence of a Sequence Type 11 Klebsiella pneumoniae Clinical Strain Carrying a blaKPC-2 Carbapenemase Gene and an rmtB 16S rRNA Methylase Gene.

    PubMed

    Yao, Zhihong; Feng, Yu; Wei, Li; Zong, Zhiyong

    2017-02-09

    Klebsiella pneumoniae strain WCHKP649, recovered from a human wound, carried the carbapenemase gene blaKPC-2 and 16S rRNA methylase gene rmtB. Here, we report its 5.6-Mb draft genome sequence, comprising 171 contigs with an average 57.34% G+C content. The genome contained 5,284 coding sequences and 84 tRNA genes.
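
    For readers unfamiliar with the statistic, G+C content is the fraction of unambiguous bases that are G or C. The following is a minimal sketch. The function name and the toy contigs are made up for illustration and have nothing to do with the WCHKP649 assembly.

```python
from typing import Iterable


def gc_content(contigs: Iterable[str]) -> float:
    """G+C fraction over all A/C/G/T bases in the contigs; ambiguous bases such as N are skipped."""
    gc = 0
    total = 0
    for seq in contigs:
        seq = seq.upper()
        g_c = seq.count("G") + seq.count("C")
        a_t = seq.count("A") + seq.count("T")
        gc += g_c
        total += g_c + a_t
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return gc / total


# Toy input, not real data.
toy_contigs = ["ATGCGC", "GGNNAT"]
print(f"{100 * gc_content(toy_contigs):.2f}% G+C")
```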

  20. Draft Genome Sequence of a Sequence Type 11 Klebsiella pneumoniae Clinical Strain Carrying a blaKPC-2 Carbapenemase Gene and an rmtB 16S rRNA Methylase Gene

    PubMed Central

    Yao, Zhihong; Feng, Yu; Wei, Li

    2017-01-01

    Klebsiella pneumoniae strain WCHKP649, recovered from a human wound, carried the carbapenemase gene blaKPC-2 and 16S rRNA methylase gene rmtB. Here, we report its 5.6-Mb draft genome sequence, comprising 171 contigs with an average 57.34% G+C content. The genome contained 5,284 coding sequences and 84 tRNA genes. PMID:28183754

  1. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

    PubMed

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-07-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the ability to learn new models for new organisms.

  2. FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences

    PubMed Central

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-01-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC-rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation, and the ability to learn new models for new organisms. PMID:12824407

  3. Novel alpha-conotoxins identified by gene sequencing from cone snails native to Hainan, and their sequence diversity.

    PubMed

    Luo, Sulan; Zhangsun, Dongting; Zhang, Ben; Quan, Yaru; Wu, Yong

    2006-11-01

    Conotoxins (CTX) from the venom of marine cone snails (genus Conus) represent large families of proteins, which show a similar precursor organization with a surprisingly conserved signal sequence in the precursor peptides, but highly diverse pharmacological activities. By using the conserved sequences found within the genes that encode the alpha-conotoxin precursors, a technique based on RT-PCR was used to identify two novel peptides (LiC22, LeD2) from the two worm-hunting Conus species Conus lividus and Conus litteratus, respectively, and one novel peptide (TeA21) from the snail-hunting Conus species Conus textile, all native to Hainan in China. The three peptides share the cysteine pattern common to alpha4/7 subfamily alpha-conotoxins (CCX(4)CX(7)C, two disulfide bonds), which are competitive antagonists of nicotinic acetylcholine receptors (nAChRs). The cDNA of LiC22N encodes a precursor of 40 residues, including a propeptide of 19 residues and a mature peptide of 21 residues. The cDNA of LeD2N encodes a precursor of 41 residues, including a propeptide of 21 residues and a mature peptide of 16 residues with three additional Gly residues. The cDNA of TeA21N encodes a precursor of 38 residues, including a propeptide of 20 residues and a mature peptide of 17 residues with an additional residue Gly. The additional residue Gly of LeD2N and TeA21N is a prerequisite for the amidation of the preceding C-terminal Cys. All three sequences are processed at the common signal site -X-Arg- immediately before the mature peptide sequences. The properties of the alpha4/7 conotoxins known so far are discussed in detail. Phylogenetic analysis of the new conotoxins in the present study and the published homologues of alpha4/7 conotoxins from the other Conus species was performed systematically. Patterns of sequence divergence for the three regions of signal, proregion, and mature peptides, both nucleotide acids and residue substitutions in DNA and peptide levels, as well as Cys codon
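
The cysteine pattern CCX(4)CX(7)C quoted in this abstract can be read as a simple sequence motif. The sketch below is one way to test a peptide for it, assuming that each X stands for any residue other than Cys; the function name and both example sequences are hypothetical and are not the peptides reported in the study.

```python
import re

# CC-X(4)-C-X(7)-C: two adjacent Cys, a 4-residue loop, a Cys, a 7-residue loop, a Cys.
# Assumption: loop positions (X) may be any residue except Cys.
ALPHA_4_7 = re.compile(r"CC[^C]{4}C[^C]{7}C")


def has_alpha_4_7_framework(peptide: str) -> bool:
    """Return True if the peptide contains the CCX(4)CX(7)C cysteine pattern."""
    return ALPHA_4_7.search(peptide.upper()) is not None


# Hypothetical sequences for illustration only.
example_4_7 = "G" + "CC" + "SAAA" + "C" + "A" * 7 + "C" + "G"  # loops of 4 and 7
example_3_5 = "G" + "CC" + "AAA" + "C" + "A" * 5 + "C"         # loops of 3 and 5

print(has_alpha_4_7_framework(example_4_7))
print(has_alpha_4_7_framework(example_3_5))
```

A pattern match of this kind only checks loop lengths; it says nothing about disulfide connectivity or receptor activity.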

  4. Nucleotide sequencing and characterization of the genes encoding benzene oxidation enzymes of Pseudomonas putida.

    PubMed Central

    Irie, S; Doi, S; Yorifuji, T; Takagi, M; Yano, K

    1987-01-01

    The nucleotide sequence of the genes from Pseudomonas putida encoding oxidation of benzene to catechol was determined. Five open reading frames were found in the sequence. Four corresponding protein molecules were detected by a DNA-directed in vitro translation system. Escherichia coli cells containing the fragment with the four open reading frames transformed benzene to cis-benzene glycol, which is an intermediate of the oxidation of benzene to catechol. The relation between the product of each cistron and the components of the benzene oxidation enzyme system is discussed. PMID:3667527

  5. Nucleotide sequence of the leukotoxin gene from Actinobacillus actinomycetemcomitans: homology to the alpha-hemolysin/leukotoxin gene family.

    PubMed Central

    Kraig, E; Dailey, T; Kolodrubetz, D

    1990-01-01

    The leukotoxin produced by Actinobacillus actinomycetemcomitans has been implicated in the etiology of localized juvenile periodontitis. To initiate a genetic analysis into the role of this protein in disease, we have cloned its gene, lktA. We now present the complete nucleotide sequence of the lktA gene from A. actinomycetemcomitans. When the deduced amino acid sequence of the leukotoxin protein was compared with those of other proteins, it was found to be homologous to the leukotoxin from Pasteurella haemolytica and to the alpha-hemolysins from Escherichia coli and Actinobacillus pleuropneumoniae. Each alignment showed at least 42% identity. As in the other organisms, the lktA gene of A. actinomycetemcomitans was linked to another gene, lktC, which is thought to be involved in the activation of the leukotoxin. The predicted LktC protein was related to the leukotoxin/hemolysin C proteins from the other bacteria, since they shared a minimum of 49% amino acid identity. Surprisingly, although Actinobacillus species are more closely related to pasteurellae than to members of the family Enterobacteriaceae, LktA and LktC from A. actinomycetemcomitans shared significantly greater sequence identity with the E. coli alpha-hemolysin proteins than with the P. haemolytica leukotoxin proteins. Despite the overall homology to the other leukotoxin/hemolysin proteins, the LktA protein from A. actinomycetemcomitans has several unique properties. Most strikingly, it is a very basic protein with a calculated pI of 9.7; the other toxins have estimated pIs around 6.2. The unusual features of the A. actinomycetemcomitans protein are discussed in light of the different species and target-cell specificities of the hemolysins and the leukotoxins. PMID:2318535

  6. Transcriptome sequencing of neurologic diseases associated genes in HHV-6A infected human astrocyte

    PubMed Central

    Tang, Junwei; Lu, Shuai; Feng, Dongju; Cheng, Ci; Qing, Lanqun; Yao, Kun; Chen, Yun

    2016-01-01

    Human Herpesvirus 6 (HHV-6) has been implicated in the development of several central nervous system (CNS) diseases, such as Alzheimer's disease, multiple sclerosis and glioma. To identify the pathogenic mechanism of HHV-6A infection, we carried out an mRNA-seq study of human astrocyte HA1800 cells infected with HHV-6A GS. By comparing HA1800 control cells with HHV-6A GS-infected HA1800 cells using mRNA-seq analysis, we identified 249 differentially expressed genes. After investigating these candidate genes, we found seven genes associated with two or more CNS diseases: CTSS, PTX3, CHI3L1, Mx1, CXCL16, BIRC3, and BST2. This is the first transcriptome sequencing study to show a significant association of these genes with both HHV-6A infection and neurologic diseases. We believe that our findings provide a new perspective for understanding the pathogenic mechanism of HHV-6A infection and neurologic diseases. PMID:27344170

  7. Sequence analysis of tyrosinase gene in ocular and oculocutaneous albinism patients: introducing three novel mutations

    PubMed Central

    Khordadpoor-Deilamani, Faravareh; Karimipoor, Morteza; Javadi, Gholamreza

    2015-01-01

    Purpose: Albinism is a heterogeneous genetic disorder of melanin synthesis that results in hypopigmented eyes (in patients with ocular albinism) or hair, skin, and eyes (in individuals with oculocutaneous albinism). It is associated with decreased visual acuity, nystagmus, strabismus, and photophobia. The tyrosinase gene is known to be involved in both oculocutaneous albinism and autosomal recessive ocular albinism. In this study, we aimed to screen for mutations in the TYR gene in nonsyndromic OCA and autosomal recessive ocular albinism patients from Iran. Methods: The tyrosinase gene was examined in 23 unrelated patients with autosomal recessive ocular albinism or nonsyndromic OCA using DNA sequencing and bioinformatics analysis. Results: TYR gene mutations were identified in 14 (approximately 60%) albinism patients. Conclusions: We found 10 mutations, 3 of which were novel. No mutation was found in our ocular albinism patients, but one of them was heterozygous for the p.R402Q polymorphism. PMID:26167114

  8. Detection of DNA sequence polymorphisms in carcinogen metabolism genes by polymerase chain reaction

    SciTech Connect

    Bell, D.A.

    1991-01-01

    The glutathione transferase mu gene (GST1) and the debrisoquine hydroxylase gene (CYP2D6) are known to be polymorphic in the human population and have been associated with increased susceptibility to cancer. Smokers with low lymphocyte GST mu activity are at higher risk for lung cancer, while low debrisoquine hydroxylase activity has been correlated with lower risk for lung and bladder cancer. Phenotypic characterization of these polymorphisms by lymphocyte enzyme activity (GST) and urine metabolite ratios (debrisoquine) is cumbersome for population studies. Recent cloning and sequencing of the mutant alleles of these genes has allowed genotyping via the polymerase chain reaction (PCR). Advantages of PCR approaches are speed, technical simplicity, and minimal sample requirements. This article reviews the PCR-based methods for detection of genetic polymorphisms in human cancer susceptibility genes.

  9. Molecular cloning and sequencing of pheU, a gene for Escherichia coli tRNAPhe.

    PubMed Central

    Schwartz, I; Klotsky, R A; Elseviers, D; Gallagher, P J; Krauskopf, M; Siddiqui, M A; Wong, J F; Roe, B A

    1983-01-01

    A recombinant plasmid (designated pID2) carrying the E. coli gene for tRNAPhe has been isolated from a plasmid bank constructed by the ligation of a total EcoRI digest of E. coli K12 DNA into the EcoRI site of pACYC184 DNA. The plasmid was selected by virtue of its ability to complement a temperature-sensitive lesion in the gene (PheS) for the alpha-subunit of phenylalanyl-tRNA synthetase. Crude tRNA isolated from such transformants exhibited elevated levels of phenylalanine acceptor activity. The tRNAPhe gene has been localized within the first 300 base pairs of a 3.6 kb SalI fragment of pID2. The sequence of the gene and its flanking regions is presented. PMID:6306588

  10. Mitogenomic sequences and evidence from unique gene rearrangements corroborate evolutionary relationships of myctophiformes (Neoteleostei)

    PubMed Central

    2013-01-01

    Background: A skewed assemblage of two epi-, meso- and bathypelagic fish families makes up the order Myctophiformes – the blackchins Neoscopelidae and the lanternfishes Myctophidae. The six rare neoscopelids show few morphological specializations whereas the divergent myctophids have evolved into about 250 species, of which many show massive abundances and wide distributions. In fact, Myctophidae is by far the most abundant fish family in the world, with plausible estimates of more than half of the oceans' combined fish biomass. Myctophids possess a unique communication system of species-specific photophore patterns and traditional intrafamilial classification has been established to reflect arrangements of photophores. Myctophids present the most diverse array of larval body forms found in fishes although this attribute has both corroborated and confounded phylogenetic hypotheses based on adult morphology. No molecular phylogeny is available for Myctophiformes, despite their importance within all ocean trophic cycles, open-ocean speciation and as an important part of neoteleost divergence. This study attempts to resolve major myctophiform phylogenies from both mitogenomic sequences and corroborating evidence in the form of unique mitochondrial gene order rearrangements. Results: Mitogenomic evidence from DNA sequences and unique gene orders is highly congruent concerning phylogenetic resolution on several myctophiform classification levels, corroborating evidence from osteology, larval ontogeny and photophore patterns, although the lack of larval morphological characters within the subfamily Lampanyctinae stands out. Neoscopelidae is resolved as the sister family to myctophids with Solivomer arenidens positioned as a sister taxon to the remaining neoscopelids. The enigmatic Notolychnus valdiviae is placed as a sister taxon to all other myctophids and exhibits an unusual second copy of the tRNA-Met gene – a gene order rearrangement reminiscent of that found in

  11. Global assessment of imprinted gene expression in the bovine conceptus by next generation sequencing.

    PubMed

    Chen, Zhiyuan; Hagen, Darren E; Wang, Juanbin; Elsik, Christine G; Ji, Tieming; Siqueira, Luiz G; Hansen, Peter J; Rivera, Rocío M

    2016-07-02

    Genomic imprinting is an epigenetic mechanism that leads to parental-allele-specific gene expression. Approximately 150 imprinted genes have been identified in humans and mice but less than 30 have been described as imprinted in cattle. For the purpose of de novo identification of imprinted genes in bovine, we determined global monoallelic gene expression in brain, skeletal muscle, liver, kidney and placenta of day ∼105 Bos taurus indicus × Bos taurus taurus F1 conceptuses using RNA sequencing. To accomplish this, we developed a bioinformatics pipeline to identify parent-specific single nucleotide polymorphism alleles after filtering adenosine to inosine (A-to-I) RNA editing sites. We identified 53 genes subject to monoallelic expression. Twenty three are genes known to be imprinted in the cow and an additional 7 have previously been characterized as imprinted in human and/or mouse that have not been reported as imprinted in cattle. Of the remaining 23 genes, we found that 10 are uncharacterized or unannotated transcripts located in known imprinted clusters, whereas the other 13 genes are distributed throughout the bovine genome and are not close to any known imprinted clusters. To exclude potential cis-eQTL effects on allele expression, we corroborated the parental specificity of monoallelic expression in day 86 Bos taurus taurus × Bos taurus taurus conceptuses and identified 8 novel bovine imprinted genes. Further, we identified 671 candidate A-to-I RNA editing sites and describe random X-inactivation in day 15 bovine extraembryonic membranes. Our results expand the imprinted gene list in bovine and demonstrate that monoallelic gene expression can be the result of cis-eQTL effects.
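
The abstract describes the pipeline only in outline: filter A-to-I RNA editing sites, then look for parent-specific SNP alleles. The sketch below is a loose illustration of that idea rather than the authors' method. It relies on the fact that inosine is read as G, so editing appears as A>G mismatches (T>C on the opposite strand). The `SnpSite` record, the crude "drop every A>G or T>C site" filter, and the `threshold` and `min_reads` values are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class SnpSite:
    ref: str        # genomic reference base
    alt: str        # alternative base seen in RNA-seq reads
    maternal: int   # reads carrying the maternal allele
    paternal: int   # reads carrying the paternal allele


def looks_like_a_to_i(site: SnpSite) -> bool:
    # Inosine is sequenced as G, so editing mimics an A>G (or T>C) variant.
    return (site.ref, site.alt) in {("A", "G"), ("T", "C")}


def classify(site: SnpSite, threshold: float = 0.9, min_reads: int = 10) -> str:
    """Label a site by parental bias; the cut-offs are illustrative assumptions."""
    if looks_like_a_to_i(site):
        return "filtered"
    total = site.maternal + site.paternal
    if total < min_reads:
        return "low_coverage"
    maternal_fraction = site.maternal / total
    if maternal_fraction >= threshold:
        return "maternal"
    if maternal_fraction <= 1 - threshold:
        return "paternal"
    return "biallelic"


print(classify(SnpSite("C", "T", maternal=48, paternal=2)))
```

As the abstract itself notes, a parental bias at a single site can also arise from cis-eQTL effects, so a label of this kind is a candidate call, not evidence of imprinting on its own.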

  12. Identification, sequence and expression of a crustacean cardioactive peptide (CCAP) gene in the moth Manduca sexta.

    PubMed

    Loi, P K; Emmal, S A; Park, Y; Tublitz, N J

    2001-08-01

    The crustacean cardioactive peptide (CCAP) gene was isolated from the tobacco hawkmoth Manduca sexta. The gene has an open reading frame of 125 amino acid residues containing a single, complete copy of CCAP. Analysis of the gene structure revealed three introns interrupting the coding region. A comparison of the M. sexta CCAP gene with the Drosophila melanogaster genome database reveals significant similarities in sequence and gene structure. The spatial and temporal expression patterns of the CCAP gene in the M. sexta central nervous system were determined in all major post-embryonic stages using in situ hybridization techniques. The CCAP gene is expressed in a total of 116 neurons in the post-embryonic M. sexta central nervous system. Nine pairs of cells are observed in the brain, 4.5 pairs in the subesophageal ganglion, three pairs in each thoracic ganglion (T1-T3), three pairs in the first abdominal ganglion (A1), five pairs each in the second to sixth abdominal ganglia (A2-A6) and 7.5 pairs in the terminal ganglion. The CCAP gene is expressed in every ganglion in each post-embryonic stage, except in the thoracic ganglia of first- and second-instar larvae. The number of cells expressing the CCAP gene varies during post-embryonic life, starting at 52 cells in the first instar and reaching a maximum of 116 shortly after pupation. One set of thoracic neurons expressing CCAP mRNA shows unusual variability in expression levels immediately prior to larval ecdysis. Using previously published CCAP immunocytochemical data, it was determined that 91 of 95 CCAP-immunopositive neurons in the M. sexta central nervous system also express the M. sexta CCAP gene, indicating that there is likely to be only a single CCAP gene in M. sexta.
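
The per-ganglion counts in this abstract can be checked against the stated total of 116 neurons. The short tally below uses only the numbers given above; reading a half pair as a single unpaired cell is an assumption.

```python
# Pairs of CCAP-expressing cells per ganglion, as listed in the abstract.
pairs = {
    "brain": 9,
    "subesophageal": 4.5,
    "T1": 3, "T2": 3, "T3": 3,                       # thoracic ganglia
    "A1": 3,                                         # first abdominal ganglion
    "A2": 5, "A3": 5, "A4": 5, "A5": 5, "A6": 5,     # second to sixth abdominal
    "terminal": 7.5,
}

total_pairs = sum(pairs.values())
total_neurons = int(total_pairs * 2)  # assumption: a half pair is one unpaired cell

print(total_pairs, total_neurons)  # 58.0 116
```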

  13. Global assessment of imprinted gene expression in the bovine conceptus by next generation sequencing

    PubMed Central

    Chen, Zhiyuan; Hagen, Darren E.; Wang, Juanbin; Elsik, Christine G.; Ji, Tieming; Siqueira, Luiz G.; Hansen, Peter J.; Rivera, Rocío M.

    2016-01-01

    Genomic imprinting is an epigenetic mechanism that leads to parental-allele-specific gene expression. Approximately 150 imprinted genes have been identified in humans and mice but less than 30 have been described as imprinted in cattle. For the purpose of de novo identification of imprinted genes in bovine, we determined global monoallelic gene expression in brain, skeletal muscle, liver, kidney and placenta of day ∼105 Bos taurus indicus × Bos taurus taurus F1 conceptuses using RNA sequencing. To accomplish this, we developed a bioinformatics pipeline to identify parent-specific single nucleotide polymorphism alleles after filtering adenosine to inosine (A-to-I) RNA editing sites. We identified 53 genes subject to monoallelic expression. Twenty three are genes known to be imprinted in the cow and an additional 7 have previously been characterized as imprinted in human and/or mouse that have not been reported as imprinted in cattle. Of the remaining 23 genes, we found that 10 are uncharacterized or unannotated transcripts located in known imprinted clusters, whereas the other 13 genes are distributed throughout the bovine genome and are not close to any known imprinted clusters. To exclude potential cis-eQTL effects on allele expression, we corroborated the parental specificity of monoallelic expression in day 86 Bos taurus taurus × Bos taurus taurus conceptuses and identified 8 novel bovine imprinted genes. Further, we identified 671 candidate A-to-I RNA editing sites and describe random X-inactivation in day 15 bovine extraembryonic membranes. Our results expand the imprinted gene list in bovine and demonstrate that monoallelic gene expression can be the result of cis-eQTL effects. PMID:27245094

  14. Nucleotide sequence of the Klebsiella pneumoniae nifD gene and predicted amino acid sequence of the alpha-subunit of nitrogenase MoFe protein.

    PubMed Central

    Ioannidis, I; Buck, M

    1987-01-01

    The nucleotide sequence of the Klebsiella pneumoniae nifD gene is presented and together with the accompanying paper [Holland, Zilberstein, Zamir & Sussman (1987) Biochem. J. 247, 277-285] completes the sequence of the nifHDK genes encoding the nitrogenase polypeptides. The K. pneumoniae nifD gene encodes the 483-amino acid-residue nitrogenase alpha-subunit polypeptide of Mr 54156. The alpha-subunit has five strongly conserved cysteine residues at positions 63, 89, 155, 184 and 275, some occurring in a region showing both primary sequence and potential structural homology to the K. pneumoniae nitrogenase beta-subunit. A comparison with six other alpha-subunit amino acid sequences has been made, which indicates a number of potentially important domains within alpha-subunits. PMID:3322262

  15. Phylogeny of flowering plants by the chloroplast genome sequences: in search of a "lucky gene".

    PubMed

    Logacheva, M D; Penin, A A; Samigullin, T H; Vallejo-Roman, C M; Antonov, A S

    2007-12-01

    One of the most complicated remaining problems of molecular-phylogenetic analysis is choosing an appropriate genome region. In an ideal case, such a region should have two specific properties: (i) results of analysis using this region should be similar to the results of multigene analysis using the maximal number of regions; (ii) this region should be arranged compactly and be significantly shorter than the multigene set. The second condition is necessary to facilitate sequencing and extension of taxons under analysis, the number of which is also crucial for molecular phylogenetic analysis. Such regions have been revealed for some groups of animals and have been designated as "lucky genes". We have carried out a computational experiment on analysis of 41 complete chloroplast genomes of flowering plants aimed at searching for a "lucky gene" for reconstruction of their phylogeny. It is shown that the phylogenetic tree inferred from a combination of translated nucleotide sequences of genes encoding subunits of plastid RNA polymerase is closest to the tree constructed using all protein coding sites of the chloroplast genome. The only node for which a contradiction is observed is unstable according to the different type analyses. For all the other genes or their combinations, the coincidence is significantly worse. The RNA polymerase genes are compactly arranged in the genome and are fourfold shorter than the total length of protein coding genes used for phylogenetic analysis. The combination of all necessary features makes this group of genes main candidates for the role of "lucky gene" in studying phylogeny of flowering plants.
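
The abstract speaks of one tree being "closest" to another without naming a distance measure. As a hedged illustration, the sketch below counts the clades that appear in only one of two rooted trees, a Robinson-Foulds-style comparison chosen here as an assumption. Trees are written as nested tuples of leaf names, and the four-taxon trees are hypothetical.

```python
def clades(tree, out=None):
    """Return (leaf set of this subtree, set of leaf sets of all internal nodes)."""
    if out is None:
        out = set()
    if isinstance(tree, str):          # a leaf is just its name
        return frozenset([tree]), out
    leaves = frozenset()
    for child in tree:
        child_leaves, _ = clades(child, out)
        leaves |= child_leaves
    out.add(leaves)                    # record the clade under this internal node
    return leaves, out


def clade_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a)[1] ^ clades(tree_b)[1])


# Hypothetical toy trees: a multigene reference and two single-region trees.
reference = ((("A", "B"), "C"), "D")
region_1 = ((("A", "B"), "C"), "D")
region_2 = ((("A", "C"), "B"), "D")

print(clade_distance(reference, region_1))  # 0
print(clade_distance(reference, region_2))  # 2
```

Under this measure, the region whose tree has the smallest distance to the multigene tree would be the better "lucky gene" candidate.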

  16. Transcriptome Sequencing of Codonopsis pilosula and Identification of Candidate Genes Involved in Polysaccharide Biosynthesis

    PubMed Central

    Gao, Jian Ping; Wang, Dong; Cao, Ling Ya; Sun, Hai Feng

    2015-01-01

    Background: Codonopsis pilosula (Franch.) Nannf. is one of the most widely used medicinal plants. Although chemical and pharmacological studies have shown that codonopsis polysaccharides (CPPs) are bioactive compounds and that their composition is variable, their biosynthetic pathways remain largely unknown. Next-generation sequencing is an efficient and high-throughput technique that allows the identification of candidate genes involved in secondary metabolism. Principal Findings: To identify the components involved in CPP biosynthesis, a transcriptome library, prepared using root and other tissues, was assembled with the help of Illumina sequencing. A total of 9.2 Gb of clean nucleotides was obtained comprising 91,175,044 clean reads, 102,125 contigs, and 45,511 unigenes. After aligning the sequences to the public protein databases, 76.1% of the unigenes were annotated. Among these annotated unigenes, 26,189 were assigned to Gene Ontology categories, 11,415 to Clusters of Orthologous Groups, and 18,848 to Kyoto Encyclopedia of Genes and Genomes pathways. Analysis of abundance of transcripts in the library showed that genes, including those encoding metallothionein, aquaporin, and cysteine protease that are related to stress responses, were in the top list. Among genes involved in the biosynthesis of CPP, those responsible for the synthesis of UDP-L-arabinose and UDP-xylose were highly expressed. Significance: To our knowledge, this is the first study to provide a public transcriptome dataset prepared from C. pilosula and an outline of the biosynthetic pathway of polysaccharides in a medicinal plant. Identified candidate genes involved in CPP biosynthesis provide understanding of the biosynthesis and regulation of CPP at the molecular level. PMID:25719364

  17. Target genes of microsatellite sequences in head and neck squamous cell carcinoma: mononucleotide repeats are not detected.

    PubMed

    Wang, Yimin; Liu, Xuejuan; Li, Yulin

    2012-09-10

    Microsatellite instability (MSI) is detected in a wide variety of tumors. It is thought that mismatch repair gene mutation or inactivation is the major cause of MSI. Microsatellite sequences are predominantly distributed in intergenic or intronic DNA. However, MSI is found in the exonic sequences of some genes, causing their inactivation. In this report, we searched GenBank for candidate genes containing potential MSI sequences in exonic regions. Twenty seven target genes were selected for MSI analysis. Instability was found in 70% of these genes (14/20) with head and neck squamous cell carcinoma (HNSCC). Interestingly, no instability was detected in mononucleotide repeats in genes or in intergenic sequences. We conclude that instability of mononucleotide repeats is a rare event in HNSCC. High MSI phenotype in young HNSCC patients is limited to noncoding regions only. MSI percentage in HNSCC tumor is closely related to the repeat type, repeat location and patient's age.

  18. Deep RNA sequencing analysis of readthrough gene fusions in human prostate adenocarcinoma and reference samples

    PubMed Central

    2011-01-01

    Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal samples and 9 lung cancer
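
The targeted alignment method above depends on a library of artificial exon-exon junctions between adjacent genes. The sketch below shows one plausible way to build such a library, under stated assumptions: each junction keeps `read_len - 1` bases on either side so that any read aligned to it must cross the junction, and gene pairs are kept only if they lie within the 200,000 bp window from the abstract. The function names, the flank rule, and the toy sequences are hypothetical and are not taken from the paper.

```python
def within_window(upstream_end, downstream_start, max_gap=200_000):
    """True if the downstream gene starts within max_gap bp of the upstream gene end."""
    return 0 <= downstream_start - upstream_end <= max_gap


def junction_library(upstream_exons, downstream_exons, read_len):
    """Join every upstream exon end to every downstream exon start.

    Assumes read_len >= 2. Keeping read_len - 1 bases per side means a read of
    that length cannot align to the junction without spanning it.
    """
    flank = read_len - 1
    junctions = {}
    for up_name, up_seq in upstream_exons.items():
        for down_name, down_seq in downstream_exons.items():
            junctions[(up_name, down_name)] = up_seq[-flank:] + down_seq[:flank]
    return junctions


# Toy sequences for illustration only.
upstream = {"e4": "AAAACCCC"}
downstream = {"e2": "GGGGTTTT"}

if within_window(upstream_end=1_000, downstream_start=51_000):
    library = junction_library(upstream, downstream, read_len=5)
    print(library[("e4", "e2")])  # CCCCGGGG
```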

  19. A novel homozygous mutation in SUCLA2 gene identified by exome sequencing

    PubMed Central

    Lamperti, Costanza; Fang, Mingyan; Invernizzi, Federica; Liu, Xuanzhu; Wang, Hairong; Zhang, Qing; Carrara, Franco; Moroni, Isabella; Zeviani, Massimo; Zhang, Jianguo; Ghezzi, Daniele

    2012-01-01

    Mitochondrial disorders with multiple mitochondrial respiratory chain (MRC) enzyme deficiency and depletion of mitochondrial DNA (mtDNA) are autosomal recessive conditions due to mutations in several nuclear genes necessary for proper mtDNA maintenance. In this report, we describe two Italian siblings presenting with encephalomyopathy and mtDNA depletion in muscle. By whole-exome sequencing and prioritization of candidate genes, we identified a novel homozygous missense mutation in the SUCLA2 gene in a highly conserved amino acid residue. Although a recurrent mutation in the SUCLA2 gene is relatively frequent in the Faroe Islands, mutations in other populations are extremely rare. In contrast with what has been reported in other patients, methyl-malonic aciduria, a biomarker for this genetic defect, was absent in our proband and very mildly elevated in her affected sister. This report demonstrates that next-generation technologies, particularly exome sequencing, are user-friendly, powerful means for the identification of disease genes in genetically and clinically heterogeneous inherited conditions, such as mitochondrial disorders. PMID:23010432

  20. Using machine learning to predict gene expression and discover sequence motifs

    NASA Astrophysics Data System (ADS)

    Li, Xuejing

    Recently, large amounts of experimental data for complex biological systems have become available. We use tools and algorithms from machine learning to build data-driven predictive models. We first present a novel algorithm to discover gene sequence motifs associated with temporal expression patterns of genes. Our algorithm, which is based on partial least squares (PLS) regression, is able to directly model the flow of information, from gene sequence to gene expression, to learn cis regulatory motifs and characterize associated gene expression patterns. Our algorithm outperforms traditional computational methods, e.g. clustering, in motif discovery. We then present a study that extends a machine learning model for transcriptional regulation, predictive of genetic regulatory response, to Caenorhabditis elegans. We show meaningful results both in terms of prediction accuracy on the test experiments and biological information extracted from the regulatory program. The model discovers DNA binding sites ab initio. We also present a case study where we detect a signal of lineage-specific regulation. Finally, we present a comparative study on learning predictive models for motif discovery, based on different boosting algorithms: Adaptive Boosting (AdaBoost), Linear Programming Boosting (LPBoost) and Totally Corrective Boosting (TotalBoost). We evaluate and compare the performance of the three boosting algorithms via both statistical and biological validation, for hypoxia response in Saccharomyces cerevisiae.
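
To make the boosting idea concrete, the sketch below runs discrete AdaBoost with very simple weak learners: each one votes +1 or -1 depending on whether a given k-mer occurs in a sequence. This is a generic textbook-style illustration, not the model from the thesis; the function names, the k-mer features, and the toy sequences and labels are all assumptions.

```python
import math


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def adaboost_kmer_stumps(seqs, labels, k=4, rounds=3):
    """Discrete AdaBoost; a stump votes `polarity` if its k-mer is present, else -polarity."""
    n = len(seqs)
    present = [kmers(s, k) for s in seqs]
    candidates = sorted(set().union(*present))
    weights = [1.0 / n] * n
    model = []  # list of (alpha, kmer, polarity)
    for _ in range(rounds):
        best = None
        for km in candidates:
            for polarity in (1, -1):
                # Weighted error: total weight of misclassified sequences.
                err = sum(w for w, p, y in zip(weights, present, labels)
                          if (polarity if km in p else -polarity) != y)
                if best is None or err < best[0]:
                    best = (err, km, polarity)
        err, km, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to keep alpha finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, km, polarity))
        # Up-weight the sequences this stump got wrong, then renormalise.
        weights = [w * math.exp(-alpha * y * (polarity if km in p else -polarity))
                   for w, p, y in zip(weights, present, labels)]
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return model


def predict(model, seq, k=4):
    found = kmers(seq, k)
    score = sum(a * (pol if km in found else -pol) for a, km, pol in model)
    return 1 if score >= 0 else -1


# Toy data: the three positive sequences share the hypothetical motif TGCA.
seqs = ["AATGCATT", "GGTGCACC", "TTTGCAGG", "AACCGGTT", "GGAATTCC", "CCTTAAGG"]
labels = [1, 1, 1, -1, -1, -1]
model = adaboost_kmer_stumps(seqs, labels)
print(model[0][1])  # TGCA
```

The k-mers chosen in the first rounds act as candidate motifs, which is the sense in which a boosted classifier can double as a motif-discovery tool.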

  1. Identification of regulatory sequences in the gene for 5-aminolevulinate synthase from rat.

    PubMed

    Braidotti, G; Borthwick, I A; May, B K

    1993-01-15

    The housekeeping enzyme 5-aminolevulinate synthase (ALAS) regulates the supply of heme for respiratory cytochromes. Here we report on the isolation of a genomic clone for the rat ALAS gene. The 5'-flanking region was fused to the chloramphenicol acetyltransferase gene and transient expression analysis revealed the presence of both positive and negative cis-acting sequences. Expression was substantially increased by the inclusion of the first intron located in the 5'-untranslated region. Sequence analysis of the promoter identified two elements at positions -59 and -88 bp with strong similarity to the binding site for nuclear respiratory factor 1 (NRF-1). Gel shift analysis revealed that both NRF-1 elements formed nucleoprotein complexes which could be abolished by an authentic NRF-1 oligomer. Mutagenesis of each NRF-1 motif in the ALAS promoter gave substantially lowered levels of chloramphenicol acetyltransferase expression, whereas mutagenesis of both NRF-1 motifs resulted in the almost complete loss of expression. These results establish that the NRF-1 motifs in the ALAS promoter are critical for promoter activity. NRF-1 binding sites have been identified in the promoters of several nuclear genes encoding mitochondrial proteins concerned with oxidative phosphorylation. The present studies suggest that NRF-1 may co-ordinate the supply of mitochondrial heme with the synthesis of respiratory cytochromes by regulating expression of ALAS. In erythroid cells, NRF-1 may be less important for controlling heme levels since an erythroid ALAS gene is strongly expressed and the promoter for this gene apparently lacks NRF-1 binding sites.

  2. Isolation of nine gene sequences induced by silica in murine macrophages

    SciTech Connect

    Segade, F.; Claudio, E.; Wrobel, K.; Ramos, S.; Lazo, P.S.

    1995-03-01

    Macrophage activation by silica is the initial step in the development of silicosis. To identify genes that might be involved in silica-mediated activation, RAW 264.7 mouse macrophages were treated with silica for 48 h, and a subtracted cDNA library enriched for silica-induced genes (SIG) was constructed and differentially screened. Nine cDNA clones (designated SIG-12, -14, -20, -41, -61, -81, -91, and -111) were partially sequenced and compared with sequences in GenBank/EMBL databases. SIG-12, -14, and -20 corresponded to the genes for ribosomal proteins L13A, L32, and L26, respectively. SIG-61 is the mouse homologue of p21 RhoC. SIG-91 is identical to the 67-kDa high-affinity laminin receptor. Four genes were not identified and are novel. All of the mRNAs corresponding to the nine cloned cDNAs were inducible by silica. Steady-state levels of mRNAs in RAW 264.7 cells treated with various macrophage activators and inducers of signal transduction pathways were determined. A complex pattern of induction and repression was found, indicating that upon phagocytosis of silica particles, many regulatory mechanisms of gene expression are simultaneously triggered. 55 refs., 4 figs., 1 tab.

  3. Complete genome sequence of Bacillus oceanisediminis 2691, a reservoir of heavy-metal resistance genes.

    PubMed

    Jung, Jaejoon; Jeong, Haeyoung; Kim, Hyun Ju; Lee, Dong-Woo; Lee, Sang Jun

    2016-12-01

    Ocean sediments are commonly subject to pollution by various heavy metals. Intracellular heavy metal concentrations in marine microorganisms should be kept within allowable concentrations. Here, we report redundant heavy metal resistance-related genes encoding heavy metal-sensing transcriptional regulators (i.e. cadC), heavy metal efflux pumps, and detoxifying enzymes in the complete genome sequence of Bacillus oceanisediminis 2691. By comparing CadC sequences of strain 2691 with those from other bacterial genomes, we demonstrated that each cadC gene located in the chromosome or plasmid of 2691 cells is similar to those of various near or distant microbes, which might shed light on evolutionary trajectories of redundant heavy metal resistance genes. From an applied perspective, these diverse heavy metal sensing genes can be harnessed as synthetic biological parts, modules, and devices for the development of heavy metal-specific biosensors. Heavy metal bioremediation technologies or platform cells can also be developed based on the marine genomic information of heavy metal resistance and/or detoxification genes in a bacterial isolate from ocean sediments.

  4. Chromosomal Organization and Sequence Diversity of Genes Encoding Lachrymatory Factor Synthase in Allium cepa L.

    PubMed

    Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi

    2012-06-01

    Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum-shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F(2) mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5.

  5. Human case of bacteremia caused by Streptococcus canis sequence type 9 harboring the scm gene.

    PubMed

    Taniyama, Daisuke; Abe, Yoshihiko; Sakai, Tetsuya; Kikuchi, Takahide; Takahashi, Takashi

    2017-01-01

    Streptococcus canis (Sc) is a zoonotic pathogen that is transferred mainly from companion animals to humans. One of the major virulence factors in Sc is the M-like protein encoded by the scm gene, which is involved in anti-phagocytic activities, as well as the recruitment of plasminogen to the bacterial surface in cooperation with enolase, and the consequent enhancement of bacterial transmigration and survival. This is the first reported human case of uncomplicated bacteremia following a dog bite, caused by Streptococcus canis harboring the scm gene. The similarity of the 16S rRNA from the infecting species to that of the Sc type strain, as well as the amplification of the species-specific cfg gene, encoding a co-hemolysin, was used to confirm the species identity. Furthermore, the isolate was confirmed as sequence type 9. The partial scm gene sequence harbored by the isolate was closely related to those of two other Sc strains. Although this isolate did not possess the macrolide/lincosamide resistance genes erm(A), erm(B), or mef(A), it was not fully susceptible to azithromycin: its susceptibility was intermediate. Even though human Sc bacteremia is rare, clinicians should be aware of this microorganism, as well as Pasteurella sp., Prevotella sp., and Capnocytophaga sp., when examining and treating patients with fever who maintain close contact with companion animals.

  6. Phylogenetic analysis of oryx species using partial sequences of mitochondrial rRNA genes.

    PubMed

    Khan, H A; Arif, I A; Al Farhan, A H; Al Homaidan, A A

    2008-10-28

    We conducted a comparative evaluation of the 12S rRNA and 16S rRNA genes of the mitochondrial genome for molecular differentiation among three oryx species (Oryx leucoryx, Oryx dammah and Oryx gazella) with respect to two closely related outgroups, addax and roan. The 12S rRNA gene failed to differentiate between the genus Oryx and addax, whereas a 342-bp partial sequence of 16S rRNA accurately grouped all five taxa studied, suggesting the utility of this 16S rRNA segment for molecular phylogeny of oryx at the genus and possibly species levels.

  7. Molecular cloning, nucleotide sequence and expression of a Sulfolobus solfataricus gene encoding a class II fumarase.

    PubMed

    Colombo, S; Grisa, M; Tortora, P; Vanoni, M

    1994-01-03

    Fumarase catalyzes the interconversion of L-malate and fumarate. A Sulfolobus solfataricus fumarase gene (fumC) was cloned and sequenced. Typical archaebacterial regulatory sites were identified in the region flanking the fumC open reading frame. The fumC gene encodes a protein of 438 amino acids (47,899 Da) which shows several significant similarities with class II fumarases from both eubacterial and eukaryotic sources as well as with aspartases. S. solfataricus fumarase expressed in Escherichia coli retains enzymatic activity, and its thermostability is comparable to that of the enzyme purified from S. solfataricus despite an 11-amino-acid C-terminal deletion.

  8. Transcriptome Sequencing Identified Genes and Gene Ontologies Associated with Early Freezing Tolerance in Maize

    PubMed Central

    Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu

    2016-01-01

    Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines with extreme but opposite freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within a line or to line type under freezing conditions. Gene ontology analysis found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provides valuable resources for further studies to enhance understanding of the molecular mechanisms underlying the maize early freezing response and to enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095

  9. Molecular cloning and nucleotide sequence of a transforming gene detected by transfection of chicken B-cell lymphoma DNA

    NASA Astrophysics Data System (ADS)

    Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.

    1983-03-01

    A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.

  10. Identification of genes in anonymous DNA sequences. Final report: Report period, 15 April 1993--15 April 1994

    SciTech Connect

    Fields, C.A.

    1994-09-01

    This Report concludes the DOE Human Genome Program project, "Identification of Genes in Anonymous DNA Sequence." The central goals of this project have been (1) understanding the problem of identifying genes in anonymous sequences, and (2) development of tools, primarily the automated identification system gm, for identifying genes. The activities supported under the previous award are summarized here to provide a single complete report on the activities supported as part of the project from its inception to its completion.

  11. Immune gene discovery by expressed sequence tag (EST) analysis of hemocytes in the ridgetail white prawn Exopalaemon carinicauda

    PubMed Central

    Duan, Yafei; Liu, Ping; Li, Jitao; Li, Jian; Chen, Ping

    2013-01-01

    The ridgetail white prawn Exopalaemon carinicauda is one of the most important commercial species in eastern China. However, little information on immune genes in E. carinicauda has been reported. To identify distinctive genes associated with immunity, an expressed sequence tag (EST) library was constructed from hemocytes of E. carinicauda. A total of 3411 clones were sequenced, yielding 2853 ESTs with an average sequence length of 436 bp. Cluster and assembly analysis yielded 1053 unique sequences, including 329 contigs and 724 singletons. BLAST analysis identified 593 (56.3%) of the unique sequences as orthologs of genes from other organisms (E-value < 1e-5). These 593 unique sequences were classified based on COG and Gene Ontology (GO). Through comparison with previous studies, 153 genes assembled from 367 ESTs were identified as possibly involved in defense or immune functions. These genes fall into seven categories according to their putative functions in the shrimp immune system: antimicrobial peptides, prophenoloxidase activating system, antioxidant defense systems, chaperone proteins, clottable proteins, pattern recognition receptors and other immune-related genes. According to EST abundance, the major immune-related genes were thioredoxin (141, 4.94% of all ESTs) and calmodulin (14, 0.49% of all ESTs). The EST sequences of E. carinicauda hemocytes provide important information on the immune system and lay the groundwork for development of molecular markers related to disease resistance in prawn species. PMID:23092732
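
    The counts and percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the numbers are copied from the abstract, while the variable names and the percent helper are assumptions introduced here and are not part of the original study.

```python
# Counts as quoted in the abstract; names are illustrative, not from the paper.
total_ests = 2853
contigs = 329
singletons = 724
annotated = 593          # unique sequences with a BLAST hit (E-value < 1e-5)
thioredoxin_ests = 141
calmodulin_ests = 14

# Unique sequences are the sum of assembled contigs and unassembled singletons.
unique_sequences = contigs + singletons


def percent(part, whole):
    """Return part/whole as a percentage (hypothetical helper)."""
    return 100.0 * part / whole


annotated_pct = round(percent(annotated, unique_sequences), 1)
thioredoxin_pct = round(percent(thioredoxin_ests, total_ests), 2)
calmodulin_pct = round(percent(calmodulin_ests, total_ests), 2)

print(unique_sequences, annotated_pct, thioredoxin_pct, calmodulin_pct)
```

    Under these assumptions the recomputed values agree with the figures reported in the abstract (1053 unique sequences; 56.3%, 4.94% and 0.49%).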

  12. Characterization and phylogenetic analysis of α-gliadin gene sequences reveals significant genomic divergence in Triticeae species.

    PubMed

    Li, Guang-Rong; Lang, Tao; Yang, En-Nian; Liu, Cheng; Yang, Zu-Jun

    2014-12-01

    Although the unique properties of the wheat α-gliadin gene family are well characterized, little is known about the evolution and genomic divergence of the α-gliadin gene family within the Triticeae. We isolated a total of 203 α-gliadin gene sequences from 11 representative diploid and polyploid Triticeae species, and found 108 sequences to be putatively functional. Our results indicate that α-gliadin genes may have originated from wild Secale species, where the sequences contain the shortest repetitive domains and display minimum variation. A miniature inverted-repeat transposable element insertion is reported for the first time in an α-gliadin gene sequence of Thinopyrum intermedium in this study, indicating that the transposable element might have contributed to the diversification of the α-gliadin gene family among Triticeae genomes. The phylogenetic analyses revealed that the α-gliadin gene sequences of Dasypyrum, Australopyrum, Lophopyrum, Eremopyrum and Pseudoroegneria species have amplified several times. A search for four typical toxic epitopes for celiac disease within the Triticeae α-gliadin gene sequences showed that the α-gliadins of wild Secale, Australopyrum and Agropyron genomes lack all four epitopes, while other Triticeae species have accumulated these epitopes, suggesting that the evolution of these toxic epitope sequences occurred during the course of speciation, domestication or polyploidization of Triticeae.

  13. Impacts of Neanderthal-Introgressed Sequences on the Landscape of Human Gene Expression.

    PubMed

    McCoy, Rajiv C; Wakefield, Jon; Akey, Joshua M

    2017-02-23

    Regulatory variation influencing gene expression is a key contributor to phenotypic diversity, both within and between species. Unfortunately, RNA degrades too rapidly to be recovered from fossil remains, limiting functional genomic insights about our extinct hominin relatives. Many Neanderthal sequences surviv