Science.gov

Sample records for agglutinin-like sequence gene

  1. Candida albicans Agglutinin-Like Sequence (Als) Family Vignettes: A Review of Als Protein Structure and Function

    PubMed Central

    Hoyer, Lois L.; Cota, Ernesto

    2016-01-01

    Approximately two decades have passed since the description of the first gene in the Candida albicans ALS (agglutinin-like sequence) family. Since that time, much has been learned about the composition of the family and the function of its encoded cell-surface glycoproteins. Solution of the structure of the Als adhesive domain provides the opportunity to evaluate the molecular basis for protein function. This review article is formatted as a series of fundamental questions and explores the diversity of the Als proteins, as well as their role in ligand binding, aggregative effects, and attachment to abiotic surfaces. Interaction of Als proteins with each other, their functional equivalence, and the effects of protein abundance on phenotypic conclusions are also examined. Structural features of Als proteins that may facilitate invasive function are considered. Conclusions that are firmly supported by the literature are presented while highlighting areas that require additional investigation to reveal basic features of the Als proteins, their relatedness to each other, and their roles in C. albicans biology. PMID:27014205

  2. Dynamics of Agglutinin-Like Sequence (ALS) Protein Localization on the Surface of Candida Albicans

    ERIC Educational Resources Information Center

    Coleman, David Andrew

    2009-01-01

    The ALS gene family encodes large cell-surface glycoproteins associated with "C. albicans" pathogenesis. Als proteins are thought to act as adhesin molecules binding to host tissues. Wide variation in expression levels among the ALS genes exists and is related to cell morphology and environmental conditions. "ALS1," "ALS3," and "ALS4" are three of…

  3. Repetitive sequence environment distinguishes housekeeping genes

    PubMed Central

    Eller, C. Daniel; Regelson, Moira; Merriman, Barry; Nelson, Stan; Horvath, Steve; Marahrens, York

    2007-01-01

    Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (>400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element 1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes. PMID:17141428

  4. Gene Sequence Homology of Chemokines Across Species

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-reactivities...

  5. GENE SEQUENCE HOMOLOGY OF CHEMOKINES ACROSS SPECIES

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...

  6. Gene Discovery through Expressed Sequence Tag Sequencing in Trypanosoma cruzi

    PubMed Central

    Verdun, Ramiro E.; Di Paolo, Nelson; Urmenyi, Turan P.; Rondinelli, Edson; Frasch, Alberto C. C.; Sanchez, Daniel O.

    1998-01-01

    Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from a normalized CL Brener epimastigote cDNA library. A total of 14.6% of the clones were homologous to previously identified T. cruzi genes, while 18.4% had significant matches to genes from other organisms in the database. A total of 67% of the ESTs had no matches in the database, and thus, some of them might be T. cruzi-specific genes. Functional groups of those sequences with matches in the database were constructed according to their putative biological functions. The two largest categories were protein synthesis (23.3%) and cell surface molecules (10.8%). The information reported in this paper should be useful for researchers in the field to analyze genes and proteins of their own interest. PMID:9784549

  7. DNA sequence of the yeast transketolase gene.

    PubMed

    Fletcher, T S; Kwee, I L; Nakada, T; Largman, C; Martin, B M

    1992-02-18

    Transketolase (EC 2.2.1.1) is the enzyme that, together with aldolase, forms a reversible link between the glycolytic and pentose phosphate pathways. We have cloned and sequenced the transketolase gene from yeast (Saccharomyces cerevisiae). This is the first transketolase gene of the pentose phosphate shunt to be sequenced from any source. The molecular mass of the proposed translated protein is 73,976 daltons, in good agreement with the observed molecular mass of about 75,000 daltons. The 5'-nontranslated region of the gene is similar to other yeast genes. There is no evidence of 5'-splice junctions or branch points in the sequence. The 3'-nontranslated region contains the polyadenylation signal (AATAAA), 80 base pairs downstream from the termination codon. A high degree of homology is found between yeast transketolase and dihydroxyacetone synthase (formaldehyde transketolase) from the yeast Hansenula polymorpha. The overall sequence identity between these two proteins is 37%, with four regions of much greater similarity. The regions from amino acid residues 98-131, 157-182, 410-433, and 474-489 have sequence identities of 74%, 66%, 83%, and 82%, respectively. One of these regions (157-182) includes a possible thiamin pyrophosphate (TPP) binding domain, and another (410-433) may contain the catalytic domain. PMID:1737042

  8. The nucleotide sequence of the mouse immunoglobulin epsilon gene: comparison with the human epsilon gene sequence.

    PubMed Central

    Ishida, N; Ueda, S; Hayashida, H; Miyata, T; Honjo, T

    1982-01-01

    We have determined the nucleotide sequence of the immunoglobulin epsilon gene cloned from newborn mouse DNA. The epsilon gene sequence allows prediction of the amino acid sequence of the constant region of the epsilon chain and comparison of it with sequences of the human epsilon and other mouse immunoglobulin genes. The epsilon gene was shown to be under the weakest selection pressure at the protein level among the immunoglobulin genes although the divergence at the synonymous position is similar. Our results suggest that the epsilon gene may be dispensable, which is in accord with the fact that IgE has only obscure roles in the immune defense system but has an undesirable role as a mediator of hypersensitivity. The sequence data suggest that the human and murine epsilon genes were derived from different ancestors duplicated a long time ago. The amino acid sequence of the epsilon chain is more homologous to those of the gamma chains than the other mouse heavy chains. Two membrane exons, separated by an 80-base intron, were identified 1.7 kb 3' to the CH4 domain of the epsilon gene and shown to conserve a hydrophobic portion similar to those of other heavy chain genes. RNA blot hybridization showed that the epsilon membrane exons are transcribed into two species of mRNA in an IgE hybridoma. PMID:6329728

  9. Nemertean toxin genes revealed through transcriptome sequencing.

    PubMed

    Whelan, Nathan V; Kocot, Kevin M; Santos, Scott R; Halanych, Kenneth M

    2014-12-01

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63-74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. PMID:25432940

  10. Nemertean Toxin Genes Revealed through Transcriptome Sequencing

    PubMed Central

    Whelan, Nathan V.; Kocot, Kevin M.; Santos, Scott R.; Halanych, Kenneth M.

    2014-01-01

    Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63–74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. PMID:25432940

  11. cDNA SEQUENCE OF CHANNEL CATFISH PEROXIREDOXIN 6 GENE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Peroxiredoxin 6 gene (Prdx6) of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from channel catfish tissues was isolated, reverse transcribed and amplified. The sequence of the channel catfish Prdx6 gene consists of 1003 nucleotides. Analysis of the nucleotide sequence ...

  12. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  13. Identification of genes in genomic and EST sequences

    SciTech Connect

    Fields, C.; Adams, M.D.; Kerlavage, A.R.; Dubnick, M.; McCombie, W.R.; Martin-Gallardo, A.; Venter, J.C.; White, O.

    1993-12-31

    Currently available software tools are capable of predicting the locations of most protein-coding genes in anonymous genomic DNA sequences. The use of predicted exons to select primers for PCR amplification from cDNA libraries allows the complete structures of novel genes to be determined efficiently. As the number of expressed sequence tag (EST) sequences increases, the fraction of genes that can be localized in genomic sequences by searching EST databases will rapidly approach unity. The challenge for automated DNA sequence analysis is now to develop methods for accurately predicting gene structure and alternative splicing patterns. Substantially improving current accuracies in gene structure prediction will require retrospective comparative analysis of sequences from different organisms and gene families.

  14. Brain-specific genes have identifier sequences in their introns.

    PubMed Central

    Milner, R J; Bloom, F E; Lai, C; Lerner, R A; Sutcliffe, J G

    1984-01-01

    The 82-nucleotide identifier (ID) sequence is present in the rat genome in 1-1.5 × 10^5 copies and in cDNA clones of precursors of brain-specific mRNAs. One brain-specific gene contains more than one ID sequence in its introns. There is an excess of ID sequences to brain genes, and some ID sequences appear to have been inserted as mobile elements into other genetic locations. Therefore, brain genes contain ID sequences in their introns, but not all ID sequences are located in brain gene introns. A brain ID consensus sequence has been obtained by comparing 8 ID nucleotide sequences. PMID:6583673

  15. Bioinformatic Identification of Conserved Cis-Sequences in Coregulated Genes.

    PubMed

    Bülow, Lorenz; Hehl, Reinhard

    2016-01-01

    Bioinformatics tools can be employed to identify conserved cis-sequences in sets of coregulated plant genes because more and more gene expression and genomic sequence data become available. Knowledge on the specific cis-sequences, their enrichment and arrangement within promoters, facilitates the design of functional synthetic plant promoters that are responsive to specific stresses. The present chapter illustrates an example for the bioinformatic identification of conserved Arabidopsis thaliana cis-sequences enriched in drought stress-responsive genes. This workflow can be applied for the identification of cis-sequences in any sets of coregulated genes. The workflow includes detailed protocols to determine sets of coregulated genes, to extract the corresponding promoter sequences, and how to install and run a software package to identify overrepresented motifs. Further bioinformatic analyses that can be performed with the results are discussed. PMID:27557771

  16. Using shotgun sequence data to find active restriction enzyme genes.

    PubMed

    Zheng, Yu; Posfai, Janos; Morgan, Richard D; Vincze, Tamas; Roberts, Richard J

    2009-01-01

    Whole genome shotgun sequence analysis has become the standard method for beginning to determine a genome sequence. The preparation of the shotgun sequence clones is, in fact, a biological experiment. It determines which segments of the genome can be cloned into Escherichia coli and which cannot. By analyzing the complete set of sequences from such an experiment, it is possible to identify genes lethal to E. coli. Among this set are genes encoding restriction enzymes which, when active in E. coli, lead to cell death by cleaving the E. coli genome at the restriction enzyme recognition sites. By analyzing shotgun sequence data sets we show that this is a reliable method to detect active restriction enzyme genes in newly sequenced genomes, thereby facilitating functional annotation. Active restriction enzyme genes have been identified, and their activity demonstrated biochemically, in the sequenced genomes of Methanocaldococcus jannaschii, Bacillus cereus ATCC 10987 and Methylococcus capsulatus. PMID:18988632

  17. Nucleotide sequence of the gene for human prothrombin

    SciTech Connect

    Degen, S.J.F.; Davie, E.W.

    1987-09-22

    A human genomic DNA library was screened for the gene coding for human prothrombin with a cDNA coding for the human protein. Eighty-one positive lambda phage were identified, and three were chosen for further characterization. These three phage hybridized with 5' and/or 3' probes prepared from the prothrombin cDNA. The complete DNA sequence of 21 kilobases of the human prothrombin gene was determined and included a 4.9-kilobase region that was previously sequenced. The gene for human prothrombin contains 14 exons separated by 13 intervening sequences. The exons range in size from 25 to 315 base pairs, while the introns range from 84 to 9447 base pairs. Ninety percent of the gene is composed of intervening sequence. All the intron splice junctions are consistent with sequences found in other eukaryotic genes, except for the presence of GC rather than GT on the 5' end of intervening sequence L. Thirty copies of Alu repetitive DNA and two copies of partial KpnI repeats were identified in clusters within several of the intervening sequences, and these repeats represent 40% of the DNA sequence of the gene. The size, distribution, and sequence homology of the introns within the gene were then compared to those of the genes for the other vitamin K dependent proteins and several other serine proteases.

  18. Recognition of Yeast Species from Gene Sequence Comparisons

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This review discusses recognition of yeast species from gene sequence comparisons, which have been responsible for doubling the number of known yeasts over the past decade. The resolution provided by various single gene sequences is examined for both ascomycetous and basidiomycetous species, and th...

  19. Reclassification of ascomycetous yeasts from gene sequence analyses

    Technology Transfer Automated Retrieval System (TEKTRAN)

    During the past decade, identification of yeasts and their classification has been based almost exclusively on gene sequence analysis. Primarily as a result of using diagnostic gene sequences, such as D1/D2 LSU and ITS ribosomal RNAs, the number of known species has doubled. With the faster sequen...

  20. Organization and sequence of the human alpha-lactalbumin gene.

    PubMed Central

    Hall, L; Emery, D C; Davies, M S; Parker, D; Craig, R K

    1987-01-01

    A recombinant bacteriophage containing the entire alpha-lactalbumin gene was isolated from a human genomic library constructed in bacteriophage lambda L47. Within this recombinant the 2.5 kb alpha-lactalbumin gene is flanked by about 5 kb of sequence on either side. The complete nucleotide sequence of the gene and its immediate flanking sequences were determined and compared with those of the rat alpha-lactalbumin gene. These studies showed that the size, organization and sequence of the exons have been highly conserved, whereas the introns have diverged considerably. In particular, the first intron of the human gene was found to contain an Alu repetitive sequence not present in the rat. A high degree of homology (67%) was also observed in the 5' flanking regions, extending as far as 655 nucleotide residues upstream of the transcriptional initiation site. Comparison of the 5' flanking sequences of these two alpha-lactalbumin genes with those of five casein genes has revealed the presence of a highly conserved region [consensus sequence: RGAAGRAAA(N)TGGACAGAAATCAA(CG)TTTCTA], extending from position -140 to -110 in all seven sequences examined, suggesting a possible regulatory role in the hormonal control or tissue-specific expression of milk protein genes in the mammary gland. PMID:2954544
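
    The consensus sequence quoted in this abstract uses IUPAC nucleotide codes (R stands for A or G). As a rough illustration of how such a consensus could be scanned for in a DNA string, the sketch below converts IUPAC codes to a regular expression. It is not the authors' method. The reading of "(N)" as any single base and "(CG)" as C or G (IUPAC code S) is an assumption, and the names `iupac_to_regex`, `find_sites` and `ASSUMED_CONSENSUS` are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())

# Assumed reading of the published consensus: "(N)" -> N, "(CG)" -> S.
ASSUMED_CONSENSUS = "RGAAGRAAANTGGACAGAAATCAASTTTCTA"

def find_sites(sequence, consensus=ASSUMED_CONSENSUS):
    """Return 0-based start positions of consensus matches in sequence."""
    pattern = re.compile(iupac_to_regex(consensus))
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

    Under this reading the pattern is 31 nucleotides long, which agrees with the stated span from position -140 to -110.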

  1. Fusion genes and their discovery using high throughput sequencing.

    PubMed

    Annala, M J; Parker, B C; Zhang, W; Nykter, M

    2013-11-01

    Fusion genes are hybrid genes that combine parts of two or more original genes. They can form as a result of chromosomal rearrangements or abnormal transcription, and have been shown to act as drivers of malignant transformation and progression in many human cancers. The biological significance of fusion genes together with their specificity to cancer cells has made them into excellent targets for molecular therapy. Fusion genes are also used as diagnostic and prognostic markers to confirm cancer diagnosis and monitor response to molecular therapies. High-throughput sequencing has enabled the systematic discovery of fusion genes in a wide variety of cancer types. In this review, we describe the history of fusion genes in cancer and the ways in which fusion genes form and affect cellular function. We also describe computational methodologies for detecting fusion genes from high-throughput sequencing experiments, and the most common sources of error that lead to false discovery of fusion genes. PMID:23376639

  2. Sequence homologies in the protamine gene family of rainbow trout.

    PubMed Central

    Aiken, J M; McKenzie, D; Zhao, H Z; States, J C; Dixon, G H

    1983-01-01

    We have sequenced five different rainbow trout protamine genes plus their flanking regions. The genes are not clustered and do not contain intervening sequences. There is an extremely high degree of sequence conservation in the coding and 3' untranslated regions of the gene. Downstream sequences exhibit little homology though conserved regions are found 250 base pairs 3' to the gene. There are four regions upstream of the gene that are highly conserved in the six clones, including the canonical Goldberg-Hogness box which is 45 base pairs 5' to the coding region. A second homologous region is found 90 bases upstream. Although in the same approximate location as the CAAT box found upstream of other genes, it does not contain the canonical CAAT sequence. Further upstream of the protamine genes at -115 there is an A-T rich sequence while a 25 base pair conserved sequence is located 150 bases upstream. In addition we report the presence of a potential Z-DNA region of predominantly A-C repeats approximately one kilobase downstream of one of the genes. PMID:6308564

  3. Optimization of gene sequences under constant mutational pressure and selection

    NASA Astrophysics Data System (ADS)

    Kowalczuk, M.; Gierlik, A.; Mackiewicz, P.; Cebrat, S.; Dudek, M. R.

    1999-12-01

    We have analyzed the influence of constant mutational pressure and selection on the nucleotide composition of DNA sequences of various size, which were represented by the genes of the Borrelia burgdorferi genome. With the help of MC simulations we have found that longer DNA sequences accumulate much less base substitutions per sequence length than short sequences. This leads us to the conclusion that the accuracy of replication may determine the size of genome.

  4. Nucleotide sequence of SHV-2 beta-lactamase gene

    SciTech Connect

    Garbarg-Chenon, A.; Godard, V.; Labia, R.; Nicolas, J.C.

    1990-07-01

    The nucleotide sequence of plasmid-mediated beta-lactamase SHV-2 from Salmonella typhimurium (SHV-2pHT1) was determined. The gene was very similar to chromosomally encoded beta-lactamase LEN-1 of Klebsiella pneumoniae. Compared with the sequence of the Escherichia coli SHV-2 enzyme (SHV-2E.coli) obtained by protein sequencing, the deduced amino acid sequence of SHV-2pHT1 differed by three amino acid substitutions.

  5. Acinetobacter cyclohexanone monooxygenase: gene cloning and sequence determination.

    PubMed Central

    Chen, Y C; Peoples, O P; Walsh, C T

    1988-01-01

    The gene coding for cyclohexanone monooxygenase from Acinetobacter sp. strain NCIB 9871 was isolated by immunological screening methods. We located and determined the nucleotide sequence of the gene. The structural gene is 1,626 nucleotides long and codes for a polypeptide of 542 amino acids; 389 nucleotides 5' and 108 nucleotides 3' of the coding region are also reported. The complete amino acid sequence of the enzyme was derived by translation of the nucleotide sequence. From a comparison of the amino acid sequence with consensus sequences of nucleotide-binding folds, we identified a potential flavin-binding site at the NH2 terminus of the enzyme (residues 6 to 18) and a potential nicotinamide-binding site extending from residue 176 to residue 208 of the protein. An overproduction system for the gene to facilitate genetic manipulations was also constructed by using the tac promoter vector pKK223-3 in Escherichia coli. PMID:3338974
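
    The two lengths reported in this abstract can be cross-checked with simple codon arithmetic: three nucleotides per codon. The sketch below is only an illustration. The helper name `codon_count` is hypothetical, and the check assumes that the 1,626-nucleotide figure does not include a stop codon, which the abstract does not state.

```python
def codon_count(cds_length_nt):
    """Return the number of codons in a coding sequence of the given length."""
    if cds_length_nt % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    return cds_length_nt // 3

# 1,626 nucleotides correspond to 542 codons, matching the reported
# 542-amino-acid polypeptide (assuming the stop codon is not counted).
print(codon_count(1626))  # 542
```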

  6. Classification of Arabidopsis thaliana gene sequences: clustering of coding sequences into two groups according to codon usage improves gene prediction.

    PubMed

    Mathé, C; Peresetsky, A; Déhais, P; Van Montagu, M; Rouzé, P

    1999-02-01

    While genomic sequences are accumulating, finding the location of the genes remains a major issue that can be solved only for about a half of them by homology searches. Prediction methods are thus required, but unfortunately are not fully satisfying. Most prediction methods implicitly assume a unique model for genes. This is an oversimplification as demonstrated by the possibility to group coding sequences into several classes in Escherichia coli and other genomes. As no classification existed for Arabidopsis thaliana, we classified genes according to the statistical features of their coding sequences. A clustering algorithm using a codon usage model was developed and applied to coding sequences from A. thaliana, E. coli, and a mixture of both. By using it, Arabidopsis sequences were clustered into two classes. The CU1 and CU2 classes differed essentially by the choice of pyrimidine bases at the codon silent sites: CU2 genes often use C whereas CU1 genes prefer T. This classification discriminated the Arabidopsis genes according to their expressiveness, highly expressed genes being clustered in CU2 and genes expected to have a lower expression, such as the regulatory genes, in CU1. The algorithm separated the sequences of the Escherichia-Arabidopsis mixed data set into five classes according to the species, except for one class. This mixed class contained 89 % Arabidopsis genes from CU1 and 11 % E. coli genes, mostly horizontally transferred. Interestingly, most genes encoding organelle-targeted proteins, except the photosynthetic and photoassimilatory ones, were clustered in CU1. By tailoring the GeneMark CDS prediction algorithm to the observed coding sequence classes, its quality of prediction was greatly improved. Similar improvement can be expected with other prediction systems. PMID:9925779
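
    The abstract reports that the CU1 and CU2 classes differ mainly in whether C or T is chosen at silent codon positions. The sketch below is a much simpler stand-in for that idea, not the authors' clustering algorithm: it only measures the share of C among pyrimidines at third codon positions of a single coding sequence. Treating every third position as a silent site is a simplifying assumption, and the function name `third_position_c_fraction` is hypothetical.

```python
def third_position_c_fraction(cds):
    """Fraction of C among C/T bases found at third codon positions.

    Returns None when no third position carries a pyrimidine.
    Trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    # Collect the third base of every complete codon.
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    pyrimidines = [base for base in thirds if base in "CT"]
    if not pyrimidines:
        return None
    return pyrimidines.count("C") / len(pyrimidines)
```

    On this measure a C-preferring sequence would score closer to 1 and a T-preferring sequence closer to 0. Any threshold for assigning a gene to a class would have to be chosen separately and is not given in the abstract.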

  7. Single molecule targeted sequencing for cancer gene mutation detection.

    PubMed

    Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui

    2016-01-01

    With the rapid decline in cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although there are fast sample preparation methods available in market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation thus no biases and errors are introduced by PCR reaction. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis. PMID:27193446

  8. Single molecule targeted sequencing for cancer gene mutation detection

    PubMed Central

    Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W.; He, Jiankui

    2016-01-01

    With the rapid decline in cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although there are fast sample preparation methods available in market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation thus no biases and errors are introduced by PCR reaction. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis. PMID:27193446

  9. Degenerative primer design and gene sequencing validation for select turkey genes.

    PubMed

    Hutsko, Stephanie L; Lilburn, Michael S; Wick, Macdonald

    2016-06-01

    We successfully designed and validated degenerative primers for turkey genes MUC2, RPS13, TBP and TFF2 based on chicken sequences in order to use gene transcription analysis to evaluate (quantify) the mucin transcription to probiotic supplementation in turkeys. Primers were designed for the genes MUC2, TFF2, RPS13 and TBP using a degenerative primer design method based on the available Gallus gallus sequences. All primer sets, which produced a single PCR amplicon of the expected sizes, were cloned into the TOPO® vector and then transformed into TOP10® competent cells. Plasmid DNA isolation was performed on the TOP10® cell culture and sent for sequencing. Sequences were analyzed using NCBI BLAST. All genes sequenced had over 90% homology with both the chicken and predicted turkey sequences. The sequences were used to design new 100% homologous primer sets for the genes of interest. PMID:27053625

  10. rpoB Gene Sequencing for Identification of Corynebacterium Species

    PubMed Central

    Khamis, Atieh; Raoult, Didier; La Scola, Bernard

    2004-01-01

    The genus Corynebacterium is a heterogeneous group of species comprising human and animal pathogens and environmental bacteria. It is defined on the basis of several phenotypic characters and the results of DNA-DNA relatedness and, more recently, 16S rRNA gene sequencing. However, the 16S rRNA gene is not polymorphic enough to ensure reliable phylogenetic studies and needs to be completely sequenced for accurate identification. The almost complete rpoB sequences of 56 Corynebacterium species were determined by both PCR and genome walking methods. In all cases the percent similarities between different species were lower than those observed by 16S rRNA gene sequencing, even for those species with degrees of high similarity. Several clusters supported by high bootstrap values were identified. In order to propose a method for strain identification which does not require sequencing of the complete rpoB sequence (approximately 3,500 bp), we identified an area with a high degree of polymorphism, bordered by conserved sequences that can be used as universal primers for PCR amplification and sequencing. The sequence of this fragment (434 to 452 bp) allows accurate species identification and may be used in the future for routine sequence-based identification of Corynebacterium species. PMID:15364970
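
    The percent similarities mentioned above can be illustrated with a small sketch. It assumes the two sequences are already aligned to the same length and that columns containing a gap are skipped; the abstract does not state which convention the authors used, and the function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        # Assumption: a column with a gap in either sequence is not counted.
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Made-up aligned fragments, not real rpoB or 16S rRNA data.
example_identity = percent_identity("ACGTACGT", "ACGTTCGT")
```

    A marker that yields lower values between species than another marker leaves more room to tell those species apart, which is the argument the abstract makes for rpoB over the 16S rRNA gene.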

  11. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events). PMID:23161689

  12. A silent composite hemoglobinopathy characterized by gene sequencing.

    PubMed

    Zorai, A; Moumni, I; Benmansour, I; Chaouachi, D; Ghanem, A; Abbes, S

    2011-01-01

    We report the case of a 35-year-old Tunisian woman with chronic anemia that had not been investigated for a long time. Laboratory analysis by DNA sequencing revealed a compound heterozygote for Hb O Arab and codon 39 β°-thalassemia. This is the first time that such a genotype has been characterized by gene sequencing. PMID:23461145

  13. Flagellin gene sequence variation in the genus Pseudomonas.

    PubMed

    Bellingham, N F; Morgan, J A; Saunders, J R; Winstanley, C

    2001-07-01

    Flagellin gene (fliC) sequences from 18 strains of Pseudomonas sensu stricto representing 8 different species, and 9 representative fliC sequences from other members of the gamma sub-division of proteobacteria, were compared. Analysis was performed on N-terminal, C-terminal and whole fliC sequences. The fliC analyses confirmed the inferred relationship between P. mendocina, P. oleovorans and P. aeruginosa based on 16S rRNA sequence comparisons. In addition, the analyses indicated that P. putida PRS2000 was closely related to P. fluorescens SBW25 and P. fluorescens NCIMB 9046T, but suggested that P. putida PaW8 and P. putida PRS2000 were more closely related to other Pseudomonas spp. than they were to each other. There were a number of inconsistencies in inferred evolutionary relationships between strains, depending on the analysis performed. In particular, whole flagellin gene comparisons often differed from those obtained using N- and C-terminal sequences. However, there were also inconsistencies between the terminal region analyses, suggesting that phylogenetic relationships inferred on the basis of fliC sequence should be treated with caution. Although the central domain of fliC is highly variable between Pseudomonas strains, there was evidence of sequence similarities between the central domains of different Pseudomonas fliC sequences. This indicates the possibility of recombination in the central domain of fliC genes within Pseudomonas species, and between these genes and those from other bacteria. PMID:11518318

  14. Structure and sequence divergence of two archaebacterial genes

    SciTech Connect

    Cue, D.; Beckler, G.S.; Reeve, J.N.; Konisky, J.

    1985-06-01

    The DNA sequences of a region that includes the hisA gene of two related methanogenic archaebacteria, Methanococcus voltae and Methanococcus vannielii, have been compared. Both organisms show a similar genome organization in this region, displaying three open reading frames (ORFs) separated by regions of very high A+T content. Two of the ORFs, including ORFHisA, show significant DNA sequence homology. As might be expected for organisms having a genome that is A+T-rich, there is a high preference for A and U as the third base in codons. A ribosome binding site, G-G-T-G, is located 6 base pairs preceding the ATG translation initiation sequence of both hisA genes. The sequences upstream of the two hisA genes show only limited sequence homology. The M. voltae intergenic region contains four tandemly arranged repetitions of an 11-base-pair sequence, whereas the M. vannielii sequence contains both direct and inverted repetitive sequences. Based on the degree of hisA sequence homology, the authors conclude that M. voltae and M. vannielii are less closely related taxonomically than are members of the enteric group of eubacteria.

  15. Mechanism of gene amplification via yeast autonomously replicating sequences.

    PubMed

    Sehgal, Shelly; Kaul, Sanjana; Dhar, M K

    2015-01-01

    The present investigation was aimed at understanding the molecular mechanism of gene amplification. Interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and humans led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and the presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that the mechanism of gene amplification was that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification. PMID:25685838

  16. Mechanism of Gene Amplification via Yeast Autonomously Replicating Sequences

    PubMed Central

    Dhar, M. K.

    2015-01-01

    The present investigation was aimed at understanding the molecular mechanism of gene amplification. Interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and humans led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and the presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that the mechanism of gene amplification was that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification. PMID:25685838

  17. Nucleotide sequence of a human tRNA gene heterocluster

    SciTech Connect

    Chang, Y.N.; Pirtle, I.L.; Pirtle, R.M.

    1986-05-01

    Leucine tRNA from bovine liver was used as a hybridization probe to screen a human gene library harbored in Charon-4A of bacteriophage lambda. The human DNA inserts from plaque-pure clones were characterized by restriction endonuclease mapping and Southern hybridization techniques, using both [3'-³²P]-labeled bovine liver leucine tRNA and total tRNA as hybridization probes. An 8-kb Hind III fragment of one of these γ-clones was subcloned into the Hind III site of pBR322. Subsequent fine restriction mapping and DNA sequence analysis of this plasmid DNA indicated the presence of four tRNA genes within the 8-kb DNA fragment. A leucine tRNA gene with an anticodon of AAG and a proline tRNA gene with an anticodon of AGG are in a 1.6-kb subfragment. A threonine tRNA gene with an anticodon of UGU and an as yet unidentified tRNA gene are located in a 1.1-kb subfragment. These two different subfragments are separated by 2.8 kb. The coding regions of the three sequenced genes contain characteristic internal split promoter sequences and do not have intervening sequences. The 3'-flanking regions of these three genes have typical RNA polymerase III termination sites of at least four consecutive T residues.

  18. Pilus genes of Neisseria gonorrheae: chromosomal organization and DNA sequence.

    PubMed

    Meyer, T F; Billyard, E; Haas, R; Storzbach, S; So, M

    1984-10-01

    We have mapped two regions of the Neisseria gonorrheae genome, pilE1 and pilE2, which are involved in pilus expression. When the cells are in the piliated P+ state, these two loci carry sequences necessary for pilin production. A silent locus, pilS1, also maps near pilE1 and pilE2. pilS1 contains structural gene information but lacks pilus promoter sequences. The pilus gene sequences in pilE1 and pilE2 are identical in strain MS11. PMID:6148752

  19. Gene identification and classification in the Synechocystis genomic sequence by recursive gene mark analysis.

    PubMed

    Hirosawa, M; Isono, K; Hayes, W; Borodovsky, M

    1997-01-01

    The GeneMark method has proven to be an efficient gene-finding tool for the analysis of prokaryotic genomic sequence data. We have developed a procedure for deriving and utilizing several GeneMark models in order to get better gene-detection performance. Upon applying this procedure to the 1.0 Mb contiguous DNA sequence of Synechocystis sp. strain PCC6803, we were able to cluster predicted genes into distinct classes and to produce class-specific GeneMark models reflecting the statistical characteristics of each gene class. One gene class apparently includes genes of exogenous origin. Using class-specific models reduces the gene underprediction error rate to 1.7%, compared with the 8.1% reported in the previous study, when only one GeneMark model was used. PMID:9522117

  20. The regions of sequence variation in caulimovirus gene VI.

    PubMed

    Sanger, M; Daubert, S; Goodman, R M

    1991-06-01

    The sequence of gene VI from figwort mosaic virus (FMV) clone x4 was determined and compared with that previously published for FMV clone DxS. Both clones originated from the same virus isolation, but the virus used to clone DxS was propagated extensively in a host of a different family prior to cloning whereas that used to clone x4 was not. Differences in the amino acid sequence inferred from the DNA sequences occurred in two clusters. An N-terminal conserved region preceded two regions of variation separated by a central conserved region. Variation in cauliflower mosaic virus (CaMV) gene VI sequences, all of which were derived from virus isolates from hosts from one host family, was similar to that seen in the FMV comparison, though the extent of variation was less. Alignment of gene VI domains from FMV and CaMV revealed regions of amino acid sequence identical in both viruses within the conserved regions. The similarity in the pattern of conserved and variable domains of these two viruses suggests common host-interactive functions in caulimovirus gene VI homologues, and possibly an analogy between caulimoviruses and certain animal viruses in the influence of the host on sequence variability of viral genes. PMID:2024500

  1. Cloning and sequencing of the gene for human β-casein

    SciTech Connect

    Loennerdal, B.; Bergstroem, S.; Andersson, Y.; Hialmarsson, K.; Sundgyist, A.; Hernell, O.

    1990-02-26

    Human β-casein is a major protein in human milk. This protein is part of the casein micelle and has been suggested to have several physiological functions in the newborn. Since there is limited information on β-casein and the factors that affect its concentration in human milk, the authors have isolated and sequenced the gene for this protein. A human mammary gland cDNA library (Clontech) in gt 11 was screened by plaque hybridization using a 42-mer synthetic ³²P-labelled oligonucleotide. Positive clones were identified and isolated, DNA was prepared and the gene isolated by cleavage with EcoRI. Following subcloning (pUC18), restriction mapping and Southern blotting, DNA for sequencing was prepared. The gene was sequenced by the dideoxy method. Human β-casein has 212 amino acids, and the amino acid sequence deduced from the nucleotide sequence is 91% identical to the published sequence for human β-casein. The sequences show a high degree of conservation at the leader peptide and the highly phosphorylated sequences, but also deletions and divergence at several positions. These results provide insight into the structure of the human β-casein gene and will facilitate studies on factors affecting its expression.

  2. SxtA gene sequence analysis of dinoflagellate Alexandrium minutum

    NASA Astrophysics Data System (ADS)

    Norshaha, Safida Anira; Latib, Norhidayu Abdul; Usup, Gires; Yusof, Nurul Yuziana Mohd

    2015-09-01

    The dinoflagellate Alexandrium minutum is typically known for the production of potent neurotoxins such as saxitoxin, which affect the health of human seafood consumers via paralytic shellfish poisoning (PSP). This phenomenon is related to harmful algal blooms (HABs), which are believed to be influenced by environmental and nutritional factors. A previous study revealed that the SxtA gene is the starting gene involved in the saxitoxin production pathway. The aim of this study was to analyse the sequence of the sxtA gene in A. minutum. The dinoflagellate culture was grown at 26°C with a 16:8-hour light:dark photocycle. After the samples were harvested, RNA was extracted, and complementary DNA (cDNA) was synthesised and amplified by polymerase chain reaction (PCR). The PCR products were then purified and cloned before being sequenced. The SxtA sequence obtained was then analyzed in order to identify the presence of the SxtA gene in Alexandrium minutum.

  3. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and is thus of significant biological importance. Increased DUS density is expected to enhance DNA uptake, and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, a transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717

  4. Sequence Variability in Staphylococcal Enterotoxin Genes seb, sec, and sed

    PubMed Central

    Johler, Sophia; Sihto, Henna-Maria; Macori, Guerrino; Stephan, Roger

    2016-01-01

    Ingestion of staphylococcal enterotoxins preformed by Staphylococcus aureus in food leads to staphylococcal food poisoning, the most prevalent foodborne intoxication worldwide. There are five major staphylococcal enterotoxins: SEA, SEB, SEC, SED, and SEE. While variants of these toxins have been described and were linked to specific hosts or levels of enterotoxin production, data on sequence variation are still limited. In this study, we aim to extend the knowledge on promoter and gene variants of the major enterotoxins SEB, SEC, and SED. To this end, we determined seb, sec, and sed promoter and gene sequences of a well-characterized set of enterotoxigenic Staphylococcus aureus strains originating from foodborne outbreaks, human infections, human nasal colonization, rabbits, and cattle. New nucleotide sequence variants were detected for all three enterotoxins and a novel amino acid sequence variant of SED was detected in a strain associated with human nasal colonization. While the seb promoter and gene sequences exhibited a high degree of variability, the sec and sed promoter and gene were more conserved. Interestingly, a truncated variant of sed was detected in all tested sed harboring rabbit strains. The generated data represent a further step towards improved understanding of strain-specific differences in enterotoxin expression and host-specific variation in enterotoxin sequences. PMID:27258311

  5. Sequence Determinants of Circadian Gene Expression Phase in Cyanobacteria

    PubMed Central

    Vijayan, Vikram

    2013-01-01

    The cyanobacterium Synechococcus elongatus PCC 7942 exhibits global biphasic circadian oscillations in gene expression under constant-light conditions. Class I genes are maximally expressed in the subjective dusk, whereas class II genes are maximally expressed in the subjective dawn. Here, we identify sequence features that encode the phase of circadian gene expression. We find that, for multiple genes, an ∼70-nucleotide promoter fragment is sufficient to specify class I or II phase. We demonstrate that the gene expression phase can be changed by random mutagenesis and that a single-nucleotide substitution is sufficient to change the phase. Our study provides insight into how the gene expression phase is encoded in the cyanobacterial genome. PMID:23204469

  6. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10⁻⁵) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  7. Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space

    PubMed Central

    Lonardi, Stefano; Duma, Denisa; Alpert, Matthew; Cordero, Francesca; Beccuti, Marco; Bhat, Prasanna R.; Wu, Yonghui; Ciardo, Gianfranco; Alsaihati, Burair; Ma, Yaqin; Wanamaker, Steve; Resnik, Josh; Bozdag, Serdar; Luo, Ming-Cheng; Close, Timothy J.

    2013-01-01

    For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding. PMID:23592960
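
    The deconvolution idea can be illustrated with a toy sketch. It is not the pooling design or the algorithm used in the study: it simply assumes that every clone is placed in its own unique set of k pools, so that the set of pools in which a read is observed identifies the clone. All names and parameters below are hypothetical.

```python
from itertools import combinations


def design_pools(clones, n_pools, k):
    """Give each clone a unique signature: the k pools that will contain it."""
    clones = list(clones)
    design = {}
    # zip stops at the shorter input, so a shortage of signatures is detected below.
    for clone, signature in zip(clones, combinations(range(n_pools), k)):
        design[clone] = frozenset(signature)
    if len(design) < len(clones):
        raise ValueError("not enough distinct pool signatures for all clones")
    return design


def deconvolve(observed_pools, design):
    """Return the clone whose signature equals the observed pool set, else None."""
    observed = frozenset(observed_pools)
    hits = [clone for clone, signature in design.items() if signature == observed]
    return hits[0] if len(hits) == 1 else None


# Toy example: 10 clones spread over 5 pools, 2 pools per clone.
example_clones = ["BAC%d" % i for i in range(10)]
example_design = design_pools(example_clones, n_pools=5, k=2)
example_hit = deconvolve({1, 2}, example_design)
```

    In this toy setting 5 pooled libraries stand in for 10 individually barcoded ones. Real data add complications the sketch ignores, such as sequencing errors and reads shared by overlapping clones.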

  8. The nucleotide sequence of the bacteriophage T5 ltf gene.

    PubMed

    Kaliman, A V; Kulshin, V E; Shlyapnikov, M G; Ksenzenko, V N; Kryukov, V M

    1995-06-01

    The nucleotide sequence of the bacteriophage T5 BglII-BamHI fragment (4,835 bp in length), known to carry a gene encoding the LTF protein which forms the phage L-shaped tail fibers, was determined. It was shown to contain an open reading frame for 1,396 amino acid residues that corresponds to a protein of 147.8 kDa. The coding region of the ltf gene is preceded by a typical Shine-Dalgarno sequence. Downstream from the ltf gene there is a strong transcription terminator. Data bank analysis of the LTF protein sequence reveals 55.1% identity to the hypothetical protein ORF 401 of bacteriophage lambda over a 118-amino-acid overlap. PMID:7789514

  9. Diverse nucleotide compositions and sequence fluctuation in Rubisco protein genes

    NASA Astrophysics Data System (ADS)

    Holden, Todd; Dehipawala, S.; Cheung, E.; Bienaime, R.; Ye, J.; Tremberger, G., Jr.; Schneider, P.; Lieberman, D.; Cheung, T.

    2011-10-01

    The Rubisco protein-enzyme is arguably the most abundant protein on Earth. The biology dogma of transcription and translation necessitates the study of the Rubisco genes and Rubisco-like genes in various species. Stronger correlation of fractal dimension of the atomic number fluctuation along a DNA sequence with Shannon entropy has been observed in the studied Rubisco-like gene sequences, suggesting a more diverse evolutionary pressure and constraints in the Rubisco sequences. The strategy of using metal for structural stabilization appears to be an ancient mechanism, with data from the porphobilinogen deaminase gene in Capsaspora owczarzaki and Monosiga brevicollis. Using the chi-square distance probability, our analysis supports the conjecture that the more ancient Rubisco-like sequence in Microcystis aeruginosa would have experienced very different evolutionary pressure and bio-chemical constraint as compared to Bordetella bronchiseptica, the two microbes occupying either end of the correlation graph. Our exploratory study would indicate that high fractal dimension Rubisco sequence would support high carbon dioxide rate via the Michaelis-Menten coefficient; with implication for the control of the whooping cough pathogen Bordetella bronchiseptica, a microbe containing a high fractal dimension Rubisco-like sequence (2.07). Using the internal comparison of chi-square distance probability for 16S rRNA (~ E-22) versus radiation repair Rec-A gene (~ E-05) in high GC content Deinococcus radiodurans, our analysis supports the conjecture that high GC content microbes containing Rubisco-like sequence are likely to include an extra-terrestrial origin, relative to Deinococcus radiodurans. Similar photosynthesis process that could utilize host star radiation would not compete with radiation resistant process from the biology dogma perspective in environments such as Mars and exoplanets.

  10. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene

    PubMed Central

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the ‘CCCGCC’ motif in the GFP coding sequence. PMID:27193250
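
    To check whether a construct contains the motif named above, a simple scan is enough. The sketch below reports overlapping forward-strand matches and also gives the reverse complement so that the opposite strand can be scanned in the same way. Whether strand matters for the coverage dip is not stated in the abstract, and the example sequence is made up.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def find_motif(seq, motif):
    """0-based start positions of every (possibly overlapping) motif match."""
    seq = seq.upper()
    motif = motif.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Made-up sequence for illustration; not the GFP coding sequence.
example_seq = "AACCCGCCTTCCCGCC"
forward_hits = find_motif(example_seq, "CCCGCC")
reverse_hits = find_motif(example_seq, reverse_complement("CCCGCC"))
```

    Comparing such positions with a per-base coverage track would show whether dips line up with the motif in other constructs.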

  11. Illumina MiSeq sequencing disfavours a sequence motif in the GFP reporter gene.

    PubMed

    Van den Hoecke, Silvie; Verhelst, Judith; Saelens, Xavier

    2016-01-01

    Green fluorescent protein (GFP) is one of the most used reporter genes. We have used next-generation sequencing (NGS) to analyse the genetic diversity of a recombinant influenza A virus that expresses GFP and found a remarkable coverage dip in the GFP coding sequence. This coverage dip was present when virus-derived RT-PCR product or the parental plasmid DNA was used as starting material for NGS and regardless of whether Nextera XT transposase or Covaris shearing was used for DNA fragmentation. Therefore, the sequence coverage dip in the GFP coding sequence was not the result of emerging GFP mutant viruses or a bias introduced by Nextera XT fragmentation. Instead, we found that the Illumina MiSeq sequencing method disfavours the 'CCCGCC' motif in the GFP coding sequence. PMID:27193250

  12. Exploiting single-molecule transcript sequencing for eukaryotic gene prediction.

    PubMed

    Minoche, André E; Dohm, Juliane C; Schneider, Jessica; Holtgräwe, Daniela; Viehöver, Prisca; Montfort, Magda; Sörensen, Thomas Rosleff; Weisshaar, Bernd; Himmelbauer, Heinz

    2015-01-01

    We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads is developed as an automated process. Optimized training and prediction settings and mRNA-seq noise reduction of assisting Illumina reads result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource to obtain comprehensive gene sets for newly sequenced genomes of non-model eukaryotes. PMID:26328666

  13. Sequence and gene expression evolution of paralogous genes in willows.

    PubMed

    Harikrishnan, Srilakshmy L; Pucholt, Pascal; Berlin, Sofia

    2015-01-01

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive and hence functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows. PMID:26689951

  14. Sequence and gene expression evolution of paralogous genes in willows

    PubMed Central

    Harikrishnan, Srilakshmy L.; Pucholt, Pascal; Berlin, Sofia

    2015-01-01

    Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive and hence functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows. PMID:26689951

  15. Nucleotide sequence corresponding to five chemotaxis genes in Escherichia coli.

    PubMed Central

    Mutoh, N; Simon, M I

    1986-01-01

    The nucleotide sequence of DNA which contains five chemotaxis-related genes of Escherichia coli, cheW, cheR, cheB, cheY, and cheZ, and part of the cheA gene was determined. Molecular weights of the polypeptides encoded by these genes were calculated from translated amino acid sequences, and they were 18,100 for cheW, 32,700 for cheR, 37,500 for cheB, 14,100 for cheY, and 24,000 for cheZ. Nucleotide sequences which could act as ribosome-binding sites were found in the upstream region of each gene. After the termination codon of the cheW gene, a typical rho-independent transcription termination signal was observed. There are no other open reading frames long enough to encode polypeptides in this region except those which code for the two previously reported genes tar and tap. PMID:3510184
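
    The calculation of polypeptide molecular weights from translated amino acid sequences, as described above, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes standard average residue masses, a one-letter protein sequence, and a hypothetical function name, `molecular_weight`.

```python
# Average residue masses in daltons (each amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added back for the free N- and C-termini


def molecular_weight(protein):
    """Return the average molecular weight (Da) of a one-letter protein sequence."""
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


if __name__ == "__main__":
    # Hypothetical short peptide, used only to exercise the function.
    print(round(molecular_weight("MKT"), 1))
```

    Values obtained this way ignore post-translational processing, so they approximate the mass of the primary translation product only.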

  16. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications.

    PubMed

    Herzog, M; Maroteaux, L

    1986-11-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  17. Dinoflagellate 17S rRNA sequence inferred from the gene sequence: Evolutionary implications

    PubMed Central

    Herzog, Michel; Maroteaux, Luc

    1986-01-01

    We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795

  18. DNA sequence of the Serratia marcescens lipoprotein gene

    PubMed Central

    Nakamura, Kenzo; Inouye, Masayori

    1980-01-01

    The Serratia marcescens gene for the outer membrane lipoprotein (lpp) was cloned in λ phage vector Charon 14. The recombinant phage was very unstable, and the lpp gene with a 300-base-pair deletion at the transcription termination site was further cloned in pBR322. The DNA sequence of 834 base pairs encompassing the lpp gene was determined and compared with that of the Escherichia coli lpp gene. The sequence comparisons exhibit several unique features. (i) The promoter region is highly conserved (84% homology) and has an extremely high A+T content (78%) as in E. coli (80%). (ii) The 5′ nontranslated region of the lipoprotein mRNA is also highly conserved (95% homology). (iii) In the DNA sequence corresponding to the signal peptide of this secretory protein, there are three drastic changes, including addition of one base pair and deletion of four base pairs in S. marcescens as compared to E. coli. The resultant alterations in the amino acid sequence, however, do not change the basic properties of the signal peptide, which are assumed to be essential for its function in the secretory mechanism. (iv) The DNA sequence from the amino terminus to the 51st residue of the mature lipoprotein is highly conserved (95% homology) and there is no amino acid substitution. (v) The DNA sequence corresponding to the seven amino acid residues at the carboxyl terminus has only 42% homology, resulting in four amino acid substitutions. (vi) Within the section of 40 base pairs beginning with the termination codon (UAA) and ending immediately before the oligo(T) transcription termination site in the E. coli lpp gene, there is about 60% homology. However, after this section, there is no obvious homology between the two sequences, probably because of a deletion of 300 base pairs at this region. (vii) Seven stable stem-and-loop structures could be formed in the mRNA region. (viii) Alterations in the third position of codons used in the lpp gene suggest that the gene has evolved somewhat

  19. Analyzing S-adenosylhomocysteine hydrolase gene sequences in deuterostome genomes.

    PubMed

    Zhao, Jing-Nan; Wang, Yuan; Zhao, Bo-Sheng; Chen, Ling-Ling

    2009-12-01

    S-adenosylhomocysteine hydrolase (SAHH) gene sequences of sea-urchin, two amphioxus, sea-squirt and eight vertebrates are comparatively analyzed in this study. Although SAHH protein sequences are highly conserved in these species, their nucleotide sequences differ greatly in length, ranging from 5,446 bp in amphioxus to 40,174 bp in zebrafish. The length divergence is mainly caused by distinct introns in some species. SAHH genes in amphioxus (or sea-urchin), sea-squirt and vertebrates are composed of eight, nine and ten exons, respectively. Sequence alignment shows that exon 3 in amphioxus and sea-urchin is similar to exons 3 + 4 in vertebrates, exon 5 in amphioxus and sea-urchin is similar to exons 5 + 6 in sea-squirt, and the two exons are fused into exon 6 in vertebrates. Furthermore, exon 7 in sea-squirt is similar to exons 7 + 8 in vertebrates, indicating that exon-fission and exon-fusion events have taken place during the evolution of deuterostome SAHH genes. Active sites and NAD+-binding sites are located in exons 2–7 in amphioxus, which are dispersed into many more exons along with the evolution of vertebrates. It is speculated that the ten-exon organization of the SAHH gene arose after the separation of invertebrates and vertebrates. Synonymous and non-synonymous substitution analysis shows that negative selection plays a dominant role in the evolution of SAHH genes. Phylogenetic analysis shows that SAHH genes in amphioxus, sea-urchin and sea-squirt form a cluster and are located at the base of the neighbor-joining tree, suggesting that they are the archetype of vertebrate SAHH genes. PMID:19795919

  20. Spliced synthetic genes as internal controls in RNA sequencing experiments.

    PubMed

    Hardwick, Simon A; Chen, Wendy Y; Wong, Ted; Deveson, Ira W; Blackburn, James; Andersen, Stacey B; Nielsen, Lars K; Mattick, John S; Mercer, Tim R

    2016-09-01

    RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome. PMID:27502218

  1. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2014-01-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy, and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:24510847
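
    To illustrate what "finding protein-coding ORFs" means at the simplest level, the sketch below scans the three forward reading frames of a DNA string for ATG...stop stretches. It is a naive baseline and not the GeneMark algorithm, which relies on statistical models with self-trained parameters; the function name `find_orfs` and the `min_codons` threshold are assumptions made for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame stop
    codon (stop included); min_codons counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i  # remember the first start codon in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume searching after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence, used only to exercise the function.
    print(find_orfs("CCATGAAATGACC"))
```

    A real gene finder would also scan the reverse strand and score candidates against coding and non-coding sequence models instead of accepting every open frame.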

  2. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2011-09-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:21901741

  3. Phenotype Sequencing: Identifying the Genes That Cause a Phenotype Directly from Pooled Sequencing of Independent Mutants

    PubMed Central

    Harper, Marc A.; Chen, Zugen; Toy, Traci; Machado, Iara M. P.; Nelson, Stanley F.; Liao, James C.; Lee, Christopher J.

    2011-01-01

    Random mutagenesis and phenotype screening provide a powerful method for dissecting microbial functions, but their results can be laborious to analyze experimentally. Each mutant strain may contain 50–100 random mutations, necessitating extensive functional experiments to determine which one causes the selected phenotype. To solve this problem, we propose a “Phenotype Sequencing” approach in which genes causing the phenotype can be identified directly from sequencing of multiple independent mutants. We developed a new computational analysis method showing that (1) causal genes can be identified with high probability from even a modest number of mutant genomes; and (2) costs can be cut many-fold compared with a conventional genome sequencing approach via an optimized strategy of library-pooling (multiple strains per library) and tag-pooling (multiple tagged libraries per sequencing lane). We have performed extensive validation experiments on a set of E. coli mutants with increased isobutanol biofuel tolerance. We generated a range of sequencing experiments varying from 3 to 32 mutant strains, with pooling on 1 to 3 sequencing lanes. Our statistical analysis of these data (4099 mutations from 32 mutant genomes) successfully identified 3 genes (acrB, marC, acrA) that have been independently validated as causing this experimental phenotype. It must be emphasized that our approach reduces mutant sequencing costs enormously. Whereas a conventional genome sequencing experiment would have cost $7,200 in reagents alone, our Phenotype Sequencing design yielded the same information value for only $1,200. In fact, our smallest experiments reliably identified acrB and marC at a cost of only $110–$340. PMID:21364744
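
    The intuition behind identifying causal genes from independent mutants can be sketched with a simple null model: if mutations fell uniformly at random along the genome, a gene would rarely be hit in many independent strains. The code below is an illustrative assumption and not the authors' statistical method. The mutation rate of 128 per strain is derived from the abstract's 4099 mutations in 32 genomes; the 1,000 bp gene length, the 4.6 Mb genome size, and the threshold of 8 strains are hypothetical inputs.

```python
from math import comb


def p_gene_hit(gene_len, genome_len, mutations_per_strain):
    """Probability that at least one random mutation in a strain lands in the gene."""
    return 1.0 - (1.0 - gene_len / genome_len) ** mutations_per_strain


def p_hit_in_at_least(k, n_strains, p_hit):
    """Binomial tail: probability the gene is hit in k or more of n_strains."""
    return sum(
        comb(n_strains, j) * p_hit ** j * (1.0 - p_hit) ** (n_strains - j)
        for j in range(k, n_strains + 1)
    )


if __name__ == "__main__":
    # Hypothetical 1,000 bp gene in a 4.6 Mb genome, 128 mutations per strain.
    p = p_gene_hit(1_000, 4_600_000, 128)
    # Chance, under the uniform null, of seeing the gene hit in 8+ of 32 strains.
    print(p, p_hit_in_at_least(8, 32, p))
```

    Because every gene in the genome is tested, a per-gene probability from such a model would still need a multiple-testing correction before any gene is called causal.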

  4. Sequence and regulation of the porcine FSHR gene promoter.

    PubMed

    Wu, Wangjun; Han, Jing; Cao, Rui; Zhang, Jinbi; Li, Bojiang; Liu, Zequn; Liu, Kaiqing; Li, Qifa; Pan, Zengxiang; Chen, Jie; Liu, Honglin

    2015-03-01

    Follicle-stimulating hormone (FSH) plays a crucial role in animal reproduction and exerts its physiological functions by interacting with the FSH receptor (FSHR). The FSHR is exclusively expressed in granulosa cells in the ovary and its expression level is closely related to granulosa cell differentiation and follicle maturation. In mammals, most of the follicles undergo atresia, and follicle atresia is mainly caused by granulosa cell apoptosis. However, knowledge of the transcriptional regulatory mechanisms of the porcine FSHR gene in granulosa cells is still limited. In this study, approximately 2.1 kb of the proximal promoter sequence of the porcine FSHR gene was obtained by genome walking, and the regulatory elements and transcription factors in the porcine FSHR promoter sequence were predicted. Furthermore, the core promoter region (-1195/-598) of the porcine FSHR gene was identified using a luciferase assay. Subsequently, the relationship between expression levels of the porcine FSHR gene and histone H3K9 acetylation levels around the core promoter region (-787/-572) in vivo and in vitro was analyzed. Our results showed that an increased FSHR gene expression level was accompanied by an increase in histone H3K9 acetylation levels, suggesting that histone H3K9 acetylation could regulate the expression of the porcine FSHR gene. PMID:25599592

  5. Detecting sequence homology at the gene cluster level with MultiGeneBlast.

    PubMed

    Medema, Marnix H; Takano, Eriko; Breitling, Rainer

    2013-05-01

    The genes encoding many biomolecular systems and pathways are genomically organized in operons or gene clusters. With MultiGeneBlast, we provide a user-friendly and effective tool to perform homology searches with operons or gene clusters as basic units, instead of single genes. The contextualization offered by MultiGeneBlast allows users to get a better understanding of the function, evolutionary history, and practical applications of such genomic regions. The tool is fully equipped with applications to generate search databases from GenBank or from the user's own sequence data. Finally, an architecture search mode allows searching for gene clusters with novel configurations, by detecting genomic regions with any user-specified combination of genes. Sources, precompiled binaries, and a graphical tutorial of MultiGeneBlast are freely available from http://multigeneblast.sourceforge.net/. PMID:23412913

  6. Nucleotide sequence of the tobacco (Nicotiana tabacum) anionic peroxidase gene

    SciTech Connect

    Diaz-De-Leon, F.; Klotz, K.L.; Lagrimini, L.M.

    1993-03-01

    Peroxidases have been implicated in numerous physiological processes including lignification (Grisebach, 1981), wound-healing (Espelie et al., 1986), phenol oxidation (Lagrimini, 1991), pathogen defense (Ye et al., 1990), and the regulation of cell elongation through the formation of interchain covalent bonds between various cell wall polymers (Fry, 1986; Goldberg et al., 1986; Bradley et al., 1992). However, a complete description of peroxidase action in vivo is not available because of the vast number of potential substrates and the existence of multiple isoenzymes. The tobacco anionic peroxidase is one of the better-characterized isoenzymes. This enzyme has been shown to oxidize a number of significant plant secondary compounds in vitro including cinnamyl alcohols, phenolic acids, and indole-3-acetic acid (Maeder, 1980; Lagrimini, 1991). A cDNA encoding the enzyme has been obtained, and this enzyme was shown to be expressed at the highest levels in lignifying tissues (xylem and tracheary elements) and also in epidermal tissue (Lagrimini et al., 1987). It was shown at this time that there were four distinct copies of the anionic peroxidase gene in tobacco (Nicotiana tabacum). A tobacco genomic DNA library was constructed in the λ phage EMBL3, from which two unique peroxidase genes were sequenced. One of these clones, λPOD1, was designated as a pseudogene when the exonic sequences were found to differ from the cDNA sequences by 1%, and several frame shifts in the coding sequences indicated a dysfunctional gene (the authors' unpublished results). The other clone, λPOD3, described in this manuscript, was designated as the functional tobacco anionic peroxidase gene because of 100% homology with the cDNA. Significant structural elements include an AS-2 box implicated in shoot-specific expression (Lam and Chua, 1989), a TATA box, and two intervening sequences. 10 refs., 1 tab.

  7. Sequence variations in the FAD2 gene in seeded pumpkins.

    PubMed

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-01-01

    Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase) with amino acid sequence identities from 91.7 to 100% for 384 amino acids. The same main functional domain between amino acids 47 and 329 was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2. PMID:26782391

  8. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequence is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  9. Complete sequence and gene organization of the Nosema spodopterae rRNA gene.

    PubMed

    Tsai, Shu-Jen; Huang, Wei-Fone; Wang, Chung-Hsiung

    2005-01-01

    By sequencing the entire ribosomal RNA (rRNA) gene of Nosema spodopterae, we show here that its gene organization follows a pattern similar to the Nosema type species, Nosema bombycis, i.e. 5'-large subunit rRNA (2,497 bp)-internal transcribed spacer (185 bp)-small subunit rRNA (1,232 bp)-intergenic spacer (277 bp)-5S rRNA (114 bp)-3'. Gene sequences and the secondary structures of large subunit rRNA, small subunit rRNA, and 5S rRNA are compared with the known corresponding sequences and structures of closely related microsporidia. The results suggest that the Nosema genus may be heterogeneous and that the rRNA gene organization may be a useful characteristic for determining which species are closely related to the type species. PMID:15702980
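
    The gene organization reported above can be turned into coordinates with a few lines of code. The sketch below assumes the five segments are contiguous in the order listed and uses 1-based inclusive positions; the segment labels and the function name `layout` are illustrative choices, and the resulting total length is a derived figure, not one stated in the abstract.

```python
# Segment lengths (bp) in 5'-to-3' order, taken from the abstract.
SEGMENTS = [
    ("LSU rRNA", 2497),
    ("ITS", 185),
    ("SSU rRNA", 1232),
    ("IGS", 277),
    ("5S rRNA", 114),
]


def layout(segments):
    """Return (name, start, end) tuples with 1-based inclusive coordinates."""
    coords = []
    start = 1
    for name, length in segments:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1  # the next segment begins immediately after this one
    return coords


if __name__ == "__main__":
    for name, start, end in layout(SEGMENTS):
        print(f"{name}: {start}-{end}")
```

    Under the contiguity assumption the unit spans 4,305 bp, with the small subunit rRNA lying downstream of the large subunit rRNA.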

  10. Nucleotide sequence of Bacillus phage Nf terminal protein gene.

    PubMed Central

    Leavitt, M C; Ito, J

    1987-01-01

    The nucleotide sequence of Bacillus phage Nf gene E has been determined. Gene E codes for phage terminal protein which is the primer necessary for the initiation of DNA replication. The deduced amino acid sequence of Nf terminal protein is approximately 66% homologous with the terminal proteins of Bacillus phages PZA and φ29, and shows similar hydropathy and secondary structure predictions. A serine which has been identified as the residue which covalently links the protein to the 5' end of the genome in φ29 is conserved in all three phages. The hydropathic and secondary structural environment of this serine is similar in these phage terminal proteins and also similar to the linking serine of adenovirus terminal protein. PMID:3601672

  11. Sequence and expression of a halobacterial beta-galactosidase gene.

    PubMed

    Holmes, M L; Dyall-Smith, M L

    2000-04-01

    Studies of gene expression in haloarchaea have been greatly hindered by the lack of a convenient reporter gene. In a previous study, a beta-galactosidase from Haloferax alicantei was purified and several peptide sequences determined. The peptide sequences have now been used to clone the entire beta-galactosidase gene (designated bgaH) along with some flanking chromosomal DNA. The deduced amino acid sequence of BgaH was 665 amino acids (74 kDa) and showed greatest amino acid similarity to members of glycosyl hydrolase family 42 [classification of Henrissat, B., and Bairoch, A. (1993) New families in the classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem J 293: 781-788]. Within this family, BgaH was most similar (42-43% aa identity) to enzymes from extremely thermophilic bacteria such as Thermotoga and Thermus. Family 42 enzymes are only distantly related to the Sulfolobus LacS and Escherichia coli LacZ enzymes (families one and two respectively). Three open reading frames (ORFs) upstream of bgaH were readily identified by database searches as glucose-fructose oxidoreductase, 2-dehydro-3-deoxyphosphogluconate aldolase and 2-keto-3-deoxygluconate kinase, enzymes that are also involved in carbohydrate metabolism. Downstream of bgaH there was an ORF which contained a putative fibronectin III motif. The bgaH gene was engineered into a halobacterial plasmid vector and introduced into Haloferax volcanii, a widely used strain that lacks detectable beta-galactosidase activity. Transformants were shown to express the enzyme; colonies turned blue when sprayed with Xgal and enzyme activity could be easily quantitated using a standard ONPG assay. In an accompanying publication, Patenge et al. (2000) have demonstrated the utility of bgaH as a promoter reporter in Halobacterium salinarum. PMID:10760168

  12. Evaluation of gene-finding programs on mammalian sequences.

    PubMed

    Rogic, S; Mackworth, A K; Ouellette, F B

    2001-05-01

    We present an independent comparative analysis of seven recently developed gene-finding programs: FGENES, GeneMark.hmm, Genie, Genscan, HMMgene, Morgan, and MZEF. For evaluation purposes we developed a new, thoroughly filtered, and biologically validated dataset of mammalian genomic sequences that does not overlap with the training sets of the programs analyzed. Our analysis shows that the new generation of programs has substantially better results than the programs analyzed in previous studies. The accuracy of the programs was also examined as a function of various sequence and prediction features, such as G + C content of the sequence, length and type of exons, signal type, and score of the exon prediction. This approach pinpoints the strengths and weaknesses of each individual program as well as those of computational gene-finding in general. The dataset used in this analysis (HMR195) as well as the tables with the complete results are available at http://www.cs.ubc.ca/~rogic/evaluation/. PMID:11337477
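
    Accuracy of a gene-finding program is often summarized at the nucleotide level by comparing predicted coding positions with annotated ones. The abstract does not specify its measures, so the sketch below assumes one common convention in gene-finding evaluations: sensitivity = TP / (TP + FN) and specificity = TP / (TP + FP). The function names and the half-open interval format are hypothetical.

```python
def positions(exons):
    """Expand half-open (start, end) exon intervals into a set of coding positions."""
    return {p for start, end in exons for p in range(start, end)}


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity: fraction of annotated coding bases that were predicted;
    specificity: fraction of predicted coding bases that are annotated.
    """
    truth, pred = positions(annotated), positions(predicted)
    tp = len(truth & pred)
    sensitivity = tp / len(truth) if truth else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy exon coordinates, used only to exercise the functions.
    annotated = [(0, 100), (200, 300)]
    predicted = [(50, 100), (200, 350)]
    print(nucleotide_accuracy(annotated, predicted))
```

    Exon-level measures, which require both exon boundaries to match exactly, are stricter and would need a separate comparison of interval pairs.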

  13. Full-length minor ampullate spidroin gene sequence.

    PubMed

    Chen, Gefei; Liu, Xiangqin; Zhang, Yunlong; Lin, Senzhu; Yang, Zijiang; Johansson, Jan; Rising, Anna; Meng, Qing

    2012-01-01

    Spider silk includes seven protein-based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk, which is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of the MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non-regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level. PMID:23251707

  14. Full-Length Minor Ampullate Spidroin Gene Sequence

    PubMed Central

    Chen, Gefei; Liu, Xiangqin; Zhang, Yunlong; Lin, Senzhu; Yang, Zijiang; Johansson, Jan; Rising, Anna; Meng, Qing

    2012-01-01

    Spider silk includes seven protein-based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk, which is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of the MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non-regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level. PMID:23251707

  15. Second-generation sequencing for gene discovery in the Brassicaceae.

    PubMed

    Hayward, Alice; Vighnesh, Guru; Delay, Christina; Samian, Mohd Rafizan; Manoli, Sahana; Stiller, Jiri; McKenzie, Megan; Edwards, David; Batley, Jacqueline

    2012-08-01

    The Brassicaceae contains the most diverse collection of agriculturally important crop species of all plant families. Yet, this is one of the few families that do not form functional symbiotic associations with mycorrhizal fungi in the soil for improved nutrient acquisition. The genes involved in this symbiosis were more recently recruited by legumes for symbiotic association with nitrogen-fixing rhizobia bacteria. This study applied second-generation sequencing (SGS) and analysis tools to discover that two such genes, NSP1 (Nodulation Signalling Pathway 1) and NSP2, remain conserved in diverse members of the Brassicaceae despite the absence of these symbioses. We demonstrate the utility of SGS data for the discovery of putative gene homologs and their analysis in complex polyploid crop genomes with little prior sequence information. Furthermore, we show how this data can be applied to enhance downstream reverse genetics analyses. We hypothesize that Brassica NSP genes may function in the root in other plant-microbe interaction pathways that were recruited for mycorrhizal and rhizobial symbioses during evolution. PMID:22765874

  16. Complete MHC haplotype sequencing for common disease gene mapping.

    PubMed

    Stewart, C Andrew; Horton, Roger; Allcock, Richard J N; Ashurst, Jennifer L; Atrazhev, Alexey M; Coggill, Penny; Dunham, Ian; Forbes, Simon; Halls, Karen; Howson, Joanna M M; Humphray, Sean J; Hunt, Sarah; Mungall, Andrew J; Osoegawa, Kazutoyo; Palmer, Sophie; Roberts, Anne N; Rogers, Jane; Sims, Sarah; Wang, Yu; Wilming, Laurens G; Elliott, John F; de Jong, Pieter J; Sawcer, Stephen; Todd, John A; Trowsdale, John; Beck, Stephan

    2004-06-01

    The future systematic mapping of variants that confer susceptibility to common diseases requires the construction of a fully informative polymorphism map. Ideally, every base pair of the genome would be sequenced in many individuals. Here, we report 4.75 Mb of contiguous sequence for each of two common haplotypes of the major histocompatibility complex (MHC), to which susceptibility to >100 diseases has been mapped. The autoimmune disease-associated-haplotypes HLA-A3-B7-Cw7-DR15 and HLA-A1-B8-Cw7-DR3 were sequenced in their entirety through a bacterial artificial chromosome (BAC) cloning strategy using the consanguineous cell lines PGF and COX, respectively. The two sequences were annotated to encompass all described splice variants of expressed genes. We defined the complete variation content of the two haplotypes, revealing >18,000 variations between them. Average SNP densities ranged from less than one SNP per kilobase to >60. Acquisition of complete and accurate sequence data over polymorphic regions such as the MHC from large-insert cloned DNA provides a definitive resource for the construction of informative genetic maps, and avoids the limitation of chromosome regions that are refractory to PCR amplification. PMID:15140828

  17. DNA sequence of a gene encoding a BALB/c mouse Ld transplantation antigen.

    PubMed

    Moore, K W; Sher, B T; Sun, Y H; Eakle, K A; Hood, L

    1982-02-01

    The sequence of a gene, denoted 27.5, encoding a transplantation antigen for the BALB/c mouse has been determined. Gene transfer studies and comparison of the translated sequence with the partial amino acid sequence of the Ld transplantation antigen establish that gene 27.5 encodes an Ld polypeptide. A comparison of the gene 27.5 sequence with several complementary DNA sequences suggests that the BALB/c mouse may contain a number of closely related L-like genes. Gene 27.5 has eight exons that correlate with the structural domains of the transplantation antigen. PMID:7058332

  18. Divergence of human α-chain constant region gene sequences: A novel recombinant α2 gene

    SciTech Connect

    Chintalacharuvu, K. R.; Morrison, S.L.; Raines, M.

    1994-06-01

    IgA is the major Ig synthesized in humans and provides the first line of defense at the mucosal surfaces. The constant region of IgA heavy chain is encoded by the α gene on chromosome 14. Previous studies have indicated the presence of two α genes, α1 and α2, the latter existing in two allotypic forms, α2 m(1) and α2 m(2). Here the authors report the cloning and complete nucleotide sequence determination of a novel human α gene. Nucleotide sequence comparison with the published α sequences suggests that the gene arose as a consequence of recombination or gene conversion between the two α2 alleles. The authors have expressed the gene as a chimeric protein in myeloma cells, indicating that it encodes a functional protein. The novel IgA resembles IgA2 m(2) in that disulfide bonds link H and L chains. This novel recombinant gene provides insights into the mechanisms of generation of different constant regions and suggests that within human populations, multiple alleles of α may be present, providing IgAs of different structures.

  19. Technology development for gene discovery and full-length sequencing

    SciTech Connect

    Marcelo Bento Soares

    2004-07-19

    In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.

  20. Nuclear gene sequences from a late pleistocene sloth coprolite.

    PubMed

    Poinar, Hendrik; Kuch, Melanie; McDonald, Gregory; Martin, Paul; Pääbo, Svante

    2003-07-01

    The determination of nuclear DNA sequences from ancient remains would open many novel opportunities such as the resolution of phylogenies, the sexing of hominid and animal remains, and the characterization of genes involved in phenotypic traits. However, to date, single-copy nuclear DNA sequences from fossils have been determined only from bones and teeth of woolly mammoths preserved in the permafrost. Since the best preserved ancient nucleic acids tend to stem from cold environments, this has led to the assumption that nuclear DNA would be retrievable only from frozen remains. We have previously shown that Pleistocene coprolites stemming from the extinct Shasta sloth (Nothrotheriops shastensis, Megatheriidae) contain mitochondrial (mt) DNA from the animal that produced them as well as chloroplast (cp) DNA from the ingested plants. Recent attempts to resolve the phylogeny of two families of extinct sloths by using strictly mitochondrial DNA have been inconclusive. We have prepared DNA extracts from a ground sloth coprolite from Gypsum Cave, Nevada, and quantitated the number of mtDNA copies for three different fragment lengths by using real-time PCR. We amplified one multicopy and three single-copy nuclear gene fragments and used the concatenated sequence to resolve the phylogeny. These results show that ancient single-copy nuclear DNA can be recovered from warm, arid climates. Thus, nuclear DNA preservation is not restricted to cold climates. PMID:12842016

  1. Next Generation Sequencing in Predicting Gene Function in Podophyllotoxin Biosynthesis

    PubMed Central

    Marques, Joaquim V.; Kim, Kye-Won; Lee, Choonseok; Costa, Michael A.; May, Gregory D.; Crow, John A.; Davin, Laurence B.; Lewis, Norman G.

    2013-01-01

    Podophyllum species are sources of (−)-podophyllotoxin, an aryltetralin lignan used for semi-synthesis of various powerful and extensively employed cancer-treating drugs. Its biosynthetic pathway, however, remains largely unknown, with the last unequivocally demonstrated intermediate being (−)-matairesinol. Herein, massively parallel sequencing of Podophyllum hexandrum and Podophyllum peltatum transcriptomes and subsequent bioinformatics analyses of the corresponding assemblies were carried out. Validation of the assembly process was first achieved through confirmation of assembled sequences with those of various genes previously established as involved in podophyllotoxin biosynthesis as well as other candidate biosynthetic pathway genes. This contribution describes characterization of two of the latter, namely the cytochrome P450s, CYP719A23 from P. hexandrum and CYP719A24 from P. peltatum. Both enzymes were capable of converting (−)-matairesinol into (−)-pluviatolide by catalyzing methylenedioxy bridge formation and did not act on other possible substrates tested. Interestingly, the enzymes described herein were highly similar to methylenedioxy bridge-forming enzymes from alkaloid biosynthesis, whereas candidates more similar to lignan biosynthetic enzymes were catalytically inactive with the substrates employed. This overall strategy has thus enabled facile further identification of enzymes putatively involved in (−)-podophyllotoxin biosynthesis and underscores the deductive power of next generation sequencing and bioinformatics to probe and deduce medicinal plant biosynthetic pathways. PMID:23161544

  2. Human retinoblastoma susceptibility gene: cloning, identification, and sequence

    SciTech Connect

    Lee, W.; Bookstein, R.; Hong, F.; Young, L.; Shew, J.; Lee, E.Y.P.

    1987-03-13

    Recent evidence indicates the existence of a genetic locus in chromosome region 13q14 that confers susceptibility to retinoblastoma, a cancer of the eye in children. A gene encoding a messenger RNA of 4.6 kilobases (kb), located in the proximity of esterase D, was identified as the retinoblastoma susceptibility (RB) gene on the basis of chromosomal location, homozygous deletion, and tumor-specific alterations in expression. Transcription of this gene was abnormal in six of six retinoblastomas examined: in two tumors, RB mRNA was not detectable, while four others expressed variable quantities of RB mRNA with decreased molecular size of about 4.0 kb. In contrast, full-length RB mRNA was present in human fetal retina and placenta, and in other tumors such as neuroblastoma and medulloblastoma. DNA from retinoblastoma cells had a homozygous gene deletion in one case and hemizygous deletion in another case, while the remainder were not grossly different from normal human control DNA. The gene contains at least 12 exons distributed in a region of over 100 kb. Sequence analysis of complementary DNA clones yielded a single long open reading frame that could encode a hypothetical protein of 816 amino acids.

  3. Deep sequencing reveals 50 novel genes for recessive cognitive disorders.

    PubMed

    Najmabadi, Hossein; Hu, Hao; Garshasbi, Masoud; Zemojtel, Tomasz; Abedini, Seyedeh Sedigheh; Chen, Wei; Hosseini, Masoumeh; Behjati, Farkhondeh; Haas, Stefan; Jamali, Payman; Zecha, Agnes; Mohseni, Marzieh; Püttmann, Lucia; Vahid, Leyla Nouri; Jensen, Corinna; Moheb, Lia Abbasi; Bienek, Melanie; Larti, Farzaneh; Mueller, Ines; Weissmann, Robert; Darvish, Hossein; Wrogemann, Klaus; Hadavi, Valeh; Lipkowitz, Bettina; Esmaeeli-Nieh, Sahar; Wieczorek, Dagmar; Kariminejad, Roxana; Firouzabadi, Saghar Ghasemi; Cohen, Monika; Fattahi, Zohreh; Rost, Imma; Mojahedi, Faezeh; Hertzberg, Christoph; Dehghan, Atefeh; Rajab, Anna; Banavandi, Mohammad Javad Soltani; Hoffer, Julia; Falah, Masoumeh; Musante, Luciana; Kalscheuer, Vera; Ullmann, Reinhard; Kuss, Andreas Walter; Tzschach, Andreas; Kahrizi, Kimia; Ropers, H Hilger

    2011-10-01

    Common diseases are often complex because they are genetically heterogeneous, with many different genetic defects giving rise to clinically indistinguishable phenotypes. This has been amply documented for early-onset cognitive impairment, or intellectual disability, one of the most complex disorders known and a very important health care problem worldwide. More than 90 different gene defects have been identified for X-chromosome-linked intellectual disability alone, but research into the more frequent autosomal forms of intellectual disability is still in its infancy. To expedite the molecular elucidation of autosomal-recessive intellectual disability, we have now performed homozygosity mapping, exon enrichment and next-generation sequencing in 136 consanguineous families with autosomal-recessive intellectual disability from Iran and elsewhere. This study, the largest published so far, has revealed additional mutations in 23 genes previously implicated in intellectual disability or related neurological disorders, as well as single, probably disease-causing variants in 50 novel candidate genes. Proteins encoded by several of these genes interact directly with products of known intellectual disability genes, and many are involved in fundamental cellular processes such as transcription and translation, cell-cycle control, energy metabolism and fatty-acid synthesis, which seem to be pivotal for normal brain development and function. PMID:21937992

  4. Cloning, sequencing, gene organization, and localization of the human ribosomal protein RPL23A gene

    SciTech Connect

    Fan, Wufang; Christensen, M.; Eichler, E.

    1997-12-01

    The intron-containing gene for human ribosomal protein RPL23A has been cloned, sequenced, and localized. The gene is approximately 4.0 kb in length and contains five exons and four introns. All splice sites exactly match the AG/GT consensus rule. The transcript is about 0.6 kb and is detected in all tissues examined. In adult tissues, the RPL23A transcript is dramatically more abundant in pancreas, skeletal muscle, and heart, while much less abundant in kidney, brain, placenta, lung, and liver. A full-length cDNA clone of 576 nt was identified, and the nucleotide sequence was found to match the exon sequence precisely. The open reading frame encodes a polypeptide of 156 amino acids, which is absolutely conserved with the rat RPL23A protein. In the 5′ flanking region of the gene, a canonical TATA sequence and a defined CAAT box were found for the first time in a mammalian ribosomal protein gene. The intron-containing RPL23A gene was mapped to cytogenetic band 17q11 by fluorescence in situ hybridization. 33 refs., 4 figs.

  5. Identification of Driver Genes in Hepatocellular Carcinoma by Exome Sequencing

    PubMed Central

    Cleary, Sean P.; Jeck, William R.; Zhao, Xiaobei; Chen, Kui; Selitsky, Sara R.; Savich, Gleb L.; Tan, Ting-Xu; Wu, Michael C.; Getz, Gad; Lawrence, Michael S.; Parker, Joel S.; Li, Jinyu; Powers, Scott; Kim, Hyeja; Fischer, Sandra; Guindi, Maha; Ghanekar, Anand; Chiang, Derek Y.

    2013-01-01

    Genetic alterations in specific driver genes lead to disruption of cellular pathways and are critical events in the instigation and progression of hepatocellular carcinoma. As a prerequisite for individualized cancer treatment, we sought to characterize the landscape of recurrent somatic mutations in hepatocellular carcinoma. We performed whole exome sequencing on 87 hepatocellular carcinomas and matched normal adjacent tissues to an average coverage of 59x. The overall mutation rate was roughly 2 mutations per Mb, with a median of 45 non-synonymous mutations that altered the amino acid sequence (range 2 to 381). We found recurrent mutations in several genes with high transcript levels: TP53 (18%), CTNNB1 (10%), KEAP1 (8%), C16orf62 (8%), MLL4 (7%) and RAC2 (5%). Significantly affected gene families include the nucleotide-binding domain and leucine rich repeat containing family, calcium channel subunits, and histone methyltransferases. In particular, the MLL family of methyltransferases for histone H3 lysine 4 was mutated in 20% of tumors. Conclusion: The NFE2L2-KEAP1 and MLL pathways are recurrently mutated in multiple cohorts of hepatocellular carcinoma. PMID:23728943

  6. International interlaboratory study comparing single organism 16S rRNA gene sequencing data: Beyond consensus sequence comparisons.

    PubMed

    Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B

    2015-03-01

    This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing®, or Ion Torrent PGM®. The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies. PMID:27077030

  7. Complete nucleotide sequence of the human corticotropin-beta-lipotropin precursor gene.

    PubMed Central

    Takahashi, H; Hakamata, Y; Watanabe, Y; Kikuno, R; Miyata, T; Numa, S

    1983-01-01

    The nucleotide sequence of an 8658-base-pair human genomic DNA segment containing the entire corticotropin-beta-lipotropin precursor gene has been determined, and some sequence features of the gene and its flanking regions have been analysed. The gene is composed of 7665 base pairs including two introns of 3708 and 2886 base pairs. Comparison of the 5'-flanking sequences of the human, bovine and mouse corticotropin-beta-lipotropin precursor genes reveals the presence of a highly conserved region, which contains sequences of 14-15 base pairs homologous with sequences located upstream of the mRNA start site of other glucocorticoid-regulated genes. PMID:6314261
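
    The lengths quoted in this abstract can be checked with simple arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and it assumes the 7665 bp gene length covers exons plus the two introns, as the abstract's wording suggests.

```python
# Figures quoted in the abstract, in base pairs.
GENE_LENGTH = 7665
INTRON_LENGTHS = (3708, 2886)


def exonic_length(gene_length, intron_lengths):
    """Return the base pairs that remain once all introns are removed."""
    return gene_length - sum(intron_lengths)


exon_bp = exonic_length(GENE_LENGTH, INTRON_LENGTHS)
intron_percent = round(100 * sum(INTRON_LENGTHS) / GENE_LENGTH, 1)

print(exon_bp)         # 1071
print(intron_percent)  # 86.0
```

    Under that assumption, about 1071 bp of the gene is exonic and roughly 86% of its length is intronic.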

  8. Characteristic features of the nucleotide sequences of yeast mitochondrial ribosomal protein genes as analyzed by computer program GeneMark.

    PubMed

    Isono, K; McIninch, J D; Borodovsky, M

    1994-01-01

    The nucleotide sequence data for yeast mitochondrial ribosomal protein (MRP) genes were analyzed by the computer program GeneMark which predicts the presence of likely genes in sequence data by calculating statistical biases in the appearance of consecutive nucleotides. The program uses a set of standard sequence data for this calculation. We used this program for the analysis of yeast nucleotide sequence data containing MRP genes, hoping to obtain information as to whether they share features in common that are different from other yeast genes. Sequence data sets for ordinary yeast genes and for 27 known MRP genes were used. The MRP genes were nicely predicted as likely genes regardless of the data sets used, whereas other yeast genes were predicted to be likely genes only when the data set for ordinary yeast genes was used. The assembled sequence data for chromosomes II, III, VIII and XI as well as the segmented data for chromosome V were analyzed in a similar manner. In addition to the known MRP genes, eleven ORF's were predicted to be likely MRP genes. Thus, the method seems very powerful in analyzing genes of heterologous origins. PMID:7719921
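
    The idea of scoring a sequence by "statistical biases in the appearance of consecutive nucleotides" against a chosen training set can be sketched with a first-order Markov chain. This is a toy illustration, not GeneMark's actual algorithm or code: the function names, the add-one smoothing, and the two tiny training sets are all assumptions made for the example.

```python
import math
from collections import Counter

BASES = "ACGT"


def train(seqs, pseudo=1.0):
    """Estimate P(next base | previous base) from dinucleotide counts."""
    counts = Counter()
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            counts[prev, nxt] += 1
    model = {}
    for prev in BASES:
        # Pseudocounts keep unseen transitions from getting probability zero.
        total = sum(counts[prev, nxt] for nxt in BASES) + 4 * pseudo
        for nxt in BASES:
            model[prev, nxt] = (counts[prev, nxt] + pseudo) / total
    return model


def log_odds(seq, model_a, model_b):
    """Positive when seq looks more like training set A than training set B."""
    return sum(
        math.log(model_a[prev, nxt] / model_b[prev, nxt])
        for prev, nxt in zip(seq, seq[1:])
    )


# Hypothetical training sets standing in for two classes of genes.
set_a = train(["ATGATGATGATG"])
set_b = train(["CCGGCCGGCCGG"])

score_like_a = log_odds("ATGATG", set_a, set_b)
score_like_b = log_odds("CCGGCC", set_a, set_b)
print(score_like_a > 0, score_like_b < 0)
```

    Swapping the training set changes which sequences score as likely genes, which is the effect the abstract reports for the MRP and ordinary yeast gene sets.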

  9. Excision of plastid marker genes using directly repeated DNA sequences.

    PubMed

    Mudd, Elisabeth A; Madesis, Panagiotis; Avila, Elena Martin; Day, Anil

    2014-01-01

    Excision of marker genes using DNA direct repeats makes use of the predominant homologous recombination pathways present in the plastids of algae and plants. The method is simple, efficient, and widely applicable to plants and microalgae. Marker excision frequency is dependent on the length and number of directly repeated sequences. When two repeats are used a repeat size of greater than 600 bp promotes efficient excision of the marker gene. A wide variety of sequences can be used to make the direct repeats. Only a single round of transformation is required, and there is no requirement to introduce site-specific recombinases by retransformation or sexual crosses. Selection is used to maintain the marker and ensure homoplasmy of transgenic plastid genomes. Release of selection allows the accumulation of marker-free plastid genomes generated by marker excision, which is spontaneous, random, and a unidirectional process. Positive selection is provided by linking marker excision to restoration of the coding region of an herbicide resistance gene from two overlapping but incomplete coding regions. Cytoplasmic sorting allows the segregation of cells with marker-free transgenic plastids. The marker-free shoots resulting from direct repeat-mediated excision of marker genes have been isolated by vegetative propagation of shoots in the T0 generation. Alternatively, accumulation of marker-free plastid genomes during growth, development and flowering of T0 plants allows the collection of seeds that give rise to a high proportion of marker-free T1 seedlings. The simplicity and convenience of direct repeat excision facilitates its widespread use to isolate marker-free crops. PMID:24599849

  10. Modeling DNA sequence-based cis-regulatory gene networks.

    PubMed

    Bolouri, Hamid; Davidson, Eric H

    2002-06-01

    Gene network analysis requires computationally based models which represent the functional architecture of regulatory interactions, and which provide directly testable predictions. The type of model that is useful is constrained by the particular features of developmentally active cis-regulatory systems. These systems function by processing diverse regulatory inputs, generating novel regulatory outputs. A computational model which explicitly accommodates this basic concept was developed earlier for the cis-regulatory system of the endo16 gene of the sea urchin. This model represents the genetically mandated logic functions that the system executes, but also shows how time-varying kinetic inputs are processed in different circumstances into particular kinetic outputs. The same basic design features can be utilized to construct models that connect the large number of cis-regulatory elements constituting developmental gene networks. The ultimate aim of the network models discussed here is to represent the regulatory relationships among the genomic control systems of the genes in the network, and to state their functional meaning. The target site sequences of the cis-regulatory elements of these genes constitute the physical basis of the network architecture. Useful models for developmental regulatory networks must represent the genetic logic by which the system operates, but must also be capable of explaining the real time dynamics of cis-regulatory response as kinetic input and output data become available. Most importantly, however, such models must display in a direct and transparent manner fundamental network design features such as intra- and intercellular feedback circuitry; the sources of parallel inputs into each cis-regulatory element; gene battery organization; and use of repressive spatial inputs in specification and boundary formation. Successful network models lead to direct tests of key architectural features by targeted cis-regulatory analysis. PMID

  11. dcp gene of Escherichia coli: cloning, sequencing, transcript mapping, and characterization of the gene product.

    PubMed Central

    Henrich, B; Becker, S; Schroeder, U; Plapp, R

    1993-01-01

    Dipeptidyl carboxypeptidase is a C-terminal exopeptidase of Escherichia coli. We have isolated the respective gene, dcp, from a low-copy-number plasmid library by its ability to complement a dcp mutation preventing the utilization of the unique substrate N-benzoyl-L-glycyl-L-histidyl-L-leucine. Sequence analysis of a 2.9-kb DNA fragment revealed an open reading frame of 2,043 nucleotides which was assigned to the dcp gene by N-terminal amino acid sequencing and electrophoretic molecular mass determination of the purified dcp product. Transcript mapping by primer extension and S1 protection experiments verified the physiological significance of potential initiation and termination signals for dcp transcription and allowed the identification of a single species of monocistronic dcp mRNA. The codon usage pattern and the effects of elevated gene copy number indicated a relatively low level of dcp expression. The predicted amino acid sequence of dipeptidyl carboxypeptidase, containing a potential zinc-binding site, is highly homologous (78.8%) to the corresponding enzyme from Salmonella typhimurium. It also displays significant homology to the products of the S. typhimurium opdA and the E. coli prlC genes and to some metalloproteases from rats and Saccharomyces cerevisiae. No potential export signals could be inferred from the amino acid sequence. Dipeptidyl carboxypeptidase was enriched 80-fold from crude extracts of E. coli and used to investigate some of its biochemical and biophysical properties. PMID:8226676

  12. Nucleotide sequence of the gene encoding the repressor for the histidine utilization genes of Pseudomonas putida.

    PubMed Central

    Allison, S L; Phillips, A T

    1990-01-01

    The hutC gene of Pseudomonas putida encodes a repressor which, in combination with the inducer urocanate, regulates expression of the five structural genes necessary for conversion of histidine to glutamate, ammonia, and formate. The nucleotide sequence of the hutC region was determined and found to contain two open reading frames which overlapped by one nucleotide. The first open reading frame (ORF1) appeared to encode a 27,648-dalton protein of 248 amino acids whose sequence strongly resembled that of the hut repressor of Klebsiella aerogenes (A. Schwacha and R. A. Bender, J. Bacteriol. 172:5477-5481, 1990) and contained a helix-turn-helix motif that could be involved in operator binding. The gene was preceded by a sequence which was nearly identical to that of the operator site located upstream of hutU which controls transcription of the hutUHIG genes. The operator near hutC would presumably allow the hut repressor to regulate its own synthesis as well as the expression of the divergent hutF gene. A second open reading frame (ORF2) would encode a 21,155-dalton protein, but because this region could be deleted with only a slight effect on repressor activity, it is not likely to be involved in repressor function or structure. PMID:2203753

  13. Molecular cloning, sequence identification, and gene expression analysis of bovine ADCY2 gene.

    PubMed

    Li, Y X; Jin, H G; Yan, C G; Ren, C Y; Jiang, C J; Jin, C D; Seo, K S; Jin, X

    2014-06-01

    Adenylyl cyclase 2 (ADCY2), a class B member of adenylyl cyclases, is important in accelerating phosphor-acidification as well as glycogen synthesis and breakdown. Given its distinct role in flesh tenderization after butchering, we cloned and sequenced the ADCY2 gene from Yanbian cattle and assessed its expression in bovine tissues. A 2947 bp nucleotide sequence representing the full-length cDNA of bovine ADCY2 gene was obtained by 5' and 3' remote analysis computations for gene expression. Analyses of the putative protein sequence showed that ADCY2 had high homology among species, except with the non-mammal Oreochromis niloticus. Gene structural domain analyses in humans and rats indicated that the ADCY2 protein had no flaw; only the transmembrane domain was reduced and the CYCc structure domain was shortened. Assessment of ADCY2 expression in bovine tissues by real-time PCR showed that the highest expression was in the testes, followed by the longissimus dorsi, tensor fasciae latae, and latissimus dorsi. These data will serve as a foundation for further insight into the cattle ADCY2 gene. PMID:24797538

  14. Membrane gene ontology bias in sequencing and microarray obtained by housekeeping-gene analysis.

    PubMed

    Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun

    2016-01-10

    Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. PMID:26407868

  15. Morganella morganii urease: purification, characterization, and isolation of gene sequences.

    PubMed

    Hu, L T; Nicholson, E B; Jones, B D; Lynch, M J; Mobley, H L

    1990-06-01

    Morganella morganii, a very common cause of catheter-associated bacteriuria, was previously classified with the genus Proteus on the basis of urease production. M. morganii constitutively synthesizes a urease distinct from that of other uropathogens. The enzyme, purified 175-fold by passage through DEAE-Sepharose, phenyl-Sepharose, Mono-Q, and Superose 6 chromatography resins, was found to have a native molecular size of 590 kilodaltons and was composed of three distinct subunits with apparent molecular sizes of 63, 15, and 6 kilodaltons, respectively. Amino-terminal analysis of the subunit polypeptides revealed a high degree of conservation of amino acid sequence between jack bean and Proteus mirabilis ureases. Km for urea equalled 0.8 mM. Antiserum prepared against purified enzyme inhibited activity by 43% at a 1:2 dilution after 1 h of incubation. All urease activity was immunoprecipitated from cytosol by a 1:16 dilution. Antiserum did not precipitate ureases of other species except for one Providencia rettgeri strain but did recognize the large subunits of ureases of Providencia and Proteus species on Western blots (immunoblots). Thirteen urease-positive cosmid clones of Morganella chromosomal DNA shared a 3.5-kilobase (kb) BamHI fragment. Urease gene sequences were localized to a 7.1-kb EcoRI-SalI fragment. Tn5 mutagenesis revealed that between 3.3 and 6.6 kb of DNA were necessary for enzyme activity. A Morganella urease DNA probe did not hybridize with gene sequences of other species tested. Morganella urease antiserum recognized identical subunit polypeptides on Western blots of cytosol from the wild-type strain and Escherichia coli bearing the recombinant clone which corresponded to those seen in denatured urease. Although the wild-type strain and recombinant clone produced equal amounts of urease protein, the clone produced less than 1% of the enzyme activity of the wild-type strain. PMID:2345135

  16. Cloning, expression and sequencing of Helicobacter felis urease genes.

    PubMed

    Ferrero, R L; Labigne, A

    1993-07-01

    Urease genes from Helicobacter felis were cloned and expressed in Escherichia coli cells. A genomic bank of Sau3A-digested H. felis chromosomal DNA was created using a cosmid vector. Cosmid clones were screened for urease activity following subculture on a nitrogen-limiting medium. Subcloning of DNA from a urease-positive cosmid clone led to the construction of pILL205 (9.5 kb) which conferred a urease activity of 1.2 ± 0.5 µmol urea min⁻¹ mg⁻¹ bacterial protein to E. coli HB101 bacteria grown on a nitrogen-limiting medium. Random mutagenesis using a MiniTn3-Km transposable element permitted the identification of three DNA regions on pILL205 which were necessary for the expression of a urease-positive phenotype in E. coli clones. To localize the putative structural genes of H. felis on pILL205, extracts of clones harbouring the mutated copies of the plasmid were analysed by Western blotting with anti-H. felis rabbit serum. One mutant clone did not synthesize the putative UreB subunit of H. felis urease and it was postulated that the transposable element had disrupted the corresponding structural gene. By sequencing the DNA region adjacent to the transposon insertion site two open reading frames, designated ureA and ureB, were identified. The polypeptides encoded by these genes had calculated molecular masses of 26,074 and 61,663 Da, respectively, and shared 73.5% and 88.2% identity with the corresponding gene products of Helicobacter pylori urease. PMID:8412683

  17. The nucleotide sequence of the tnpA gene completes the sequence of the Pseudomonas transposon Tn501.

    PubMed Central

    Brown, N L; Winnie, J N; Fritzinger, D; Pridmore, R D

    1985-01-01

    The nucleotide sequence of the gene (tnpA) which codes for the transposase of transposon Tn501 has been determined. It contains an open reading frame for a polypeptide of Mr = 111,500, which terminates within the inverted repeat sequence of the transposon. The reading frame would be transcribed in the same direction as the mercury-resistance genes and the tnpR gene. The amino acid sequence predicted from this reading frame shows 32% identity with that of the transposase of the related transposon Tn3. The C-terminal regions of these two polypeptides show slightly greater homology than the N-terminal regions when conservative amino acid substitutions are considered. With this sequence determination, the nucleotide sequence of Tn501 is fully defined. The main features of the sequence are briefly presented. PMID:2994007

  18. Genomic sequencing reveals gene content, genomic organization, and recombination relationships in barley.

    PubMed

    Rostoks, Nils; Park, Yong-Jin; Ramakrishna, Wusirika; Ma, Jianxin; Druka, Arnis; Shiloff, Bryan A; SanMiguel, Phillip J; Jiang, Zeyu; Brueggeman, Robert; Sandhu, Devinder; Gill, Kulvinder; Bennetzen, Jeffrey L; Kleinhofs, Andris

    2002-05-01

    Barley (Hordeum vulgare L.) is one of the most important large-genome cereals with extensive genetic resources available in the public sector. Studies of genome organization in barley have been limited primarily to genetic markers and sparse sequence data. Here we report sequence analysis of 417.5 kb DNA from four BAC clones from different genomic locations. Sequences were analyzed with respect to gene content, the arrangement of repetitive sequences and the relationship of gene density to recombination frequencies. Gene densities ranged from 1 gene per 12 kb to 1 gene per 103 kb with an average of 1 gene per 21 kb. In general, genes were organized into islands separated by large blocks of nested retrotransposons. Single genes in apparent isolation were also found. Genes occupied 11% of the total sequence, LTR retrotransposons and other repeated elements accounted for 51.9% and the remaining 37.1% could not be annotated. PMID:12021850
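
    The figures in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that the reported average of 1 gene per 21 kb applies to the whole 417.5 kb; the abstract does not state a gene count, so the implied count is an illustration only, and the function name is hypothetical.

```python
def implied_gene_count(total_kb, kb_per_gene):
    """Number of genes implied by a sequence length and an average spacing.

    Assumption: the average density applies uniformly to the whole sequence.
    """
    if kb_per_gene <= 0:
        raise ValueError("kb_per_gene must be positive")
    return total_kb / kb_per_gene


# Values quoted in the abstract.
total_kb = 417.5
average_kb_per_gene = 21

# Composition of the sequence as reported, in percent.
composition = {"genes": 11.0, "repeats": 51.9, "unannotated": 37.1}

if __name__ == "__main__":
    print(f"implied genes: {implied_gene_count(total_kb, average_kb_per_gene):.1f}")
    print(f"composition total: {sum(composition.values()):.1f}%")
```

    Under that assumption the four BAC clones would carry roughly 20 genes, and the three reported fractions account for the whole sequence.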

  19. Microdiversity of extracellular enzyme genes among sequenced prokaryotic genomes

    PubMed Central

    Zimmerman, Amy E; Martiny, Adam C; Allison, Steven D

    2013-01-01

    Understanding the relationship between prokaryotic traits and phylogeny is important for predicting and modeling ecological processes. Microbial extracellular enzymes have a pivotal role in nutrient cycling and the decomposition of organic matter, yet little is known about the phylogenetic distribution of genes encoding these enzymes. In this study, we analyzed 3058 annotated prokaryotic genomes to determine which taxa have the genetic potential to produce alkaline phosphatase, chitinase and β-N-acetyl-glucosaminidase enzymes. We then evaluated the relationship between the genetic potential for enzyme production and 16S rRNA phylogeny using the consenTRAIT algorithm, which calculated the phylogenetic depth and corresponding 16S rRNA sequence identity of clades of potential enzyme producers. Nearly half (49.2%) of the genomes analyzed were found to be capable of extracellular enzyme production, and these were non-randomly distributed across most prokaryotic phyla. On average, clades of potential enzyme-producing organisms had a maximum phylogenetic depth of 0.008004–0.009780, though individual clades varied broadly in both size and depth. These values correspond to a minimum 16S rRNA sequence identity of 98.04–98.40%. The distribution pattern we found is an indication of microdiversity, the occurrence of ecologically or physiologically distinct populations within phylogenetically related groups. Additionally, we found positive correlations among the genes encoding different extracellular enzymes. Our results suggest that the capacity to produce extracellular enzymes varies at relatively fine-scale phylogenetic resolution. This variation is consistent with other traits that require a small number of genes and provides insight into the relationship between taxonomy and traits that may be useful for predicting ecological function. PMID:23303371
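
    The depth and identity ranges quoted above line up if the minimum pairwise identity within a clade is taken as one minus twice the phylogenetic depth, since two tips each sit one depth away from the clade root. That relation is an assumption used here for illustration, not a description of how consenTRAIT itself reports identity; the function name is hypothetical.

```python
def depth_to_min_identity(depth):
    """Convert a clade depth (substitutions per site, root to tip) into an
    approximate minimum pairwise sequence identity, in percent.

    Assumption: the most divergent pair of tips is separated by 2 * depth.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return (1.0 - 2.0 * depth) * 100.0


if __name__ == "__main__":
    for depth in (0.008004, 0.009780):
        print(f"depth {depth:.6f} -> {depth_to_min_identity(depth):.2f}% identity")
```

    With the two depths from the abstract, this reproduces the stated 98.04–98.40% range to two decimal places.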

  20. Identification of novel hereditary cancer genes by whole exome sequencing.

    PubMed

    Sokolenko, Anna P; Suspitsin, Evgeny N; Kuligina, Ekatherina Sh; Bizin, Ilya V; Frishman, Dmitrij; Imyanitov, Evgeny N

    2015-12-28

    Whole exome sequencing (WES) provides a powerful tool for medical genetic research. Several dozen WES studies involving patients with hereditary cancer syndromes have already been reported. WES led to a breakthrough in the understanding of the genetic basis of some exceptionally rare syndromes; for example, identification of germ-line SMARCA4 mutations in patients with ovarian hypercalcemic small cell carcinomas explains a noticeable share of familial aggregation of this disease. However, studies on common cancer types turned out to be more difficult. In particular, there are almost a dozen reports describing WES analysis of breast cancer patients, but none of them has yet succeeded in revealing a gene responsible for a significant share of missing heritability. Virtually all components of WES studies require substantial improvement, e.g. technical performance of WES, interpretation of WES results, mode of patient selection, etc. Most contemporary investigations focus on genes with an autosomal dominant mechanism of inheritance; however, recessive and oligogenic models of transmission of cancer susceptibility also need to be considered. It is expected that the list of medically relevant tumor-predisposing genes will expand rapidly in the next few years. PMID:26427841

  1. Transcriptome Sequencing and Positive Selected Genes Analysis of Bombyx mandarina

    PubMed Central

    Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced, or be experiencing, positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526

  2. Transcriptome sequencing and positive selected genes analysis of Bombyx mandarina.

    PubMed

    Cheng, Tingcai; Fu, Bohua; Wu, Yuqian; Long, Renwen; Liu, Chun; Xia, Qingyou

    2015-01-01

    The wild silkworm Bombyx mandarina is widely believed to be an ancestor of the domesticated silkworm, Bombyx mori. Silkworms are often used as a model for studying the mechanism of species domestication. Here, we performed transcriptome sequencing of the wild silkworm using an Illumina HiSeq2000 platform. We produced 100,004,078 high-quality reads and assembled them into 50,773 contigs with an N50 length of 1764 bp and a mean length of 941.62 bp. A total of 33,759 unigenes were identified, with 12,805 annotated in the Nr database, 8273 in the Pfam database, and 9093 in the Swiss-Prot database. Expression profile analysis found significant differential expression of 1308 unigenes between the middle silk gland (MSG) and posterior silk gland (PSG). Three sericin genes (sericin 1, sericin 2, and sericin 3) were expressed specifically in the MSG and three fibroin genes (fibroin-H, fibroin-L, and fibroin/P25) were expressed specifically in the PSG. In addition, 32,297 single-nucleotide polymorphisms (SNPs) and 361 insertion-deletions (INDELs) were detected. Comparison with the domesticated silkworm p50/Dazao identified 5,295 orthologous genes, among which 400 might have experienced, or be experiencing, positive selection according to Ka/Ks analysis. The data and analyses presented here provide insights into silkworm domestication and an invaluable resource for wild silkworm genomics research. PMID:25806526
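
    The abstract reports an N50 length alongside a mean contig length. As a reminder of what N50 measures, here is a minimal sketch of the usual definition: the length of the contig at which the running total, counted from the longest contig down, first reaches half of the assembly size. The contig lengths in the example are invented toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


if __name__ == "__main__":
    toy_contigs = [100, 200, 300, 400]  # hypothetical lengths in bp
    print(n50(toy_contigs))
    print(sum(toy_contigs) / len(toy_contigs))  # mean length, for comparison
```

    Because long contigs dominate the running total, N50 typically exceeds the mean, which is consistent with the 1764 bp versus 941.62 bp reported above.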

  3. Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The hemoglobin-beta gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed, and amplified. The sequence of the channel catfish hemoglobin-beta gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...

  4. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses.

    PubMed

    Stelzer, Gil; Rosen, Naomi; Plaschkes, Inbar; Zimmerman, Shahar; Twik, Michal; Fishilevich, Simon; Stein, Tsippi Iny; Nudel, Ron; Lieder, Iris; Mazor, Yaron; Kaplan, Sergey; Dahary, Dvir; Warshawsky, David; Guan-Golan, Yaron; Kohn, Asher; Rappaport, Noa; Safran, Marilyn; Lancet, Doron

    2016-01-01

    GeneCards, the human gene compendium, enables researchers to effectively navigate and inter-relate the wide universe of human genes, diseases, variants, proteins, cells, and biological pathways. Our recently launched Version 4 has a revamped infrastructure facilitating faster data updates, better-targeted data queries, and friendlier user experience. It also provides a stronger foundation for the GeneCards suite of companion databases and analysis tools. Improved data unification includes gene-disease links via MalaCards and merged biological pathways via PathCards, as well as drug information and proteome expression. VarElect, another suite member, is a phenotype prioritizer for next-generation sequencing, leveraging the GeneCards and MalaCards knowledgebase. It automatically infers direct and indirect scored associations between hundreds or even thousands of variant-containing genes and disease phenotype terms. VarElect's capabilities, either independently or within TGex, our comprehensive variant analysis pipeline, help prepare for the challenge of clinical projects that involve thousands of exome/genome NGS analyses. © 2016 by John Wiley & Sons, Inc. PMID:27322403

  5. Next generation sequencing in synovial sarcoma reveals novel gene mutations.

    PubMed

    Vlenterie, Myrella; Hillebrandt-Roeffen, Melissa H S; Flucke, Uta E; Groenen, Patricia J T A; Tops, Bastiaan B J; Kamping, Eveline J; Pfundt, Rolph; de Bruijn, Diederik R H; Geurts van Kessel, Ad H M; van Krieken, Han J H J M; van der Graaf, Winette T A; Versleijen-Jonkers, Yvonne M H

    2015-10-27

    Over 95% of all synovial sarcomas (SS) share a unique translocation, t(X;18), however, they show heterogeneous clinical behavior. We analyzed multiple SS to reveal additional genetic alterations besides the translocation. Twenty-six SS from 22 patients were sequenced for 409 cancer-related genes using the Comprehensive Cancer Panel (Life Technologies, USA) on an Ion Torrent platform. The detected variants were verified by Sanger sequencing and compared to matched normal DNAs. Copy number variation was assessed in six tumors using the Oncoscan array (Affymetrix, USA). In total, eight somatic mutations were detected in eight samples. These mutations have not been reported previously in SS. Two of these, in KRAS and CCND1, represent known oncogenic mutations in other malignancies. Additional mutations were detected in RNF213, SEPT9, KDR, CSMD3, MLH1 and ERBB4. DNA alterations occurred more often in adult tumors. A distinctive loss of 6q was found in a metastatic lesion progressing under pazopanib, but not in the responding lesion. Our results emphasize t(X;18) as a single initiating event in SS and as the main oncogenic driver. Our results also show the occurrence of additional genetic events, mutations or chromosomal aberrations, occurring more frequently in SS with an onset in adults. PMID:26415226

  6. Next generation sequencing in synovial sarcoma reveals novel gene mutations

    PubMed Central

    Vlenterie, Myrella; Hillebrandt-Roeffen, Melissa H.S.; Flucke, Uta E.; Groenen, Patricia J.T.A.; Tops, Bastiaan B.J.; Kamping, Eveline J.; Pfundt, Rolph; de Bruijn, Diederik R.H.; van Kessel, Ad H.M. Geurts; van Krieken, Han J.H.J.M.; van der Graaf, Winette T.A.; Versleijen-Jonkers, Yvonne M.H.

    2015-01-01

    Over 95% of all synovial sarcomas (SS) share a unique translocation, t(X;18), however, they show heterogeneous clinical behavior. We analyzed multiple SS to reveal additional genetic alterations besides the translocation. Twenty-six SS from 22 patients were sequenced for 409 cancer-related genes using the Comprehensive Cancer Panel (Life Technologies, USA) on an Ion Torrent platform. The detected variants were verified by Sanger sequencing and compared to matched normal DNAs. Copy number variation was assessed in six tumors using the Oncoscan array (Affymetrix, USA). In total, eight somatic mutations were detected in eight samples. These mutations have not been reported previously in SS. Two of these, in KRAS and CCND1, represent known oncogenic mutations in other malignancies. Additional mutations were detected in RNF213, SEPT9, KDR, CSMD3, MLH1 and ERBB4. DNA alterations occurred more often in adult tumors. A distinctive loss of 6q was found in a metastatic lesion progressing under pazopanib, but not in the responding lesion. Our results emphasize t(X;18) as a single initiating event in SS and as the main oncogenic driver. Our results also show the occurrence of additional genetic events, mutations or chromosomal aberrations, occurring more frequently in SS with an onset in adults. PMID:26415226

  7. Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

    PubMed Central

    Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

    1985-01-01

    Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512

  8. Genome-wide gene-gene interaction analysis for next-generation sequencing.

    PubMed

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-03-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study. PMID:26173972

  9. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

    SciTech Connect

    Yilmaz, P.; Kottmann, R.; Field, D.; Knight, R.; Cole, J. R.; Amaral-Zettler, L.; Gilbert, J. A.

    2011-05-01

    Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences - the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental packages' apply to any genome sequence of known origin and can be used in combination with MIMARKS and other GSC checklists. Finally, to establish a unified standard for describing sequence data and to provide a single point of entry for the scientific community to access and learn about GSC checklists, we present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere.

  10. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system

    PubMed Central

    Jenior, Matthew L.; Koumpouras, Charles C.; Westcott, Sarah L.; Highlander, Sarah K.

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina’s MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3–V5, V1–V3, V1–V5, V1–V6, and V1–V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1–V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina’s MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806

  11. Sequencing 16S rRNA gene fragments using the PacBio SMRT DNA sequencing system.

    PubMed

    Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K

    2016-01-01

    Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting. PMID:27069806

  12. Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition

    PubMed Central

    Ulpinnis, Chris; Scholz, Uwe; Altmann, Thomas

    2015-01-01

    A major goal of maize genomic research is to identify sequence polymorphisms responsible for phenotypic variation in traits of economic importance. Large-scale detection of sequence variation is critical for linking genes, or genomic regions, to phenotypes. However, due to its size and complexity, it remains expensive to generate whole genome sequences of sufficient coverage for divergent maize lines, even with access to next generation sequencing (NGS) technology. Because methods involving reduction of genome complexity, such as genotyping-by-sequencing (GBS), assess only a limited fraction of sequence variation, targeted sequencing of selected genomic loci offers an attractive alternative. We therefore designed a sequence capture assay to target 29 Mb genomic regions and surveyed a total of 4,648 genes possibly affecting biomass production in 21 diverse inbred maize lines (7 flints, 14 dents). Captured and enriched genomic DNA was sequenced using the 454 NGS platform to 19.6-fold average depth coverage, and a broad evaluation of read alignment and variant calling methods was performed to select optimal procedures for variant discovery. Sequence alignment with the B73 reference and de novo assembly identified 383,145 putative single nucleotide polymorphisms (SNPs), of which 42,685 were non-synonymous alterations and 7,139 caused frameshifts. Presence/absence variation (PAV) of genes was also detected. We found that substantial sequence variation exists among genomic regions targeted in this study, which was particularly evident within coding regions. This diversification has the potential to broaden functional diversity and generate phenotypic variation that may lead to new adaptations and the modification of important agronomic traits. Further, annotated SNPs identified here will serve as useful genetic tools and as candidates in searches for phenotype-altering DNA variation. 
    In summary, we demonstrated that sequencing of captured DNA is a powerful approach for...

  13. Isolation of Hox cluster genes from insects reveals an accelerated sequence evolution rate.

    PubMed

    Hadrys, Heike; Simon, Sabrina; Kaune, Barbara; Schmitt, Oliver; Schöner, Anja; Jakob, Wolfgang; Schierwater, Bernd

    2012-01-01

    Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution. PMID:22685537

  14. Sequence of the dog immunoglobulin alpha and epsilon constant region genes

    SciTech Connect

    Patel, M.; Selinger, D.; Mark, G.E.; Hollis, G.F.; Hickey, G.J.

    1995-03-01

    The immunoglobulin alpha (IGHAC) and epsilon (IGHEC) germline constant region genes were isolated from a dog liver genomic DNA library. Sequence analysis indicates that the dog IGHEC gene is encoded by four exons spread out over 1.7 kilobases (kb). The IGHAC sequence encompasses 1.5 kb and includes all three constant region coding exons. The complete exon/intron sequence of these genes is described. 28 refs., 2 figs., 2 tabs.

  15. Cloning and nucleotide sequence of the Lactobacillus casei lactate dehydrogenase gene.

    PubMed Central

    Kim, S F; Baek, S J; Pack, M Y

    1991-01-01

    An allosteric L-(+)-lactate dehydrogenase gene of Lactobacillus casei ATCC 393 was cloned in Escherichia coli, and the nucleotide sequence of the gene was determined. The gene was composed of an open reading frame of 981 bp, starting with a GTG codon and ending with a TAA codon. The sequences for the promoter and ribosome binding site were identified, and a sequence for a structure resembling a rho-independent transcription terminator was also found. PMID:1768113

  16. A next-generation sequencing gene panel (MiamiOtoGenes) for comprehensive analysis of deafness genes.

    PubMed

    Tekin, Demet; Yan, Denise; Bademci, Guney; Feng, Yong; Guo, Shengru; Foster, Joseph; Blanton, Susan; Tekin, Mustafa; Liu, Xuezhong

    2016-03-01

    Extreme genetic heterogeneity, along with remarkable variation in the distribution of causative variants across different ethnicities, makes single gene testing inefficient for hearing loss. We developed a custom capture/next-generation sequencing gene panel of 146 known deafness genes with a total target size of approximately 1 MB. The genes were identified by searching databases including the Hereditary Hearing Loss Homepage, the Human Genome Mutation Database (HGMD), Online Mendelian Inheritance in Man (OMIM) and the most recent peer-reviewed publications related to the genetics of deafness. The design covered all coding exons, UTRs and 25 bases of intronic flanking sequences for each exon. To validate our panel, we used 6 positive controls with variants in known deafness genes and 8 unsolved samples from individuals with hearing loss. Mean coverage of the targeted exons was 697X. On average, 99.8%, 96.2% and 92.7% of the targeted region in each sample was covered by at least 1X, 50X and 100X reads, respectively. Analysis detected all known variants in nuclear genes. These results prove the accuracy and reliability of the custom capture experiment. PMID:26850479
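
    The coverage figures above (mean depth, and the share of the target covered at 1X, 50X and 100X) are summary statistics over per-base read depths. A minimal sketch of how such numbers are derived follows; it assumes a plain list of per-base depths is already available, and the depths and function names are invented for illustration rather than taken from the study's pipeline.

```python
def mean_depth(depths):
    """Average read depth across all targeted bases."""
    if not depths:
        raise ValueError("no depths given")
    return sum(depths) / len(depths)


def breadth_at(depths, threshold):
    """Percentage of targeted bases covered by at least `threshold` reads."""
    if not depths:
        raise ValueError("no depths given")
    covered = sum(1 for d in depths if d >= threshold)
    return 100.0 * covered / len(depths)


if __name__ == "__main__":
    toy_depths = [0, 10, 60, 120, 700]  # hypothetical per-base depths
    print(f"mean depth: {mean_depth(toy_depths):.0f}X")
    for threshold in (1, 50, 100):
        print(f">= {threshold}X: {breadth_at(toy_depths, threshold):.1f}%")
```

    Reporting breadth at several thresholds alongside the mean shows how evenly the reads are spread, since a high mean alone can hide poorly covered exons.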

  17. Tandem gene arrays in Trypanosoma brucei: Comparative phylogenomic analysis of duplicate sequence variation

    PubMed Central

    Jackson, Andrew P

    2007-01-01

    Background The genome sequence of the protistan parasite Trypanosoma brucei contains many tandem gene arrays. Gene duplicates are created through tandem duplication and are expressed through polycistronic transcription, suggesting that the primary purpose of long, tandem arrays is to increase gene dosage in an environment where individual gene promoters are absent. This report presents the first account of the tandem gene arrays in the T. brucei genome, employing several related genome sequences to establish how variation is created and removed. Results A systematic survey of tandem gene arrays showed that substantial sequence variation existed across the genome; variation from different regions of an array often produced inconsistent phylogenetic affinities. Phylogenetic relationships of gene duplicates were consistent with concerted evolution being a widespread homogenising force. However, tandem duplicates were not usually identical; therefore, any homogenising effect was coincident with divergence among duplicates. Allelic gene conversion was detected using various criteria and was apparently able to both remove and introduce sequence variation. Tandem arrays containing structural heterogeneity demonstrated how sequence homogenisation and differentiation can occur within a single locus. Conclusion The use of multiple genome sequences in a comparative analysis of tandem gene arrays identified substantial sequence variation among gene duplicates. The distribution of sequence variation is determined by a dynamic balance of conservative and innovative evolutionary forces. Gene trees from various species showed that intraspecific duplicates evolve in concert, perhaps through frequent gene conversion, although this does not prevent sequence divergence, especially where structural heterogeneity physically separates a duplicate from its neighbours. 
    In describing dynamics of sequence variation that have consequences beyond gene dosage, this survey provides a basis for...

  18. In Silico Sequence Analysis Reveals New Characteristics of Fungal NADPH Oxidase Genes

    PubMed Central

    Détry, Nicolas; Choi, Jaeyoung; Kuo, Hsiao-Che; Asiegbu, Fred O.

    2014-01-01

    NADPH oxidases (Noxes), transmembrane proteins found in most eukaryotic species, generate reactive oxygen species and are thereby involved in essential biological processes. However, the fact that genes encoding ferric reductases and ferric-chelate reductases share high sequence similarities and domains with Nox genes represents a challenge for bioinformatic approaches used to identify Nox-encoding genes. Further, most studies on fungal Nox genes have focused mainly on functionality, rather than sequence properties, and consequently clear differentiation among the various Nox isoforms has not been achieved. We conducted an extensive sequence analysis to identify putative Nox genes among 34 eukaryotes, including 28 fungal genomes and one Oomycota genome. Analyses were performed with respect to phylogeny, transmembrane helices, di-histidine distance and glycosylation. Our analyses indicate that the sequence properties of fungal Nox genes are different from those of human and plant Nox genes, thus providing novel insight that will enable more accurate identification and characterization of fungal Nox genes. PMID:25346600

  19. Cloning and sequencing of the alcohol dehydrogenase II gene from Zymomonas mobilis

    DOEpatents

    Ingram, Lonnie O.; Conway, Tyrrell

    1992-01-01

    The alcohol dehydrogenase II gene from Zymomonas mobilis has been cloned and sequenced. This gene can be expressed at high levels in other organisms to produce acetaldehyde or to convert acetaldehyde to ethanol.

  20. GeneInfoMiner--a web server for exploring biomedical literature using batch sequence ID.

    PubMed

    Xuan, Weijian; Watson, Stanley J; Meng, Fan

    2005-08-15

    GeneInfoMiner is a web-based system for searching Medline abstracts using sequence ID lists such as GenBank accession numbers derived from high-throughput experiments. It will map query results to MeSH topics to facilitate the exploration of the biological significance of the sequence ID lists. GeneInfoMiner is based on a custom gene and protein name identification engine that can map gene and protein names to important molecular biology databases. PMID:15994195

  1. ALS1 and ALS3 gene expression and biofilm formation in Candida albicans isolated from vulvovaginal candidiasis

    PubMed Central

    Roudbarmohammadi, Shahla; Roudbary, Maryam; Bakhshi, Bita; Katiraee, Farzad; Mohammadi, Rasoul; Falahati, Mehraban

    2016-01-01

    Background: A cluster of genes is involved in the pathogenesis of Candida albicans and its adhesion to mucosal and epithelial cells in the vagina, the most important of which are the agglutinin-like sequence (ALS) genes. Vaginitis is a significant health problem among women, and the antifungal resistance of Candida species is continually increasing. This cross-sectional study investigates the expression of the ALS1 and ALS3 genes and biofilm formation in C. albicans isolates from cases of vaginitis. Materials and Methods: Fifty-three recognized isolates of C. albicans were collected from women with recurrent vulvovaginal candidiasis in Iran, cultured on Sabouraud dextrose agar, and then examined for gene expression. Total messenger RNA (mRNA) was extracted from the C. albicans isolates, and complementary DNA was synthesized using reverse transcriptase. Reverse transcription-polymerase chain reaction (RT-PCR) with specific primers was used to evaluate the expression of ALS1 and ALS3 relative to the housekeeping gene ACT1. A 3-(4,5-dimethyl-2-thiazyl)-2,5-diphenyl-2H-tetrazolium bromide assay was performed to assess adherence capacity and biofilm formation in the isolates. Results: Forty isolates (75.8%) expressed ALS1 and 41 isolates (77.7%) expressed ALS3. Moreover, 39 isolates (74%) were positive for both ALS1 and ALS3 mRNA by RT-PCR. Adherence capability in isolates expressing ALS1 or ALS3 was greater than in the control group (without expression of either gene), and it was significantly greatest in the isolates that expressed both ALS1 and ALS3 simultaneously. Conclusion: The results indicate an association between the expression of the ALS1 and ALS3 genes and fluconazole resistance in C. albicans. The expression of ALS1 and ALS3 in a considerable percentage of the isolates may have contributed to their adherence to the vagina and to biofilm formation. PMID:27376044

  2. Targeted RNA Sequencing Assay to Characterize Gene Expression and Genomic Alterations.

    PubMed

    Martin, Dorrelyn P; Miya, Jharna; Reeser, Julie W; Roychowdhury, Sameek

    2016-01-01

    RNA sequencing (RNAseq) is a versatile method that can be utilized to detect and characterize gene expression, mutations, gene fusions, and noncoding RNAs. Standard RNAseq requires 30 - 100 million sequencing reads and can include multiple RNA products such as mRNA and noncoding RNAs. We demonstrate how targeted RNAseq (capture) permits a focused study on selected RNA products using a desktop sequencer. RNAseq capture can characterize unannotated, low, or transiently expressed transcripts that may otherwise be missed using traditional RNAseq methods. Here we describe the extraction of RNA from cell lines, ribosomal RNA depletion, cDNA synthesis, preparation of barcoded libraries, hybridization and capture of targeted transcripts and multiplex sequencing on a desktop sequencer. We also outline the computational analysis pipeline, which includes quality control assessment, alignment, fusion detection, gene expression quantification and identification of single nucleotide variants. This assay allows for targeted transcript sequencing to characterize gene expression, gene fusions, and mutations. PMID:27585245

  3. Analysis of human immunodeficiency virus type 1 nef gene sequences present in vivo.

    PubMed Central

    Shugars, D C; Smith, M S; Glueck, D H; Nantermet, P V; Seillier-Moiseiwitsch, F; Swanstrom, R

    1993-01-01

    The nef genes of the human immunodeficiency viruses type 1 and 2 (HIV-1 and HIV-2) and the related simian immunodeficiency viruses (SIVs) encode a protein (Nef) whose role in virus replication and cytopathicity remains uncertain. As an attempt to elucidate the function of nef, we characterized the nucleotide and corresponding protein sequences of naturally occurring nef genes obtained from several HIV-1-infected individuals. A consensus Nef sequence was derived and used to identify several features that were highly conserved among the Nef sequences. These features included a nearly invariant myristylation signal, regions of sequence polymorphism and variable duplication, a region with an acidic charge, a (Pxx)4 repeat sequence, and a potential protein kinase C phosphorylation site. Clustering of premature stop codons at position 124 was noted in 6 of the 54 Nef sequences. Further analysis revealed four stretches of residues that were highly conserved not only among the patient-derived HIV-1 Nef sequences, but also among the Nef sequences of HIV-2 and the SIVs, suggesting that Nef proteins expressed by these retroviruses are functionally equivalent. The "Nef-defining" sequences were used to evaluate the sequence alignments of known proteins reported to share sequence similarity with Nef sequences and to conduct additional computer-based searches for similar protein sequences. A gene encoding the consensus Nef sequence was also generated. This gene encodes a full-length Nef protein that should be a valuable tool in further studies of Nef function. PMID:8043040

  4. H3 and H4 histone cDNA sequences from Xenopus: a sequence comparison of H4 genes.

    PubMed Central

    Turner, P C; Woodland, H R

    1982-01-01

    Ovarian poly(A)+ RNA from Xenopus laevis and Xenopus borealis was used to construct two cDNA libraries, which were screened for histone sequences. cDNA clones to H4 mRNA were obtained from both species and an H3 cDNA clone from Xenopus laevis. The complete DNA sequences of these clones have been determined and are presented. These new sequences are compared with other H3 and H4 DNA sequences both in the coding and 3' noncoding regions. We find that there is considerable non-random codon usage in ten H4 genes. In addition there are some sequence similarities in the 3' noncoding regions of H3 and H4 genes. PMID:6896750

  5. Short DNA sequences inserted for gene targeting can accidentally interfere with off-target gene expression.

    PubMed

    Meier, Ingo D; Bernreuther, Christian; Tilling, Thomas; Neidhardt, John; Wong, Yong Wee; Schulze, Christian; Streichert, Thomas; Schachner, Melitta

    2010-06-01

    Targeting of genes in mice, a key approach to study development and disease, often leaves a neo cassette, loxP, or FRT sites inserted in the mouse genome. Insertion of neo can influence the expression of neighboring genes, but similar effects have not been reported for loxP sites. We therefore performed microarray analyses of mice in which the Ncam or the Tnr gene were targeted either by insertion of neo or loxP/FRT sites. In the case of Ncam, neo, but not loxP/FRT insertion, led to a 2-fold reduction in mRNA levels of 3 genes located at distances between 0.2 and 3.1 Mb from the target. In contrast, after introduction of loxP/FRT sites into introns of Tnr, we observed a 2.5- to 4-fold reduction in the transcript level of the Gas5 gene, 1.1 Mb away from Tnr, most probably due to disruption of a conserved regulatory element in Tnr. Insertion of short DNA sequences such as loxP/FRT can thus influence off-target mRNA levels if these sites are accidentally placed into regulatory elements. Our results imply that conditional knockout mice should be analyzed for genomic positional side effects that may influence the animals' phenotypes. PMID:20110269

  6. Sequence analysis of a cluster of twenty-one tRNA genes in Bacillus subtilis.

    PubMed Central

    Green, C J; Vold, B S

    1983-01-01

    The DNA sequence of a cluster of twenty-one tRNA genes distal to a rRNA gene set in B. subtilis was determined. None of the tRNA genes are repeated in the sequence. The only classes of tRNAs that are not represented are those for cysteine, glutamine, tryptophan, and tyrosine. Three of the tRNA genes in this cluster do not have the 3'-CCA sequence encoded in the gene. There is no RNA polymerase terminator sequence in the region between the 5S gene and the first tRNA gene or within the tRNA gene cluster. A terminator sequence was found directly after the last tRNA gene. This rRNA and tRNA gene cluster probably represents one transcriptional unit. However, there may be an RNA polymerase promoter site within this sequence, which raises some interesting questions concerning the regulation of transcription for these tRNA genes. PMID:6310512

  7. Nucleotide sequence of the nifH gene coding for nitrogen reductase in the acetic acid bacterium Acetobacter diazotrophicus.

    PubMed

    Franke, I H; Fegan, M; Hayward, A C; Sly, L I

    1998-01-01

    The nifH gene sequence of the nitrogen-fixing bacterium Acetobacter diazotrophicus was determined with the use of the polymerase chain reaction and universal degenerate oligonucleotide primers. The gene shows highest pair-wise similarity to the nifH gene of Azospirillum brasilense. The phylogenetic relationships of the nifH gene sequences were compared with those inferred from 16S rRNA gene sequences. Knowledge of the sequence of the nifH gene contributes to the growing database of nifH gene sequences, and will allow the detection of Acet. diazotrophicus from environmental samples with nifH gene-based primers. PMID:9489028

  8. Complete sequence and gene organization of the Nosema heliothidis ribosomal RNA gene region.

    PubMed

    Dong, Shinan; Shen, Zhongyuan; Zhu, Feng; Tang, Xudong; Xu, Li

    2011-01-01

    By sequencing the entire ribosomal RNA (rRNA) gene region of Nosema heliothidis isolated from cotton bollworm (Helicoverpa armigera), we showed that its gene organization is similar to the type species, Nosema bombycis: the 5'-large subunit rRNA (2,490 bp)-internal transcribed spacer (192 bp)-small subunit rRNA (1,232 bp)-intergenic spacer (274 bp)-5S rRNA (115 bp)-3'. We constructed two phylogenetic trees, analyzed phylogenetic relationships, examined rRNA organization of microsporidia, and compared the secondary structure of small subunit rRNA with closely related microsporidia. The latter two features may provide important information for the classification and phylogenetic analysis of microsporidia. PMID:21895841
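
    The segment lengths quoted in this abstract can be summed to estimate the size of the sequenced region. The following is a minimal sketch, assuming the five segments are contiguous and non-overlapping; the total and the coordinates are derived figures, not values stated in the abstract, and all variable names are illustrative.

```python
# Segment lengths (bp) in 5'-to-3' order, as listed in the abstract.
# Assumption: the segments are contiguous and do not overlap.
segments = [
    ("large subunit rRNA", 2490),
    ("internal transcribed spacer", 192),
    ("small subunit rRNA", 1232),
    ("intergenic spacer", 274),
    ("5S rRNA", 115),
]

# Total length of the region under the contiguity assumption.
total_bp = sum(length for _, length in segments)

# Running 1-based, inclusive coordinates for each segment.
coordinates = []
start = 1
for name, length in segments:
    end = start + length - 1
    coordinates.append((name, start, end))
    start = end + 1

for name, seg_start, seg_end in coordinates:
    print(f"{name}: {seg_start}-{seg_end}")
print(f"total: {total_bp} bp")
```

    Keeping the segments as an ordered list of pairs preserves the 5'-to-3' arrangement, so the same structure could be reused to compare the layout with that of another species.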

  9. Use of gene sequence analyses and genome comparisons for yeast systematics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Detection, identification, and classification of yeasts has undergone a major transformation in the past decade and a half following application of gene sequence analyses and genome comparisons. Development of a database (barcode) of easily determined gene sequences from domains 1 and 2 of large sub...

  10. Cloning, sequencing and characterization of lipase genes from a polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...

  11. Identification of Candidate Genes in Rice for Resistance to Sheath Blight Disease by Whole Genome Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent advances in whole genome sequencing have allowed identification of genes for disease susceptibility in humans. The objective of our research was to exploit whole genome sequences of 13 rice (Oryza sativa L.) inbred lines to identify non-synonymous SNPs (nsSNPs) and candidate genes for resista...

  12. Tetrathiobacter kashmirensis Strain CA-1 16S rRNA gene complete sequence.

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This study used 1326 base pair 16S rRNA gene sequence methods to confirm the identification of a bacterium as Tetrathiobacter kashmirensis. Morphological, biochemical characteristics, and fatty acid profiles are consistent with the 16S rRNA gene sequence identification of the bacterium. The isolate...

  13. EvoTol: a protein-sequence based evolutionary intolerance framework for disease-gene prioritization

    PubMed Central

    Rackham, Owen J. L.; Shihab, Hashem A.; Johnson, Michael R.; Petretto, Enrico

    2015-01-01

    Methods to interpret personal genome sequences are increasingly required. Here, we report a novel framework (EvoTol) to identify disease-causing genes using patient sequence data from within protein coding-regions. EvoTol quantifies a gene's intolerance to mutation using evolutionary conservation of protein sequences and can incorporate tissue-specific gene expression data. We apply this framework to the analysis of whole-exome sequence data in epilepsy and congenital heart disease, and demonstrate EvoTol's ability to identify known disease-causing genes is unmatched by competing methods. Application of EvoTol to the human interactome revealed networks enriched for genes intolerant to protein sequence variation, informing novel polygenic contributions to human disease. PMID:25550428

  14. Complete mitochondrial genome DNA sequence for two ophiuroids and a holothuroid: the utility of protein gene sequence and gene maps in the analyses of deep deuterostome phylogeny.

    PubMed

    Scouras, Andrea; Beckenbach, Karen; Arndt, Allan; Smith, Michael J

    2004-04-01

    The complete mitochondrial genome sequences have been determined for the holothuroid Cucumaria miniata and two ophiuroid species Ophiopholis aculeata and Ophiura lütkeni. In addition, the nucleotide sequence of the mitochondrial protein-coding genes for the asteroid Pisaster ochraceus has been completed. Maximum-likelihood and LogDet distance analyses of concatenated protein-coding sequences produced a series of trees that did not conclusively support generally accepted models of echinoderm phylogeny. The ophiuroid data consistently demonstrated accelerated nucleotide divergence rates and lack of stationarity. This confounds the phylogenetic analyses. Molecular investigations using individual protein-coding gene alignments demonstrated that the cytochrome b gene exhibits the least deviation in rate and stationarity and generated some trees consistent with proposed echinoderm phylogenies. Phylogenies based on echinoderm mitochondrial gene rearrangements also proved problematic because of extensive variation in gene order between and within classes. A comparison of the two distinctive ophiuroid mitochondrial gene orders supports the hypothesis that O. lütkeni has a more derived mitochondrial gene order versus O. aculeata. The variation in the echinoderm mitochondrial gene maps reinforces the limitations of the application of mitochondrial gene rearrangements as a global phylogenetic tool. PMID:15019608

  15. Three Replicons of Rhizobium sp. Strain NGR234 Harbor Symbiotic Gene Sequences

    PubMed Central

    Flores, Margarita; Mavingui, Patrick; Girard, Lourdes; Perret, Xavier; Broughton, William J.; Martínez-Romero, Esperanza; Dávila, Guillermo; Palacios, Rafael

    1998-01-01

    Rhizobium sp. strain NGR234 contains three replicons: the symbiotic plasmid or pNGR234a, a megaplasmid (pNGR234b), and the chromosome. Symbiotic gene sequences not present in pNGR234a were analyzed by hybridization. DNA sequences homologous to the genes fixLJKNOPQGHIS were found on the chromosome, while sequences homologous to nodPQ and exoBDFLK were found on pNGR234b. PMID:9811668

  16. Identification of potential regulatory motifs in odorant receptor genes by analysis of promoter sequences

    PubMed Central

    Michaloski, Jussara S.; Galante, Pedro A.F.

    2006-01-01

    Mouse odorant receptors (ORs) are encoded by >1000 genes dispersed throughout the genome. Each olfactory neuron expresses one single OR gene, while the rest of the genes remain silent. The mechanisms underlying OR gene expression are poorly understood. Here, we investigated if OR genes share common cis-regulatory sequences in their promoter regions. We carried out a comprehensive analysis in which the upstream regions of a large number of OR genes were compared. First, using RLM-RACE, we generated cDNAs containing the complete 5′-untranslated regions (5′-UTRs) for a total number of 198 mouse OR genes. Then, we aligned these cDNA sequences to the mouse genome so that the 5′ structure and transcription start sites (TSSs) of the OR genes could be precisely determined. Sequences upstream of the TSSs were retrieved and browsed for common elements. We found DNA sequence motifs that are overrepresented in the promoter regions of the OR genes. Most motifs resemble O/E-like sites and are preferentially localized within 200 bp upstream of the TSSs. Finally, we show that these motifs specifically interact with proteins extracted from nuclei prepared from the olfactory epithelium, but not from brain or liver. Our results show that the OR genes share common promoter elements. The present strategy should provide information on the role played by cis-regulatory sequences in OR gene regulation. PMID:16902085

  17. Functional analysis and nucleotide sequence of the promoter region of the murine hck gene.

    PubMed Central

    Lock, P; Stanley, E; Holtzman, D A; Dunn, A R

    1990-01-01

    The structure and function of the promoter region and exon 1 of the murine hck gene have been characterized in detail. RNase protection analysis has established that hck transcripts initiate from heterogeneous start sites located within the hck gene. Fusion gene constructs containing hck 5'-flanking sequences and the bacterial Neor gene have been introduced into the hematopoietic cell lines FDC-P1 and WEHI-265 by using a self-inactivating retroviral vector. The transcriptional start sites of the fusion gene are essentially identical to those of the endogenous hck gene. Analysis of infected WEHI-265 cell lines treated with bacterial lipopolysaccharide (LPS) reveals a 3- to 5-fold elevation in the levels of endogenous hck mRNA and a 1.4- to 2.6-fold increase in the level of Neor fusion gene transcripts, indicating that hck 5'-flanking sequences are capable of conferring LPS responsiveness on the Neor gene. The 5'-flanking region of the hck gene contains sequences similar to an element which is thought to be involved in the LPS responsiveness of the class II major histocompatibility gene A alpha k. A subset of these sequences are also found in the 5'-flanking regions of other LPS-responsive genes. Moreover, this motif is related to the consensus binding sequence of NF-kappa B, a transcription factor which is known to be regulated by LPS. PMID:2388619

  18. Complete Sequence Construction of the Highly Repetitive Ribosomal RNA Gene Repeats in Eukaryotes Using Whole Genome Sequence Data.

    PubMed

    Agrawal, Saumya; Ganley, Austen R D

    2016-01-01

    The ribosomal RNA genes (rDNA) encode the major rRNA species of the ribosome, and thus are essential across life. These genes are highly repetitive in most eukaryotes, forming blocks of tandem repeats that form the core of nucleoli. The primary role of the rDNA in encoding rRNA has been long understood, but more recently the rDNA has been implicated in a number of other important biological phenomena, including genome stability, cell cycle, and epigenetic silencing. Noncoding elements, primarily located in the intergenic spacer region, appear to mediate many of these phenomena. Although sequence information is available for the genomes of many organisms, in almost all cases rDNA repeat sequences are lacking, primarily due to problems in assembling these intriguing regions during whole genome assemblies. Here, we present a method to obtain complete rDNA repeat unit sequences from whole genome assemblies. Limitations of next generation sequencing (NGS) data make them unsuitable for assembling complete rDNA unit sequences; therefore, the method we present relies on the use of Sanger whole genome sequence data. Our method makes use of the Arachne assembler, which can assemble highly repetitive regions such as the rDNA in a memory-efficient way. We provide a detailed step-by-step protocol for generating rDNA sequences from whole genome Sanger sequence data using Arachne, for refining complete rDNA unit sequences, and for validating the sequences obtained. In principle, our method will work for any species where the rDNA is organized into tandem repeats. This will help researchers working on species without a complete rDNA sequence, those working on evolutionary aspects of the rDNA, and those interested in conducting phylogenetic footprinting studies with the rDNA. PMID:27576718

  19. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant. PMID:26252423
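
    The counts in this abstract allow a quick back-of-the-envelope check of how much of the genome the sequenced BACs represent. The sketch below is illustrative only: it treats the approximate ~1.7 Gb figure as exact, and the variable names are assumptions rather than anything taken from the study.

```python
# Figures quoted in the abstract.
sequenced_bacs = 15_622      # BACs on the minimal tiling path
gene_bearing_bacs = 72_052   # physical-mapped gene-bearing BACs
sequence_gb = 1.7            # approximate genomic sequence generated (Gb)
genome_gb = 5.1              # barley genome size (Gb)

# Share of gene-bearing BACs that had to be sequenced.
bac_fraction = sequenced_bacs / gene_bearing_bacs

# Share of the genome covered by the generated sequence.
genome_fraction = sequence_gb / genome_gb

print(f"BACs sequenced: {bac_fraction:.1%} of gene-bearing BACs")
print(f"Sequence generated: {genome_fraction:.1%} of the genome")
```

    Under these assumptions, roughly a third of the genome is reported to contain an estimated two-thirds of the genes, which is consistent with the abstract's emphasis on gene-dense regions.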

  20. Cloning, characterization, and sequence of the yeast DNA topoisomerase I gene.

    PubMed Central

    Thrash, C; Bankier, A T; Barrell, B G; Sternglanz, R

    1985-01-01

    The structural gene for yeast DNA topoisomerase I (TOP1) has been cloned from two yeast genomic plasmid banks. Integration of a plasmid carrying the gene into the chromosome and subsequent genetic mapping shows that TOP1 is identical to the gene previously called MAK1. Seven top1 (mak1) mutants including gene disruptions are viable, demonstrating that DNA topoisomerase I is not essential for viability in yeast. A 3787-base-pair DNA fragment including the gene has been sequenced. The protein predicted from the DNA sequence has 769 amino acids and a molecular weight of 90,020. PMID:2989818
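
    The predicted length and molecular weight can be cross-checked against each other by computing the mean mass per residue. This is a minimal sketch using only the two figures in the abstract; the variable names are illustrative, and the comparison with a typical value of about 110 Da per residue is a general rule of thumb, not a claim made by the authors.

```python
# Values quoted in the abstract for the predicted TOP1 protein.
n_residues = 769
molecular_weight_da = 90_020

# Mean mass per residue; typical proteins average roughly 110 Da per residue,
# so a higher value suggests a composition richer in heavier amino acids.
mean_residue_mass = molecular_weight_da / n_residues

print(f"{mean_residue_mass:.1f} Da per residue")
```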

  1. Flagellar apparatus gene sequences of Aeromonas hydrophila AL09-73 isolate

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Flagellar apparatus genes of the recent outbreak isolate Aeromonas hydrophila AL09-73 were sequenced and characterized. A total of 28 flagellar genes were identified. The genes range in size from 318 to 2001 nucleotides and potentially encode different complex flagellar proteins. At nucleotide and...

  2. Turkey fecal microbial community structure and functional gene diversity revealed by 16S rRNA gene and metagenomic sequences.

    PubMed

    Lu, Jingrang; Domingo, Jorge Santo

    2008-10-01

    The primary goal of this study was to better understand the microbial composition and functional genetic diversity associated with turkey fecal communities. To achieve this, 16S rRNA gene and metagenomic clone libraries were sequenced from turkey fecal samples. The analysis of 382 16S rRNA gene sequences showed that the most abundant bacteria were closely related to Lactobacillales (47%), Bacillales (31%), and Clostridiales (11%). Actinomycetales, Enterobacteriales, and Bacteroidales sequences were also identified, but represented a smaller part of the community. The analysis of 379 metagenomic sequences showed that most clones were similar to bacterial protein sequences (58%). Bacteriophage (10%) and avian viruses (3%) sequences were also represented. Of all metagenomic clones potentially encoding for bacterial proteins, most were similar to low G+C Gram-positive bacterial proteins, particularly from Lactobacillales (50%), Bacillales (11%), and Clostridiales (8%). Bioinformatic analyses suggested the presence of genes encoding for membrane proteins, lipoproteins, hydrolases, and functional genes associated with the metabolism of nitrogen and sulfur containing compounds. The results from this study further confirmed the predominance of Firmicutes in the avian gut and highlight the value of coupling 16S rRNA gene and metagenomic sequencing data analysis to study the microbial composition of avian fecal microbial communities. PMID:18974945

  3. Nucleotide sequence, transcription and phylogeny of the gene encoding the superoxide dismutase of Sulfolobus acidocaldarius.

    PubMed

    Klenk, H P; Schleper, C; Schwass, V; Brudler, R

    1993-07-18

    The gene encoding the superoxide dismutase (SOD) of the thermophilic archaeon Sulfolobus acidocaldarius has been isolated and sequenced. Both the start site and the termination sites of the corresponding transcript were mapped. The deduced amino acid sequence of the protein is very similar to the sequence of manganese- or iron-containing SODs. Phylogenetic sequence analysis corroborated the monophyletic nature of the archaeal domain. PMID:8334170

  4. Cloning and sequencing of the allophycocyanin genes from Spirulina maxima (Cyanophyta)

    NASA Astrophysics Data System (ADS)

    Qin, Song; Hiroyuki, Kojima; Yoshikazu, Kawata; Shin-Ichi, Yano; Zeng, Cheng-Kui

    1998-03-01

    The genes coding for the α- and β-subunits of allophycocyanin (apcA and apcB) from the cyanophyte Spirulina maxima were cloned and sequenced. The results revealed 44.4% nucleotide sequence similarity and 30.4% similarity of the deduced amino acid sequences between them. The amino acid sequence identities between S. maxima and S. platensis are 99.4% for the α subunit and 100% for the β subunit.
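
    Identity percentages like those above are computed over a pairwise alignment. The following is a hedged sketch of one common convention (matching positions divided by ungapped aligned columns); the abstract does not say which convention or software the authors used, and the function name and toy sequences are hypothetical, not real allophycocyanin data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: both inputs are already aligned (equal length, '-' for gaps),
    and columns containing a gap are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment for illustration only.
toy_a = "MSIVTKSIVN"
toy_b = "MSIVSKSI-N"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity")
```

    Other conventions divide by the full alignment length or by the shorter sequence, so reported percentages are only comparable when the denominator is the same.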

  5. A Comprehensive Approach to Clustering of Expressed Human Gene Sequence: The Sequence Tag Alignment and Consensus Knowledge Base

    PubMed Central

    Miller, Robert T.; Christoffels, Alan G.; Gopalakrishnan, Chella; Burke, John; Ptitsyn, Andrey A.; Broveak, Tania R.; Hide, Winston A.

    1999-01-01

    The expressed human genome is being sequenced and analyzed by disparate groups producing disparate data. The majority of the identified coding portion is in the form of expressed sequence tags (ESTs). The need to discover exonic representation and expression forms of full-length cDNAs for each human gene is frustrated by the partial and variable quality nature of this data delivery. A highly redundant human EST data set has been processed into integrated and unified expressed transcript indices that consist of hierarchically organized human transcript consensi reflecting gene expression forms and genetic polymorphism within an index class. The expression index and its intermediate outputs include cleaned transcript sequence, expression, and alignment information and a higher fidelity subset, SANIGENE. The STACK_PACK clustering system has been applied to dbEST release 121598 (GenBank version 110). Sixty-four percent of 1,313,103 Homo sapiens ESTs are condensed into 143,885 tissue level multiple sequence clusters; linking through clone-ID annotations produces 68,701 total assemblies, such that 81% of the original input set is captured in a STACK multiple sequence or linked cluster. Indexing of alignments by substituent EST accession allows browsing of the data structure and its cross-links to UniGene. STACK metaclusters consolidate a greater number of ESTs by a factor of 1.86 with respect to the corresponding UniGene build. Fidelity comparison with genome reference sequence AC004106 demonstrates consensus expression clusters that reflect significantly lower spurious repeat sequence content and capture alternate splicing within a whole body index cluster and three STACK v.2.3 tissue-level clusters. Statistics of a staggered release whole body index build of STACK v.2.0 are presented. PMID:10568754

  6. Compilation of 5S rRNA and 5S rRNA gene sequences

    PubMed Central

    Specht, Thomas; Wolters, Jörn; Erdmann, Volker A.

    1990-01-01

    The BERLIN RNA DATABANK as of December 31, 1989, contains a total of 667 sequences of 5S rRNAs or their genes, which is an increase of 114 new sequence entries over the last compilation (1). It covers sequences from 44 archaebacteria, 267 eubacteria, 20 plastids, 6 mitochondria, 319 eukaryotes and 11 eukaryotic pseudogenes. The hardcopy shows only the list (Table 1) of those organisms whose sequences have been determined. The BERLIN RNA DATABANK uses the format of the EMBL Nucleotide Sequence Data Library complemented by a Sequence Alignment (SA) field including secondary structure information. PMID:1692116
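
    The per-group counts in this abstract can be checked against the stated total. The sketch below is a simple consistency check; the size of the previous compilation is derived here by subtraction and is not stated in the abstract, and the dictionary keys are illustrative labels.

```python
# Sequence counts per group, as reported in the abstract.
counts = {
    "archaebacteria": 44,
    "eubacteria": 267,
    "plastids": 20,
    "mitochondria": 6,
    "eukaryotes": 319,
    "eukaryotic pseudogenes": 11,
}
reported_total = 667
new_entries = 114

# The group counts should add up to the reported total.
computed_total = sum(counts.values())

# Derived figure: size of the previous compilation.
previous_total = reported_total - new_entries

print(f"computed total: {computed_total} (reported: {reported_total})")
print(f"previous compilation: {previous_total} sequences")
```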

  7. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    PubMed Central

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2005-01-01

    We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila. PMID:15632085

  8. Characterization of promoter sequence of toll-like receptor genes in Vechur cattle

    PubMed Central

    Lakshmi, R.; Jayavardhanan, K. K.; Aravindakshan, T. V.

    2016-01-01

    Aim: To analyze the promoter sequence of toll-like receptor (TLR) genes in Vechur cattle, an indigenous breed of Kerala, compare it with the sequence of Bos taurus, and assess the differences that could be attributed to innate immune responses against bovine mastitis. Materials and Methods: Blood samples were collected from the jugular vein of Vechur cattle, maintained at the Vechur cattle conservation center of Kerala Veterinary and Animal Sciences University, using an acid-citrate-dextrose anticoagulant. The genomic DNA was extracted, and polymerase chain reaction was carried out to amplify the promoter region of TLRs. The amplified products of the TLR2, 4, and 9 promoter regions were sequenced by the Sanger enzymatic DNA sequencing technique. Results: The sequence of the promoter region of TLR2 of Vechur cattle showed 98% similarity with the B. taurus sequence present in GenBank and revealed variants for four sequence motifs. The sequence of the promoter region of TLR4 of Vechur cattle showed 99% similarity with the B. taurus sequence but did not reveal significant variants in motif regions. However, two heterozygous loci were observed from the chromatogram. The promoter sequence of the TLR9 gene also showed 99% similarity to the B. taurus sequence and revealed variants for four sequence motifs. Conclusion: The results of this study indicate significant variation in the promoters of the TLR2 and 9 genes in the Vechur cattle breed, which may potentially influence the innate immune response against mastitis. PMID:27397987

  9. The Arabidopsis Root Transcriptome by Serial Analysis of Gene Expression. Gene Identification Using the Genome Sequence

    PubMed Central

    Fizames, Cécile; Muños, Stéphane; Cazettes, Céline; Nacry, Philippe; Boucherez, Jossia; Gaymard, Frédéric; Piquemal, David; Delorme, Valérie; Commes, Thérèse; Doumas, Patrick; Cooke, Richard; Marti, Jacques; Sentenac, Hervé; Gojon, Alain

    2004-01-01

    Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warrants the identification of the genes corresponding to the majority of the tags found experimentally, with a high level of reliability, and provides a reference database for SAGE studies in Arabidopsis. This new resource allowed us to characterize the expression of more than 3,000 genes, for which there is no expressed sequence tag (EST) or cDNA in the databases. Moreover, 85% of the tags were specific for one gene. To illustrate this advantage of SAGE for functional genomics, we show that our data allow an unambiguous analysis of most of the individual genes belonging to 12 different ion transporter multigene families. These results indicate that, compared with EST-based tag to gene assignment, the use of the annotated genome sequence greatly improves gene identification in SAGE studies. However, more than 6,000 different tags remained with no gene match, suggesting that a significant proportion of transcripts present in the roots originate from yet unknown or wrongly annotated genes. The root transcriptome characterized in this study markedly differs from those obtained in other organs, and provides a unique resource for investigating the functional specificities of the root system. As an example of the use of SAGE for transcript profiling in Arabidopsis, we report here the identification of 270 genes differentially expressed between roots of plants grown either with NO3- or NH4NO3 as N source. PMID:14730065

  10. Gene expression profile in the anterior regeneration of the earthworm using expressed sequence tags.

    PubMed

    Cho, Sung-Jin; Lee, Myung Sik; Tak, Eun Sik; Lee, Eun; Koh, Ki Seok; Ahn, Chi Hyun; Park, Soon Cheol

    2009-01-01

    In order to gain insight into the gene expression profiles associated with anterior regeneration of the earthworm, Perionyx excavatus, we analyzed 1,159 expressed sequence tags (ESTs) derived from a cDNA library of early anterior regenerated tissue. Among the 1,159 ESTs analyzed, 622 (53.7%) ESTs showed significant similarity to known genes and represented 338 genes, of which 233 ESTs were singletons and 105 ESTs manifested as two or more ESTs. While 663 ESTs (57.2%) were sequenced only once, 308 ESTs (26.6%) appeared 2 to 5 times, and 188 ESTs (16.2%) were sequenced more than 5 times. A total of 803 genes were categorized into 15 groups according to their biological functions. Among the 1,159 ESTs sequenced, we found several genes encoding signaling molecules, such as Notch and Distal-less. The ESTs used in this study should provide a resource for future research in earthworm regeneration. PMID:19129665

  11. Genome-Wide Analysis Reveals Diverged Patterns of Codon Bias, Gene Expression, and Rates of Sequence Evolution in Picea Gene Families

    PubMed Central

    De La Torre, Amanda R.; Lin, Yao-Cheng; Van de Peer, Yves; Ingvarsson, Pär K.

    2015-01-01

    The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms. PMID:25747252

  12. Six novel Y chromosome genes in Anopheles mosquitoes discovered by independently sequencing males and females

    PubMed Central

    2013-01-01

    Background Y chromosomes are responsible for the initiation of male development, male fertility, and other male-related functions in diverse species. However, Y genes are rarely characterized outside a few model species due to the arduous nature of studying the repeat-rich Y. Results The chromosome quotient (CQ) is a novel approach to systematically discover Y chromosome genes. In the CQ method, genomic DNA from males and females is sequenced independently and aligned to candidate reference sequences. The female to male ratio of the number of alignments to a reference sequence, a parameter called the chromosome quotient (CQ), is used to determine whether the sequence is Y-linked. Using the CQ method, we successfully identified known Y sequences from Homo sapiens and Drosophila melanogaster. The CQ method facilitated the discovery of Y chromosome sequences from the malaria mosquitoes Anopheles stephensi and An. gambiae. Comparisons to transcriptome sequence data with blastn led to the discovery of six Anopheles Y genes, three from each species. All six genes are expressed in the early embryo. Two of the three An. stephensi Y genes were recently acquired from the autosomes or the X. Although An. stephensi and An. gambiae belong to the same subgenus, we found no evidence of Y genes shared between the species. Conclusions The CQ method can reliably identify Y chromosome sequences using the ratio of alignments from male and female sequence data. The CQ method is widely applicable to species with fragmented genome assemblies produced from next-generation sequencing data. Analysis of the six Y genes characterized in this study indicates rapid Y chromosome evolution between An. stephensi and An. gambiae. The Anopheles Y genes discovered by the CQ method provide unique markers for population and phylogenetic analysis, and opportunities for novel mosquito control measures through the manipulation of sexual dimorphism and fertility. PMID:23617698
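
    The chromosome quotient described in the abstract above is a simple ratio, so it can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: the function names, the example contig names and counts, and the two cut-offs (max_cq and min_male_alignments) are hypothetical values chosen for demonstration, and the abstract does not state which thresholds were actually used.

```python
from typing import Dict, List, Tuple


def chromosome_quotient(female_alignments: int, male_alignments: int) -> float:
    """Return the female-to-male ratio of alignment counts for one reference sequence."""
    if male_alignments <= 0:
        raise ValueError("CQ is undefined when there are no male alignments")
    return female_alignments / male_alignments


def y_linked_candidates(
    counts: Dict[str, Tuple[int, int]],
    max_cq: float = 0.2,            # hypothetical cut-off, not taken from the abstract
    min_male_alignments: int = 20,  # hypothetical depth filter, not taken from the abstract
) -> List[str]:
    """List reference sequences whose CQ is low enough to suggest Y linkage.

    `counts` maps a sequence name to (female_alignments, male_alignments).
    Sequences with too few male alignments are skipped as uninformative.
    """
    candidates = []
    for name, (female, male) in counts.items():
        if male < min_male_alignments:
            continue
        if chromosome_quotient(female, male) <= max_cq:
            candidates.append(name)
    return candidates


# Made-up alignment counts for three hypothetical contigs.
example_counts = {
    "contig_A": (0, 150),   # almost no female alignments: low CQ
    "contig_B": (98, 102),  # similar counts in both sexes: CQ near 1
    "contig_C": (3, 12),    # too few male alignments to judge
}
print(y_linked_candidates(example_counts))
```

    In this sketch a sequence that attracts alignments almost exclusively from male reads has a CQ near zero and is reported as a Y-linked candidate, while a sequence covered about equally by both sexes has a CQ near one and is not.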

  13. Analysis of the regions flanking the human insulin gene and sequence of an Alu family member.

    PubMed Central

    Bell, G I; Pictet, R; Rutter, W J

    1980-01-01

    The regions around the human insulin gene have been studied by heteroduplex, hybridization and sequence analysis. These studies indicated that there is a region of heterogeneous length located approximately 700 bp before the 5' end of the gene; and that the 19 kb of cloned DNA which includes the 1430 bp insulin gene as well as 5650 bp before and 11,500 bp after the gene is single copy sequence except for 500 bp located 6000 bp from the 3' end of the gene. This 500 bp segment contains a member of the Alu family of dispersed middle repetitive sequences as well as another less highly repeated homopolymeric segment. The sequence of this region was determined. This Alu repeat is bordered by 19 bp direct repeats and also contains an 83 bp sequence which is present twice. The regions flanking the human and rat I insulin genes were compared by heteroduplex analysis to localize homologous sequences in the flanking regions which could be involved in the regulation of insulin biosynthesis. The homology between the two genes is restricted to the region encoding preproinsulin and a short region of approximately 60 bp flanking the 5' side of the genes. PMID:6253909

  14. An Introductory Bioinformatics Exercise to Reinforce Gene Structure and Expression and Analyze the Relationship between Gene and Protein Sequences

    ERIC Educational Resources Information Center

    Almeida, Craig A.; Tardiff, Daniel F.; De Luca, Jane P.

    2004-01-01

    We have developed an introductory bioinformatics exercise for sophomore biology and biochemistry students that reinforces the understanding of the structure of a gene and the principles and events involved in its expression. In addition, the activity illustrates the severe effect mutations in a gene sequence can have on the protein product.…

  15. rpoB Gene Sequence-Based Identification of Staphylococcus Species

    PubMed Central

    Drancourt, Michel; Raoult, Didier

    2002-01-01

    The complete sequence of rpoB, the gene encoding the beta subunit of RNA polymerase, was determined for Staphylococcus saccharolyticus, Staphylococcus lugdunensis, Staphylococcus caprae, and Staphylococcus intermedius, and partial sequences were obtained for an additional 27 Staphylococcus species. The complete rpoB sequences varied in length from 3,452 to 3,845 bp and had a 36.8 to 39.2% GC content. The partial sequences had 71.6 to 93.6% interspecies homology and exhibited a 0.08 to 0.8% intraspecific divergence. With a few exceptions, the phylogenetic relationships inferred from the partial rpoB sequences were in agreement with those previously derived from DNA-DNA hybridization studies and analyses of 16S ribosomal DNA gene sequences and partial HSP60 gene sequences. The staphylococcal rpoB sequence database we established enabled us to develop a molecular method for identifying Staphylococcus isolates by PCR followed by direct sequencing of the 751-bp amplicon. In blind tests, this method correctly identified 10 Staphylococcus isolates, and no positive results were obtained with 10 non-Staphylococcus gram-positive and gram-negative bacterial isolates. We propose partial sequencing of the rpoB gene as a new tool for the accurate identification of Staphylococcus isolates. PMID:11923353

  16. Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

    PubMed Central

    Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

    2012-01-01

    Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922

  17. EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

    PubMed

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-07-01

    EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  18. EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

    PubMed Central

    Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

    2003-01-01

    EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408

  19. Identification and mapping of paralogous genes on a known genomic DNA sequence.

    PubMed

    Bina, Minou

    2006-01-01

    The completion of whole genome sequencing projects offers the opportunity to examine the organization of genes and the discovery of evolutionarily related genes in a given species. For the beginners in the field, through a specific example, this chapter provides a step-by-step procedure for identifying paralogous genes, using the genome browser at UCSC (http://genome.ucsc.edu/). The example describes identification and mapping in the human genome, the paralogs of TCF12/HTF4. The example identifies TCF3 and TCF4 as paralogs of the TCF12/HTF4 gene. The example also identifies a related sequence, corresponding to a pseudogene, in one of the introns of the JAK2 gene. The procedure described should be applicable to the discovery and creation of maps of paralogous genes in the genomic DNA sequences that are available at the genome browser at UCSC. PMID:16888348

  20. Cloning, sequencing, and expression of bacteriophage BF23 late genes 24 and 25 encoding tail proteins.

    PubMed Central

    Nakayama, S; Kaneko, T; Ishimaru, H; Moriwaki, H; Mizobuchi, K

    1994-01-01

    Two bacteriophage BF23 late genes, genes 24 and 25, were isolated on a 7.4-kb PstI fragment from the phage DNA, and their nucleotide sequences were determined. Gene 24 encodes a minor tail protein with the expected M(r) of 34,309, and gene 25 located 4 bp upstream of gene 24 encodes a major tail protein with the expected M(r) of 50,329. When total cellular RNA isolated from either phage-infected cells or cells bearing the cloned genes was analyzed by the primer extension method using the primers specific to either gene 25 or gene 24, we identified a possible late gene promoter, designated P25, in the 5'-flanking region of gene 25. This promoter was similar in structure to Escherichia coli promoters for sigma 70. Studies of the translational gene 25- and gene 24-lacZ fusions in the cloned gene system revealed that the promoter P25 was responsible for the expression of both genes 25 and 24 even in the absence of the regulatory genes which were absolutely required for late gene expression in the normal phage-infected cells. These results indicate that the two genes constitute an operon under the control of P25 and that the regulatory gene products of BF23 do not participate directly in specifying the late gene promoter. PMID:7961500

  1. Nonrepresentative PCR amplification of variable gene sequences in clinical specimens containing dilute, complex mixtures of microorganisms.

    PubMed Central

    Wright, C J; Jerse, A E; Cohen, M S; Cannon, J G; Seifert, H S

    1994-01-01

    PCR amplification and DNA sequencing of the expression locus from Neisseria gonorrhoeae contained in urine sediments collected from experimentally infected human subjects produced two observations. First, different pilin sequences were obtained when separate aliquots of the same sample were amplified and sequenced. In contrast, the same pilin sequence was obtained when repeated amplifications were performed on individual colonies grown from the clinical samples. Second, mixed sequences (i.e., more than one nucleotide at variable positions in the pilin gene sequence) were observed in both the direct clinical isolates and individual cultures grown from the isolates. These results suggest that when clinical samples are directly examined by PCR amplification and sequencing, multiple amplifications may be required to detect sequence variants in the sample and minority variant sequences will not always be detected. PMID:7908674

  2. Structural gene and complete amino acid sequence of Pseudomonas aeruginosa IFO 3455 elastase.

    PubMed Central

    Fukushima, J; Yamamoto, S; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Okuda, K

    1989-01-01

    The DNA encoding the elastase of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, bacteria carrying the gene exhibited high levels of both elastase activity and elastase antigens. The amino acid sequence, deduced from the nucleotide sequence, revealed that the mature elastase consisted of 301 amino acids with a relative molecular mass of 32,926 daltons. The amino acid composition predicted from the DNA sequence was quite similar to the chemically determined composition of purified elastase reported previously. We also observed nucleotide sequence encoding a signal peptide and "pro" sequence consisting of 197 amino acids upstream from the mature elastase protein gene. The amino acid sequence analysis revealed that both the N-terminal sequence of the purified elastase and the N-terminal side sequences of the C-terminal tryptic peptide as well as the internal lysyl peptide fragment were completely identical to the deduced amino acid sequences. The pattern of identity of amino acid sequences was quite evident in the regions that include structurally and functionally important residues of Bacillus subtilis thermolysin. PMID:2493453

  3. Prokaryotic genes in eukaryotic genome sequences: when to infer horizontal gene transfer and when to suspect an actual microbe.

    PubMed

    Artamonova, Irena I; Lappi, Tanya; Zudina, Liudmila; Mushegian, Arcady R

    2015-07-01

    Assessment of phylogenetic positions of predicted gene and protein sequences is a routine step in any genome project, useful for validating the species' taxonomic position and for evaluating hypotheses about genome evolution and function. Several recent eukaryotic genome projects have reported multiple gene sequences that were much more similar to homologues in bacteria than to any eukaryotic sequence. In the spirit of the times, horizontal gene transfer from bacteria to eukaryotes has been invoked in some of these cases. Here, we show, using comparative sequence analysis, that some of those bacteria-like genes indeed appear likely to have been horizontally transferred from bacteria to eukaryotes. In other cases, however, the evidence strongly indicates that the eukaryotic DNA sequenced in the genome project contains a sample of non-integrated DNA from the actual bacteria, possibly providing a window into the host microbiome. Recent literature suggests also that common reagents, kits and laboratory equipment may be systematically contaminated with bacterial DNA, which appears to be sampled by metagenome projects non-specifically. We review several bioinformatic criteria that help to distinguish putative horizontal gene transfers from the admixture of genes from autonomously replicating bacteria in their hosts' genome databases or from the reagent contamination. PMID:25919787

  4. Complex repetitive arrangements of gene sequence in the candidate region of the spinal muscular atrophy gene in 5q13

    SciTech Connect

    Theodosiou, A.M.; Nesbit, A.M.; Daniels, R.J.; Campbell, L.; Francis, M.J.; Christodoulou, Z.; Morrison, K.E.; Davies, K.E.

    1994-12-01

    Childhood-onset proximal spinal muscular atrophy (SMA) is a heritable neurological disorder, which has been mapped by genetic linkage analysis to chromosome 5q13, in the interval between markers D5S435 and D5S557. Here, we present gene sequences that have been isolated from this interval, several of which show sequence homologies to exons of β-glucuronidase. These gene sequences are repeated several times across the candidate region and are also present on chromosome 5p. The arrangement of these repetitive gene motifs is polymorphic between individuals. The high degree of variability observed may have some influence on the expression of the genes in the region. Since SMA is not inherited as a classical autosomal recessive disease, novel genomic rearrangements arising from aberrant recombination events between the complex repeats may be associated with the phenotype observed.

  5. Utility of rpoB Gene Sequencing for Identification of Nontuberculous Mycobacteria in the Netherlands

    PubMed Central

    de Zwaan, Rina; van Ingen, Jakko

    2014-01-01

    In the Netherlands, clinical isolation of nontuberculous mycobacteria (NTM) has increased over the past decade. Proper identification of isolates is important, as NTM species differ strongly in clinical relevance. Most of the currently applied identification methods cannot distinguish between all different Mycobacterium species and complexes within species. rpoB gene sequencing exhibits a promising level of discrimination among rapidly and slowly growing mycobacteria, including the Mycobacterium avium complex. In this study, we prospectively compared rpoB gene sequencing with our routine algorithm of reverse line blot identification combined with partial 16S rRNA gene sequencing of 455 NTM isolates. rpoB gene sequencing identified 403 isolates to species level as 45 different known species and identified 44 isolates to complex level, and eight isolates remained unidentifiable to species level. In contrast, our reference reverse line blot assay with adjunctive 16S rRNA gene sequencing identified 390 isolates to species level (30 distinct species) and identified 56 isolates to complex level, and nine isolates remained unidentified. The higher discriminatory power of rpoB gene sequencing results largely from the distinction of separate species within complexes and subspecies. Also, Mycobacterium gordonae, Mycobacterium kansasii, and Mycobacterium interjectum were separated into multiple groupings with relatively low sequence similarity (98 to 94%), suggesting that these are complexes of closely related species. We conclude that rpoB gene sequencing is a more discriminative identification technique than the combination of reverse line blot and 16S rRNA gene sequencing and could introduce a major improvement in clinical care of NTM disease and the research on the epidemiology and clinical relevance of NTM. PMID:24808238

  6. Chromosomal localization and sequence variation of 5S rRNA gene in five Capsicum species.

    PubMed

    Park, Y K; Park, K C; Park, C H; Kim, N S

    2000-02-29

    Chromosomal localization and sequence analysis of the 5S rRNA gene were carried out in five Capsicum species. Fluorescence in situ hybridization revealed that the chromosomal location of the 5S rRNA gene was conserved in a single locus on a chromosome which was assigned to chromosome 1 by the synteny relationship with tomato. In sequence analysis, the repeating units of the 5S rRNA genes in the Capsicum species were variable in size from 278 bp to 300 bp. In a comparison of our sequences with those published by others for other Solanaceae plants, the coding region was highly conserved, but the spacer regions varied in size and sequence. T stretch regions, just after the end of the coding sequences, were more prominent in the Capsicum species than in two other plants. G·C-rich regions, which might have functions similar to those of the GC islands in the genes transcribed by RNA PolII, were observed after the T stretch region. Although we could not observe TATA-like sequences, an AT-rich segment at -27 to -18 was detected in the 5S rRNA genes of the Capsicum species. Species relationships among the Capsicum species were also studied by sequence comparison of the 5S rRNA genes. While C. chinense, C. frutescens, and C. annuum formed one lineage, C. baccatum was revealed to be an intermediate species between the former three species and C. pubescens. PMID:10774742

  7. Comparative genome sequencing of drosophila pseudoobscura: Chromosomal, gene and cis-element evolution

    SciTech Connect

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2004-04-01

    The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.

  8. Targeting DNA with triplex-forming oligonucleotides to modify gene sequence.

    PubMed

    Simon, Philippe; Cannata, Fabio; Concordet, Jean-Paul; Giovannangeli, Carine

    2008-08-01

    Molecules that interact with DNA in a sequence-specific manner are attractive tools for manipulating gene sequence and expression. For example, triplex-forming oligonucleotides (TFOs), which bind to oligopyrimidine·oligopurine sequences via Hoogsteen hydrogen bonds, have been used to inhibit gene expression at the DNA level as well as to induce targeted mutagenesis in model systems. Recent advances in using oligonucleotides and analogs to target DNA in a sequence-specific manner will be discussed. In particular, chemical modification of TFOs has been used to improve binding to chromosomal target sequences in living cells. Various oligonucleotide analogs have also been found to expand the range of sequences amenable to manipulation, including so-called "Zorro" locked nucleic acids (LNAs) and pseudo-complementary peptide nucleic acids (pcPNAs). Finally, we will examine the potential of TFOs for directing targeted gene sequence modification and propose that synthetic nucleases, based on conjugation of sequence-specific DNA ligands to DNA damaging molecules, are a promising alternative to protein-based endonucleases for targeted gene sequence modification. PMID:18460344

  9. Sequence variants in oxytocin pathway genes and preterm birth: a candidate gene association study

    PubMed Central

    2013-01-01

    Background Preterm birth (PTB) is a complex disorder associated with significant neonatal mortality and morbidity and long-term adverse health consequences. Multiple lines of evidence suggest that genetic factors play an important role in its etiology. This study was designed to identify genetic variation associated with PTB in oxytocin pathway genes whose role in parturition is well known. Methods To identify common genetic variants predisposing to PTB, we genotyped 16 single nucleotide polymorphisms (SNPs) in the oxytocin (OXT), oxytocin receptor (OXTR), and leucyl/cystinyl aminopeptidase (LNPEP) genes in 651 case infants from the U.S. and one or both of their parents. In addition, we examined the role of rare genetic variation in susceptibility to PTB by conducting direct sequence analysis of OXTR in 1394 cases and 1112 controls from the U.S., Argentina, Denmark, and Finland. This study was further extended to maternal triads (maternal grandparents-mother of a case infant, N=309). We also performed in vitro analysis of selected rare OXTR missense variants to evaluate their functional importance. Results Maternal genetic effect analysis of the SNP genotype data revealed four SNPs in LNPEP that show significant association with prematurity. In our case–control sequence analysis, we detected fourteen coding variants in exon 3 of OXTR, all but four of which were found in cases only. Of the fourteen variants, three were previously unreported novel rare variants. When the sequence data from the maternal triads were analyzed using the transmission disequilibrium test, two common missense SNPs (rs4686302 and rs237902) in OXTR showed suggestive association for three gestational age subgroups. In vitro functional assays showed a significant difference in ligand binding between wild-type and two mutant receptors. Conclusions Our study suggests an association between maternal common polymorphisms in LNPEP and susceptibility to PTB. Maternal OXTR missense SNPs rs4686302

  10. A Neurospora crassa ribosomal protein gene, homologous to yeast CRY1, contains sequences potentially coordinating its transcription with rRNA genes.

    PubMed Central

    Tyler, B M; Harrison, K

    1990-01-01

    We have isolated and sequenced a Neurospora crassa ribosomal protein gene (designated crp-2) strongly homologous to the rp59 gene (CRY1) of yeast and the S14 ribosomal protein gene of mammals. The inferred sequence of the crp-2 protein is more homologous (83%) to the mammalian S14 sequence than to the yeast rp59 sequence (69%). The gene has three intervening sequences (IVSs) two of which are offset 7 bp from the position of IVSs in the mammalian genes. None correspond to the position of the IVS in the yeast gene. Crp-2 was mapped by RFLP analysis to the right arm of linkage group III. The 5' region of the gene contains three copies of a sequence, the Ribo box, previously shown to be required for transcription of both 5S and 40S rRNA genes. We speculate that the Ribo box may coordinate ribosomal protein and rRNA gene transcription. Images PMID:1977135

  11. Attacin gene sequence variations in different ecoraces of tasar silkworm Antheraea mylitta

    PubMed Central

    Sudha, Rati; Murthy, Geetha N; Awasthi, Arvind K; Ponnuvel, Kangayam M

    2015-01-01

    Attacin gene exists as paralogous conversion and is being used for identification of strain variations in insects based on the sequence variation. Hence, a study was undertaken to analyze the sequence variation of the attacin gene isoforms in the tasar silkworm Antheraea mylitta, which exists in the form of different ecoraces depending upon the environment, food plant and location. Comparison of the previously reported attacin sequences with the DNA sequences of attacin A and B genes revealed six amino acid substitutions among the sequences of the ecoraces, which, however, did not affect the functional domain of Attacin. The generated dendrogram clearly indicated unique branches for each ecorace with two separate gene clusters for attacin A and B. The Sarihan ecorace formed a separate sub-group under both the gene clusters. The present study also revealed the presence of the Attacin_N Superfamily domain exclusively in Exon I, separated from the Attacin_C Superfamily domain that was present in Exon II and part of Exon III, a prominent character of the attacin gene. The phylogenetic reconstruction analysis of the attacin gene in A. mylitta supported the common evolutionary origin of attacin genes belonging to the Lepidopteran and Dipteran families, which formed two separate clusters. PMID:26664033

  12. Transcriptome sequencing of Hydrangea macrophylla to uncover genes related to reblooming and powdery mildew resistance

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Massively parallel pyrosequencing technology has been used extensively on agronomic crop and model plants. Transcriptome sequencing is a useful first step in functional genomic studies, microarray and gene expression studies, single nucleotide polymorphism (SNP) surveys, quantitative trait loci (QT...

  13. Matrix genes of measles virus and canine distemper virus: cloning, nucleotide sequences, and deduced amino acid sequences.

    PubMed Central

    Bellini, W J; Englund, G; Richardson, C D; Rozenblatt, S; Lazzarini, R A

    1986-01-01

    The nucleotide sequences encoding the matrix (M) proteins of measles virus (MV) and canine distemper virus (CDV) were determined from cDNA clones containing these genes in their entirety. In both cases, single open reading frames specifying basic proteins of 335 amino acid residues were predicted from the nucleotide sequences. Both viral messages were composed of approximately 1,450 nucleotides and contained 400 nucleotides of presumptive noncoding sequences at their respective 3' ends. MV and CDV M-protein-coding regions were 67% homologous at the nucleotide level and 76% homologous at the amino acid level. Only chance homology was observed in the 400-nucleotide trailer sequences. Comparisons of the M protein sequences of MV and CDV with the sequence reported for Sendai virus (B. M. Blumberg, K. Rose, M. G. Simona, L. Roux, C. Giorgi, and D. Kolakofsky, J. Virol. 52:656-663; Y. Hidaka, T. Kanda, K. Iwasaki, A. Nomoto, T. Shioda, and H. Shibuta, Nucleic Acids Res. 12:7965-7973) indicated the greatest homology among these M proteins in the carboxyterminal third of the molecule. Secondary-structure analyses of this shared region indicated a structurally conserved, hydrophobic sequence which possibly interacted with the lipid bilayer. Images PMID:3754588

  14. The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.

    PubMed

    Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-02-01

    The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago. PMID:15354359

  15. Trichinella pseudospiralis vs. T. spiralis thymidylate synthase gene structure and T. pseudospiralis thymidylate synthase retrogene sequence

    PubMed Central

    2014-01-01

    Background Thymidylate synthase is a housekeeping gene, designated ancient due to its role in DNA synthesis and ubiquitous phyletic distribution. The genomic sequences coding for thymidylate synthase were characterized in two species of the genus Trichinella, the encapsulating T. spiralis and the non-encapsulating T. pseudospiralis. Methods Based on the sequence of parasitic nematode Trichinella spiralis thymidylate synthase cDNA, PCR techniques were employed. Results Each of the respective gene structures encompassed 6 exons and 5 introns located in conserved sites. Comparison with the corresponding gene structures of other eukaryotic species revealed a lack of common introns that would be shared among selected fungi, nematodes, mammals and plants. The two deduced amino acid sequences were 96% identical. In addition to the thymidylate synthase gene, the intron-less retrocopy, i.e. a processed pseudogene, with sequence identical to the T. spiralis gene coding region, was found to be present within the T. pseudospiralis genome. This pseudogene, instead of the gene, was confirmed by RT-PCR to be expressed in the parasite muscle larvae. Conclusions Intron load, as well as distribution of exon and intron phases in thymidylate synthase genes from various sources, point against the theory of gene assembly by the primordial exon shuffling and support the theory of evolutionarily late intron insertion into spliceosomal genes. Thymidylate synthase pseudogene expressed in T. pseudospiralis muscle larvae is designated a retrogene. PMID:24716800

  16. Nucleotide sequence of the tcm1 gene (ribosomal protein L3) of Saccharomyces cerevisiae.

    PubMed Central

    Schultz, L D; Friesen, J D

    1983-01-01

    The yeast tcm1 gene, which codes for ribosomal protein L3, has been isolated by using recombinant DNA and genetic complementation. The DNA fragment carrying this gene has been subcloned and we have determined its DNA sequence. The 20 amino acid residues at the amino terminus as inferred from the nucleotide sequence agreed exactly with the amino acid sequence data. The amino acid composition of the encoded protein agreed with that determined for purified ribosomal protein L3. Codon usage in the tcm1 gene was strongly biased in the direction found for several other abundant Saccharomyces cerevisiae proteins. The tcm1 gene has no introns, which appears to be atypical of ribosomal protein structural genes. PMID:6305925

  17. Nonessential region of bacteriophage P4: DNA sequence, transcription, gene products, and functions.

    PubMed Central

    Ghisotti, D; Finkel, S; Halling, C; Dehò, G; Sironi, G; Calendar, R

    1990-01-01

    We sequenced the leftmost 2,640 base pairs of bacteriophage P4 DNA, thus completing the sequence of the 11,627-base-pair P4 genome. The newly sequenced region encodes three nonessential genes, which are called gop, beta, and cII (in order, from left to right). The gop gene product kills Escherichia coli when the beta protein is absent; the gop and beta genes are transcribed rightward from the same promoter. The cII gene is transcribed leftward to a rho-independent terminator. Mutation of this terminator creates a temperature-sensitive phenotype, presumably owing to a defect in expression of the beta gene. Images PMID:2403440

  18. Targeting of AID-mediated sequence diversification to immunoglobulin genes.

    PubMed

    Kothapalli, Naga Rama; Fugmann, Sebastian D

    2011-04-01

    Activation-induced cytidine deaminase (AID) is a key enzyme for antibody-mediated immune responses. Antibodies are encoded by the immunoglobulin genes and AID acts as a transcription-dependent DNA mutator on these genes to improve antibody affinity and effector functions. An emerging theme in the field is that many transcribed genes are potential targets of AID, presenting an obvious danger to genomic integrity. Thus there are mechanisms in place to ensure that mutagenic outcomes of AID activity are specifically restricted to the immunoglobulin loci. Cis-regulatory targeting elements mediate this effect and their mode of action is probably a combination of immunoglobulin gene-specific activation of AID and a perversion of faithful DNA repair towards error-prone outcomes. PMID:21295456

  19. Targeting of AID-mediated sequence diversification to immunoglobulin genes

    PubMed Central

    Kothapalli, Naga Rama; Fugmann, Sebastian D.

    2011-01-01

    Activation-induced cytidine deaminase (AID) is a key enzyme for antibody-mediated immune responses. Antibodies are encoded by the immunoglobulin genes and AID acts as a transcription-dependent DNA mutator on these genes to improve antibody affinity and effector functions. An emerging theme in the field is that many transcribed genes are potential targets of AID, presenting an obvious danger to genomic integrity. Thus there are mechanisms in place to ensure that mutagenic outcomes of AID activity are specifically restricted to the immunoglobulin loci. Cis-regulatory targeting elements mediate this effect and their mode of action is likely a combination of immunoglobulin gene-specific activation of AID and a perversion of faithful DNA repair towards error-prone outcomes. PMID:21295456

  20. Sequence and evolution of HLA-DR7- and -DRw53-associated β-chain genes

    SciTech Connect

    Young, J.A.T.; Wilkinson, D.; Bodmer, W.F.; Trowsdale, J.

    1987-07-01

    cDNA clones representing products of the DR7 and DRw53 β-chain genes were isolated from the human B-lymphoblastoid cell line MANN (DR7, DRw53, DQw2, DPw2). The DRw53β sequence was identical to a DRw53β sequence derived from cells with a DR4 haplotype. In contrast, the DR7β sequence was as unrelated to DR4β sequence as it was to other DRβ-related genes, except at the 3'-untranslated region. These results suggest that the DR7 and DR4 haplotypes may have been derived relatively recently from a common ancestral haplotype and that the DR4 and DR7 β-chain genes have undergone more rapid diversification in the β1 domains, most probably as a result of natural selection, than have the DRw53β-chain genes. Short tracts of sequence within the DR7 and DRw53 β1 domains were shared with other DRβ sequences, indicating that exchanges of genetic information between β1 domains of DRβ-related genes have played a part in their evolution. Serological analysis of mouse L-cell transfectants expressing surface HLA-DR7 molecules, confirmed by antibody binding and allelic sequence comparison, identified amino acid residues that may be critical to the binding of a monomorphic DR- and CP-specific monoclonal antibody.

  1. Sequence heterogeneity in the two 16S rRNA genes of Phormium yellow leaf phytoplasma.

    PubMed Central

    Liefting, L W; Andersen, M T; Beever, R E; Gardner, R C; Forster, R L

    1996-01-01

    Phormium yellow leaf (PYL) phytoplasma causes a lethal disease of the monocotyledon, New Zealand flax (Phormium tenax). The 16S rRNA genes of PYL phytoplasma were amplified from infected flax by PCR and cloned, and the nucleotide sequences were determined. DNA sequencing and Southern hybridization analysis of genomic DNA indicated the presence of two copies of the 16S rRNA gene. The two 16S rRNA genes exhibited sequence heterogeneity in 4 nucleotide positions and could be distinguished by the restriction enzymes BpmI and BsrI. This is the first record in which sequence heterogeneity in the 16S rRNA genes of a phytoplasma has been determined by sequence analysis. A phylogenetic tree based on 16S rRNA gene sequences showed that PYL phytoplasma is most closely related to the stolbur and German grapevine yellows phytoplasmas, which form the stolbur subgroup of the aster yellows group. This phylogenetic position of PYL phytoplasma was supported by 16S/23S spacer region sequence data. PMID:8795200

  2. Cloning and sequence analysis of chitin synthase gene fragments of Demodex mites

    PubMed Central

    Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying

    2012-01-01

    To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi’an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi’an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%–99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, consistent with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites. PMID:23024043

  3. Phylogeny and Identification of Enterococci by atpA Gene Sequence Analysis

    PubMed Central

    Naser, S.; Thompson, F. L.; Hoste, B.; Gevers, D.; Vandemeulebroecke, K.; Cleenwerck, I.; Thompson, C. C.; Vancanneyt, M.; Swings, J.

    2005-01-01

    The relatedness among 91 Enterococcus strains representing all validly described species was investigated by comparing a 1,102-bp fragment of atpA, the gene encoding the alpha subunit of ATP synthase. The relationships observed were in agreement with the phylogeny inferred from 16S rRNA gene sequence analysis. However, atpA gene sequences were much more discriminatory than 16S rRNA for species differentiation. All species were differentiated on the basis of atpA sequences with, at a maximum, 92% similarity. Six members of the Enterococcus faecium species group (E. faecium, E. hirae, E. durans, E. villorum, E. mundtii, and E. ratti) showed >99% 16S rRNA gene sequence similarity, but the highest value of atpA gene sequence similarity was only 89.9%. The intraspecies atpA sequence similarities for all species except E. faecium strains varied from 98.6 to 100%; the E. faecium strains had a lower atpA sequence similarity of 96.3%. Our data clearly show that atpA provides an alternative tool for the phylogenetic study and identification of enterococci. PMID:15872246
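
    The similarity percentages quoted in this abstract are pairwise comparisons of aligned gene fragments. The abstract does not say exactly how similarity was calculated, so the following is only a minimal sketch of one common definition (percent identity over aligned, ungapped columns). The function name, the gap convention and the toy sequences are assumptions made for illustration and are not taken from the paper.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped
    (an assumption; other conventions count gaps as mismatches).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real atpA sequences).
seq_a = "ATGGCTAAAG"
seq_b = "ATGGCAAAAG"
print(f"{percent_identity(seq_a, seq_b):.1f}% identical")
```

    Applying such a function to every pair of strains gives a similarity matrix, from which within-species and between-species ranges like those reported above can be read off.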

  4. Sequence of the Ampullariella sp. strain 3876 gene coding for xylose isomerase.

    PubMed Central

    Saari, G C; Kumar, A A; Kawasaki, G H; Insley, M Y; O'Hara, P J

    1987-01-01

    The nucleotide sequence of the gene coding for xylose isomerase from Ampullariella sp. strain 3876, a gram-positive bacterium, has been determined. A clone of a fragment of strain 3876 DNA coding for a xylose isomerase activity was identified by its ability to complement a xylose isomerase-defective Escherichia coli strain. One such complementation positive fragment, 2,922 nucleotides in length, was sequenced in its entirety. There are two open reading frames 1,182 and 1,242 nucleotides in length, on opposite strands of this fragment, each of which could code for a protein the expected size of xylose isomerase. The 1,182-nucleotide open reading frame was identified as the coding sequence for the protein from the sequence analysis of the amino-terminal region and selected internal peptides. The gene initiates with GTG and has a high guanine and cytosine content (70%) and an exceptionally strong preference (97%) for guanine or cytosine in the third position of the codons. The gene codes for a 43,210-dalton polypeptide composed of 393 amino acids. The xylose isomerase from Ampullariella sp. strain 3876 is similar in size to other bacterial xylose isomerases and has limited amino acid sequence homology to the available sequences from E. coli, Bacillus subtilis, and Streptomyces violaceus-ruber. In all cases yet studied, the bacterial gene for xylulose kinase is downstream from the gene for xylose isomerase. We present evidence suggesting that in Ampullariella sp. strain 3876 these genes are similarly arranged. PMID:3027039
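
    The two composition figures in this abstract (overall guanine plus cytosine content, and the preference for G or C at the third codon position) can be computed directly from a coding sequence. A minimal sketch follows, assuming the sequence is supplied in frame as a plain string. The function names and the toy sequence are illustrative assumptions, not the authors' method or data.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Fraction of codons whose third base is G or C.

    Assumes `cds` starts in frame; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    third_bases = cds[2:usable:3]
    if not third_bases:
        raise ValueError("sequence shorter than one codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy in-frame sequence (hypothetical, not the Ampullariella gene).
toy_cds = "GTGGCCAAGTTC"
print(f"GC content:  {gc_content(toy_cds):.2f}")
print(f"GC3 content: {gc3_content(toy_cds):.2f}")
```

    A GC3 value far above the overall GC content, as reported for this gene, is a typical signature of codon bias in high-GC genomes.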

  5. Analysis of a marine picoplankton community by 16S rRNA gene cloning and sequencing.

    PubMed Central

    Schmidt, T M; DeLong, E F; Pace, N R

    1991-01-01

    The phylogenetic diversity of an oligotrophic marine picoplankton community was examined by analyzing the sequences of cloned ribosomal genes. This strategy does not rely on cultivation of the resident microorganisms. Bulk genomic DNA was isolated from picoplankton collected in the north central Pacific Ocean by tangential flow filtration. The mixed-population DNA was fragmented, size fractionated, and cloned into bacteriophage lambda. Thirty-eight clones containing 16S rRNA genes were identified in a screen of 3.2 x 10(4) recombinant phage, and portions of the rRNA gene were amplified by polymerase chain reaction and sequenced. The resulting sequences were used to establish the identities of the picoplankton by comparison with an established data base of rRNA sequences. Fifteen unique eubacterial sequences were obtained, including four from cyanobacteria and eleven from proteobacteria. A single eucaryote related to dinoflagellates was identified; no archaebacterial sequences were detected. The cyanobacterial sequences are all closely related to sequences from cultivated marine Synechococcus strains and with cyanobacterial sequences obtained from the Atlantic Ocean (Sargasso Sea). Several sequences were related to common marine isolates of the gamma subdivision of proteobacteria. In addition to sequences closely related to those of described bacteria, sequences were obtained from two phylogenetic groups of organisms that are not closely related to any known rRNA sequences from cultivated organisms. Both of these novel phylogenetic clusters are proteobacteria, one group within the alpha subdivision and the other distinct from known proteobacterial subdivisions. The rRNA sequences of the alpha-related group are nearly identical to those of some Sargasso Sea picoplankton, suggesting a global distribution of these organisms. Images PMID:2066334

  6. Organization and sequence of four flagellin-encoding genes of Edwardsiella ictaluri

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Edwardsiella ictaluri, the cause of enteric septicemia in channel catfish (Ictalurus punctatus), is motile by means of peritrichous flagella. We determined the complete flagellin gene sequences and their organization in E. ictaluri by sequencing genomic segments selected from a lambda-ZAP phage gen...

  7. Sequencing of the β-tubulin genes in the ascarid nematodes Parascaris equorum and Ascaridia galli.

    PubMed

    Tydén, E; Engström, A; Morrison, D A; Höglund, J

    2013-07-01

    Benzimidazoles (BZ) are used to control infections of the equine roundworm Parascaris equorum and the poultry roundworm Ascaridia galli. There are still no reports of anthelmintic resistance (AR) to BZ in these two nematodes, although AR to BZ is widespread in several other veterinary parasites. Several single nucleotide polymorphisms (SNP) in the β-tubulin genes have been associated with BZ-resistance. In the present study we have sequenced β-tubulin genes: isotype 1 and isotype 2 of P. equorum and isotype 1 of A. galli. Phylogenetic analysis of all currently known isotypes showed that the Nematoda has more diversity among the β-tubulin genes than the Vertebrata. In addition, this diversity is arranged in a more complex pattern of isotypes. Phylogenetically, the A. galli sequence and one of the P. equorum sequences clustered with the known Ascaridoidea isotype 1 sequences, while the other P. equorum sequence did not cluster with any other β-tubulin sequences. We therefore conclude that this is a previously unreported isotype 2. The β-tubulin gene sequences were used to develop a PCR for genotyping SNP in codons 167, 198 and 200. No SNP was observed despite sequencing 95 and 100 individual adult worms of P. equorum and A. galli, respectively. Given the diversity of isotype patterns among nematodes, it is likely that associations of genetic data with BZ-resistance cannot be generalised from one taxonomic group to another. PMID:23685342

  8. Streptococcus suis Serotypes Characterized by Analysis of Chaperonin 60 Gene Sequences

    PubMed Central

    Brousseau, Ronald; Hill, Janet E.; Préfontaine, Gabrielle; Goh, Swee-Han; Harel, Josée; Hemmingsen, Sean M.

    2001-01-01

    Streptococcus suis is an important pathogen of swine which occasionally infects humans as well. There are 35 serotypes known for this organism, and it would be desirable to develop rapid methods to identify and differentiate the strains of this species. To that effect, partial chaperonin 60 gene sequences were determined for the 35 serotype reference strains of S. suis. Analysis of a pairwise distance matrix showed that the distances ranged from 0 to 0.275 when values were calculated by the maximum-likelihood method. For five of the strains the distances from serotype 1 were greater than 0.1, and for two of these strains the distances were more than 0.25, suggesting that they belong to a different species. Most of the nucleotide differences were silent; alignment of protein sequences showed that there were only 11 distinct sequences for the 35 strains under study. The chaperonin 60 gene phylogenetic tree was similar to the previously published tree based on 16S rRNA sequences, and it was also observed that strains with identical chaperonin 60 gene sequences tended to have identical 16S rRNA sequences. The chaperonin 60 gene sequences provided a higher level of discrimination between serotypes than the 16S rRNA sequences provided and could form the basis for a diagnostic protocol. PMID:11571190

  9. SEQUENCING OF CUCUMBER (CUCUMIS SATIVUS L.) CHLOROPLAST GENOMES IDENTIFIES PUTATIVE CANDIDATE GENES FOR CHILLING TOLERANCE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Chilling injury in cucumber (Cucumis sativus L.) is conditioned by maternal factors and the sequencing of its chloroplast (cp) genome could lead to the identification of economically important candidate genes. Complete sequencing of cucumber cpDNA was facilitated by the development of 414 consensus...

  10. Sequence divergence and chromosomal rearrangements during the evolution of human pseudoautosomal genes and their mouse homologs

    SciTech Connect

    Ellison, J.; Li, X.; Francke, U.

    1994-09-01

    The pseudoautosomal region (PAR) is an area of sequence identity between the X and Y chromosomes and is important for mediating X-Y pairing during male meiosis. Of the seven genes assigned to the human PAR, none of the mouse homologs have been isolated by a cross-hybridization strategy. Two of these homologs, Csfgmra and Il3ra, have been isolated using a functional assay for the gene products. These genes are quite different in sequence from their human homologs, showing only 60-70% sequence similarity. The Csfgmra gene has been found to further differ from its human homolog in being located not on the sex chromosomes, but on a mouse autosome (chromosome 19). Using a mouse-hamster somatic cell hybrid mapping panel, we have mapped the Il3ra gene to yet another mouse autosome, chromosome 14. Attempts to clone the mouse homolog of the ANT3 locus resulted in the isolation of two related genes, Ant1 and Ant2, but failed to yield the Ant3 gene. Southern blot analysis of the ANT/Ant genes showed the Ant1 and Ant2 sequences to be well-conserved among all of a dozen mammals tested. In contrast, the ANT3 gene only showed hybridization to non-rodent mammals, suggesting it is either greatly divergent or has been deleted in the rodent lineage. Similar experiments with other human pseudoautosomal probes likewise showed a lack of hybridization to rodent sequences. The results show a definite trend of extensive divergence of pseudoautosomal sequences in addition to chromosomal rearrangements involving X;autosome translocations and perhaps gene deletions. Such observations have interesting implications regarding the evolution of this important region of the sex chromosomes.

  11. Sequence Requirements for Myosin Gene Expression and Regulation in Caenorhabditis elegans

    PubMed Central

    Okkema, P. G.; Harrison, S. W.; Plunger, V.; Aryana, A.; Fire, A.

    1993-01-01

    Four Caenorhabditis elegans genes encode muscle-type specific myosin heavy chain isoforms: myo-1 and myo-2 are expressed in the pharyngeal muscles; unc-54 and myo-3 are expressed in body wall muscles. We have used transformation-rescue and lacZ fusion assays to determine sequence requirements for regulated myosin gene expression during development. Multiple tissue-specific activation elements are present for all four genes. For each of the four genes, sequences upstream of the coding region are tissue-specific promoters, as shown by their ability to drive expression of a reporter gene (lacZ) in the appropriate muscle type. Each gene contains at least one additional tissue-specific regulatory element, as defined by the ability to enhance expression of a heterologous promoter in the appropriate muscle type. In rescue experiments with unc-54, two further requirements apparently independent of tissue specificity were found: sequences within the 3' non-coding region are essential for activity while an intron near the 5' end augments expression levels. The general intron stimulation is apparently independent of intron sequence, indicating a mechanistic effect of splicing. To further characterize the myosin gene promoters and to examine the types of enhancer sequences in the genome, we have initiated a screen of C. elegans genomic DNA for fragments capable of enhancing the myo-2 promoter. The properties of enhancers recovered from this screen suggest that the promoter is limited to muscle cells in its ability to respond to enhancers. PMID:8244003

  12. A 5.8S nuclear ribosomal RNA gene sequence database: applications to ecology and evolution

    NASA Technical Reports Server (NTRS)

    Cullings, K. W.; Vogler, D. R.

    1998-01-01

    We compiled a 5.8S nuclear ribosomal gene sequence database for animals, plants, and fungi using both newly generated and GenBank sequences. We demonstrate the utility of this database as an internal check to determine whether the target organism and not a contaminant has been sequenced, as a diagnostic tool for ecologists and evolutionary biologists to determine the placement of asexual fungi within larger taxonomic groups, and as a tool to help identify fungi that form ectomycorrhizae.

  13. Sources of variation in ancestral sequence reconstruction for HIV-1 envelope genes

    PubMed Central

    Ross, Howard A.; Nickle, David C.; Liu, Yi; Heath, Laura; Jensen, Mark A.; Rodrigo, Allen G.; Mullins, James I.

    2007-01-01

    We characterized the variation in the reconstructed ancestor of 118 HIV-1 envelope gene sequences arising from the methods used for (a) estimating and (b) rooting the phylogenetic tree, and (c) reconstructing the ancestor on that tree, from (d) the sequence format, and from (e) the number of input sequences. The method of rooting the tree was responsible for most of the sequence variation both among the reconstructed ancestral sequences and between the ancestral and observed sequences. Variation in predicted 3-D structural properties of the ancestors mirrored their sequence variation. The observed sequence consensus and ancestral sequences from center-rooted trees were most similar in all predicted attributes. Only for the predicted number of N-glycosylation sites was there a difference between MP and ML methods of reconstruction. Taxon sampling effects were observed only for outgroup-rooted trees, not center-rooted, reflecting the occurrence of several divergent basal sequences. Thus, for sequences exhibiting a radial phylogenetic tree, as does HIV-1, most of the variation in the estimated ancestor arises from the method of rooting the phylogenetic tree. Those investigating the ancestors of genes exhibiting such a radial tree should pay particular attention to alternate rooting methods in order to obtain a representative sample of ancestors. PMID:19455202

  14. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity.

    PubMed

    Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B

    2015-09-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, nc
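    The idea of comparing observed with predicted levels of standing variation can be illustrated with a small residual-based score. This is only a minimal sketch under stated assumptions, not the authors' ncRVIS procedure: the function name, the simple least-squares fit, the standardization by the residual standard deviation, and the toy counts are all hypothetical and chosen for illustration.

```python
from statistics import mean, pstdev

def residual_scores(total_variants, common_variants):
    """Standardized residuals from a least-squares fit of common variant
    counts on total variant counts (one value pair per gene).

    Hypothetical illustration: a negative score means a gene carries
    fewer common variants than the fit predicts from its total count.
    """
    mx, my = mean(total_variants), mean(common_variants)
    sxx = sum((x - mx) ** 2 for x in total_variants)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(total_variants, common_variants))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Observed minus predicted common-variant count for each gene.
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(total_variants, common_variants)]
    sd = pstdev(residuals)
    return [r / sd for r in residuals]

# Toy counts for four hypothetical genes (not data from the study).
total = [10, 20, 30, 40]
common = [5, 10, 9, 20]
scores = residual_scores(total, common)
print([round(s, 2) for s in scores])
```

    In this toy input the third gene has the lowest score because it carries fewer common variants than its total variant count would predict, which is the pattern a residual-based intolerance score is meant to flag.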

  15. Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

    SciTech Connect

    Fields, C.A.

    1996-06-01

    The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human, and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.

  16. Sequencing and mapping hemoglobin gene clusters in the Australian model dasyurid marsupial Sminthopsis macroura

    SciTech Connect

    De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.

    2004-07-26

    Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.

  17. RNA Sequencing Revealed Numerous Polyketide Synthase Genes in the Harmful Dinoflagellate Karenia mikimotoi

    PubMed Central

    Kimura, Kei; Okuda, Shujiro; Nakayama, Kei; Shikata, Tomoyuki; Takahashi, Fumio; Yamaguchi, Haruo; Skamoto, Setsuko; Yamaguchi, Mineo; Tomaru, Yuji

    2015-01-01

    The dinoflagellate Karenia mikimotoi forms blooms in the coastal waters of temperate regions and occasionally causes massive fish and invertebrate mortality. This study aimed to elucidate the toxic effect of K. mikimotoi on marine organisms by using the genomics approach; RNA-sequence libraries were constructed, and data were analyzed to identify toxin-related genes. Next-generation sequencing produced 153,406 transcript contigs from the axenic culture of K. mikimotoi. BLASTX analysis against all assembled contigs revealed that 208 contigs were polyketide synthase (PKS) sequences. Thus, K. mikimotoi was thought to have several genes encoding PKS metabolites and to likely produce toxin-like polyketide molecules. Of all the sequences, approximately 30 encoded eight PKS genes, which were remarkably similar to those of Karenia brevis. Our phylogenetic analyses showed that these genes belonged to a new group of PKS type-I genes. Phylogenetic and active domain analyses showed that the amino acid sequence of four among eight Karenia PKS genes was not similar to any of the reported PKS genes. These PKS genes might possibly be associated with the synthesis of polyketide toxins produced by Karenia species. Further, a homology search revealed 10 contigs that were similar to a toxin gene responsible for the synthesis of saxitoxin (sxtA) in the toxic dinoflagellate Alexandrium fundyense. These contigs encoded A1–A3 domains of sxtA genes. Thus, this study identified some transcripts in K. mikimotoi that might be associated with several putative toxin-related genes. The findings of this study might help understand the mechanism of toxicity of K. mikimotoi and other dinoflagellates. PMID:26561394

  18. Nucleotide sequence of a gene from Phanerochaete chrysosporium that shows homology to the facA gene of Aspergillus nidulans.

    PubMed

    Birch, P R; Sims, P F; Broda, P

    1992-01-01

    Heterologous hybridisation was used to isolate a genomic DNA sequence from Phanerochaete chrysosporium using the facA (acetyl CoA synthetase) gene from Aspergillus nidulans as a probe. The cloned sequence hybridises to a 2.2 kb transcript in poly(A)+ RNA prepared from mycelium grown on acetate as the sole carbon source. Comparison of the DNA sequence obtained with those of the A. nidulans facA and N. crassa acu5 genes reveals an ORF that appears to be interrupted by five typical fungal introns. Two possible candidates for the translation initiation codon were observed. Homology with the facA and acu5 genes is revealed after the second ATG codon. PMID:1352996

  19. Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

    PubMed Central

    Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

    2003-01-01

    Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to the dbEST division of GenBank for Toxoplasma gondii, Plasmodium falciparum, Sarcocystis neurona, Eimeria tenella, and Neospora caninum.] PMID:12618375

  20. Cloning, mapping, and sequencing of plasmid R100 traM and finP genes.

    PubMed Central

    Fee, B E; Dempsey, W B

    1986-01-01

    The fertility control gene finP, the transfer gene traM, and the transfer origin, oriT, of plasmid R100 were isolated on a single 1.2-kilobase EcoRV fragment and were then subcloned as HaeIII fragments. The sequence of the 754-base-pair finP-containing fragment is reported here. In addition to the finP gene, the sequence includes all but two bases of the R100 traM open reading frame and apparently all of the leader mRNA sequence and amino end of the traJ gene of R100. The sequence contains two open reading frames which encode small proteins on the opposite strand from the traM and traJ genes. It also shows two sets of inverted repeats that have the characteristics of transcription terminators. One set is positioned as if it was the traM terminator, and the other set, which is downstream from the first, sits in the middle of the leader mRNA sequence for traJ. On the bottom strand, this inverted repeat has the structure of a rho-independent terminator. Other less-stable inverted repeats overlap this second terminator in the same way as is seen in attenuation sequences, and the two separate small open reading frames on the bottom strand also totally overlap the stem of the rho-independent terminator, suggesting that their translation would cause shifting of termination to the bottom strand homolog of the putative traM terminator. The finP gene product was not identified, but the gene was mapped to the sequence which contains the traJ gene. It either overlaps traJ or is antisense to it. PMID:3522549

  1. GIPS: A Software Guide to Sequencing-Based Direct Gene Cloning in Forward Genetics Studies.

    PubMed

    Hu, Han; Wang, Weitao; Zhu, Zhongxu; Zhu, Jianhua; Tan, Deyong; Zhou, Zhipeng; Mao, Chuanzao; Chen, Xin

    2016-04-01

    The Gene Identification via Phenotype Sequencing (GIPS) software considers a range of experimental and analysis choices in sequencing-based forward genetics studies within an integrated probabilistic framework, which enables direct gene cloning from the sequencing of several unrelated mutants of the same phenotype without the need to create segregation populations. GIPS estimates four measurements to help optimize an analysis procedure as follows: (1) the chance of reporting the true phenotype-associated gene; (2) the expected number of random genes that may be reported; (3) the significance of each candidate gene's association with the phenotype; and (4) the significance of violating the Mendelian assumption if no gene is reported or if all candidate genes have failed validation. The usage of GIPS is illustrated with the identification of a rice (Oryza sativa) gene that epistatically suppresses the phenotype of the phosphate2 mutant from sequencing three unrelated ethyl methanesulfonate mutants. GIPS is available at https://github.com/synergy-zju/gips/wiki with the user manual and an analysis example. PMID:26842621

  2. Bacterial metabarcoding by 16S rRNA gene ion torrent amplicon sequencing.

    PubMed

    Fantini, Elio; Gianese, Giulio; Giuliano, Giovanni; Fiore, Alessia

    2015-01-01

    Ion Torrent is a next generation sequencing technology based on the detection of hydrogen ions produced during DNA chain elongation; this technology allows analyzing and characterizing genomes, genes, and species. Here, we describe an Ion Torrent procedure applied to the metagenomic analysis of 16S rRNA gene amplicons to study the bacterial diversity in food and environmental samples. PMID:25343859

  3. Draft Genome Sequence and Gene Annotation of the Uropathogenic Bacterium Proteus mirabilis Pr2921

    PubMed Central

    Giorello, F. M.; Romero, V.; Farias, J.; Scavone, P.; Umpiérrez, A.; Zunino, P.

    2016-01-01

    Here, we report the genome sequence of Proteus mirabilis Pr2921, a uropathogenic bacterium that can cause severe complicated urinary tract infections. After gene annotation, we identified two additional copies of ucaA, one of the most studied fimbrial protein genes, and other fimbriae-related proteins that are not present in P. mirabilis HI4320. PMID:27340058

  4. DETECTION OF EXOGENOUS GENE SEQUENCES IN DISSOLVED DNA FROM AQUATIC ENVIRONMENTS

    EPA Science Inventory

    A method for the concentration and detection of gene sequences in the dissolved DNA from freshwater and marine environments has been developed. The limit of detection in the dot blot format was 167 fg/ml (100 ml sample) for exogenous herpes simplex thymidine kinase (TK) gene that ...

  5. Patterns of gene expression in microarrays and expressed sequence tags from normal and cataractous lenses.

    PubMed

    Sousounis, Konstantinos; Tsonis, Panagiotis A

    2012-01-01

    In this contribution, we have examined the patterns of gene expression in normal and cataractous lenses as presented in five different papers using microarrays and expressed sequence tags. The purpose was to evaluate unique and common patterns of gene expression during development, aging and cataracts. PMID:23244575

  6. Sequence and diversity of rabbit T-cell receptor gamma chain genes

    SciTech Connect

    Isono, T.; Kim, C.J.; Seto, A.

    1995-03-01

    The nucleotide sequences of one constant (C), six variable (V), and two joining (J) gene segments coding for the rabbit T-cell receptor gamma chain (Tcrg) were determined by directly sequencing fragments amplified by the cassette-ligation mediated polymerase chain reaction. The Tcrg-C gene segment did not encode a cysteine residue for connection to the Tcr delta chain in the connecting region, and two variant forms of the Tcrg-C gene segment were generated by alternative splicing, like the human Tcrg-C2 gene. Five of six rabbit Tcrg-V gene segments belonged to the same family and displayed similarity to five productive human Tcrg-V1 family genes as well as the mouse Tcrg-V5 gene. The remaining rabbit Tcrg-V gene segment displayed similarity to the human Tcrg-V3 gene. Both rabbit Tcrg-J gene segments displayed similarity to the human Tcrg-J2.1 and 2.3, respectively. These findings suggested that the genomic organization of rabbit Tcrg genes is more similar to that of human than of mouse Tcrg genes. 18 refs., 4 figs., 1 tab.

  7. Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders.

    PubMed

    O'Roak, Brian J; Vives, Laura; Fu, Wenqing; Egertson, Jarrett D; Stanaway, Ian B; Phelps, Ian G; Carvill, Gemma; Kumar, Akash; Lee, Choli; Ankenman, Katy; Munson, Jeff; Hiatt, Joseph B; Turner, Emily H; Levy, Roie; O'Day, Diana R; Krumm, Niklas; Coe, Bradley P; Martin, Beth K; Borenstein, Elhanan; Nickerson, Deborah A; Mefford, Heather C; Doherty, Dan; Akey, Joshua M; Bernier, Raphael; Eichler, Evan E; Shendure, Jay

    2012-12-21

    Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes (CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1) may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly and DYRK1A-microcephaly) and replicate the importance of a β-catenin-chromatin-remodeling network to ASD etiology. PMID:23160955

  8. Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation

    SciTech Connect

    Burian, D.; Clifton, S.W.; Crabtree, J.

    1995-05-01

    The complete human BCR gene (152,141 nt) on chromosome 22 and greater than 80% of the human ABL gene (179,512 nt) on chromosome 9 have been sequenced from mapped cosmid and plasmid clones via a shotgun strategy. Because these two chromosomes are translocated with breakpoints within the BCR and ABL genes in Philadelphia chromosome-positive leukemias, knowledge of these sequences also might provide insight into the validity of various theories of chromosomal rearrangements. Comparison of these genes with their cDNA sequences reveals the positions of 23 BCR exons and putative alternative BCR first and second exons, as well as the common ABL exons 2-11, respectively. Additionally, these regions include the alternative ABL first exons 1b and 1a, a new gene 5' to the first ABL exon, and an open reading frame with homology to an EST within the BCR fourth intron. Further analysis reveals an Alu homology of 38.83 and 39.35% for the BCR and ABL genes, respectively, with other repeat elements present to a lesser extent. Four new Philadelphia chromosome translocation breakpoints from chronic myelogenous leukemia patients also were sequenced, and the positions of these and several other previously sequenced breakpoints now have been mapped precisely, although no consistent breakpoint features immediately were apparent. Comparative analysis of genomic sequences encompassing the murine homologues to the human ABL exons 1b and 1a, as well as regions encompassing the ABL exons 2 and 3, reveals that although there is a high degree of homology in their corresponding exons and promoter regions, these two vertebrate species show a striking lack of homology outside these regions. 122 refs., 5 figs., 4 tabs.

  9. Sequence of the Proteus mirabilis urease accessory gene ureG.

    PubMed

    Sriwanthana, B; Island, M D; Mobley, H L

    1993-07-15

    We report the sequence of ureG, an accessory gene that is a part of the ure gene cluster of uropathogenic Proteus mirabilis and required for full enzymatic activity of urease. The 615-bp open reading frame predicts a M(r) 22,374 polypeptide, which contains a consensus amino acid (aa) sequence for ATP-binding. The polypeptide shares sequence homology with UreG of Escherichia coli (93% of identical aa), Klebsiella aerogenes (59%) and Helicobacter pylori (59%). PMID:8335248
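    The figures quoted above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes the 615-bp open reading frame includes the stop codon (the abstract does not say), and the function name is hypothetical.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of orf_bp base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # Assumption: the last codon is a stop codon and encodes no residue.
    return codons - 1 if includes_stop else codons

ORF_BP = 615          # open reading frame length from the abstract
PREDICTED_MR = 22374  # predicted M(r) from the abstract

n_res = residues_from_orf(ORF_BP)
mean_residue_mass = PREDICTED_MR / n_res
print(n_res, round(mean_residue_mass, 1))  # prints: 204 109.7
```

    Under that assumption the ORF encodes 204 residues, and the implied mean residue mass of about 110 Da is in the range expected for a typical protein, so the reported ORF length and M(r) are mutually consistent.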

  10. Characterization of Black and Dichothrix Cyanobacteria Based on the 16S Ribosomal RNA Gene Sequence

    NASA Technical Reports Server (NTRS)

    Ortega, Maya

    2010-01-01

    My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and an as-yet-unidentified black cyanobacterium, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacterium. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used for bacterial identification of different species based on the sequence of their 16S rRNA gene. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.

  11. Exome sequencing identifies NBEAL2 as the causative gene for Gray Platelet Syndrome

    PubMed Central

    Albers, Cornelis A; Cvejic, Ana; Favier, Rémi; Bouwmans, Evelien E; Alessi, Marie-Christine; Bertone, Paul; Jordan, Gregory; Kettleborough, Ross NW; Kiddle, Graham; Kostadima, Myrto; Read, Randy J; Sipos, Botond; Sivapalaratnam, Suthesh; Smethurst, Peter A; Stephens, Jonathan; Voss, Katrin; Nurden, Alan; Rendon, Augusto; Nurden, Paquita; Ouwehand, Willem H

    2012-01-01

    Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder characterized by a mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated cases and identified as the causative gene NBEAL2, a gene with previously unknown function but a member of a gene family involved in granule development. Silencing of nbeal2 in zebrafish abrogated thrombocyte formation. PMID:21765411

  12. Mosaic gene conversion after a tandem duplication of mtDNA sequence in Diomedeidae (albatrosses).

    PubMed

    Eda, Masaki; Kuro-o, Masaki; Higuchi, Hiroyoshi; Hasegawa, Hiroshi; Koike, Hiroko

    2010-04-01

    Although the tandem duplication of mitochondrial (mt) sequences, especially those of the control region (CR), has been detected in metazoan species, few studies have focused on the features of the duplicated sequence itself, such as the gene conversion rate, distribution patterns of the variation, and relative rates of evolution between the copies. To investigate the features of duplicated mt sequences, we partially sequenced the mt genome of 16 Phoebastria albatrosses belonging to three species (P. albatrus, P. nigripes, and P. immutabilis). More than 2,300 base pairs of tandemly-duplicated sequence were shared by all three species. The observed gene arrangement was shared in the three Phoebastria albatrosses and suggests that the duplication event occurred in the common ancestor of the three species. Most of the copies in each individual were identical or nearly identical, and were maintained through frequent gene conversions. By contrast, portions of CR domains I and III had different phylogenetic signals, suggesting that gene conversion had not occurred in those sections after the speciation of the three species. Several lines of data, including the heterogeneity of the rate of molecular evolution, nucleotide differences, and putative secondary structures, suggest that the two sequences in CR domain I are maintained through selection; however, additional studies into the mechanisms of gene conversion and mtDNA synthesis are required to confirm this hypothesis. PMID:20558899

  13. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  14. GeneSV – an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences

    PubMed Central

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W. C.; Cardosa, Jane; Weaver, Scott C.; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  15. Cloning, sequence, and expression of a lipase gene from Pseudomonas cepacia: lipase production in heterologous hosts requires two Pseudomonas genes.

    PubMed Central

    Jørgensen, S; Skov, K W; Diderichsen, B

    1991-01-01

    The lipA gene encoding an extracellular lipase from Pseudomonas cepacia was cloned and sequenced. Downstream from the lipase gene an open reading frame was identified, and the corresponding gene was named limA. lipA was well expressed only in the presence of limA. limA exerts its effect both in cis and in trans and therefore produces a diffusible gene product, presumably a protein of 344 amino acids. Replacement of the lipA expression signals (promoter, ribosome-binding site, and signal peptide-coding sequences) by heterologous signals from gram-positive bacteria still resulted in limA-dependent lipA expression in Escherichia coli, Bacillus subtilis, and Streptomyces lividans. PMID:1987151

  16. Mapping of aldose reductase gene sequences to human chromosomes 1, 3, 7, 9, 11, and 13

    SciTech Connect

    Bateman, J.B.; Kojis, T. (UCLA School of Medicine, Los Angeles, CA); Heinzmann, C.; Sparkes, R.S.; Klisak, I.; Diep, A.; Carper, D.; Nishimura, Chihiro; Mohandas, T.

    1993-09-01

    Aldose reductase (alditol:NAD(P)+ 1-oxidoreductase; EC 1.1.1.21) (AR) catalyzes the reduction of several aldehydes, including that of glucose, to the corresponding sugar alcohol. Using a complementary DNA clone encoding human AR, the authors mapped the gene sequences to human chromosomes 1, 3, 7, 9, 11, 13, 14, and 18 by somatic cell hybridization. By in situ hybridization analysis, sequences were localized to human chromosomes 1q32-q43, 3p12, 7q31-q35, 9q22, 11p14-p15, and 13q14-q21. As a putative functional AR gene has been mapped to chromosome 7 and a putative pseudogene to chromosome 3, the sequences on the other seven chromosomes may represent other active genes, non-aldose reductase homologous sequences, or pseudogenes. 24 refs., 3 figs., 2 tabs.

  17. Proximal and distal sequences control UV cone pigment gene expression in transgenic zebrafish.

    PubMed

    Luo, Wenqin; Williams, John; Smallwood, Philip M; Touchman, Jeffrey W; Roman, Laura M; Nathans, Jeremy

    2004-04-30

    The molecular basis of cone photoreceptor-specific gene expression is largely unknown. In this study, we define cis-acting DNA sequences that control the cell type-specific expression of the zebrafish UV cone pigment gene by transient expression of green fluorescent protein transgenes following their injection into zebrafish embryos. These experiments show that 4.8 kb of 5'-flanking sequences from the zebrafish UV pigment gene direct expression specifically to UV cones and that this activity requires both distal and proximal sequences. In addition, we demonstrate that a proximal region located between -215 and -110 bp (with respect to the initiator methionine codon) can function in the context of a zebrafish rhodopsin promoter to convert its specificity from rod-only expression to rod and UV cone expression. These experiments demonstrate the power of transient transgenesis in zebrafish to efficiently define cis-acting regulatory sequences in an intact vertebrate. PMID:14966125

  18. Combined sequence and sequence-structure based methods for analyzing FGF23, CYP24A1 and VDR genes.

    PubMed

    Nagamani, Selvaraman; Singh, Kh Dhanachandra; Muthusamy, Karthikeyan

    2016-09-01

    FGF23, CYP24A1 and VDR altogether play a significant role in genetic susceptibility to chronic kidney disease (CKD). Identification of possible causative mutations may serve as therapeutic targets and diagnostic markers for CKD. Thus, we adopted both sequence and sequence-structure based SNP analysis algorithms in order to overcome the limitations of both methods. We explore the functional significance towards the prediction of risky SNPs associated with CKD. We assessed the performance of four widely used pathogenicity prediction methods. We compared the performances of the programs using the Matthews correlation coefficient (MCC), which ranged from poor (MCC = 0.39) to reasonably good (MCC = 0.42). However, we got the best results for the combined sequence and structure based analysis method (MCC = 0.45). 4 SNPs from FGF23 gene, 8 SNPs from VDR gene and 13 SNPs from CYP24A1 gene were predicted to be the causative agents for human diseases. This study will be helpful in selecting potential SNPs for experimental study from the SNP pool and also will reduce the cost for identification of potential SNPs as a genetic marker. PMID:27114920
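
    The abstract above ranks prediction methods by their Matthews correlation coefficient. As a minimal sketch of how that score is computed from a binary confusion matrix (the function name and the example counts are hypothetical and are not taken from the study):

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives
    and false negatives of a binary (e.g. pathogenic vs. neutral) call.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(round(matthews_corrcoef(tp=40, tn=30, fp=20, fn=10), 3))
```

    The score runs from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why values such as 0.39 to 0.45 indicate only moderate agreement.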

  19. Combined sequence and sequence-structure based methods for analyzing FGF23, CYP24A1 and VDR genes

    PubMed Central

    Nagamani, Selvaraman; Singh, Kh. Dhanachandra; Muthusamy, Karthikeyan

    2016-01-01

    FGF23, CYP24A1 and VDR altogether play a significant role in genetic susceptibility to chronic kidney disease (CKD). Identification of possible causative mutations may serve as therapeutic targets and diagnostic markers for CKD. Thus, we adopted both sequence and sequence-structure based SNP analysis algorithms in order to overcome the limitations of both methods. We explore the functional significance towards the prediction of risky SNPs associated with CKD. We assessed the performance of four widely used pathogenicity prediction methods. We compared the performances of the programs using the Matthews correlation coefficient (MCC), which ranged from poor (MCC = 0.39) to reasonably good (MCC = 0.42). However, we got the best results for the combined sequence and structure based analysis method (MCC = 0.45). 4 SNPs from FGF23 gene, 8 SNPs from VDR gene and 13 SNPs from CYP24A1 gene were predicted to be the causative agents for human diseases. This study will be helpful in selecting potential SNPs for experimental study from the SNP pool and also will reduce the cost for identification of potential SNPs as a genetic marker. PMID:27114920

  20. Molecular cloning, expression, and sequence of the pilin gene from nontypeable Haemophilus influenzae M37.

    PubMed Central

    Coleman, T; Grass, S; Munson, R

    1991-01-01

    Nontypeable Haemophilus influenzae M37 adheres to human buccal epithelial cells and exhibits mannose-resistant hemagglutination of human erythrocytes. An isogenic variant of this strain which was deficient in hemagglutination was isolated. A protein with an apparent molecular weight of 22,000 was present in the sodium dodecyl sulfate-polyacrylamide gel profile of sarcosyl-insoluble proteins from the hemagglutination-proficient strain but was absent from the profile of the isogenic hemagglutination-deficient variant. A monoclonal antibody which reacts with the hemagglutination-proficient isolate but not with the hemagglutination-deficient isolate has been characterized. This monoclonal antibody was employed in an affinity column for purification of the protein as well as to screen a genomic library for recombinant clones expressing the gene. Several clones which contained overlapping genomic fragments were identified by reaction with the monoclonal antibody. The gene for the 22-kDa protein was subcloned and sequenced. The gene for the type b pilin from H. influenzae type b strain MinnA was also cloned and sequenced. The DNA sequence of the strain MinnA gene was identical to that reported previously for two other type b strains. The DNA sequence of the strain M37 gene is 77% identical to that of the type b pilin gene, and the derived amino acid sequence is 68% identical to that of the type b pilin. PMID:1673447

  1. Novel candidate genes putatively involved in stress fracture predisposition detected by whole-exome sequencing.

    PubMed

    Friedman, Eitan; Moran, Daniel S; Ben-Avraham, Danny; Yanovich, Ran; Atzmon, Gil

    2014-01-01

    While genetic factors in all likelihood contribute to stress fracture (SF) pathogenesis, a few studies focusing on candidate genes have previously been reported. The objective of this study is to gain better understanding on the genetic basis of SF in a gene-naive manner. Exome sequence capture followed by massive parallel sequencing of two pooled DNA samples from Israeli combat soldiers was employed: cases with high grade SF and ethnically matched healthy controls. The resulting sequence variants were individually verified using the Sequenom™ platform and the contribution of the genetic alterations was validated in a second cohort of cases and controls. In the discovery set that included DNA pool of cases (n = 34) and controls (n = 60), a total of 1174 variants with >600 reads/variant/DNA pool were identified, and 146 (in 127 genes) of these exhibited statistically significant (P < 0·05) different rates between SF cases and controls after multiple comparisons correction. Subsequent validation of these 146 sequence variants individually in a total of 136 SF cases and 127 controls using the Sequenom™ platform validated 20/146 variants. Of these, three missense mutations (rs7426114, rs4073918, rs3752135 in the NEB, SLC6A18 and SIGLEC12 genes, respectively) and three synonymous mutations (rs2071856, rs2515941, rs716745 in the ELFN2, GRK4, LRRC55 genes) displayed significant different rates in SF cases compared with controls. Exome sequencing seemingly unravelled novel candidate genes as involved in SF pathogenesis and predisposition. PMID:25023003

  2. Presence and Expression of Microbial Genes Regulating Soil Nitrogen Dynamics Along the Tanana River Successional Sequence

    NASA Astrophysics Data System (ADS)

    Boone, R. D.; Rogers, S. L.

    2004-12-01

    We report on work to assess the functional gene sequences for soil microbiota that control nitrogen cycle pathways along the successional sequence (willow, alder, poplar, white spruce, black spruce) on the Tanana River floodplain, Interior Alaska. Microbial DNA and mRNA were extracted from soils (0-10 cm depth) for amoA (ammonium monooxygenase), nifH (nitrogenase reductase), napA (nitrate reductase), and nirS and nirK (nitrite reductase) genes. Gene presence was determined by amplification of a conserved sequence of each gene employing sequence specific oligonucleotide primers and Polymerase Chain Reaction (PCR). Expression of the genes was measured via nested reverse transcriptase PCR amplification of the extracted mRNA. Amplified PCR products were visualized on agarose electrophoresis gels. All five successional stages show evidence for the presence and expression of microbial genes that regulate N fixation (free-living), nitrification, and nitrate reduction. We detected (1) nifH, napA, and nirK presence and amoA expression (mRNA production) for all five successional stages and (2) nirS and amoA presence and nifH, nirK, and napA expression for early successional stages (willow, alder, poplar). The results highlight that the existing body of previous process-level work has not sufficiently considered the microbial potential for a nitrate economy and free-living N fixation along the complete floodplain successional sequence.

  3. RNA-Seq Analysis and Gene Discovery of Andrias davidianus Using Illumina Short Read Sequencing

    PubMed Central

    Li, Fenggang; Wang, Lixin; Lan, Qingjing; Yang, Hui; Li, Yang; Liu, Xiaolin; Yang, Zhaoxia

    2015-01-01

    The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 transcripts were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians. PMID:25874626

  4. Sequence and comparative analysis of the MIP gene in Chinese straw mushroom, Volvariella volvacea.

    PubMed

    Chen, Bing-Zhi; Gui, Fu; Xie, Bao-Gui; Zou, Feng; Jiang, Yu-Ji; Deng, You-Jin

    2012-09-01

    The mitochondrial intermediate peptidase (MIP) gene is conserved in fungi. It is linked closely with the mating-type A (mtA) gene. In this study, a fragment of the MIP gene in Volvariella volvacea (Bull. ex Fr.) Singer was first cloned by homologue-based cloning technology. Subsequently, the entire MIP DNA sequence (PYd21-MIP) was obtained after the fragment was compared with the genomic data through BLAST analysis. The PYd21-MIP sequence appeared to be homologous with the MIP gene in other fungi. Phylogenetic analysis of PYd21-MIP and other MIP sequences from diverse fungi agreed with the current organism phylogeny. Analysis of protein domains by InterProScan software and motif searching demonstrated that PYd21-MIP encodes a homologous MIP protein. These data support the hypothesis that the PYd21-MIP protein is a Hog-MIP protein homologue from V. volvacea. PMID:22937907

  5. Characterization of fusion genes and the significantly expressed fusion isoforms in breast cancer by hybrid sequencing

    PubMed Central

    Weirather, Jason L.; Afshar, Pegah Tootoonchi; Clark, Tyson A.; Tseng, Elizabeth; Powers, Linda S.; Underwood, Jason G.; Zabner, Joseph; Korlach, Jonas; Wong, Wing Hung; Au, Kin Fai

    2015-01-01

    We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes. PMID:26040699

  6. Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues

    PubMed Central

    Lee, Je Hyuk; Daugharthy, Evan R.; Scheiman, Jonathan; Kalhor, Reza; Ferrante, Thomas C.; Terry, Richard; Turczyk, Brian M.; Yang, Joyce L.; Lee, Ho Suk; Aach, John; Zhang, Kun; Church, George M.

    2014-01-01

    RNA sequencing measures the quantitative change in gene expression over the whole transcriptome, but it lacks spatial context. On the other hand, in situ hybridization provides the location of gene expression, but only for a small number of genes. Here we detail a protocol for genome-wide profiling of gene expression in situ in fixed cells and tissues, in which RNA is converted into cross-linked cDNA amplicons and sequenced manually on a confocal microscope. Unlike traditional RNA-seq our method enriches for context-specific transcripts over house-keeping and/or structural RNA, and it preserves the tissue architecture for RNA localization studies. Our protocol is written for researchers experienced in cell microscopy with minimal computing skills. Library construction and sequencing can be completed within 14 d, with image analysis requiring an additional 2 d. PMID:25675209

  7. Genomic organization and 5′-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5′-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over ~25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5′-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  8. Target Capture and Massive Sequencing of Genes Transcribed in Mytilus galloprovincialis

    PubMed Central

    Rosani, Umberto

    2014-01-01

    Next generation sequencing (NGS) allows fast and massive production of both genome and transcriptome sequence datasets. As the genome of the Mediterranean mussel Mytilus galloprovincialis is not available at present, we have explored the possibility of reducing the whole genome sequencing efforts by using capture probes coupled with PCR amplification and high-throughput 454-sequencing to enrich selected genomic regions. The enrichment of DNA target sequences was validated by real-time PCR, whereas the efficacy of the applied strategy was evaluated by mapping the 454-output reads against reference transcript data already available for M. galloprovincialis and by measuring coverage, SNPs, number of de novo sequenced introns, and complete gene sequences. Focusing on a target size of nearly 1.5 Mbp, we obtained a target coverage which allowed the identification of more than 250 complete introns, 10,741 SNPs, and also complete gene sequences. This study confirms the transcriptome-based enrichment of gDNA regions as a good strategy to expand knowledge on specific subsets of genes also in nonmodel organisms. PMID:25101286

  9. Sequence analysis of the ERCC2 gene regions in human, mouse, and hamster reveals three linked genes.

    PubMed

    Lamerdin, J E; Stilwagen, S A; Ramirez, M H; Stubbs, L; Carrano, A V

    1996-06-15

    The ERCC2 (excision repair cross-complementing rodent repair group 2) gene product is involved in transcription-coupled repair as an integral member of the basal transcription factor BTF2/TFIIH complex. Defects in this gene can result in three distinct human disorders, namely the cancer-prone syndrome xeroderma pigmentosum complementation group D, trichothiodystrophy, and Cockayne syndrome. We report the comparative analysis of 91.6 kb of new sequence including 54.3 kb encompassing the human ERCC2 locus, the syntenic region in the mouse (32.6 kb), and a further 4.7 kb of sequence 3' of the previously reported ERCC2 region in the hamster. In addition to ERCC2, our analysis revealed the presence of two previously undescribed genes in all three species. The first is centromeric (in the human) to ERCC2 and is most similar to the kinesin light chain gene in sea urchin. The second gene is telomeric (in the human) to ERCC2 and contains a motif found in ankyrins, some cell cycle proteins, and transcription factors. Multiple EST matches to this putative new gene indicate that it is expressed in several human tissues, including breast. The identification and description of two new genes provides potential candidate genes for disorders mapping to this region of 19q13.2. PMID:8786141

  10. The nucleotide sequence of the large ribosomal RNA gene and the adjacent tRNA genes from rat mitochondria.

    PubMed Central

    Saccone, C; Cantatore, P; Gadaleta, G; Gallerani, R; Lanave, C; Pepe, G; Kroon, A M

    1981-01-01

    We have sequenced the Eco R(1) fragment D from rat mitochondrial DNA. It contains one third of the tRNA (Val) gene (the remaining part has been sequenced from the 3' end of the Eco R(1) fragment A) the complete gene for the large mt 16S rRNA, the tRNA (Leu) gene and the 5' end of an unidentified reading frame. The mt gene for the large rRNA from rat has been aligned with the homologous genes from mouse and human using graphic computer programs. Hypervariable regions at the center of the molecule and highly conserved regions toward the 3' end have been detected. The mt gene for tRNA Leu is of the conventional type and its primary structure is highly conserved among mammals. The mt gene for tRNA(Val) shows characteristics similar to those of other mt tRNA genes but the degree of homology is lower. Comparative studies confirm that AGA and AGG are read as stop codons in mammalian mitochondria. PMID:6913863

  11. Sequence analysis of the ERCC2 gene regions in human, mouse, and hamster reveals three linked genes

    SciTech Connect

    Lamerdin, J.E.; Stilwagen, S.A.; Ramirez, M.H.

    1996-06-15

    The ERCC2 (excision repair cross-complementing rodent repair group 2) gene product is involved in transcription-coupled repair as an integral member of the basal transcription factor BTF2/TFIIH complex. Defects in this gene can result in three distinct human disorders, namely the cancer-prone syndrome xeroderma pigmentosum complementation group D, trichothiodystrophy, and Cockayne syndrome. We report the comparative analysis of 91.6 kb of new sequence including 54.3 kb encompassing the human ERCC2 locus, the syntenic region in the mouse (32.6 kb), and a further 4.7 kb of sequence 3′ of the previously reported ERCC2 region in the hamster. In addition to ERCC2, our analysis revealed the presence of two previously undescribed genes in all three species. The first is centromeric (in the human) to ERCC2 and is most similar to the kinesin light chain gene in sea urchin. The second gene is telomeric (in the human) to ERCC2 and contains a motif found in ankyrins, some cell cycle proteins, and transcription factors. Multiple EST matches to this putative new gene indicate that it is expressed in several human tissues, including breast. The identification and description of two new genes provides potential candidate genes for disorders mapping to this region of 19q13.2. 42 refs., 6 figs., 3 tabs.

  12. Research Techniques Made Simple: Bacterial 16S Ribosomal RNA Gene Sequencing in Cutaneous Research.

    PubMed

    Jo, Jay-Hyun; Kennedy, Elizabeth A; Kong, Heidi H

    2016-03-01

    Skin serves as a protective barrier and also harbors numerous microorganisms collectively comprising the skin microbiome. As a result of recent advances in sequencing (next-generation sequencing), our understanding of microbial communities on skin has advanced substantially. In particular, the 16S ribosomal RNA gene sequencing technique has played an important role in efforts to identify the global communities of bacteria in healthy individuals and patients with various disorders in multiple topographical regions over the skin surface. Here, we describe basic principles, study design, and a workflow of 16S ribosomal RNA gene sequencing methodology, primarily for investigators who are not familiar with this approach. This article will also discuss some applications and challenges of 16S ribosomal RNA sequencing as well as directions for future development. PMID:26902128

  13. Comparative Sequence Analysis of the Sorghum Rph Region and the Maize Rp1 Resistance Gene Complex

    PubMed Central

    Ramakrishna, Wusirika; Emberton, John; SanMiguel, Phillip; Ogden, Matthew; Llaca, Victor; Messing, Joachim; Bennetzen, Jeffrey L.

    2002-01-01

    A 268-kb chromosomal segment containing sorghum (Sorghum bicolor) genes that are orthologous to the maize (Zea mays) Rp1 disease resistance (R) gene complex was sequenced. A region of approximately 27 kb in sorghum was found to contain five Rp1 homologs, but most have structures indicating that they are not functional. In contrast, maize inbred B73 has 15 Rp1 homologs in two nearby clusters of 250 and 300 kb. As at maize Rp1, the cluster of R gene homologs is interrupted by the presence of several genes that appear to have no resistance role, but these genes were different from the ones found within the maize Rp1 complex. More than 200 kb of DNA downstream from the sorghum Rp1-orthologous R gene cluster was sequenced and found to contain many duplicated and/or truncated genes. None of the duplications currently exist as simple tandem events, suggesting that numerous rearrangements were required to generate the current genomic structure. Four truncated genes were observed, including one gene that appears to have both 5′ and 3′ deletions. The maize Rp1 region is also unusually enriched in truncated genes. Hence, the orthologous maize and sorghum regions share numerous structural features, but all involve events that occurred independently in each species. The data suggest that complex R gene clusters are unusually prone to frequent internal and adjacent chromosomal rearrangements of several types. PMID:12481055

  14. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria Coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  15. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity

    PubMed Central

    Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.

    2015-01-01

    Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance

  16. Analysis of early hepatic stage schistosomula gene expression by subtractive expressed sequence tags library.

    PubMed

    Wang, Xinzhi; Gobert, Geoffrey N; Feng, XinGang; Fu, Zhiqiang; Jin, Yamei; Peng, Jinbiao; Lin, Jiaojiao

    2009-07-01

    Schistosome parasites have a complex lifecycle requiring two hosts and aquatic phases of development. The schistosomulum is a key phase of parasite development within the mammalian host; however, relatively little is understood about the molecular processes underlying this stage. In this study 5723 subtractive expressed sequence tags (ESTs) were randomly selected from a 7 day hepatic schistosomula enriched library constructed using the suppression subtractive hybridization method. Sequence analysis of these ESTs identified 1762 unique genes (contigs). Among them, 989 contigs were annotated with known genes, 311 contigs were homologous to established genes, 101 contigs were similar to established genes, 72 contigs were weakly similar to established genes and 289 sequences did not match any published sequences. Genes related to metabolism, cellular development, immune evasion and host-parasite interactions were identified as enriched in the hepatic schistosomula stage. The future identification of poorly annotated but stage-specific genes may potentially reveal new drug or vaccine targets applicable to the future control of schistosomiasis. PMID:19428674

  17. The nucleotide sequence of the uvrD gene of E. coli.

    PubMed Central

    Finch, P W; Emmerson, P T

    1984-01-01

    The nucleotide sequence of a cloned section of the E. coli chromosome containing the uvrD gene has been determined. The coding region for the UvrD protein consists of 2,160 nucleotides which would direct the synthesis of a polypeptide 720 amino acids long with a calculated molecular weight of 82 kd. The predicted amino acid sequence of the UvrD protein has been compared with the amino acid sequences of other known adenine nucleotide binding proteins and a common sequence has been identified, thought to contribute towards adenine nucleotide binding. PMID:6379604
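
    The figures in the abstract above follow from the triplet genetic code: 2,160 coding nucleotides correspond to 720 codons. The short check below is a sketch only; the average residue mass of about 110 Da is a textbook rule of thumb assumed here, not a value from the paper, and the constant and function names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue in a protein (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def codon_count(coding_nucleotides: int) -> int:
    """Number of codons in a coding region whose length is a multiple of 3."""
    if coding_nucleotides % 3 != 0:
        raise ValueError("coding region length must be a multiple of 3")
    return coding_nucleotides // 3


def approx_mass_kda(residues: int) -> float:
    """Rough protein mass estimate in kilodaltons from residue count."""
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    residues = codon_count(2160)  # coding length reported for uvrD
    print(residues, "residues,", approx_mass_kda(residues), "kDa (rough estimate)")
```

    The rough estimate lands within a few kilodaltons of the 82 kd calculated by the authors; an exact value requires summing the masses of the actual residues in the predicted sequence.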

  18. The qa repressor gene of Neurospora crassa: wild-type and mutant nucleotide sequences.

    PubMed Central

    Huiet, L; Giles, N H

    1986-01-01

    The qa-1S gene, one of two regulatory genes in the qa gene cluster of Neurospora crassa, encodes the qa repressor. The qa-1S gene together with the qa-1F gene, which encodes the qa activator protein, control the expression of all seven qa genes, including those encoding the inducible enzymes responsible for the utilization of quinic acid as a carbon source. The nucleotide sequence of the qa-1S gene and its flanking regions has been determined. The deduced coding sequence for the qa-1S protein encodes 918 amino acids with a calculated molecular weight of 100,650 and is interrupted by a single 66-base-pair intervening sequence. Both constitutive and noninducible mutants occur in the qa-1S gene and two different mutations of each type have been cloned and sequenced. All four mutations occur within the predicted coding region of the qa-1S gene. This result strongly supports the hypothesis that the qa-1S gene encodes a repressor. All four mutations are located within codons for the last 300 amino acids of the qa-1S protein. The mutations in three of the mutants involve amino acid substitutions, while the fourth mutant, which has a constitutive phenotype, contains a frameshift mutation. The two constitutive mutations occur in the most distal region of the gene, possibly implicating the COOH-terminal region of the qa repressor in binding to its target. The two noninducible mutations occur in a region proximal to the constitutive mutations, possibly implicating this region of the qa repressor in binding the inducer. PMID:3010294

  19. A pilot study of gene testing of genetic bone dysplasia using targeted next-generation sequencing.

    PubMed

    Zhang, Huiwen; Yang, Rui; Wang, Yu; Ye, Jun; Han, Lianshu; Qiu, Wenjuan; Gu, Xuefan

    2015-12-01

    Molecular diagnosis of genetic bone dysplasia is challenging for non-experts. A targeted next-generation sequencing technology was applied to identify the underlying molecular mechanism of bone dysplasia and evaluate the contribution of these genes to patients with bone dysplasia encountered in pediatric endocrinology. A group of unrelated patients (n=82), characterized by short stature, dysmorphology and X-ray abnormalities, in whom mucopolysaccharidoses, GM1 gangliosidosis, mucolipidosis type II/III and achondroplasia owing to the FGFR3 G380R mutation had been excluded, were recruited in this study. Probes were designed for 61 genes, selected according to the 2010 nosology and classification of genetic skeletal disorders, using Illumina's online DesignStudio software. DNA was hybridized with probes and then a library was established following the standard Illumina protocols. The amplicon library was sequenced on a MiSeq sequencing system and the data were analyzed by MiSeq Reporter. Mutations of 13 different genes were found in 44 of the 82 patients (54%). Mutations of the COL2A1 gene and the PHEX gene were each found in nine patients (9/44=20%), followed by COMP gene in 8 (18%), TRPV4 gene in 4 (9%), FBN1 gene in 4 (9%), COL1A1 gene in 3 (6%) and COL11A1, TRAPPC2, MATN3, ARSE, TRPS1, SMARCAL1, ENPP1 gene mutations in one patient each (2% each). In conclusion, mutations of COL2A1, PHEX and COMP gene are common for short stature due to bone dysplasia in outpatient clinics in pediatric endocrinology. Targeted next-generation sequencing is an efficient way to identify the underlying molecular mechanism of genetic bone dysplasia. PMID:26377240

  20. Insights into corn genes derived from large-scale cDNA sequencing.

    PubMed

    Alexandrov, Nickolai N; Brover, Vyacheslav V; Freidin, Stanislav; Troukhan, Maxim E; Tatarinova, Tatiana V; Zhang, Hongyu; Swaller, Timothy J; Lu, Yu-Ping; Bouck, John; Flavell, Richard B; Feldmann, Kenneth A

    2009-01-01

    We present a large portion of the transcriptome of Zea mays, including ESTs representing 484,032 cDNA clones from 53 libraries and 36,565 fully sequenced cDNA clones, out of which 31,552 clones are non-redundant. These and other previously sequenced transcripts have been aligned with available genome sequences and have provided new insights into the characteristics of gene structures and promoters within this major crop species. We found that although the average number of introns per gene is about the same in corn and Arabidopsis, corn genes have more alternatively spliced isoforms. Examination of the nucleotide composition of coding regions reveals that corn genes, as well as genes of other Poaceae (Grass family), can be divided into two classes according to the GC content at the third position in the amino acid encoding codons. Many of the transcripts that have lower GC content at the third position have dicot homologs but the high GC content transcripts tend to be more specific to the grasses. The high GC content class is also enriched with intronless genes. Together this suggests that an identifiable class of genes in plants is associated with the Poaceae divergence. Furthermore, because many of these genes appear to be derived from ancestral genes that do not contain introns, this evolutionary divergence may be the result of horizontal gene transfer from species not only with different codon usage but possibly that did not have introns, perhaps outside of the plant kingdom. By comparing the cDNAs described herein with the non-redundant set of corn mRNAs in GenBank, we estimate that there are about 50,000 different protein coding genes in Zea. All of the sequence data from this study have been submitted to DDBJ/GenBank/EMBL under accession numbers EU940701-EU977132 (FLI cDNA) and FK944382-FL482108 (EST). PMID:18937034
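    The third-position GC content mentioned above can be computed directly from a coding sequence. The following is a minimal sketch rather than the authors' pipeline: the function names, the 0.5 cutoff, and the two toy sequences are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch of GC content at the third codon position (GC3).
# The threshold and example sequences are hypothetical, not from the study.

def gc3_content(cds: str) -> float:
    """Fraction of codons whose third base is G or C."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # bases at index 2, 5, 8, ...
    return sum(base in "GC" for base in third_positions) / len(third_positions)


def gc3_class(cds: str, threshold: float = 0.5) -> str:
    """Assign a sequence to a 'high' or 'low' GC3 class using an assumed cutoff."""
    return "high" if gc3_content(cds) >= threshold else "low"


high_example = "ATGGCCGCGTGA"  # third bases: G, C, G, A
low_example = "ATGGCTGCATAA"   # third bases: G, T, A, A
print(gc3_content(high_example), gc3_class(high_example))
print(gc3_content(low_example), gc3_class(low_example))
```

    In practice a bimodal GC3 distribution would be inspected across many transcripts before choosing any cutoff; a single fixed threshold is used here only to keep the sketch short.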

  1. Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease

    SciTech Connect

    Su, Y.; Zhang, H.; Madrid, R.

    1994-09-01

    Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and, when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.

  2. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ)

    PubMed Central

    An, Qing-Ming; Zhou, Hui-Tong; Hu, Jiang; Luo, Yu-Zhu; Hickford, Jon G. H.

    2015-01-01

    The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A1-D1, A2-D2) were detected in region-1 and region-2, respectively, with seven and six SNPs being revealed. In region-3, three different patterns (A3-C3) and three SNPs were observed. Two patterns (A4-B4, A5-B5) and two and one SNPs were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, with five of them in the coding region and two (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A1, A2 and A3 were the most common, although variant frequencies differed in the eight breeds. Across region-1 and region-3, nine haplotypes were identified and haplotypes A1-A3, A1-C3, B1-A3 and B1-C3 were most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to see if the variation in the gene is associated with animal production traits. PMID:26610572

  3. Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

    PubMed Central

    2009-01-01

    Background One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients). Methods and assay

  4. Diversity of Frankia in soil assessed by Illumina sequencing of nifH gene fragments.

    PubMed

    Rodriguez, David; Guerra, Trina M; Forstner, Michael R J; Hahn, Dittmar

    2016-09-01

    Targeted Illumina sequencing of nitrogenase reductase (nifH) gene fragments and analyses of paired-end reads through a modified QIIME pipeline were used to assess the diversity of the actinomycetous genus Frankia in three soils. Soils were vegetated with host or non-host plants, and included locations in Illinois (ABA, host), Colorado (CoMt, non-host), and Wisconsin (FMWI, non-host). After filtering, seven unique sequences were recovered for soil ABA, six for CoMt, and four sequences for FMWI. These sequences were included in a Bayesian topology anchored by published sequence data from pure cultures of Frankia. Sequences from all three soils showed affinities to Frankia strains from both the Alnus and Elaeagnus host infection groups. Reads representing Casuarina-infective strains were not detected. Four sequences from soil CoMt and five sequences from soil ABA did not cluster, at 97% similarity, into a shared OTU that contained a cultured relative. These results demonstrate that targeted Illumina sequencing provides an efficient and economical method for assessing haplotype diversity of ecofunctional genes (e.g. nifH) at the genus level in microorganisms that perform important ecosystem functions. PMID:27485903
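    Clustering at a 97% similarity threshold can be illustrated with a toy greedy procedure. This is a simplified sketch and not the modified QIIME pipeline used in the study: it assumes pre-aligned sequences of equal length, and the helper names and synthetic sequences are hypothetical.

```python
# Illustrative greedy OTU clustering at a 97% identity threshold.
# Assumes equal-length, already aligned sequences; all data here are synthetic.

def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.97):
    """Assign each sequence to the first OTU whose representative is similar enough."""
    otus = []  # each OTU is a list; its first member acts as the representative
    for seq in sequences:
        for otu in otus:
            if identity(seq, otu[0]) >= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])  # no match found: start a new OTU
    return otus


def mutate(seq: str, positions) -> str:
    """Introduce a substitution at each listed position (helper for the toy data)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for pos in positions:
        chars[pos] = swap[chars[pos]]
    return "".join(chars)


reference = "ACGT" * 25                               # 100 nt synthetic sequence
close = mutate(reference, [10, 50])                   # 2 mismatches
distant = mutate(reference, [5, 15, 25, 35, 45])      # 5 mismatches
otus = greedy_otus([reference, close, distant])
print(len(otus), [len(otu) for otu in otus])
```

    Greedy clustering depends on input order, which is one reason production pipelines typically sort sequences by abundance or length first.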

  5. Isolation, sequence identification and tissue expression profiles of 3 novel porcine genes: ASPA, NAGA, and HEXA.

    PubMed

    Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa

    2008-01-01

    The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes. PMID:18670062

  6. Nucleotide sequence and expression of the gene encoding the EcoRII modification enzyme.

    PubMed Central

    Som, S; Bhagwat, A S; Friedman, S

    1987-01-01

    The gene coding for the EcoRII modification enzyme has been cloned and the nucleotide sequence of 1933 base pairs containing the gene has been determined. The gene codes for a protein of 477 amino acids. Two transcriptional start sites have been mapped by S1 mapping. One deletion that removes 34 N-terminal amino acids was found to have partial enzyme activity. Comparison of the EcoRII methylase sequence with other cytosine methylases revealed several domains of partial homology among all cytosine methylases. Cloning the gene in multicopy pUC vectors increased the expression by 6-18 fold. A 40 fold overproduction of the EcoRII methylase was obtained by cloning the gene in the expression vector carrying the lambda PL promoter. PMID:3029675

  7. Impact of Pre-Analytical Variables on Cancer Targeted Gene Sequencing Efficiency.

    PubMed

    Araujo, Luiz H; Timmers, Cynthia; Shilo, Konstantin; Zhao, Weiqiang; Zhang, Jianying; Yu, Lianbo; Natarajan, Thanemozhi G; Miller, Clinton J; Yilmaz, Ayse Selen; Liu, Tom; Amann, Joseph; Lapa E Silva, José Roberto; Ferreira, Carlos Gil; Carbone, David P

    2015-01-01

    Tumor specimens are often preserved as formalin-fixed paraffin-embedded (FFPE) tissue blocks, the most common clinical source for DNA sequencing. Herein, we evaluated the effect of pre-sequencing parameters to guide proper sample selection for targeted gene sequencing. Data from 113 FFPE lung tumor specimens were collected, and targeted gene sequencing was performed. Libraries were constructed using custom probes and were paired-end sequenced on a next generation sequencing platform. A PCR-based quality control (QC) assay was utilized to determine DNA quality, and a ratio was generated in comparison to control DNA. We observed that FFPE storage time, PCR/QC ratio, and DNA input in the library preparation were significantly correlated to most parameters of sequencing efficiency including depth of coverage, alignment rate, insert size, and read quality. A combined score using the three parameters was generated and proved highly accurate to predict sequencing metrics. We also showed wide read count variability within the genome, with worse coverage in regions of low GC content like in KRAS. Sample quality and GC content had independent effects on sequencing depth, and the worst results were observed in regions of low GC content in samples with poor quality. Our data confirm that FFPE samples are a reliable source for targeted gene sequencing in cancer, provided adequate sample quality controls are exercised. Tissue quality should be routinely assessed for pre-analytical factors, and sequencing depth may be limited in genomic regions of low GC content if suboptimal samples are utilized. PMID:26605948

  8. Comparative organization of nitrogen fixation-specific genes from Azotobacter vinelandii and Klebsiella pneumoniae: DNA sequence of the nifUSV genes.

    PubMed Central

    Beynon, J; Ally, A; Cannon, M; Cannon, F; Jacobson, M; Cash, V; Dean, D

    1987-01-01

    In the facultative anaerobe Klebsiella pneumoniae 17 nitrogen fixation-specific genes (nif genes) have been identified. Homologs to 12 of these genes have now been isolated from the aerobic diazotroph Azotobacter vinelandii. Comparative studies have indicated that these diverse microorganisms share striking similarities in the genetic organization of their nif genes and in the primary structure of their individual nif gene products. In this study the complete nucleotide sequences of the nifUSV gene clusters from both K. pneumoniae and A. vinelandii were determined. These genes are identically organized on their respective genomes, and the individual genes and their products exhibit a high degree of interspecies sequence homology. PMID:3040672

  9. Performant Mutation Identification Using Targeted Next-Generation Sequencing of 14 Thoracic Aortic Aneurysm Genes.

    PubMed

    Proost, Dorien; Vandeweyer, Geert; Meester, Josephina A N; Salemink, Simone; Kempers, Marlies; Ingram, Christie; Peeters, Nils; Saenen, Johan; Vrints, Christiaan; Lacro, Ronald V; Roden, Dan; Wuyts, Wim; Dietz, Harry C; Mortier, Geert; Loeys, Bart L; Van Laer, Lut

    2015-08-01

    At least 14 causative genes have been identified for both syndromic and nonsyndromic forms of thoracic aortic aneurysm/dissection (TAA), an important cause of death in the industrialized world. Molecular confirmation of the diagnosis is increasingly important for gene-tailored patient management but consecutive, conventional molecular TAA gene screening is expensive and labor-intensive. To circumvent these problems, we developed a TAA gene panel for next-generation sequencing of 14 TAA genes. After validation, we applied the assay to 100 Marfan patients. We identified 90 FBN1 mutations, 44 of which were novel. In addition, Multiplex ligation-dependent probe amplification identified large deletions in six of the remaining samples, whereas false-negative results were excluded by Sanger sequencing of FBN1, TGFBR1, and TGFBR2 in the last four samples. Subsequently, we screened 55 syndromic and nonsyndromic TAA patients. We identified causal mutations in 15 patients (27%), one in each of the six following genes: ACTA2, COL3A1, TGFBR1, MYLK, SMAD3, SLC2A10 (homozygous), two in NOTCH1, and seven in FBN1. We conclude that our approach for TAA genetic testing overcomes the intrinsic hurdles of consecutive Sanger sequencing of all candidate genes and provides a powerful tool for the elaboration of clinical phenotypes assigned to different genes. PMID:25907466

  10. Sequence and expression analyses of the UL37 and UL38 genes of Aujeszky's disease virus.

    PubMed

    Braun, A; Kaliman, A; Boldogköi, Z; Aszódi, A; Fodor, I

    2000-01-01

    Previously, we sequenced the HSV-1 UL39-UL40 homologue genes of Aujeszky's disease virus (ADV), also designated as pseudorabies virus (Kaliman et al., 1994a, b). Now we report the nucleotide sequence of the adjacent DNA that encodes UL38, the 5'-region (750 bp) of UL37, and the promoter regions between these two divergently arranged genes. The ADV UL38 gene encodes a protein of 368 amino acids. Amino acid sequence comparison of ADV UL38 with that of other herpesviruses revealed significant structural homology. In a transcription study using RNase protection assay and Northern blot hybridization, we found that the UL38 gene had one initiation site, but the UL37 gene was initiated at two transcription sites with two potential initiator AUGs, one of which was dominant. Comparison of ADV UL37, UL38 and ribonucleotide reductase gene expression showed that these genes belong to the same temporal class with early kinetics. Data of structural and transcriptional studies suggest that regulation of the expression of these two ADV genes could differ from that of the HSV-1 virus. PMID:11402671

  11. Transcriptome sequencing of transgenic poplar (Populus × euramericana 'Guariento') expressing multiple resistance genes

    PubMed Central

    2014-01-01

    Background Transgenic poplar (Populus × euramericana 'Guariento') plants harboring five exogenous, stress-related genes exhibit increased tolerance to multiple stresses including drought, salt, waterlogging, and insect feeding, but the complex mechanisms underlying stress tolerance in these plants have not been elucidated. Here, we analyzed the differences in the transcriptomes of the transgenic poplar line D5-20 and the non-transgenic line D5-0 using high-throughput transcriptome sequencing techniques and elucidated the functions of the differentially expressed genes using various functional annotation methods. Results We generated 11.80 Gb of sequencing data containing 63,430,901 sequences, with an average length of 200 bp. The processed sequences were mapped to reference genome sequences of Populus trichocarpa. An average of 62.30% and 61.48% sequences could be aligned with the reference genomes for D5-20 and D5-0, respectively. We detected 11,352 (D5-20) and 11,372 expressed genes (D5-0), 7,624 (56.61%; D5-20) and 7,453 (65.54%; D5-0) of which could be functionally annotated. A total of 782 differentially expressed genes in D5-20 were identified compared with D5-0, including 628 up-regulated and 154 down-regulated genes. In addition, 196 genes with putative functions related to stress responses were also annotated. Gene Ontology (GO) analysis revealed that 346 differentially expressed genes are mainly involved in 67 biological functions, such as DNA binding and nucleus. KEGG annotation revealed that 36 genes (21 up-regulated and 15 down-regulated) were enriched in 51 biological pathways, 9 of which are linked to glucose metabolism. KOG functional classification revealed that 475 genes were enriched in 23 types of KOG functions. Conclusion These results suggest that the transferred exogenous genes altered the expression of stress (biotic and abiotic) response genes, which were distributed in different metabolic pathways and were linked to some extent. Our

  12. A score system for quality evaluation of RNA sequence tags: an improvement for gene expression profiling

    PubMed Central

    Pinheiro, Daniel G; Galante, Pedro AF; de Souza, Sandro J; Zago, Marco A; Silva, Wilson A

    2009-01-01

    Background High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS) represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, called sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification and selection of tags considered more reliable for further gene expression analysis. Results This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at . S3T source code and datasets can also be downloaded from the aforementioned website. PMID:19500384

  13. The complete nucleotide sequence and structure of the gene encoding bovine phenylethanolamine N-methyltransferase.

    PubMed

    Batter, D K; D'Mello, S R; Turzai, L M; Hughes, H B; Gioio, A E; Kaplan, B B

    1988-03-01

    A cDNA clone for bovine adrenal phenylethanolamine N-methyltransferase (PNMT) was used to screen a Charon 28 genomic library. One phage was identified, designated lambda P1, which included the entire PNMT gene. Construction of a restriction map, with subsequent Southern blot analysis, allowed the identification of exon-containing fragments. Dideoxy sequence analysis of these fragments, and several more further upstream, indicates that the bovine PNMT gene is 1,594 base pairs in length, consisting of three exons and two introns. The transcription initiation site was identified by two independent methods and is located approximately 12 base pairs upstream from the ATG translation start site. The 3' untranslated region is 88 base pairs in length and contains the expected polyadenylation signal (AATAAA). A putative promoter sequence (TATA box) is located about 25 base pairs upstream from the transcription initiation site. Computer comparison of the nucleotide sequence data with the consensus sequences of known regulatory elements revealed potential binding sites for glucocorticoid receptors and the Sp1 regulatory protein in the 5' flanking region of the gene. Additionally, comparison of the sequence of the exons of the PNMT gene with cDNA sequences for other enzymes involved in biogenic amine synthesis revealed no significant homology, indicating that PNMT is not a member of a multigene family of catecholamine biosynthetic enzymes. PMID:3379652

  14. A Systematic Analysis of Human Disease-Associated Gene Sequences In Drosophila melanogaster

    PubMed Central

    Reiter, Lawrence T.; Potocki, Lorraine; Chien, Sam; Gribskov, Michael; Bier, Ethan

    2001-01-01

    We performed a systematic BLAST analysis of 929 human disease gene entries associated with at least one mutant allele in the Online Mendelian Inheritance in Man (OMIM) database against the recently completed genome sequence of Drosophila melanogaster. The results of this search have been formatted as an updateable and searchable on-line database called Homophila. Our analysis identified 714 distinct human disease genes (77% of disease genes searched) matching 548 unique Drosophila sequences, which we have summarized by disease category. This breakdown into disease classes creates a picture of disease genes that are amenable to study using Drosophila as the model organism. Of the 548 Drosophila genes related to human disease genes, 153 are associated with known mutant alleles and 56 more are tagged by P-element insertions in or near the gene. Examples of how to use the database to identify Drosophila genes related to human disease genes are presented. We anticipate that cross-genomic analysis of human disease genes using the power of Drosophila second-site modifier screens will promote interaction between human and Drosophila research groups, accelerating the understanding of the pathogenesis of human genetic disease. The Homophila database is available at http://homophila.sdsc.edu. PMID:11381037

  15. Sequence variation in ROP8 gene among Toxoplasma gondii isolates from different hosts and geographical localities.

    PubMed

    Li, Z Y; Chen, J; Lu, J; Wang, C R; Zhu, X Q

    2015-01-01

    The protozoan parasite Toxoplasma gondii has a worldwide distribution; it can cause serious diseases in humans and almost all other warm-blooded animals. Different genotypes of T. gondii result in different lesions in the same host. T. gondii rhoptry protein 8 (TgROP8) is a major factor of T. gondii acute virulence. We examined sequence variation in the TgROP8 gene among T. gondii isolates from different hosts and geographical localities. The TgROP8 gene was amplified from individual isolates and sequenced. A phylogenetic tree was constructed using Bayesian inference, maximum parsimony, and maximum likelihood based on the sequences obtained plus TgME49 from the ToxoDB database. The TgROP8 gene was 1728 bp in length for all the examined T. gondii strains, and their A+T contents were 45.37-45.95%. Sequence analysis detected 140 (0.06-5.56%) variable nucleotide positions resulting in 96 (0-10.78%) amino acid substitutions. Sequence variations in the TgROP8 gene resulted in polymorphic restriction sites for endonucleases BstBI, BsaI, and XhoI, which allowed the differentiation of the three classical genotype strains (types I, II, and III) by polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP). However, phylogenetic analyses indicated that the TgROP8 gene is not a suitable genetic marker for population studies of T. gondii. PMID:26436382
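    The PCR-RFLP idea described above, where sequence variation creates or destroys restriction sites, can be sketched as a simple recognition-site scan. The recognition sequences for BstBI (TTCGAA), BsaI (GGTCTC) and XhoI (CTCGAG) are standard; everything else is an assumption made for illustration, including the function names and the two short toy alleles, which are not TgROP8 sequences.

```python
# Illustrative scan for restriction enzyme recognition sites (PCR-RFLP style).
# Toy alleles below are hypothetical and are not real TgROP8 sequences.
RECOGNITION_SITES = {
    "BstBI": "TTCGAA",
    "BsaI": "GGTCTC",   # type IIS enzyme: it cuts outside this recognition site
    "XhoI": "CTCGAG",
}


def at_content(seq: str) -> float:
    """Fraction of A and T bases in a sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_positions(seq: str, site: str):
    """0-based start positions of a site (or its reverse complement) in seq."""
    seq = seq.upper()
    hits = []
    for motif in {site, reverse_complement(site)}:
        start = seq.find(motif)
        while start != -1:
            hits.append(start)
            start = seq.find(motif, start + 1)
    return sorted(hits)


def rflp_pattern(seq: str):
    """Map each enzyme to the positions of its recognition sites in seq."""
    return {enzyme: site_positions(seq, site)
            for enzyme, site in RECOGNITION_SITES.items()}


allele_a = "AAGCTTCGAAGGCTCGAGTT"   # carries BstBI and XhoI sites
allele_b = "AAGCTTCGAAGGCTCAAGTT"   # one substitution removes the XhoI site
print(rflp_pattern(allele_a))
print(rflp_pattern(allele_b))
print(at_content(allele_a))
```

    Two alleles that differ in which sites are present would give different fragment patterns on a gel, which is what allows genotypes to be told apart without full sequencing.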

  16. Detection of alternative splice and gene duplication by RNA sequencing in Japanese flounder, Paralichthys olivaceus.

    PubMed

    Wang, Wenji; Wang, Jing; You, Feng; Ma, Liman; Yang, Xiao; Gao, Jinning; He, Yan; Qi, Jie; Yu, Haiyang; Wang, Zhigang; Wang, Xubo; Wu, Zhihao; Zhang, Quanqi

    2014-12-01

    Japanese flounder (Paralichthys olivaceus) is one of the economically important fish species in China. Sexual dimorphism, especially the different growth rates and body sizes between the two sexes, makes this fish a good model to investigate mechanisms responsible for such dimorphism for both fundamental questions in evolution and applied topics in aquaculture. However, the lack of "omics" data has hindered progress. The recent advent of RNA-sequencing technology provides a robust tool to further study characteristics of genomes of nonmodel species. Here, we performed de novo transcriptome sequencing for a doubled haploid Japanese flounder individual using Illumina sequencing. A single lane of paired-end sequencing produced more than 27 million reads. These reads were assembled into 107,318 nonredundant transcripts, roughly half of which (51,563; 48.1%) were annotated by blastx against public protein databases. A total of 1051 genes with potential alternative splicing were detected by Chrysalis, as implemented in the Trinity software. Four of 10 randomly picked genes were verified by cloning and Sanger sequencing to contain alternative splicing. Notably, using a doubled haploid Japanese flounder individual allowed us to analyze gene duplicates. In total, 3940 "single-nucleotide polymorphisms" were detected from 1859 genes, which may reflect gene duplications. This study lays the foundation for structural and functional genomics studies in Japanese flounder. PMID:25512620

  17. Detection of Alternative Splice and Gene Duplication by RNA Sequencing in Japanese Flounder, Paralichthys olivaceus

    PubMed Central

    Wang, Wenji; Wang, Jing; You, Feng; Ma, Liman; Yang, Xiao; Gao, Jinning; He, Yan; Qi, Jie; Yu, Haiyang; Wang, Zhigang; Wang, Xubo; Wu, Zhihao; Zhang, Quanqi

    2014-01-01

    Japanese flounder (Paralichthys olivaceus) is one of the economically important fish species in China. Sexual dimorphism, especially the different growth rates and body sizes between the two sexes, makes this fish a good model to investigate mechanisms responsible for such dimorphism for both fundamental questions in evolution and applied topics in aquaculture. However, the lack of “omics” data has hindered progress. The recent advent of RNA-sequencing technology provides a robust tool to further study characteristics of genomes of nonmodel species. Here, we performed de novo transcriptome sequencing for a doubled haploid Japanese flounder individual using Illumina sequencing. A single lane of paired-end sequencing produced more than 27 million reads. These reads were assembled into 107,318 nonredundant transcripts, roughly half of which (51,563; 48.1%) were annotated by blastx against public protein databases. A total of 1051 genes with potential alternative splicing were detected by Chrysalis, as implemented in the Trinity software. Four of 10 randomly picked genes were verified by cloning and Sanger sequencing to contain alternative splicing. Notably, using a doubled haploid Japanese flounder individual allowed us to analyze gene duplicates. In total, 3940 “single-nucleotide polymorphisms” were detected from 1859 genes, which may reflect gene duplications. This study lays the foundation for structural and functional genomics studies in Japanese flounder. PMID:25512620

  18. Screening two mutations in the dysferlin gene by exon capture and sequence analysis: A case report

    PubMed Central

    WANG, XUEYAN; YANG, YUN; ZHOU, RONG

    2016-01-01

    A patient with progressive muscular atrophy was assessed for the disease-associated genes by next-generation sequencing technology and exon trap and sequence analysis. The results of the investigation identified 399 genes, covering all exons in addition to 10 bp on either side, which are specific to 659 types of neuromuscular disorders, including hypotypes. Exon capture and sequence analysis revealed that the patient possessed two splice site mutations in the dysferlin (DYSF) gene, c.144+1G>A and c.342+1G>T, and the presence of the mutations was confirmed by Sanger sequencing. The patient's mother and sister were also assessed and confirmed to have mutations within the DYSF gene, the mother with c.342+1G>T and the sister with c.144+1G>A. The two splice site mutations in the DYSF gene, c.144+1G>A and c.342+1G>T, have not previously been reported. Therefore, exon capture and sequence analysis is able to rapidly and efficiently screen for genetic alterations in neuromuscular disorders.

  19. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data.

    PubMed

    Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-03-01

    Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ ( geneanalytics.genecards.org ), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards-the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics

  20. AST: An Automated Sequence-Sampling Method for Improving the Taxonomic Diversity of Gene Phylogenetic Trees

    PubMed Central

    Zhou, Chan; Mao, Fenglou; Yin, Yanbin; Huang, Jinling; Gogarten, Johann Peter; Xu, Ying

    2014-01-01

    A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16 S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php. PMID:24892935
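
The abstract above does not describe how AST selects sequences, so the following Python sketch is not AST's algorithm. It only illustrates the general idea of taxonomy-aware subsampling, under the assumption that each sequence carries a taxon label: sequences are picked round-robin across taxa so that an over-represented species cannot dominate a small sample. The function name `sample_by_taxon` and the record layout are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_by_taxon(records, n):
    """Pick up to n sequence IDs, cycling across taxa (illustrative only).

    records: iterable of (seq_id, taxon) pairs; this layout is an assumption.
    Taxa are visited in first-seen order, one sequence per taxon per round.
    """
    if n <= 0:
        return []
    by_taxon = defaultdict(list)
    for seq_id, taxon in records:
        by_taxon[taxon].append(seq_id)

    picked = []
    # zip_longest yields one "round" at a time: the i-th sequence of every taxon,
    # padded with None for taxa that have run out of sequences.
    for round_ in zip_longest(*by_taxon.values()):
        for seq_id in round_:
            if seq_id is not None:
                picked.append(seq_id)
                if len(picked) == n:
                    return picked
    return picked
```

With three E. coli sequences and one sequence each from two other species, a sample of size three takes one sequence per species instead of three from E. coli.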

  1. Differentiation of Xylella fastidiosa Strains via Multilocus Sequence Analysis of Environmentally Mediated Genes (MLSA-E)

    PubMed Central

    Parker, Jennifer K.; Havird, Justin C.

    2012-01-01

    Isolates of the plant pathogen Xylella fastidiosa are genetically very similar, but studies on their biological traits have indicated differences in virulence and infection symptomatology. Taxonomic analyses have identified several subspecies, and phylogenetic analyses of housekeeping genes have shown broad host-based genetic differences; however, results are still inconclusive for genetic differentiation of isolates within subspecies. This study employs multilocus sequence analysis of environmentally mediated genes (MLSA-E; genes influenced by environmental factors) to investigate X. fastidiosa relationships and differentiate isolates with low genetic variability. Potential environmentally mediated genes, including host colonization and survival genes related to infection establishment, were identified a priori. The ratio of the rate of nonsynonymous substitutions to the rate of synonymous substitutions (dN/dS) was calculated to select genes that may be under increased positive selection compared to previously studied housekeeping genes. Nine genes were sequenced from 54 X. fastidiosa isolates infecting different host plants across the United States. Results of maximum likelihood (ML) and Bayesian phylogenetic (BP) analyses are in agreement with known X. fastidiosa subspecies clades but show novel within-subspecies differentiation, including geographic differentiation, and provide additional information regarding host-based isolate variation and specificity. dN/dS ratios of environmentally mediated genes, though <1 due to high sequence similarity, are significantly greater than housekeeping gene dN/dS ratios and correlate with increased sequence variability. MLSA-E can more precisely resolve relationships between closely related bacterial strains with low genetic variability, such as X. fastidiosa isolates. Discovering the genetic relationships between X. fastidiosa isolates will provide new insights into the epidemiology of populations of X. fastidiosa, allowing

  2. Versatile Cosmid Vectors for the Isolation, Expression, and Rescue of Gene Sequences: Studies with the Human α-globin Gene Cluster

    NASA Astrophysics Data System (ADS)

    Lau, Yun-Fai; Kan, Yuet Wai

    1983-09-01

    We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers (SV2-gpt, SV2-DHFR, and SV2-neo) in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α-globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α-globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α-globin genes were more actively expressed than the embryonic zeta-globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.

  3. Characterization of the complete mitochondrial genome of Spirocerca lupi: sequence, gene organization and phylogenetic implications

    PubMed Central

    2013-01-01

    Background: Spirocerca lupi is a life-threatening parasitic nematode of dogs that has a cosmopolitan distribution but is most prevalent in tropical and subtropical countries. Despite its veterinary importance in canids, the epidemiology, molecular ecology and population genetics of this parasite still remain unexplored. Methods: The complete mitochondrial (mt) genome of S. lupi was amplified in four overlapping long fragments using primers designed based on partial cox1, rrnS, cox2 and nad2 sequences. Phylogenetic reconstruction of 13 spirurid species (including S. lupi) was carried out using Bayesian inference (BI) based on concatenated amino acid sequence datasets. Results: The complete mt genome sequence of S. lupi is 13,780 bp in length, including 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lacks the atp8 gene. The gene arrangement is identical to that of Thelazia callipaeda (Thelaziidae) and Setaria digitata (Onchocercidae), but distinct from that of Dracunculus medinensis (Dracunculidae) and Heliconema longissimum (Physalopteridae). All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The content of A + T is 73.73% for S. lupi, in accordance with mt genomes of other spirurid nematodes sequenced to date. Phylogenetic analyses using concatenated amino acid sequences of the 12 protein-coding genes by BI showed that S. lupi (Thelaziidae) is closely related to the families Setariidae and Onchocercidae. Conclusions: The present study determined the complete mt genome sequence of S. lupi. This new mt genome dataset should provide novel mtDNA markers for studying the molecular epidemiology and population genetics of this parasite, and should have implications for the molecular diagnosis, prevention and control of spirocercosis in dogs and other canids. PMID:23433345

  4. The complete mitochondrial genome sequence and gene organization of Tridentiger trigonocephalus (Gobiidae: Gobionellinae) with phylogenetic consideration.

    PubMed

    Wei, Hongqing; Ma, Hongyu; Ma, Chunyan; Zhang, Fengying; Wang, Wei; Chen, Wei; Ma, Lingbo

    2016-09-01

    The complete mitochondrial genome plays an important role in studies of genome-level characteristics and phylogenetic relationships. Here we determined the complete mitogenome sequence of Tridentiger trigonocephalus (Perciformes, Gobiidae) and examined its phylogenetic relationships. This circular genome was 16 662 bp in length and consisted of 37 typical genes, including 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The gene order of the T. trigonocephalus mitochondrial genome was identical to that observed in most other vertebrates. Of the 37 genes, 28 were encoded on the heavy strand, while the others were encoded on the light strand. The phylogenetic tree constructed from 13 concatenated protein-coding genes showed that T. trigonocephalus was closest to T. bifasciatus, and then to T. barbatus, among the 20 species within suborder Gobioidei. This work should facilitate studies on population genetic diversity and molecular evolution in Gobioidei fishes. PMID:26370266

  5. Interference in transcription of overexpressed genes by promoter-proximal downstream sequences.

    PubMed

    Turchinovich, A; Surowy, H M; Tonevitsky, A G; Burwinkel, B

    2016-01-01

    Despite high sequence homology among the four human RNAi-effector Argonaute proteins and their coding sequences, the efficiency of ectopic overexpression of AGO3 and AGO4 coding sequences in human cells is greatly reduced compared to AGO1 and AGO2. While investigating this phenomenon, we documented the existence of a previously uncharacterized mechanism of gene expression regulation, which is manifested in greatly varying basal transcription levels from RNApolII promoters depending on the promoter-proximal downstream sequences. Specifically, we show that the differing overexpression of Argonaute coding sequences cannot be explained by mRNA degradation in the cytoplasm or nucleus, and arises at the transcriptional level. Furthermore, the first 1000-2000 nt located immediately downstream of the promoter had the most critical influence on ectopic gene overexpression. The transcription-inhibiting effect associated with those downstream sequences subsided with increasing distance from the promoter and correlated positively with promoter strength. We hypothesize that the same mechanism, which we named promoter proximal inhibition (PPI), could generally contribute to basal transcription levels of genes, and could be mainly responsible for the essence of difficult-to-express recombinant proteins. Finally, our data reveal that expression of recombinant proteins in human cells can be greatly enhanced by using more permissive promoter-adjacent downstream sequences. PMID:27485701

  6. Interference in transcription of overexpressed genes by promoter-proximal downstream sequences

    PubMed Central

    Turchinovich, A.; Surowy, H. M.; Tonevitsky, A. G.; Burwinkel, B.

    2016-01-01

    Despite high sequence homology among the four human RNAi-effector Argonaute proteins and their coding sequences, the efficiency of ectopic overexpression of AGO3 and AGO4 coding sequences in human cells is greatly reduced compared to AGO1 and AGO2. While investigating this phenomenon, we documented the existence of a previously uncharacterized mechanism of gene expression regulation, which is manifested in greatly varying basal transcription levels from RNApolII promoters depending on the promoter-proximal downstream sequences. Specifically, we show that the differing overexpression of Argonaute coding sequences cannot be explained by mRNA degradation in the cytoplasm or nucleus, and arises at the transcriptional level. Furthermore, the first 1000–2000 nt located immediately downstream of the promoter had the most critical influence on ectopic gene overexpression. The transcription-inhibiting effect associated with those downstream sequences subsided with increasing distance from the promoter and correlated positively with promoter strength. We hypothesize that the same mechanism, which we named promoter proximal inhibition (PPI), could generally contribute to basal transcription levels of genes, and could be mainly responsible for the essence of difficult-to-express recombinant proteins. Finally, our data reveal that expression of recombinant proteins in human cells can be greatly enhanced by using more permissive promoter-adjacent downstream sequences. PMID:27485701

  7. Complexity of genetic sequences modified by horizontal gene transfer and degraded-DNA uptake

    NASA Astrophysics Data System (ADS)

    Tremberger, George; Dehipawala, S.; Nguyen, A.; Cheung, E.; Sullivan, R.; Holden, T.; Lieberman, D.; Cheung, T.

    2015-09-01

    Horizontal gene transfer has been a major vehicle for efficient transfer of genetic material among living species and could be one of the sources of noncoding DNA incorporation into a genome. Our previous study of lncRNA sequence complexity in terms of fractal dimension and information entropy showed a tight regulation among the studied genes in numerous diseases. The role of sequence complexity in horizontally transferred genes was investigated, with Mealybug in symbiotic relation with a 139K genome microbe and Deinococcus radiodurans as examples. The fractal dimension and entropy showed a correlation R-sq of 0.82 (N = 6) for the studied Deinococcus radiodurans sequences. For comparison, the Deinococcus radiodurans oxidative stress tolerant catalase and superoxide dismutase genes under extracellular dGMP growth condition showed R-sq ~ 0.42 (N = 6), and the studied arsenate reductase horizontally transferred genes for toxicity survival in several microorganisms showed no correlation. Simulation results showed that R-sq < 0.4 would be improbable at less than one percent chance, suggestive of additional selection pressure when compared to the R-sq ~ 0.29 (N = 21) in the studied transferred genes in Mealybug. The mild correlation of R-sq ~ 0.5 for fractal dimension versus transcription level in the studied Deinococcus radiodurans sequences under extracellular dGMP growth condition would suggest that lower fractal dimension with less electron density fluctuation favors higher transcription level.
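
The abstract above reports correlations (R-sq) between fractal dimension and information entropy but does not say how either quantity was computed. As a rough illustration only, the Python sketch below shows one common way to obtain a Shannon entropy from k-mer frequencies, and the squared Pearson correlation between two lists of values. The function names, the k-mer formulation, and the choice of Pearson correlation are assumptions, and fractal dimension is not implemented here.

```python
import math
from collections import Counter


def shannon_entropy(seq, k=1):
    """Shannon entropy in bits of the k-mer frequency distribution of seq.

    Assumption: overlapping k-mers, case-insensitive; not taken from the study.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

A sequence using all four bases equally has the maximum single-base entropy of 2 bits, while a homopolymer has an entropy of 0.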

  8. Candida famata (Debaryomyces hansenii) DNA sequences containing genes involved in riboflavin synthesis.

    PubMed

    Voronovsky, Andriy Y; Abbas, Charles A; Dmytruk, Kostyantyn V; Ishchuk, Olena P; Kshanovska, Barbara V; Sybirna, Kateryna A; Gaillardin, Claude; Sibirny, Andriy A

    2004-11-01

    Previously cloned Candida famata (Debaryomyces hansenii) strain VKM Y-9 genomic DNA fragments containing genes RIB1 (codes for GTP cyclohydrolase II), RIB2 (encodes specific reductase), RIB5 (codes for dimethylribityllumazine synthase), RIB6 (encodes dihydroxybutanone phosphate synthase) and RIB7 (codes for riboflavin synthase) were sequenced. The derived amino acid sequences of C. famata RIB genes showed extensive homology to the corresponding sequences of riboflavin synthesis enzymes of other yeast species. The highest identity was observed to homologues of D. hansenii CBS767, as C. famata is the anamorph of this hemiascomycetous yeast. The D. hansenii CBS767 RIB3 gene encoding specific deaminase was cloned. This gene successfully complemented riboflavin auxotrophy of the rib3 mutant of flavinogenic yeast, Pichia guilliermondii. Putative iron-responsive elements (potential sites for binding of the transcription factors Fep1p or Aft1p and Aft2p) were found in the upstream regions of some C. famata and D. hansenii RIB genes. The sequences of C. famata RIB genes have been submitted to the EMBL data library under Accession Nos AJ810169-AJ810173. PMID:15543522

  9. Nucleotide sequence of the thermostable direct hemolysin gene of Vibrio parahaemolyticus.

    PubMed Central

    Nishibuchi, M; Kaper, J B

    1985-01-01

    The gene encoding the thermostable direct hemolysin of Vibrio parahaemolyticus was characterized. This gene (designated tdh) was subcloned into pBR322 in Escherichia coli, and the functional tdh gene was localized to a 1.3-kilobase HindIII fragment. This fragment was sequenced, and the structural gene was found to encode a mature protein of 165 amino acid residues. The mature protein sequence was preceded by a putative signal peptide sequence of 24 amino acids. A putative tdh promoter, determined by its similarity to consensus sequences, was not functional in E. coli. However, a promoter that was functional in E. coli was shown to exist further upstream by use of a promoter probe plasmid. A 5.7-kilobase SalI fragment containing the structural gene and both potential promoters was cloned into a broad-host-range plasmid and mobilized into a Kanagawa phenomenon-negative V. parahaemolyticus strain. In contrast to E. coli, where the hemolysin was detected only in cell lysates, introduction of the cloned gene into V. parahaemolyticus resulted in the production of extracellular hemolysin. PMID:3988703

  10. Alu sequence involvement in transcriptional insulation of the keratin 18 gene in transgenic mice.

    PubMed Central

    Thorey, I S; Ceceña, G; Reynolds, W; Oshima, R G

    1993-01-01

    The human keratin 18 (K18) gene is expressed in a variety of adult simple epithelial tissues, including liver, intestine, lung, and kidney, but is not normally found in skin, muscle, heart, spleen, or most of the brain. Transgenic animals derived from the cloned K18 gene express the transgene in appropriate tissues at levels directly proportional to the copy number and independently of the sites of integration. We have investigated in transgenic mice the dependence of K18 gene expression on the distal 5' and 3' flanking sequences and upon the RNA polymerase III promoter of an Alu repetitive DNA transcription unit immediately upstream of the K18 promoter. Integration site-independent expression of tandemly duplicated K18 transgenes requires the presence of either an 825-bp fragment of the 5' flanking sequence or the 3.5-kb 3' flanking sequence. Mutation of the RNA polymerase III promoter of the Alu element within the 825-bp fragment abolishes copy number-dependent expression in kidney but does not abolish integration site-independent expression when assayed in the absence of the 3' flanking sequence of the K18 gene. The characteristics of integration site-independent expression and copy number-dependent expression are separable. In addition, the formation of the chromatin state of the K18 gene, which likely restricts the tissue-specific expression of this gene, is not dependent upon the distal flanking sequences of the 10-kb K18 gene but rather may depend on internal regulatory regions of the gene. PMID:7692231

  11. Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

    USGS Publications Warehouse

    Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

    2004-01-01

    The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown) encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.

  12. Putative and unique gene sequence utilization for the design of species specific probes as modeled by Lactobacillus plantarum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...

  13. Improved efficiency in amplification of Escherichia coli o-antigen gene clusters using genome-wide sequence comparison

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...

  14. Molecular cloning, sequence characterization, and gene expression profiling of a novel water buffalo (Bubalus bubalis) gene, AGPAT6.

    PubMed

    Song, S; Huo, J L; Li, D L; Yuan, Y Y; Yuan, F; Miao, Y W

    2013-01-01

    Several 1-acylglycerol-3-phosphate-O-acyltransferases (AGPATs) can acylate lysophosphatidic acid to produce phosphatidic acid. Of the eight AGPAT isoforms, AGPAT6 is a crucial enzyme for glycerolipid and triacylglycerol biosynthesis in some mammalian tissues. We amplified and identified the complete coding sequence (CDS) of the water buffalo AGPAT6 gene by using the reverse transcription-polymerase chain reaction, based on the conserved sequence information of cattle or expressed sequence tags of other Bovidae species. This novel gene was deposited in the NCBI database (accession No. JX518941). Sequence analysis revealed that the CDS of this AGPAT6 encodes a 456-amino acid enzyme (molecular mass = 52 kDa; pI = 9.34). Water buffalo AGPAT6 contains three hydrophobic transmembrane regions and a 37-amino acid signal peptide, localized in the cytoplasm. The deduced amino acid sequences share 99, 98, 98, 97, 98, 98, 97 and 95% identity with their homologous sequences from cattle, horse, human, mouse, orangutan, pig, rat, and chicken, respectively. The phylogenetic tree analysis based on the AGPAT6 CDS showed that water buffalo has a closer genetic relationship with cattle than with other species. Tissue expression profile analysis shows that this gene is highly expressed in the mammary gland; moderately expressed in the heart, muscle, liver, and brain; weakly expressed in the pituitary gland, spleen, and lung; and almost silent in the small intestine, skin, kidney, and adipose tissues. Four predicted microRNA target sites are found in the water buffalo AGPAT6 CDS. These results will establish a foundation for further insights into this novel water buffalo gene. PMID:24114207

  15. Identification of Mycobacterium spp. of veterinary importance using rpoB gene sequencing

    PubMed Central

    2011-01-01

    Background: Studies conducted on Mycobacterium spp. isolated from human patients indicate that sequencing of a 711 bp portion of the rpoB gene can be useful in assigning a species identity, particularly for members of the Mycobacterium avium complex (MAC). Given that MAC are important pathogens in livestock, companion animals, and zoo/exotic animals, we were interested in evaluating the use of rpoB sequencing for identification of Mycobacterium isolates of veterinary origin. Results: A total of 386 isolates, collected from 2008 to June 2011 from 378 animals (amphibians, reptiles, birds, and mammals), underwent PCR and sequencing of a ~711 bp portion of the rpoB gene; 310 isolates (80%) were identified to the species level based on similarity at ≥ 98% with a reference sequence. The remaining 76 isolates (20%) displayed < 98% similarity with reference sequences and were assigned to a clade based on their location in a neighbor-joining tree containing reference sequences. For a subset of 236 isolates that received both 16S rRNA and rpoB sequencing, 167 (70%) displayed a similar species/clade assignation for both sequencing methods. For the remaining 69 isolates, species/clade identities were different with each sequencing method. Mycobacterium avium subsp. hominissuis was the species most frequently isolated from specimens from pigs, cervids, companion animals, cattle, and exotic/zoo animals. Conclusions: rpoB sequencing proved useful in identifying Mycobacterium isolates of veterinary origin to clade, species, or subspecies levels, particularly for assemblages (such as the MAC) where 16S rRNA sequencing alone is not adequate to demarcate these taxa. rpoB sequencing can represent a cost-effective identification tool suitable for routine use in the veterinary diagnostic laboratory. PMID:22118247
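
The decision rule described above (species-level identification when similarity to a reference sequence is at least 98%, otherwise fall back to clade placement) can be sketched in Python as follows. This is a minimal illustration, not the study's pipeline: it assumes the query and reference sequences are already aligned to equal length, and the function names `percent_identity` and `assign_species` are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def assign_species(query, references, threshold=98.0):
    """Return (species, identity) for the best reference hit.

    references: dict mapping species name -> aligned reference sequence.
    species is None when the best hit falls below the threshold, i.e. the
    isolate would need clade-level placement (for example in a tree) instead.
    """
    best_name, best_identity = None, -1.0
    for name, ref in references.items():
        identity = percent_identity(query, ref)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_identity >= threshold:
        return best_name, best_identity
    return None, best_identity
```

The 98% cutoff is the one stated in the abstract; everything else, including how gaps are treated, is an illustrative choice.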

  16. Next-Generation Sequencing of Apoptotic DNA Breakpoints Reveals Association with Actively Transcribed Genes and Gene Translocations

    PubMed Central

    Fullwood, Melissa J.; Lee, Joanne; Lin, Lifang; Li, Guoliang; Huss, Mikael; Ng, Patrick; Sung, Wing-Kin; Shenolikar, Shirish

    2011-01-01

    DNA fragmentation is a well-recognized hallmark of apoptosis. However, the precise DNA sequences cleaved during apoptosis triggered by distinct mechanisms remain unclear. We used next-generation sequencing of DNA fragments generated in Actinomycin D-treated human HL-60 leukemic cells to generate a high-throughput, global map of apoptotic DNA breakpoints. These data highlighted that DNA breaks are non-random and show a significant association with active genes and open chromatin regions. We noted that transcription factor binding sites were also enriched within a fraction of the apoptotic breakpoints. Interestingly, extensive apoptotic cleavage was noted within genes that are frequently translocated in human cancers. We speculate that the non-random fragmentation of DNA during apoptosis may contribute to gene translocations and the development of human cancers. PMID:22087219

  17. GeneAnalytics: An Integrative Gene Set Analysis Tool for Next Generation Sequencing, RNAseq and Microarray Data

    PubMed Central

    Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit

    2016-01-01

    Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics

  18. Complete mitogenome sequences of four flatfishes (Pleuronectiformes) reveal a novel gene arrangement of L-strand coding genes

    PubMed Central

    2013-01-01

    Background: Few mitochondrial gene rearrangements are found in vertebrates and large-scale changes in these genomes occur even less frequently. It is difficult, therefore, to propose a mechanism to account for observed changes in mitogenome structure. Mitochondrial gene rearrangements are usually explained by the recombination model or tandem duplication and random loss model. Results: In this study, the complete mitochondrial genomes of four flatfishes, Crossorhombus azureus (blue flounder), Grammatobothus krempfi, Pleuronichthys cornutus, and Platichthys stellatus were determined. A striking finding is that eight genes in the C. azureus mitogenome are located in a novel position, differing from that of available vertebrate mitogenomes. Specifically, the ND6 and seven tRNA genes (the Q, A, C, Y, S1, E, P genes) encoded by the L-strand have been translocated to a position between tRNA-T and tRNA-F though the original order of the genes is maintained. Conclusions: These special features are used to suggest a mechanism for C. azureus mitogenome rearrangement. First, a dimeric molecule was formed by two monomers linked head-to-tail, then one of the two sets of promoters lost function and the genes controlled by the disabled promoters became pseudogenes, non-coding sequences, and even were lost from the genome. This study provides a new gene-rearrangement model that accounts for the events of gene-rearrangement in a vertebrate mitogenome. PMID:23962312

  19. DNA sequencing and expression of the formyl coenzyme A transferase gene, frc, from Oxalobacter formigenes.

    PubMed Central

    Sidhu, H; Ogden, S D; Lung, H Y; Luttge, B G; Baetz, A L; Peck, A B

    1997-01-01

    Oxalic acid, a highly toxic by-product of metabolism, is catabolized by a limited number of bacterial species utilizing an activation-decarboxylation reaction which yields formate and CO2. frc, the gene encoding formyl coenzyme A transferase, an enzyme which transfers a coenzyme A moiety to activate oxalic acid, was cloned from the bacterium Oxalobacter formigenes. DNA sequencing revealed a single open reading frame of 1,284 bp capable of encoding a 428-amino-acid protein. A presumed promoter region and a rho-independent termination sequence suggest that this gene is part of a monocistronic operon. A PCR fragment containing the open reading frame, when overexpressed in Escherichia coli, produced a product exhibiting enzymatic activity similar to the purified native enzyme. With this, the two genes necessary for bacterial catabolism of oxalate, frc and oxc, have now been cloned, sequenced, and expressed. PMID:9150242

  20. Gene organization and complete sequence of the mitochondrial genome of Linwu mallard.

    PubMed

    Tian, Ke-Xiong; Liu, Li-Li; Yu, Qi-Fang; He, Shao-Ping; He, Jian-Hua

    2016-01-01

    Linwu mallard is an excellent native breed from Hunan province in China. This is the first study to determine the complete mitochondrial genome sequence of L. mallard using PCR-based amplification and Sanger sequencing. The characteristics of the entire mitochondrial genome were analyzed in detail, with a base composition of 29.19% A, 22.19% T, 32.83% C, and 15.79% G in L. mallard (16,605 bp in length). It contained 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of L. mallard will be useful for the phylogenetics of poultry and will be available as basic data for genetics and breeding. PMID:24938102
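
    The base composition reported in this record can be illustrated with a minimal Python sketch. The function name `base_composition` and the toy sequence are assumptions made for illustration; the sequence is not the Linwu mallard mitogenome, and this is not presented as the authors' procedure. Only the four percentages are taken from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, T, C and G among the unambiguous bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no A/T/C/G bases")
    return {b: 100.0 * counts[b] / total for b in "ATCG"}


# Toy sequence for illustration only (hypothetical, not real mitogenome data).
toy = "ACGTACCCAT"
toy_composition = base_composition(toy)

# Percentages as reported in the abstract.
reported = {"A": 29.19, "T": 22.19, "C": 32.83, "G": 15.79}
reported_total = sum(reported.values())       # the four values should sum to about 100%
reported_gc = reported["C"] + reported["G"]   # G+C content implied by the reported values
```

    Under these assumptions, the reported values sum to 100% and imply a G+C content of about 48.6%.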

  1. Next-generation sequencing approach for connecting secondary metabolites to biosynthetic gene clusters in fungi

    PubMed Central

    Cacho, Ralph A.; Tang, Yi; Chooi, Yit-Heng

    2015-01-01

    Genomics has revolutionized research on fungal secondary metabolite (SM) biosynthesis. To elucidate the molecular and enzymatic mechanisms underlying the biosynthesis of a specific SM compound, the important first step is often to find the genes that are responsible for its synthesis. The accessibility of fungal genome sequences allows the cumbersome traditional library construction and screening approach to be bypassed. Advances in next-generation sequencing (NGS) technologies have further improved the speed and reduced the cost of microbial genome sequencing in the past few years, which has accelerated research in this field. Here, we will present an example workflow for identifying the gene cluster encoding the biosynthesis of SMs of interest using an NGS approach. We will also review the different strategies that can be employed to pinpoint the targeted gene clusters rapidly by giving several examples stemming from our work. PMID:25642215

  2. Preliminary study on mitochondrial 16S rRNA gene sequences and phylogeny of flatfishes (Pleuronectiformes)

    NASA Astrophysics Data System (ADS)

    You, Feng; Liu, Jing; Zhang, Peijun; Xiang, Jianhai

    2005-09-01

    A 605 bp section of the mitochondrial 16S rRNA gene from Paralichthys olivaceus, Pseudorhombus cinnamomeus, Psetta maxima and Kareius bicoloratus, which represent 3 families of the order Pleuronectiformes, was amplified by PCR and sequenced to examine the molecular systematics of Pleuronectiformes by comparison with related gene sequences of 6 other flatfish downloaded from GenBank. Phylogenetic analysis based on genetic distances among the related gene sequences of the 10 flatfish showed that this method was well suited to exploring the relationships between species, genera and families. Phylogenetic trees constructed by the neighbor-joining, maximum parsimony and maximum likelihood methods accorded with the general rule of Pleuronectiformes evolution, but they also produced some confusion. In contrast to data from morphological characters, P. olivaceus clustered with K. bicoloratus, but P. cinnamomeus did not cluster with P. olivaceus, which merits further study.

  3. Isolation of laccase gene-specific sequences from white rot and brown rot fungi by PCR

    SciTech Connect

    D'Souza, T.M.; Boominathan, K.; Reddy, C.A.

    1996-10-01

    Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.

  4. Complete nucleotide sequence of the structural gene for alkaline proteinase from Pseudomonas aeruginosa IFO 3455.

    PubMed Central

    Okuda, K; Morihara, K; Atsumi, Y; Takeuchi, H; Kawamoto, S; Kawasaki, H; Suzuki, K; Fukushima, J

    1990-01-01

    The DNA encoding alkaline proteinase (AP) of Pseudomonas aeruginosa IFO 3455 was cloned, and its complete nucleotide sequence was determined. When the cloned gene was ligated to pUC18, the Escherichia coli expression vector, the gene-incorporated bacteria expressed high levels of both AP activity and AP antigens. The amino acid sequence deduced from the nucleotide sequence revealed that the mature AP consists of 467 amino acids with a relative molecular weight of 49,507. The amino acid composition predicted from the DNA sequence was similar to the chemically determined composition of purified AP reported previously. The amino acid sequence analysis revealed that both the N-terminal side sequence of the purified AP and several internal lysyl peptide fragments were identical to the deduced amino acid sequences. The percent homology of amino acid sequences between AP and Serratia protease was about 55%. The zinc ligands and an active site of the AP were predicted by comparing the structure of the enzyme with those of Serratia protease, thermolysin, Bacillus subtilis neutral protease, and Pseudomonas elastase. PMID:2123832

  5. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
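
    The abstract above describes N50 as a statistical measure of assembly quality. A minimal Python sketch of the usual definition follows: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The contig lengths are hypothetical toy values, not marmoset data, and the genome-size figure is simple arithmetic on the two numbers quoted in the abstract (181 Gbp of bases at approximately 60× coverage).

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)

# Genome size implied by the abstract's figures: total bases / coverage.
implied_genome_size_bp = 181e9 / 60   # roughly 3 Gbp
```

    Because N50 depends only on the length distribution, merging previously unmapped contigs into longer chromosomal contigs raises it, which is consistent with the doubling reported in the abstract.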

  6. Candidate Resistant Genes of Sand Pear (Pyrus pyrifolia Nakai) to Alternaria alternata Revealed by Transcriptome Sequencing

    PubMed Central

    Yang, Xiaoping; Hu, Hongju; Yu, Dazhao; Sun, Zhonghai; He, Xiujuan; Zhang, Jingguo; Chen, Qiliang; Tian, Rui; Fan, Jing

    2015-01-01

    Pear black spot (PBS) disease, which is caused by Alternaria alternata (Aa), is one of the most serious diseases affecting sand pear (Pyrus pyrifolia Nakai) cultivation worldwide. To investigate the defense mechanisms of sand pear in response to Aa, the transcriptome of a sand pear germplasm with differential resistance to Aa was analyzed using Illumina paired-end sequencing. Four libraries derived from PBS-resistant and PBS-susceptible sand pear leaves were characterized through inoculation or mock-inoculation. In total, 20.5 Gbp of sequence data and 101,632,565 reads were generated, representing 44717 genes. Approximately 66% of the genes or sequenced reads could be aligned to the pear reference genome. A large number (5213) of differentially expressed genes related to PBS resistance were obtained; 34 microsatellites were detected in these genes, and 28 genes were found to be closely related to PBS resistance. Using a transcriptome analysis in response to PBS inoculation and comparison analysis to the PHI database, 4 genes (Pbr039001, Pbr001627, Pbr025080 and Pbr023112) were considered to be promising candidates for sand pear resistance to PBS. This study provides insight into changes in the transcriptome of sand pear in response to PBS infection, and the findings have improved our understanding of the resistance mechanism of sand pear to PBS and will facilitate future gene discovery and functional genome studies of sand pear. PMID:26292286

  7. Whole Exome Sequencing in Females with Autism Implicates Novel and Candidate Genes

    PubMed Central

    Butler, Merlin G.; Rafi, Syed K.; Hossain, Waheeda; Stephan, Dietrich A.; Manzardo, Ann M.

    2015-01-01

    Classical autism or autistic disorder belongs to a group of genetically heterogeneous conditions known as Autism Spectrum Disorders (ASD). Heritability is estimated as high as 90% for ASD with a recently reported compilation of 629 clinically relevant candidate and known genes. We chose to undertake a descriptive next generation whole exome sequencing case study of 30 well-characterized Caucasian females with autism (average age, 7.7 ± 2.6 years; age range, 5 to 16 years) from multiplex families. Genomic DNA was used for whole exome sequencing via paired-end next generation sequencing approach and X chromosome inactivation status. The list of putative disease causing genes was developed from primary selection criteria using machine learning-derived classification score and other predictive parameters (GERP2, PolyPhen2, and SIFT). We narrowed the variant list to 10 to 20 genes and screened for biological significance including neural development, function and known neurological disorders. Seventy-eight genes identified met selection criteria ranging from 1 to 9 filtered variants per female. Five females presented with functional variants of X-linked genes (IL1RAPL1, PIR, GABRQ, GPRASP2, SYTL4) with cadherin, protocadherin and ankyrin repeat gene families most commonly altered (e.g., CDH6, FAT2, PCDH8, CTNNA3, ANKRD11). Other genes related to neurogenesis and neuronal migration (e.g., SEMA3F, MIDN), were also identified. PMID:25574603

  8. Predisposition gene identification in common cancers by exome sequencing: insights from familial breast cancer

    PubMed Central

    Snape, Katie; Ruark, Elise; Tarpey, Patrick; Renwick, Anthony; Turnbull, Clare; Seal, Sheila; Murray, Anne; Hanks, Sandra; Douglas, Jenny; Stratton, Michael R.; Rahman, Nazneen

    2013-01-01

    The genetic component of breast cancer predisposition remains largely unexplained. Candidate-gene case-control resequencing has identified predisposition genes characterised by rare, protein truncating mutations that confer moderate risks of disease. In theory, exome sequencing should yield additional genes of this class. Here, we explore the feasibility and design considerations of this approach. We performed exome sequencing in 50 individuals with familial breast cancer, applying frequency and protein function filters to identify variants most likely to be pathogenic. We identified 867,378 variants that passed the call quality filters of which 1,296 variants passed the frequency and protein truncation filters. The median number of validated, rare, protein truncating variants (PTVs) was 10 in individuals with, and without, mutations in known genes. The functional candidacy of mutated genes was similar in both groups. Without prior knowledge, the known genes would not have been recognisable as breast cancer predisposition genes. Everyone carries multiple rare mutations that are plausibly related to disease. Exome sequencing in common conditions will therefore require intelligent sample and variant prioritisation strategies in large case-control studies to deliver robust genetic evidence of disease association. PMID:22527104

  9. A Synthesis Method of Gene Networks Having Cyclic Expression Pattern Sequences by Network Learning

    NASA Astrophysics Data System (ADS)

    Mori, Yoshihiro; Kuroe, Yasuaki

    Recently, synthesis of gene networks having desired functions has become of interest to many researchers because it is a complementary approach to understanding gene networks, and it could be the first step in controlling living cells. There exist several periodic phenomena in cells, e.g. circadian rhythm. These phenomena are considered to be generated by gene networks. We have already proposed a synthesis method of gene networks based on gene expression. The method is applicable to synthesizing gene networks possessing the desired cyclic expression pattern sequences. It ensures that the realized expression pattern sequences are periodic; however, it does not ensure that their corresponding solution trajectories are periodic, which may mean that their oscillations are not persistent. In this paper, in order to resolve the problem, we propose a synthesis method of gene networks possessing the desired cyclic expression pattern sequences together with their corresponding solution trajectories being periodic. In the proposed method, the persistent oscillations of the solution trajectories are realized by specifying their passing points.

  10. Identification, sequencing and structural analysis of a nifA-like gene of Acetobacter diazotrophicus.

    PubMed

    Teixeira, K R; Morgan, T; Meletzus, D; Galler, R; Baldani, J I; Kennedy, C

    1999-01-01

    A recombinant plasmid, pAD101, containing a DNA fragment of Acetobacter diazotrophicus strain PAL5 was isolated by its ability to restore the Nif+ phenotype to a nifA- ntrC- double mutant of Azotobacter vinelandii. Hybridization with the nifA genes of Azospirillum brasilense located the nifA gene more precisely to specific fragments of pAD101. DNA sequencing of appropriate subclones of pAD101 revealed that the nifA gene was adjacent to the nifB gene in A. diazotrophicus, and the 5' end of the nifB gene was located downstream of the nitrogenase MoFe subunit gene, nifK. The deduced amino acid sequences of the A. diazotrophicus nifA and nifB genes were most similar to the NifA and NifB proteins of Azorhizobium caulinodans and Rhodobacter capsulatus, respectively. In addition, nucleotide sequences upstream of the A. diazotrophicus nifA-encoding region indicate features similar to those in the A. caulinodans nifA promoter region involved in O2 and fixed N regulation of nifA expression. PMID:10530336

  11. Extraordinary sequence divergence at Tsga8, an X-linked gene involved in mouse spermiogenesis.

    PubMed

    Good, Jeffrey M; Vanderpool, Dan; Smith, Kimberly L; Nachman, Michael W

    2011-05-01

    The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion-deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5' and 3' ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189

  12. Gene Sequence Variability of the Three Surface Proteins of Human Respiratory Syncytial Virus (HRSV) in Texas

    PubMed Central

    Tapia, Lorena I.; Shaw, Chad A.; Aideyan, Letisha O.; Jewell, Alan M.; Dawson, Brian C.; Haq, Taha R.; Piedra, Pedro A.

    2014-01-01

    Human respiratory syncytial virus (HRSV) has three surface glycoproteins: small hydrophobic (SH), attachment (G) and fusion (F), encoded by three consecutive genes (SH-G-F). A 270-nt fragment of the G gene is used to genotype HRSV isolates. This study genotyped and investigated the variability of the gene and amino acid sequences of the three surface proteins of HRSV strains collected from 1987 to 2005 from one center. Sixty original clinical isolates and 5 prototype strains were analyzed. Sequences containing SH, F and G genes were generated, and multiple alignments and phylogenetic trees were analyzed. Genetic variability by protein domains comparing virus genotypes was assessed. Complete sequences of the SH-G-F genes were obtained for all 65 samples: HRSV-A = 35; HRSV-B = 30. In group A strains, genotypes GA5 and GA2 were predominant. For HRSV-B strains, the genotype GB4 was predominant from 1992 to 1994 and only genotype BA viruses were detected in 2004–2005. Different genetic variability at nucleotide level was detected between the genes, with G gene being the most variable and the highest variability detected in the 270-nt G fragment that is frequently used to genotype the virus. High variability (>10%) was also detected in the signal peptide and transmembrane domains of the F gene of HRSV-A strains. Variability among the HRSV strains resulting in non-synonymous changes was detected in hypervariable domains of G protein, the signal peptide of the F protein, a not previously defined domain in the F protein, and the antigenic site Ø in the pre-fusion F. Divergent trends were observed between HRSV-A and HRSV-B groups for some functional domains. A diverse population of HRSV-A and HRSV-B genotypes circulated in Houston during an 18-year period. We hypothesize that diverse sequence variation of the surface protein genes provides HRSV strains a survival advantage in a partially immune-protected community. PMID:24625544

  13. G-boxes, bigfoot genes, and environmental response: characterization of intragenomic conserved noncoding sequences in Arabidopsis.

    PubMed

    Freeling, Michael; Rapaka, Lakshmi; Lyons, Eric; Pedersen, Brent; Thomas, Brian C

    2007-05-01

    A tetraploidy left Arabidopsis thaliana with 6358 pairs of homoeologs that, when aligned, generated 14,944 intragenomic conserved noncoding sequences (CNSs). Our previous work assembled these phylogenetic footprints into a database. We show that known transcription factor (TF) binding motifs, including the G-box, are overrepresented in these CNSs. A total of 254 genes spanning long lengths of CNS-rich chromosomes (Bigfoot) dominate this database. Therefore, we made subdatabases: one containing Bigfoot genes and the other containing genes with three to five CNSs (Smallfoot). Bigfoot genes are generally TFs that respond to signals, with their modal CNS positioned 3.1 kb 5' from the ATG. Smallfoot genes encode components of signal transduction machinery, the cytoskeleton, or involve transcription. We queried each subdatabase with each possible 7-nucleotide sequence. Among hundreds of hits, most were purified from CNSs, and almost all of those significantly enriched in CNSs had no experimental history. The 7-mers in CNSs are not 5'- to 3'-oriented in Bigfoot genes but are often oriented in Smallfoot genes. CNSs with one G-box tend to have two G-boxes. CNSs were shared with the homoeolog only and with no other gene, suggesting that binding site turnover impedes detection. Bigfoot genes may function in adaptation to environmental change. PMID:17496117
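
    The abstract above mentions querying with each possible 7-nucleotide sequence. A minimal Python sketch of that kind of exhaustive k-mer query follows, assuming plain exact-match counting on one strand. The function names and the toy sequences are hypothetical, and this is not presented as the authors' pipeline or their enrichment statistic. The toy sequences are built around CACGTG, the commonly cited G-box core.

```python
from collections import Counter
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the alphabet (4**k strings for DNA)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def count_kmers(seqs, k):
    """Count overlapping k-mer occurrences across a list of sequences (one strand only)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical toy "CNS" sequences, for illustration only.
toy_cns = ["ACACGTGA", "TTCACGTGG"]
seven_mers = all_kmers(7)            # 4**7 = 16384 candidate queries
hits = count_kmers(toy_cns, 7)
observed = [kmer for kmer in seven_mers if hits[kmer] > 0]
```

    A real analysis would compare these counts against a background set (for example, non-CNS sequence of matched length) and would consider both strands; both steps are omitted here.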

  14. Nucleotide sequence and expression of alpha-glucosidase-encoding gene (agdA) from Aspergillus oryzae.

    PubMed

    Minetoki, T; Gomi, K; Kitamoto, K; Kumagai, C; Tamura, G

    1995-08-01

    We have isolated an alpha-glucosidase (AGL)-encoding gene (agdA) from Aspergillus oryzae by heterologous hybridization using the corresponding Aspergillus niger gene as a probe. Southern hybridization analysis showed that the agdA gene is on a 5.0-kb ScaI fragment and there is a single copy in the A. oryzae chromosome. Comparison with the A. niger agdA gene indicated that the agdA gene contains three putative introns from 52 to 59 nucleotides long, and that it encodes 985 amino acid residues. The deduced amino acid sequence of A. oryzae AGL is 78% homologous with the A. niger AGL. The high degree of homology between this enzyme and a number of AGL enzymes in the amino acid sequence bordering the putative catalytic residue suggests that Asp492 is a catalytic residue of A. oryzae AGL. The cloned gene was functional. Transformants of A. oryzae containing multiple copies of the cloned agdA gene showed a 6- to 16-fold increase in AGL activity. Like the Taka-amylase A and glucoamylase genes of A. oryzae, expression of the agdA gene was induced when maltose was provided as a carbon source, but expression was not induced by glucose. This result suggested that cis-element(s) involved in maltose induction may also be present in the agdA promoter region. PMID:7549103

  15. Evolutionary Analysis of Sequence Divergence and Diversity of Duplicate Genes in Aspergillus fumigatus

    PubMed Central

    Yang, Ence; Hulse, Amanda M.; Cai, James J.

    2012-01-01

    Gene duplication, as a major source of novel genetic material, plays an important role in evolution. In this study, we focus on duplicate genes in Aspergillus fumigatus, a ubiquitous filamentous fungus causing life-threatening human infections. We characterize the extent and evolutionary patterns of the duplicate genes in the genome of A. fumigatus. Our results show that A. fumigatus contains a large number of duplicate genes with pronounced sequence divergence between two copies, and approximately 10% of them diverge asymmetrically, i.e. two copies of a duplicate gene pair diverge at significantly different rates. We use a Bayesian approach of the McDonald-Kreitman test to infer distributions of selective coefficients γ(=2Nes) and find that (1) the values of γ for two copies of duplicate genes co-vary positively and (2) the average γ for the two copies differs between genes from different gene families. This analysis highlights the usefulness of combining divergence and diversity data in studying the evolution of duplicate genes. Taken together, our results provide further support and refinement to the theories of gene duplication. Through characterizing the duplicate genes in the genome of A. fumigatus, we establish a computational framework, including parameter settings and methods, for comparative study of genetic redundancy and gene duplication between different fungal species. PMID:23225993

  16. Simultaneous Sequencing of 24 Genes Associated with Steroid-Resistant Nephrotic Syndrome

    PubMed Central

    McCarthy, Hugh J.; Bierzynska, Agnieszka; Wherlock, Matt; Ognjanovic, Milos; Kerecuk, Larissa; Hegde, Shivaram; Feather, Sally; Gilbert, Rodney D.; Krischock, Leah; Jones, Caroline; Sinha, Manish D.; Webb, Nicholas J.A.; Christian, Martin; Williams, Margaret M.; Marks, Stephen; Koziell, Ania; Welsh, Gavin I.

    2013-01-01

    Summary Background and objectives Up to 95% of children presenting with steroid-resistant nephrotic syndrome in early life will have a pathogenic single-gene mutation in 1 of 24 genes currently associated with this disease. Others may be affected by polymorphic variants. There is currently no accepted diagnostic algorithm for clinical genetic testing. The hypothesis was that the increasing reliability of next generation sequencing allows comprehensive one-step genetic investigation of this group and similar patient groups. Design, setting, participants, & measurements This study used next generation sequencing to screen 446 genes, including the 24 genes known to be associated with hereditary steroid-resistant nephrotic syndrome. The first 36 pediatric patients collected through a national United Kingdom Renal Registry were chosen with comprehensive phenotypic detail. Significant variants detected by next generation sequencing were confirmed by conventional Sanger sequencing. Results Analysis revealed known and novel disease-associated variations in expected genes such as NPHS1, NPHS2, and PLCe1 in 19% of patients. Phenotypically unexpected mutations were also detected in COQ2 and COL4A4 in two patients with isolated nephropathy and associated sensorineural deafness, respectively. The presence of an additional heterozygous polymorphism in WT1 in a patient with NPHS1 mutation was associated with earlier-onset disease, supporting modification of phenotype through genetic epistasis. Conclusions This study shows that next generation sequencing analysis of pediatric steroid-resistant nephrotic syndrome patients is accurate and revealing. This analysis should be considered part of the routine genetic workup of diseases such as childhood steroid-resistant nephrotic syndrome, where the chance of genetic mutation is high but requires sequencing of multiple genes. PMID:23349334

  17. Analysis of tandem gene copies in maize chromosomal regions reconstructed from long sequence reads

    PubMed Central

    Dong, Jiaqiang; Feng, Yaping; Kumar, Dibyendu; Zhang, Wei; Zhu, Tingting; Luo, Ming-Cheng; Messing, Joachim

    2016-01-01

    Haplotype variation not only involves SNPs but also insertions and deletions, in particular gene copy number variations. However, comparisons of individual genomes have been difficult because traditional sequencing methods give reads that are too short to unambiguously reconstruct chromosomal regions containing repetitive DNA sequences. An example of such a case is the protein gene family in maize that acts as a sink for reduced nitrogen in the seed. Previously, 41–48 gene copies of the alpha zein gene family that spread over six loci spanning between 30- and 500-kb chromosomal regions have been described in two Iowa Stiff Stalk (SS) inbreds. Analyses of those regions were possible because of overlapping BAC clones, generated by an expensive and labor-intensive approach. Here we used single-molecule real-time (Pacific Biosciences) shotgun sequencing to assemble the six chromosomal regions from the Non-Stiff Stalk maize inbred W22 from a single DNA sequence dataset. To validate the reconstructed regions, we developed an optical map (BioNano genome map; BioNano Genomics) of W22 and found agreement between the two datasets. Using the sequences of full-length cDNAs from W22, we found that the error rate of PacBio sequencing seemed to be less than 0.1% after autocorrection and assembly. Expressed genes, some with premature stop codons, are interspersed with nonexpressed genes, giving rise to genotype-specific expression differences. Alignment of these regions with the previously analyzed regions of SS lines exhibits in part dramatic differences between these two heterotic groups. PMID:27354512

  18. Hunting Down Frame Shifts: Ecological Analysis of Diverse Functional Gene Sequences

    PubMed Central

    Strejcek, Michal; Wang, Qiong; Ridl, Jakub; Uhlik, Ondrej

    2015-01-01

    Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frameshifts (FS). Genes encoding for alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 44% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of maximum expected error filtering and single linkage pre-clustering proved to be the most efficient read processing approach. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study or available at https://github.com/strejcem/FBdenovo. The tool was also implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/. PMID:26635739
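
    The maximum expected error filtering mentioned in the abstract above can be sketched in a few lines of Python. The sketch assumes Phred+33 quality strings, as in standard FASTQ files, and uses the usual definition of expected errors as the sum of per-base error probabilities 10^(-Q/10). The function names and the threshold of 1.0 are illustrative assumptions, not values taken from the study.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_max_ee(quality, max_ee=1.0):
    """Keep a read only if its expected number of errors does not exceed max_ee.

    max_ee=1.0 is a hypothetical threshold chosen for illustration.
    """
    return expected_errors(quality) <= max_ee


# Toy quality strings: 'I' encodes Q40 and '+' encodes Q10 under Phred+33.
high_quality = "IIII"        # 4 bases at Q40
low_quality = "+" * 12       # 12 bases at Q10

keep_high = passes_max_ee(high_quality)
keep_low = passes_max_ee(low_quality)
```

    Filtering on the summed error probability, rather than on mean quality, penalizes long reads with many mediocre bases, which matters when downstream steps depend on an intact reading frame.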

  19. Nucleotide sequence variation of chitin synthase genes among ectomycorrhizal fungi and its potential use in taxonomy.

    PubMed Central

    Mehmann, B; Brunner, I; Braus, G H

    1994-01-01

    DNA sequences of single-copy genes coding for chitin synthases (UDP-N-acetyl-D-glucosamine:chitin 4-beta-N-acetylglucosaminyltransferase; EC 2.4.1.16) were used to characterize ectomycorrhizal fungi. Degenerate primers deduced from short, completely conserved amino acid stretches flanking a region of about 200 amino acids of zymogenic chitin synthases allowed the amplification of DNA fragments of several members of this gene family. Different DNA band patterns were obtained from basidiomycetes because of variation in the number and length of amplified fragments. Cloning and sequencing of the most prominent DNA fragments revealed that these differences were due to various introns at conserved positions. The presence of introns in basidiomycetous fungi therefore has a potential use in identification of genera by analyzing PCR-generated DNA fragment patterns. Analyses of the nucleotide sequences of cloned fragments revealed variations in nucleotide sequences from 4 to 45%. By comparison of the deduced amino acid sequences, the majority of the DNA fragments were identified as members of genes for chitin synthase class II. The deduced amino acid sequences from species of the same genus differed only in one amino acid residue, whereas identity between the amino acid sequences of ascomycetous and basidiomycetous fungi within the same taxonomic class was found to be approximately 43 to 66%. Phylogenetic analysis of the amino acid sequence of class II chitin synthase-encoding gene fragments by using parsimony confirmed the current taxonomic groupings. In addition, our data revealed a fourth class of putative zymogenic chitin synthesis. PMID:7944356

  20. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence

    PubMed Central

    Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    The API 20E strip test, the standard for Enterobacteriaceae identification, cannot discriminate some Yersinia species because certain biochemical reactions are unstable and some species share the same biochemical profile (e.g. Yersinia frederiksenii and Yersinia intermedia); identifying these species requires a variety of auxiliary molecular biology methods. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, for each strain we randomly selected five 16S rRNA gene clones from 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain; its phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, by integrating information from multiple 16S rRNA gene sequences, our method improves discrimination among Yersinia species. PMID:26808495

  1. Nucleotide sequence of the gene for the b subunit of human factor XIII

    SciTech Connect

    Bottenus, R.E.; Ichinose, A.; Davie, E.W.

    1990-12-01

    Factor XIII (Mr 320 000) is a blood coagulation factor that stabilizes and strengthens the fibrin clot. It circulates in blood as a tetramer composed of two a subunits (Mr 75 000 each) and two b subunits (Mr 80 000 each). The b subunit consists of 641 amino acids and includes 10 tandem repeats of 60 amino acids known as GP-I structures, short consensus repeats (SCR), or sushi domains. In the present study, the human gene for the b subunit has been isolated from three different genomic libraries prepared in λ phage. Fifteen independent phage with inserts coding for the entire gene were isolated and characterized by restriction mapping, Southern blotting, and DNA sequencing. The gene was found to be 28 kilobases in length and consisted of 12 exons (I-XII) separated by 11 intervening sequences. The leader sequence was encoded by exon I, while the carboxyl-terminal region of the protein was encoded by exon XII. Exons II-XI each coded for a single sushi domain, suggesting that the gene evolved through exon shuffling and duplication. The 12 exons in the gene ranged in size from 64 to 222 base pairs, while the introns ranged in size from 87 to 9970 nucleotides and made up 92% of the gene. One nucleotide change was found in the coding region of the gene when its sequence was compared to that of the cDNA. This difference, however, did not result in a change in the amino acid sequence of the protein.

  2. Sequence and regulation of a gene encoding a human 89-kilodalton heat shock protein

    SciTech Connect

    Hickey, E.; Brandon, S.E.; Weber, L.A.; Lloyd, D.

    1989-06-01

    Vertebrate cells synthesize two forms of the 82- to 90-kilodalton heat shock protein that are encoded by distinct gene families. In HeLa cells, both proteins (hsp89α and hsp89β) are abundant under normal growth conditions and are synthesized at increased rates in response to heat stress. Only the larger form, hsp89α, is induced by the adenovirus E1A gene product. The authors have isolated a human hsp89α gene that shows complete sequence identity with heat- and E1A-inducible cDNA used as a hybridization probe. The 5'-flanking region contained overlapping and inverted consensus heat shock control elements that can confer heat-inducible expression on a β-globin reporter gene. The gene contained 10 intervening sequences. The first intron was located adjacent to the translation start codon, an arrangement also found in the Drosophila hsp82 gene. The spliced mRNA sequence contained a single open reading frame encoding an 84,564-dalton polypeptide showing high homology with the hsp82 to hsp90 proteins of other organisms. The deduced hsp89α protein sequence differed from the human hsp89β sequence reported elsewhere in at least 99 out of the 732 amino acids. Transcription of the hsp89α gene was induced by serum during normal cell growth, but expression did not appear to be restricted to a particular stage of the cell cycle. hsp89α mRNA was considerably more stable than the mRNA encoding hsp70, which can account for the higher constitutive rate of hsp89 synthesis in unstressed cells.

  3. Yersinia spp. Identification Using Copy Diversity in the Chromosomal 16S rRNA Gene Sequence.

    PubMed

    Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin

    2016-01-01

    The API 20E strip test, the standard for Enterobacteriaceae identification, cannot discriminate some Yersinia species because certain biochemical reactions are unstable and some species share the same biochemical profile (e.g. Yersinia frederiksenii and Yersinia intermedia); identifying these species requires a variety of auxiliary molecular biology methods. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, for each strain we randomly selected five 16S rRNA gene clones from 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain; its phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, although this does not affect the overall efficiency of the species classification. In conclusion, by integrating information from multiple 16S rRNA gene sequences, our method improves discrimination among Yersinia species. PMID:26808495

  4. Targeted next-generation sequencing of deafness genes in hearing-impaired individuals uncovers informative mutations

    PubMed Central

    Vona, Barbara; Müller, Tobias; Nanda, Indrajit; Neuner, Cordula; Hofrichter, Michaela A. H.; Schröder, Jörg; Bartsch, Oliver; Läßig, Anne; Keilmann, Annerose; Schraven, Sebastian; Kraus, Fabian; Shehata-Dieler, Wafaa; Haaf, Thomas

    2014-01-01

    Purpose: Targeted next-generation sequencing provides a remarkable opportunity to identify variants in known disease genes, particularly in extremely heterogeneous disorders such as nonsyndromic hearing loss. The present study attempts to shed light on the complexity of hearing impairment. Methods: Using one of two next-generation sequencing panels containing either 80 or 129 deafness genes, we screened 30 individuals with nonsyndromic hearing loss (from 23 unrelated families) and analyzed 9 normal-hearing controls. Results: Overall, we found an average of 3.7 variants (in 80 genes) with deleterious prediction outcome, including a number of novel variants, in individuals with nonsyndromic hearing loss and 1.4 in controls. By next-generation sequencing alone, 12 of 23 (52%) probands were diagnosed with monogenic forms of nonsyndromic hearing loss; one individual displayed a DNA sequence mutation together with a microdeletion. Two (9%) probands have Usher syndrome. In the undiagnosed individuals (10/23; 43%) we detected a significant enrichment of potentially pathogenic variants as compared to controls. Conclusion: Next-generation sequencing combined with microarrays provides the diagnosis for approximately half of the GJB2 mutation–negative individuals. Usher syndrome was found to be more frequent in the study cohort than anticipated. The conditions in a proportion of individuals with nonsyndromic hearing loss, particularly in the undiagnosed group, may have been caused or modified by an accumulation of unfavorable variants across multiple genes. PMID:24875298

  5. Phylogeny and PCR-based classification of Wolbachia strains using wsp gene sequences.

    PubMed Central

    Zhou, W; Rousset, F; O'Neil, S

    1998-01-01

    Wolbachia are a group of intracellular inherited bacteria that infect a wide range of arthropods. They are associated with a number of different reproductive phenotypes in their hosts, such as cytoplasmic incompatibility, parthenogenesis and feminization. While it is known that the bacterial strains responsible for these different host phenotypes form a single clade within the alpha-Proteobacteria, until now it has not been possible to resolve the evolutionary relationships between different Wolbachia strains. To address this issue we have cloned and sequenced a gene encoding a surface protein of Wolbachia (wsp) from a representative sample of 28 Wolbachia strains. The sequences from this gene were highly variable and could be used to resolve the phylogenetic relationships of different Wolbachia strains. Based on the sequence of the wsp gene from different Wolbachia isolates we propose that the Wolbachia pipientis clade be initially divided into 12 groups. As more sequence information becomes available we expect the number of such groups to increase. In addition, we present a method of Wolbachia classification based on the use of group-specific wsp polymerase chain reaction (PCR) primers which will allow Wolbachia isolates to be typed without the need to clone and sequence individual Wolbachia genes. This system should facilitate future studies investigating the distribution and biology of Wolbachia strains from large samples of different host species. PMID:9569669

  6. Nucleotide sequences of immunoglobulin eta genes of chimpanzee and orangutan: DNA molecular clock and hominoid evolution

    SciTech Connect

    Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.

    1987-02-01

    To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
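    The arithmetic behind a silent molecular clock can be sketched as follows. The abstract gives only the calibrated rate and the resulting dates, not the formula. This sketch assumes the standard relation T = K / (2r), where K is the silent-site divergence between two lineages and the factor of 2 accounts for substitutions accumulating along both branches. The function names are hypothetical, and the divergence values are back-calculated for illustration rather than reported in the study.

```python
# Rate quoted in the abstract: substitutions per silent site per year.
SILENT_RATE = 1.56e-9


def divergence_time_years(k_silent, rate=SILENT_RATE):
    """Estimate time since two lineages split, assuming T = K / (2r)."""
    return k_silent / (2.0 * rate)


def implied_silent_divergence(time_years, rate=SILENT_RATE):
    """Silent-site divergence implied by a split time, K = 2rT."""
    return 2.0 * rate * time_years


# Back-calculate the divergence implied by the reported mean dates.
for label, t_years in [("chimpanzee", 6.4e6), ("orangutan", 17.3e6)]:
    k = implied_silent_divergence(t_years)
    print(label, round(k, 4), round(divergence_time_years(k) / 1e6, 1))
```

    Under these assumptions, the reported dates correspond to silent-site divergences of roughly 2% (chimpanzee) and 5.4% (orangutan) relative to human.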

  7. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering

    PubMed Central

    Kelley, David R.; Liu, Bo; Delcher, Arthur L.; Pop, Mihai; Salzberg, Steven L.

    2012-01-01

    Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system Glimmer-MG that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences to model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested. PMID:22102569

  8. Soluble normal and mutated DNA sequences from single-copy genes in human blood.

    PubMed

    Sorenson, G D; Pribish, D M; Valone, F H; Memoli, V A; Bzik, D J; Yao, S L

    1994-01-01

    Healthy individuals have soluble (extracellular) DNA in their blood, and increased amounts are present in cancer patients. Here we report the detection of specific sequences of the cystic fibrosis and K-ras genes in plasma DNA from normal donors by amplification with the polymerase chain reaction. In addition, mutated K-ras sequences are identified by polymerase chain reaction utilizing allele-specific primers in the plasma or serum from three patients with pancreatic carcinoma that contain mutated K-ras genes. The mutations are confirmed by direct sequencing. These results indicate that sequences of single-copy genes can be identified in normal plasma and that the sequences of mutated oncogenes can be detected and identified with allele-specific amplification by polymerase chain reaction in plasma or serum from patients with malignant tumors containing identical mutated genes. Mutated oncogenes in plasma and serum may represent tumor markers that could be useful for diagnosis, determining response to treatment, and predicting prognosis. PMID:8118388

  9. Target Gene Capture Sequencing in Chinese Population of Sporadic Parkinson Disease

    PubMed Central

    Li, Zhiming; Lin, Qing; Huang, Wenqing; Tzeng, Chi-Meng

    2015-01-01

    Deciphering genetic variants plays a critical role in research on, and clinical management of, genetic disorders such as the well-known neurodegenerative disease Parkinson disease (PD). By combining pooled capture of targeted genes with next-generation sequencing (NGS), investigators can obtain highly efficient, low-cost sequencing data for genes of interest. Aiming to discover genetic variants that might contribute to PD, we selected 48 candidate genes involved in different pathways and conducted a pilot study to screen nonsynonymous SNPs (nsSNPs) in 4 pooled samples from 237 sporadic Chinese PD patients. Using our custom-designed NimbleGen array and Illumina HiSeq2000, a total of 4 novel nsSNPs (c. 352G>T in STK39, c. 823G>T in DGKQ, c. 36T>A in HLA-DRB5, and c. 1981G>T in GRN) were discovered but not validated by Sanger sequencing. We also selected 6 annotated nsSNPs not reported in previous PD studies and validated them by Sanger sequencing. However, genotyping analysis of the 6 validated nsSNPs in 50 PD patients and 50 controls showed no significant differences between cases and controls. These data represent the first documentation and validation of these mutations in PD using target gene capture sequencing. Additional replication studies in other populations and functional research are merited to better evaluate the precapture multiplex protocol and validate the role of the 6 nsSNPs in PD risk. PMID:25997059

  10. Cloning and nucleotide sequence of the gene encoding the EcaI DNA methyltransferase.

    PubMed Central

    Brenner, V; Venetianer, P; Kiss, A

    1990-01-01

    The gene coding for the GGTNACC specific EcaI DNA methyltransferase (M.EcaI) has been cloned in E. coli from Enterobacter cloacae and its nucleotide sequence has been determined. The ecaIM gene codes for a protein of 452 amino acids (Mr: 51,111). It was determined that M.EcaI is an adenine methyltransferase. M.EcaI shows limited amino acid sequence similarity to other adenine methyltransferases. A clone that expresses EcaI methyltransferase at high level was constructed. PMID:2183182

  11. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

    PubMed

    Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  12. Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing

    PubMed Central

    Dasenko, Mark A.

    2015-01-01

    In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles

  13. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  14. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways.

    PubMed

    Cirulli, Elizabeth T; Lasseigne, Brittany N; Petrovski, Slavé; Sapp, Peter C; Dion, Patrick A; Leblond, Claire S; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E; Boone, Braden E; Wimbish, Jack R; Waite, Lindsay L; Jones, Angela L; Carulli, John P; Day-Williams, Aaron G; Staropoli, John F; Xin, Winnie W; Chesi, Alessandra; Raphael, Alya R; McKenna-Yasek, Diane; Cady, Janet; Vianney de Jong, J M B; Kenna, Kevin P; Smith, Bradley N; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E; Baloh, Robert H; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M; Gibson, Summer; Trojanowski, John Q; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A; Chung, Wendy K; Ravits, John M; Glass, Jonathan D; Sims, Katherine B; Van Deerlin, Vivianna M; Maniatis, Tom; Hayes, Sebastian D; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S; Bedlack, Richard S; Harper, J Wade; Gitler, Aaron D; Rouleau, Guy A; Brown, Robert; Harms, Matthew B; Cooper, Gregory M; Harris, Tim; Myers, Richard M; Goldstein, David B

    2015-03-27

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS patients and 6405 controls. Several known ALS genes were found to be associated, and TBK1 (the gene encoding TANK-binding kinase 1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. PMID:25700176

  15. Molecular cloning, sequence characterization and expression pattern of Rab18 gene from watermelon (Citrullus lanatus)

    PubMed Central

    Xinli, Xiao; Lei, Peng

    2015-01-01

    The complete mRNA sequence of watermelon Rab18 gene was amplified through the rapid amplification of cDNA ends (RACE) method. The full-length mRNA was 1010 bp containing a 645 bp open reading frame, which encodes a protein of 214 amino acids. Sequence analysis revealed that watermelon Rab18 protein shares high homology with the Rab18 of cucumber (99%), muskmelon (98%), Morus notabilis (90%), tomato (89%), wine grape (89%) and potato (88%). Phylogenetic analysis revealed that watermelon Rab18 gene has a closer genetic relationship with Rab18 gene of cucumber and muskmelon. Tissue expression profile analysis indicated that watermelon Rab18 gene was highly expressed in root, stem and leaf, moderately expressed in flower and weakly expressed in fruit. PMID:26019638
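    The lengths reported in the abstract above are mutually consistent if the 645 bp open reading frame is counted with its stop codon, which is an assumption made here rather than something the abstract states. A minimal sketch of that check, with hypothetical function names:

```python
def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumes the ORF length is a multiple of 3; if includes_stop is True,
    the final codon is a stop codon and encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


def untranslated_bp(mrna_bp, orf_bp):
    """Combined length of the 5' and 3' untranslated regions."""
    return mrna_bp - orf_bp


# Values taken from the abstract: 1010 bp mRNA, 645 bp ORF.
print(protein_length(645), untranslated_bp(1010, 645))
```

    With the stop codon included, 645 bp gives 215 codons and 214 residues, matching the reported protein length and leaving 365 bp of untranslated sequence.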

  16. Sequence and evolution of the blue cone pigment gene in old and new world primates

    SciTech Connect

    Hunt, D.M.; Cowing, J.A.; Patel, R.

    1995-06-10

    The sequences of the blue cone photopigments in the talapoin monkey (Miopithecus talapoin), an Old World primate, and in the marmoset (Callithrix jacchus), a New World monkey, are presented. Both genes are composed of 5 exons separated by 4 introns. In this respect, they are identical to the human blue gene, and intron sizes are also similar. Based on the level of amino acid identity, both monkey pigments are members of the S branch of pigments. Alignment of these sequences with the human gene requires the insertion/deletion of two separate codons in exon 1. The silent site divergence between these primate blue genes indicates a separation of the Old and New World primate lineages around 43 million years ago. 41 refs., 1 fig., 3 tabs.

  17. Discovery of single-gene inborn errors of immunity by next generation sequencing

    PubMed Central

    Conley, Mary Ellen; Casanova, Jean-Laurent

    2014-01-01

    Many patients with clinical and laboratory evidence of primary immunodeficiency do not have a gene specific diagnosis. The use of next generation sequencing, particularly whole exome sequencing, has given us an extraordinarily powerful tool to identify the disease-causing genes in some of these patients. At least 34 new gene defects have been identified in the last 4 years. These findings document the striking heterogeneity of the phenotype in patients with mutations in the same gene. In some cases this can be attributed to loss-of-function mutations in some patients, but gain-of-function mutations in others. In addition, the surprisingly high frequency of autosomal dominant immunodeficiencies with variable penetrance, and de novo mutations in disorders with a severe phenotype has been unmasked. PMID:24886697

  18. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data.

    PubMed

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  19. Cloning, Sequencing, and Expression of the Chitinase Gene chiA74 from Bacillus thuringiensis

    PubMed Central

    Barboza-Corona, J. Eleazar; Nieto-Mazzocco, Elizabeth; Velázquez-Robledo, Rocio; Salcedo-Hernandez, Rubén; Bautista, Mayela; Jiménez, Beatriz; Ibarra, Jorge E.

    2003-01-01

    The endochitinase gene chiA74 from Bacillus thuringiensis serovar kenyae strain LBIT-82 was cloned in Escherichia coli DH5αF′. A sequence of 676 amino acids was deduced when the gene was completely sequenced. A molecular mass of 74 kDa was estimated for the preprotein, which includes a putative 4-kDa signal sequence located at the N terminus. The deduced amino acid sequence showed high degree of identity with other chitinases such as ChiB from Bacillus cereus (98%) and ChiA71 from Bacillus thuringiensis serovar pakistani (70%). Additionally, ChiA74 showed a modular structure comprised of three domains: a catalytic domain, a fibronectin-like domain, and a chitin-binding domain. All three domains showed conserved sequences when compared to other bacterial chitinase sequences. A ca. 70-kDa mature protein expressed by the cloned gene was detected in zymograms, comigrating with a chitinase produced by the LBIT-82 wild-type strain. ChiA74 is active within a wide pH range (4 to 9), although a bimodal activity was shown at pH 4.79 and 6.34. The optimal temperature was estimated at 57.2°C when tested at pH 6. The potential use of ChiA74 as a synergistic agent, along with the B. thuringiensis insecticidal Cry proteins, is discussed. PMID:12571025

  20. Identification of a DNA methylation-dependent activator sequence in the pseudoxanthoma elasticum gene, ABCC6.

    PubMed

    Arányi, Tamás; Ratajewski, Marcin; Bardóczy, Viola; Pulaski, Lukasz; Bors, András; Tordai, Attila; Váradi, András

    2005-05-13

    ABCC6 encodes MRP6, a member of the ABC protein family with an unknown physiological role. The human ABCC6 and its two pseudogenes share 99% identical DNA sequence. Loss-of-function mutations of ABCC6 are associated with the development of pseudoxanthoma elasticum (PXE), a recessive hereditary disorder affecting the elastic tissues. Various disease-causing mutations were found in the coding region; however, the mutation detection rate in the ABCC6 coding region of bona fide PXE patients is only approximately 80%. This suggests that polymorphisms or mutations in the regulatory regions may contribute to the development of the disease. Here, we report the first characterization of the ABCC6 gene promoter. Phylogenetic in silico analysis of the 5' regulatory regions revealed the presence of two evolutionarily conserved sequence elements embedded in CpG islands. The study of DNA methylation of ABCC6 and the pseudogenes identified a correlation between the methylation of the CpG island in the proximal promoter and the ABCC6 expression level in cell lines. Both activator and repressor sequences were uncovered in the proximal promoter by reporter gene assays. The most potent activator sequence was one of the conserved elements protected by DNA methylation on the endogenous gene in non-expressing cells. Finally, in vitro methylation of this sequence inhibits the transcriptional activity of the luciferase promoter constructs. Altogether these results identify a DNA methylation-dependent activator sequence in the ABCC6 promoter. PMID:15760889

  1. Development of a Comprehensive Sequencing Assay for Inherited Cardiac Condition Genes.

    PubMed

    Pua, Chee Jian; Bhalshankar, Jaydutt; Miao, Kui; Walsh, Roddy; John, Shibu; Lim, Shi Qi; Chow, Kingsley; Buchan, Rachel; Soh, Bee Yong; Lio, Pei Min; Lim, Jaclyn; Schafer, Sebastian; Lim, Jing Quan; Tan, Patrick; Whiffin, Nicola; Barton, Paul J; Ware, James S; Cook, Stuart A

    2016-02-01

    Inherited cardiac conditions (ICCs) are characterised by marked genetic and allelic heterogeneity and require extensive sequencing for genetic characterisation. We iteratively optimised a targeted gene capture panel for ICCs that includes disease-causing, putatively pathogenic, research and phenocopy genes (n = 174 genes). We achieved high coverage of the target region on both MiSeq (>99.8% at ≥ 20× read depth, n = 12) and NextSeq (>99.9% at ≥ 20×, n = 48) platforms with 100% sensitivity and precision for single nucleotide variants and indels across the protein-coding target on the MiSeq. In the final assay, 40 out of 43 established ICC genes informative in clinical practice achieved complete coverage (100% at ≥ 20×). By comparison, whole exome sequencing (WES; ∼ 80×), deep WES (∼ 500×) and whole genome sequencing (WGS; ∼ 70×) had poorer performance (88.1, 99.2 and 99.3% respectively at ≥ 20×) across the ICC target. The assay described here delivers highly accurate and affordable sequencing of ICC genes, complemented by accessible cloud-based computation and informatics. See Editorial in this issue (DOI: 10.1007/s12265-015-9667-8). PMID:26888179
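
The "percent at ≥ 20×" figures are breadth-of-coverage values: the share of target bases whose read depth meets a threshold. A minimal sketch of that calculation follows; the function name and the toy depth values are hypothetical and are not data from the study.

```python
def breadth_of_coverage(depths, min_depth=20):
    """Return the percentage of target positions with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for depth in depths if depth >= min_depth)
    return 100.0 * covered / len(depths)


# Hypothetical per-base depths for a ten-base target (illustrative only).
toy_depths = [35, 42, 18, 27, 60, 20, 19, 88, 31, 25]
toy_breadth = breadth_of_coverage(toy_depths)

print(f"{toy_breadth:.1f}% of target bases at >= 20x")
```

In practice the per-base depths would come from an alignment-based depth report; how the authors computed their figures is not stated in the abstract.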

  2. POLYMORPHISM IN THE CODING REGION SEQUENCE OF GDF8 GENE IN INDIAN SHEEP.

    PubMed

    Pothuraju, M; Mishra, S K; Kumar, S N; Mohamed, N F; Kataria, R S; Yadav, D K; Arora, R

    2015-11-01

    The present study was undertaken to identify polymorphism in the coding sequence of the GDF8 gene across indigenous meat-type sheep breeds. A 1647 bp sequence was generated, encompassing 208 bp of the 5'UTR, 1128 bp of coding region (exons 1, 2 and 3) as well as 311 bp of the 3'UTR. The sheep and goat GDF8 gene sequences were observed to be highly conserved as compared to cattle, buffalo, horse and pig. Several nucleotide variations were observed across the coding sequence of the GDF8 gene in Indian sheep. Three polymorphic sites were identified in the 5'UTR, one in exon 1 and one in exon 2. Both SNPs in the exonic region were found to be non-synonymous. The mutations c.539T > G and c.821T > A discovered in this study in exon 1 and exon 2, respectively, have not been previously reported. The information generated provides a preliminary indication of the functional diversity present in Indian sheep at the coding region of the GDF8 gene. The novel as well as the previously reported SNPs discovered in the Indian sheep warrant further analysis to see whether they affect the phenotype. Future studies will need to establish the effect of the reported SNPs on the expression of the GDF8 gene in the Indian sheep population. PMID:26845859
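
The segment lengths and variant coordinates above lend themselves to a quick arithmetic check. The sketch below assumes that the c. coordinates are 1-based positions counted from the first base of the start codon and that the 1128 bp coding region includes the stop codon; neither assumption is stated in the abstract, and the helper name is hypothetical.

```python
# Segment lengths reported in the abstract (bp).
UTR5_BP, CDS_BP, UTR3_BP = 208, 1128, 311
total_bp = UTR5_BP + CDS_BP + UTR3_BP  # expected to match the 1647 bp sequence


def codon_position(cds_pos):
    """Map a 1-based coding position to (codon number, position within codon)."""
    if cds_pos < 1:
        raise ValueError("cds_pos must be >= 1")
    index = cds_pos - 1
    return index // 3 + 1, index % 3 + 1


# ASSUMPTION: the coding region length includes the stop codon.
n_codons = CDS_BP // 3

for pos in (539, 821):
    codon, offset = codon_position(pos)
    print(f"c.{pos}: codon {codon}, base {offset} of the codon")
```

Both variants fall on the second base of their codon under these assumptions, which is in line with the abstract's statement that the exonic SNPs are non-synonymous, since no second-position change is synonymous in the standard genetic code.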

  3. DNA sequence and expression of the 36-kilodalton outer membrane protein gene of Brucella abortus.

    PubMed Central

    Ficht, T A; Bearden, S W; Sowa, B A; Adams, L G

    1989-01-01

    The cloning of the gene(s) encoding a 36-kilodalton (kDa) cell envelope protein of Brucella abortus has been previously described (T. A. Ficht, S. W. Bearden, B. A. Sowa, and L. G. Adams, Infect. Immun. 56:2036-2046, 1988). In an attempt to define the nature of the previously described duplication at this locus we have sequenced 3,500 base pairs of genomic DNA encompassing this region. The duplication represented two similar open reading frames which shared more than 85% homology at the nucleotide level but differed primarily because of the absence of 108 nucleotides from one of the two gene copies. These two genes were read from opposite strands and potentially encoded proteins which are 96% homologous. The predicted gene products were identical over the first 100 amino acids, including 22-amino-acid-long signal sequences. The amino acid composition of the predicted proteins was similar to that obtained for the Brucella porin isolated by Verstreate et al. (D. R. Verstreate, M. T. Creasy, N. T. Caveney, C. L. Baldwin, M. W. Blab, and A. J. Winter, Infect. Immun. 35:979-989, 1982) and presumably represented two copies of the porin gene, tentatively identified as omp 2a (silent) and omp 2b (expressed). The homology between the two genes extended to and included Shine-Dalgarno sequences 7 base pairs upstream from the ATG start codons. Homology at the 3' ends extended only as far as the termination codon, but both genes had putative rho-independent transcription termination sites. Localization of the promoters proved more difficult, since the canonical procaryotic sequences could not be identified in the region upstream of either gene. Promoter activity was demonstrated by ligation to a promoterless lacZ gene in pMC1871. However, only one active promoter could be identified by using this system. A 36-kDa protein was synthesized in E. coli with the promoter in the native orientation and was identical in size to the protein produced in laboratory-grown B. abortus.

  4. Cloning and sequencing of the growth hormone gene of large yellow croaker and its phylogenetic significance.

    PubMed

    Chen, Yun; Wang, Yaping; He, Shunping; Zhu, Zuoyan

    2004-10-01

    Using conserved primers and PCR, the growth hormone (GH) gene and the 3'-UTR of the large yellow croaker (Pseudosciaena crocea) were amplified and sequenced. The gene structure was analyzed and compared to the GH genes of 5 other percoid fish downloaded from GenBank. The GH gene of the large yellow croaker and the genes from 14 Percoidei and 2 Labroidei species were also aligned using Clustal X. A matrix of 564 bp was used to construct the phylogenetic tree using maximum parsimony and neighbor-joining methods. The phylogenetic trees produced by the two methods are identical in most of the clades with high bootstrap support. The results are also identical to those from morphological data. In general, this analysis does not support the monophyly of the families Centropomidae and Carangidae. However, our GH gene tree indicates that the representative species of the families Sparidae and Sciaenidae are a monophyletic group. PMID:15524313

  5. The topoisomerase I gene from Ustilago maydis: sequence, disruption and mutant phenotype.

    PubMed Central

    Gerhold, D; Thiyagarajan, M; Kmiec, E B

    1994-01-01

    The Ustilago maydis genomic TOP1 gene encoding DNA topoisomerase I was cloned by amplifying a gene fragment using the polymerase chain reaction, and using this fragment to search a genomic DNA library by hybridization. The predicted peptide sequence exhibited 30-40% identity to other eukaryotic TOP1 genes, yet differed in several features. First, an unusually long acidic region was identified near the amino terminus (28/29 residues are acidic), which resembles other nucleolar peptide motifs. Second, an atypical carboxy-terminal 'tail', absent in other TOP1 genes, followed the active site tyrosine residue. A top1 gene disruption mutant was constructed by replacing the genomic TOP1 gene with a top1::HygR null allele. This mutant lost the abundant topoisomerase I activity evident in wild-type U. maydis, and displayed a subtle coloration phenotype evident during cell senescence. PMID:7937091

  6. Complete sequence and gene organization of the mitochondrial genome of Asio flammeus (Strigiformes, Strigidae).

    PubMed

    Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei

    2016-07-01

    The complete sequence of the mitochondrial genome was determined for Asio flammeus, a geographically widespread species. The complete mitochondrial genome was 18,966 bp long and contained 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were located on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny placed A. flammeus as the sister group to A. capensis and supported Asio as a monophyletic group. PMID:25980662

  7. Expression profiling of Drosophila mitochondrial genes via deep mRNA sequencing

    PubMed Central

    Torres, Tatiana Teixeira; Dolezal, Marlies; Schlötterer, Christian; Ottenwälder, Birgit

    2009-01-01

    Mitochondria play an essential role in several cellular processes. Nevertheless, very little is known about patterns of gene expression of genes encoded by the mitochondrial DNA (mtDNA). In this study, we used next-generation sequencing (NGS) for transcription profiling of genes encoded in the mitochondrial genome of Drosophila melanogaster and D. pseudoobscura. The analysis of males and females in both species indicated that the expression pattern was conserved between the two species, but differed significantly between both sexes. Interestingly, mRNA levels were not only different among genes encoded by separate transcription units, but also showed significant differences among genes located in the same transcription unit. Hence, mRNA abundance of genes encoded by mtDNA seems to be heavily modulated by post-transcriptional regulation. Finally, we also identified several transcripts with a noncanonical structure, suggesting that processing of mitochondrial transcripts may be more complex than previously assumed. PMID:19843606

  8. Molecular cloning and long terminal repeat sequences of human endogenous retrovirus genes related to types A and B retrovirus genes

    SciTech Connect

    Ono, M.

    1986-06-01

    By using a DNA fragment primarily encoding the reverse transcriptase (pol) region of the Syrian hamster intracisternal A particle (IAP; type A retrovirus) gene as a probe, human endogenous retrovirus genes, tentatively termed HERV-K genes, were cloned from a fetal human liver gene library. Typical HERV-K genes were 9.1 or 9.4 kilobases in length, having long terminal repeats (LTRs) of ca. 970 base pairs. Many structural features commonly observed on the retrovirus LTRs, such as the TATAA box, polyadenylation signal, and terminal inverted repeats, were present on each LTR, and a lysine (K) tRNA having a CUU anticodon was identified as a presumed primer tRNA. The HERV-K LTR, however, had little sequence homology to either the IAP LTR or other typical oncovirus LTRs. By filter hybridization, the number of HERV-K genes was estimated to be ca. 50 copies per haploid human genome. The cloned mouse mammary tumor virus (type B) gene was found to hybridize with both the HERV-K and IAP genes to essentially the same extent.

  9. Analysis of Pseudomonas putida alkane-degradation gene clusters and flanking insertion sequences: evolution and regulation of the alk genes.

    PubMed

    van Beilen, J B; Panke, S; Lucchini, S; Franchini, A G; Röthlisberger, M; Witholt, B

    2001-06-01

    The Pseudomonas putida GPo1 (commonly known as Pseudomonas oleovorans GPo1) alkBFGHJKL and alkST gene clusters, which encode proteins involved in the conversion of n-alkanes to fatty acids, are located end to end on the OCT plasmid, separated by 9.7 kb of DNA. This DNA segment encodes, amongst others, a methyl-accepting transducer protein (AlkN) that may be involved in chemotaxis to alkanes. In P. putida P1, the alkBFGHJKL and alkST gene clusters are flanked by almost identical copies of the insertion sequence ISPpu4, constituting a class 1 transposon. Other insertion sequences flank and interrupt the alk genes in both strains. Apart from the coding regions of the GPo1 and P1 alk genes (80-92% sequence identity), only the alkB and alkS promoter regions are conserved. Competition experiments suggest that highly conserved inverted repeats in the alkB and alkS promoter regions bind AlkS. PMID:11390693

  10. The structure and complete nucleotide sequence of the human cyclophilin 40 (PPID) gene

    SciTech Connect

    Yokoi, Haruhiko; Shimizu, Yukiko; Anazawa, Hideharu

    1996-08-01

    Cyclophilin 40 is a recently identified member of the cyclophilin family that is found in an unactivated steroid hormone receptor complex. Cyclophilin 40 possesses a region homologous to FKBP59, a member of the FK506-binding protein family that is also a component of the receptor complex. We report the isolation and sequencing of the entire human cyclophilin 40 (hCyP40) gene (human gene symbol PPID). The gene contains 10 exons (43 to 698 bp) and 9 introns encompassing 14.2 kb. The exon organization of the cyclophilin-like region is not similar to that of the human cyclophilin A gene (PPIA), suggesting their early divergence in evolution. Determination of the sequence of the 5' end of the hCyP40 mRNA by an "anchor-ligation PCR" procedure showed that transcription is initiated from a cluster of sites about 80 bp upstream from the first in-frame ATG. The immediate 5'-flanking region of the gene lacks typical TATA and CAAT boxes, but is GC-rich and contains Sp1 sites, features characteristic of promoters associated with housekeeping genes. The hCyP40 gene was mapped to chromosome 4 by PCR with genomic DNA from somatic cell hybrids. As shown by "Zoo blot" analysis, the cyclophilin 40 gene appears to be highly conserved throughout evolution. 47 refs., 4 figs., 1 tab.

  11. Characterization of 16 novel human genes showing high similarity to yeast sequences.

    PubMed

    Stanchi, F; Bertocco, E; Toppo, S; Dioguardi, R; Simionati, B; Cannata, N; Zimbello, R; Lanfranchi, G; Valle, G

    2001-01-15

    The entire set of open reading frames (ORFs) of Saccharomyces cerevisiae has been used to perform systematic similarity searches against nucleic acid and protein databases, with the aim of identifying interesting homologies between yeast and mammalian genes. Many similarities were detected, mostly with known genes. However, several yeast ORFs were only found to match human partial sequence tags, indicating the presence of human transcripts still uncharacterized that have a homologous counterpart in yeast. About 30 such transcripts were further studied and named HUSSY (human sequence similar to yeast). The 16 most interesting are presented in this paper along with their sequencing and mapping data. As expected, most of these genes seem to be involved in basic metabolic and cellular functions (lipoic acid biosynthesis, ribulose-5-phosphate-3-epimerase, glycosyl transferase, beta-transducin, serine-threonine-kinase, ABC proteins, cation transporters). Genes related to RNA maturation were also found (homologues to DIM1, ROK1-RNA-helicase and NFS1). Furthermore, five novel human genes were detected (HUSSY-03, HUSSY-22, HUSSY-23, HUSSY-27, HUSSY-29) that appear to be homologous to yeast genes whose function is still undetermined. More information on this work can be obtained at the website http://grup.bio.unipd.it/hussy PMID:11124703

  12. Harnessing Gene Conversion in Chicken B Cells to Create a Human Antibody Sequence Repertoire

    PubMed Central

    Schusser, Benjamin; Yi, Henry; Collarini, Ellen J.; Izquierdo, Shelley Mettler; Harriman, William D.; Etches, Robert J.; Leighton, Philip A.

    2013-01-01

    Transgenic chickens expressing human sequence antibodies would be a powerful tool to access human targets and epitopes that have been intractable in mammalian hosts because of tolerance to conserved proteins. To foster the development of the chicken platform, it is beneficial to validate transgene constructs using a rapid, cell culture-based method prior to generating fully transgenic birds. We describe a method for the expression of human immunoglobulin variable regions in the chicken DT40 B cell line and the further diversification of these genes by gene conversion. Chicken VL and VH loci were knocked out in DT40 cells and replaced with human VK and VH genes. To achieve gene conversion of human genes in chicken B cells, synthetic human pseudogene arrays were inserted upstream of the functional human VK and VH regions. Proper expression of chimeric IgM comprised of human variable regions and chicken constant regions is shown. Most importantly, sequencing of DT40 genetic variants confirmed that the human pseudogene arrays contributed to the generation of diversity through gene conversion at both the Igl and Igh loci. These data show that engineered pseudogene arrays produce a diverse pool of human antibody sequences in chicken B cells, and suggest that these constructs will express a functional repertoire of chimeric antibodies in transgenic chickens. PMID:24278246

  13. Sequence evolution and expression regulation of stress-responsive genes in natural populations of wild tomato.

    PubMed

    Fischer, Iris; Steige, Kim A; Stephan, Wolfgang; Mboup, Mamadou

    2013-01-01

    The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives. PMID:24205149

  14. Sequence Evolution and Expression Regulation of Stress-Responsive Genes in Natural Populations of Wild Tomato

    PubMed Central

    Fischer, Iris; Steige, Kim A.; Stephan, Wolfgang; Mboup, Mamadou

    2013-01-01

    The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives. PMID:24205149

  15. A Multiplexed Amplicon Approach for Detecting Gene Fusions by Next-Generation Sequencing.

    PubMed

    Beadling, Carol; Wald, Abigail I; Warrick, Andrea; Neff, Tanaya L; Zhong, Shan; Nikiforov, Yuri E; Corless, Christopher L; Nikiforova, Marina N

    2016-03-01

    Chromosomal rearrangements that result in oncogenic gene fusions are clinically important drivers of many cancer types. Rapid and sensitive methods are therefore needed to detect a broad range of gene fusions in clinical specimens that are often of limited quantity and quality. We describe a next-generation sequencing approach that uses a multiplex PCR-based amplicon panel to interrogate fusion transcripts that involve 19 driver genes and 94 partners implicated in solid tumors. The panel also includes control assays that evaluate the 3'/5' expression ratios of 12 oncogenic kinases, which might be used to infer gene fusion events when the partner is unknown or not included on the panel. There was good concordance between the solid tumor fusion gene panel and other methods, including fluorescence in situ hybridization, real-time PCR, Sanger sequencing, and other next-generation sequencing panels, because 40 specimens known to harbor gene fusions were correctly identified. No specific fusion reads were observed in 59 fusion-negative specimens. The 3'/5' expression ratio was informative for fusions that involved ALK, RET, and NTRK1 but not for BRAF or ROS1 fusions. However, among 37 ALK or RET fusion-negative specimens, four exhibited elevated 3'/5' expression ratios, indicating that fusions predicted solely by 3'/5' read ratios require confirmatory testing. PMID:26747586
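
The 3'/5' expression ratio used by the control assays can be illustrated with a small calculation. This is only a sketch of the general idea: the function name, the pseudocount, and the read counts are hypothetical, and the abstract does not state how the panel normalizes or thresholds the ratio.

```python
def three_to_five_ratio(reads_3p, reads_5p, pseudocount=1.0):
    """Ratio of 3' to 5' read counts for one gene.

    ASSUMPTION: a pseudocount is added to both counts to avoid division by
    zero; the published assay may handle low counts differently.
    """
    if reads_3p < 0 or reads_5p < 0:
        raise ValueError("read counts must be non-negative")
    return (reads_3p + pseudocount) / (reads_5p + pseudocount)


# Hypothetical counts: balanced expression versus a strong 3' excess, as
# might be seen when a fusion retains only the 3' kinase portion of a gene.
balanced = three_to_five_ratio(99, 99)
imbalanced = three_to_five_ratio(999, 9)

# From the abstract: 4 of 37 ALK or RET fusion-negative specimens still
# showed elevated ratios.
elevated_fraction = 4 / 37

print(f"balanced={balanced:.1f}, imbalanced={imbalanced:.1f}, "
      f"elevated in {100 * elevated_fraction:.1f}% of fusion-negative specimens")
```

That roughly one in ten fusion-negative specimens showed an elevated ratio is why the abstract calls for confirmatory testing of ratio-only predictions.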

  16. GOblet: a platform for Gene Ontology annotation of anonymous sequence data

    PubMed Central

    Groth, Detlef; Lehrach, Hans; Hennig, Steffen

    2004-01-01

    GOblet is a comprehensive web server application providing the annotation of anonymous sequence data with Gene Ontology (GO) terms. It uses a variety of different protein databases (human, murines, invertebrates, plants, sp-trembl) and their respective GO mappings. The user selects the appropriate database and alignment threshold and thereafter submits single or multiple nucleotide or protein sequences. Results are shown in different ways, e.g. as survey statistics for the main GO categories for all sequences or as detailed results for each single sequence that has been submitted. In its newest version, GOblet allows the batch submission of sequences and provides an improved display of results with the aid of Java applets. All output data, together with the Java applet, are packed to a downloadable archive for local installation and analysis. GOblet can be accessed freely at http://goblet.molgen.mpg.de. PMID:15215401

  17. Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans

    SciTech Connect

    Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.; Jones, W.A.; Kirby, R.; Woods, D.R.

    1987-01-01

    The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with [α-³²P]dCTP (3000 Ci/mmol) or [α-³⁵S]dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.
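
Percentages such as these are typically computed as percent identity over aligned positions. The sketch below shows one common convention (gap columns excluded from the denominator); the abstract does not say which convention the authors used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    ASSUMPTION: columns containing a gap in either sequence are skipped,
    which is one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return 100.0 * matches / compared


# Toy alignment with two mismatches in ten columns (not real nifH sequence).
toy_identity = percent_identity("ATGGCTAAAC", "ATGGCAAAGC")

print(f"{toy_identity:.0f}% identity")
```

The same function works for nucleotide or amino acid alignments, since it only compares characters column by column.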

  18. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization.

    PubMed

    Anahtar, Melis N; Bowman, Brittany A; Kwon, Douglas S

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High-throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  19. Recombinations between Alu repeat sequences that result in partial deletions within the C1 inhibitor gene.

    PubMed

    Ariga, T; Carter, P E; Davis, A E

    1990-12-01

    Genomic DNA sequence analysis was used to define the extent of deletions within the C1 inhibitor gene in two families with type I hereditary angioneurotic edema. Southern blot analysis initially indicated the presence of the partial deletions. One deletion was approximately 2 kb and included exon VII, whereas the other was approximately 8.5 kb and included exons IV-VI. Genomic libraries from an affected member of each family were constructed and clones containing the deletions were analyzed. Sequence analysis of the deletion joints of the mutants and corresponding regions of the normal gene in the two families demonstrated that both deletion joints resulted from recombination of two Alu repetitive DNA elements. Alu repeat sequences from introns VI and VII combined to make a novel Alu in family A, and Alu sequences in introns III and VI were spliced to make a new Alu in family B. The splice sites in the Alu sequences of both mutants were located in the left arm of the Alu element, and both recombination joints overlapped one of the RNA polymerase III promoter sequences. Because the involved Alu sequences, in both instances, were oriented in the same direction, unequal crossing over is the most likely mechanism to account for these mutations. PMID:2276734

  20. Sequence characterisation of deletion breakpoints in the dystrophin gene by PCR

    SciTech Connect

    Abbs, S.; Sandhu, S.; Bobrow, M.

    1994-09-01

    Partial deletions of the dystrophin gene account for 65% of cases of Duchenne muscular dystrophy. A high proportion of these structural changes are generated by new mutational events, and lie predominantly within two 'hotspot' regions, yet the underlying reasons for this are not known. We are characterizing and sequencing the regions surrounding deletion breakpoints in order to: (i) investigate the mechanisms of deletion mutation, and (ii) enable the design of PCR assays to specifically amplify mutant and normal sequences, allowing us to search for the presence of somatic mosaicism in appropriate family members. Using this approach we have been able to demonstrate the presence of somatic mosaicism in a maternal grandfather of a DMD-affected male, deleted for exons 49-50. Three deletions, namely of exons 48-49, 49-50, and 50, have been characterized using a PCR approach that avoids any cloning procedures. Breakpoints were initially localized to within regions of a few kilobases using Southern blot restriction analyses with exon-specific probes and PCR amplification of exonic and intronic loci. Sequencing was performed directly on PCR products: (i) mutant sequences were obtained from long-range or inverse-PCR across the deletion junction fragments, and (ii) normal sequences were obtained from the products of standard PCR, vectorette PCR, or inverse-PCR performed on YACs. Further characterization of intronic sequences will allow us to amplify and sequence across other deletion breakpoints and increase our knowledge of the mechanisms of mutation in the dystrophin gene.

  1. Efficient Nucleic Acid Extraction and 16S rRNA Gene Sequencing for Bacterial Community Characterization

    PubMed Central

    Anahtar, Melis N.; Bowman, Brittany A.; Kwon, Douglas S.

    2016-01-01

    There is a growing appreciation for the role of microbial communities as critical modulators of human health and disease. High-throughput sequencing technologies have allowed for the rapid and efficient characterization of bacterial communities using 16S rRNA gene sequencing from a variety of sources. Although readily available tools for 16S rRNA sequence analysis have standardized computational workflows, sample processing for DNA extraction remains a continued source of variability across studies. Here we describe an efficient, robust, and cost-effective method for extracting nucleic acid from swabs. We also delineate downstream methods for 16S rRNA gene sequencing, including generation of sequencing libraries, data quality control, and sequence analysis. The workflow can accommodate multiple sample types, including stool and swabs collected from a variety of anatomical locations and host species. Additionally, recovered DNA and RNA can be separated and used for other applications, including whole genome sequencing or RNA-seq. The method described allows for a common processing approach for multiple sample types and accommodates downstream analysis of genomic, metagenomic and transcriptional information. PMID:27168460

  2. Inversion of Moraxella lacunata type 4 pilin gene sequences by a Neisseria gonorrhoeae site-specific recombinase.

    PubMed Central

    Rozsa, F W; Meyer, T F; Fussenegger, M

    1997-01-01

    A plasmid library of Neisseria gonorrhoeae sequences was screened for the ability to mediate recombinations on a sequence containing the Moraxella lacunata type 4 pilin gene invertible region in Escherichia coli. A plasmid containing the N. gonorrhoeae sequence encoding the putative recombinase (gcr) was identified and sequenced. Plasmids containing gcr were able to mediate site-specific recombinations despite a weak amino acid homology to Piv, the native M. lacunata pilin gene invertase. The gcr gene is present only in pathogenic strains of Neisseria tested; however, in our assays gene knockouts of gcr did not alter the variation of surface features that play a role in the pathogenesis of N. gonorrhoeae. PMID:9079926

  3. Whole-genome sequencing and identification of Morganella morganii KT pathogenicity-related genes

    PubMed Central

    2012-01-01

    Background: The opportunistic enterobacterium, Morganella morganii, which can cause bacteraemia, is the ninth most prevalent cause of clinical infections in patients at Changhua Christian Hospital, Taiwan. The KT strain of M. morganii was isolated during postoperative care of a cancer patient with a gallbladder stone who developed sepsis caused by bacteraemia. M. morganii is sometimes encountered in nosocomial settings and has been causally linked to catheter-associated bacteriuria, complex infections of the urinary and/or hepatobiliary tracts, wound infection, and septicaemia. M. morganii infection is associated with a high mortality rate, although most patients respond well to appropriate antibiotic therapy. To obtain insights into the genome biology of M. morganii and the mechanisms underlying its pathogenicity, we used Illumina technology to sequence the genome of the KT strain and compared its sequence with the genome sequences of related bacteria. Results: The 3,826,919-bp sequence contained in 58 contigs has a GC content of 51.15% and includes 3,565 protein-coding sequences, 72 tRNA genes, and 10 rRNA genes. The pathogenicity-related genes encode determinants of drug resistance, fimbrial adhesins, an IgA protease, haemolysins, ureases, and insecticidal and apoptotic toxins as well as proteins found in flagellae, the iron acquisition system, a type-3 secretion system (T3SS), and several two-component systems. Comparison with 14 genome sequences from other members of Enterobacteriaceae revealed different degrees of similarity to several systems found in M. morganii. The most striking similarities were found in the IS4 family of transposases, insecticidal toxins, T3SS components, and proteins required for ethanolamine use (eut operon) and cobalamin (vitamin B12) biosynthesis. The eut operon and the gene cluster for cobalamin biosynthesis are not present in the other Proteeae genomes analysed. Moreover, organisation of the 19 genes of the eut operon differs from

  4. Transcriptome Sequencing and Expression Analysis of Terpenoid Biosynthesis Genes in Litsea cubeba

    PubMed Central

    Han, Xiao-Jiao; Wang, Yang-Dong; Chen, Yi-Cun; Lin, Li-Yuan; Wu, Qing-Ke

    2013-01-01

    Background: Aromatic essential oils extracted from fresh fruits of Litsea cubeba (Lour.) Pers. have diverse medical and economic values. The dominant components in these essential oils are monoterpenes and sesquiterpenes. Understanding the molecular mechanisms of terpenoid biosynthesis is essential for improving the yield and quality of terpenes. However, the 40 available L. cubeba nucleotide sequences in the public databases are insufficient for studying the molecular mechanisms. Thus, high-throughput transcriptome sequencing of L. cubeba is necessary to generate large quantities of transcript sequences for the purpose of gene discovery, especially terpenoid biosynthesis related genes. Results: Using Illumina paired-end sequencing, approximately 23.5 million high-quality reads were generated. De novo assembly yielded 68,648 unigenes with an average length of 834 bp. A total of 38,439 (56%) unigenes were annotated for their functions, and 35,732 and 25,806 unigenes could be aligned to the GO and COG database, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 16,130 unigenes were assigned to 297 KEGG pathways, and 61 unigenes, which contained the mevalonate and 2-C-methyl-D-erythritol 4-phosphate pathways, could be related to terpenoid backbone biosynthesis. Of the 12,963 unigenes, 285 were annotated to the terpenoid pathways using the PlantCyc database. Additionally, 14 terpene synthase genes were identified from the transcriptome. The expression patterns of the 16 genes related to terpenoid biosynthesis were analyzed by RT-qPCR to explore their putative functions. Conclusion: RNA sequencing was effective in identifying a large quantity of sequence information. To our knowledge, this study is the first exploration of the L. cubeba transcriptome, and the substantial amount of transcripts obtained will accelerate the understanding of the molecular mechanisms of essential oils biosynthesis. The results may help

  5. GeneLook: a novel ab initio gene identification system suitable for automated annotation of prokaryotic sequences.

    PubMed

    Nishi, Tatsunari; Ikemura, Toshimichi; Kanaya, Shigehiko

    2005-02-14

    With the rapid increases in the amounts of sequence data for prokaryotic genomes, it has become important to develop systems for automated and accurate genome annotation. We present herein a novel ab initio gene identification system, GeneLook, that predicts protein-coding open reading frames (ORFs) with high sensitivity and specificity with no prior knowledge of the sequence composition. The system predicts protein-coding ORFs in two stages, seed ORF selection and main prediction. In the selection of reliable seed ORFs containing at least 200 codons, GeneLook predicts translation start sites and operon structures through searches for ribosome-binding sites and a novel operon prediction algorithm. The codon and nucleotide frequencies of seed ORFs are then used to determine values for two new coding-potential parameters for identification of protein-coding ORFs of at least 34 codons and for another parameter that improves the prediction accuracy for GC-rich genomes. In the main prediction, GeneLook uses these parameters to identify the most likely genes of a given minimal length. We assessed the performance of GeneLook with two indices, sensitivity and specificity that are defined as true positives (TP)/(TP+false negatives) and TP/(TP+false positives), respectively. This system predicted protein-coding ORFs for Escherichia coli and Bacillus subtilis with sensitivities of 96.5% and 96.2%, respectively, and specificities of 96.9% and 96.1%, respectively. The system also identified 94.1% of annotated genes of the Pseudomonas aeruginosa genome, which is GC-rich, with high specificity (97.2%). Furthermore, GeneLook identified protein-coding ORFs with high accuracy from a wide variety of prokaryotic genomes. PMID:15716020
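    The two performance indices defined in the abstract above can be written out as a minimal Python sketch. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not GeneLook results. Note that the abstract's "specificity", TP/(TP+false positives), is the quantity often called precision in other fields.

```python
def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): share of annotated genes that were predicted."""
    if tp + fn == 0:
        raise ValueError("need at least one annotated gene")
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """TP / (TP + FP), following the definition given in the abstract."""
    if tp + fp == 0:
        raise ValueError("need at least one predicted gene")
    return tp / (tp + fp)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    tp, fn, fp = 965, 35, 31
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tp, fp):.3f}")
```

    Because both indices share the same numerator, a predictor can trade one against the other simply by changing how many ORFs it reports, which is why the abstract quotes them as a pair.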

  6. Sequence variation within the rRNA gene loci of 12 Drosophila species

    PubMed Central

    Stage, Deborah E.; Eickbush, Thomas H.

    2007-01-01

    Concerted evolution maintains at near identity the hundreds of tandemly arrayed ribosomal RNA (rRNA) genes and their spacers present in any eukaryote. Few comprehensive attempts have been made to directly measure the identity between the rDNA units. We used the original sequencing reads (trace archives) available through the whole-genome shotgun sequencing projects of 12 Drosophila species to locate the sequence variants within the 7.8–8.2 kb transcribed portions of the rDNA units. Three to 18 variants were identified in >3% of the total rDNA units from 11 species. Species where the rDNA units are present on multiple chromosomes exhibited only minor increases in sequence variation. Variants were 10–20 times more abundant in the noncoding compared with the coding regions of the rDNA unit. Within the coding regions, variants were three to eight times more abundant in the expansion compared with the conserved core regions. The distribution of variants was largely consistent with models of concerted evolution in which there is uniform recombination across the transcribed portion of the unit with the frequency of standing variants dependent upon the selection pressure to preserve that sequence. However, the 28S gene was found to contain fewer variants than the 18S gene despite evolving 2.5-fold faster. We postulate that the fewer variants in the 28S gene is due to localized gene conversion or DNA repair triggered by the activity of retrotransposable elements that are specialized for insertion into the 28S genes of these species. PMID:17989256

  7. Sequence heterogeneity, multiplicity, and genomic organization of α- and β-tubulin genes in Sea Urchins

    SciTech Connect

    Alexandraki, D.; Ruderman, J.V.

    1981-12-01

    The authors analyzed the multiplicity, heterogeneity, and organization of the genes encoding the α and β tubulins in the sea urchin Lytechinus pictus by using cloned complementary deoxyribonucleic acid (cDNA) and genomic tubulin sequences. cDNA clones were constructed by using immature spermatogenic testis polyadenylic acid-containing ribonucleic acid as a template. α- and β-tubulin clones were identified by hybrid selection and in vitro translation of the corresponding messenger ribonucleic acids, followed by immunoprecipitation and two-dimensional gel electrophoresis of the translation products. The α cDNA clone contains a sequence that encodes the 48 C-terminal amino acids of α tubulin and 104 base pairs of the 3' nontranslated portion of the messenger ribonucleic acid. The β cDNA insertion contains the coding sequence for the 100 C-terminal amino acids of β tubulin and 83 base pairs of the 3' noncoding sequence. Hybrid selections performed at different criteria demonstrated the presence of several heterogeneous, closely related tubulin messenger ribonucleic acids, suggesting the existence of heterogeneous α- and β-tubulin genes. Hybridization analyses indicated that there are at least 9 to 13 sequences for each of the two tubulin gene families per haploid genome. Hybridization of the cDNA probes to both total genomic DNA and cloned germline DNA fragments gave no evidence for close physical linkage of α-tubulin genes with β-tubulin genes at the DNA level. In contrast, these experiments indicated that some genes within the same family are clustered.

  8. Discovery of clubroot-resistant genes in Brassica napus by transcriptome sequencing.

    PubMed

    Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B

    2016-01-01

    Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, clusters of orthologous groups, and gene ontology databases and annotated their functions. In the Kyoto encyclopedia of genes and genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. The homology observed between unigenes 0048482, 0061770 and the Crr1 gene shared 94% nucleotide similarity. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence. PMID:27525940

  9. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  10. Sequence heterogeneity and differential expression of the alpha-Amy2 gene family in wheat.

    PubMed

    Huttly, A K; Martienssen, R A; Baulcombe, D C

    1988-10-01

    The alpha-Amy2 genes of wheat are a multigene family which is expressed in the aleurone cells of germinating grain under control of the plant hormone gibberellin. A subset of the genes are also expressed in developing grain. Comparison of five genomic clones containing alpha-Amy2 genes, using DNA sequence analysis and Southern hybridisation, showed that the extent of similarity between genes differed. Two of the most heterogeneous genes compared were located to the same group 7 chromosome while the most similar genes alpha-Amy2/54 and alpha-Amy2/8 were located to different ones; hence sequence variation could not be correlated to the ancestry of the alpha-Amy2 genes during the separate existence of the constituent genomes of hexaploid wheat. Expression of the cloned genes was measured using an S1 nuclease protection assay and this identified alpha-Amy2/54 and alpha-Amy2/8 as part of the subset of alpha-Amy2 genes expressed in both the developing grain and in aleurone cells. Comparison of the 5' upstream regions of all five genes showed high similarity, with the exception of one gene, up to -280 nucleotides from the transcriptional start, while similarity between alpha-Amy2/54 and alpha-Amy2/8 extended a further 90 bp upstream of this point. It is suggested that regulatory elements responsible for tissue specificity and gibberellin regulation may be located within these regions of similarity. PMID:2467183

  11. De Novo Transcriptome Sequencing of Oryza officinalis Wall ex Watt to Identify Disease-Resistance Genes

    PubMed Central

    He, Bin; Gu, Yinghong; Tao, Xiang; Cheng, Xiaojie; Wei, Changhe; Fu, Jian; Cheng, Zaiquan; Zhang, Yizheng

    2015-01-01

    Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this research, 137,229 contigs were obtained ranging from 200 to 19,214 bp with an N50 of 2331 bp through de novo assembly of leaves, stems and roots in O. officinalis using an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant–pathogen interaction and plant hormones regulation pathways involved in disease-resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistant genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed the highest expression level in stem, whereas one Xa26 gene was expressed predominantly in leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease-resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future. PMID:26690414
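    The abstract above summarizes its assembly with an N50 value. As a minimal sketch of how that statistic is commonly computed, the Python function below returns the length L such that contigs of length L or longer hold at least half of the assembled bases. The function name and the toy contig lengths are hypothetical; this is not the authors' pipeline, and some tools differ slightly in how they handle ties.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig where the cumulative sum
        # reaches half of the total assembled bases.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    # Toy contig lengths in bp, for illustration only.
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(toy_contigs))
```

    N50 is weighted toward long contigs, so it can sit well above the mean contig length; an assembly with many short fragments may still report a comparatively large N50.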

  12. A Cluster of Cuticle Protein Genes of Drosophila Melanogaster at 65a: Sequence, Structure and Evolution

    PubMed Central

    Charles, J. P.; Chihara, C.; Nejad, S.; Riddiford, L. M.

    1997-01-01

    A 36-kb genomic DNA segment of the Drosophila melanogaster genome containing 12 clustered cuticle genes has been mapped and partially sequenced. The cluster maps at 65A 5-6 on the left arm of the third chromosome, in agreement with the previously determined location of a putative cluster encompassing the genes for the third instar larval cuticle proteins LCP5, LCP6 and LCP8. This cluster is the largest cuticle gene cluster discovered to date and shows a number of surprising features that explain in part the genetic complexity of the LCP5, LCP6 and LCP8 loci. The genes encoding LCP5 and LCP8 are multiple copy genes and the presence of extensive similarity in their coding regions gives the first evidence for gene conversion in cuticle genes. In addition, five genes in the cluster are intronless. Four of these five have arisen by retroposition. The other genes in the cluster have a single intron located at an unusual location for insect cuticle genes. PMID:9383064

  13. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification.

    PubMed

    Li, Cong-Jun; Li, Robert W; Baldwin, Ransom L; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  14. Transcriptomic Sequencing Reveals a Set of Unique Genes Activated by Butyrate-Induced Histone Modification

    PubMed Central

    Li, Cong-Jun; Li, Robert W.; Baldwin, Ransom L.; Blomberg, Le Ann; Wu, Sitao; Li, Weizhong

    2016-01-01

    Butyrate is a nutritional element with strong epigenetic regulatory activity as a histone deacetylase inhibitor. Based on the analysis of differentially expressed genes in the bovine epithelial cells using RNA sequencing technology, a set of unique genes that are activated only after butyrate treatment were revealed. A complementary bioinformatics analysis of the functional category, pathway, and integrated network, using Ingenuity Pathways Analysis, indicated that these genes activated by butyrate treatment are related to major cellular functions, including cell morphological changes, cell cycle arrest, and apoptosis. Our results offered insight into the butyrate-induced transcriptomic changes and will accelerate our discerning of the molecular fundamentals of epigenomic regulation. PMID:26819550

  15. Common nucleotide sequence of structural gene encoding fibroblast growth factor 4 in eight cattle derived from three breeds.

    PubMed

    Sato, Sho; Takahashi, Toshikiyo; Nishinomiya, Hiroshi; Katoh, Makiko; Itoh, Ryu; Yokoo, Masaki; Yokoo, Mari; Iha, Momoe; Mori, Yuki; Kasuga, Kano; Kojima, Ikuo; Kobayashi, Masayuki

    2012-03-01

    Fibroblast growth factor 4 (FGF4) is considered as a crucial gene for the proper development of bovine embryos. However, the complete nucleotide sequences of the structural genes encoding FGF4 in identified breeds are still unknown. In the present study, direct sequencing of PCR products derived from genomic DNA samples obtained from three Japanese Black, two Japanese Shorthorn and three Holstein cattle, revealed that the nucleotide sequences of the structural gene encoding FGF4 matched completely among these eight cattle. On the other hand, differences in the nucleotide sequences, leading to substitutions, insertions or deletions of amino acid residues were detected when compared with the already reported sequence from unidentified breeds. We cannot rule out a possibility that the structural gene elucidated in the present study is widely distributed in cattle. To the best of our knowledge, this is the first determination of the complete nucleotide sequence of the structural gene encoding bovine FGF4 in identified breeds. PMID:22435631

  16. LGMD phenotype due to a new gene and dysferlinopathy investigated by next-generation sequencing.

    PubMed

    Angelini, Corrado I

    2015-12-01

    In this issue of Neurology® Genetics, Endo et al.(1) report 3 cases of limb-girdle muscular dystrophy (LGMD) phenotype with mental retardation or hyperCKemia found by next-generation sequencing (NGS) to have a variant in the POMGNT2 gene, which has so far been recognized only as causing congenital muscular dystrophy (CMD). PMID:27066575

  17. Next-generation sequencing of 28 ALS-related genes in a Japanese ALS cohort.

    PubMed

    Nakamura, Ryoichi; Sone, Jun; Atsuta, Naoki; Tohnai, Genki; Watanabe, Hazuki; Yokoi, Daichi; Nakatochi, Masahiro; Watanabe, Hirohisa; Ito, Mizuki; Senda, Jo; Katsuno, Masahisa; Tanaka, Fumiaki; Li, Yuanzhe; Izumi, Yuishin; Morita, Mitsuya; Taniguchi, Akira; Kano, Osamu; Oda, Masaya; Kuwabara, Satoshi; Abe, Koji; Aiba, Ikuko; Okamoto, Koichi; Mizoguchi, Kouichi; Hasegawa, Kazuko; Aoki, Masashi; Hattori, Nobutaka; Tsuji, Shoji; Nakashima, Kenji; Kaji, Ryuji; Sobue, Gen

    2016-03-01

    We investigated the frequency and contribution of variants of the 28 known amyotrophic lateral sclerosis (ALS)-related genes in Japanese ALS patients. We designed a multiplex, polymerase chain reaction-based primer panel to amplify the coding regions of the 28 ALS-related genes and sequenced DNA samples from 257 Japanese ALS patients using an Ion Torrent PGM sequencer. We also performed exome sequencing and identified variants of the 28 genes in an additional 251 ALS patients using an Illumina HiSeq 2000 platform. We identified the known ALS pathogenic variants and predicted the functional properties of novel nonsynonymous variants in silico. These variants were confirmed by Sanger sequencing. Known pathogenic variants were identified in 19 (48.7%) of the 39 familial ALS patients and 14 (3.0%) of the 469 sporadic ALS patients. Thirty-two sporadic ALS patients (6.8%) harbored 1 or 2 novel nonsynonymous variants of ALS-related genes that might be deleterious. This study reports the first extensive genetic screening of Japanese ALS patients. These findings are useful for developing genetic screening and counseling strategies for such patients. PMID:26742954

  18. Phylogenetic analysis of Rutaceous plants based on single nucleotide polymorphism in chloroplast and nuclear gene sequences

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...

  19. Molecular cloning, sequence characteristics, and tissue expression analysis of ECE1 gene in Tibetan pig.

    PubMed

    Wang, Yan-Dong; Zhang, Jian; Li, Chuan-Hao; Xu, Hai-Peng; Chen, Wei; Zeng, Yong-Qing; Wang, Hui

    2015-10-25

    Low air pressure and low oxygen partial pressure at high altitude seriously affect the survival and development of human beings and animals. ECE1 is a recently discovered gene that is involved in anti-hypoxia, but the full-length cDNA sequence has not been obtained. For a better understanding of the structure and function of the ECE1 gene and to study its effect in Tibetan pig, the cDNA of the ECE1 gene from the muscle of Tibetan pig was cloned, sequenced and characterized. The ECE1 full-length cDNA sequence consists of 2262 bp coding sequence (CDS) that encodes 753 amino acids with a molecular mass of 85.449 kDa, 2 bp 5'UTR and 1507 bp 3'UTR. In addition, the phylogenetic tree analysis revealed that the Tibetan pig ECE1 has a closer genetic relationship and evolution distance with the land mammals ECE1. Furthermore, analysis by qPCR showed that the ECE1 transcript is constitutively expressed in the 10 tissues tested: the liver, subcutaneous fat, kidney, muscle, stomach, heart, brain, spleen, pancreas, and lung. These results serve as a foundation for further insight into the Tibetan pig ECE1 gene. PMID:26115769
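    The CDS figures in the abstract above can be cross-checked with simple arithmetic. The sketch below assumes that the 2262 bp CDS includes the stop codon and uses a rule-of-thumb average of about 110 Da per amino acid residue; both the helper names and that average are assumptions for illustration, not values from the paper.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb average, not from the paper


def protein_length_from_cds(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of the given length."""
    if cds_bp <= 0 or cds_bp % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_bp // 3
    # The stop codon occupies one codon but encodes no residue.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass estimate in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    residues = protein_length_from_cds(2262)
    print(residues)
    print(f"{approx_mass_kda(residues):.1f} kDa (rough estimate)")
```

    Under these assumptions a 2262 bp CDS gives 753 residues, matching the abstract, and the rough mass estimate comes out in the low 80s of kDa, the same range as a protein of about 85 kDa.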

  20. Genomic structure and complete nucleotide sequence of the Batten disease gene, CLN3

    SciTech Connect

    Mitchison, H.M.; Munroe, P.B.; O'Rawe, A.M.

    1997-03-01

    We recently cloned a cDNA for CLN3, the gene for juvenile-onset neuronal ceroid lipofuscinosis or Batten disease. To resolve the genomic organization we used a cosmid clone containing CLN3 to sequence the entire gene in addition to 1.1 kb 5' of the start of the published CLN3 cDNA and 0.3 kb 3' to the polyadenylation site. CLN3 is organized into at least 15 exons spanning 15 kb and ranging from 47 to 356 bp. The 14 introns vary from 80 to 4227 bp, and all exon/intron junction sequences conform to the GT-AG rule. Numerous repetitive Alu elements are present within the introns and 5'- and 3'-untranslated regions. The 5' region of the CLN3 gene contains several potential transcription regulatory elements but no consensus TATA-1 box was identified. CLN3 is homologous to 27 deposited human ESTs, and sequence comparisons suggest alternative splicing of the gene and the existence of transcribed sequences upstream to the start of the published CLN3 cDNA. 19 refs., 2 figs., 1 tab.
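    The splice-junction rule mentioned in the abstract above (introns beginning with GT and ending with AG) is easy to express as a check. The Python sketch below is illustrative only: the function names are hypothetical and the sequences are invented toy strings, not CLN3 introns.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.strip().upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def intron_count(n_exons: int) -> int:
    """A gene with n contiguous exons has n - 1 introns between them."""
    if n_exons < 1:
        raise ValueError("a gene needs at least one exon")
    return n_exons - 1


if __name__ == "__main__":
    # Invented toy sequences, for illustration only.
    toy_introns = ["GTAAGTCCCTTTCAG", "gtgagtttttctag", "GCAAGTTTTCAG"]
    for seq in toy_introns:
        print(seq, follows_gt_ag_rule(seq))
    print(intron_count(15))
```

    The second helper reflects the exon and intron counts quoted in the abstract: 15 exons imply 14 introns.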

  1. A sequencing strategy for identifying variation throughout the prion gene of BSE-affected cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cattle prion gene (PRNP) polymorphisms have been associated with bovine spongiform encephalopathy (BSE) susceptibility. We developed a method for sequencing bovine PRNP through all exons, introns and part of the promoter (25.2 kb) that accounts for known variation. The method can be used to detect...

  2. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  3. Gene ontology based characterization of expressed sequence tags (ESTs) of Brassica rapa cv. Osome.

    PubMed

    Arasan, Senthil Kumar Thamil; Park, Jong-In; Ahmed, Nasar Uddin; Jung, Hee-Jeong; Lee, In-Ho; Cho, Yong-Gu; Lim, Yong-Pyo; Kang, Kwon-Kyoo; Nou, Ill-Sup

    2013-07-01

    Chinese cabbage (Brassica rapa) is widely recognized for its economic importance and contribution to human nutrition, but abiotic and biotic stresses are the main obstacles to its quality, nutritional status and production. In this study, 3,429 Expressed Sequence Tag (EST) sequences were generated from a B. rapa cv. Osome cDNA library and the unique transcripts were classified functionally using a gene ontology (GO) hierarchy and the Kyoto encyclopedia of genes and genomes (KEGG). KEGG orthology and the structural domain data were obtained from the biological database for stress related genes (SRG). EST datasets provided a wide outlook of functional characterization of B. rapa cv. Osome. In silico analysis revealed 83% of ESTs to be well annotated towards reeds one dimensional concept. Clustering of ESTs returned 333 contigs and 2,446 singlets, giving a total of 3,284 putative unigene sequences. This dataset contained 1,017 EST sequences functionally annotated to stress responses, from which the expression of randomly selected SRGs was analyzed against cold, salt, drought, ABA, water and PEG stresses. Most of the SRGs showed differential expression against these stresses. Thus, the EST dataset is very important for discovering the potential genes related to stress resistance in Chinese cabbage, and can be a useful resource for genetic engineering of Brassica sp. PMID:23898551

  4. Expressed gene sequences from egg and larval stages of the horn fly, Haematobia irritans

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We used an EST approach to initiate a study of the genome of the horn fly, Haematobia irritans and have isolated and sequenced 8,427 and 8,275 expressed genes from the egg and first instar larvae lifestages. Normalized cDNA libraries from eggs and first instar larvae from a field population of horn ...

  5. Molecular cloning of extensive sequences of the in vitro synthesized chicken ovalbumin structural gene.

    PubMed Central

    Humphries, P; Cochet, M; Krust, A; Gerlinger, P; Kourilsky, P; Chambon, P

    1977-01-01

    Double-stranded DNA molecules complementary to chicken ovalbumin messenger RNA were synthesized in vitro and integrated into the E. coli plasmid pCR1 using an oligodG-dC tailing procedure. The resultant hybrid plasmids, amplified by transfection of E. coli, were shown by hybridization and gel electrophoresis to contain extensive DNA sequences of the ovalbumin structural gene. PMID:333389

  6. Tsukamurella pulmonis Bloodstream Infection Identified by secA1 Gene Sequencing

    PubMed Central

    Cano, María E.; García de la Fuente, Celia; Martínez-Martínez, Luis; López, Mónica; Fernández-Mazarrasa, Carlos

    2014-01-01

    Recurrent bloodstream infections caused by a Gram-positive bacterium affected an immunocompromised child. Tsukamurella pulmonis was the microorganism identified by secA1 gene sequencing. Antibiotic treatment in combination with removal of the subcutaneous port healed the patient. PMID:25520439

  7. Transcriptomic sequencing reveals a set of unique genes activated by butyrate-induced histone modification

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...

  8. Insert sequence length determines transfection efficiency and gene expression levels in bicistronic mammalian expression vectors

    PubMed Central

    Payne, Andrew J; Gerdes, Bryan C; Kaja, Simon; Koulen, Peter

    2013-01-01

    Bicistronic expression vectors have been widely used for co-expression studies since the initial discovery of the internal ribosome entry site (IRES) about 25 years ago. IRES sequences allow the 5’ cap-independent initiation of translation of multiple genes on a single messenger RNA strand. Using a commercially available mammalian expression vector containing an IRES sequence with a 3’ green fluorescent protein fluorescent marker, we found that sequence length of the gene of interest expressed 5’ of the IRES site influences both expression of the 3’ fluorescent marker and overall transfection efficiency of the vector construct. Furthermore, we generated a novel construct expressing two distinct fluorescent markers and found that high expression of one gene can lower expression of the other. Observations from this study indicate that caution is warranted in the design of experiments utilizing an IRES system with a short 5’ gene of interest sequence (<300 bp), selection of single cells based on the expression profile of the 3’ optogenetic fluorescent marker, and assumptions made during data analysis. PMID:24380024

  9. Sub-genomic level sequence analysis of the aquaporin multi-gene family in cotton

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aquaporins function mainly as water transport channel proteins that facilitate water movement across intracellular and intercellular membranes in most living organisms. Plant aquaporins belong to a multi-gene family and are commonly categorized into 5 subfamilies according to sequence similarity. Re...

  10. A multi gene sequence-based phylogeny of the Musaceae (banana) family

    PubMed Central

    2011-01-01

    Background The classification of the Musaceae (banana) family species and their phylogenetic inter-relationships remain controversial, in part due to limited nucleotide information to complement the morphological and physiological characters. In this work the evolutionary relationships within the Musaceae family were studied using 13 species and DNA sequences obtained from a set of 19 unlinked nuclear genes. Results The 19 gene sequences represented a sample of ~16 kb of genome sequence (~73% intronic). The sequence data were also used to obtain estimates for the divergence times of the Musaceae genera and Musa sections. Nucleotide variation within the sample confirmed the close relationship of Australimusa and Callimusa sections and showed that Eumusa and Rhodochlamys sections are not reciprocally monophyletic, which supports the previous claims for the merger between the two latter sections. Divergence time analysis supported the previous dating of the Musaceae crown age to the Cretaceous/Tertiary boundary (~ 69 Mya), and the evolution of Musa to ~50 Mya. The first estimates for the divergence times of the four Musa sections were also obtained. Conclusions The gene sequence-based phylogeny presented here provides a substantial insight into the course of speciation within the Musaceae. An understanding of the main phylogenetic relationships between banana species will help to fine-tune the taxonomy of Musaceae. PMID:21496296

  11. Distribution of Genes and Repetitive Elements in the Diabrotica virgifera virgifera Genome Estimated Using BAC Sequencing

    PubMed Central

    Coates, Brad S.; Alves, Analiza P.; Wang, Haichuan; Walden, Kimberly K. O.; French, B. Wade; Miller, Nicholas J.; Abel, Craig A.; Robertson, Hugh M.; Sappington, Thomas W.; Siegfried, Blair D.

    2012-01-01

    Feeding damage caused by the western corn rootworm, Diabrotica virgifera virgifera, is destructive to corn plants in North America and Europe where control remains challenging due to evolution of resistance to chemical and transgenic toxins. A BAC library, DvvBAC1, containing 109,486 clones with 104 ± 34.5 kb inserts was created, which has an ~4.56X genome coverage based upon a 2.58 Gb (2.80 pg) flow cytometry-estimated haploid genome size. Paired end sequencing of 1037 BAC inserts produced 1.17 Mb of data (~0.05% genome coverage) and indicated ~9.4 and 16.0% of reads encode, respectively, endogenous genes and transposable elements (TEs). Sequencing genes within BAC full inserts demonstrated that TE densities are high within intergenic and intron regions and contribute to the increased gene size. Comparison of homologous genome regions cloned within different BAC clones indicated that TE movement may cause haplotype variation within the inbred strain. The data presented here indicate that the D. virgifera virgifera genome is large in size and contains a high proportion of repetitive sequence. These BAC sequencing methods that are applicable for characterization of genomes prior to sequencing may likely be valuable resources for genome annotation as well as scaffolding. PMID:22919272
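    The coverage figures in this abstract follow from simple arithmetic: fold coverage is the total number of cloned bases divided by the haploid genome size. The sketch below is only an illustration using the numbers quoted above; the function and variable names are hypothetical and the abstract does not state how the authors derived their value.

```python
def library_coverage(n_clones: int, mean_insert_bp: float, genome_bp: float) -> float:
    """Fold coverage = total cloned bases / haploid genome size."""
    return n_clones * mean_insert_bp / genome_bp


# Values quoted in the abstract (mean insert size used; the +/- 34.5 kb spread is ignored).
GENOME_BP = 2.58e9                                   # flow cytometry-estimated haploid genome
bac_coverage = library_coverage(109_486, 104e3, GENOME_BP)

# Fraction of the genome represented by the 1.17 Mb of paired-end reads.
end_seq_fraction = 1.17e6 / GENOME_BP

print(f"library coverage: {bac_coverage:.2f}X")
print(f"end-sequence coverage: {end_seq_fraction:.2%}")
```

    With the mean insert size alone this gives roughly 4.4X, slightly below the reported ~4.56X; the difference presumably reflects details of the authors' calculation that the abstract does not give. The end-sequence fraction rounds to the reported ~0.05%.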

  12. Distribution of genes and repetitive elements in the Diabrotica virgifera virgifera genome estimated using BAC sequencing.

    PubMed

    Coates, Brad S; Alves, Analiza P; Wang, Haichuan; Walden, Kimberly K O; French, B Wade; Miller, Nicholas J; Abel, Craig A; Robertson, Hugh M; Sappington, Thomas W; Siegfried, Blair D

    2012-01-01

    Feeding damage caused by the western corn rootworm, Diabrotica virgifera virgifera, is destructive to corn plants in North America and Europe where control remains challenging due to evolution of resistance to chemical and transgenic toxins. A BAC library, DvvBAC1, containing 109,486 clones with 104 ± 34.5 kb inserts was created, which has an ~4.56X genome coverage based upon a 2.58 Gb (2.80 pg) flow cytometry-estimated haploid genome size. Paired end sequencing of 1037 BAC inserts produced 1.17 Mb of data (~0.05% genome coverage) and indicated ~9.4 and 16.0% of reads encode, respectively, endogenous genes and transposable elements (TEs). Sequencing genes within BAC full inserts demonstrated that TE densities are high within intergenic and intron regions and contribute to the increased gene size. Comparison of homologous genome regions cloned within different BAC clones indicated that TE movement may cause haplotype variation within the inbred strain. The data presented here indicate that the D. virgifera virgifera genome is large in size and contains a high proportion of repetitive sequence. These BAC sequencing methods that are applicable for characterization of genomes prior to sequencing may likely be valuable resources for genome annotation as well as scaffolding. PMID:22919272

  13. Cloning and nucleotide sequence of the Salmonella typhimurium dcp gene encoding dipeptidyl carboxypeptidase.

    PubMed Central

    Hamilton, S; Miller, C G

    1992-01-01

    Plasmids carrying the Salmonella typhimurium dcp gene were isolated from a pBR328 library of Salmonella chromosomal DNA by screening for complementation of a peptide utilization defect conferred by a dcp mutation. Strains carrying these plasmids overproduced dipeptidyl carboxypeptidase approximately 50-fold. The nucleotide sequence of a 2.8-kb region of one of these plasmids contained an open reading frame coding for a protein of 77,269 Da, in agreement with the 80-kDa size for dipeptidyl carboxypeptidase (determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis and gel filtration). The N-terminal amino acid sequence of dipeptidyl carboxypeptidase purified from an overproducer strain agreed with that predicted by the nucleotide sequence. Northern (RNA) blot data indicated that dcp is not cotranscribed with other genes, and primer extension analysis showed the start of transcription to be 22 bases upstream of the translational start. The amino acid sequence of dcp was not similar to that of a mammalian dipeptidyl carboxypeptidase, angiotensin I-converting enzyme, but showed striking similarities to the amino acid sequence of another S. typhimurium peptidase encoded by the opdA (formerly optA) gene. PMID:1537804

  14. Genome-wide discovery of cis-elements in promoter sequences using gene expression.

    PubMed

    Troukhan, Maxim; Tatarinova, Tatiana; Bouck, John; Flavell, Richard B; Alexandrov, Nickolai N

    2009-04-01

    The availability of complete or nearly complete genome sequences, a large number of 5' expressed sequence tags, and significant public expression data allow for a more accurate identification of cis-elements regulating gene expression. We have implemented a global approach that takes advantage of available expression data, genomic sequences, and transcript information to predict cis-elements associated with specific expression patterns. The key components of our approach are: (1) precise identification of transcription start sites, (2) specific locations of cis-elements relative to the transcription start site, and (3) assessment of statistical significance for all sequence motifs. By applying our method to promoters of Arabidopsis thaliana and Mus musculus, we have identified motifs that affect gene expression under specific environmental conditions or in certain tissues. We also found that the presence of the TATA box is associated with increased variability of gene expression. Strong correlation between our results and experimentally determined motifs shows that the method is capable of predicting new functionally important cis-elements in promoter sequences. PMID:19231992
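    Components (2) and (3) of the approach described above can be illustrated with a small sketch: locating a motif relative to the transcription start site, and asking whether the motif is over-represented in a gene set. The abstract does not specify the statistical test used, so the hypergeometric tail below is a swapped-in, commonly used choice and not the authors' method; all function names are hypothetical.

```python
from math import comb


def motif_positions(promoter: str, motif: str, tss_index: int) -> list[int]:
    """Return motif start positions relative to the TSS (negative = upstream)."""
    positions = []
    start = promoter.find(motif)
    while start != -1:
        positions.append(start - tss_index)
        start = promoter.find(motif, start + 1)   # allow overlapping matches
    return positions


def hypergeom_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k): chance that at least k of n selected promoters carry the motif,
    when K of all N promoters carry it (assumed test, not from the abstract)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy example: a TATA-like motif 8 bases upstream of a TSS placed at index 10.
print(motif_positions("GGTATAAAGGCC", "TATAAA", tss_index=10))
```

    A small tail probability for a gene set sharing an expression pattern would suggest the motif is enriched there; in practice a correction for testing many motifs would also be needed.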

  15. Glyceraldehyde-3-phosphate dehydrogenase gene from Zymomonas mobilis: cloning, sequencing, and identification of promoter region

    SciTech Connect

    Conway, T.; Sewell, G.W.; Ingram, L.O.

    1987-12-01

    The gene encoding glyceraldehyde-3-phosphate dehydrogenase was isolated from a library of Zymomonas mobilis DNA fragments by complementing a deficient strain of Escherichia coli. It contained tandem promoters which were recognized by E. coli but appeared to function less efficiently than the enteric lac promoter in E. coli. The open reading frame for this gene encoded 337 amino acids with an aggregate molecular weight of 36,099 (including the N-terminal methionine). The primary amino acid sequence for this gene had considerable functional homology and amino acid identity with other eukaryotic and bacterial genes. Based on this comparison, the gap gene from Z. mobilis appeared to be most closely related to that of the thermophilic bacteria and to the chloroplast isozymes. Comparison of this gene with other glycolytic enzymes from Z. mobilis revealed a conserved pattern of codon bias and several common features of gene structure. A tentative transcriptional consensus sequence is proposed for Z. mobilis based on comparison of the five known promoters for three glycolytic enzymes.

  16. Sequencing and complementation analysis of the nifUSV genes from Azospirillum brasilense.

    PubMed

    Frazzon, J; Schrank, I S

    1998-02-15

    The functionality of nitrogenase in diazotrophic bacteria is dependent upon nif genes other than the structural nifH, D, and K genes which encode the enzyme subunit proteins. Such genes are involved in the activation of nif gene expression, maturation of subunit proteins, cofactor biosynthesis, and electron transport. In this work, approximately 5500 base pairs located within the major nif gene cluster of Azospirillum brasilense Sp7 have been sequenced. The deduced open reading frames were compared to the nif gene products of Azotobacter vinelandii and other diazotrophs. This analysis indicates the presence of five ORFs encoding ORF2, nifU, nifS, nifV, and ORF4 in the same sequential organization as found in other organisms. Consensus sigma 54 and NifA binding sites are present in the putative promoter region upstream of ORF2 in the A. brasilense sequence. The nifV gene of A. brasilense, but not nifU or nifS, complemented the corresponding mutant strains of A. vinelandii. PMID:9503607

  17. Cloning, nucleotide sequence, and expression of the Escherichia coli gene encoding carnitine dehydratase.

    PubMed Central

    Eichler, K; Schunck, W H; Kleber, H P; Mandrand-Berthelot, M A

    1994-01-01

    Carnitine dehydratase from Escherichia coli O44 K74 is an inducible enzyme detectable in cells grown anaerobically in the presence of L-(-)-carnitine or crotonobetaine. The purified enzyme catalyzes the dehydration of L-(-)-carnitine to crotonobetaine (H. Jung, K. Jung, and H.-P. Kleber, Biochim. Biophys. Acta 1003:270-276, 1989). The caiB gene, encoding carnitine dehydratase, was isolated by oligonucleotide screening from a genomic library of E. coli O44 K74. The caiB gene is 1,215 bp long, and it encodes a protein of 405 amino acids with a predicted M(r) of 45,074. The identity of the gene product was first assessed by its comigration in sodium dodecyl sulfate-polyacrylamide gels with the purified enzyme after overexpression in the pT7 system and by its enzymatic activity. Moreover, the N-terminal amino acid sequence of the purified protein was found to be identical to that predicted from the gene sequence. Northern (RNA) analysis showed that caiB is likely to be cotranscribed with at least one other gene. This other gene could be the gene encoding a 47-kDa protein, which was overexpressed upstream of caiB. PMID:8188598

  18. Genome-wide analysis of immune system genes by expressed sequence Tag profiling.

    PubMed

    Giallourakis, Cosmas C; Benita, Yair; Molinie, Benoit; Cao, Zhifang; Despo, Orion; Pratt, Henry E; Zukerberg, Lawrence R; Daly, Mark J; Rioux, John D; Xavier, Ramnik J

    2013-06-01

    Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated EST road map of gene expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes. PMID:23616578

  19. SFM: A novel sequence-based fusion method for disease genes identification and prioritization.

    PubMed

    Yousef, Abdulaziz; Moghadam Charkari, Nasrollah

    2015-10-21

    The identification of disease genes from the human genome is of great importance for improving the diagnosis and treatment of disease. Several machine learning methods have been introduced to identify disease genes. However, these methods mostly differ in the prior knowledge used to construct the feature vector for each instance (gene), the ways of selecting negative data (non-disease genes), for which there is no investigational approach, and the classification methods used to make the final decision. In this work, a novel sequence-based fusion method (SFM) is proposed to identify disease genes. Unlike existing methods, instead of using noisy and incomplete prior knowledge, the amino acid sequences of the proteins, which are universally available data, are used to represent the genes (proteins) as four different feature vectors. To select more likely negative data from candidate genes, the intersection of four negative sets generated using a distance approach is considered. Then, a decision tree (C4.5) is applied as a fusion method to combine the results of four independent state-of-the-art predictors based on the support vector machine (SVM) algorithm and to make the final decision. The experimental results of the proposed method have been evaluated by some standard measures. The results indicate precision, recall and F-measure of 82.6%, 85.6% and 84%, respectively. These results confirm the efficiency and validity of the proposed method. PMID:26209022
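    The three evaluation figures quoted above are related: the F-measure is conventionally the harmonic mean of precision and recall. The sketch below assumes the balanced F1 form (the abstract does not say which F-measure variant was used) and uses a hypothetical function name.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f_measure(0.826, 0.856)
print(f"F-measure: {f1:.1%}")
```

    Under that assumption the reported precision and recall give an F-measure of about 84%, consistent with the value in the abstract.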

  20. Evaluation and update of cutoff values for methanotrophic pmoA gene sequences.

    PubMed

    Wen, Xi; Yang, Sizhong; Liebner, Susanne

    2016-09-01

    The functional pmoA gene is frequently used to probe the diversity and phylogeny of methane-oxidizing bacteria (MOB) in various environments. Here, we compared the similarities between the pmoA gene and the corresponding 16S rRNA gene sequences of 77 described species covering gamma- and alphaproteobacterial methanotrophs (type I and type II MOB, respectively) as well as methanotrophs from the phylum Verrucomicrobia. We updated and established the weighted mean pmoA gene cutoff values on the nucleotide level at 86, 82, and 71 % corresponding to the 97, 95, and 90 % similarity of the 16S rRNA gene. Based on these cutoffs, the functional gene fragments can be entirely processed at the nucleotide level throughout software platforms such as Mothur or QIIME which provide a user-friendly and command-based alternative to amino acid-based pipelines. Type II methanotrophs are less divergent than type I both with regard to ribosomal and functional gene sequence similarity and GC content. We suggest that this agrees with the theory of different life strategies proposed for type I and type II MOB. PMID:27098810
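    The cutoff pairs reported above can be read as a lookup from pmoA nucleotide identity to the corresponding 16S rRNA similarity level. The sketch below is illustrative only: the function name is hypothetical, and treating each cutoff as an inclusive threshold is an assumption rather than something the abstract states.

```python
# (pmoA nucleotide identity cutoff in %, corresponding 16S rRNA similarity in %),
# as quoted in the abstract, ordered from strictest to loosest.
PMOA_CUTOFFS = [(86.0, 97), (82.0, 95), (71.0, 90)]


def equivalent_16s_level(pmoa_identity_pct: float):
    """Return the strictest 16S rRNA similarity level whose pmoA cutoff is met,
    or None if the identity falls below all cutoffs (inclusive thresholds assumed)."""
    for pmoa_cutoff, rrna_level in PMOA_CUTOFFS:
        if pmoa_identity_pct >= pmoa_cutoff:
            return rrna_level
    return None


for identity in (90.0, 84.0, 75.0, 60.0):
    print(identity, equivalent_16s_level(identity))
```

    In a clustering pipeline the same three pmoA values would be supplied as distance thresholds (100 minus the identity) rather than looked up pair by pair.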

  1. The Unique hmuY Gene Sequence as a Specific Marker of Porphyromonas gingivalis

    PubMed Central

    Mackiewicz, Paweł; Radwan-Oczko, Małgorzata; Kantorowicz, Małgorzata; Chomyszyn-Gajewska, Maria; Frąszczak, Magdalena; Bielecki, Marcin; Olczak, Mariusz; Olczak, Teresa

    2013-01-01

    Porphyromonas gingivalis, a major etiological agent of chronic periodontitis, acquires heme from host hemoproteins using the HmuY hemophore. The aim of this study was to develop a specific P. gingivalis marker based on a hmuY gene sequence. Subgingival samples were collected from 66 patients with chronic periodontitis and 40 healthy subjects and the entire hmuY gene was analyzed in positive samples. Phylogenetic analyses demonstrated that both the amino acid sequence of the HmuY protein and the nucleotide sequence of the hmuY gene are unique among P. gingivalis strains/isolates and show low identity to sequences found in other species (below 50 and 56%, respectively). In agreement with these findings, a set of hmuY gene-based primers and standard/real-time PCR with SYBR Green chemistry allowed us to specifically detect P. gingivalis in patients with chronic periodontitis (77.3%) and healthy subjects (20%), the latter possessing lower number of P. gingivalis cells and total bacterial cells. Isolates from healthy subjects possess the hmuY gene-based nucleotide sequence pattern occurring in W83/W50/A7436 (n = 4), 381/ATCC 33277 (n = 3) or TDC60 (n = 1) strains, whereas those from patients typically have TDC60 (n = 21), W83/W50/A7436 (n = 17) and 381/ATCC 33277 (n = 13) strains. We observed a significant correlation between periodontal index of risk of infectiousness (PIRI) and the presence/absence of P. gingivalis (regardless of the hmuY gene-based sequence pattern of the isolate identified [r = 0.43; P = 0.0002] and considering particular isolate pattern [r = 0.38; P = 0.0012]). In conclusion, we demonstrated that the hmuY gene sequence or its fragments may be used as one of the molecular markers of P. gingivalis. PMID:23844074

  2. Resolution of the African hominoid trichotomy by use of a mitochondrial gene sequence

    SciTech Connect

    Ruvolo, M.; Disotell, T.R.; Allard, M.W.; Brown, W.M.; Honeycutt, R.L.

    1991-02-15

    Mitochondrial DNA sequences encoding the cytochrome oxidase subunit II gene have been determined for five primate species, siamang (Hylobates syndactylus), lowland gorilla (Gorilla gorilla), pygmy chimpanzee (Pan paniscus), crab-eating macaque (Macaca fascicularis), and green monkey (Cercopithecus aethiops), and compared with published sequences of other primate and nonprimate species. Comparisons of cytochrome oxidase subunit II gene sequences provide clear-cut evidence from the mitochondrial genome for the separation of the African ape trichotomy into two evolutionary lineages, one leading to gorillas and the other to humans and chimpanzees. Several different tree-building methods support this same phylogenetic tree topology. The comparisons also yield trees in which a substantial length separates the divergence point of gorillas from that of humans and chimpanzees, suggesting that the lineage most immediately ancestral to humans and chimpanzees may have been in existence for a relatively long time.

  3. [A molecular phylogeny of Shennongjia white bear based on mitochondrial cytochrome b gene sequence].

    PubMed

    Wang, Hui-Juan; Zhang, Zhi-Min; Liu, Zhong-Lai; Xiong, Guo-Mei

    2006-10-01

    The phylogenetic relationship of Shennongjia white bear has been an open question. Total DNA was extracted and sequenced from hair and feces of Shennongjia white bear. Based on the partial Cyt b gene sequence obtained from the samples, the authors aligned them using the Clustal W software program. The MEGA software was used to analyze the divergences and base substitutions of the partial Cyt b gene among the 11 species: Shennongjia white bear, Selenarctos thibetanus, Euarctos americanus, Helarctos malayanus, Ursus arctos, Thalarctos maritimus, Melursus ursinus, Procyon lotor, Ailuropoda melanoleuca, Ailurus fulgens and Tremarctos ornatus. The phylogenetic trees constructed by multiple methods (NJ and MP) supported nearly the same topology. Our molecular results show that the sequence divergence between Shennongjia white bear and Asiatic black bear (Selenarctos thibetanus) is lower than that between other species. PMID:17035181

  4. Structure and nucleotide sequence of the rat intestinal vitamin D-dependent calcium binding protein gene.

    PubMed Central

    Krisinger, J; Darwish, H; Maeda, N; DeLuca, H F

    1988-01-01

    The vitamin D-dependent intestinal calcium binding protein (ICaBP, 9 kDa) is under transcriptional regulation by 1,25-dihydroxyvitamin D3 [1,25-(OH)2D3], the hormonally active form of the vitamin. To study the mechanism of gene regulation by 1,25-(OH)2D3, we isolated the rat ICaBP gene by using a cDNA probe. Its nucleotide sequence revealed 3 exons separated by 2 introns within approximately 3 kilobases. The first exon represents only noncoding sequences, while the second and third encode the two calcium binding domains of the protein. The gene contains a 15-base-pair imperfect palindrome in the first intron that shows high homology to the estrogen-responsive element. This sequence may represent the vitamin D-responsive element involved in the regulation of the ICaBP gene. The second intron shows an 84-base-pair-long simple nucleotide repeat that implicates Z-DNA formation. Genomic Southern analysis shows that the rat gene is represented as a single copy. PMID:3194402

  5. Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes

    PubMed Central

    Soh, Y.Q. Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G.; Graves, Tina; Minx, Patrick J.; Fulton, Robert S.; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L.; Rozen, Steve; Hughes, Jennifer F.; Owens, Elaine; Womack, James E.; Murphy, William J.; Cao, Qing; de Jong, Pieter; Warren, Wesley C.; Wilson, Richard K.; Skaletsky, Helen; Page, David C.

    2014-01-01

    Summary We sequenced the MSY (Male-Specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only two percent of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 45 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs, but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism. PMID:25417157

  6. Mutation Spectrum of Six Genes in Chinese Phenylketonuria Patients Obtained through Next-Generation Sequencing

    PubMed Central

    Cen, Zhong; Yu, Li; Lin, Lin; Hao, Jing; Yang, Zhigang; Peng, Jiabao; Cui, Shujian; Huang, Jian

    2014-01-01

    Background The identification of gene variants plays an important role in the diagnosis of genetic diseases. Methodology/Principal Findings To develop a rapid method for the diagnosis of phenylketonuria (PKU) and tetrahydrobiopterin (BH4) deficiency, we designed a multiplex, PCR-based primer panel to amplify all the exons and flanking regions (50 bp average) of six PKU-associated genes (PAH, PTS, GCH1, QDPR, PCBD1 and GFRP). The Ion Torrent Personal Genome Machine (PGM) System was used to detect mutations in all the exons of these six genes. We tested 93 DNA samples from blood specimens from 35 patients and their parents (32 families) and 26 healthy adults. Using strict bioinformatic criteria, this sequencing data provided, on average, 99.14% coverage of the 39 exons at more than 70-fold mean depth of coverage. We found 23 previously documented variants in the PAH gene and six novel mutations in the PAH and PTS genes. A detailed analysis of the mutation spectrum of these patients is described in this study. Conclusions/Significance These results were confirmed by Sanger sequencing. In conclusion, benchtop next-generation sequencing technology can be used to detect mutations in monogenic diseases and can detect both point mutations and indels with high sensitivity, fidelity and throughput at a lower cost than conventional methods in clinical applications. PMID:24705691

  7. SeqGene: a comprehensive software solution for mining exome- and transcriptome- sequencing data

    PubMed Central

    2011-01-01

    Background The popularity of massively parallel exome and transcriptome sequencing projects demands new data mining tools with a comprehensive set of features to support a wide range of analysis tasks. Results SeqGene, a new data mining tool, supports mutation detection and annotation, dbSNP and 1000 Genomes data integration, RNA-Seq expression quantification, mutation and coverage visualization, allele specific expression (ASE), differentially expressed genes (DEGs) identification, copy number variation (CNV) analysis, and gene expression quantitative trait loci (eQTLs) detection. We also developed novel methods for testing the association between SNP and expression and identifying genotype-controlled DEGs. We showed that the results generated from SeqGene compare favourably to other existing methods in our case studies. Conclusion SeqGene is designed as a general-purpose software package. It supports both paired-end reads and single reads generated on most sequencing platforms; it runs on all major types of computers; it supports arbitrary genome assemblies for arbitrary organisms; and it scales well to support both large and small scale sequencing projects. The software homepage is http://seqgene.sourceforge.net. PMID:21714929

  8. A transcriptional regulatory element in the coding sequence of the human Bcl-2 gene

    PubMed Central

    Lang, Georgina; Gombert, Wendy M; Gould, Hannah J

    2005-01-01

    We investigated the protein-binding sites in a DNAse I hypersensitive site associated with bcl-2 gene expression in human B cells. We mapped this hypersensitive site to the coding sequence of exon 2 of the bcl-2 gene in the bcl-2-expressing REH B-cell line. Electrophoretic mobility shift assays (EMSAs) with extracts from REH cells revealed three previously unrecognized B-Myb-binding sites in this sequence. The protein was identified as B-Myb by using a specific antibody and EMSAs. Accordingly, the levels of B-Myb and bcl-2 proteins, and of Myb EMSA activity, were correlated over a wide range of cell lines, representing different stages of B-cell development. Transfection of REH cells with antisense B-myb down-regulated EMSA activity and the level of bcl-2, and led to the apoptosis of REH cells. Transfection of the bcl-2-non-expressing RPMI 8226 cell line with a B-Myb expression vector induced B-Myb EMSA activity and the expression of bcl-2. Reporter assays indicated that the HSS8 sequence containing the three B-Myb sites may act as an enhancer when it is linked to the bcl-2 gene promoter. Interaction of B-Myb with HSS8 may enhance bcl-2 gene expression by co-operating with positive regulatory elements (e.g. previously identified B-Myb response elements) or silencing negative response elements in the bcl-2 gene promoter. PMID:15606792

  9. Two Lamprey Hedgehog Genes Share Non-Coding Regulatory Sequences and Expression Patterns with Gnathostome Hedgehogs

    PubMed Central

    Ekker, Marc; Hadzhiev, Yavor; Müller, Ferenc; Casane, Didier; Magdelenat, Ghislaine; Rétaux, Sylvie

    2010-01-01

    Hedgehog (Hh) genes play major roles in animal development and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly-described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNE) with gnathostome Hhs albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences. PMID:20967201

  10. Sequencing the mouse Y chromosome reveals convergent gene acquisition and amplification on both sex chromosomes.

    PubMed

    Soh, Y Q Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G; Graves, Tina; Minx, Patrick J; Fulton, Robert S; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L; Rozen, Steve; Hughes, Jennifer F; Owens, Elaine; Womack, James E; Murphy, William J; Cao, Qing; de Jong, Pieter; Warren, Wesley C; Wilson, Richard K; Skaletsky, Helen; Page, David C

    2014-11-01

    We sequenced the MSY (male-specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only 2% of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 45 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism. PMID:25417157

  11. Comparative gene expression in the symbiotic and aposymbiotic Aiptasia pulchella by expressed sequence tag analysis.

    PubMed

    Kuo, Jimmy; Chen, Ming-Chyuan; Lin, Chorng-Horng; Fang, Lee-Shing

    2004-05-21

    Intracellular symbiotic relationships are prevalent between cnidarians, such as corals and sea anemones, and their photosynthetic dinoflagellate symbionts. However, little is understood about how genes are expressed when the symbiotic relationship is established. To characterize genes involved in this association, the endosymbiosis between the sea anemone Aiptasia pulchella and the dinoflagellate zooxanthellae Symbiodinium spp. was employed as a model. Two complementary DNA (cDNA) libraries were constructed from RNA isolated from symbiotic and aposymbiotic A. pulchella. Using single-pass sequencing of cDNA clones, a total of 870 expressed sequence tag (EST) clones were generated from the two libraries: 474 from symbiotic animals and 396 from aposymbiotic animals. The initial ESTs consisted of 143 clusters and 231 singletons. A BLASTX search revealed that 147 unique genes had similarities with protein sequences available from databases; 120 of these clones were categorized according to their putative function. However, many ESTs could not be assigned a function. The putative roles of some of the identified genes relative to endosymbiosis are discussed. This is the first report of the use of EST analysis to examine gene expression in the symbiotic and aposymbiotic states of cnidarians. The systematic analysis of ESTs from this study provides a useful database for future investigations of the molecular mechanisms involved in algal-cnidarian symbiosis. PMID:15110770

  12. Sequence analysis of two genomic regions containing the KIT and the FMS receptor tyrosine kinase genes

    SciTech Connect

    Andre, C.; Hampe, A.; Lachaume, P.

    1997-01-15

    The KIT and FMS tyrosine kinase receptors, which are implicated in the control of cell growth and differentiation, stem through duplications from a common ancestor. We have conducted a detailed structural analysis of the two loci containing the KIT and FMS genes. The sequence of the ~90-kb KIT locus reveals the position and size of the 21 introns and of the 5′ regulatory region of the KIT gene. The introns and the 3′-untranslated parts of KIT and FMS have been analyzed in parallel. Comparison of the two sequences shows that, while introns of both genes have extensively diverged in size and sequence, this divergence is, at least in part, due to intron expansion through internal duplications, as suggested by the discrete extant analogies. Repetitive elements as well as exon predictions obtained using the GRAIL and GENEFINDER programs are described in detail. These programs led us to identify a novel gene, designated SMF, immediately downstream of FMS, in the opposite orientation. This finding emphasizes the gene-rich characteristic of this genomic region. 49 refs., 4 figs., 7 tabs.

  13. A gene-specific DNA sequencing chip for exploring molecular evolutionary change.

    PubMed

    Fedrigo, Olivier; Naylor, Gavin

    2004-01-01

    Sequencing by hybridization (SBH) approaches to DNA sequencing face two conflicting constraints. First, in order to ensure that the target DNA binds reliably, the oligonucleotide probes that are attached to the chip array must be >15 bp in length. Secondly, the total number of possible 15 bp oligonucleotides is too large (>4^15) to fit on a chip with current technology. To circumvent the conflict between these two opposing constraints, we present a novel gene-specific DNA chip design. Our design is based on the idea that not all conceivable oligonucleotides need to be placed on a chip, only those that capture sequence combinations occurring in nature. Our approach uses a training set of aligned sequences that code for the gene in question. We compute the minimum number of oligonucleotides (generally 15-30 bp in length) that need to be placed on a DNA chip to capture the variation implied by the training set using a graph search algorithm. We tested the approach in silico using cytochrome-b sequences. Results indicate that on average, 98% of the sequence of an unknown target can be determined using the approach. PMID:14973200
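
    The scale argument in this abstract can be illustrated with a short sketch: four bases at 15 positions give 4^15 possible probes, whereas a training set of related sequences contains far fewer distinct 15-mers. The code below is only an illustration under stated assumptions. The function names (`total_kmers`, `distinct_kmers`) and the toy sequences are hypothetical, and the naive k-mer enumeration is not the graph search algorithm the authors describe.

```python
def total_kmers(k: int) -> int:
    """Number of conceivable DNA oligonucleotides of length k (4 bases per position)."""
    return 4 ** k


def distinct_kmers(sequences, k):
    """Collect the distinct k-mers that actually occur in a set of sequences.

    Alignment gap characters ('-') are dropped before enumeration.
    This is a naive enumeration, not the authors' graph search.
    """
    seen = set()
    for seq in sequences:
        seq = seq.replace("-", "").upper()
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


if __name__ == "__main__":
    # Toy "training set" (hypothetical data, for illustration only).
    training = ["ACGTAC", "ACGTTC"]
    print(total_kmers(15))                    # size of the full 15-mer space
    print(len(distinct_kmers(training, 3)))   # 3-mers observed in the toy set
```

    In this toy case the observed k-mers are a small fraction of the conceivable ones, which is the intuition behind restricting the chip to naturally occurring sequence combinations.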

  14. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    SciTech Connect

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  15. Nucleotide sequence of the gene encoding the two-subunit pilin of Bacteroides nodosus 265.

    PubMed Central

    Elleman, T C; Hoyne, P A; McKern, N M; Stewart, D J

    1986-01-01

    The nucleotide sequence of the gene encoding pilin from Bacteroides nodosus 265 has been determined. The pilin is encoded by a single-copy gene, from which can be predicted a prepilin comprising a single protein chain of Mr 16,637. The prepilin sequence differs in several respects from the mature protein sequence. Seven additional N-terminal amino acid residues are present in prepilin, whereas residue 8, phenylalanine, undergoes posttranslational modification to become the N-methylated amino-terminal residue of mature pilin. In addition, further processing occurs through internal cleavage to produce two noncovalently linked subunits characteristic of pilins from serogroup H of B. nodosus, of which strain 265 is a member. The position of cleavage has been identified between alanine residues at positions 72 and 73 of the mature 149-residue pilin protein. The predicted pilin sequence of B. nodosus 265 shows extensive N-terminal amino acid sequence homology with other pilins of the N-methylphenylalanine type. In addition this sequence also shows homology with these N-methylphenylalanine-type pilins in the C-terminal region of the molecule, especially with pilin from Pseudomonas aeruginosa PAK. Images PMID:2873127

  16. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    PubMed Central

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-01-01

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution. PMID:26005436

  17. Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

    DOE PAGESBeta

    Aziz, Ramy K.; Dwivedi, Bhakti; Akhter, Sajia; Breitbart, Mya; Edwards, Robert A.

    2015-05-08

    Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.

  18. Nucleotide sequence and characterization of the gene for secreted alkaline phosphatase from Lysobacter enzymogenes.

    PubMed Central

    Au, S; Roy, K L; von Tigerstrom, R G

    1991-01-01

    Lysobacter enzymogenes produces an alkaline phosphatase which is secreted into the medium. The gene for the enzyme (phoA) was isolated from a recombinant lambda library. It was identified within a 4.4-kb EcoRI-BamHI fragment, and its sequence was determined by the chain termination method. The structural gene consists of an open reading frame which encodes a 539-amino-acid protein with a 29-residue signal sequence, followed by a 119-residue propeptide, the 281-residue mature phosphatase, and a 110-residue carboxy-terminal domain. The roles of the propeptide and the carboxy-terminal peptide remain to be determined. A molecular weight of 30,000 was determined for the mature enzyme from sodium dodecyl sulfate-polyacrylamide gel electrophoresis. The amino acid sequence was compared with sequences available in the current protein data base, and a region of the sequence was found to show considerable homology with sequences in mammalian type 5 iron-containing purple acid phosphatases. Images PMID:1856159
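
    As a quick consistency check on the figures in this abstract, the four segment lengths can be summed and compared with the stated 539-residue open reading frame, and the mass of the mature enzyme can be roughly estimated. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that does not come from the abstract; the variable names are illustrative.

```python
# Segment lengths (in amino acid residues) as stated in the abstract.
segments = {
    "signal sequence": 29,
    "propeptide": 119,
    "mature phosphatase": 281,
    "carboxy-terminal domain": 110,
}

# The segments should add up to the full 539-residue translation product.
total_residues = sum(segments.values())

# Assumption: ~110 Da per residue is a rough average, not a value from the source.
AVG_RESIDUE_MASS_DA = 110
approx_mature_mass_da = segments["mature phosphatase"] * AVG_RESIDUE_MASS_DA

if __name__ == "__main__":
    print(total_residues)          # compare with the stated 539 residues
    print(approx_mature_mass_da)   # compare with the ~30,000 measured by SDS-PAGE
```

    Under that assumption, the estimate for the 281-residue mature protein falls close to the molecular weight of 30,000 reported from gel electrophoresis.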

  19. Mutations analysis of C1 inhibitor coding sequence gene among Portuguese patients with hereditary angioedema.

    PubMed

    Martinho, A; Mendes, J; Simões, O; Nunes, R; Gomes, J; Dias Castro, E; Leiria-Pinto, P; Ferreira, M B; Pereira, C; Castel-Branco, M G; Pais, L

    2013-04-01

    Mutations that modify the amino acid sequence of C1-INH (except Val458Met) are associated with HAE. More than 200 different mutations scattered across the entire C1-INH gene have been reported. The main objective of this study was to report the mutational findings in a HAE cohort of 138 Portuguese patients followed in specialized consultation all over the country. DNA was extracted from peripheral blood with QiaSymphony BioRobot (QIAGEN Portugal). The sequence reactions were performed by using a DNA sequencing kit (Big Dye terminator cycle sequencing v1.1/v3.1 from Applied Biosystems) and sequencing products were immediately submitted to direct sequencing on an Applied Biosystems 3130 DNA Analyser. DNA sequences were analyzed at four different stages. Raw data and sequence alignments of all 8 exons and intron-exon boundaries were performed for each patient individually with SeqScape software and using SERPING1 gene NG_009625 of 24,300 bp (12-March-2011) as reference sequence. Sequence comparisons among patients and controls were performed with software CodonCode Aligner v.3.7 from CodonCode Corp and with Geneious 4.5 from Biomatters Lda. A total of 94 point mutations were observed among patients, and 67% of them were located on exon 8. In addition, we noticed one previously undescribed stop codon at position c.1459C>T in three different patients. Translation termination was also found on exons 3 and 7, as a result of mutations at positions c.481A>T and c.1174C>T. In this population, the prevalence of the missense mutation p.Arg444Cys was 39 out of 42. Mutational analysis revealed 22 different pathogenic mutations, of which 64% were not described in the HAE database. Although identification of disease-causing mutations is not necessary to establish HAE diagnosis, studies on gene expression and characterization of rearrangements in the SERPING1 gene are suggested in order to get new insights on function and genetic tests of C1 inhibitor. PMID:23123409

  20. DNA SEQUENCING, ANALYSIS, AND IDENTIFICATION OF SEROGROUP-SPECIFIC GENES IN THE ESCHERICHIA COLI O28AC AND O118 O ANTIGEN GENE CLUSTERS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The DNA sequence of the O antigen gene clusters of Escherichia coli serogroups O28ac and O118 was determined, and 7 and 13 ORFs were identified, respectively, encoding genes required for O antigen sugar biosynthesis, transfer, and processing. Analysis of the DNA sequence revealed that the wzx (O ...

  1. Identification of cancer/testis-antigen genes by massively parallel signature sequencing

    PubMed Central

    Chen, Yao-Tseng; Scanlan, Matthew J.; Venditti, Charis A.; Chua, Ramon; Theiler, Gregory; Stevenson, Brian J.; Iseli, Christian; Gure, Ali O.; Vasicek, Tom; Strausberg, Robert L.; Jongeneel, C. Victor; Old, Lloyd J.; Simpson, Andrew J. G.

    2005-01-01

    Massively parallel signature sequencing (MPSS) generates millions of short sequence tags corresponding to transcripts from a single RNA preparation. Most MPSS tags can be unambiguously assigned to genes, thereby generating a comprehensive expression profile of the tissue of origin. From the comparison of MPSS data from 32 normal human tissues, we identified 1,056 genes that are predominantly expressed in the testis. Further evaluation by using MPSS tags from cancer cell lines and EST data from a wide variety of tumors identified 202 of these genes as candidates for encoding cancer/testis (CT) antigens. Of these genes, the expression in normal tissues was assessed by RT-PCR in a subset of 166 intron-containing genes, and those with confirmed testis-predominant expression were further evaluated for their expression in 21 cancer cell lines. Thus, 20 CT or CT-like genes were identified, with several exhibiting expression in five or more of the cancer cell lines examined. One of these genes is a member of a CT gene family that we designated as CT45. The CT45 family comprises six highly similar (>98% cDNA identity) genes that are clustered in tandem within a 125-kb region on Xq26.3. CT45 was found to be frequently expressed in both cancer cell lines and lung cancer specimens. Thus, MPSS analysis has resulted in a significant extension of our knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family. PMID:15905330

  2. Regulation of SHOOT MERISTEMLESS genes via an upstream-conserved noncoding sequence coordinates leaf development

    PubMed Central

    Uchida, Naoyuki; Townsley, Brad; Chung, Kook-Hyun; Sinha, Neelima

    2007-01-01

    The indeterminate shoot apical meristem of plants is characterized by the expression of the Class 1 KNOTTED1-LIKE HOMEOBOX (KNOX1) genes. KNOX1 genes have been implicated in the acquisition and/or maintenance of meristematic fate. One of the earliest indicators of a switch in fate from indeterminate meristem to determinate leaf primordium is the down-regulation of KNOX1 genes orthologous to SHOOT MERISTEMLESS (STM) in Arabidopsis (hereafter called STM genes) in the initiating primordia. In simple leafed plants, this down-regulation persists during leaf formation. In compound leafed plants, however, KNOX1 gene expression is reestablished later in the developing primordia, creating an indeterminate environment for leaflet formation. Despite this knowledge, most aspects of how STM gene expression is regulated remain largely unknown. Here, we identify two evolutionarily conserved noncoding sequences within the 5′ upstream region of STM genes in both simple and compound leafed species across monocots and dicots. We show that one of these elements is involved in the regulation of the persistent repression and/or the reestablishment of STM expression in the developing leaves but is not involved in the initial down-regulation in the initiating primordia. We also show evidence that this regulation is developmentally significant for leaf formation in the pathway involving ASYMMETRIC LEAVES1/2 (AS1/2) gene expression; these genes are known to function in leaf development. Together, these findings reveal a regulatory point of leaf development mediated through a conserved, noncoding sequence in STM genes. PMID:17898165

  3. [Cloning, sequence analysis and expression of N-acetylglutamate kinase gene in Corynebacterium crenatum].

    PubMed

    Hao, Ning; Zhao, Zhi; Wang, Yu; Zhang, Ying-zi; Ding, Jiu-yuan

    2006-02-01

    N-Acetylglutamate kinase (EC 2.7.2.8; NAGK) genes from wild-type Corynebacterium crenatum AS 1.542 and an L-arginine-producing mutant, C. crenatum 971.1, were cloned and sequenced. Analysis of argB sequences revealed that only one ORF existed, which used ATG as the initiation codon and encoded a peptide of 317 amino acids with a calculated molecular weight of 33.6 kDa. Comparison of the gene sequences between the wild-type C. crenatum AS 1.542 and the mutant 971.1 found only one nucleotide difference in the structural gene, and the difference did not cause a change of amino acid. The ORF sequence of argB from C. crenatum AS 1.542 showed homologies of 99.89%, 76.62%, and 37.94% to those from Corynebacterium glutamicum ATCC 13032, Corynebacterium efficiens YS-314 and Escherichia coli K12, respectively. The amino acid sequence deduced from the ORF displayed homologies of 100%, 78.55%, and 25.25% to those from the microorganisms above, respectively. An internal promoter was found upstream of the argB gene from C. crenatum. The argB gene from C. crenatum AS 1.542 was expressed both in C. crenatum AS 1.542 and 971.1. The NAGK activity of transformed C. crenatum AS 1.542 was greatly increased by the induction of IPTG. The NAGK activity of transformed C. crenatum 971.1 was almost twice as much as that of C. crenatum 971.1 under the same induction. The amplification of the NAGK activity yielded a 25% increase of L-arginine production in C. crenatum 971.1. PMID:16579472

  4. The Gene-Finder computer tools for analysis of human and model organisms genome sequences.

    PubMed

    Solovyev, V; Salamov, A

    1997-01-01

    We present a complex of new programs for promoter, 3'-processing, splice sites, coding exons and gene structure identification in genomic DNA of several model species. The human gene structure prediction program FGENEH, exon prediction-FEXH and splice site prediction-HSPL have been modified for sequence analysis of Drosophila (FGENED, FEXD and DSPL), C. elegans (FGENEN, FEXN and NSPL), Yeast (FEXY and YSPL) and Plant (FGENEA, FEXA and ASPL) genomic sequences. We recomputed all frequency and discriminant function parameters for these organisms and adjusted organism specific minimal intron lengths. The accuracy of coding region prediction for these programs is similar to the observed accuracy of FEXH and FGENEH. We have developed FEXHB and FGENEHB programs combining pattern recognition features and information about similarity of predicted exons with known sequences in protein databases. These programs have approximately 10% higher average accuracy of coding region recognition. Two new programs for human promoter site prediction (TSSG and TSSW) have been developed which use Ghosh (1993) and Wingender (1994) data bases of functional motifs, respectively. POLYAH program was designed for prediction of 3'-processing regions in human genes and CDSB program was developed for bacterial gene prediction. We have developed a new approach to predict multiple genes based on double dynamic programming, that is very important for analysis of long genomic DNA fragments generated by genome sequencing projects. Analysis of uncharacterized sequences based on our methods is available through the University of Houston, Weizmann Institute of Science email servers and several Web pages at Baylor College of Medicine. PMID:9322052

  5. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways

    PubMed Central

    Cirulli, Elizabeth T.; Lasseigne, Brittany N.; Petrovski, Slavé; Sapp, Peter C.; Dion, Patrick A.; Leblond, Claire S.; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J.; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E.; Boone, Braden E.; Wimbish, Jack R.; Waite, Lindsay L.; Jones, Angela L.; Carulli, John P.; Day-Williams, Aaron G.; Staropoli, John F.; Xin, Winnie W.; Chesi, Alessandra; Raphael, Alya R.; McKenna-Yasek, Diane; Cady, Janet; de Jong, J.M.B. Vianney; Kenna, Kevin P.; Smith, Bradley N.; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H.; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E.; Baloh, Robert H.; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M.; Gibson, Summer; Trojanowski, John Q.; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Shneider, Neil A.; Chung, Wendy K.; Ravits, John M.; Glass, Jonathan D.; Sims, Katherine B.; Van Deerlin, Vivianna M.; Maniatis, Tom; Hayes, Sebastian D.; Ordureau, Alban; Swarup, Sharan; Landers, John; Baas, Frank; Allen, Andrew S.; Bedlack, Richard S.; Harper, J. Wade; Gitler, Aaron D.; Rouleau, Guy A.; Brown, Robert; Harms, Matthew B.; Cooper, Gregory M.; Harris, Tim; Myers, Richard M.; Goldstein, David B.

    2015-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. Here we report the results of a moderate-scale sequencing study aimed at identifying new genes contributing to predisposition for ALS. We performed whole exome sequencing of 2,874 ALS patients and compared them to 6,405 controls. Several known ALS genes were found to be associated, and the non-canonical IκB kinase family TANK-Binding Kinase 1 (TBK1) was identified as an ALS gene. TBK1 is known to bind to and phosphorylate a number of proteins involved in innate immunity and autophagy, including optineurin (OPTN) and p62 (SQSTM1/sequestosome), both of which have also been implicated in ALS. These observations reveal a key role of the autophagic pathway in ALS and suggest specific targets for therapeutic intervention. PMID:25700176

  6. Identification of genes encoding Schistosoma mansoni antigens using an antigenic sequence tag strategy.

    PubMed

    Zouain, C S; Azevedo, V A; Franco, G R; Pena, S D; Goes, A M

    1998-12-01

    Another approach for the identification of genes that code for antigenic products is described using an antigenic sequence tag (AST) strategy. A Schistosoma mansoni adult worm cDNA library was screened with affinity chromatography-purified immunoglobulins from infected human sera and a mild oxidation treatment with sodium periodate. From 1 or both ends of 30 cDNA clones, 30 ASTs were obtained. Of these, 22 were previously known Sm antigens. One clone had matches with entries for other organisms in the databases and 6 had homology with Sm-expressed sequence tags (EST) entries. These clones, together with another 1 that had no significant database matches, were considered new antigenic genes in S. mansoni. The strategy proved to be efficient for the identification of genes that could be used for immunological studies and evaluation as vaccine candidates. PMID:9920341

  7. Stable intronic sequence RNAs (sisRNAs): a new layer of gene regulation.

    PubMed

    Osman, Ismail; Tay, Mandy Li-Ian; Pek, Jun Wei

    2016-09-01

    Upon splicing, introns are rapidly degraded. Hence, RNAs derived from introns are commonly deemed as junk sequences. However, the discoveries of intronic-derived small nucleolar RNAs (snoRNAs), small Cajal body associated RNAs (scaRNAs) and microRNAs (miRNAs) suggested otherwise. These non-coding RNAs are shown to play various roles in gene regulation. In this review, we highlight another class of intron-derived RNAs known as stable intronic sequence RNAs (sisRNAs). sisRNAs have been observed since the 1980s; however, we are only beginning to understand their biological significance. Recent studies have shown or suggested that sisRNAs regulate their own host's gene expression, function as molecular sinks or sponges, and regulate protein translation. We propose that sisRNAs function as an additional layer of gene regulation in the cells. PMID:27147469

  8. Diversity through duplication: whole-genome sequencing reveals novel gene retrocopies in the human population.

    PubMed

    Richardson, Sandra R; Salvador-Palomeque, Carmen; Faulkner, Geoffrey J

    2014-05-01

    Gene retrocopies are generated by reverse transcription and genomic integration of mRNA. As such, retrocopies present an important exception to the central dogma of molecular biology, and have substantially impacted the functional landscape of the metazoan genome. While an estimated 8,000-17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms of retrocopy content has remained largely unexplored. Three recent studies by Abyzov et al., Ewing et al. and Schrider et al. have exploited 1,000 Genomes Project Consortium data, as well as other sources of whole-genome sequencing data, to uncover novel gene retrocopies. Here, we compare the methods and results of these three studies, highlight the impact of retrocopies in human diversity and genome evolution, and speculate on the potential for somatic gene retrocopies to impact cancer etiology and genetic diversity among individual neurons in the mammalian brain. PMID:24615986

  9. The mitochondrial genome of Anopheles quadrimaculatus species A: complete nucleotide sequence and gene organization.

    PubMed

    Mitchell, S E; Cockburn, A F; Seawright, J A

    1993-12-01

    The complete sequence (15,455 bp) of the mitochondrial DNA of the mosquito Anopheles quadrimaculatus species A is reported. This genome is compact and very A+T rich (77.4% A+T). It contains genes for 2 ribosomal RNAs (rRNAs), 22 transfer RNAs (tRNAs), and 13 subunits of the mitochondrial inner membrane respiratory complexes. The gene arrangement is the same as in Drosophila yakuba, except that the positions of two contiguous tRNAs are reversed and a third tRNA is transcribed from the complementary strand. Protein-coding genes, rRNAs, and most tRNAs were similar to D. yakuba. Two tRNAs had nonstandard secondary structures comparable with those of nematode mitochondrial tRNAs. The very small putative control region (625 bp) contains no sequence motifs similar to those used in vertebrates and other insects for initiation of transcription and replication. PMID:8112570
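
    The A+T richness quoted in this abstract is a simple base-composition fraction. The sketch below shows one way such a figure could be computed from a sequence string. The function name `at_content` and the toy input are hypothetical, and the derived base count is only an arithmetic consequence of the stated length and percentage, not a value reported in the abstract.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


# Figures stated in the abstract.
GENOME_LENGTH_BP = 15455
AT_FRACTION = 0.774

# Approximate number of A/T positions implied by those two figures.
approx_at_bases = round(GENOME_LENGTH_BP * AT_FRACTION)

if __name__ == "__main__":
    print(at_content("AATTGC"))  # toy sequence: 4 of 6 bases are A or T
    print(approx_at_bases)
```

    Ambiguous characters such as N are ignored here, which is one possible design choice; counting them in the denominator would lower the reported fraction.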

  10. Whole-Exome Sequencing and Homozygosity Analysis Implicate Depolarization-Regulated Neuronal Genes in Autism

    PubMed Central

    Chahrour, Maria H.; Yu, Timothy W.; Lim, Elaine T.; Ataman, Bulent; Coulter, Michael E.; Hill, R. Sean; Stevens, Christine R.; Schubert, Christian R.; Greenberg, Michael E.; Gabriel, Stacey B.; Walsh, Christopher A.

    2012-01-01

    Although autism has a clear genetic component, the high genetic heterogeneity of the disorder has been a challenge for the identification of causative genes. We used homozygosity analysis to identify probands from nonconsanguineous families that showed evidence of distant shared ancestry, suggesting potentially recessive mutations. Whole-exome sequencing of 16 probands revealed validated homozygous, potentially pathogenic recessive mutations that segregated perfectly with disease in 4/16 families. The candidate genes (UBE3B, CLTCL1, NCKAP5L, ZNF18) encode proteins involved in proteolysis, GTPase-mediated signaling, cytoskeletal organization, and other pathways. Furthermore, neuronal depolarization regulated the transcription of these genes, suggesting potential activity-dependent roles in neurons. We present a multidimensional strategy for filtering whole-exome sequence data to find candidate recessive mutations in autism, which may have broader applicability to other complex, heterogeneous disorders. PMID:22511880

  11. Using CATH-Gene3D to Analyze the Sequence, Structure, and Function of Proteins.

    PubMed

    Sillitoe, Ian; Lewis, Tony; Orengo, Christine

    2015-01-01

    The CATH database is a classification of protein structures found in the Protein Data Bank (PDB). Protein structures are chopped into individual units of structural domains, and these domains are grouped together into superfamilies if there is sufficient evidence that they have diverged from a common ancestor during the process of evolution. A sister resource, Gene3D, extends this information by scanning sequence profiles of these CATH domain superfamilies against many millions of known proteins to identify related sequences. Thus the combined CATH-Gene3D resource provides confident predictions of the likely structural fold, domain organisation, and evolutionary relatives of these proteins. In addition, this resource incorporates annotations from a large number of external databases such as known enzyme active sites, GO molecular functions, physical interactions, and mutations. This unit details how to access and understand the information contained within the CATH-Gene3D Web pages, the downloadable data files, and the remotely accessible Web services. PMID:26087950

  12. Nucleotide sequence of the regulatory locus controlling expression of bacterial genes for bioluminescence.

    PubMed Central

    Engebrecht, J; Silverman, M

    1987-01-01

    Production of light by the marine bacterium Vibrio fischeri and by recombinant hosts containing cloned lux genes is controlled by the density of the culture. Density-dependent regulation of lux gene expression has been shown to require a locus consisting of the luxR and luxI genes and two closely linked divergent promoters. As part of a genetic analysis to understand the regulation of bioluminescence, we have sequenced the region of DNA containing this control circuit. Open reading frames corresponding to luxR and luxI were identified; transcription start sites were defined by S1 nuclease mapping and sequences resembling promoter elements were located. PMID:3697093

  13. Chicken TAP genes differ from their human orthologues in locus organisation, size, sequence features and polymorphism.

    PubMed

    Walker, Brian A; van Hateren, Andrew; Milne, Sarah; Beck, Stephan; Kaufman, Jim

    2005-05-01

    We have previously shown that in the chicken major histocompatibility complex, the two transporters associated with antigen processing genes (TAP1 and TAP2) are located head to head between two classical class I genes. Here we show that the region between these two TAP genes has transcription factor-binding sites in common with class I gene promoters. The TAP genes are also up-regulated by interferon-gamma in a similar way to mammalian TAP genes and in a way that suggests they are both transcribed from a bi-directional promoter. The gene structures of TAP1 and TAP2 differ from that of human TAPs in that TAP1 has a truncated exon 1 and TAP2 has fused exons, resulting in a much smaller gene size. The truncation of TAP1 results in the loss of approximately 150 amino acids, which are thought to be involved in endoplasmic reticulum retention, heterodimer formation and tapasin binding, compared to human TAP1. Most of the protein sequence features involved in binding ATP are conserved, with two exceptions: chicken TAP1 has a glycine in the switch region where other TAPs have glutamine or histidine, and both chicken TAP genes have serines in the C motif where mammalian TAP2 has an alanine. Lastly, the chicken TAP genes are highly polymorphic, with at least as many TAP alleles as there are class I alleles, as seen by investigating nine inbred lines of chicken. The close proximity of the TAP genes to the class I genes and the high level of polymorphism may allow co-evolution of the genes, allowing TAP molecules to transport peptides specifically for the class I molecules of that haplotype. PMID:15900495

  14. Exome sequencing of extended families with autism reveals genes shared across neurodevelopmental and neuropsychiatric disorders

    PubMed Central

    2014-01-01

    Background Autism spectrum disorders (ASDs) comprise a range of neurodevelopmental conditions of varying severity, characterized by marked qualitative difficulties in social relatedness, communication, and behavior. Despite overwhelming evidence of high heritability, results from genetic studies to date show that ASD etiology is extremely heterogeneous and only a fraction of autism genes have been discovered. Methods To help unravel this genetic complexity, we performed whole exome sequencing on 100 ASD individuals from 40 families with multiple distantly related affected individuals. All families contained a minimum of one pair of ASD cousins. Each individual was captured with the Agilent SureSelect Human All Exon kit, sequenced on the Illumina Hiseq 2000, and the resulting data processed and annotated with Burrows-Wheeler Aligner (BWA), Genome Analysis Toolkit (GATK), and SeattleSeq. Genotyping information on each family was utilized in order to determine genomic regions that were identical by descent (IBD). Variants identified by exome sequencing which occurred in IBD regions and were present in all affected individuals within each family were then evaluated to determine which may potentially be disease related. Nucleotide alterations that were novel and rare (minor allele frequency, MAF, less than 0.05) and predicted to be detrimental, either by altering amino acids or splicing patterns, were prioritized. Results We identified numerous potentially damaging, ASD associated risk variants in genes previously unrelated to autism. A subset of these genes has been implicated in other neurobehavioral disorders including depression (SLIT3), epilepsy (CLCN2, PRICKLE1), intellectual disability (AP4M1), schizophrenia (WDR60), and Tourette syndrome (OFCC1). Additional alterations were found in previously reported autism candidate genes, including four genes with alterations in multiple families (CEP290, CSMD1, FAT1, and STXBP5). 
Compiling a list of ASD candidate genes from the
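
The prioritization criteria described in this abstract (rare, inside an IBD region, shared by all affected relatives, predicted to alter amino acids or splicing) can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Variant` record, its field names, the effect labels, and the example data are hypothetical and are not taken from the study or from BWA, GATK, or SeattleSeq.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical record; field names are illustrative, not from the study.
    gene: str
    maf: float             # minor allele frequency in a reference panel
    in_ibd_region: bool    # lies in a region identical by descent in the family
    carriers: frozenset    # IDs of family members carrying the variant
    effect: str            # e.g. "missense", "splice", "synonymous"


# Effects treated as potentially detrimental (amino acid or splicing changes).
DAMAGING_EFFECTS = {"missense", "splice"}


def prioritize(variants, affected_ids, maf_cutoff=0.05):
    """Keep rare, IBD-located, damaging variants carried by every affected relative."""
    affected = frozenset(affected_ids)
    return [
        v for v in variants
        if v.maf < maf_cutoff
        and v.in_ibd_region
        and affected <= v.carriers
        and v.effect in DAMAGING_EFFECTS
    ]


# Hypothetical family with two affected cousins.
family_affected = {"cousin1", "cousin2"}
candidates = [
    Variant("GENE_A", 0.001, True, frozenset({"cousin1", "cousin2"}), "missense"),
    Variant("GENE_B", 0.200, True, frozenset({"cousin1", "cousin2"}), "missense"),
    Variant("GENE_C", 0.001, True, frozenset({"cousin1"}), "splice"),
    Variant("GENE_D", 0.001, False, frozenset({"cousin1", "cousin2"}), "splice"),
]
kept = prioritize(candidates, family_affected)
print([v.gene for v in kept])
```

Only the hypothetical GENE_A passes every criterion here; the others fail on frequency, sharing among affected relatives, or IBD location.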

  15. Identification of genes related to Parkinson's disease using expressed sequence tags.

    PubMed

    Kim, Jeong-Min; Lee, Kyu-Hwa; Jeon, Yeo-Jin; Oh, Jung-Hwa; Jeong, So-Young; Song, In-Sung; Kim, Jin-Man; Lee, Dong-Seok; Kim, Nam-Soon

    2006-12-31

    In a search for novel target genes related to Parkinson's disease (PD), two full-length cDNA libraries were constructed from a human normal substantia nigra (SN) and a PD patient's SN. An analysis of the gene expression profiles between them was done using the expressed sequence tags (ESTs) frequency. Data for the differently expressed genes were verified by quantitative real-time RT-PCR, immunohistochemical analysis and a cell death assay. Among the 76 genes identified with a significant difference (P > 0.9), 21 upregulated genes and 13 downregulated genes were confirmed to be differentially expressed in human PD tissues and/or in an MPTP-treated mice model by quantitative real-time RT-PCR. Among those genes, an immunohistochemical analysis using an MPTP mice model for alpha-tubulin including TUBA3 and TUBA6 showed that the protein levels are downregulated, as well as the RNA levels. In addition, MBP, PBP and GNAS were confirmed to accelerate cell death activity, whereas SPP1 and TUBA3 to retard this process. Using an analysis of ESTs frequency, it was possible to identify a large number of genes related to human PD. These new genes, MBP, PBP, GNAS, SPP1 and TUBA3 in particular, represent potential biomarkers for PD and could serve as useful targets for elucidating the molecular mechanisms associated with PD. PMID:17213182

  16. Driver Gene Mutations in Stools of Colorectal Carcinoma Patients Detected by Targeted Next-Generation Sequencing.

    PubMed

    Armengol, Gemma; Sarhadi, Virinder K; Ghanbari, Reza; Doghaei-Moghaddam, Masoud; Ansari, Reza; Sotoudeh, Masoud; Puolakkainen, Pauli; Kokkola, Arto; Malekzadeh, Reza; Knuutila, Sakari

    2016-07-01

    Detection of driver gene mutations in stool DNA represents a promising noninvasive approach for screening colorectal cancer (CRC). Amplicon-based next-generation sequencing (NGS) is a good option to study mutations in many cancer genes simultaneously and from a low amount of DNA. Our aim was to assess the feasibility of identifying mutations in 22 cancer driver genes with Ion Torrent technology in stool DNA from a series of 65 CRC patients. The assay was successful in 80% of stool DNA samples. NGS results showed 83 mutations in cancer driver genes, 29 hotspot and 54 novel mutations. One to five genes were mutated in 75% of cases. TP53, KRAS, FBXW7, and SMAD4 were the top mutated genes, consistent with previous studies. Of samples with mutations, 54% presented concomitant mutations in different genes. Phosphatidylinositol 3-kinase/mitogen-activated protein kinase pathway genes were mutated in 70% of samples, with 58% having alterations in KRAS, NRAS, or BRAF. Because mutations in these genes can compromise the efficacy of epidermal growth factor receptor blockade in CRC patients, identifying mutations that confer resistance to some targeted treatments may be useful to guide therapeutic decisions. In conclusion, the data presented herein show that NGS procedures on stool DNA represent a promising tool to detect genetic mutations that could be used in the future for diagnosis, monitoring, or treating CRC. PMID:27155048

  17. Human DNA sequence homologous to the transforming gene (mos) of Moloney murine sarcoma virus.

    PubMed Central

    Watson, R; Oskarsson, M; Vande Woude, G F

    1982-01-01

    We describe the molecular cloning of a 9-kilo-base-pair BamHI fragment from human placental DNA containing a sequence homologous to the transforming gene (v-mos) of Moloney murine sarcoma virus. The DNA sequence of the homologous region of human DNA (termed humos) was resolved and compared to that of the mouse cellular homolog of v-mos (termed mumos) [Van Beveren, C., van Straaten, F., Galleshaw, J.A. & Verma, I.M. (1981) Cell 27, 97-108]. The humos gene contained an open reading frame of 346 codons that was aligned with the equivalent mumos DNA sequence by the introduction of two gaps of 15 and 3 bases into the mumos DNA and a single gap of 9 bases into the humos DNA. The aligned coding sequences were 77% homologous and terminated at equivalent opal codons. The humos open reading frame initiated at an ATG found internally in the mumos coding sequence. The polypeptides predicted from the DNA sequence to be encoded by humos and mumos also were found to be extensively homologous, and 253 of 337 amino acids were shared between the two polypeptides. The first five NH2-terminal and last two COOH-terminal amino acids of the humos gene product were in common with those of mumos. In addition, near the middle of the polypeptide chains, four regions ranging from 19 to 26 consecutive amino acids were conserved. However, we have not been able to transform mouse cells with transfected humos DNA fragments or with hybrid DNA recombinants containing humos and retroviral long terminal repeat (LTR) sequences. PMID:6287464

  18. Phylogenetic analysis of sequences from diverse bacteria with homology to the Escherichia coli rho gene.

    PubMed Central

    Opperman, T; Richardson, J P

    1994-01-01

    Genes from Pseudomonas fluorescens, Chromatium vinosum, Micrococcus luteus, Deinococcus radiodurans, and Thermotoga maritima with homology to the Escherichia coli rho gene were cloned and sequenced, and their sequences were compared with other available sequences. The species for all of the compared sequences are members of five bacterial phyla, including Thermotogales, the most deeply diverged phylum. This suggests that a rho-like gene is ubiquitous in the Bacteria and was present in their common ancestor. The comparative analysis revealed that the Rho homologs are highly conserved, exhibiting a minimum identity of 50% of their amino acid residues in pairwise comparisons. The ATP-binding domain had a particularly high degree of conservation, consisting of some blocks with sequences of residues that are very similar to segments of the alpha and beta subunits of F1-ATPase and of other blocks with sequences that are unique to Rho. The RNA-binding domain is more diverged than the ATP-binding domain. However, one of its most highly conserved segments includes a RNP1-like sequence, which is known to be involved in RNA binding. Overall, the degree of similarity is lowest in the first 50 residues (the first half of the RNA-binding domain), in the putative connector region between the RNA-binding and the ATP-binding domains, and in the last 50 residues of the polypeptide. Since functionally defective mutants for E. coli Rho exist in all three of these segments, they represent important parts of Rho that have undergone adaptive evolution. PMID:8051015

  19. DNA Sequence Heterogeneity of Campylobacter jejuni CJIE4 Prophages and Expression of Prophage Genes

    PubMed Central

    Clark, Clifford G.; Chong, Patrick M.; McCorrister, Stuart J.; Mabon, Philip; Walker, Matthew; Westmacott, Garrett R.

    2014-01-01

    Campylobacter jejuni carry temperate bacteriophages that can affect the biology or virulence of the host bacterium. Known effects include genomic rearrangements and resistance to DNA transformation. C. jejuni prophage CJIE1 shows sequence variability and variability in the content of morons. Homologs of the CJIE1 prophage enhance both adherence and invasion to cells in culture and increase the expression of a specific subset of bacterial genes. Other C. jejuni temperate phages have so far not been well characterized. In this study we describe investigations into the DNA sequence variability and protein expression in a second prophage, CJIE4. CJIE4 sequences were obtained de novo from DNA sequencing of five C. jejuni isolates, as well as from whole genome sequences submitted to GenBank by other research groups. These CJIE4 DNA sequences were heterogenous, with several different insertions/deletions (indels) in different parts of the prophage genome. Two variants of a 3–4 kb region inserted within CJIE4 had different gene content that distinguished two major conserved CJIE4 prophage families. Additional indels were detected throughout the prophage. Detection of proteins in the five isolates characterized in our laboratory in isobaric Tags for Relative and Absolute Quantitation (iTRAQ) experiments indicated that prophage proteins within each of the two large indel variants were expressed during growth of the bacteria on Mueller Hinton agar plates. These proteins included the extracellular DNase associated with resistance to DNA transformation and prophage repressor proteins. Other proteins associated with known or suspected roles in prophage biology were also expressed from CJIE4, including capsid protein, the phage integrase, and MazF, a type II toxin-antitoxin system protein. 
Together with the results previously obtained for the CJIE1 prophage these results demonstrate that sequence variability and expression of moron genes are both general properties of temperate

  20. Purification and characterization of Clostridium perfringens 120-kilodalton collagenase and nucleotide sequence of the corresponding gene.

    PubMed Central

    Matsushita, O; Yoshihara, K; Katayama, S; Minami, J; Okabe, A

    1994-01-01

    Clostridium perfringens type C NCIB 10662 produced various gelatinolytic enzymes with molecular masses ranging from approximately 120 to approximately 80 kDa. A 120-kDa gelatinolytic enzyme was present in the largest quantity in the culture supernatant, and this enzyme was purified to homogeneity on the basis of sodium dodecyl sulfate-polyacrylamide gel electrophoresis. The purified enzyme was identified as the major collagenase of the organism, and it cleaved typical collagenase substrates such as azocoll, a synthetic substrate (4-phenylazobenzyloxy-carbonyl-Pro-Leu-Gly-Pro-D-Arg [Pz peptide]), and a type I collagen fibril. In addition, a gene (colA) encoding a 120-kDa collagenase was cloned in Escherichia coli. Nested deletions were used to define the coding region of colA, and this region was sequenced; from the nucleotide sequence, this gene encodes a protein of 1,104 amino acids (M(r), 125,966). Furthermore, from the N-terminal amino acid sequence of the purified enzyme which was found in this reading frame, the molecular mass of the mature enzyme was calculated to be 116,339 Da. Analysis of the primary structure of the gene product showed that the enzyme was produced with a stretch of 86 amino acids containing a putative signal sequence. Within this stretch was found PLGP, the amino acid sequence constituting the Pz peptide. This sequence may be implicated in self-processing of the collagenase. A consensus zinc-binding sequence (HEXXH) suggested for vertebrate Zn collagenases is present in this bacterial collagenase. Vibrio alginolyticus collagenase and Achromobacter lyticus protease I showed significant homology with the 120-kDa collagenase of C. perfringens, suggesting that these three enzymes are evolutionarily related. PMID:8282691

  1. Exome sequencing identifies potential novel candidate genes in patients with unexplained colorectal adenomatous polyposis.

    PubMed

    Spier, Isabel; Kerick, Martin; Drichel, Dmitriy; Horpaopan, Sukanya; Altmüller, Janine; Laner, Andreas; Holzapfel, Stefanie; Peters, Sophia; Adam, Ronja; Zhao, Bixiao; Becker, Tim; Lifton, Richard P; Holinski-Feder, Elke; Perner, Sven; Thiele, Holger; Nöthen, Markus M; Hoffmann, Per; Timmermann, Bernd; Schweiger, Michal R; Aretz, Stefan

    2016-04-01

    In up to 30% of patients with colorectal adenomatous polyposis, no germline mutation in the known genes APC, causing familial adenomatous polyposis, MUTYH, causing MUTYH-associated polyposis, and POLE or POLD1, causing Polymerase-Proofreading-associated polyposis can be identified, although a hereditary etiology is likely. To uncover new causative genes, exome sequencing was performed using DNA from leukocytes and a total of 12 colorectal adenomas from seven unrelated patients with unexplained sporadic adenomatous polyposis. For data analysis and variant filtering, an established bioinformatics pipeline including in-house tools was applied. Variants were filtered for rare truncating point mutations and copy-number variants assuming a dominant, recessive, or tumor suppressor model of inheritance. Subsequently, targeted sequence analysis of the most promising candidate genes was performed in a validation cohort of 191 unrelated patients. All relevant variants were validated by Sanger sequencing. The analysis of exome sequencing data resulted in the identification of rare loss-of-function germline mutations in three promising candidate genes (DSC2, PIEZO1, ZSWIM7). In the validation cohort, further variants predicted to be pathogenic were identified in DSC2 and PIEZO1. According to the somatic mutation spectra, the adenomas in this patient cohort follow the classical pathways of colorectal tumorigenesis. The present study identified three candidate genes which might represent rare causes for a predisposition to colorectal adenoma formation. Especially PIEZO1 (FAM38A) and ZSWIM7 (SWS1) warrant further exploration. To evaluate the clinical relevance of these genes, investigation of larger patient cohorts and functional studies are required. PMID:26780541

  2. Deep sequencing of New World screw-worm transcripts to discover genes involved in insecticide resistance

    PubMed Central

    2010-01-01

    Background The New World screw-worm (NWS), Cochliomyia hominivorax, is one of the most important myiasis-causing flies, causing severe losses to the livestock industry. In its current geographical distribution, this species has been controlled by the application of insecticides, mainly organophosphate (OP) compounds, but a number of lineages have been identified that are resistant to such chemicals. Despite its economic importance, only limited genetic information is available for the NWS. Here, as a part of an effort to characterize the C. hominivorax genome and identify putative genes involved in insecticide resistance, we sampled its transcriptome by deep sequencing of polyadenylated transcripts using the 454 sequencing technology. Results Deep sequencing on the 454 platform of three normalized libraries (larval, adult male and adult female) generated a total of 548,940 reads. Eighteen candidate genes coding for three metabolic detoxification enzyme families, cytochrome P450 monooxygenases, glutathione S-transferases and carboxyl/cholinesterases were selected and gene expression levels were measured using quantitative real-time polymerase chain reaction (qRT-PCR). Of the investigated candidates, only one gene was expressed differently between control and resistant larvae with, at least, a 10-fold down-regulation in the resistant larvae. The presence of mutations in the acetylcholinesterase (target site) and carboxylesterase E3 genes was investigated and all of the resistant flies presented E3 mutations previously associated with insecticide resistance. Conclusions Here, we provided the largest database of NWS expressed sequence tags that is an important resource, not only for further studies on the molecular basis of the OP resistance in NWS fly, but also for functional and comparative studies among Calliphoridae flies. 
Among our candidates, only one gene was found differentially expressed in resistant individuals, and its role on insecticide resistance should

  3. Transduction of the cellular src gene and 3' adjacent sequences in avian sarcoma virus PR2257.

    PubMed Central

    Geryk, J; Dezélée, P; Barnier, J V; Svoboda, J; Nehyba, J; Karakoz, I; Rynditch, A V; Yatsula, B A; Calothy, G

    1989-01-01

    When injected into chickens, a transformation-defective mutant of the Prague C strain of Rous sarcoma virus induced tumors at low incidence and after a long latency. One such tumor released a replication-defective virus designated PR2257. We molecularly cloned and sequenced the proviral DNA from quail fibroblasts transformed by PR2257. Comparison of PR2257 sequence with that of Prague C, cellular src, and 3' adjacent cellular DNA showed that the spliced version of the c-src gene and about 950 base pairs (bp) of 3'-flanking cellular DNA were transduced into PR2257. This transduction eliminated nearly all replicative genes, since the gag gene splice donor site was linked to the splice acceptor site of the src gene and, on the 3' side, recombination occurred in the end of env gene. Insertion of two extra cytosines 23 bp before and 19 bp after the c-src stop codon resulted in an extension of the coding portion up to 587 amino acids, divergence of sequences after Pro-525 and replacement of Tyr-527 by a valine residue. In addition, it appears that the 5' and 3' untranslated regions of PR2257 result from multiple recombinations between exogenous and endogenous virus genomes. Limited digestion of p66src encoded by PR2257 with Staphylococcus aureus V8 protease yielded a V2 peptide (C-terminal moiety) with an apparent molecular mass of 31 kilodaltons, consistent with the 5.7-kilodalton increase expected from the DNA sequence. The structure of PR2257 suggests that the first step in the capture of c-src gene by avian lymphomatosis viruses is the trans splicing of the viral leader mRNA to exon 1 of c-src. PMID:2463376

  4. Species identification using genetic tools: the value of nuclear and mitochondrial gene sequences in whale conservation.

    PubMed

    Palumbi, S R; Cipriano, F

    1998-01-01

    DNA sequence analysis is a powerful tool for identifying the source of samples thought to be derived from threatened or endangered species. Analysis of mitochondrial DNA (mtDNA) from retail whale meat markets has shown consistently that the expected baleen whale in these markets, the minke whale, makes up only about half the products analyzed. The other products are either unregulated small toothed whales like dolphins or are protected baleen whales such as humpback, Bryde's, fin, or blue whales. Independent verification of such mtDNA identifications requires analysis of nuclear genetic loci, but this is technically more difficult than standard mtDNA sequencing. In addition, evolution of species-specific sequences (i.e., fixation of sequence differences to produce reciprocally monophyletic gene trees) is slower in nuclear than in mitochondrial genes primarily because genetic drift is slower at nuclear loci. When will use of nuclear sequences allow forensic DNA identification? Comparison of neutral theories of coalescence of mitochondrial and nuclear loci suggests a simple rule of thumb. The "three-times rule" suggests that phylogenetic sorting at nuclear loci is likely to produce species-specific sequences when mitochondrial alleles are reciprocally monophyletic and the branches leading to the mtDNA sequences of a species are three times longer than the average difference observed within species. A preliminary test of the three-times rule, which depends on many assumptions about the species and genes involved, suggests that blue and fin whales should have species-specific sequences at most neutral nuclear loci, whereas humpback and fin whales should show species-specific sequences at fewer nuclear loci. Partial sequences of actin introns from these species confirm the predictions of the three-times rule and show that blue and fin whales are reciprocally monophyletic at this locus. These intron sequences are thus good tools for the identification of these species
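
The "three-times rule" stated in this abstract can be written as a one-line check. The sketch below is one reading of the rule as worded here, not the authors' implementation: the function name, its parameters, and the numeric values are hypothetical, and both quantities are assumed to be in the same units (for example, substitutions per site).

```python
def passes_three_times_rule(mtdna_branch_length: float,
                            mean_within_species_diff: float,
                            reciprocally_monophyletic: bool) -> bool:
    """Return True when nuclear loci are expected to be species-specific.

    Requires that mtDNA alleles are reciprocally monophyletic and that the
    branch leading to a species' mtDNA sequences is at least three times
    the average difference observed within the species.
    """
    if mtdna_branch_length < 0 or mean_within_species_diff < 0:
        raise ValueError("distances must be non-negative")
    return (reciprocally_monophyletic
            and mtdna_branch_length >= 3 * mean_within_species_diff)


# Hypothetical values, for illustration only.
print(passes_three_times_rule(0.045, 0.010, True))   # long branch relative to diversity
print(passes_three_times_rule(0.020, 0.010, True))   # branch too short
print(passes_three_times_rule(0.045, 0.010, False))  # mtDNA not reciprocally monophyletic
```

As the abstract notes, the rule rests on many assumptions about the species and genes involved, so a check like this is a screening heuristic rather than a test.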

  5. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.

    PubMed

    Besemer, J; Lomsadze, A; Borodovsky, M

    2001-06-15

    Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed. PMID:11410670

  6. Cloning and nucleotide sequence determination of the Clostridium pasteurianum ferredoxin gene.

    PubMed Central

    Graves, M C; Mullenbach, G T; Rabinowitz, J C

    1985-01-01

    We have constructed a library of Clostridium pasteurianum DNA cloned in the plasmid pBR322. Based on the known amino acid sequence for C. pasteurianum ferredoxin, a 64-fold degenerate heptadecanucleotide pool was synthesized. This mixed probe hybridized to two clones which were shown to contain greater than 6 kilobase pairs of the same genomic DNA. Sequence analysis of a common Sau3A1 0.6-kilobase-pair fragment revealed that it contains the information for the apoferredoxin structural gene. According to the DNA sequence, the only post-translational processing of this small apoprotein is the hydrolysis of the initiator methionine. Putative transcription and translation start and stop signals are present within the sequence. PMID:3856844

  7. Challenges in identifying cancer genes by analysis of exome sequencing data.

    PubMed

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F; Bandyopadhyay, Sourav; Mischel, Paul S; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13-60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679

  8. Challenges in identifying cancer genes by analysis of exome sequencing data

    PubMed Central

    Hofree, Matan; Carter, Hannah; Kreisberg, Jason F.; Bandyopadhyay, Sourav; Mischel, Paul S.; Friend, Stephen; Ideker, Trey

    2016-01-01

    Massively parallel sequencing has permitted an unprecedented examination of the cancer exome, leading to predictions that all genes important to cancer will soon be identified by genetic analysis of tumours. To examine this potential, here we evaluate the ability of state-of-the-art sequence analysis methods to specifically recover known cancer genes. While some cancer genes are identified by analysis of recurrence, spatial clustering or predicted impact of somatic mutations, many remain undetected due to lack of power to discriminate driver mutations from the background mutational load (13–60% recall of cancer genes impacted by somatic single-nucleotide variants, depending on the method). Cancer genes not detected by mutation recurrence also tend to be missed by all types of exome analysis. Nonetheless, these genes are implicated by other experiments such as functional genetic screens and expression profiling. These challenges are only partially addressed by increasing sample size and will likely hold even as greater numbers of tumours are analysed. PMID:27417679

  9. Complete mitochondrial genome sequence and gene organization of Chinese indigenous chickens with phylogenetic considerations.

    PubMed

    Zhao, F P; Fan, H Y; Li, G H; Zhang, B K

    2016-01-01

    In this study, we sequenced the complete mitochondrial DNA of Chinese indigenous Jinhu Black-bone and Rugao chickens. The two chicken mitochondrial genomes were deposited in GenBank under accession Nos. KP742951 and KR347464, respectively. The complete mitochondrial genomes of Jinhu Black-bone and Rugao chickens were sequenced and found to span 16,785 and 16,786 bp, respectively, and consisted of 22 tRNA genes, two rRNA genes (12S rRNA and 16S rRNA), 13 protein-coding genes, and one control region (D-loop). The majority of genes were positioned on the H-strand, and the ND6 and eight tRNA genes were found to be encoded on the L-strand. The mitogenomes showed a similar gene order to that of the published Gallus gallus genome, and both included a control region. The overall base composition of the genome of the two chickens was A = 30.22/30.28%, G = 13.57/13.49%, T = 23.74/23.76%, and C = 32.48/32.48%. Nucleotide skewness of the coding strands of the two chicken genomes (AT-skew = 0.12, GC-skew = -0.41) was biased towards A and C. Phylogenetic analysis revealed 29 subspecies, and the molecular genetic relationship among the 29 subspecies was identical to that of traditional taxonomy. PMID:27421002
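
The skew values quoted in this abstract follow from the reported base composition through the usual definitions AT-skew = (A - T) / (A + T) and GC-skew = (G - C) / (G + C). The short sketch below recomputes them from the first value of each reported pair; the helper name `skew` is illustrative only. A positive AT-skew means more A than T, and a negative GC-skew means more C than G.

```python
def skew(x: float, y: float) -> float:
    """Return the skew statistic (x - y) / (x + y) for two base frequencies."""
    return (x - y) / (x + y)


# Base composition (%) taken from the first value of each pair in the abstract.
composition = {"A": 30.22, "G": 13.57, "T": 23.74, "C": 32.48}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"AT-skew = {at_skew:.2f}, GC-skew = {gc_skew:.2f}")
# prints "AT-skew = 0.12, GC-skew = -0.41"
```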

  10. Network‐Informed Gene Ranking Tackles Genetic Heterogeneity in Exome‐Sequencing Studies of Monogenic Disease

    PubMed Central

    Schulz, Reiner; Weale, Michael E.; Southgate, Laura; Oakey, Rebecca J.; Simpson, Michael A.; Schlitt, Thomas

    2015-01-01

    ABSTRACT Genetic heterogeneity presents a significant challenge for the identification of monogenic disease genes. Whole‐exome sequencing generates a large number of candidate disease‐causing variants and typical analyses rely on deleterious variants being observed in the same gene across several unrelated affected individuals. This is less likely to occur for genetically heterogeneous diseases, making more advanced analysis methods necessary. To address this need, we present HetRank, a flexible gene‐ranking method that incorporates interaction network data. We first show that different genes underlying the same monogenic disease are frequently connected in protein interaction networks. This motivates the central premise of HetRank: those genes carrying potentially pathogenic variants and whose network neighbors do so in other affected individuals are strong candidates for follow‐up study. By simulating 1,000 exome sequencing studies (20,000 exomes in total), we model varying degrees of genetic heterogeneity and show that HetRank consistently prioritizes more disease‐causing genes than existing analysis methods. We also demonstrate a proof‐of‐principle application of the method to prioritize genes causing Adams‐Oliver syndrome, a genetically heterogeneous rare disease. An implementation of HetRank in R is available via the Website http://sourceforge.net/p/hetrank/. PMID:26394720
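
    The central premise described above (a gene is a stronger candidate when its network neighbours carry candidate variants in other affected individuals) can be illustrated with a toy score. This is not the HetRank algorithm, whose scoring the abstract does not specify; the function name, the scoring rule, and the example data are all hypothetical.

```python
from collections import defaultdict


def neighbour_support_rank(variants, network):
    """Rank genes by the number of affected individuals who carry a
    candidate variant in the gene itself or in a direct network neighbour.

    variants: dict mapping individual ID -> set of genes with candidate variants
    network:  dict mapping gene -> set of directly interacting genes
    """
    scores = defaultdict(int)
    candidate_genes = set().union(*variants.values())
    for gene in candidate_genes:
        neighbourhood = {gene} | network.get(gene, set())
        for genes_hit in variants.values():
            if neighbourhood & genes_hit:
                scores[gene] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: three patients, each with a different mutated gene
# (G1, G2, G3) from one interaction module, plus unrelated background hits.
variants = {"P1": {"G1", "G9"}, "P2": {"G2", "G8"}, "P3": {"G3", "G7"}}
network = {"G1": {"G2", "G3"}, "G2": {"G1"}, "G3": {"G1"}}

ranking = neighbour_support_rank(variants, network)
print(ranking[0])  # ('G1', 3)
```

    No gene is hit in more than one patient here, so a per-gene recurrence count cannot separate them, whereas the neighbourhood score places the connected genes above the background hits.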

  11. Inactivation of gene expression in plants as a consequence of specific sequence duplication.

    PubMed Central

    Flavell, R B

    1994-01-01

    Numerous examples now exist in plants where the insertion of multiple copies of a transgene leads to loss of expression of some or all copies of the transgene. Where the transgene contains sequences homologous to an endogenous gene, expression of both transgene and endogenous gene is sometimes found to be impaired. Several examples of these phenomena displaying different features are reviewed. Possible explanations for the observed phenomena are outlined, drawing on known cellular processes in Drosophila, fungi, and mammals as well as plants. It is hypothesized that duplicated sequences can, under certain circumstances, become involved in cycles of hybrid chromatin formation or other processes that generate the potential for modification of inherited chromatin structure and cytosine methylation patterns. These epigenetic changes could lead to altered transcription rates or altered efficiencies of mRNA maturation and export from the nucleus. Where the loss of gene expression is posttranscriptional, antisense RNA could be formed on accumulated, inefficiently processed RNAs by an RNA-dependent RNA polymerase or from a chromosomal promoter and cause the observed loss of homologous mRNAs and possibly the modification of homologous genes. It is suggested that the mechanisms evolved to help silence the many copies of transposable elements in plants. Multicopy genes that are part of the normal gene catalog of a plant species must have evolved to avoid these silencing mechanisms or their consequences. PMID:8170935

  12. Nucleotide sequence specifying the glycoprotein gene, gB, of herpes simplex virus type 1.

    PubMed

    Bzik, D J; Fox, B A; DeLuca, N A; Person, S

    1984-03-01

    The nucleotide sequence thought to specify the glycoprotein gene, gB, of the KOS strain of herpes simplex virus type 1 (HSV-1) has been determined. A 3.1-kilobase (kb), viral-specified RNA was mapped to the left half of the BamHI-G fragment (0.345 to 0.399 map units). TATA, CAT-box, and possible mRNA start sequences characteristic of HSV-1 genes are found near 0.368 map units. The first available ATG codon is at 0.366 and the first in-phase chain terminator at 0.348 map units. A polyA-addition signal (AATAAA) occurs 17 nucleotides past the chain terminator. Translation of these sequences would yield a 100.3-kilodalton (kDa) polypeptide characterized by a 5' signal sequence, nine N-linked saccharide addition sites, a strongly hydrophobic membrane-spanning sequence, and a highly charged 3' cytoplasmic anchor sequence. Two mutants of KOS, tsJ12 and tsJ20, that are temperature-sensitive for viral growth and for the production of gB, have been physically mapped to 0.357 to 0.360 and 0.360 to 0.364 map units, respectively (DeLuca et al., in preparation). The nucleotide sequence of the mutants was determined in these regions. In both cases a single amino acid replacement within the 100.3-kDa polypeptide is predicted from the sequence analysis. PMID:6324454

  13. Rhizobium sp. strain BN4 (a selenium oxyanion-reducing bacterium) 16S rRNA gene complete sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    This study used 1482-base-pair 16S rRNA gene sequencing in conjunction with other biochemical and morphological studies to confirm the identification of a bacterium (referred to as the BN4 strain) as a Rhizobium sp. The 16S rRNA gene sequence places it with the Rhizobium clade that includes R. d...

  14. Nucleotide Sequence and Gene Organization of the Starfish Asterina Pectinifera Mitochondrial Genome

    PubMed Central

    Asakawa, S.; Himeno, H.; Miura, K. I.; Watanabe, K.

    1995-01-01

    The 16,260-bp mitochondrial DNA (mtDNA) from the starfish Asterina pectinifera has been sequenced. The genes for 13 proteins, two rRNAs and 22 tRNAs are organized in an extremely economical fashion, similar to those of other animal mtDNAs, with some of the genes overlapping each other. The gene organization is the same as that for another echinoderm, sea urchin, except for the inversion of a 4.6-kb segment that contains genes for two proteins, 13 tRNAs and the 16S rRNA. Judging from the organization of the protein coding genes, mammalian mtDNAs resemble the sea urchin mtDNA more than that of the starfish. The region around the 3' end of the 12S rRNA gene of the starfish shows a high similarity with those for vertebrates. This region encodes a possible stem and loop structure; similar potential structures occur in this region of vertebrate mtDNAs and also in nonmitochondrial small subunit rRNA. A similar stem and loop structure is also found at the 3' end of the 16S rRNA genes in A. pectinifera, in another starfish Pisaster ochraceus, in vertebrates and in Drosophila, but not in sea urchins. The full sequence data confirm the presumption that AGA/AGG, AUA and AAA codons, respectively, code for serine, isoleucine, and asparagine in the starfish mitochondria, and that AGA/AGG codons are read by tRNA(GCU)(Ser), which possesses a truncated dihydrouridine arm, that was previously suggested from a partial mtDNA sequence. The structural characteristics of tRNAs and possible mechanisms for the change in the mitochondrial genetic code are also discussed. PMID:7672576
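
    The codon assignments named in this abstract can be made concrete with a small translation sketch. It builds the standard genetic code, then applies only the assignments the abstract states (AGA/AGG = Ser, AUA = Ile, AAA = Asn, written here as DNA codons). Other mitochondrial reassignments that the abstract does not mention are deliberately left out, so the table is illustrative rather than a complete starfish mitochondrial code. All names and the test sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first base slowest, third fastest).
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Only the assignments stated in the abstract. AUA (ATA) is already Ile in
# the standard code; it is listed to make the stated assignment explicit.
STARFISH_MT = dict(STANDARD)
STARFISH_MT.update({"AGA": "S", "AGG": "S", "ATA": "I", "AAA": "N"})


def translate(seq: str, table: dict) -> str:
    """Translate a DNA or RNA sequence codon by codon with the given table."""
    dna = seq.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(table[dna[i:i + 3]] for i in range(0, usable, 3))


seq = "ATGAGAAGGATAAAA"  # ATG AGA AGG ATA AAA
print(translate(seq, STANDARD))     # MRRIK
print(translate(seq, STARFISH_MT))  # MSSIN
```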

  15. Major Soybean Maturity Gene Haplotypes Revealed by SNPViz Analysis of 72 Sequenced Soybean Genomes

    PubMed Central

    Langewisch, Tiffany; Zhang, Hongxin; Vincent, Ryan; Joshi, Trupti; Xu, Dong; Bilyeu, Kristin

    2014-01-01

    In this Genomics Era, vast amounts of next-generation sequencing data have become publicly available for multiple genomes across hundreds of species. Analyses of these large-scale datasets can become cumbersome, especially when comparing nucleotide polymorphisms across many samples within a dataset and among different datasets or organisms. To facilitate the exploration of allelic variation and diversity, we have developed and deployed in-house software to categorize and visualize these haplotypes. The SNPViz software enables users to analyze region-specific haplotypes from single nucleotide polymorphism (SNP) datasets for different sequenced genomes. The examination of allelic variation and diversity of important soybean [Glycine max (L.) Merr.] flowering time and maturity genes may provide additional insight into flowering time regulation and enhance researchers' ability to target soybean breeding for particular environments. For this study, we utilized two available soybean genomic datasets for a total of 72 soybean genotypes encompassing cultivars, landraces, and the wild species Glycine soja. The major soybean maturity genes E1, E2, E3, and E4 along with the Dt1 gene for plant growth architecture were analyzed in an effort to determine the number of major haplotypes for each gene, to evaluate the consistency of the haplotypes with characterized variant alleles, and to identify evidence of artificial selection. The results indicated classification of a small number of predominant haplogroups for each gene and important insights into possible allelic diversity for each gene within the context of known causative mutations. The software has both a stand-alone and web-based version and can be used to analyze other genes, examine additional soybean datasets, and view similar genome sequence and SNP datasets from other species. PMID:24727730
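
    Counting the major haplotypes of a gene region, as described above, amounts to grouping samples that share the same alleles across the SNP positions in that region. The sketch below shows that grouping step only; it is not SNPViz code, and the sample names, positions, and alleles are invented for illustration.

```python
def haplotype_groups(genotypes, positions):
    """Group samples by their allele string across the chosen SNP positions.

    genotypes: dict mapping sample name -> {position: allele}
    positions: ordered list of SNP positions defining the region
    """
    groups = {}
    for sample, alleles in genotypes.items():
        haplotype = "".join(alleles[p] for p in positions)
        groups.setdefault(haplotype, []).append(sample)
    return groups


# Hypothetical SNP calls for one gene region.
genotypes = {
    "cultivar_A": {101: "A", 250: "G", 377: "T"},
    "cultivar_B": {101: "A", 250: "G", 377: "T"},
    "landrace_C": {101: "A", 250: "C", 377: "T"},
    "wild_D": {101: "G", 250: "C", 377: "C"},
}

groups = haplotype_groups(genotypes, [101, 250, 377])
# Report haplotypes from most to least common.
for haplotype, samples in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(haplotype, len(samples), samples)
```

    With real data the dominant groups would correspond to the predominant haplogroups, and rare groups may reflect sequencing errors or minor alleles.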

  16. Transcriptome Sequencing and Gene Expression Analysis of Trichoderma brevicompactum under Different Culture Conditions

    PubMed Central

    Shentu, Xu-Ping; Liu, Wei-Ping; Zhan, Xiao-Huan; Xu, Yi-Peng; Xu, Jian-Feng; Yu, Xiao-Ping; Zhang, Chuan-Xi

    2014-01-01

    Background Trichoderma brevicompactum is the Trichoderma species that produces the simple trichothecene trichodermin, a potential antifungal antibiotic and a protein synthesis inhibitor. However, the biosynthetic pathway of trichodermin in Trichoderma is not completely clarified. Therefore, transcriptome and gene expression profiling data for this species are needed as an important resource to better understand the mechanism of the trichodermin biosynthesis and provide a blueprint for further study of T. brevicompactum. Results In this study, de novo assembly of the T. brevicompactum transcriptome using the short-read sequencing technology (Illumina) was performed. In addition, two digital gene expression (DGE) libraries of T. brevicompactum under the trichodermin-producing and trichodermin-nonproducing culture conditions, respectively, were constructed to identify the differences in gene expression. A total of 23,351 unique transcripts with a mean length of 856 bp were obtained by a new Trinity de novo assembler. The variations of the gene expression under different culture conditions were also identified. The expression profiling data revealed that 3,282 unique transcripts had a significantly differential expression under the trichodermin-producing condition, as compared to the trichodermin-nonproducing condition. This study provides a large amount of transcript sequence data that will contribute to the study of the trichodermin biosynthesis in T. brevicompactum. Furthermore, quantitative real-time PCR (qRT-PCR) was found to be useful to confirm the differential expression of the unique transcripts. Conclusion Our study provides considerable gene expression information of T. brevicompactum at the transcriptional level, which will help accelerate the research on the trichodermin biosynthesis. Additionally, we have demonstrated the feasibility of using the Illumina sequencing based DGE system for gene expression profiling, and have shed new light on functional studies of

  17. Finding Single Copy Genes Out of Sequenced Genomes for Multilocus Phylogenetics in Non-Model Fungi

    PubMed Central

    Feau, Nicolas; Decourcelle, Thibaut; Husson, Claude; Desprez-Loustau, Marie-Laure; Dutech, Cyril

    2011-01-01

    Historically, fungal multigene phylogenies have been reconstructed based on a small number of commonly used genes. The availability of complete fungal genomes has given rise to a new wave of model organisms that provide a large number of genes potentially useful for building robust gene genealogies. Unfortunately, cross-utilization of these resources to study phylogenetic relationships in the vast majority of non-model fungi (i.e. “orphan” species) remains an unexamined question. To address this problem, we developed a method coupled with a program named “PHYLORPH” (PHYLogenetic markers for ORPHans). The method screens fungal genomic databases (107 fungal genomes fully sequenced) for single copy genes that might be easily transferable and well suited for studies at low taxonomic levels (for example, in species complexes) in non-model fungal species. To maximize the chance to target genes with informative regions, PHYLORPH displays a graphical evaluation system based on the estimation of nucleotide divergence relative to substitution type. The usefulness of this approach was tested by developing markers in four non-model groups of fungal pathogens. For each pathogen considered, 7 to 40% of the 10–15 best candidate genes proposed by PHYLORPH yielded sequencing success. Levels of polymorphism of these genes were compared with those obtained for some genes traditionally used to build fungal phylogenies (e.g. nuclear rDNA, β-tubulin, γ-actin, Elongation factor EF-1α). These genes were ranked among the best-performing ones and resolved accurately taxa relationships in each of the four non-model groups of fungi considered. We envision that PHYLORPH will constitute a useful tool for obtaining new and accurate phylogenetic markers to resolve relationships between closely related non-model fungal species. PMID:21533204

  18. Complete exon sequencing of all known Usher syndrome genes greatly improves molecular diagnosis

    PubMed Central

    2011-01-01

    Background Usher syndrome (USH) combines sensorineural deafness with blindness. It is inherited in an autosomal recessive mode. Early diagnosis is critical for adapted educational and patient management choices, and for genetic counseling. To date, nine causative genes have been identified for the three clinical subtypes (USH1, USH2 and USH3). Current diagnostic strategies make use of a genotyping microarray that is based on the previously reported mutations. The purpose of this study was to design a more accurate molecular diagnosis tool. Methods We sequenced the 366 coding exons and flanking regions of the nine known USH genes, in 54 USH patients (27 USH1, 21 USH2 and 6 USH3). Results Biallelic mutations were detected in 39 patients (72%) and monoallelic mutations in an additional 10 patients (18.5%). In addition to biallelic mutations in one of the USH genes, presumably pathogenic mutations in another USH gene were detected in seven patients (13%), and another patient carried monoallelic mutations in three different USH genes. Notably, none of the USH3 patients carried detectable mutations in the only known USH3 gene, whereas they all carried mutations in USH2 genes. Most importantly, the currently used microarray would have detected only 30 of the 81 different mutations that we found, of which 39 (48%) were novel. Conclusions Based on these results, complete exon sequencing of the currently known USH genes stands as a definite improvement for molecular diagnosis of this disease, which is of utmost importance in the perspective of gene therapy. PMID:21569298
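
    The percentages in this abstract follow directly from the reported counts. The short check below recomputes them. The microarray detection rate (30 of 81 mutations) is given in the abstract only as counts, so the percentage printed for it is derived here, not quoted. Variable names are illustrative.

```python
patients = 54  # 27 USH1 + 21 USH2 + 6 USH3

patient_counts = {
    "biallelic mutations": 39,
    "monoallelic mutations": 10,
    "mutations in an additional USH gene": 7,
}
patient_rates = {k: 100 * n / patients for k, n in patient_counts.items()}
for label, rate in patient_rates.items():
    print(f"{label}: {patient_counts[label]}/{patients} = {rate:.1f}%")

mutations_found = 81
novel_rate = 100 * 39 / mutations_found        # novel mutations
microarray_rate = 100 * 30 / mutations_found   # detectable by the microarray
print(f"novel: {novel_rate:.1f}%")
print(f"detectable by microarray: {microarray_rate:.1f}%")
```

    On these counts the genotyping microarray would have detected roughly 37% of the mutations found by complete exon sequencing.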

  19. Identification of planarian homeobox sequences indicates the antiquity of most Hox/homeotic gene subclasses.

    PubMed Central

    Balavoine, G; Telford, M J

    1995-01-01

    The homeotic gene complex (HOM-C) is a cluster of genes involved in the anteroposterior axial patterning of animal embryos. It is composed of homeobox genes belonging to the Hox/HOM superclass. Originally discovered in Drosophila, Hox/HOM genes have been identified in organisms as distantly related as arthropods, vertebrates, nematodes, and cnidarians. Data obtained in parallel from the organization of the complex, the domains of gene expression during embryogenesis, and phylogenetic relationships allow the subdivision of the Hox/HOM superclass into five classes (lab, pb/Hox3, Dfd, Antp, and Abd-B) that appeared early during metazoan evolution. We describe a search for homologues of these genes in platyhelminths, triploblast metazoans emerging as an outgroup to the great coelomate ensemble. A degenerate PCR screening for Hox/HOM homeoboxes in three species of triclad planarians has revealed 10 types of Antennapedia-like genes. The homeobox-containing sequences of these PCR fragments allowed the amplification of the homeobox-coding exons for seven of these genes in the species Polycelis nigra. A phylogenetic analysis shows that two genes are clear orthologues of Drosophila labial, four others are members of a Dfd/Antp superclass, and a seventh gene, although more difficult to classify with certainty, may be related to the pb/Hox3 class. Together with previously identified Hox/HOM genes in other flatworms, our analyses demonstrate the existence of an elaborate family of Hox/HOM genes in the ancestor of all triploblast animals. PMID:7638172

  20. Analysis of mutations in the entire coding sequence of the factor VIII gene

    SciTech Connect

    Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M.

    1994-09-01

    Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.

  1. Cloning and DNA sequence of the gene coding for Bacillus stearothermophilus T-6 xylanase.

    PubMed Central

    Gat, O; Lapidot, A; Alchanati, I; Regueros, C; Shoham, Y

    1994-01-01

    Bacillus stearothermophilus T-6 produces an extracellular thermostable xylanase. Affinity-purified polyclonal serum raised against the enzyme was used to screen a genomic library of B. stearothermophilus T-6 constructed in lambda-EMBL3. Two positive phages were isolated, both containing similar 13-kb inserts, and their lysates exhibited xylanase activity. A 3,696-bp SalI-BamHI fragment containing the xylanase gene was subcloned in Escherichia coli and subsequently sequenced. The open reading frame of xylanase T-6 consists of 1,236 bp. On the basis of sequence similarity, two possible -10 and -35 regions, a ribosome-binding site at the 5' end of the gene and a potential transcriptional termination motif at the 3' end of the gene, were identified. From the previously known N-terminal amino acid sequence of xylanase T-6 and the possible ribosome-binding site, a putative 28-amino-acid signal peptide was deduced. The mature xylanase T-6 consists of 379 amino acids with a calculated molecular weight and pI of 43,808 and 6.88, respectively. Multiple alignment of beta-glycanase amino acid sequences revealed highly conserved regions. Northern (RNA) blot analysis indicated that the xylanase T-6 transcript is about 1.4 kb and that the induction of this enzyme synthesis by xylose is on the transcriptional level. PMID:8031084

  2. New Hosts of Simplicimonas similis and Trichomitus batrachorum Identified by 18S Ribosomal RNA Gene Sequences

    PubMed Central

    Dimasuay, Kris Genelyn B.; Lavilla, Orlie John Y.; Rivera, Windell L.

    2013-01-01

    Trichomonads are obligate anaerobes generally found in the digestive and genitourinary tract of domestic animals. In this study, four trichomonad isolates were obtained from carabao, dog, and pig hosts using rectal swabs. Genomic DNA was extracted using the Chelex method, and the 18S rRNA gene was successfully amplified with novel sets of primers and then sequenced. Aligned isolate sequences together with retrieved 18S rRNA gene sequences of known trichomonads were utilized to generate phylogenetic trees using maximum likelihood and neighbor-joining analyses. Two isolates from carabao were identified as Simplicimonas similis, while the isolates from dog and pig were identified as Pentatrichomonas hominis and Trichomitus batrachorum, respectively. This is the first report of S. similis in carabao and the identification of T. batrachorum in pig using 18S rRNA gene sequence analysis. The generated phylogenetic tree yielded three distinct groups mostly with relatively moderate to high bootstrap support and in agreement with the most recent classification. Pathogenic potential of the trichomonads in these hosts still needs further investigation. PMID:23936631

  3. Nucleotide sequence of ompV, the gene for a major Vibrio cholerae outer membrane protein.

    PubMed

    Pohlner, J; Meyer, T F; Jalajakumari, M B; Manning, P A

    1986-12-01

    The nucleotide sequence of the ompV gene of Vibrio cholerae was determined. The product of the gene is a 28,000 dalton protein which, after the removal of a 19 amino acid signal sequence, produces a mature outer membrane protein of 26,000 daltons. The cleavage site was determined by amino-terminal amino acid sequencing of the purified mature protein. The DNA upstream of the gene shows the presence of a typical promoter region as judged from the Escherichia coli consensus information; however, the Shine-Dalgarno sequence is associated with a region capable of forming a secondary structure in the mRNA. The formation of this structure would inhibit binding of the mRNA to the ribosome and reduce translation. It is proposed that this structure is recognized by a positive activator in V. cholerae and because of its absence in E. coli ompV is poorly expressed. The distribution of rare codons within ompV suggests that they may serve to slow down the translation of particular domains such that the nascent polypeptide has an opportunity to take up its conformation without interference from the later formed regions. Such a mechanism could aid localization of the protein if export were by a cotranslational secretion system. PMID:3031428

  4. Massive parallel IGHV gene sequencing reveals a germinal center pathway in origins of human multiple myeloma

    PubMed Central

    Bryant, Dean; Seckinger, Anja; Hose, Dirk; Zojer, Niklas; Sahota, Surinder S.

    2015-01-01

    Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question when during maturation neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by on-going SHM. Strikingly, the data parallel IGHV clonal sequences in some monoclonal gammopathy of undetermined significance (MGUS) known to display on-going SHM imprints. Since MGUS generally precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational ICV from the same GC B-cell, arising via a distinctive pathway. PMID:25929340

  5. Proteus mirabilis ambient-temperature fimbriae: cloning and nucleotide sequence of the atf gene cluster.

    PubMed Central

    Massad, G; Fulkerson, J F; Watson, D C; Mobley, H L

    1996-01-01

    Uropathogenic Proteus mirabilis produces at least four types of fimbriae. Amino acid sequences from two peptides, derived by tryptic digestion of the structural subunit of one type of these fimbriae, the ambient-temperature fimbriae, were determined: NVVPGQPSSTQ and LIEGENQLNYNA. PCR primers, based on these sequences and that of the N terminus, were used to amplify a 359-bp fragment. A cosmid clone, isolated from a P. mirabilis genomic library by hybridization with the 359-bp PCR product, was used to determine the nucleotide sequence of the atf gene cluster. A 3,903-bp region encodes three polypeptides: AtfA, the structural subunit; AtfB, the chaperone; and AtfC, the outer membrane molecular usher. No fimbria-related genes are evident either 5' or 3' to the three contiguous genes. AtfA demonstrates significant amino acid sequence identity with type 1 major fimbrial subunits of several enteric species. The 359-bp PCR product hybridized strongly with all Proteus isolates (n = 9) and 25% of 355 Escherichia coli isolates but failed to hybridize with any of 26 isolates among nine other uropathogenic species. Ambient-temperature fimbriae of P. mirabilis may represent a novel type of fimbriae of enteric species. PMID:8926119

  6. Increased functional protein expression using nucleotide sequence features enriched in highly expressed genes in zebrafish

    PubMed Central

    Horstick, Eric J.; Jordan, Diana C.; Bergeron, Sadie A.; Tabor, Kathryn M.; Serpe, Mihaela; Feldman, Benjamin; Burgess, Harold A.

    2015-01-01

    Many genetic manipulations are limited by difficulty in obtaining adequate levels of protein expression. Bioinformatic and experimental studies have identified nucleotide sequence features that may increase expression; however, it is difficult to assess the relative influence of these features. Zebrafish embryos are rapidly injected with calibrated doses of mRNA, enabling the effects of multiple sequence changes to be compared in vivo. Using RNAseq and microarray data, we identified a set of genes that are highly expressed in zebrafish embryos and systematically analyzed them for enrichment of sequence features correlated with levels of protein expression. We then tested enriched features by embryo microinjection and functional tests of multiple protein reporters. Codon selection, releasing factor recognition sequence and specific introns and 3′ untranslated regions each increased protein expression between 1.5- and 3-fold. These results suggested principles for increasing protein yield in zebrafish through biomolecular engineering. We implemented these principles for rational gene design in software for codon selection (CodonZ) and plasmid vectors incorporating the most active non-coding elements. Rational gene design thus significantly boosts expression in zebrafish, and a similar approach will likely elevate expression in other animal models. PMID:25628360

  7. Cloning and nucleotide sequence of the leucyl-tRNA synthetase gene of Bacillus subtilis.

    PubMed Central

    Vander Horn, P B; Zahler, S A

    1992-01-01

    The leucyl-tRNA synthetase gene (leuS) of Bacillus subtilis was cloned and sequenced. A mutation in the gene, leuS1, increases the transcription and expression of the ilv-leu operon, permitting monitoring of leuS alleles. The leuS1 mutation was mapped to 270 degrees on the chromosome. Sequence analysis showed that the mutation is a single-base substitution, possibly in a monocistronic operon. The leader mRNA predicted by the sequence would contain a number of possible secondary structures and a T box, a sequence observed upstream of leader mRNA terminators of Bacillus tRNA synthetases and the B. subtilis ilv-leu operon. The DNA of the B. subtilis leuS open reading frame is 48% identical to the leuS gene of Escherichia coli and is predicted to encode a polypeptide with 46% identity to the leucyl-tRNA synthetase of E. coli. PMID:1317842

  8. Distribution and sequence variations of selected virulence genes among group A streptococcal isolates from western Norway.

    PubMed

    Mylvaganam, H; Bjorvatn, B; Osland, A

    2000-11-01

    In order to compare the distribution of selected virulence genes among group A streptococci recovered from invasive disease and superficial infections, 42 isolates were screened for mga, speB, speA, ssa and ska, by PCR. The isolates were predominantly of the sequence types emm1, emm3 and emm6, but also included a few of the types emm22, emm28, emm75 and emm78. The phage-mediated speA seemed to be prevalent in emm types 1 and 3, and its distribution was not related to disease severity. The other genes were present in all isolates. The mga, speB and speA were further studied by sequence analysis. Although allotypic associations with invasiveness were not found, allelic specificity to the emm sequence type was observed. In addition, the mga sequences indicated two lineages, related to opacity factor production. A possible recombination between these two main divergent mga genes was observed in isolates of the types emm22 and emm75. A logical nomenclature of the alleles of mga and speB is suggested. PMID:11211972

  9. RAF gene fusion breakpoints in pediatric brain tumors are characterized by significant enrichment of sequence microhomology

    PubMed Central

    Lawson, Andrew R.J.; Hindley, Guy F.L.; Forshew, Tim; Tatevossian, Ruth G.; Jamie, Gabriel A.; Kelly, Gavin P.; Neale, Geoffrey A.; Ma, Jing; Jones, Tania A.; Ellison, David W.; Sheer, Denise

    2011-01-01

    Gene fusions involving members of the RAF family of protein kinases have recently been identified as characteristic aberrations of low-grade astrocytomas, the most common tumors of the central nervous system in children. While it has been shown that these fusions cause constitutive activation of the ERK/MAPK pathway, very little is known about their formation. Here, we present a detailed analysis of RAF gene fusion breakpoints from a well-characterized cohort of 43 low-grade astrocytomas. Our findings show that the rearrangements that generate these RAF gene fusions may be simple or complex and that both inserted nucleotides and microhomology are common at the DNA breakpoints. Furthermore, we identify novel enrichment of microhomologous sequences in the regions immediately flanking the breakpoints. We thus provide evidence that the tandem duplications responsible for these fusions are generated by microhomology-mediated break-induced replication (MMBIR). Although MMBIR has previously been implicated in the pathogenesis of other diseases and the evolution of eukaryotic genomes, we demonstrate here that the proposed details of MMBIR are consistent with a recurrent rearrangement in cancer. Our analysis of repetitive elements, Z-DNA and sequence motifs in the fusion partners identified significant enrichment of the human minisatellite conserved sequence/χ-like element at one side of the breakpoint. Therefore, in addition to furthering our understanding of low-grade astrocytomas, this study provides insights into the molecular mechanistic details of MMBIR and the sequence of events that occur in the formation of genomic rearrangements. PMID:21393386

  10. SERPINA1 Full-Gene Sequencing Identifies Rare Mutations Not Detected in Targeted Mutation Analysis.

    PubMed

    Graham, Rondell P; Dina, Michelle A; Howe, Sarah C; Butz, Malinda L; Willkomm, Kurt S; Murray, David L; Snyder, Melissa R; Rumilla, Kandelaria M; Halling, Kevin C; Highsmith, W Edward

    2015-11-01

    Genetic α-1 antitrypsin (AAT) deficiency is characterized by low serum AAT levels and the identification of causal mutations or an abnormal protein. It needs to be distinguished from deficiency because of nongenetic causes, and diagnostic delay may contribute to worse patient outcome. Current routine clinical testing assesses for only the most common mutations. We wanted to determine the proportion of unexplained cases of AAT deficiency that harbor causal mutations not identified through current standard allele-specific genotyping and isoelectric focusing (IEF). All prospective cases from December 1, 2013, to October 1, 2014, with a low serum AAT level not explained by allele-specific genotyping and IEF were assessed through full-gene sequencing with a direct sequencing method for pathogenic mutations. We reviewed the results using American Council of Medical Genetics criteria. Of 3523 cases, 42 (1.2%) met study inclusion criteria. Pathogenic or likely pathogenic mutations not identified through clinical testing were detected through full-gene sequencing in 16 (38%) of the 42 cases. Rare mutations not detected with current allele-specific testing and IEF underlie a substantial proportion of genetic AAT deficiency. Full-gene sequencing, therefore, has the ability to improve accuracy in the diagnosis of AAT deficiency. PMID:26321041

  11. Identification of expressed resistance gene-like sequences by data mining in 454-derived transcriptomic sequences of common bean (Phaseolus vulgaris L.)

    PubMed Central

    2012-01-01

    Background Common bean (Phaseolus vulgaris L.) is one of the most important legumes in the world. Several diseases severely reduce bean production and quality; therefore, it is very important to better understand disease resistance in common bean in order to prevent these losses. More than 70 resistance (R) genes which confer resistance against various pathogens have been cloned from diverse plant species. Most R genes share highly conserved domains which facilitates the identification of new candidate R genes from the same species or other species. The goals of this study were to isolate expressed R gene-like sequences (RGLs) from 454-derived transcriptomic sequences and expressed sequence tags (ESTs) of common bean, and to develop RGL-tagged molecular markers. Results A data-mining approach was used to identify tentative P. vulgaris R gene-like sequences from approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non-redundant sequences were identified and named as common bean (P. vulgaris = Pv) resistance gene-like sequences (PvRGLs). Among the identified PvRGLs, about 60% (218 PvRGLs) were from 454-derived sequences. Reverse transcriptase-polymerase chain reaction (RT-PCR) analysis confirmed that PvRGLs were actually expressed in the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77%) of the 365 tentative PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between common bean and soybean, 237 (64.93%) PvRGLs were anchored on the P. vulgaris genetic map and will need to be mapped to determine order. In addition, 11 sequence-tagged-site (STS) and 19 cleaved amplified polymorphic sequence (CAPS) molecular markers were developed for 25 unique PvRGLs. 
Conclusions In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs available in GenBank and about 65% of PvRGLs were integrated into the common

  12. In silico comparative analysis of DNA and amino acid sequences for prion protein gene.

    PubMed

    Kim, Y; Lee, J; Lee, C

    2008-01-01

    Genetic variability might contribute to species specificity of prion diseases in various organisms. In this study, structures of the prion protein gene (PRNP) and its amino acids were compared among species for which sequence data were available. Comparisons of PRNP DNA sequences among 12 species including human, chimpanzee, monkey, bovine, ovine, dog, mouse, rat, wallaby, opossum, chicken and zebrafish allowed us to identify candidate regulatory regions in intron 1 and 3'-untranslated region (UTR) in addition to the coding region. Highly conserved putative binding sites for transcription factors, such as heat shock factor 2 (HSF2) and myocyte enhancer factor 2 (MEF2), were discovered in intron 1. In the 3'-UTR, the functional sequence (ATTAAA) for nucleus-specific polyadenylation was found in all the analysed species. The functional sequence (TTTTTAT) for maturation-specific polyadenylation was observed without mismatch only in ovine, with one or two nucleotide mismatches in the other species. A comparison of the amino acid sequences in 53 species revealed a large sequence identity. In particular, the octapeptide repeat region was observed in all species except frog and zebrafish. Functional changes and susceptibility to prion diseases with various isoforms of prion protein could be caused by numeric variability and conformational changes discovered in the repeat sequences. PMID:18397498
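
    The kind of motif scan described in this abstract (locating ATTAAA or TTTTTAT while tolerating one or two mismatches) can be sketched as a sliding-window comparison. This is a minimal illustration only, not the authors' method: the function name `find_motif` and the example UTR fragment are hypothetical and do not come from the study.

```python
def find_motif(sequence, motif, max_mismatches=0):
    """Return (position, mismatches) for every window of `sequence`
    that differs from `motif` in at most `max_mismatches` bases."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        # Hamming distance between the window and the motif
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical 3'-UTR fragment, invented purely for illustration.
utr = "GCCATTAAAGTCTTTTAATCC"

exact_hits = find_motif(utr, "ATTAAA")                      # exact matches only
fuzzy_hits = find_motif(utr, "TTTTTAT", max_mismatches=2)   # up to two mismatches
```

    Positions are 0-based offsets into the scanned sequence; a real analysis would also need to handle ambiguity codes and both strands, which this sketch ignores.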

  13. Gene Profiling of Bone around Orthodontic Mini-Implants by RNA-Sequencing Analysis

    PubMed Central

    Nahm, Kyung-Yen; Heo, Jung Sun; Lee, Jae-Hyung; Lee, Dong-Yeol; Chung, Kyu-Rhim; Ahn, Hyo-Won; Kim, Seong-Hun

    2015-01-01

    This study aimed to evaluate the genes that were expressed in the healing bones around SLA-treated titanium orthodontic mini-implants in a beagle at early (1-week) and late (4-week) stages with RNA-sequencing (RNA-Seq). Samples from sites of surgical defects were used as controls. Total RNA was extracted from the tissue around the implants, and an RNA-Seq analysis was performed with Illumina TruSeq. In the 1-week group, genes in the gene ontology (GO) categories of cell growth and the extracellular matrix (ECM) were upregulated, while genes in the categories of the oxidation-reduction process, intermediate filaments, and structural molecule activity were downregulated. In the 4-week group, the genes upregulated included ECM binding, stem cell fate specification, and intramembranous ossification, while genes in the oxidation-reduction process category were downregulated. GO analysis revealed an upregulation of genes that were related to significant mechanisms, including those with roles in cell proliferation, the ECM, growth factors, and osteogenic-related pathways, which are associated with bone formation. From these results, implant-induced bone formation progressed considerably during the times examined in this study. The upregulation or downregulation of selected genes was confirmed with real-time reverse transcription polymerase chain reaction. The RNA-Seq strategy was useful for defining the biological responses to orthodontic mini-implants and identifying the specific genetic networks for targeted evaluations of successful peri-implant bone remodeling. PMID:25759820

  14. Rapid cloning of disease-resistance genes in plants using mutagenesis and sequence capture.

    PubMed

    Steuernagel, Burkhard; Periyannan, Sambasivam K; Hernández-Pinzón, Inmaculada; Witek, Kamil; Rouse, Matthew N; Yu, Guotai; Hatta, Asyraf; Ayliffe, Mick; Bariana, Harbans; Jones, Jonathan D G; Lagudah, Evans S; Wulff, Brande B H

    2016-06-01

    Wild relatives of domesticated crop species harbor multiple, diverse, disease resistance (R) genes that could be used to engineer sustainable disease control. However, breeding R genes into crop lines often requires long breeding timelines of 5-15 years to break linkage between R genes and deleterious alleles (linkage drag). Further, when R genes are bred one at a time into crop lines, the protection that they confer is often overcome within a few seasons by pathogen evolution. If several cloned R genes were available, it would be possible to pyramid R genes in a crop, which might provide more durable resistance. We describe a three-step method (MutRenSeq) that combines chemical mutagenesis with exome capture and sequencing for rapid R gene cloning. We applied MutRenSeq to clone stem rust resistance genes Sr22 and Sr45 from hexaploid bread wheat. MutRenSeq can be applied to other commercially relevant crops and their relatives, including, for example, pea, bean, barley, oat, rye, rice and maize. PMID:27111722

  15. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing.

    PubMed

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-05-31

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causatives and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies. PMID:27025386

  16. Identification of Genetic Causes of Inherited Peripheral Neuropathies by Targeted Gene Panel Sequencing

    PubMed Central

    Nam, Soo Hyun; Hong, Young Bin; Hyun, Young Se; Nam, Da Eun; Kwak, Geon; Hwang, Sun Hee; Choi, Byung-Ok; Chung, Ki Wha

    2016-01-01

    Inherited peripheral neuropathies (IPN), which are a group of clinically and genetically heterogeneous peripheral nerve disorders including Charcot-Marie-Tooth disease (CMT), exhibit progressive degeneration of muscles in the extremities and loss of sensory function. Over 70 genes have been reported as genetic causatives and the number is still growing. We prepared a targeted gene panel for IPN diagnosis based on next generation sequencing (NGS). The gene panel was designed to detect mutations in 73 genes reported to be genetic causes of IPN or related peripheral neuropathies, and to detect duplication of the chromosome 17p12 region, the major genetic cause of CMT1A. We applied the gene panel to 115 samples from 63 non-CMT1A families, and isolated 15 pathogenic or likely-pathogenic mutations in eight genes from 25 patients (17 families). Of them, eight mutations were unreported variants. Of particular interest, this study revealed several very rare mutations in the SPTLC2, DCTN1, and MARS genes. In addition, the effectiveness of the detection of CMT1A was confirmed by comparing five 17p12-nonduplicated controls and 15 CMT1A cases. In conclusion, we developed a gene panel for one step genetic diagnosis of IPN. It seems that its time- and cost-effectiveness are superior to previous tiered-genetic diagnosis algorithms, and it could be applied as a genetic diagnostic system for inherited peripheral neuropathies. PMID:27025386

  17. Patterns of homoeologous gene expression shown by RNA sequencing in hexaploid bread wheat

    PubMed Central

    2014-01-01

    Background Bread wheat (Triticum aestivum) has a large, complex and hexaploid genome consisting of A, B and D homoeologous chromosome sets. Therefore each wheat gene potentially exists as a trio of A, B and D homoeoloci, each of which may contribute differentially to wheat phenotypes. We describe a novel approach combining wheat cytogenetic resources (chromosome substitution ‘nullisomic-tetrasomic’ lines) with next generation deep sequencing of gene transcripts (RNA-Seq), to directly and accurately identify homoeologue-specific single nucleotide variants and quantify the relative contribution of individual homoeoloci to gene expression. Results We discover, based on a sample comprising ~5-10% of the total wheat gene content, that at least 45% of wheat genes are expressed from all three distinct homoeoloci. Most of these genes show strikingly biased expression patterns in which expression is dominated by a single homoeolocus. The remaining ~55% of wheat genes are expressed from either one or two homoeoloci only, through a combination of extensive transcriptional silencing and homoeolocus loss. Conclusions We conclude that wheat is tending towards functional diploidy, through a variety of mechanisms causing single homoeoloci to become the predominant source of gene transcripts. This discovery has profound consequences for wheat breeding and our understanding of wheat evolution. PMID:24726045

  18. FrameD: a flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences

    PubMed Central

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-01-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation and the ability to learn new models for new organisms. PMID:12824407

  19. FrameD: A flexible program for quality check and gene prediction in prokaryotic genomes and noisy matured eukaryotic sequences.

    PubMed

    Schiex, Thomas; Gouzy, Jérôme; Moisan, Annick; de Oliveira, Yannick

    2003-07-01

    We describe FrameD, a program that predicts coding regions in prokaryotic and matured eukaryotic sequences. Initially targeted at gene prediction in bacterial GC rich genomes, the gene model used in FrameD also allows genes to be predicted in the presence of frameshifts and partially undetermined sequences, which makes it very suitable for gene prediction and frameshift correction in unfinished sequences such as EST and EST cluster sequences. Like recent eukaryotic gene prediction programs, FrameD also includes the ability to take into account protein similarity information both in its prediction and its graphical output. Its performance is evaluated on different bacterial genomes. The web site (http://genopole.toulouse.inra.fr/bioinfo/FrameD/FD) allows direct prediction, sequence correction and translation and the ability to learn new models for new organisms. PMID:12824407

  20. Zooplankton diversity analysis through single-gene sequencing of a community sample

    PubMed Central

    Machida, Ryuji J; Hashiguchi, Yasuyuki; Nishida, Mutsumi; Nishida, Shuhei

    2009-01-01

    Background Oceans cover more than 70% of the earth's surface and are critical for the homeostasis of the environment. Among the components of the ocean ecosystem, zooplankton play vital roles in energy and matter transfer through the system. Despite their importance, understanding of zooplankton biodiversity is limited because of their fragile nature, small body size, and the large number of species from various taxonomic phyla. Here we present the results of single-gene zooplankton community analysis using a method that determines a large number of mitochondrial COI gene sequences from a bulk zooplankton sample. This approach will enable us to estimate the species richness of almost the entire zooplankton community. Results A sample was collected from a depth of 721 m to the surface in the western equatorial Pacific off Pohnpei Island, Micronesia, with a plankton net equipped with a 2-m² mouth opening. A total of 1,336 mitochondrial COI gene sequences were determined from the cDNA library made from the sample. From the determined sequences, the occurrence of 189 species of zooplankton was estimated. BLASTN search results showed high degrees of similarity (>98%) between the query and database for 10 species, including holozooplankton and merozooplankton. Conclusion In conjunction with the Census of Marine Zooplankton and Barcode of Life projects, single-gene zooplankton community analysis will be a powerful tool for estimating the species richness of zooplankton communities. PMID:19758460

  1. The novel organization and complete sequence of the ribosomal RNA gene of Nosema bombycis.

    PubMed

    Huang, Wei-Fone; Tsai, Shu-Jen; Lo, Chu-Fang; Soichi, Yamane; Wang, Chung-Hsiung

    2004-05-01

    We present here for the first time the complete DNA sequence data (4301bp) of the ribosomal RNA (rRNA) gene of the microsporidian type species, Nosema bombycis. Sequences for the large subunit gene (LSUrRNA: 2497bp, GenBank Accession No. ), the internal transcribed spacer (ITS: 179bp, GenBank Accession No. ), the small subunit gene (SSUrRNA: 1232bp), intergenic spacer (IGS: 279bp), and 5S region (114bp) are also given, and the secondary structure of the large subunit is discussed. The organization of the N. bombycis rRNA gene is LSUrRNA-ITS-SSUrRNA-IGS-5S. This novel arrangement, in which the LSU is 5' of the SSU, is the reverse of the organizational sequence (i.e., SSU-ITS-LSU) found in all previously reported microsporidian rRNAs, including Nosema apis. This unique character in the type species may have taxonomic implications for the members of the genus Nosema. PMID:15050536

  2. Phosphoglycerate kinase gene from Zymomonas mobilis: cloning, sequencing, and localization within the gap operon

    SciTech Connect

    Conway, T.; Ingram, L.O.

    1988-04-01

    The Zymomonas mobilis gene encoding phosphoglycerate kinase (EC 2.7.2.3), pgk, has been cloned into Escherichia coli and sequenced. It consists of 336 amino acids, including the N-terminal methionine, with a molecular weight of 47,384. This promoterless gene is located 225 base pairs downstream from the gap gene and is part of the gap operon. Previous studies have shown that the specific activities of glyceraldehyde-3-phosphate dehydrogenase and phosphoglycerate kinase do not change coordinately in Z. mobilis, although the two enzymes appear to be under the control of a common promoter. The translated amino acid sequence for the Z. mobilis phosphoglycerate kinase is less conserved than those of eucaryotic genes. A comparison of known sequences for phosphoglycerate kinase revealed a high degree of conservation of structure with 102 amino acid positions being retained by all. In general, the amino acid positions at the boundaries of β-sheet and α-helical regions and those connecting these regions were more highly conserved than the amino acid positions within regions of secondary structure.

  3. Next generation sequencing approach for detecting 491 fusion genes from human cancer.

    PubMed

    Urakami, Kenichi; Shimoda, Yuji; Ohshima, Keiichi; Nagashima, Takeshi; Serizawa, Masakuni; Tanabe, Tomoe; Saito, Junko; Usui, Tamiko; Watanabe, Yuko; Naruoka, Akane; Ohnami, Sumiko; Ohnami, Shumpei; Mochizuki, Tohru; Kusuhara, Masatoshi; Yamaguchi, Ken

    2016-01-01

    Next-generation DNA sequencing (NGS) of the genomes of cancer cells is contributing to new discoveries that illuminate the mechanisms of tumorigenesis. To this end, the International Cancer Genome Consortium and The Cancer Genome Atlas are investigating novel alterations of genes that will define the pathways and mechanisms of the development and growth of cancers. These efforts contribute to the development of innovative pharmaceuticals as well as to the introduction of genome sequencing as a component of personalized medicine. In particular, chromosomal translocations that fuse coding sequences serve as important pharmaceutical targets and diagnostic markers given their association with tumorigenesis. Although increasing numbers of fusion genes are being discovered using NGS, the methodology used to identify such fusion genes is complicated, expensive, and requires relatively large samples. Here, to address these problems, we describe the design and development of a panel of 491 fusion genes that performed well in the analysis of cultured human cancer cell lines and 600 clinical tumor specimens. PMID:26912140

  4. Molecular cloning, sequence analysis and tissue-specific expression of Akirin2 gene in Tianfu goat.

    PubMed

    Ma, Jisi; Xu, Gangyi; Wan, Lu; Wang, Nianlu

    2015-01-01

    The Akirin2 gene is a nuclear factor and is considered as a potential functional candidate gene for meat quality. To better understand the structures and functions of Akirin2 gene, the cDNA of the Tianfu goat Akirin2 gene was cloned. Sequence analysis showed that the Tianfu goat Akirin2 cDNA full coding sequence (CDS) contains 579bp nucleotides that encode 192 amino acids. A phylogenetic tree of the Akirin2 protein sequence from the Tianfu goat and other species revealed that the Tianfu goat Akirin2 was closely related to cattle and sheep Akirin2. RT-qPCR analysis showed that Akirin2 was expressed in the myocardium, liver, spleen, lung, kidney, leg muscle, abdominal muscle and the longissimus dorsi muscle. In particular, high expression levels of Akirin2 were detected in the spleen, lung, and kidney whereas lower expression levels were seen in the liver, myocardium, leg muscle, abdominal muscle and longissimus dorsi muscle. Temporal mRNA expression showed that Akirin2 expression levels in the longissimus dorsi muscle first increased and then decreased from day 1 to month 12. Western blotting results showed that the Akirin2 protein was only detected in the lung and three skeletal muscle tissues. PMID:25239665

  5. Identification and characterization of rhizospheric microbial diversity by 16S ribosomal RNA gene sequencing.

    PubMed

    Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad

    2014-01-01

    In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Identification of bacterial isolates was made by 16S rRNA gene sequence analysis and taxonomical confirmation on EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of bacterial strains with the respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological and biochemical tests and screened for the glucose dehydrogenase (gdh) gene, which is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root-nodulating strains were positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts like field grown crops, leguminous and non-leguminous plants. It was concluded that a diverse group of bacterial populations exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant microbial interactions, and that strains QAU-63 and QAU-68 have sequence similarity of 97 and 95% and might be declared as novel after further taxonomic characterization. PMID:25477935

  6. Identification and characterization of rhizospheric microbial diversity by 16S ribosomal RNA gene sequencing

    PubMed Central

    Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad

    2014-01-01

    In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Identification of bacterial isolates was made by 16S rRNA gene sequence analysis and taxonomical confirmation on EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of bacterial strains with the respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological and biochemical tests and screened for the glucose dehydrogenase (gdh) gene, which is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root-nodulating strains were positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts like field grown crops, leguminous and non-leguminous plants. It was concluded that a diverse group of bacterial populations exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant microbial interactions, and that strains QAU-63 and QAU-68 have sequence similarity of 97 and 95% and might be declared as novel after further taxonomic characterization. PMID:25477935

  7. Drosophila GRAIL: An intelligent system for gene recognition in Drosophila DNA sequences

    SciTech Connect

    Xu, Ying; Einstein, J.R.; Uberbacher, E.C.; Helt, G.; Rubin, G.

    1995-06-01

    An AI-based system for gene recognition in Drosophila DNA sequences was designed and implemented. The system consists of two main modules, one for coding exon recognition and one for single gene model construction. The exon recognition module finds a coding exon by recognition of its splice junctions (or translation start) and coding potential. The core of this module is a set of neural networks which evaluate an exon candidate for the possibility of being a true coding exon using the "recognized" splice junction (or translation start) and coding signals. The recognition process consists of four steps: generation of an exon candidate pool, elimination of improbable candidates using heuristic rules, candidate evaluation by trained neural networks, and candidate cluster resolution and final exon prediction. The gene model construction module takes as input the clustered exon candidates and builds a "best" possible single gene model using an efficient dynamic programming algorithm. 129 Drosophila sequences consisting of 441 coding exons including 216,358 coding bases were extracted from GenBank and used to build statistical matrices and to train the neural networks. On this training set the system recognized 97% of the coding messages and predicted only 5% false messages. Among the "correctly" predicted exons, 68% match the actual exon exactly and 96% have at least one edge predicted correctly. On an independent test set consisting of 30 Drosophila sequences, the system recognized 96% of the coding messages and predicted 7% false messages.
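
    The abstract does not specify the dynamic programming algorithm used to assemble a gene model. As a rough illustration of the general idea only, the sketch below picks the highest-scoring set of non-overlapping exon candidates (classic weighted interval scheduling). The function name `best_gene_model`, the tuple layout, and the example scores are assumptions made for this sketch; the real system would also have to enforce reading-frame and splice-site compatibility, which is omitted here.

```python
from bisect import bisect_right


def best_gene_model(candidates):
    """Choose non-overlapping exon candidates with maximal total score.

    `candidates` is a list of (start, end, score) tuples with integer,
    inclusive coordinates. Returns (total_score, chosen_exons).
    """
    exons = sorted(candidates, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in exons]
    # best[i] holds the optimal (score, chain) over the first i exons
    best = [(0.0, [])]
    for i, (start, end, score) in enumerate(exons):
        # j = number of earlier exons that end strictly before this start
        j = bisect_right(ends, start - 1, 0, i)
        take_score = best[j][0] + score
        if take_score > best[i][0]:
            best.append((take_score, best[j][1] + [(start, end, score)]))
        else:
            best.append(best[i])  # skipping this exon is at least as good
    return best[-1]


# Hypothetical exon candidates with made-up network scores.
candidates = [
    (100, 200, 0.75),
    (150, 260, 0.5),
    (300, 400, 0.5),
    (380, 450, 0.25),
]
total_score, model = best_gene_model(candidates)
```

    Sorting by end coordinate lets each step look up the best compatible prefix with a binary search, so the whole pass takes O(n log n) time for n candidates.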

  8. Sequence and transcription analysis of the human cytomegalovirus DNA polymerase gene

    SciTech Connect

    Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.

    1987-01-01

    DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.

  9. Phylogenetic Relationships and the Evolution of Regulatory Gene Sequences in the Parrotfishes

    PubMed Central

    Smith, Lydia L.; Fessler, Jennifer L.; Alfaro, Michael E.; Streelman, J. Todd; Westneat, Mark W.

    2008-01-01

    Regulatory genes control the expression of other genes and are key components of developmental processes such as segmentation and embryonic construction of the skull in vertebrates. Here we examine the variability and evolution of three vertebrate regulatory genes, addressing issues of their utility for phylogenetics and comparing the rates of genetic change seen in regulatory loci to the rates seen in other genes in the parrotfishes. The parrotfishes are a diverse group of colorful fishes from coral reefs and seagrasses worldwide and have been placed phylogenetically within the family Labridae. We tested phylogenetic hypotheses among the parrotfishes, with a focus on the genera Chlorurus and Scarus, by analyzing eight gene fragments for 42 parrotfishes and eight outgroup species. We sequenced mitochondrial 12s rRNA (967 bp), 16s rRNA (577 bp), and cytochrome b (477 bp). From the nuclear genome, we sequenced part of the protein-coding genes rag2 (715 bp), tmo4c4 (485 bp), and the developmental regulatory genes otx1 (672 bp), bmp4 (488 bp), and dlx2 (522 bp). Bayesian, likelihood, and parsimony analyses on the resulting 4903 bp of DNA sequence produced similar topologies that confirm the monophyly of the scarines and provide a phylogeny at the species level for portions of the genera Scarus and Chlorurus. Four major clades of Scarus were recovered, with three distributed in the Indo-Pacific and one containing Caribbean/Atlantic taxa. Molecular rates suggest a Miocene origin of the parrotfishes (22 mya) and a recent divergence of species within Scarus and Chlorurus, within the past 5 million years. Developmentally important genes made a significant contribution to phylogenetic structure, and rates of genetic evolution were high in bmp4, similar to other coding nuclear genes, but low in otx1 and the dlx2 exons. Synonymous and nonsynonymous substitution patterns in developmental regulatory genes support the hypothesis of stabilizing selection during the history of

  10. Nucleotide sequence analysis of beta tubulin gene in a wide range of dermatophytes.

    PubMed

    Rezaei-Matehkolaei, Ali; Mirhendi, Hossein; Makimura, Koichi; de Hoog, G Sybren; Satoh, Kazuo; Najafzadeh, Mohammad Javad; Shidfar, Mohammad Reza

    2014-10-01

    We investigated the resolving power of the beta tubulin protein-coding gene (BT2) for systematic study of dermatophyte fungi. Initially, 144 standard and clinical strains belonging to 26 species in the genera Trichophyton, Microsporum, and Epidermophyton were identified by internal transcribed spacer (ITS) sequencing. Subsequently, BT2 was partially amplified in all strains, and sequence analysis was performed after construction of a BT2 database that showed length ranged from approximately 723 (T. ajelloi) to 808 nucleotides (M. persicolor) in different species. Intraspecific sequence variation was found in some species, but T. tonsurans, T. equinum, T. concentricum, T. verrucosum, T. rubrum, T. violaceum, T. eriotrephon, E. floccosum, M. canis, M. ferrugineum, and M. audouinii were invariant. The sequences were found to be relatively conserved among different strains of the same species. The species with the closest resemblance were Arthroderma benhamiae and T. concentricum and T. tonsurans and T. equinum with 100% and 99.8% identity, respectively; the most distant species were M. persicolor and M. amazonicum. The dendrogram obtained from BT2 topology was almost compatible with the species concept based on ITS sequencing, and similar clades and species were distinguished in the BT2 tree. Here, beta tubulin was characterized in a wide range of dermatophytes in order to assess intra- and interspecies variation and resolution and was found to be a taxonomically valuable gene. PMID:25079222

  11. Cloning and sequencing of a Bacteroides ruminicola B(1)4 endoglucanase gene.