Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes, coded for cholinesterase.
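The lengths quoted in this abstract can be cross-checked with simple codon arithmetic. The following is a minimal sketch in Python; the helper name `codons_in` is hypothetical, and the check assumes the 1722 bp figure counts sense codons only (no stop codon).

```python
def codons_in(bp: int) -> int:
    """Return the number of whole codons in a coding stretch of `bp` base pairs."""
    if bp % 3 != 0:
        raise ValueError(f"{bp} bp is not a whole number of codons")
    return bp // 3


# Coding sequence of the protein found circulating in serum.
mature_codons = codons_in(1722)
# In-frame coding sequence 5' to the amino terminus of the mature protein.
upstream_codons = codons_in(207)

print(mature_codons)    # 574, the length of the serum protein in residues
print(upstream_codons)  # 69 in-frame codons upstream of the mature N-terminus
```

Under that assumption the 1722 bp of coding sequence corresponds exactly to the 574-residue protein, and the Met-(-28) start site falls within the 69 upstream codons.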
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. The COI gene begins with GTG as its start codon, whereas the other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes terminate with TAA as the stop codon.
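The start and stop codon pattern reported above (GTG or ATG at the start, TAA at the end) is the kind of property that can be checked mechanically on annotated coding sequences. Below is a minimal sketch in Python; the function name and the two toy sequences are hypothetical illustrations, not data from the Hydra vulgaris genome.

```python
START_CODONS = {"ATG", "GTG"}  # the two start codons reported in the abstract
STOP_CODON = "TAA"             # the stop codon reported for all protein-coding genes


def codon_ends(cds: str) -> tuple[str, str]:
    """Return the first and last codon of a coding sequence."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        raise ValueError("expected a coding sequence of at least two whole codons")
    return cds[:3], cds[-3:]


# Toy sequences for illustration only (not real Hydra genes).
toy_genes = {
    "gtg_start_example": "GTGAAACCCTAA",
    "atg_start_example": "ATGGGGTTTTAA",
}

for name, seq in toy_genes.items():
    start, stop = codon_ends(seq)
    matches = start in START_CODONS and stop == STOP_CODON
    print(name, start, stop, matches)
```

The gene counts in the abstract are also consistent: six genes on molecule 1 plus seven on molecule 2 give 13, which equals COI plus the 12 genes with ATG starts.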
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time-consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in GenBank. CDSbank also stores GenBank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extraction of protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.
Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano
2018-01-01
Multiple sequence alignment (MSA) is a fundamental component in many DNA sequence analyses including metagenomics studies and phylogeny inference. When guided by protein profiles, DNA multiple alignments assume a higher precision and robustness. Here we present details of the use of the upgraded version of MSA-PAD (2.0), which is a DNA multiple sequence alignment framework able to align DNA sequences coding for single/multiple protein domains guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," accounting for coding domains order and genomic rearrangements, respectively. Novel options were added to the present version, where the MSA can be guided by protein profiles provided by the user. This allows MSA-PAD 2.0 to run faster and to add custom protein profiles sometimes not present in PFAM database according to the user's interest. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Zhao, A; Guo, A; Liu, Z; Pape, L
1997-01-01
The coding sequences for a Schizosaccharomyces pombe sequence-specific DNA binding protein, Reb1p, have been cloned. The predicted S. pombe Reb1p is 24-29% identical to mouse TTF-1 (transcription termination factor-1) and Saccharomyces cerevisiae REB1 protein, both of which direct termination of RNA polymerase I catalyzed transcripts. The S. pombe Reb1 cDNA encodes a predicted polypeptide of 504 amino acids with a predicted molecular weight of 58.4 kDa. The S. pombe Reb1p is unusual in that the bipartite DNA binding motif identified originally in S. cerevisiae and Kluyveromyces lactis REB1 proteins is uninterrupted and thus S. pombe Reb1p may contain the smallest natural REB1 homologous DNA binding domain. Its genomic coding sequences were shown to be interrupted by two introns. A recombinant histidine-tagged Reb1 protein bearing the rDNA binding domain has two homologous, sequence-specific binding sites in the S. pombe rDNA intergenic spacer, located between 289 and 480 nt downstream of the end of the approximately 25S rRNA coding sequences. Each binding site is 13-14 bp downstream of two of the three proposed in vivo termination sites. The core of this 17 bp site, AGGTAAGGGTAATGCAC, is specifically protected by Reb1p in footprinting analysis. PMID:9016645
LaPolla, R J; Mayne, K M; Davidson, N
1984-01-01
A mouse cDNA clone has been isolated that contains the complete coding region of a protein highly homologous to the delta subunit of the Torpedo acetylcholine receptor (AcChoR). The cDNA library was constructed in the vector lambda 10 from membrane-associated poly(A)+ RNA from BC3H-1 mouse cells. Surprisingly, the delta clone was selected by hybridization with cDNA encoding the gamma subunit of the Torpedo AcChoR. The nucleotide sequence of the mouse cDNA clone contains an open reading frame of 520 amino acids. This amino acid sequence exhibits 59% and 50% sequence homology to the Torpedo AcChoR delta and gamma subunits, respectively. However, the mouse nucleotide sequence has several stretches of high homology with the Torpedo gamma subunit cDNA, but not with delta. The mouse protein has the same general structural features as do the Torpedo subunits. It is encoded by a 3.3-kilobase mRNA. There is probably only one, but at most two, chromosomal genes coding for this or closely related sequences. PMID:6096870
Context influences on TALE–DNA binding revealed by quantitative profiling
Rogers, Julia M.; Barrera, Luis A.; Reyon, Deepak; Sander, Jeffry D.; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L.
2015-01-01
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE–DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000–20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE–DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design. PMID:26067805
Context influences on TALE-DNA binding revealed by quantitative profiling.
Rogers, Julia M; Barrera, Luis A; Reyon, Deepak; Sander, Jeffry D; Kellis, Manolis; Joung, J Keith; Bulyk, Martha L
2015-06-11
Transcription activator-like effector (TALE) proteins recognize DNA using a seemingly simple DNA-binding code, which makes them attractive for use in genome engineering technologies that require precise targeting. Although this code is used successfully to design TALEs to target specific sequences, off-target binding has been observed and is difficult to predict. Here we explore TALE-DNA interactions comprehensively by quantitatively assaying the DNA-binding specificities of 21 representative TALEs to ∼5,000-20,000 unique DNA sequences per protein using custom-designed protein-binding microarrays (PBMs). We find that protein context features exert significant influences on binding. Thus, the canonical recognition code does not fully capture the complexity of TALE-DNA binding. We used the PBM data to develop a computational model, Specificity Inference For TAL-Effector Design (SIFTED), to predict the DNA-binding specificity of any TALE. We provide SIFTED as a publicly available web tool that predicts potential genomic off-target sites for improved TALE design.
Discrete Ramanujan transform for distinguishing the protein coding regions from other regions.
Hua, Wei; Wang, Jiasong; Zhao, Jian
2014-01-01
Based on the study of the Ramanujan sum and Ramanujan coefficient, this paper suggests the concepts of the discrete Ramanujan transform and spectrum. Using the Voss numerical representation, one maps a symbolic DNA strand to a numerical DNA sequence and deduces the discrete Ramanujan spectrum of the numerical DNA sequence. It is well known that the discrete Fourier power spectrum of a protein coding sequence has an important feature of 3-base periodicity, which is widely used for DNA sequence analysis by the technique of discrete Fourier transform. It is performed by testing the signal-to-noise ratio at frequency N/3 as a criterion for the analysis, where N is the length of the sequence. The results presented in this paper show that the property of 3-base periodicity can be only identified as a prominent spike of the discrete Ramanujan spectrum at period 3 for the protein coding regions. The signal-to-noise ratio for the discrete Ramanujan spectrum is defined for numerical measurement. Therefore, the discrete Ramanujan spectrum and the signal-to-noise ratio of a DNA sequence can be used for distinguishing the protein coding regions from the noncoding regions. All the exon and intron sequences in whole chromosomes 1, 2, 3 and 4 of Caenorhabditis elegans have been tested and the histograms and tables from the computational results illustrate the reliability of our method. In addition, we have analyzed the algorithm theoretically and concluded that the algorithm for calculating the discrete Ramanujan spectrum has lower computational complexity and higher computational accuracy. The computational experiments show that using the discrete Ramanujan spectrum to classify different DNA sequences is a fast and effective method.
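The conventional Fourier baseline mentioned in this abstract (Voss indicator sequences, power at frequency N/3, and a signal-to-noise ratio) can be sketched as follows. This is a minimal illustration in Python of the discrete Fourier approach, not of the paper's discrete Ramanujan transform. The function names are hypothetical, and the signal-to-noise ratio is assumed here to be the power at N/3 divided by the mean power over frequencies 1 to N-1, which is one common convention and may differ from the authors' definition.

```python
import cmath


def voss_indicators(seq: str) -> dict[str, list[float]]:
    """Voss representation: one 0/1 indicator sequence per nucleotide."""
    seq = seq.upper()
    return {base: [1.0 if c == base else 0.0 for c in seq] for base in "ACGT"}


def power_at(x: list[float], k: int) -> float:
    """Squared magnitude of the k-th DFT coefficient of x (direct O(N) sum)."""
    n = len(x)
    coeff = sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
    return abs(coeff) ** 2


def total_power(seq: str, k: int) -> float:
    """Sum of the four indicator power spectra at frequency index k."""
    return sum(power_at(x, k) for x in voss_indicators(seq).values())


def snr_period3(seq: str) -> float:
    """Power at N/3 divided by the mean power over k = 1..N-1 (assumed definition)."""
    n = len(seq)
    if n < 3 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    peak = total_power(seq, n // 3)
    mean = sum(total_power(seq, k) for k in range(1, n)) / (n - 1)
    return peak / mean


# Toy sequences: a perfectly period-3 repeat versus a period-4 repeat.
print(snr_period3("ATG" * 20))   # large ratio: strong 3-base periodicity
print(snr_period3("ACGT" * 15))  # close to zero: no power at N/3
```

The direct summation costs O(N^2) operations per spectrum, which is acceptable for short toy sequences; for real exons and introns an FFT would be the usual choice.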
Cloning and sequence analysis of Hemonchus contortus HC58cDNA.
Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li
2007-06-01
The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a precursor of 239 amino acids, corresponding to a protein of approximately 27 kDa. Analysis of the amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
Recombinant pinoresinol/lariciresinol reductase, recombinant dirigent protein, and methods of use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2001-04-03
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
Reicher, S; Seroussi, E; Weller, J I; Rosov, A; Gootwine, E
2012-07-01
Polymorphisms in mitochondrial DNA (mtDNA) protein- and tRNA-coding genes were shown to be associated with various diseases in humans as well as with production and reproduction traits in livestock. Alignment of full length mitochondria sequences from the 5 known ovine haplogroups: HA (n = 3), HB (n = 5), HC (n = 3), HD (n = 2), and HE (n = 2; GenBank accession nos. HE577847-50 and 11 published complete ovine mitochondria sequences) revealed sequence variation in 10 out of the 13 protein coding mtDNA sequences. Twenty-six of the 245 variable sites found in the protein coding sequences represent non-synonymous mutations. Sequence variation was observed also in 8 out of the 22 tRNA mtDNA sequences. On the basis of the mtDNA control region and cytochrome b partial sequences along with information on maternal lineages within an Afec-Assaf flock, 1,126 Afec-Assaf ewes were assigned to mitochondrial haplogroups HA, HB, and HC, with frequencies of 0.43, 0.43, and 0.14, respectively. Analysis of birth weight and growth rate records of lamb (n = 1286) and productivity from 4,993 lambing records revealed no association between mitochondrial haplogroup affiliation and female longevity, lambs perinatal survival rate, birth weight, and daily growth rate of lambs up to 150 d that averaged 1,664 d, 88.3%, 4.5 kg, and 320 g/d, respectively. However, significant (P < 0.0001) differences among the haplogroups were found for prolificacy of ewes, with prolificacies (mean ± SE) of 2.14 ± 0.04, 2.25 ± 0.04, and 2.30 ± 0.06 lamb born/ewe lambing for the HA, HB, and the HC haplogroups, respectively. Our results highlight the ovine mitogenome genetic variation in protein- and tRNA coding genes and suggest that sequence variation in ovine mtDNA is associated with variation in ewe prolificacy.
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-01-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution. PMID:24792163
Recombinant Pinoresinol-Lariciresinol Reductase, Recombinant Dirigent Protein And Methods Of Use
Lewis, Norman G.; Davin, Laurence B.; Dinkova-Kostova, Albena T.; Fujita, Masayuki; Gang, David R.; Sarkanen, Simo; Ford, Joshua D.
2003-10-21
Dirigent proteins and pinoresinol/lariciresinol reductases have been isolated, together with cDNAs encoding dirigent proteins and pinoresinol/lariciresinol reductases. Accordingly, isolated DNA sequences are provided from source species Forsythia intermedia, Thuja plicata, Tsuga heterophylla, Eucommia ulmoides, Linum usitatissimum, and Schisandra chinensis, which code for the expression of dirigent proteins and pinoresinol/lariciresinol reductases. In other aspects, replicable recombinant cloning vehicles are provided which code for dirigent proteins or pinoresinol/lariciresinol reductases or for a base sequence sufficiently complementary to at least a portion of dirigent protein or pinoresinol/lariciresinol reductase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding dirigent protein or pinoresinol/lariciresinol reductase. Thus, systems and methods are provided for the recombinant expression of dirigent proteins and/or pinoresinol/lariciresinol reductases.
Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.
Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T
1993-01-01
A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
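The deduction of the 3' untranslated length in this abstract is a subtraction that can be written out explicitly. The sketch below is in Python with a hypothetical function name; it assumes that the 5' end of the cDNA clone coincides with the 5' end of the mRNA and that the Northern blot size of 2.3 kb is taken at face value.

```python
def three_prime_utr_bp(mrna_bp: int, five_prime_bp: int, orf_bp: int) -> int:
    """Length left for the 3' untranslated region after the 5' leader and the ORF."""
    remainder = mrna_bp - five_prime_bp - orf_bp
    if remainder < 0:
        raise ValueError("leader plus ORF exceed the transcript length")
    return remainder


mrna_bp = 2300      # Northern blot estimate (2.3 kb)
five_prime_bp = 12  # distance from the 5' end of the clone to the single ATG
orf_bp = 579        # open reading frame

print(three_prime_utr_bp(mrna_bp, five_prime_bp, orf_bp))  # 1709, i.e. approx. 1.7 kb
```

The result, 1709 bp, agrees with the approx. 1.7 kb quoted for the 3' untranslated sequence.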
Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji
2015-12-01
Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs (DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region) could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Vlahovicek, K; Munteanu, M G; Pongor, S
1999-01-01
Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
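The noncomparative signal described in this abstract, the frequency of hexanucleotides read in a coding frame (dicodons), can be illustrated with a short counting routine. This is a minimal sketch in Python with a hypothetical function name; it only tallies in-frame hexamers and does not reproduce CRITICA's iterative estimation or scoring.

```python
from collections import Counter


def dicodon_counts(seq: str, frame: int = 0) -> Counter:
    """Count hexanucleotides that start on codon boundaries of the given frame."""
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    seq = seq.upper()
    counts: Counter = Counter()
    # Step by 3 so every hexamer spans exactly two adjacent codons.
    for i in range(frame, len(seq) - 5, 3):
        hexamer = seq[i:i + 6]
        if set(hexamer) <= set("ACGT"):  # skip ambiguous bases such as N
            counts[hexamer] += 1
    return counts


# Toy sequence for illustration only.
toy = "ATGAAATTTGGG"
print(dicodon_counts(toy, frame=0))
print(dicodon_counts(toy, frame=1))
```

Comparing such counts between a candidate coding frame and other contexts is the general idea behind dicodon bias; how the counts are turned into a score is a design decision left out of this sketch.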
GATA: A graphic alignment tool for comparative sequence analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
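The dot-plot comparison discussed in this abstract can be sketched in a few lines, which also makes its limitations concrete: it marks matching words but produces no alignment and no statistics. The following Python sketch uses a hypothetical function name and a fixed word size chosen only for illustration; it is not part of GATA.

```python
def dot_plot(a: str, b: str, word: int = 3) -> list[list[bool]]:
    """Mark every pair of positions (i, j) where a and b share an identical word."""
    if word < 1:
        raise ValueError("word size must be at least 1")
    a, b = a.upper(), b.upper()
    rows = len(a) - word + 1
    cols = len(b) - word + 1
    return [
        [a[i:i + word] == b[j:j + word] for j in range(cols)]
        for i in range(rows)
    ]


def render(grid: list[list[bool]]) -> str:
    """Plain-text rendering: '*' for a match, '.' otherwise."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# Toy sequences: the second carries a one-base insertion relative to the first.
print(render(dot_plot("ACGTACGGTA", "ACGTAACGGTA", word=3)))
```

An insertion shows up only as a shift of the diagonal, which the reader must interpret by eye; the plot itself does not place gaps or score the relationship.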
de Lange, Orlando; Wolf, Christina; Dietze, Jörn; Elsaesser, Janett; Morbitzer, Robert; Lahaye, Thomas
2014-06-01
The tandem repeats of transcription activator like effectors (TALEs) mediate sequence-specific DNA binding using a simple code. Naturally, TALEs are injected by Xanthomonas bacteria into plant cells to manipulate the host transcriptome. In the laboratory TALE DNA binding domains are reprogrammed and used to target a fused functional domain to a genomic locus of choice. Research into the natural diversity of TALE-like proteins may provide resources for the further improvement of current TALE technology. Here we describe TALE-like proteins from the endosymbiotic bacterium Burkholderia rhizoxinica, termed Bat proteins. Bat repeat domains mediate sequence-specific DNA binding with the same code as TALEs, despite less than 40% sequence identity. We show that Bat proteins can be adapted for use as transcription factors and nucleases and that sequence preferences can be reprogrammed. Unlike TALEs, the core repeats of each Bat protein are highly polymorphic. This feature allowed us to explore alternative strategies for the design of custom Bat repeat arrays, providing novel insights into the functional relevance of non-RVD residues. The Bat proteins offer fertile grounds for research into the creation of improved programmable DNA-binding proteins and comparative insights into TALE-like evolution.
2012-01-01
Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
1-deoxy-d-xylulose-5-phosphate reductoisomerases and method of use
Croteau, Rodney B.; Lange, Bernd M.
2001-01-01
The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.
1-deoxy-D-xylulose-5-phosphate reductoisomerases, and methods of use
Croteau, Rodney B.; Lange, Bernd M.
2002-07-16
The present invention relates to isolated DNA sequences which code for the expression of plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein, such as the sequence presented in SEQ ID NO:1 which encodes a 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein from peppermint (Mentha x piperita). Additionally, the present invention relates to isolated plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase protein. In other aspects, the present invention is directed to replicable recombinant cloning vehicles comprising a nucleic acid sequence which codes for a plant 1-deoxy-D-xylulose-5-phosphate reductoisomerase, to modified host cells transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence of the invention.
Physics behind the mechanical nucleosome positioning code
NASA Astrophysics Data System (ADS)
Zuiddam, Martijn; Everaers, Ralf; Schiessel, Helmut
2017-11-01
The positions along DNA molecules of nucleosomes, the most abundant DNA-protein complexes in cells, are influenced by the sequence-dependent DNA mechanics and geometry. This leads to the "nucleosome positioning code", a preference of nucleosomes for certain sequence motifs. Here we introduce a simplified model of the nucleosome where a coarse-grained DNA molecule is frozen into an idealized superhelical shape. We calculate the exact sequence preferences of our nucleosome model and find it to reproduce qualitatively all the main features known to influence nucleosome positions. Moreover, using well-controlled approximations to this model allows us to come to a detailed understanding of the physics behind the sequence preferences of nucleosomes.
NASA Technical Reports Server (NTRS)
Reddy, A. S.; Czernik, A. J.; An, G.; Poovaiah, B. W.
1992-01-01
We cloned and sequenced a plant cDNA that encodes U1 small nuclear ribonucleoprotein (snRNP) 70K protein. The plant U1 snRNP 70K protein cDNA is not full length and lacks the coding region for 68 amino acids in the amino-terminal region as compared to human U1 snRNP 70K protein. Comparison of the deduced amino acid sequence of the plant U1 snRNP 70K protein with the amino acid sequence of animal and yeast U1 snRNP 70K protein showed a high degree of homology. The plant U1 snRNP 70K protein is more closely related to the human counterpart than to the yeast 70K protein. The carboxy-terminal half is less well conserved but, like the vertebrate 70K proteins, is rich in charged amino acids. Northern analysis with the RNA isolated from different parts of the plant indicates that the snRNP 70K gene is expressed in all of the parts tested. Southern blotting of genomic DNA using the cDNA indicates that the U1 snRNP 70K protein is coded by a single gene.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-02-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-01-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. PMID:3257578
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. The driving force for the interaction between protein and DNA remains the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
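As a rough illustration of the baseline this abstract compares against, the sketch below maps a sequence to the 4D binary (indicator) representation and measures how strongly the spectrum peaks at frequency N/3. It does not implement the GCC mapping or the STFT windowing. The SNR definition used here (peak power divided by mean power over nonzero frequencies) and all function names are assumptions made for the example.

```python
import cmath

def indicator_sequences(dna):
    """4D binary mapping: one 0/1 sequence per nucleotide."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}

def power_at(signal, k):
    """Squared magnitude of the k-th DFT coefficient of a real sequence."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
    return abs(coeff) ** 2

def period3_snr(dna):
    """Power at frequency N/3 divided by the mean power over frequencies 1..N-1."""
    n = len(dna) - len(dna) % 3          # trim so that N/3 is an integer
    seqs = indicator_sequences(dna[:n])
    spectrum = [sum(power_at(s, k) for s in seqs.values()) for k in range(1, n)]
    peak = spectrum[n // 3 - 1]          # spectrum[0] corresponds to k = 1
    return peak / (sum(spectrum) / len(spectrum))

# A strictly codon-periodic toy sequence scores far higher than a period-4 one.
print(period3_snr("ATG" * 20), period3_snr("ACGT" * 15))
```

The direct DFT is quadratic in sequence length, which is acceptable for a toy example; a windowed FFT would be the natural choice for genome-scale input.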
Cloning and expression of cDNA coding for bouganin.
den Hartog, Marcel T; Lubelli, Chiara; Boon, Louis; Heerkens, Sijmie; Ortiz Buijsse, Antonio P; de Boer, Mark; Stirpe, Fiorenzo
2002-03-01
Bouganin is a ribosome-inactivating protein that recently was isolated from Bougainvillea spectabilis Willd. In this work, the cloning and expression of the cDNA encoding bouganin is described. From the cDNA, the amino-acid sequence was deduced, which correlated with the primary sequence data obtained by amino-acid sequencing on the native protein. Bouganin is synthesized as a pro-peptide consisting of 305 amino acids, the first 26 of which act as a leader signal while the 29 C-terminal amino acids are cleaved during processing of the molecule. The mature protein consists of 250 amino acids. Using the cDNA sequence encoding the mature protein of 250 amino acids, a recombinant protein was expressed, purified and characterized. The recombinant molecule had activity in a cell-free protein synthesis assay similar to, and toxicity on living cells comparable to, that of the isolated native bouganin.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
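The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the 1251 bp reading frame includes the stop codon and that the transcript consists of the 36-nucleotide 5' leader, the reading frame, and the 86 to 93 nucleotide 3' trailer. The variable and function names are illustrative.

```python
CODON_LENGTH = 3

def orf_length_bp(n_amino_acids, include_stop=True):
    """Length in bp of a reading frame coding for n_amino_acids residues."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return codons * CODON_LENGTH

orf_bp = orf_length_bp(416)             # 416 sense codons plus one stop codon
five_prime_leader = 36                  # transcription start to translational start
three_prime_trailer = (86, 93)          # translational stop to transcript end

# Shortest and longest non-polyadenylated transcript implied by the abstract.
mrna_range = tuple(five_prime_leader + orf_bp + t for t in three_prime_trailer)
print(orf_bp, mrna_range)
```

Under these assumptions the result reproduces both the 1251 bp reading frame and the 1373 to 1380 nucleotide transcript. The gap to the observed 1500 nucleotides would then correspond to a poly(A) tail of roughly 120 nucleotides.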
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
2012-01-01
Background Detecting the borders between coding and non-coding regions is an essential step in genome annotation, and information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous methods for finding borders based on entropic segmentation still needs to be improved. Methods In this study, we first applied a new recursive entropic segmentation method to DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Compared with previous segmentation methods, the experimental results on three bacterial genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy of finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method for prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacterial genomes, compared to the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
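The core step of entropic segmentation, choosing the cut that makes the two sides most compositionally different, can be sketched as follows. This example substitutes the simpler Jensen-Shannon divergence over the plain four-letter nucleotide alphabet for the Jensen-Rényi divergence and 22-symbol alphabet described in the abstract, and it finds only a single cut rather than recursing. The function names and the `min_len` parameter are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(counts, total):
    """Shannon entropy in bits of a symbol-count table."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)

def best_cut(seq, min_len=10):
    """Return (position, divergence) of the cut maximizing the Jensen-Shannon
    divergence between seq[:position] and seq[position:]."""
    n = len(seq)
    total = Counter(seq)
    h_all = shannon_entropy(total, n)
    left = Counter(seq[:min_len - 1])
    best_pos, best_div = None, -1.0
    for i in range(min_len, n - min_len + 1):
        left[seq[i - 1]] += 1              # left segment is now seq[:i]
        right = total - left               # composition of seq[i:]
        div = (h_all
               - (i / n) * shannon_entropy(left, i)
               - ((n - i) / n) * shannon_entropy(right, n - i))
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div

# Toy sequence with an obvious compositional border at position 100.
print(best_cut("AT" * 50 + "GC" * 50))
```

A recursive version would apply `best_cut` again to each side while the divergence stays above a significance threshold, which is the part the abstract's method refines.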
Kangaroo – A pattern-matching program for biological sequences
2002-01-01
Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
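The kind of query described here, finding mononucleotide repeats in coding sequences, is easy to express as a regular expression. The sketch below uses Python's standard `re` module rather than Kangaroo's own query interface, whose syntax is not given in the abstract. The function name and the default threshold of eight bases are illustrative assumptions.

```python
import re

def find_mononucleotide_repeats(seq, min_run=8):
    """Return (start, base, length) for every run of one base at least min_run long."""
    # ([ACGT]) captures one base; \1{min_run-1,} requires it to repeat.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

# Toy coding sequence containing an A9 run and a T8 run.
toy = "ATG" + "A" * 9 + "CCG" + "T" * 8 + "GG"
print(find_mononucleotide_repeats(toy))
```

Positions are zero-based offsets into the input string. Mapping them back to gene coordinates would depend on how the coding regions were extracted.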
Nagano, Yukio; Furuhashi, Hirofumi; Inaba, Takehito; Sasaki, Yukiko
2001-01-01
Complementary DNA encoding a DNA-binding protein, designated PLATZ1 (plant AT-rich sequence- and zinc-binding protein 1), was isolated from peas. The amino acid sequence of the protein is similar to those of other uncharacterized proteins predicted from the genome sequences of higher plants. However, no paralogous sequences have been found outside the plant kingdom. Multiple alignments among these paralogous proteins show that several cysteine and histidine residues are invariant, suggesting that these proteins are a novel class of zinc-dependent DNA-binding proteins with two distantly located regions, C-x2-H-x11-C-x2-C-x(4–5)-C-x2-C-x(3–7)-H-x2-H and C-x2-C-x(10–11)-C-x3-C. In an electrophoretic mobility shift assay, the zinc chelator 1,10-o-phenanthroline inhibited DNA binding, and two distant zinc-binding regions were required for DNA binding. A protein blot with 65ZnCl2 showed that both regions are required for zinc-binding activity. The PLATZ1 protein non-specifically binds to A/T-rich sequences, including the upstream region of the pea GTPase pra2 and plastocyanin petE genes. Expression of the PLATZ1 repressed those of the reporter constructs containing the coding sequence of luciferase gene driven by the cauliflower mosaic virus (CaMV) 35S90 promoter fused to the tandem repeat of the A/T-rich sequences. These results indicate that PLATZ1 is a novel class of plant-specific zinc-dependent DNA-binding protein responsible for A/T-rich sequence-mediated transcriptional repression. PMID:11600698
Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew
2018-05-17
Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride
Matroudi, S.; Zamani, M.R.; Motallebi, M.
2008-01-01
In this study Trichoderma atroviride was selected as an overproducer of chitinase among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate the genomic and cDNA clones encoding chit33 have been isolated and sequenced. Comparison of the genomic and cDNA sequences to define the gene structure indicates that this gene contains three short introns and an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins is discussed. The coding sequence of the chit33 gene was cloned in the pET26b(+) expression vector and expressed in E. coli. PMID:24031242
Pietan, Lucas L.; Spradling, Theresa A.
2016-01-01
In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Toward rules relating zinc finger protein sequences and DNA binding site preferences.
Desjarlais, J R; Berg, J M
1992-08-15
Zinc finger proteins of the Cys2-His2 type consist of tandem arrays of domains, where each domain appears to contact three adjacent base pairs of DNA through three key residues. We have designed and prepared a series of variants of the central zinc finger within the DNA binding domain of Sp1 by using information from an analysis of a large data base of zinc finger protein sequences. Through systematic variations at two of the three contact positions (underlined), relatively specific recognition of sequences of the form 5'-GGGGN(G or T)GGG-3' has been achieved. These results provide the basis for rules that may develop into a code that will allow the design of zinc finger proteins with preselected DNA site specificity.
Simon, J W; Slabas, A R
1998-09-18
The GenBank database was searched using the E. coli malonyl CoA:ACP transacylase (MCAT) sequence, for plant protein/cDNA sequences corresponding to MCAT, a component of plant fatty acid synthetase (FAS), for which the plant cDNA has not been isolated. A 272-bp Zea mays EST sequence (GenBank accession number: AA030706) was identified which has strong homology to the E. coli MCAT. A PCR derived cDNA probe from Zea mays was used to screen a Brassica napus (rape) cDNA library. This resulted in the isolation of a 1200-bp cDNA clone which encodes an open reading frame corresponding to a protein of 351 amino acids. The protein shows 47% homology to the E. coli MCAT amino acid sequence in the coding region for the mature protein. Expression of a plasmid (pMCATrap2) containing the plant cDNA sequence in Fab D89, an E. coli mutant in MCAT activity, restores growth, demonstrating functional complementation and direct function of the cloned cDNA. This is the first functional evidence supporting the identification of a plant cDNA for MCAT.
DNA-based watermarks using the DNA-Crypt algorithm.
Heider, Dominik; Barnekow, Angelika
2007-05-29
The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms.
DNA-based watermarks using the DNA-Crypt algorithm
Heider, Dominik; Barnekow, Angelika
2007-01-01
Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. It can be combined with binary encryption algorithms like AES, RSA or Blowfish. DNA-Crypt is able to correct mutations in the target DNA with several mutation correction codes such as the Hamming-code or the WDH-code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller decides on a set of heuristics based on three input dimensions, and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using the Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that DNA-Crypt produces few mismatches between the sequences, similar to all steganographic algorithms. PMID:17535434
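The "least significant base" principle can be illustrated with a toy scheme that writes two bits into the third position of codons whose third base is fourfold degenerate in the standard genetic code, so the encoded amino acid is unchanged. This is not the DNA-Crypt algorithm itself: the bit assignment, the restriction to fourfold-degenerate codons, and the function names `embed` and `extract` are assumptions made for the example, and no encryption or mutation correction is included.

```python
# First two bases of codon families where any third base gives the same amino acid
# (Ala, Gly, Pro, Thr, Val, Leu, Arg, Ser in the standard genetic code).
FOURFOLD = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def embed(cds, bits):
    """Write an even-length bit string into fourfold-degenerate third positions."""
    usable = len(cds) - len(cds) % 3
    out, pos = [], 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and pos + 2 <= len(bits):
            codon = codon[:2] + BITS_TO_BASE[bits[pos:pos + 2]]
            pos += 2
        out.append(codon)
    if pos < len(bits):
        raise ValueError("coding sequence too short for this payload")
    out.append(cds[usable:])               # keep any trailing partial codon
    return "".join(out)

def extract(cds, n_bits):
    """Read n_bits back from the fourfold-degenerate third positions."""
    bits = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD and 2 * len(bits) < n_bits:
            bits.append(BASE_TO_BITS[codon[2]])
    return "".join(bits)[:n_bits]

marked = embed("ATGGCTGGACCCACGTAA", "10110001")
print(marked, extract(marked, 8))
```

Because only synonymous third positions are touched, the first two bases of every codon, and therefore the translated protein, stay the same. A practical scheme would also need the error correction discussed in the abstract, since a single mutation at a marked position flips payload bits.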
Longkumer, Toshisangba; Kamireddy, Swetha; Muthyala, Venkateswar Reddy; Akbarpasha, Shaikh; Pitchika, Gopi Krishna; Kodetham, Gopinath; Ayaluru, Murali; Siddavattam, Dayananda
2013-01-01
While analyzing plasmids of Acinetobacter sp. DS002 we have detected a circular DNA molecule pTS236, which upon further investigation is identified as the genome of a phage. The phage genome has shown sequence similarity to the recently discovered Sphinx 2.36 DNA sequence co-purified with the Transmissible Spongiform Encephalopathy (TSE) particles isolated from infected brain samples collected from diverse geographical regions. As in Sphinx 2.36, the phage genome also codes for three proteins. One of them codes for RepA and is shown to be involved in replication of pTS236 through rolling circle (RC) mode. The other two translationally coupled ORFs, orf106 and orf96, code for coat proteins of the phage. Although an orf96 homologue was not previously reported in Sphinx 2.36, a closer examination of DNA sequence of Sphinx 2.36 revealed its presence downstream of orf106 homologue. TEM images and infection assays revealed existence of phage AbDs1 in Acinetobacter sp. DS002. PMID:23867905
DeWitt, D L; Smith, W L
1988-01-01
Prostaglandin G/H synthase (8,11,14-icosatrienoate, hydrogen-donor:oxygen oxidoreductase, EC 1.14.99.1) catalyzes the first step in the formation of prostaglandins and thromboxanes, the conversion of arachidonic acid to prostaglandin endoperoxides G and H. This enzyme is the site of action of nonsteroidal anti-inflammatory drugs. We have isolated a 2.7-kilobase complementary DNA (cDNA) encompassing the entire coding region of prostaglandin G/H synthase from sheep vesicular glands. This cDNA, cloned from a lambda gt 10 library prepared from poly(A)+ RNA of vesicular glands, hybridizes with a single 2.75-kilobase mRNA species. The cDNA clone was selected using oligonucleotide probes modeled from amino acid sequences of tryptic peptides prepared from the purified enzyme. The full-length cDNA encodes a protein of 600 amino acids, including a signal sequence of 24 amino acids. Identification of the cDNA as coding for prostaglandin G/H synthase is based on comparison of amino acid sequences of seven peptides comprising 103 amino acids with the amino acid sequence deduced from the nucleotide sequence of the cDNA. The molecular weight of the unglycosylated enzyme lacking the signal peptide is 65,621. The synthase is a glycoprotein, and there are three potential sites for N-glycosylation, two of them in the amino-terminal half of the molecule. The serine reported to be acetylated by aspirin is at position 530, near the carboxyl terminus. There is no significant similarity between the sequence of the synthase and that of any other protein in amino acid or nucleotide sequence libraries, and a heme binding site(s) is not apparent from the amino acid sequence. The availability of a full-length cDNA clone coding for prostaglandin G/H synthase should facilitate studies of the regulation of expression of this enzyme and the structural features important for catalysis and for interaction with anti-inflammatory drugs. PMID:3125548
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
Osipiuk, J; Joachimiak, A
1997-09-12
We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.
Sasaki, H; Yokoyama, E; Kuroiwa, A
1990-01-01
The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
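The two consensus sequences reported above, TAATGA(C/G) and CTAATTTT, can be located in a DNA string with ordinary pattern matching. The sketch below is only an illustration: the function name, the "high"/"low" labels and the forward-strand-only scan are assumptions of this example, and an exact consensus match says nothing about measured binding affinity.

```python
import re

# Consensus sites quoted in the abstract; lookaheads allow overlapping hits.
SITE_PATTERNS = {
    "high": re.compile(r"(?=(TAATGA[CG]))"),
    "low": re.compile(r"(?=(CTAATTTT))"),
}

def find_consensus_sites(sequence):
    """Return sorted (start, label, matched_text) tuples for the forward strand."""
    sequence = sequence.upper()
    hits = []
    for label, pattern in SITE_PATTERNS.items():
        for match in pattern.finditer(sequence):
            hits.append((match.start(), label, match.group(1)))
    return sorted(hits)
```

For example, `find_consensus_sites("GGTAATGACCCTAATTTTAA")` reports one high-affinity consensus at offset 2 and one low-affinity consensus at offset 10 (0-based). Scanning the reverse complement as well would be a natural extension.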
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that `junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold into hairpins and loops.
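The step assignment described above (A, T, C and G as left, right, down and up) can be sketched as a two-dimensional walk. This is a minimal illustration with unit-length steps; the function name and the decision to skip ambiguous symbols are assumptions of this example, and the Levy-flight and multifractal analysis of the original work are not reproduced.

```python
# Unit steps for each base, following the mapping quoted in the abstract:
# A = left, T = right, C = down, G = up.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}

def dna_walk(sequence):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x = y = 0
    path = [(x, y)]
    for base in sequence.upper():
        step = STEPS.get(base)
        if step is None:
            continue  # skip ambiguous symbols such as N (an assumption)
        x += step[0]
        y += step[1]
        path.append((x, y))
    return path
```

With this mapping a sequence and its reverse complement trace mirrored paths along each axis, because A pairs with T and C pairs with G.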
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats
de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas
2015-01-01
Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. PMID:26481363
Harper, J R; Prince, J T; Healy, P A; Stuart, J K; Nauman, S J; Stallcup, W B
1991-03-01
We have isolated cDNA clones coding for the human homologue of the neuronal cell adhesion molecule L1. The nucleotide sequence of the cDNA clones and the deduced primary amino acid sequence of the carboxy terminal portion of the human L1 are homologous to the corresponding sequences of mouse L1 and rat NILE glycoprotein, with an especially high sequence identity in the cytoplasmic regions of the proteins. There is also protein sequence homology with the cytoplasmic region of the Drosophila cell adhesion molecule, neuroglian. The conservation of the cytoplasmic domain argues for an important functional role for this portion of the molecule.
Qiu, Guo-Hua
2016-01-01
In this review, the protective function of the abundant non-coding DNA in the eukaryotic genome is discussed from the perspective of genome defense against exogenous nucleic acids. Peripheral non-coding DNA has been proposed to act as a bodyguard that protects the genome and the central protein-coding sequences from ionizing radiation-induced DNA damage. In the proposed mechanism of protection, the radicals generated by water radiolysis in the cytosol and IR energy are absorbed, blocked and/or reduced by peripheral heterochromatin; then, the DNA damage sites in the heterochromatin are removed and expelled from the nucleus to the cytoplasm through nuclear pore complexes, most likely through the formation of extrachromosomal circular DNA. To strengthen this hypothesis, this review summarizes the experimental evidence supporting the protective function of non-coding DNA against exogenous nucleic acids. Based on these data, I hypothesize herein about the presence of an additional line of defense formed by small RNAs in the cytosol in addition to their bodyguard protection mechanism in the nucleus. Therefore, exogenous nucleic acids may be initially inactivated in the cytosol by small RNAs generated from non-coding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. Exogenous nucleic acids may enter the nucleus, where some are absorbed and/or blocked by heterochromatin and others integrate into chromosomes. The integrated fragments and the sites of DNA damage are removed by repetitive non-coding DNA elements in the heterochromatin and excluded from the nucleus. Therefore, the normal eukaryotic genome and the central protein-coding sequences are triply protected by non-coding DNA against invasion by exogenous nucleic acids. This review provides evidence supporting the protective role of non-coding DNA in genome defense. Copyright © 2016 Elsevier B.V. All rights reserved.
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
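The mutual information function mentioned above measures how much knowing the nucleotide at one position tells you about the nucleotide k positions downstream. A minimal sketch follows, using the textbook definition I(k) = sum over a, b of p_ab(k) * log2(p_ab(k) / (p_a * p_b)) with probabilities estimated from pair counts. The function name and the plug-in estimator are assumptions of this example; the thesis may use a different estimator or bias correction.

```python
from collections import Counter
from math import log2

def mutual_information(sequence, k):
    """Estimate the mutual information (in bits) between symbols k apart."""
    pairs = [(sequence[i], sequence[i + k]) for i in range(len(sequence) - k)]
    n = len(pairs)
    if n == 0:
        raise ValueError("sequence must be longer than the distance k")
    joint = Counter(pairs)                # counts of (a, b) pairs at distance k
    first = Counter(a for a, _ in pairs)  # marginal counts of the first symbol
    second = Counter(b for _, b in pairs) # marginal counts of the second symbol
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        total += p_ab * log2(p_ab * n * n / (first[a] * second[b]))
    return total
```

Evaluating this for k = 1, 2, 3, ... on a coding region and looking for peaks at multiples of three is one way to visualize the period-three oscillation the abstract describes; on short sequences the plug-in estimate is biased upward, so comparisons should use sequences of similar length.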
Takai, T; Nishita, Y; Iguchi-Ariga, S M; Ariga, H
1994-01-01
We have previously reported the human cDNA encoding MSSP-1, a sequence-specific double- and single-stranded DNA binding protein [Negishi, Nishita, Saëgusa, Kakizaki, Galli, Kihara, Tamai, Miyajima, Iguchi-Ariga and Ariga (1994) Oncogene, 9, 1133-1143]. MSSP-1 binds to a DNA replication origin/transcriptional enhancer of the human c-myc gene and has turned out to be identical with Scr2, a human protein which complements the defect of cdc2 kinase in S.pombe [Kataoka and Nojima (1994) Nucleic Acid Res., 22, 2687-2693]. We have cloned the cDNA for MSSP-2, another member of the MSSP family of proteins. The MSSP-2 cDNA shares highly homologous sequences with MSSP-1 cDNA, except for the insertion of 48 bp coding 16 amino acids near the C-terminus. Like MSSP-1, MSSP-2 has RNP-1 consensus sequences. The results of the experiments using bacterially expressed MSSP-2, and its deletion mutants, as histidine fusion proteins suggested that the binding specificity of MSSP-2 to double- and single-stranded DNA is the same as that of MSSP-1, and that the RNP consensus sequences are required for the DNA binding of the protein. MSSP-2 stimulated the DNA replication of an SV40-derived plasmid containing the binding sequence for MSSP-1 or -2. MSSP-2 is hence suggested to play an important role in regulation of DNA replication. PMID:7838710
Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud
2000-01-01
Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
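The deviant code described above changes the meaning of two codons: UAG specifies leucine and UCA signals termination. The sketch below shows the effect on translation by deriving a modified codon table from the standard one. It applies only the two reassignments stated in the abstract; the names and the DNA-alphabet (T instead of U) convention are assumptions of this example.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Only the two changes reported in the abstract are applied here.
SCENEDESMUS_MT_CODE = dict(STANDARD_CODE)
SCENEDESMUS_MT_CODE["TAG"] = "L"  # UAG: stop in the standard code, leucine here
SCENEDESMUS_MT_CODE["TCA"] = "*"  # UCA: serine in the standard code, stop here

def translate(dna, code):
    """Translate in frame from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = code[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

The same reading frame therefore gives different products under the two tables: `translate("ATGTAGTCAGGG", STANDARD_CODE)` stops at TAG, whereas with `SCENEDESMUS_MT_CODE` translation reads through TAG as leucine and stops at TCA.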
Bhattacharya, D; Steinkötter, J; Melkonian, M
1993-12-01
Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata and S. similis and other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand with Domains 1+3 and Domains 2+4 clustering. The domains of centrins are also congruent with those of calmodulins demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror results of small subunit ribosomal RNA sequence analyses suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
The Coding of Biological Information: From Nucleotide Sequence to Protein Recognition
NASA Astrophysics Data System (ADS)
Štambuk, Nikola
The paper reviews the classic results of Swanson, Dayhoff, Grantham, Blalock and Root-Bernstein, which link genetic code nucleotide patterns to the protein structure, evolution and molecular recognition. Symbolic representation of the binary addresses defining particular nucleotide and amino acid properties is discussed, with consideration of: structure and metric of the code, direct correspondence between amino acid and nucleotide information, and molecular recognition of the interacting protein motifs coded by the complementary DNA and RNA strands.
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
NASA Astrophysics Data System (ADS)
Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.
1999-12-01
We have analysed protein coding and intergenic sequences in the Borrelia burgdorferi (the Lyme disease bacterium) genome using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the amino acid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characteristics of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and that now any recombination leading to inversion of a gene with respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin
2011-03-29
Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. 
The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of the sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16,257 bp long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA-Ser(UCN), tRNA-Gln, tRNA-Ala, tRNA-Val, tRNA-Asp), which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein-coding genes except nad3 and nad4, which use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
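The skew values quoted above follow from the base composition under the usual definitions, AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C). A short check, with the percentages copied from the abstract and a helper name chosen for this example:

```python
def skew(first, second):
    """Compositional skew of two base frequencies: (first - second) / (first + second)."""
    return (first - second) / (first + second)

# Heavy-strand composition reported in the abstract (percent).
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])  # (30.8 - 29.3) / 60.1
gc_skew = skew(composition["G"], composition["C"])  # (16.2 - 23.7) / 39.9

print(round(at_skew, 3), round(gc_skew, 3))  # prints 0.025 -0.188
```

The small positive AT skew and the clearly negative GC skew mean the heavy strand carries slightly more A than T and considerably more C than G.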
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
DNA sequence-dependent mechanics and protein-assisted bending in repressor-mediated loop formation
Boedicker, James Q.; Garcia, Hernan G.; Johnson, Stephanie; Phillips, Rob
2014-01-01
As the chief informational molecule of life, DNA is subject to extensive physical manipulations. The energy required to deform double-helical DNA depends on sequence, and this mechanical code of DNA influences gene regulation, such as through nucleosome positioning. Here we examine the sequence-dependent flexibility of DNA in bacterial transcription factor-mediated looping, a context for which the role of sequence remains poorly understood. Using a suite of synthetic constructs repressed by the Lac repressor and two well-known sequences that show large flexibility differences in vitro, we make precise statistical mechanical predictions as to how DNA sequence influences loop formation and test these predictions using in vivo transcription and in vitro single-molecule assays. Surprisingly, sequence-dependent flexibility does not affect in vivo gene regulation. By theoretically and experimentally quantifying the relative contributions of sequence and the DNA-bending protein HU to DNA mechanical properties, we reveal that bending by HU dominates DNA mechanics and masks intrinsic sequence-dependent flexibility. Such a quantitative understanding of how mechanical regulatory information is encoded in the genome will be a key step towards a predictive understanding of gene regulation at single-base pair resolution. PMID:24231252
Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Vesicular monoamine transporter-1 (VMAT-1) mRNA and immunoreactive proteins in mouse brain.
Ashe, Karen M; Chiu, Wan-Ling; Khalifa, Ahmed M; Nicolas, Antoine N; Brown, Bonnie L; De Martino, Randall R; Alexander, Clayton P; Waggener, Christopher T; Fischer-Stenger, Krista; Stewart, Jennifer K
2011-01-01
Vesicular monoamine transporter 1 (VMAT-1) mRNA and protein were examined (1) to determine whether adult mouse brain expresses full-length VMAT-1 mRNA that can be translated to functional transporter protein and (2) to compare immunoreactive VMAT-1 proteins in brain and adrenal. VMAT-1 mRNA was detected in mouse brain with RT-PCR. The cDNA was sequenced, cloned into an expression vector, and transfected into COS-1 cells, and cell protein was assayed for VMAT-1 activity. Immunoreactive proteins were examined on western blots probed with four different antibodies to VMAT-1. Sequencing confirmed identity of the entire coding sequences of VMAT-1 cDNA from mouse medulla oblongata/pons and adrenal to a GenBank reference sequence. Transfection of the brain cDNA into COS-1 cells resulted in transporter activity that was blocked by the VMAT inhibitor reserpine and a proton ionophore, but not by tetrabenazine, which has a high affinity for VMAT-2. Antibodies to either the C- or N-terminus of VMAT-1 detected two proteins (73 and 55 kD) in transfected COS-1 cells. The C-terminal antibodies detected both proteins in extracts of mouse medulla/pons, cortex, hypothalamus, and cerebellum but only the 73 kD protein and higher molecular weight immunoreactive proteins in mouse adrenal and rat PC12 cells, which are positive controls for rodent VMAT-1. These findings demonstrate that a functional VMAT-1 mRNA coding sequence is expressed in mouse brain and suggest that processing of VMAT-1 protein differs in mouse adrenal and brain.
qPMS9: An Efficient Algorithm for Quorum Planted Motif Search
NASA Astrophysics Data System (ADS)
Nicolae, Marius; Rajasekaran, Sanguthevar
2015-01-01
Discovering patterns in biological sequences is a crucial problem. For example, the identification of patterns in DNA sequences has resulted in the determination of open reading frames, identification of gene promoter elements, intron/exon splicing sites, and SH RNAs, location of RNA degradation signals, identification of alternative splicing sites, etc. In protein sequences, patterns have led to domain identification, location of protease cleavage sites, identification of signal peptides, protein interactions, determination of protein degradation elements, identification of protein trafficking elements, discovery of short functional motifs, etc. In this paper we focus on the identification of an important class of patterns, namely, motifs. We study the (l, d) motif search problem or Planted Motif Search (PMS). PMS receives as input n strings and two integers l and d. It returns all sequences M of length l that occur in each input string, where each occurrence differs from M in at most d positions. Another formulation is quorum PMS (qPMS), where the motif appears in at least q% of the strings. We introduce qPMS9, a parallel exact qPMS algorithm that offers significant runtime improvements on DNA and protein datasets. qPMS9 solves the challenging DNA (l, d)-instances (28, 12) and (30, 13). The source code is available at https://code.google.com/p/qpms9/.
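The (l, d) and quorum formulations stated in the abstract above can be illustrated with a brute-force sketch. This is not qPMS9 itself, whose pruning and parallelism are not described here; it simply enumerates every candidate of length l, so it is only usable for very small l. The function names and the choice to pass the quorum as a fraction rather than a percentage are assumptions made for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, s, d):
    """True if some window of s differs from motif in at most d positions."""
    l = len(motif)
    return any(hamming(motif, s[i:i + l]) <= d for i in range(len(s) - l + 1))


def quorum_pms(strings, l, d, q=1.0, alphabet="ACGT"):
    """Return all length-l motifs found (within d mismatches) in at least
    a fraction q of the input strings. q=1.0 gives plain PMS."""
    need = q * len(strings)
    found = []
    for cand in product(alphabet, repeat=l):  # exhaustive: |alphabet|**l candidates
        motif = "".join(cand)
        hits = sum(occurs_within(motif, s, d) for s in strings)
        if hits >= need:
            found.append(motif)
    return found
```

For example, `quorum_pms(["ACGTAC", "TTCGTA", "GGGCGT"], l=3, d=0)` asks for 3-mers present exactly in every string, while lowering `q` relaxes the requirement to a subset of the strings.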
Complete cDNA sequence and amino acid analysis of a bovine ribonuclease K6 gene.
Pietrowski, D; Förster, M
2000-01-01
The complete cDNA sequence of a ribonuclease k6 gene of Bos taurus has been determined. It codes for a protein with 154 amino acids and contains the invariant cysteine, histidine and lysine residues as well as the characteristic motifs specific to ribonuclease active sites. The deduced protein sequence is 27 residues longer than other known ribonucleases k6 and shows amino acid exchanges which could reflect a strain specificity or polymorphism within the bovine genome. Based on sequence similarity we have termed the identified gene bovine ribonuclease k6 b (brk6b).
Fortin, Connor H; Schulze, Katharina V; Babbitt, Gregory A
2015-01-01
It is now widely accepted that DNA sequences defining DNA-protein interactions functionally depend upon local biophysical features of the DNA backbone that are important in defining sites of binding interaction in the genome (e.g. DNA shape, charge and intrinsic dynamics). However, these physical features of the DNA polymer are not directly apparent when analyzing and viewing Shannon information content calculated at single nucleobases in a traditional sequence logo plot. Thus, sequence logo plots are severely limited in that they convey no explicit information regarding the structural dynamics of the DNA backbone, a feature often critical to binding specificity. We present TRX-LOGOS, an R software package and Perl wrapper code that interfaces the JASPAR database for computational regulatory genomics. TRX-LOGOS extends the traditional sequence logo plot to include Shannon information content calculated with regard to the dinucleotide-based BI-BII conformation shifts in phosphate linkages on the DNA backbone, thereby adding a visual measure of intrinsic DNA flexibility that can be critical for many DNA-protein interactions. TRX-LOGOS is available as an R graphics module offered at both SourceForge and as a download supplement at this journal. To demonstrate the general utility of TRX logo plots, we first calculated the information content for 416 Saccharomyces cerevisiae transcription factor binding sites functionally confirmed in the Yeastract database and matched to previously published yeast genomic alignments. We discovered that flanking regions contain significantly higher information content at phosphate linkages than can be observed at nucleobases. We also examined broader transcription factor classifications defined by the JASPAR database, and discovered that many general signatures of transcription factor binding are locally more information rich at the level of DNA backbone dynamics than nucleobase sequence. We used TRX-LOGOS in combination with MEGA 6.0 software for molecular evolutionary genetics analysis to visually compare the human Forkhead box/FOX protein evolution to its binding site evolution. We also compared the DNA binding signatures of human TP53 tumor suppressor determined by two different laboratory methods (SELEX and ChIP-seq). Further analysis of the entire yeast genome, center aligned at the start codon, also revealed a distinct sequence-independent 3 bp periodic pattern in information content, present only in coding regions, and perhaps indicative of the non-random organization of the genetic code. TRX-LOGOS is useful in any situation in which important information content in DNA can be better visualized at the positions of phosphate linkages (i.e. dinucleotides) where the dynamic properties of the DNA backbone function to facilitate DNA-protein interaction.
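The nucleobase-level quantity that a traditional sequence logo displays can be sketched as follows. This is only the classic per-position Shannon information content (maximum entropy minus observed entropy, in bits) and does not reproduce the dinucleotide BI-BII calculation of TRX-LOGOS. The function name is hypothetical, and the small-sample correction used by some logo tools is deliberately omitted.

```python
import math
from collections import Counter


def information_content(aligned_sites, alphabet="ACGT"):
    """Per-position information content (bits) of equal-length aligned sites.

    IC = log2(|alphabet|) - H, where H is the Shannon entropy of the
    observed base frequencies at that position. No small-sample correction.
    """
    length = len(aligned_sites[0])
    max_bits = math.log2(len(alphabet))
    profile = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        profile.append(max_bits - entropy)
    return profile
```

A fully conserved DNA position scores 2 bits and a position with all four bases equally frequent scores 0 bits.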
Wolffe, E J; Gause, W C; Pelfrey, C M; Holland, S M; Steinberg, A D; August, J T
1990-01-05
We describe the isolation and sequencing of a cDNA encoding mouse Pgp-1. An oligonucleotide probe corresponding to the NH2-terminal sequence of the purified protein was synthesized by the polymerase chain reaction and used to screen a mouse macrophage lambda gt11 library. A cDNA clone with an insert of 1.2 kilobases was selected and sequenced. In Northern blot analysis, only cells expressing Pgp-1 contained mRNA species that hybridized with this Pgp-1 cDNA. The nucleotide sequence of the cDNA has a single open reading frame that yields a protein-coding sequence of 1076 base pairs followed by a 132-base pair 3'-untranslated sequence that includes a putative polyadenylation signal but no poly(A) tail. The translated sequence comprises a 13-amino acid signal peptide followed by a polypeptide core of 345 residues corresponding to an Mr of 37,800. Portions of the deduced amino acid sequence were identical to those obtained by amino acid sequence analysis from the purified glycoprotein, confirming that the cDNA encodes Pgp-1. The predicted structure of Pgp-1 includes an NH2-terminal extracellular domain (residues 14-265), a transmembrane domain (residues 266-286), and a cytoplasmic tail (residues 287-358). Portions of the mouse Pgp-1 sequence are highly similar to that of the human CD44 cell surface glycoprotein implicated in cell adhesion. The protein also shows sequence similarity to the proteoglycan tandem repeat sequences found in cartilage link protein and cartilage proteoglycan core protein which are thought to be involved in binding to hyaluronic acid.
DNA-binding proteins from marine bacteria expand the known sequence diversity of TALE-like repeats.
de Lange, Orlando; Wolf, Christina; Thiel, Philipp; Krüger, Jens; Kleusch, Christian; Kohlbacher, Oliver; Lahaye, Thomas
2015-11-16
Transcription Activator-Like Effectors (TALEs) of Xanthomonas bacteria are programmable DNA binding proteins with unprecedented target specificity. Comparative studies into TALE repeat structure and function are hindered by the limited sequence variation among TALE repeats. More sequence-diverse TALE-like proteins are known from Ralstonia solanacearum (RipTALs) and Burkholderia rhizoxinica (Bats), but RipTAL and Bat repeats are conserved with those of TALEs around the DNA-binding residue. We study two novel marine-organism TALE-like proteins (MOrTL1 and MOrTL2), the first to date of non-terrestrial origin. We have assessed their DNA-binding properties and modelled repeat structures. We found that repeats from these proteins mediate sequence specific DNA binding conforming to the TALE code, despite low sequence similarity to TALE repeats, and with novel residues around the BSR. However, MOrTL1 repeats show greater sequence discriminating power than MOrTL2 repeats. Sequence alignments show that there are only three residues conserved between repeats of all TALE-like proteins including the two new additions. This conserved motif could prove useful as an identifier for future TALE-likes. Additionally, comparing MOrTL repeats with those of other TALE-likes suggests a common evolutionary origin for the TALEs, RipTALs and Bats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
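As a rough illustration of what a recurrence time statistic for a DNA sequence might look like, the sketch below records the distance between successive occurrences of each length-k word and then reports how often that distance is a multiple of three. The abstract does not define the authors' codon index, so `period3_fraction` is a hypothetical stand-in chosen for illustration, not their measure.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between successive occurrences of each k-mer."""
    last_seen = {}      # k-mer -> index of its most recent occurrence
    times = Counter()   # recurrence time -> how many times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times[i - last_seen[word]] += 1
        last_seen[word] = i
    return times


def period3_fraction(times):
    """Fraction of recurrence times divisible by 3 (illustrative statistic)."""
    total = sum(times.values())
    if total == 0:
        return 0.0
    return sum(n for t, n in times.items() if t % 3 == 0) / total
```

Peaks in the histogram at a given distance point to repeats with that spacing; a single pass over the sequence with a dictionary keeps the cost linear in sequence length.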
Artificial Intelligence, DNA Mimicry, and Human Health.
Stefano, George B; Kream, Richard M
2017-08-14
The molecular evolution of genomic DNA across diverse plant and animal phyla involved dynamic registrations of sequence modifications to maintain existential homeostasis to increasingly complex patterns of environmental stressors. As an essential corollary, driver effects of positive evolutionary pressure are hypothesized to effect concerted modifications of genomic DNA sequences to meet expanded platforms of regulatory controls for successful implementation of advanced physiological requirements. It is also clearly apparent that preservation of updated registries of advantageous modifications of genomic DNA sequences requires coordinate expansion of convergent cellular proofreading/error correction mechanisms that are encoded by reciprocally modified genomic DNA. Computational expansion of operationally defined DNA memory extends to coordinate modification of coding and previously under-emphasized noncoding regions that now appear to represent essential reservoirs of untapped genetic information amenable to evolutionary driven recruitment into the realm of biologically active domains. Additionally, expansion of DNA memory potential via chemical modification and activation of noncoding sequences is targeted to vertical augmentation and integration of an expanded cadre of transcriptional and epigenetic regulatory factors affecting linear coding of protein amino acid sequences within open reading frames.
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3' region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was sequenced and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete R. tetragona mt genome is 71.4%. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
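A figure such as the A + T content quoted above is a simple base-composition percentage. The sketch below shows one way to compute it; the function name is hypothetical, and the choice to ignore ambiguous symbols such as N in the denominator is an assumption, not something stated in the abstract.

```python
def at_content(seq):
    """Percentage of A and T among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "AT")
    acgt = sum(seq.count(base) for base in "ACGT")  # ambiguous symbols are skipped
    return 100.0 * at / acgt if acgt else 0.0
```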
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Selfish DNA in protein-coding genes of Rickettsia.
Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M
2000-10-13
Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.
Gomes, S L; Gober, J W; Shapiro, L
1990-01-01
Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. PMID:2345134
The complete DNA sequence of lymphocystis disease virus.
Tidona, C A; Darai, G
1997-04-14
Lymphocystis disease virus (LCDV) is the causative agent of lymphocystis disease, which has been reported to occur in over 100 different fish species worldwide. LCDV is a member of the family Iridoviridae and the type species of the genus Lymphocystivirus. The virions contain a single linear double-stranded DNA molecule, which is circularly permuted, terminally redundant, and heavily methylated at cytosines in CpG sequences. The complete nucleotide sequence of LCDV-1 (flounder isolate) was determined by automated cycle sequencing and primer walking. The genome of LCDV-1 is 102,653 bp in length and contains 195 open reading frames with coding capacities ranging from 40 to 1199 amino acids. Computer-assisted analyses of the deduced amino acid sequences led to the identification of several putative gene products with significant homologies to entries in protein data banks, such as the two major subunits of the viral DNA-dependent RNA polymerase, DNA polymerase, several protein kinases, two subunits of the ribonucleoside diphosphate reductase, DNA methyltransferase, the viral major capsid protein, insulin-like growth factor, and tumor necrosis factor receptor homolog.
Saavedra-Lira, E; Pérez-Montfort, R
1994-05-16
We isolated three overlapping clones from a DNA genomic library of Entamoeba histolytica strain HM1:IMSS, whose translated nucleotide (nt) sequence shows similarities of 51, 48 and 47% with the amino acid (aa) sequences reported for the pyruvate phosphate dikinases from Bacteroides symbiosus, maize and Flaveria trinervia, respectively. The reading frame determined codes for a protein of 886 aa.
... exons, the parts of DNA that code for proteins in the body. Researchers like this method because it is faster and cheaper. More still needs to be done before whole genome sequencing becomes a routine part of medical care. Many ...
Fey, G; Lewis, J B; Grodzicker, T; Bothwell, A
1979-01-01
The adenovirus type 2-simian virus 40 (SV40) hybrid virus Ad2+ND1 dp2 (E. Lukanidin, manuscript in preparation) specified two proteins (molecular weights, 24,000 and 23,000) that are, in part, products of an insertion of SV40 early DNA sequences. This was demonstrated by translation in vitro from viral mRNA that had been selected by hybridization to SV40 DNA. These two phosphorylated, nonvirion proteins were produced late in infection in amounts similar to adenovirus 2 structural proteins and were closely related to each other in tryptic peptide composition. The portion of SV40 DNA (map units 0.17 to 0.22 on the SV40 genome) coding for these proteins was joined to sequences coding for the amino-terminal part of the adenovirus type 2 structural protein IV (fiber). The Ad2+ND1 dp2 23,000- and 24,000-molecular-weight proteins were hybrid polypeptides, with about two-thirds of their tryptic peptides contributed by the fiber protein and the remainder contributed by SV40 T-antigen. They shared with T-antigen (molecular weight, 96,000) a carboxy-terminal proline-rich tryptic peptide. Together, the tryptic peptide composition of these proteins and the known SV40 DNA sequences suggested the reading frame for the translation of T-antigen. The carboxy terminus for T-antigen would then be located on the SV40 genome map next to the TAA terminator triplet at position 0.175, 910 bases away from the cleavage site of the restriction endonuclease EcoRI. Seven host range mutants from Ad2+ND1 dp2 were isolated that had lost the capacity to propagate on monkey cells. They did not induce detectable levels of the hybrid proteins. Three of these mutants had lost the SV40 DNA insertion that codes in part for these proteins. Thus, in analogy to the Ad2+ND1 30,000-molecular-weight protein, the presence of these proteins correlates with the presence of the helper function for adenovirus replication on monkey cells. PMID:225516
Fanning, T; Singer, M
1987-01-01
Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Human mRNA polyadenylate binding protein: evolutionary conservation of a nucleic acid binding motif.
Grange, T; de Sa, C M; Oddos, J; Pictet, R
1987-01-01
We have isolated a full-length cDNA coding for the human poly(A) binding protein. The cDNA-derived 73 kd basic translation product has the same Mr, isoelectric point and peptidic map as the poly(A) binding protein. DNA sequence analysis reveals a 70,244 dalton protein. The N terminal part, highly homologous to the yeast poly(A) binding protein, is sufficient for poly(A) binding activity. This domain consists of a four-fold repeated unit of approximately 80 amino acids present in other nucleic acid binding proteins. In the C terminal part there is, as in the yeast protein, a sequence of approximately 150 amino acids, rich in proline, alanine and glutamine which together account for 48% of the residues. A 2.9 kb mRNA corresponding to this cDNA has been detected in several vertebrate cell types and in Drosophila melanogaster at every developmental stage including oogenesis. PMID:2885805
Marck, C
1988-01-01
DNA Strider is a new integrated DNA and protein sequence analysis program written in C for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated, such as a graphic restriction map, a hydrophobicity profile and the CAI plot (codon adaptation index according to Sharp and Li). The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
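The abstract does not describe how the hexamer look-ahead algorithm works, so the sketch below is not that algorithm. It only illustrates the general idea of searching for many six-base recognition sites in a single pass, by looking up each hexamer window in a dictionary. The function name is hypothetical, the sequence is treated as linear, and every site is assumed to be an unambiguous hexamer.

```python
def find_sites(seq, enzymes):
    """Return {enzyme name: [1-based start positions]} for 6-bp sites in seq.

    enzymes maps an enzyme name to its 6-base recognition sequence.
    """
    seq = seq.upper()
    by_site = {}
    for name, site in enzymes.items():
        by_site.setdefault(site.upper(), []).append(name)  # isoschizomers share a site
    hits = {name: [] for name in enzymes}
    for i in range(len(seq) - 5):          # every hexamer window, scanned once
        for name in by_site.get(seq[i:i + 6], ()):
            hits[name].append(i + 1)
    return hits
```

For example, `find_sites(sequence, {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"})` scans the sequence once regardless of how many enzymes are in the library, since each window costs one dictionary lookup.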
Feng, X; Happ, G M
1996-11-14
The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
Crowley, T E; Bond, M W; Meyerowitz, E M
1983-01-01
The polytene chromosome puff at 68C on the Drosophila melanogaster third chromosome is thought from genetic experiments to contain the structural gene for one of the secreted salivary gland glue polypeptides, sgs-3. Previous work has demonstrated that the DNA included in this puff contains sequences that are transcribed to give three different polyadenylated RNAs that are abundant in third-larval-instar salivary glands. These have been called the group II, group III, and group IV RNAs. In the experiments reported here, we used the nucleotide sequence of the DNA coding for these RNAs to predict some of the physical and chemical properties expected of their protein products, including molecular weight, amino acid composition, and amino acid sequence. Salivary gland polypeptides with molecular weights similar to those expected for the 68C RNA translation products, and with the expected degree of incorporation of different radioactive amino acids, were purified. These proteins were shown by amino acid sequencing to correspond to the protein products of the 68C RNAs. It was further shown that each of these proteins is a part of the secreted salivary gland glue: the group IV RNA codes for the previously described sgs-3, whereas the group II and III RNAs code for the newly identified glue polypeptides sgs-8 and sgs-7. PMID:6406838
The Purine Bias of Coding Sequences is Determined by Physicochemical Constraints on Proteins.
Ponce de Leon, Miguel; de Miranda, Antonio Basilio; Alvarez-Valin, Fernando; Carels, Nicolas
2014-01-01
For this report, we analyzed protein secondary structures in relation to the statistics of three nucleotide codon positions. The purpose of this investigation was to find which properties of the ribosome, tRNA or protein level, could explain the purine bias (Rrr) as it is observed in coding DNA. We found that the Rrr pattern is the consequence of a regularity (the codon structure) resulting from physicochemical constraints on proteins and thermodynamic constraints on ribosomal machinery. The physicochemical constraints on proteins mainly come from the hydropathy and molecular weight (MW) of secondary structures as well as the energy cost of amino acid synthesis. These constraints appear through a network of statistical correlations, such as (i) the cost of amino acid synthesis, which is in favor of a higher level of guanine in the first codon position, (ii) the constructive contribution of hydropathy alternation in proteins, (iii) the spatial organization of secondary structure in proteins according to solvent accessibility, (iv) the spatial organization of secondary structure according to amino acid hydropathy, (v) the statistical correlation of MW with protein secondary structures and their overall hydropathy, (vi) the statistical correlation of thymine in the second codon position with hydropathy and the energy cost of amino acid synthesis, and (vii) the statistical correlation of adenine in the second codon position with amino acid complexity and the MW of secondary protein structures. Amino acid physicochemical properties and functional constraints on proteins constitute a code that is translated into a purine bias within the coding DNA via tRNAs. In that sense, the Rrr pattern within coding DNA is the effect of information transfer on nucleotide composition from protein to DNA by selection according to the codon positions. 
Thus, coding DNA structure and ribosomal machinery co-evolved to minimize the energy cost of protein coding given the functional constraints on proteins.
Gutiérrez, Pablo; Alzate, Juan; Yepes, Mauricio Salazar; Marín, Mauricio
2016-01-01
Colletotrichum lindemuthianum is the causal agent of anthracnose in common bean (Phaseolus vulgaris), one of the most limiting factors for this crop in South and Central America. In this work, the mitochondrial sequence of a Colombian isolate of C. lindemuthianum obtained from a common bean plant (var. Cargamanto) with anthracnose symptoms is presented. The mtDNA codes for 13 proteins of the respiratory chain, 1 ribosomal protein, 2 homing endonucleases, 2 ribosomal RNAs and 28 tRNAs. This is the first report of a complete mtDNA genome sequence from C. lindemuthianum.
What Information is Stored in DNA: Does it Contain Digital Error Correcting Codes?
NASA Astrophysics Data System (ADS)
Liebovitch, Larry
1998-03-01
The longest term correlations in living systems are the information stored in DNA which reflects the evolutionary history of an organism. The 4 bases (A,T,G,C) encode sequences of amino acids as well as locations of binding sites for proteins that regulate DNA. The fidelity of this important information is maintained by ANALOG error check mechanisms. When a single strand of DNA is replicated the complementary base is inserted in the new strand. Sometimes the wrong base is inserted that sticks out disrupting the phosphate backbone. The new base is not yet methylated, so repair enzymes, that slide along the DNA, can tear out the wrong base and replace it with the right one. The bases in DNA form a sequence of 4 different symbols and so the information is encoded in a DIGITAL form. All the digital codes in our society (ISBN book numbers, UPC product codes, bank account numbers, airline ticket numbers) use error checking code, where some digits are functions of other digits to maintain the fidelity of transmitted information. Does DNA also utilize a DIGITAL error checking code to maintain the fidelity of its information and increase the accuracy of replication? That is, are some bases in DNA functions of other bases upstream or downstream? This raises the interesting mathematical problem: How does one determine whether some symbols in a sequence of symbols are a function of other symbols? It also bears on the issue of determining algorithmic complexity: What is the function that generates the shortest algorithm for reproducing the symbol sequence? The error checking codes most used in our technology are linear block codes. We developed an efficient method to test for the presence of such codes in DNA. We coded the 4 bases as (0,1,2,3) and used Gaussian elimination, modified for modulus 4, to test if some bases are linear combinations of other bases. We used this method to analyze the base sequence in the genes from the lac operon and cytochrome C. 
We did not find evidence for such error correcting codes in these genes. However, we analyzed only a small amount of DNA and if digital error correcting schemes are present in DNA, they may be more subtle than such simple linear block codes. The basic issue we raise here is how information is stored in DNA and an appreciation that digital symbol sequences, such as DNA, admit of interesting schemes to store and protect the fidelity of their information content. Liebovitch, Tao, Todorov, Levine. 1996. Biophys. J. 71:1539-1544. Supported by NIH grant EY6234.
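The test described in the abstract above (bases coded as 0 to 3, then a search for bases that are linear combinations of other bases modulo 4) can be illustrated with a small sketch. This is not the authors' method: it replaces their modified Gaussian elimination with a brute-force search over coefficients, which is only practical for short block lengths. The base-to-digit mapping, the function names, and the choice to cut the sequence into consecutive fixed-length blocks are all assumptions made for illustration.

```python
from itertools import product

# Assumed mapping; the abstract only says the bases were coded as (0,1,2,3).
CODE = {"A": 0, "T": 1, "G": 2, "C": 3}

def to_blocks(seq, n):
    """Cut a DNA string into consecutive blocks of n digits (leftover bases dropped)."""
    digits = [CODE[b] for b in seq.upper()]
    usable = len(digits) - len(digits) % n
    return [digits[i:i + n] for i in range(0, usable, n)]

def find_parity_relations(seq, n):
    """Brute-force search for linear parity relations modulo 4.

    Returns (check_pos, {pos: coeff}) pairs such that, in every block, the digit
    at check_pos equals sum(coeff * digit[pos]) mod 4 over the other positions.
    """
    blocks = to_blocks(seq, n)
    if not blocks:
        return []
    hits = []
    for check in range(n):
        others = [j for j in range(n) if j != check]
        for coeffs in product(range(4), repeat=n - 1):
            if all(
                sum(c * blk[j] for c, j in zip(coeffs, others)) % 4 == blk[check]
                for blk in blocks
            ):
                hits.append((check, dict(zip(others, coeffs))))
    return hits

# Toy sequence built so that the third base of each triplet is the sum (mod 4)
# of the first two; real genes showed no such relation according to the abstract.
toy = "ATTGCTTTGCACGGAAGG"
relations = find_parity_relations(toy, 3)
```

In the toy sequence the relation "position 2 = position 0 + position 1 (mod 4)" is recovered, along with equivalent rearrangements of it. The search cost grows as 4^(n-1) per check position, which is why an elimination-based method, as in the abstract, would be needed for realistic block lengths.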
Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E
2016-01-04
Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity.
Scherer, N M; Basso, D M
2008-09-16
DNATagger is a web-based tool for coloring and editing DNA, RNA and protein sequences and alignments. It is dedicated to the visualization of protein coding sequences and also protein sequence alignments to facilitate the comprehension of evolutionary processes in sequence analysis. The distinctive feature of DNATagger is the use of codons as informative units for coloring DNA and RNA sequences. The codons are colored according to their corresponding amino acids. It is the first program that colors codons in DNA sequences without being affected by "out-of-frame" gaps of alignments. It can handle single gaps and gaps inside the triplets. The program also provides the possibility to edit the alignments and change color patterns and translation tables. DNATagger is a JavaScript application, following the W3C guidelines, designed to work on standards-compliant web browsers. It therefore requires no installation and is platform independent. The web-based DNATagger is available as free and open source software at http://www.inf.ufrgs.br/~dmbasso/dnatagger/.
Cloning and expression of a cDNA coding for catalase from zebrafish (Danio rerio).
Ken, C F; Lin, C T; Wu, J L; Shaw, J F
2000-06-01
A full-length complementary DNA (cDNA) clone encoding a catalase was amplified by the rapid amplification of cDNA ends-polymerase chain reaction (RACE-PCR) technique from zebrafish (Danio rerio) mRNA. Nucleotide sequence analysis of this cDNA clone revealed that it comprised a complete open reading frame coding for 526 amino acid residues and that it had a molecular mass of 59 654 Da. The deduced amino acid sequence showed high similarity with the sequences of catalase from swine (86.9%), mouse (85.8%), rat (85%), human (83.7%), fruit fly (75.6%), nematode (71.1%), and yeast (58.6%). The amino acid residues for secondary structures are apparently conserved as they are present in other mammal species. Furthermore, the coding region of zebrafish catalase was introduced into an expression vector, pET-20b(+), and transformed into Escherichia coli expression host BL21(DE3)pLysS. A 60-kDa active catalase protein was expressed and detected by Coomassie blue staining as well as activity staining on polyacrylamide gel following electrophoresis.
Structure-related statistical singularities along protein sequences: a correlation study.
Colafranceschi, Mauro; Colosimo, Alfredo; Zbilut, Joseph P; Uversky, Vladimir N; Giuliani, Alessandro
2005-01-01
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein-protein and protein-DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes
Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo
2015-01-01
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Generate Optimized Genetic Rhythm for Enzyme Expression in Non-native systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
2016-11-03
Most amino acids are represented by more than one codon, resulting in redundancy in the genetic code. Silent codon substitutions that do not alter the amino acid sequence still have an effect on protein expression. We have developed an algorithm, GoGREEN, to enhance the expression of foreign proteins in a host organism. GoGREEN selects codons according to frequency patterns seen in the gene of interest using the codon usage table from the host organism. GoGREEN is also designed to accommodate gaps in the sequence. This software takes for input (1) the aligned protein sequences for genes the user wishes to express, (2) the codon usage table for the host organism, and (3) the DNA sequence for the target protein found in the host organism. The program will select codons based on codon usage patterns for the target DNA sequence. The program will also select codons for “gaps” found in the aligned protein sequences using the codon usage table from the host organism.
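As a rough illustration of what selecting codons from a host codon usage table means, the sketch below back-translates a protein by always taking the host's most frequent codon for each residue. This is deliberately simpler than GoGREEN as described above, which follows frequency patterns seen in the gene of interest and also handles alignment gaps; neither feature is attempted here. The function name and the usage frequencies are hypothetical toy values, not data from any real organism.

```python
# Hypothetical, made-up usage frequencies for a few residues ("*" is stop).
# A real table would cover all 20 amino acids and come from the host organism.
HOST_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.47, "TTA": 0.14, "TTG": 0.13,
          "CTT": 0.12, "CTC": 0.10, "CTA": 0.04},
    "*": {"TAA": 0.61, "TGA": 0.30, "TAG": 0.09},
}

def back_translate(protein, usage):
    """Return a DNA string using the most frequent host codon for each residue."""
    codons = []
    for aa in protein:
        table = usage[aa]  # raises KeyError for residues missing from the table
        codons.append(max(table, key=table.get))
    return "".join(codons)

dna = back_translate("MKL*", HOST_USAGE)
```

With the toy table, "MKL*" becomes "ATGAAACTGTAA". Always choosing the single most frequent codon ignores the positional frequency patterns that the abstract says GoGREEN preserves, so this shows only the table lookup step.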
NASA Technical Reports Server (NTRS)
Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)
1993-01-01
Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. Location of 6 cysteine and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.
Numerical classification of coding sequences
NASA Technical Reports Server (NTRS)
Collins, D. W.; Liu, C. C.; Jukes, T. H.
1992-01-01
DNA sequences coding for protein may be represented by counts of nucleotides or codons. A complete reading frame may be abbreviated by its base count, e.g. A76C158G121T74, or with the corresponding codon table, e.g. (AAA)0(AAC)1(AAG)9 ... (TTT)0. We propose that these numerical designations be used to augment current methods of sequence annotation. Because base counts and codon tables do not require revision as knowledge of function evolves, they are well-suited to act as cross-references, for example to identify redundant GenBank entries. These descriptors may be compared, in place of DNA sequences, to extract homologous genes from large databases. This approach permits rapid searching with good selectivity.
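The two numerical designations described above are simple to compute. The following sketch builds a base-count string in the A76C158G121T74 style and a codon-table string in the (AAA)0(AAC)1... style; the function names are illustrative, and the ordering of bases and codons (alphabetical, A, C, G, T) is inferred from the examples in the abstract.

```python
from collections import Counter
from itertools import product

def base_count_signature(seq):
    """Abbreviate a reading frame by its base counts, e.g. 'A6C1G3T2'."""
    counts = Counter(seq.upper())
    return "".join(f"{base}{counts.get(base, 0)}" for base in "ACGT")

def codon_table_signature(seq):
    """List all 64 codons in alphabetical order with their counts in frame 1."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    all_codons = ("".join(c) for c in product("ACGT", repeat=3))
    return "".join(f"({codon}){counts.get(codon, 0)}" for codon in all_codons)

frame = "ATGAAAGGCTAA"  # toy reading frame: ATG AAA GGC TAA
base_sig = base_count_signature(frame)
codon_sig = codon_table_signature(frame)
```

Because both signatures are plain strings, two database entries can be compared for equality without aligning their sequences, which matches the cross-referencing use the abstract proposes.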
Albertini, A M; Caramori, T; Crabb, W D; Scoffone, F; Galizzi, A
1991-01-01
We cloned and sequenced 8.3 kb of Bacillus subtilis DNA corresponding to the flaA locus involved in flagellar biosynthesis, motility, and chemotaxis. The DNA sequence revealed the presence of 10 complete and 2 incomplete open reading frames. Comparison of the deduced amino acid sequences to data banks showed similarities of nine of the deduced products to a number of proteins of Escherichia coli and Salmonella typhimurium for which a role in flagellar functioning has been directly demonstrated. In particular, the sequence data suggest that the flaA operon codes for the M-ring protein, components of the motor switch, and the distal part of the basal-body rod. The gene order is remarkably similar to that described for region III of the enterobacterial flagellar regulon. One of the open reading frames was translated into a protein with 48% amino acid identity to S. typhimurium FliI and 29% identity to the beta subunit of E. coli ATP synthase. PMID:1828465
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine- and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.
Suyama, Mikita; Lathe, Warren C; Bork, Peer
2005-10-10
We have identified 141 novel palindromic repetitive elements in the genome of the euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3 kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on sequence similarity. The low sequence identity within each of the groups suggests a rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leong, JoAnn Ching
The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3' end. The entire cDNA clone contains 1609 nucleotides and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated that there were two major hydrophobic domains: one, at the N-terminus, delineating a signal peptide of 18 amino acids and the other, at the C-terminus, delineating the region of the transmembrane. Five possible sites of N-linked glycosylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies and VSV, there was significant homology at the amino acid level between all three rhabdovirus glycoproteins.
Biomimetic Artificial Epigenetic Code for Targeted Acetylation of Histones.
Taniguchi, Junichi; Feng, Yihong; Pandian, Ganesh N; Hashiya, Fumitaka; Hidaka, Takuya; Hashiya, Kaori; Park, Soyoung; Bando, Toshikazu; Ito, Shinji; Sugiyama, Hiroshi
2018-06-13
While the central role of locus-specific acetylation of histone proteins in eukaryotic gene expression is well established, the availability of designer tools to regulate acetylation at particular nucleosome sites remains limited. Here, we develop a unique strategy to introduce acetylation by constructing a bifunctional molecule designated Bi-PIP. Bi-PIP has a P300/CBP-selective bromodomain inhibitor (Bi) as a P300/CBP recruiter and a pyrrole-imidazole polyamide (PIP) as a sequence-selective DNA binder. Biochemical assays verified that Bi-PIPs recruit P300 to the nucleosomes having their target DNA sequences and extensively accelerate acetylation. Bi-PIPs also activated transcription of genes that have corresponding cognate DNA sequences inside living cells. Our results demonstrate that Bi-PIPs could act as a synthetic programmable histone code of acetylation, which emulates the bromodomain-mediated natural propagation system of histone acetylation to activate gene expression in a sequence-selective manner.
DNA barcode goes two-dimensions: DNA QR code web server.
Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interest, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.
1987-06-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from λgt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilkins, T.A.
1993-06-01
This study investigates the molecular events of vacuole ontogeny in rapidly elongated cotton plant cells. Within the DNA coding region, the cotton and carrot cDNA clones exhibit 82.2% nucleotide sequence homology; at the amino acid level cotton and carrot catalytic subunits exhibited 95.7% identity and 2.1% amino acid similarity. When aligned with the analogous sequences from yeast, the cotton protein shared only 60.5% amino acid identity and 12.7% similarity. 10 refs., 1 tab.
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Human somatostatin I: sequence of the cDNA.
Shen, L P; Pictet, R L; Rutter, W J
1982-01-01
RNA has been isolated from a human pancreatic somatostatinoma and used to prepare a cDNA library. After prescreening, clones containing somatostatin I sequences were identified by hybridization with an anglerfish somatostatin I-cloned cDNA probe. From the nucleotide sequence of two of these clones, we have deduced an essentially full-length mRNA sequence, including the preprosomatostatin coding region, 105 nucleotides from the 5' untranslated region and the complete 150-nucleotide 3' untranslated region. The coding region predicts a 116-amino acid precursor protein (Mr, 12.727) that contains somatostatin-14 and -28 at its COOH terminus. The predicted amino acid sequence of human somatostatin-28 is identical to that of somatostatin-28 isolated from the porcine and ovine species. A comparison of the amino acid sequences of human and anglerfish preprosomatostatin I indicated that the COOH-terminal region encoding somatostatin-14 and the adjacent 6 amino acids are highly conserved, whereas the remainder of the molecule, including the signal peptide region, is more divergent. However, many of the amino acid differences found in the pro region of the human and anglerfish proteins are conservative changes. This suggests that the propeptides have a similar secondary structure, which in turn may imply a biological function for this region of the molecule. PMID:6126875
Geranyl diphosphate synthase large subunit, and methods of use
Croteau, Rodney B.; Burke, Charles C.; Wildung, Mark R.
2001-10-16
A cDNA encoding geranyl diphosphate synthase large subunit from peppermint has been isolated and sequenced, and the corresponding amino acid sequence has been determined. Replicable recombinant cloning vehicles are provided which code for geranyl diphosphate synthase large subunit. In another aspect, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding geranyl diphosphate synthase large subunit. In yet another aspect, the present invention provides isolated, recombinant geranyl diphosphate synthase protein comprising an isolated, recombinant geranyl diphosphate synthase large subunit protein and an isolated, recombinant geranyl diphosphate synthase small subunit protein. Thus, systems and methods are provided for the recombinant expression of geranyl diphosphate synthase.
Su, Huei-Jiun; Hu, Jer-Ming
2012-01-01
Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 are less dramatic. Significant dS increases were detected in Balanophora PI, TM6, RPB2 and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. 
The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.
Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong
2016-05-01
Eastern keelback mullet (Liza affinis) inhabits inlet waters and estuaries of rivers. In this paper, we initially determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its order and numbers of genes are similar to most bony fishes.
Sikorav, J L; Duval, N; Anselmet, A; Bon, S; Krejci, E; Legay, C; Osterlund, M; Reimund, B; Massoulié, J
1988-01-01
In this paper, we show the existence of alternative splicing in the 3' region of the coding sequence of Torpedo acetylcholinesterase (AChE). We describe two cDNA structures which both diverge from the previously described coding sequence of the catalytic subunit of asymmetric (A) forms (Schumacher et al., 1986; Sikorav et al., 1987). They both contain a coding sequence followed by a non-coding sequence and a poly(A) stretch. Both of these structures were shown to exist in poly(A)+ RNAs, by S1 mapping experiments. The divergent region encoded by the first sequence corresponds to the precursor of the globular dimeric form (G2a), since it contains the expected C-terminal amino acids, Ala-Cys. These amino acids are followed by a 29 amino acid extension which contains a hydrophobic segment and must be replaced by a glycolipid in the mature protein. Analyses of intact G2a AChE showed that the common domain of the protein contains intersubunit disulphide bonds. The divergent region of the second type of cDNA consists of an adjacent genomic sequence, which is removed as an intron in A and Ga mRNAs, but may encode a distinct, less abundant catalytic subunit. The structures of the cDNA clones indicate that they are derived from minor mRNAs, shorter than the three major transcripts which have been described previously (14.5, 10.5 and 5.5 kb). Oligonucleotide probes specific for the asymmetric and globular terminal regions hybridize with the three major transcripts, indicating that their size is determined by 3'-untranslated regions which are not related to the differential splicing leading to A and Ga forms. PMID:3181125
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server
Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers, followed by primer-walking sequencing. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Bozzoni, I; Beccari, E; Luo, Z X; Amaldi, F
1981-01-01
Poly-A+ mRNA from Xenopus laevis oocytes, partially enriched for r-protein coding capacity, has been used as starting material for preparing a cDNA bank in plasmid pBR322. The clones containing sequences specific for r-proteins have been selected by translation of the complementary mRNAs. Clones for six different r-proteins have been identified and utilized as probes for studying their genomic organization. Two gene copies per haploid genome were found for r-proteins L1, L14 and S19, and four to five for proteins S1, S8 and L32. Moreover, a population polymorphism has been observed for the genomic regions containing sequences for r-proteins S1, S8 and L14. PMID:6112733
The TGA codons are present in the open reading frame of selenoprotein P cDNA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hill, K.E.; Lloyd, R.S.; Read, R.
1991-03-11
The TGA codon in DNA has been shown to direct incorporation of selenocysteine into protein. Several proteins from bacteria and animals contain selenocysteine in their primary structures. Each of the cDNA clones of these selenoproteins contains one TGA codon in the open reading frame which corresponds to the selenocysteine in the protein. A cDNA clone for selenoprotein P (SeP), obtained from a γZAP rat liver library, was sequenced by the dideoxy termination method. The correct reading frame was determined by comparison of the deduced amino acid sequence with the amino acid sequence of several peptides from SeP. Using SeP labelled with ⁷⁵Se in vivo, the selenocysteine content of the peptides was verified by the collection of carboxymethylated ⁷⁷Se-selenocysteine as it eluted from the amino acid analyzer and determination of the radioactivity contained in the collected samples. Ten TGA codons are present in the open reading frame of the cDNA. Peptide fragmentation studies and the deduced sequence indicate that selenium-rich regions are located close to the carboxy terminus. Nine of the 10 selenocysteines are located in the terminal 26% of the sequence with four in the terminal 15 amino acids. The deduced sequence codes for a protein of 385 amino acids. Cleavage of the signal peptide gives the mature protein with 366 amino acids and a calculated mol wt of 41,052 Da. Searches of PIR and SWISSPROT protein databases revealed no similarity with glutathione peroxidase or other selenoproteins.
Toubart, P; Desiderio, A; Salvi, G; Cervone, F; Daroda, L; De Lorenzo, G
1992-05-01
Polygalacturonase-inhibiting protein (PGIP) is a cell wall protein purified from hypocotyls of true bean (Phaseolus vulgaris L.). PGIP inhibits fungal endopolygalacturonases and is considered to be an important factor for plant resistance to phytopathogenic fungi (Albersheim and Anderson, 1971; Cervone et al., 1987). The amino acid sequences of the N-terminus and one internal tryptic peptide of the PGIP purified from P. vulgaris cv. Pinto were used to design redundant oligonucleotides that were successfully utilized as primers in a polymerase chain reaction (PCR) with total DNA of P. vulgaris as a template. A DNA band of 758 bp (a specific PCR amplification product of part of the gene coding for PGIP) was isolated and cloned. By using the 758-bp DNA as a hybridization probe, a lambda clone containing the PGIP gene was isolated from a genomic library of P. vulgaris cv. Saxa. The coding and immediate flanking regions of the PGIP gene, contained on a subcloned 3.3 kb SalI-SalI DNA fragment, were sequenced. A single, continuous ORF of 1026 nt (342 amino acids) was present in the genomic clone. The nucleotide and deduced amino acid sequences of the PGIP gene showed no significant similarity with any known databank sequence. Northern blotting analysis of poly(A)+ RNAs, isolated from various tissues of bean seedlings or from suspension-cultured bean cells, were also performed using the cloned PCR-generated DNA as a probe. A 1.2 kb transcript was detected in suspension-cultured cells and, to a lesser extent, in leaves, hypocotyls, and flowers.(ABSTRACT TRUNCATED AT 250 WORDS)
Busslinger, M; Portmann, R; Irminger, J C; Birnstiel, M L
1980-01-01
The DNA sequences of the entire structural H4, H3, H2A and H2B genes and of their 5' flanking regions have been determined in the histone DNA clone h19 of the sea urchin Psammechinus miliaris. In clone h19 the polarity of transcription and the relative arrangement of the histone genes is identical to that in clone h22 of the same species. The histone proteins encoded by h19 DNA differ in their primary structure from those encoded by clone h22 and have been compared to histone protein sequences of other sea urchin species as well as other eukaryotes. A comparative analysis of the 5' flanking DNA sequences of the structural histone genes in both clones revealed four ubiquitous sequence motifs; a pentameric element GATCC, followed at short distance by the Hogness box GTATAAATAG, a conserved sequence PyCATTCPu, in or near which the 5' ends of the mRNAs map in h22 DNA and lastly a sequence A, containing the initiation codon. These sequences are also found, sometimes in modified version, in front of other eukaryotic genes transcribed by polymerase II. When prelude sequences of isocoding histone genes in clone h19 and h22 are compared areas of homology are seen to extend beyond the ubiquitous sequence motifs towards the divergent AT-rich spacer and terminate between approximately 140 and 240 nucleotides away from the structural gene. These prelude regions contain quite large conservative sequence blocks which are specific for each type of histone genes. PMID:7443547
Miller, Andrew D
2015-02-01
A sense peptide can be defined as a peptide whose sequence is coded by the nucleotide sequence (read 5' → 3') of the sense (positive) strand of DNA. Conversely, an antisense (complementary) peptide is coded by the corresponding nucleotide sequence (read 5' → 3') of the antisense (negative) strand of DNA. Research has been accumulating steadily to suggest that sense peptides are capable of specific interactions with their corresponding antisense peptides. Unfortunately, although more and more examples of specific sense-antisense peptide interactions are emerging, the very idea of such interactions does not conform to standard biology dogma and so there remains a sizeable challenge to lift this concept from being perceived as a peripheral phenomenon, if not worse, into becoming part of the scientific mainstream. Specific interactions have now been exploited for the inhibition of a number of widely different protein-protein and protein-receptor interactions in vitro and in vivo. Further, antisense peptides have also been used to induce the production of antibodies targeted to specific receptors or else the production of anti-idiotypic antibodies targeted against auto-antibodies. Such illustrations of utility would seem to suggest that observed sense-antisense peptide interactions are not just the consequence of a sequence of coincidental 'lucky-hits'. Indeed, at the very least, one might conclude that sense-antisense peptide interactions represent a potentially new and different source of leads for drug discovery. But could there be more to come from studies in this area? Studies on the potential mechanism of sense-antisense peptide interactions suggest that interactions may be driven by amino acid residue interactions specified from the genetic code. 
If so, such specified amino acid residue interactions could form the basis for an even wider amino acid residue interaction code (proteomic code) that links gene sequences to actual protein structure and function, even entire genomes to entire proteomes. The possibility that such a proteomic code should exist is discussed. The potential implications for biology and pharmaceutical science, were such a code to exist, are also discussed.
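The definition of sense and antisense peptides given in this abstract can be illustrated with a minimal Python sketch. It assumes the standard genetic code, a sense strand whose length is a multiple of three, and translation of both strands from their own 5' end; the function names are hypothetical and are not taken from the paper.

```python
# Standard genetic code laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the antisense strand written 5' -> 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def sense_and_antisense_peptides(sense_strand):
    """Return (sense peptide, antisense peptide), each read 5' -> 3'."""
    sense_strand = sense_strand.upper()
    return translate(sense_strand), translate(reverse_complement(sense_strand))


print(sense_and_antisense_peptides("ATGGCCAAA"))  # ('MAK', 'FGH')
```

Because the antisense strand is read from its own 5' end, the first antisense residue pairs with the last sense codon, so the two peptides run antiparallel to one another.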
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.
Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew
2012-12-20
The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
Kikhno, Irina
2014-01-01
Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Molecular cloning of a cDNA coding for GTP cyclohydrolase I from Dictyostelium discoideum.
Witter, K; Cahill, D J; Werner, T; Ziegler, I; Rödl, W; Bacher, A; Gütlich, M
1996-01-01
The GTP cyclohydrolase I (GTP-CH) gene of the cellular slime mould Dictyostelium discoideum has been cloned and sequenced. The 855 bp cDNA of this gene contains the open reading frame (ORF) encoding 232 amino acids with a predicted molecular mass of approx. 26 kDa. Southern blot analysis indicated the presence of a single gene for GTP-CH in Dictyostelium. PCR amplification of the ORF from chromosomal DNA and sequencing showed the existence of a 101 bp intron in the GTP-CH gene of Dictyostelium discoideum. The amino acid sequence has 47% and 49% positional identity to those of the human and yeast enzymes respectively. Most of the sequence variation between species is located in the N-terminal part of the protein. The overall identity with the E. coli protein is markedly lower. The enzyme was expressed in E. coli and purified as a 68 kDa fusion protein with the maltose-binding protein of E. coli. GTP-CH of Dictyostelium is heat-stable and showed maximal activity at 60 degrees C. The Km value for GTP is 50 microM. PMID:8870645
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments
Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic
2001-01-01
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
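As a rough illustration of detrended fluctuation analysis, the sketch below computes a DFA scaling exponent for a DNA sequence in plain Python. The mapping of pyrimidines to +1 and purines to -1, the choice of box sizes, and all function names are assumptions made for the example; the abstract does not specify them. An exponent near 0.5 indicates no long-range correlation, and larger values indicate long-range correlation.

```python
import math


def dna_walk(seq):
    """Assumed mapping: pyrimidine (C, T) -> +1, anything else -> -1."""
    return [1 if base in "CT" else -1 for base in seq.upper()]


def _detrended_ss(window):
    """Sum of squared residuals after removing a least-squares line."""
    n = len(window)
    mean_x = (n - 1) / 2
    mean_y = sum(window) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(window))
    slope = sxy / sxx
    return sum((y - mean_y - slope * (x - mean_x)) ** 2
               for x, y in enumerate(window))


def dfa_fluctuation(steps, box):
    """Root-mean-square fluctuation F(n) for non-overlapping boxes of size n."""
    mean = sum(steps) / len(steps)
    profile, total = [], 0.0
    for step in steps:  # integrate the mean-subtracted walk
        total += step - mean
        profile.append(total)
    n_boxes = len(profile) // box
    ss = sum(_detrended_ss(profile[b * box:(b + 1) * box])
             for b in range(n_boxes))
    return math.sqrt(ss / (n_boxes * box))


def dfa_exponent(steps, boxes):
    """Slope of log F(n) against log n, fitted by least squares."""
    xs = [math.log(n) for n in boxes]
    ys = [math.log(dfa_fluctuation(steps, n)) for n in boxes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
```

A call such as `dfa_exponent(dna_walk(sequence), [8, 16, 32, 64, 128])` returns the estimated exponent; box sizes should be at least 3 and much smaller than the sequence length.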
Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael
2013-01-01
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
Prody, C A; Zevin-Sonkin, D; Gnatt, A; Goldberg, O; Soreq, H
1987-01-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase (BtChoEase; EC 3.1.1.8) and Torpedo electric organ "true" acetylcholinesterase (AcChoEase; EC 3.1.1.7). Using these probes, we isolated several cDNA clones from lambda gt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species. PMID:3035536
An insight into the sialome of the blood-sucking bug Triatoma infestans, a vector of Chagas' disease
Assumpção, Teresa C. F.; Francischetti, Ivo M. B.; Andersen, John F.; Schwarz, Alexandra; Santana, Jaime M.; Ribeiro, José M. C.
2008-01-01
Triatoma infestans is a hemipteran, vector of Chagas’ disease, that feeds exclusively on vertebrate blood in all life stages. Hematophagous insects’ salivary glands (SG) produce potent pharmacological compounds that counteract host hemostasis, including anti-clotting, anti-platelet, and vasodilatory molecules. To obtain further insight into the salivary biochemical and pharmacological complexity of this insect, a cDNA library from its salivary glands was randomly sequenced. Also, salivary proteins were submitted to two-dimensional gel (2D-gel) electrophoresis followed by MS analysis. We present the analysis of a set of 1,534 SG cDNA sequences, 645 of which coded for proteins of a putative secretory nature. Most salivary proteins described as lipocalins matched peptide sequences obtained from proteomic results. PMID:18207082
Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S
1999-01-01
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne, 1926) PMID:10471707
Gene Unprediction with Spurio: A tool to identify spurious protein sequences.
Höps, Wolfram; Jeffryes, Matt; Bateman, Alex
2018-01-01
We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
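The feature the abstract calls most informative, stop codons inside the presumed translation of homologous DNA, can be sketched in a few lines of Python. This is only an illustration of that single feature under the standard stop codons; it does not reproduce Spurio's tblastn search or its Gaussian process scoring, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stop_count(dna, frame=0):
    """Count stop codons among the complete codons read from offset `frame`."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return sum(codon in STOP_CODONS for codon in codons)


# Homologous region read in the frame of the query protein (toy example).
print(in_frame_stop_count("ATGTAAGGGTGA"))           # 2
print(in_frame_stop_count("ATGTAAGGGTGA", frame=1))  # 0
```

Under this reading, a region that aligns to a predicted protein yet contains several in-frame stops across many homologs would count as evidence against the prediction.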
The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome.
Kim, K S; Lee, S E; Jeong, H W; Ha, J H
1998-10-01
The complete nucleotide sequence of the mitochondrial genome of the domestic dog, Canis familiaris, was determined. The length of the sequence was 16,728 bp; however, the length was not absolute due to the variation (heteroplasmy) caused by differing numbers of the repetitive motif, 5'-GTACACGT(A/G)C-3', in the control region. The genome organization, gene contents, and codon usage conformed to those of other mammalian mitochondrial genomes. Although its features were unknown, the "CTAGA" duplication event which followed the translational stop codon of the COII gene was not observed in other mammalian mitochondrial genomes. In order to determine the possible differences between mtDNAs in carnivores, two rRNA and 13 protein-coding genes from the cat, dog, and seal were compared. The combined molecular differences, in two rRNA genes as well as in the inferred amino acid sequences of the mitochondrial 13 protein-coding genes, suggested that there is a closer relationship between the dog and the seal than there is between either of these species and the cat. Based on the molecular differences of the mtDNA, the evolutionary divergence between the cat, the dog, and the seal was dated to approximately 50 +/- 4 million years ago. The degree of difference between carnivore mtDNAs varied according to the individual protein-coding gene applied, showing that the evolutionary relationships of distantly related species should be presented in an extended study based on ample sequence data like complete mtDNA molecules. Copyright 1998 Academic Press.
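The length heteroplasmy described above comes from differing numbers of the motif 5'-GTACACGT(A/G)C-3'. A minimal Python sketch for locating tandem runs of that motif and counting copies per run follows; the function name and the toy input are hypothetical, and only the motif itself is taken from the abstract.

```python
import re

# Motif from the abstract; (A/G) becomes the character class [AG].
MOTIF = "GTACACGT[AG]C"
MOTIF_LENGTH = 10
RUN_PATTERN = re.compile(f"(?:{MOTIF})+")


def tandem_repeat_runs(seq):
    """Return (start index, copy number) for each uninterrupted run of the motif."""
    return [(m.start(), len(m.group()) // MOTIF_LENGTH)
            for m in RUN_PATTERN.finditer(seq.upper())]


toy = "AAAA" + "GTACACGTAC" + "GTACACGTGC" + "TT" + "GTACACGTGC"
print(tandem_repeat_runs(toy))  # [(4, 2), (26, 1)]
```

Comparing the copy number of the longest run across individual sequences would be one simple way to see the variation the authors describe.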
Angsuthanasombat, C; Chungjatupornchai, W; Kertbundit, S; Luxananil, P; Settasatian, C; Wilairat, P; Panyim, S
1987-07-01
Five recombinant E. coli clones exhibiting toxicity to Aedes aegypti larvae were obtained from a library of 800 clones containing XbaI DNA fragments of the 110-kb plasmid from B. thuringiensis var. israelensis. All five clones (pMU 14/258/303/388/679) had the same 3.8-kb insert and encoded a major protein of 130 kDa which was highly toxic to A. aegypti larvae. Three clones (pMU 258/303/388) transcribed the 130-kDa gene in the same direction as that of the lac Z promoter of the pUC12 vector, whereas the transcription of the other two (pMU 14/679) was in the opposite direction. A 1.9-kb fragment of the 3.8-kb insert coded for a protein of 65 kDa. Partial DNA sequence of the 3.8-kb insert, corresponding to the 5'-terminal of the 130-kDa gene, revealed a continuous reading frame, a Shine-Dalgarno sequence and a tentative 5'-regulatory region. These results demonstrated that the 3.8-kb insert is a minimal DNA fragment containing a regulatory region plus the coding sequence of the 130-kDa protein that is highly toxic to mosquito larvae.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
A DNA sequence can be defined as a succession of letters representing the order of nucleotides within DNA, built from the four DNA base codes adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of a sequence is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and have now become highly advanced, high-throughput technologies. DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing produces ambiguous results, making it difficult to determine whether a given code is A, T, G, or C. To solve this problem, in this study we introduce another representation of DNA codes, namely the Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probabilities that the bases A, T, G, C appear in Q, and PA + PT + PG + PC = 1. Using Quaternion representations, we are able to construct an improved scoring matrix for global sequence alignment by applying a dot product method. This scoring matrix produces better, higher-quality match and mismatch scores between two DNA base codes. In our implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave to analyze our target sequence, which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from GenBank, while the target DNA sequence was received from our collaborator's database. We found that the Quaternion representations improve the quality of the sequence alignment score, and we conclude that the target DNA sequence has maximum similarity with Streptococcus pneumoniae.
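The idea of scoring ambiguous bases through a dot product of probability vectors can be sketched as follows. The authors worked in Octave and do not give their scoring matrix in this abstract, so the Python code below is only an illustration: the score `2 * dot - 1` (which yields +1 for identical certain bases and -1 for different certain bases), the gap penalty of -1, and all names are assumptions.

```python
# One-hot vectors in the abstract's order (PA, PT, PG, PC).
CERTAIN = {
    "A": (1.0, 0.0, 0.0, 0.0),
    "T": (0.0, 1.0, 0.0, 0.0),
    "G": (0.0, 0.0, 1.0, 0.0),
    "C": (0.0, 0.0, 0.0, 1.0),
}


def encode(seq):
    """Turn an unambiguous DNA string into a list of probability vectors."""
    return [CERTAIN[base] for base in seq.upper()]


def pair_score(p, q):
    """Assumed scoring rule: rescale the dot product from [0, 1] to [-1, 1]."""
    return 2.0 * sum(a * b for a, b in zip(p, q)) - 1.0


def needleman_wunsch_score(xs, ys, gap=-1.0):
    """Optimal global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(ys) + 1)]
    for i, x in enumerate(xs, 1):
        cur = [i * gap]
        for j, y in enumerate(ys, 1):
            cur.append(max(prev[j - 1] + pair_score(x, y),  # align x with y
                           prev[j] + gap,                   # gap in ys
                           cur[j - 1] + gap))               # gap in xs
        prev = cur
    return prev[-1]


# Target read whose middle base is ambiguous: A or T with equal probability.
target = [CERTAIN["G"], (0.5, 0.5, 0.0, 0.0), CERTAIN["T"]]
print(needleman_wunsch_score(target, encode("GAT")))  # 2.0
```

With this rule an ambiguous position contributes a partial score instead of being forced into a full match or a full mismatch, which is the effect the abstract attributes to the Quaternion representation.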
dos Reis, Sávio Pinho; Tavares, Liliane de Souza Conceição; Costa, Carinne de Nazaré Monteiro; Brígida, Aílton Borges Santa; de Souza, Cláudia Regina Batista
2012-06-01
Cassava (Manihot esculenta Crantz) is one of the world's most important food crops. It is cultivated mainly in developing countries of the tropics, since its root is a major source of calories for low-income people due to its high productivity and resistance to many abiotic and biotic factors. A previous study identified a partial cDNA sequence coding for a putative RING zinc finger in cassava storage root. The RING zinc finger protein is a specialized type of zinc finger protein found in many organisms. Here, we isolated the full-length cDNA sequence coding for the M. esculenta RZF (MeRZF) protein by a combination of 5' and 3' RACE assays. BLAST analysis showed that its deduced amino acid sequence has a high level of similarity to plant proteins of the RZF family. The MeRZF protein contains a signature sequence motif for a RING zinc finger in its C-terminal region. In addition, this protein has a histidine residue at the fifth coordination site and thus likely belongs to the RING-H2 subgroup, as confirmed by our phylogenetic analysis. There is also a transmembrane domain in its N-terminal region. Finally, semi-quantitative RT-PCR assays showed that MeRZF expression is increased in detached leaves treated with sodium chloride. Here, we report the first evidence of a RING zinc finger gene of cassava with a potential role in the response to salt stress.
Cheng, Rubin; Zheng, Xiaodong; Ma, Yuanyuan; Li, Qi
2013-01-01
In the present study, we determined the complete mitochondrial DNA (mtDNA) sequences of two species of Cistopus, namely C. chinensis and C. taiwanicus, and conducted a comparative mt genome analysis across the class Cephalopoda. The mtDNA lengths of C. chinensis and C. taiwanicus are 15706 and 15793 nucleotides, with AT contents of 76.21% and 76.5%, respectively. The sequence identity of mtDNA between C. chinensis and C. taiwanicus was 88%, suggesting a close relationship. Compared with C. taiwanicus and other octopods, C. chinensis encoded two additional tRNA genes, showing a novel gene arrangement. In addition, an unusual 23 poly (A) signal structure is found in the ATP8 coding region of C. chinensis. The entire genome and each protein-coding gene of the two Cistopus species displayed notable levels of AT and GC skews. Based on sliding window analysis among Octopodiformes, ND1 and ND5 were considered to be more reliable molecular beacons. Phylogenetic analyses based on the 13 protein-coding genes revealed that C. chinensis and C. taiwanicus form a monophyletic group with high statistical support, consistent with previous studies based on morphological characteristics. Our results also indicated that the phylogenetic position of the genus Cistopus is closer to Octopus than to Amphioctopus and Callistoctopus. The complete mtDNA sequences of C. chinensis and C. taiwanicus represent the first whole mt genomes in the genus Cistopus. These novel mtDNA data will be important in refining the phylogenetic relationships within Octopodiformes and enriching the resource of markers for systematic, population genetic and evolutionary biological studies of Cephalopoda. PMID:24358345
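As a rough illustration of the composition statistics mentioned in this abstract, the sketch below computes AT content and the AT and GC skews using the conventional definitions, (A - T)/(A + T) and (G - C)/(G + C). The abstract does not spell out which formulas the authors used, so these definitions and the function name `composition_stats` are assumptions.

```python
from collections import Counter


def composition_stats(seq):
    """Return AT content (percent) and AT/GC skew for a nucleotide string.

    Characters other than A, T, G, C (for example N) are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }
```

Applying the same function to successive fixed-width slices of a genome would give a simple sliding-window profile, although the window and step sizes used in the study are not stated here.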
NASA Astrophysics Data System (ADS)
Haryati, Sri; Agung Prasetyo, Afiono; Sari, Yulia; Dharmawan, Ruben
2018-05-01
Toxoplasma gondii Surface Antigen 1 (SAG1) is often used as a diagnostic tool due to its immunodominance and specificity as an antigen. However, data on the Toxoplasma gondii SAG1 protein from Indonesian isolates are limited. To study the protein, genomic DNA was isolated from blood samples of a Javanese patient with acute toxoplasmosis. A complete coding sequence of Toxoplasma gondii SAG1 was cloned, inserted into an Escherichia coli expression plasmid, and sequenced. The sequencing results were subjected to bioinformatics analysis. The Toxoplasma gondii SAG1 complete coding sequences were successfully cloned. Physicochemical analysis revealed that the 336-aa SAG1 had a molecular weight of 34.7 kDa. The isoelectric point and aliphatic index were 8.4 and 78.4, respectively. The N-terminal methionine half-life in Escherichia coli was more than 10 hours. The antigenicity, secondary structure, and identification of the HLA binding motifs are also discussed. The results of this study contribute information about Toxoplasma gondii SAG1 and may benefit further work to develop diagnostic and therapeutic strategies against the parasite.
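For readers unfamiliar with the aliphatic index reported above, a minimal sketch of the commonly used definition follows: the mole percent of alanine, plus 2.9 times that of valine, plus 3.9 times the combined mole percent of isoleucine and leucine. The abstract does not name the tool or formula used, so treating this definition as the one applied is an assumption, and the test sequences are made up.

```python
def aliphatic_index(protein, a=2.9, b=3.9):
    """Aliphatic index from a one-letter amino acid string.

    a and b are the usual relative-volume weights for valine and for
    isoleucine/leucine; mole percent is 100 * count / length.
    """
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(residue):
        return 100.0 * seq.count(residue) / len(seq)

    return (mole_percent("A")
            + a * mole_percent("V")
            + b * (mole_percent("I") + mole_percent("L")))
```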
Seto, P; Hirayu, H; Magnusson, R P; Gestautas, J; Portmann, L; DeGroot, L J; Rapoport, B
1987-01-01
The thyroid microsomal antigen (MSA) in autoimmune thyroid disease is a protein of approximately 107 kD. We screened a human thyroid cDNA library constructed in the expression vector lambda gt11 with anti-107-kD monoclonal antibodies. Of five clones obtained, the recombinant beta-galactosidase fusion protein from one clone (PM-5) was confirmed to react with the monoclonal antiserum. The complementary DNA (cDNA) insert from PM-5 (0.8 kb) was used as a probe on Northern blot analysis to estimate the size of the mRNA coding for the MSA. The 2.9-kb messenger RNA (mRNA) species observed was the same size as that coding for human thyroid peroxidase (TPO). The probe did not bind to human liver mRNA, indicating the thyroid-specific nature of the PM-5-related mRNA. The nucleotide sequence of PM-5 (842 bp) was determined and consisted of a single open reading frame. Comparison of the nucleotide sequence of PM-5 with that presently available for pig TPO indicates 84% homology. In conclusion, a cDNA clone representing part of the microsomal antigen has been isolated. Sequence homology with porcine TPO, as well as identity in the size of the mRNA species for both the microsomal antigen and TPO, indicate that the microsomal antigen is, at least in part, TPO. PMID:3654979
Sequencing proteins with transverse ionic transport in nanochannels.
Boynton, Paul; Di Ventra, Massimiliano
2016-05-03
De novo protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that compose a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Here, we propose a new technique for de novo protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting perpendicular nanochannel. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for de novo protein sequencing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolan, Kyle T.; Duguid, Erica M.; He, Chuan
2011-11-17
SlyA is a master virulence regulator that controls the transcription of numerous genes in Salmonella enterica. We present here crystal structures of SlyA by itself and bound to a high-affinity DNA operator sequence in the slyA gene. SlyA interacts with DNA through direct recognition of a guanine base by Arg-65, as well as interactions between conserved Arg-86 and the minor groove and a large network of non-base-specific contacts with the sugar phosphate backbone. Our structures, together with an unpublished structure of SlyA bound to the small molecule effector salicylate (Protein Data Bank code 3DEU), reveal that, unlike many other MarR family proteins, SlyA dissociates from DNA without large conformational changes when bound to this effector. We propose that SlyA and other MarR global regulators rely more on indirect readout of DNA sequence to exert control over many genes, in contrast to proteins (such as OhrR) that recognize a single operator.
The eukaryotic genome is structurally and functionally more like a social insect colony than a book.
Qiu, Guo-Hua; Yang, Xiaoyan; Zheng, Xintian; Huang, Cuiqin
2017-11-01
Traditionally, the genome has been described as the 'book of life'. However, the metaphor of a book may not reflect the dynamic nature of the structure and function of the genome. In the eukaryotic genome, the number of centrally located protein-coding sequences is relatively constant across species, but the amount of noncoding DNA increases considerably with the increase of organismal evolutional complexity. Therefore, it has been hypothesized that the abundant peripheral noncoding DNA protects the genome and the central protein-coding sequences in the eukaryotic genome. Upon comparison with the habitation, sociality and defense mechanisms of a social insect colony, it is found that the genome is similar to a social insect colony in various aspects. A social insect colony may thus be a better metaphor than a book to describe the spatial organization and physical functions of the genome. The potential implications of the metaphor are also discussed.
Ju, Jung Won; Kim, Ho-Cheol; Shin, Hyun-Il; Kim, Yu Jung; Kim, Dong-Myung
2015-01-01
Progress towards genetic sequencing of human parasites has provided the groundwork for a post-genomic approach to develop novel antigens for the diagnosis and treatment of parasite infections. To fully utilize the genomic data, however, high-throughput methodologies are required for functional analysis of the proteins encoded in the genomic sequences. In this study, we investigated cell-free expression and in situ immobilization of parasite proteins as a novel platform for the discovery of antigenic proteins. PCR-amplified parasite DNA was immobilized on microbeads that were also functionalized to capture synthesized proteins. When the microbeads were incubated in a reaction mixture for cell-free synthesis, proteins expressed from the microbead-immobilized DNA were instantly immobilized on the same microbeads, providing a physical linkage between the genetic information and encoded proteins. This approach of in situ expression and isolation enables streamlined recovery and analysis of cell-free synthesized proteins and also allows facile identification of the genes coding antigenic proteins through direct PCR of the microbead-bound DNA. PMID:26599101
Bonen, Linda; Boer, Poppo H.; Gray, Michael W.
1984-01-01
We have determined the sequence of the wheat mitochondrial gene for cytochrome oxidase subunit II (COII) and find that its derived protein sequence differs from that of maize at only three amino acid positions. Unexpectedly, all three replacements are non-conservative ones. The wheat COII gene has a highly-conserved intron at the same position as in maize, but the wheat intron is 1.5 times longer because of an insert relative to its maize counterpart. Hybridization analysis of mitochondrial DNA from rye, pea, broad bean and cucumber indicates strong sequence conservation of COII coding sequences among all these higher plants. However, only rye and maize mitochondrial DNA show homology with wheat COII intron sequences and rye alone with intron-insert sequences. We find that a sequence identical to the region of the 5' exon corresponding to the transmembrane domain of the COII protein is present at a second genomic location in wheat mitochondria. These variations in COII gene structure and size, as well as the presence of repeated COII sequences, illustrate at the DNA sequence level, factors which contribute to higher plant mitochondrial DNA diversity and complexity. PMID:16453565
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest
2006-01-01
Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
Microbeads display of proteins using emulsion PCR and cell-free protein synthesis.
Gan, Rui; Yamanaka, Yumiko; Kojima, Takaaki; Nakano, Hideo
2008-01-01
We developed a method for coupling protein to its coding DNA on magnetic microbeads using emulsion PCR and cell-free protein synthesis in emulsion. A PCR mixture containing streptavidin-coated microbeads was compartmentalized by water-in-oil (w/o) emulsion with an estimated 0.5 template molecules per droplet. The template molecules were amplified and immobilized on beads via bead-linked reverse primers and biotinylated forward primers. After amplification, the templates were sequentially labeled with streptavidin and biotinylated anti-glutathione S-transferase (GST) antibody. The pool of beads was then subjected to cell-free protein synthesis compartmentalized in another w/o emulsion, in which templates were coupled to their encoded proteins. We mixed two types of DNA templates of Histidine6 tag (His6)-fused and FLAG tag-fused GST in a ratio of 1:1,000 (His6:FLAG) for use as a model DNA library. After incubation with fluorescein isothiocyanate (FITC)-labeled anti-His6 (C-term) antibody, the beads with the His6 gene were enriched 917-fold in a single-round screening by using flow cytometry. A library with a theoretical diversity of 10^6 was constructed by randomizing the middle four residues of the His6 tag. After a two-round screening, the randomized sequences had substantially converged to peptide-encoding sequences recognized by the anti-His6 antibody.
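To make the "0.5 template molecules per droplet" figure more concrete, the sketch below estimates how many droplets would be empty, singly occupied, or multiply occupied. It assumes templates are distributed among droplets according to a Poisson distribution, which is a common modelling choice but is not stated in the abstract; the function names are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k templates in a droplet with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def droplet_occupancy(lam):
    """Fractions of droplets by template count, under the Poisson assumption."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1.0 - empty - single,
        # Among droplets that contain any template, the share holding exactly one.
        "single_given_occupied": single / (1.0 - empty),
    }


occupancy = droplet_occupancy(0.5)
```

Under this assumption, a low mean occupancy keeps most template-containing droplets clonal, at the cost of leaving many droplets empty.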
The cDNA-derived amino acid sequence of hemoglobin II from Lucina pectinata.
Torres-Mercado, Elineth; Renta, Jessicca Y; Rodríguez, Yolanda; López-Garriga, Juan; Cadilla, Carmen L
2003-11-01
Hemoglobin II from the clam Lucina pectinata is an oxygen-reactive protein with a unique structural organization in the heme pocket involving residues Gln65 (E7), Tyr30 (B10), Phe44 (CD1), and Phe69 (E11). We employed the reverse transcriptase-polymerase chain reaction (RT-PCR) and methods to synthesize various cDNA(HbII). An initial 300-bp cDNA clone was amplified from total RNA by RT-PCR using degenerate oligonucleotides. Gene-specific primers derived from the HbII-partial cDNA sequence were used to obtain the 5' and 3' ends of the cDNA by RACE. The length of the HbII cDNA, estimated from overlapping clones, was approximately 2114 bases. Northern blot analysis revealed that the mRNA size of HbII agrees with the estimated size using cDNA data. The coding region of the full-length HbII cDNA codes for 151 amino acids. The calculated molecular weight of HbII, including the heme group and acetylated N-terminal residue, is 17,654.07 Da.
Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.
Kohany, Oleksiy; Gentles, Andrew J; Hankus, Lukasz; Jurka, Jerzy
2006-10-25
Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintenance of the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases. We describe the software tools RepbaseSubmitter and Censor, which are designed to facilitate updating and screening the content of Repbase. RepbaseSubmitter is a java-based interface for formatting and annotating Repbase entries. It eliminates many common formatting errors, and automates actions such as calculation of sequence lengths and composition, thus facilitating curation of Repbase sequences. In addition, it has several features for predicting protein coding regions in sequences; searching and including Pubmed references in Repbase entries; and searching the NCBI taxonomy database for correct inclusion of species information and taxonomic position. Censor is a tool to rapidly identify repetitive elements by comparison to known repeats. It uses WU-BLAST for speed and sensitivity, and can conduct DNA-DNA, DNA-protein, or translated DNA-translated DNA searches of genomic sequence. Defragmented output includes a map of repeats present in the query sequence, with the options to report masked query sequence(s), repeat sequences found in the query, and alignments. Censor and RepbaseSubmitter are available as both web-based services and downloadable versions. They can be found at http://www.girinst.org/repbase/submission.html (RepbaseSubmitter) and http://www.girinst.org/censor/index.php (Censor).
Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der
2017-03-01
We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which are separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of the whole cpDNA, and the GC content of the cpDNA was 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). There are eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) containing introns. A phylogenetic analysis of the 11 complete cpDNAs from Asteraceae showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential and important DNA molecular data for further phylogenetic and evolutionary analysis of Asteraceae.
Position specific variation in the rate of evolution in transcription factor binding sites
Moses, Alan M; Chiang, Derek Y; Kellis, Manolis; Lander, Eric S; Eisen, Michael B
2003-01-01
Background The binding sites of sequence specific transcription factors are an important and relatively well-understood class of functional non-coding DNAs. Although a wide variety of experimental and computational methods have been developed to characterize transcription factor binding sites, they remain difficult to identify. Comparison of non-coding DNA from related species has shown considerable promise in identifying these functional non-coding sequences, even though relatively little is known about their evolution. Results Here we analyse the genome sequences of the budding yeasts Saccharomyces cerevisiae, S. bayanus, S. paradoxus and S. mikatae to study the evolution of transcription factor binding sites. As expected, we find that both experimentally characterized and computationally predicted binding sites evolve slower than surrounding sequence, consistent with the hypothesis that they are under purifying selection. We also observe position-specific variation in the rate of evolution within binding sites. We find that the position-specific rate of evolution is positively correlated with degeneracy among binding sites within S. cerevisiae. We test theoretical predictions for the rate of evolution at positions where the base frequencies deviate from background due to purifying selection and find reasonable agreement with the observed rates of evolution. Finally, we show how the evolutionary characteristics of real binding motifs can be used to distinguish them from artefacts of computational motif finding algorithms. Conclusion As has been observed for protein sequences, the rate of evolution in transcription factor binding sites varies with position, suggesting that some regions are under stronger functional constraint than others. This variation likely reflects the varying importance of different positions in the formation of the protein-DNA complex. The characterization of the pattern of evolution in known binding sites will likely contribute to the effective use of comparative sequence data in the identification of transcription factor binding sites and is an important step toward understanding the evolution of functional non-coding DNA. PMID:12946282
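One simple way to quantify the per-position "degeneracy" mentioned in this abstract is the Shannon entropy of each column in a set of aligned binding sites. The sketch below uses that measure purely as an illustration; the abstract does not say which degeneracy measure the authors used, and the example sites are invented.

```python
import math
from collections import Counter


def column_entropy(sites):
    """Shannon entropy (bits) of each column in equal-length aligned sites.

    0 bits means the position is invariant; 2 bits is the maximum for DNA.
    """
    if not sites:
        raise ValueError("no sites given")
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must all have the same length")
    n = len(sites)
    entropies = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        entropies.append(h)
    return entropies


example_sites = ["ACGT", "ACGA", "ACCT", "ACTG"]  # hypothetical aligned sites
entropies = column_entropy(example_sites)
```

A per-position substitution rate estimated from the cross-species alignment could then be compared against these entropies, for example with a rank correlation.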
Garcia, J A; Harrich, D; Soultanakis, E; Wu, F; Mitsuyasu, R; Gaynor, R B
1989-01-01
The human immunodeficiency virus (HIV) type 1 LTR is regulated at the transcriptional level by both cellular and viral proteins. Using HeLa cell extracts, multiple regions of the HIV LTR were found to serve as binding sites for cellular proteins. An untranslated region binding protein UBP-1 has been purified and fractions containing this protein bind to both the TAR and TATA regions. To investigate the role of cellular proteins binding to both the TATA and TAR regions and their potential interaction with other HIV DNA binding proteins, oligonucleotide-directed mutagenesis of both these regions was performed followed by DNase I footprinting and transient expression assays. In the TATA region, two direct repeats TC/AAGC/AT/AGCTGC surround the TATA sequence. Mutagenesis of both of these direct repeats or of the TATA sequence interrupted binding over the TATA region on the coding strand, but only a mutation of the TATA sequence affected in vivo assays for tat-activation. In addition to TAR serving as the site of binding of cellular proteins, RNA transcribed from TAR is capable of forming a stable stem-loop structure. To determine the relative importance of DNA binding proteins as compared to secondary structure, oligonucleotide-directed mutations in the TAR region were studied. Local mutations that disrupted either the stem or loop structure were defective in gene expression. However, compensatory mutations which restored base pairing in the stem resulted in complete tat-activation. This indicated a significant role for the stem-loop structure in HIV gene expression. To determine the role of TAR binding proteins, mutations were constructed which extensively changed the primary structure of the TAR region, yet left stem base pairing, stem energy and the loop sequence intact. These mutations resulted in decreased protein binding to TAR DNA and defects in tat-activation, and revealed factor binding specifically to the loop DNA sequence. Further mutagenesis which inverted this stem and loop mutation relative to the HIV LTR mRNA start site resulted in even larger decreases in tat-activation. This suggests that multiple determinants, including protein binding, the loop sequence, and RNA or DNA secondary structure, are important in tat-activation and suggests that tat may interact with cellular proteins binding to DNA to increase HIV gene expression. PMID:2721501
NASA Astrophysics Data System (ADS)
Sun, S. M.; Slightom, J. L.; Hall, T. C.
1981-01-01
A plant gene coding for the major storage protein (phaseolin, G1-globulin) of the French bean was isolated from a genomic library constructed in the phage vector Charon 24A. Comparison of the nucleotide sequence of part of the gene with that of the cloned messenger RNA (cDNA) revealed the presence of three intervening sequences, all beginning with GT and ending with AG. The 5' and 3' boundaries of intervening sequences IVS-A (88 base pairs) and IVS-B (124 base pairs) are similar to those described for animal and viral genes, but the 3' boundary of IVS-C (129 base pairs) shows some differences. A sequence of 185 amino acids deduced from the cloned DNAs represents about 40% of a phaseolin polypeptide.
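The GT...AG boundary rule noted in this abstract is easy to express in code. The sketch below checks that each annotated intervening sequence begins with GT and ends with AG, then joins the remaining exon segments. The coordinate convention (0-based, half-open) and the toy sequence are assumptions for illustration and are not taken from the phaseolin gene.

```python
def follows_gt_ag_rule(intron):
    """True if the intron starts with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def splice(gene, introns):
    """Remove introns given as sorted, non-overlapping (start, end) pairs.

    Coordinates are 0-based and half-open. Raises ValueError if any
    annotated intron violates the GT...AG rule.
    """
    exons = []
    position = 0
    for start, end in introns:
        if not follows_gt_ag_rule(gene[start:end]):
            raise ValueError(f"intron at {start}-{end} does not follow GT...AG")
        exons.append(gene[position:start])
        position = end
    exons.append(gene[position:])
    return "".join(exons)


# Toy example: two 6-base exons around a 12-base intron (invented sequence).
toy_gene = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
spliced = splice(toy_gene, [(6, 18)])
```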
Zhang, Ai-bing; Feng, Jie; Ward, Robert D; Wan, Ping; Gao, Qiang; Wu, Jun; Zhao, Wei-zhong
2012-01-01
Species identification via DNA barcodes is contributing greatly to current bioinventory efforts. The initial, and widely accepted, proposal was to use the protein-coding cytochrome c oxidase subunit I (COI) region as the standard barcode for animals, but recently non-coding internal transcribed spacer (ITS) genes have been proposed as candidate barcodes for both animals and plants. However, achieving a robust alignment for non-coding regions can be problematic. Here we propose two new methods (DV-RBF and FJ-RBF) to address this issue for species assignment by both coding and non-coding sequences that take advantage of the power of machine learning and bioinformatics. We demonstrate the value of the new methods with four empirical datasets, two representing typical protein-coding COI barcode datasets (neotropical bats and marine fish) and two representing non-coding ITS barcodes (rust fungi and brown algae). Using two random sub-sampling approaches, we demonstrate that the new methods significantly outperformed existing Neighbor-joining (NJ) and Maximum likelihood (ML) methods for both coding and non-coding barcodes when there was complete species coverage in the reference dataset. The new methods also out-performed NJ and ML methods for non-coding sequences in circumstances of potentially incomplete species coverage, although then the NJ and ML methods performed slightly better than the new methods for protein-coding barcodes. A 100% success rate of species identification was achieved with the two new methods for 4,122 bat queries and 5,134 fish queries using COI barcodes, with 95% confidence intervals (CI) of 99.75-100%. The new methods also obtained a 96.29% success rate (95%CI: 91.62-98.40%) for 484 rust fungi queries and a 98.50% success rate (95%CI: 96.60-99.37%) for 1094 brown algae queries, both using ITS barcodes.
50 years of DNA ‘Breathing’: Reflections on Old and New Approaches
von Hippel, Peter H.; Johnson, Neil P.; Marcus, Andrew H.
2015-01-01
Summary The coding sequences for genes, and much other regulatory information involved in genome expression, are located ‘inside’ the DNA duplex. Thus the ‘macromolecular machines’ that read-out this information from the base sequence of the DNA must somehow access the DNA ‘interior’. Double-stranded (ds) DNA is a highly structured and cooperatively stabilized system at physiological temperatures, but is also only marginally stable and undergoes a cooperative ‘melting phase transition’ at temperatures not far above physiological. Furthermore, due to its length and heterogeneous sequence, with AT-rich segments being less stable than GC-rich segments, the DNA genome ‘melts’ in a multistate fashion. Therefore the DNA genome must also manifest thermally driven structural (‘breathing’) fluctuations at physiological temperatures that should reflect the heterogeneity of the dsDNA stability near the melting temperature. Thus many of the breathing fluctuations of dsDNA are likely also to be sequence dependent, and could well contain information that should be ‘readable’ and useable by regulatory proteins and protein complexes in site-specific binding reactions involving dsDNA ‘opening’. Our laboratory has been involved in studying the breathing fluctuations of duplex DNA for about 50 years. In this ‘Reflections’ article we present a relatively chronological overview of these studies, starting with the use of simple chemical probes (such as hydrogen exchange, formaldehyde and simple DNA ‘melting’ proteins) to examine the local stability of the dsDNA structure, and culminating in sophisticated spectroscopic approaches that can be used to monitor the breathing-dependent interactions of regulatory complexes with their duplex DNA targets in ‘real time’. PMID:23840028
Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita
2002-08-01
Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.
Martín, A C; López, R; García, P
1996-06-01
Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism, have been revealed. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. Mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA showing a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-1 and phi 29 in transcriptional organization. Considered together, the observations reported here provide new tangible evidence on phylogenetic relationships between B. subtilis and S. pneumoniae.
Fearnley, I M; Finel, M; Skehel, J M; Walker, J E
1991-01-01
The 39 kDa and 42 kDa subunits of NADH:ubiquinone oxidoreductase from bovine heart mitochondria are nuclear-coded components of the hydrophobic protein fraction of the enzyme. Their amino acid sequences have been deduced from the sequences of overlapping cDNA clones. These clones were amplified from total bovine heart cDNA by means of the polymerase chain reaction, with the use of complex mixtures of oligonucleotide primers based upon fragments of protein sequence determined at the N-terminals of the proteins and at internal sites. The protein sequences of the 39 kDa and 42 kDa subunits are 345 and 320 amino acid residues long respectively, and their calculated molecular masses are 39,115 Da and 36,693 Da. Both proteins are predominantly hydrophilic, but each contains one or two hydrophobic segments that could possibly be folded into transmembrane alpha-helices. The bovine 39 kDa protein sequence is related to that of a 40 kDa subunit from complex I from Neurospora crassa mitochondria; otherwise, it is not related significantly to any known sequence, including redox proteins and two polypeptides involved in import of proteins into mitochondria, known as the mitochondrial processing peptidase and the processing-enhancing protein. Therefore the functions of the 39 kDa and 42 kDa subunits of complex I are unknown. The mitochondrial gene product, ND4, a hydrophobic component of complex I with an apparent molecular mass of about 39 kDa, has been identified in preparations of the enzyme. This subunit stains faintly with Coomassie Blue dye, and in many gel systems it is not resolved from the nuclear-coded 36 kDa subunit. PMID:1832859
Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric
2017-02-01
Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Le Chevanton, L; Leblon, G
1989-04-15
We cloned the ura5 gene coding for the orotate phosphoribosyl transferase from the ascomycete Sordaria macrospora by heterologous probing of a Sordaria genomic DNA library with the corresponding Podospora anserina sequence. The Sordaria gene was expressed in an Escherichia coli pyrE mutant strain defective for the same enzyme, and expression was shown to be promoted by plasmid sequences. The nucleotide sequence of the 1246-bp DNA fragment encompassing the region of homology with the Podospora gene has been determined. This sequence contains an open reading frame of 699 nucleotides. The deduced amino acid sequence shows 72% similarity with the corresponding Podospora protein.
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Draft Genome Sequence of Mycobacterium boenickei CIP 107829.
Bouam, Amar; Robert, Catherine; Croce, Olivier; Levasseur, Anthony; Drancourt, Michel
2017-05-04
Mycobacterium boenickei is a rapidly growing mycobacterium isolated for the first time from a leg wound in the United States. Its 6,506,908-bp draft genome exhibits a 66.77% G+C content, 6,279 protein-coding genes, and 59 predicted RNA genes. In silico DNA-DNA hybridization confirms its assignment to the Mycobacterium fortuitum complex. Copyright © 2017 Bouam et al.
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.
Eernisse, D J
1992-04-01
DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data is compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
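The translation step described above (nucleic acid to protein, with alternate genetic codes and ambiguous symbols) can be illustrated with a minimal Python sketch. This is not the HyperCard implementation; the function names `build_codon_table` and `translate` are hypothetical, and the two codon tables are assumed to follow NCBI translation tables 1 (standard) and 2 (vertebrate mitochondrial).

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, listed in TCAG codon order
# (assumed to match NCBI translation table 1).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_codon_table(amino_acids=STANDARD_AA, overrides=None):
    """Map each of the 64 codons to a one-letter amino acid ('*' = stop)."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, amino_acids))
    table.update(overrides or {})  # alternate codes differ in a few codons only
    return table


def translate(dna, table):
    """Translate frame 1 of a DNA/RNA string; unknown or ambiguous codons give 'X'."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(table.get(dna[i:i + 3], "X"))
    return "".join(protein)


STANDARD = build_codon_table()
# Vertebrate mitochondrial code (assumed NCBI table 2): four codons reassigned.
VERT_MITO = build_codon_table(
    overrides={"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}
)
```

With these tables, the same sequence can read differently under the two codes: `TGA` is a stop in the standard code but tryptophan in the vertebrate mitochondrial code, which is why a tool handling mitochondrial and chloroplast genomes needs selectable code tables.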
Whitaker, Weston R; Lee, Hanson; Arkin, Adam P; Dueber, John E
2015-03-20
Genetic sequences ported into non-native hosts for synthetic biology applications can gain unexpected properties. In this study, we explored sequences functioning as ribosome binding sites (RBSs) within protein coding DNA sequences (CDSs) that cause internal translation, resulting in truncated proteins. Genome-wide prediction of bacterial RBSs, based on biophysical calculations employed by the RBS calculator, suggests a selection against internal RBSs within CDSs in Escherichia coli, but not those in Saccharomyces cerevisiae. Based on these calculations, silent mutations aimed at removing internal RBSs can effectively reduce truncation products from internal translation. However, a solution for complete elimination of internal translation initiation is not always feasible due to constraints of available coding sequences. Fluorescence assays and Western blot analysis showed that in genes with internal RBSs, increasing the strength of the intended upstream RBS had little influence on the internal translation strength. Another strategy to minimize truncated products from an internal RBS is to increase the relative strength of the upstream RBS with a concomitant reduction in promoter strength to achieve the same protein expression level. Unfortunately, lower transcription levels result in increased noise at the single cell level due to stochasticity in gene expression. At the low expression regimes desired for many synthetic biology applications, this problem becomes particularly pronounced. We found that balancing promoter strengths and upstream RBS strengths to intermediate levels can achieve the target protein concentration while avoiding both excessive noise and truncated protein.
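The idea of an internal RBS within a CDS can be sketched with a simple motif scan. This is a deliberately simplified stand-in, not the biophysical model of the RBS calculator used in the study: it only flags in-frame internal ATG codons that have a Shine-Dalgarno-like motif a short distance upstream. The function name, the motif `GGAGG`, and the 5 to 10 nt spacing window are all illustrative assumptions.

```python
def internal_rbs_candidates(cds, motif="GGAGG", min_spacing=5, max_spacing=10):
    """Return (atg_position, motif_position) pairs for in-frame internal ATGs
    preceded by `motif` with a gap of min_spacing..max_spacing nucleotides.

    Positions are 0-based offsets into `cds`. Heuristic only: real RBS
    strength depends on mRNA folding and 16S rRNA binding energies.
    """
    cds = cds.upper()
    hits = []
    # Step by codons, skipping the annotated start codon at position 0.
    for start in range(3, len(cds) - 2, 3):
        if cds[start:start + 3] != "ATG":
            continue
        window_end = start - min_spacing
        if window_end <= 0:
            continue  # not enough upstream sequence for a motif
        window_start = max(0, start - max_spacing - len(motif))
        pos = cds[window_start:window_end].find(motif)
        if pos != -1:
            hits.append((start, window_start + pos))
    return hits
```

A candidate flagged this way would, in the spirit of the abstract, be a target for silent mutations that disrupt the motif while preserving the encoded amino acids.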
Phylogenetic Network for European mtDNA
Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari
2001-01-01
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
Structural Relationships Between Minor and Major Proteins of Hepatitis B Surface Antigen
Stibbe, Werner; Gerlich, Wolfram H.
1983-01-01
The minor glycoproteins from hepatitis B surface antigen, GP33 and GP36, contain at their carboxy-terminal part the sequence of the major protein P24. They have 55 additional amino acids at the amino-terminal part which are coded by the pre-S region of the viral DNA. PMID:6842680
DOE Office of Scientific and Technical Information (OSTI.GOV)
Claffey, K.P.; Herrera, V.L.; Brecher, P.
1987-12-01
A fatty acid binding protein (FABP) has been identified and characterized in rat heart, but the function and regulation of this protein are unclear. In this study the cDNA for rat heart FABP was cloned from a lambda gt11 library. Sequencing of the cDNA showed an open reading frame coding for a protein with 133 amino acids and a calculated size of 14,776 daltons. Several differences were found between the sequence determined from the cDNA and that reported previously by protein sequencing techniques. Northern blot analysis using rat heart FABP cDNA as a probe established the presence of an abundant mRNA in rat heart about 0.85 kilobases in length. This mRNA was detected, but was not abundant, in fetal heart tissue. Tissue distribution studies showed a similar mRNA species in red, but not white, skeletal muscle. In general, the mRNA tissue distribution was similar to that of the protein detected by Western immunoblot analysis, suggesting that heart FABP expression may be regulated at the transcriptional level. S1 nuclease mapping studies confirmed that the mRNA hybridized to rat heart FABP cDNA was identical in heart and red skeletal muscle throughout the entire open reading frame. The structural differences between heart FABP and other members of this multigene family may be related to the functional requirements of oxidative muscle for fatty acids as a fuel source.
2018-01-01
The complete genome sequence of Bacillus cereus strain TG1-6, which is a highly salt-tolerant rhizobacterium that enhances plant tolerance to drought stress, is reported here. The sequencing process was performed based on a combination of pyrosequencing and single-molecule sequencing. The complete genome is estimated to be approximately 5.42 Mb, containing a total of 5,610 predicted protein-coding DNA sequences (CDSs). PMID:29748401
Vílchez, Juan Ignacio; Tang, Qiming; Kaushal, Richa; Wang, Wei; Lv, Suhui; He, Danxia; Chu, Zhaoqing; Zhang, Heng; Liu, Renyi; Zhang, Huiming
2018-06-21
Here, we report the complete genome sequence for Bacillus megaterium strain YC4-R4, a highly salt-tolerant rhizobacterium that promotes growth in plants. The sequencing process was performed by combining pyrosequencing and single-molecule sequencing techniques. The complete genome is estimated to be approximately 5.44 Mb, containing a total of 5,673 predicted protein-coding DNA sequences (CDSs). Copyright © 2018 Vílchez et al.
Conserved Curvature of RNA Polymerase I Core Promoter Beyond rRNA Genes: The Case of the Tritryps
Smircich, Pablo; Duhagon, María Ana; Garat, Beatriz
2015-01-01
In trypanosomatids, the RNA polymerase I (RNAPI)-dependent promoters controlling the ribosomal RNA (rRNA) genes have been well identified. Although the RNAPI transcription machinery recognizes the DNA conformation instead of the DNA sequence of promoters, no conformational study has been reported for these promoters. Here we present the in silico analysis of the intrinsic DNA curvature of the rRNA gene core promoters in Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major. We found that, in spite of the absence of sequence conservation, these promoters hold conformational properties similar to other eukaryotic rRNA promoters. Our results also indicated that the intrinsic DNA curvature pattern is conserved within the Leishmania genus and also among strains of T. cruzi and T. brucei. Furthermore, we analyzed the impact of point mutations on the intrinsic curvature and their impact on the promoter activity. Furthermore, we found that the core promoters of protein-coding genes transcribed by RNAPI in T. brucei show the same conserved conformational characteristics. Overall, our results indicate that DNA intrinsic curvature of the rRNA gene core promoters is conserved in these ancient eukaryotes and such conserved curvature might be a requirement of RNAPI machinery for transcription of not only rRNA genes but also protein-coding genes. PMID:26718450
Tappaz, M; Bitoun, M; Reymond, I; Sergeant, A
1999-09-01
Cysteine sulfinate decarboxylase (CSD) is considered as the rate-limiting enzyme in the biosynthesis of taurine, a possible osmoregulator in brain. Through cloning and sequencing of RT-PCR and RACE-PCR products of rat brain mRNAs, a 2,396-bp cDNA sequence was obtained encoding a protein of 493 amino acids (calculated molecular mass, 55.2 kDa). The corresponding fusion protein showed a substrate specificity similar to that of the endogenous enzyme. The sequence of the encoded protein is identical to that encoded by liver CSD cDNA. Among other characterized amino acid decarboxylases, CSD shows the highest homology (54%) with either isoform of glutamic acid decarboxylase (GAD65 and GAD67). A single mRNA band, approximately 2.5 kb, was detected by northern blot in RNA extracts of brain, liver, and kidney. However, brain and liver CSD cDNA sequences differed in the 5' untranslated region. This indicates two forms of CSD mRNA. Analysis of PCR-amplified products of genomic DNA suggests that the brain form results from the use of a 3' alternative internal splicing site within an exon specifically found in liver CSD mRNA. Through selective RT-PCR the brain form was detected in brain only, whereas the liver form was found in liver and kidney. These results indicate a tissue-specific regulation of CSD genomic expression.
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S
1999-03-04
We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
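The base-composition percentages quoted for the 906-bp contig can be re-derived from the reported nucleotide counts. The short Python sketch below only uses the four counts given in the abstract; the variable names are illustrative.

```python
# Nucleotide counts reported for the pSS(R)2 contig.
counts = {"A": 340, "C": 187, "G": 173, "T": 206}

total = sum(counts.values())  # should equal the stated contig length, 906 bp
percent = {base: round(100 * n / total, 2) for base, n in counts.items()}
at_content = round(100 * (counts["A"] + counts["T"]) / total, 2)

print(total)       # 906
print(percent)     # {'A': 37.53, 'C': 20.64, 'G': 19.09, 'T': 22.74}
print(at_content)  # 60.26
```

The counts sum to 906 and reproduce the stated percentages, and the combined A+T share of about 60% is consistent with the description of the contig as AT rich.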
Murray, R; Pederson, K; Prosser, H; Muller, D; Hutchison, C A; Frelinger, J A
1988-01-01
We have used random oligonucleotide mutagenesis (or saturation mutagenesis) to create a library of point mutations in the alpha 1 protein domain of a Major Histocompatibility Complex (MHC) molecule. This protein domain is critical for T cell and B cell recognition. We altered the MHC class I H-2DP gene sequence such that synthetic mutant alpha 1 exons (270 bp of coding sequence), which contain mutations identified by sequence analysis, can replace the wild type alpha 1 exon. The synthetic exons were constructed from twelve overlapping oligonucleotides which contained an average of 1.3 random point mutations per intact exon. DNA sequence analysis of mutant alpha 1 exons has shown a point mutant distribution that fits a Poisson distribution, and thus emphasizes the utility of this mutagenesis technique to "scan" a large protein sequence for important mutations. We report our use of saturation mutagenesis to scan an entire exon of the H-2DP gene, a cassette strategy to replace the wild type alpha 1 exon with individual mutant alpha 1 exons, and analysis of mutant molecules expressed on the surface of transfected mouse L cells. PMID:2903482
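To make the Poisson statement concrete, the following Python sketch computes the expected fraction of synthetic exons carrying k point mutations when the mean is 1.3 per exon, as reported. It assumes an ideal Poisson process; the abstract states only that the observed distribution fits one, and the function name is illustrative.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson law with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


MEAN_MUTATIONS = 1.3  # average random point mutations per intact exon (from the abstract)

for k in range(5):
    print(k, round(poisson_pmf(k, MEAN_MUTATIONS), 3))
```

Under this assumption roughly 27% of exons would carry no mutation and roughly 35% exactly one, so a substantial share of clones are single mutants that can be interpreted directly when scanning the exon.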
USDA-ARS's Scientific Manuscript database
In this paper, we report the full-length coding sequence of bovine ATGL cDNA and analyze its expression in bovine tissues. Similar to human, mouse, and pig ATGL sequences, bovine ATGL has a highly conserved patatin domain that is necessary for lipolytic function in mice and humans. Thi...
Jiang, Jiming
2015-04-01
Sequencing of complete plant genomes has become increasingly more routine since the advent of the next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for the identification and characterization of CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identify and annotate the 'dark matter' in sequenced plant genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
The complete chloroplast genome of Aconitum chiisanense Nakai (Ranunculaceae).
Lim, Chae Eun; Kim, Goon-Bo; Baek, Seunghoon; Han, Su-Min; Yu, Hee-Ju; Mun, Jeong-Hwan
2017-01-01
We determined the complete chloroplast DNA sequence of Aconitum chiisanense Nakai, a rare Aconitum species endemic to Korea. The chloroplast genome is 155 934 bp in length and contains 4 rRNA, 30 tRNA, and 78 protein-coding genes. Phylogenetic analysis revealed that the chloroplast genome of A. chiisanense is closely related to that of A. barbatum var. puberulum. Sequence comparison with other Ranunculaceae chloroplasts identified a unique deletion in the rps16 gene of A. chiisanense chloroplast DNA that can serve as a molecular marker for species identification.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-02-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
The complete mitochondrial genome of Conus tulipa (Neogastropoda: Conidae).
Chen, Po-Wei; Hsiao, Sheng-Tai; Huang, Chih-Wei; Chen, Kao-Sung; Tseng, Chen-Te; Wu, Wen-Lung; Hwang, Deng-Fwu
2016-07-01
The complete mitogenome sequence of the cone snail Conus tulipa (Linnaeus, 1758) was determined by next-generation sequencing. The assembled mitogenome is 16,599 bp in length, including 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of C. tulipa is 28.7% A, 15.2% C, 18.4% G and 37.7% T. It shows 81.1% identity to the cone snail C. consors, 78.5% to C. borgesi and 77.5% to C. textile. Using the 13 protein-coding genes and 2 ribosomal RNA genes of C. tulipa from this study, together with those of 18 other closely related species, we constructed a species phylogenetic tree to verify the accuracy and utility of the newly determined mitogenome sequence. The complete mitogenome of C. tulipa provides essential DNA molecular data for further phylogeographic and evolutionary analysis of cone snails.
Characterization of an Equine α-S2-Casein Variant Due to a 1.3 kb Deletion Spanning Two Coding Exons
Brinkmann, Julia; Koudelka, Tomas; Keppler, Julia K.; Tholey, Andreas; Schwarz, Karin; Thaller, Georg; Tetens, Jens
2015-01-01
The production and consumption of mare’s milk in Europe has gained importance, mainly based on positive health effects and a lower allergenic potential as compared to cows’ milk. The allergenicity of milk is to a certain extent affected by different genetic variants. In classical dairy species, much research has been conducted into the genetic variability of milk proteins, but the knowledge in horses is scarce. Here, we characterize two major forms of equine αS2-casein arising from a genomic 1.3 kb in-frame deletion involving two coding exons, one of which represents an equid-specific duplication. Findings at the DNA level have been verified by cDNA sequencing from horse milk of mares with different genotypes. At the protein level, we were able to show by SDS-PAGE and in-gel digestion with subsequent LC-MS analysis that both proteins are actually expressed. The comparison with published sequences of other equids revealed that the deletion has probably occurred before the ancestor of present-day asses and zebras diverged from the horse lineage. PMID:26444874
Mutations Affecting Expression of the rosy Locus in Drosophila melanogaster
Lee, Chong Sung; Curtis, Daniel; McCarron, Margaret; Love, Carol; Gray, Mark; Bender, Welcome; Chovnick, Arthur
1987-01-01
The rosy locus in Drosophila melanogaster codes for the enzyme xanthine dehydrogenase (XDH). Previous studies defined a "control element" near the 5' end of the gene, where variant sites affected the amount of rosy mRNA and protein produced. We have determined the DNA sequence of this region from both genomic and cDNA clones, and from the ry+10 underproducer strain. This variant strain had many sequence differences, so that the site of the regulatory change could not be fixed. A mutagenesis was also undertaken to isolate new regulatory mutations. We induced 376 new mutations with 1-ethyl-1-nitrosourea (ENU) and screened them to isolate those that reduced the amount of XDH protein produced, but did not change the properties of the enzyme. Genetic mapping was used to find mutations located near the 5' end of the gene. DNA from each of seven mutants was cloned and sequenced through the 5' region. Mutant base changes were identified in all seven; they appear to affect splicing and translation of the rosy mRNA. In a related study (T. P. Keith et al. 1987), the genomic and cDNA sequences are extended through the 3' end of the gene; the combined sequences define the processing pattern of the rosy transcript and predict the amino acid sequence of XDH. PMID:3036645
Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang
2012-05-01
The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and to analyze its subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) of the Xuhuai goat H-FABP gene was 402 bp long, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarities were 89%, 76%, 85%, 84%, 93%, 91%, 70%, respectively. For the corresponding amino acid sequences, the similarities were 90%, 79%, 88%, 97%, 95%, 94%, 72%, respectively. No signal peptide region was found in the H-FABP protein, suggesting that it might be a nonsecreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
Structural modeling and molecular simulation analysis of HvAP2/EREBP from barley.
Pandey, Bharati; Sharma, Pradeep; Tyagi, Chetna; Goyal, Sukriti; Grover, Abhinav; Sharma, Indu
2016-06-01
AP2/ERF transcription factors play a critical role in plant development and stress adaptation. This study reports the three-dimensional ab initio-based model of the AP2/EREBP protein of barley and its interaction with DNA. The full-length coding sequence of the HvAP2/EREBP gene isolated from two Indian barley cultivars, RD 2503 and RD 31, was used to model the protein. Of five protein models obtained, the one with lowest C-score was chosen for further analysis. The N- and C-terminal regions of the HvAP2 protein were found to be highly disordered. The dynamic properties of AP2/EREBP and its interaction with DNA were investigated by molecular dynamics simulation. Analysis of trajectories from simulation yielded the equilibrated conformation between 2-10 ns for the protein and 7-15 ns for the protein-DNA complex. The relationship between DNA containing a GCC box and the DNA-binding domain of HvAP2/EREBP was established by modeling an 11-base-pair-long nucleotide sequence and the HvAP2/EREBP protein using an ab initio method. Analysis of the protein-DNA interaction showed that a β-sheet motif comprising amino acid residues THR105, ARG100, ARG93, and ARG83 seems to play an important role in stabilizing the complex, as these residues form strong hydrogen bond interactions with the DNA motif. Taken together, this study provides first-hand comprehensive information detailing the structural conformation and interactions of HvAP2/EREBP proteins in barley. The study highlights the role of computational approaches for preliminary examination of unknown proteins in the absence of experimental information. It also provides molecular insight into protein-DNA binding for understanding and enhancing abiotic stress resistance to improve water use efficiency in crop plants.
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays
Boerner, Susan; McGinnis, Karen M.
2012-01-01
Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.
Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao
2017-07-01
In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. 
This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
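The abstract reports a 75% reduction for the ITS2 sequence, which is the figure one would expect from storing each base in 2 bits instead of one 8-bit character. The following Python sketch illustrates that idea followed by Zlib compression. It is not the authors' GATC2Bytes implementation: the base-to-bit mapping, the 4-byte length header and the function names are all assumptions made for illustration, and the chemical fingerprint part of the pipeline is not covered.

```python
import zlib

# Assumed 2-bit code per base; the real GATC2Bytes mapping is not given in the abstract.
BASE_TO_BITS = {"G": 0b00, "A": 0b01, "T": 0b10, "C": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack_bases(seq):
    """Pack an unambiguous DNA string into 2 bits per base (4 bases per byte)."""
    packed = bytearray()
    for start in range(0, len(seq), 4):
        chunk = seq[start:start + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_TO_BITS[base]
        # Left-align a short final chunk so unpacking always reads from the high bits.
        byte <<= 2 * (4 - len(chunk))
        packed.append(byte)
    return bytes(packed)


def unpack_bases(packed, length):
    """Reverse pack_bases; `length` trims the padding in the last byte."""
    bases = []
    for byte in packed:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases[:length])


def encode_payload(seq):
    """Hypothetical payload: 4-byte length header + packed bases, then zlib."""
    raw = len(seq).to_bytes(4, "big") + pack_bases(seq)
    return zlib.compress(raw, 9)


def decode_payload(payload):
    """Recover the original sequence from encode_payload output."""
    raw = zlib.decompress(payload)
    length = int.from_bytes(raw[:4], "big")
    return unpack_bases(raw[4:], length)
```

With this packing, 400 bases occupy 100 bytes, which matches the 75% figure before Zlib is applied. Sequences containing ambiguity codes such as N would need a wider code and are not handled here.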
Genomics dataset of unidentified disclosed isolates.
Rekadwad, Bhagwan N
2016-09-01
Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from a restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
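The AT/GC content mentioned above is a simple base-composition measure. A minimal Python sketch is given below; the abstract does not say how the values were computed, so the function name and the choice to ignore ambiguous symbols such as N are assumptions.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "AT")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total
```

The AT content is then `1 - gc_content(seq)`. A higher GC fraction generally goes with a higher duplex melting temperature, which is the link to thermal stability that the abstract refers to.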
Song, Wen Jun; Qin, Qi Wei; Qiu, Jin; Huang, Can Hua; Wang, Fan; Hew, Choy Leong
2004-01-01
Here we report the complete genome sequence of Singapore grouper iridovirus (SGIV). Sequencing of the random shotgun and restriction endonuclease genomic libraries showed that the entire SGIV genome consists of 140,131 nucleotide bp. One hundred sixty-two open reading frames (ORFs) from the sense and antisense DNA strands, coding for lengths varying from 41 to 1,268 amino acids, were identified. Computer-assisted analyses of the deduced amino acid sequences revealed that 77 of the ORFs exhibited homologies to known virus genes, 23 of which matched functional iridovirus proteins. Forty-two putative conserved domains or signatures were detected in the National Center for Biotechnology Information CD-Search database and PROSITE database. An assortment of enzyme activities involved in DNA replication, transcription, nucleotide metabolism, cell signaling, etc., were identified. Viruses were cultured on a cell line derived from the embryonated egg of the grouper Epinephelus tauvina, isolated, and purified by sucrose gradient ultracentrifugation. The protein extract from the purified virions was analyzed by polyacrylamide gel electrophoresis followed by in-gel digestion of protein bands. Matrix-assisted laser desorption ionization-time of flight mass spectrometry and database searching led to identification of 26 proteins. Twenty of these represented novel or previously unidentified genes, which were further confirmed by reverse transcription-PCR (RT-PCR) and DNA sequencing of their respective RT-PCR products. PMID:15507645
Archaebacterial rhodopsin sequences: Implications for evolution
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1991-01-01
It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Functional interrogation of non-coding DNA through CRISPR genome editing
Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.
2017-01-01
Methodologies to interrogate non-coding regions have lagged behind those for coding regions, even though non-coding regions comprise the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation, including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Characterization of a Novel Polerovirus Infecting Maize in China
Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping
2016-01-01
A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3′ half of P3–P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved. PMID:27136578
Characterization of a Novel Polerovirus Infecting Maize in China.
Chen, Sha; Jiang, Guangzhuang; Wu, Jianxiang; Liu, Yong; Qian, Yajuan; Zhou, Xueping
2016-04-28
A novel virus, tentatively named Maize Yellow Mosaic Virus (MaYMV), was identified from the field-grown maize plants showing yellow mosaic symptoms on the leaves collected from the Yunnan Province of China by the deep sequencing of small RNAs. The complete 5642 nucleotide (nt)-long genome of the MaYMV shared the highest nucleotide sequence identity (73%) to Maize Yellow Dwarf Virus-RMV. Sequence comparisons and phylogenetic analyses suggested that MaYMV represents a new member of the genus Polerovirus in the family Luteoviridae. Furthermore, the P0 protein encoded by MaYMV was demonstrated to inhibit both local and systemic RNA silencing by co-infiltration assays using transgenic Nicotiana benthamiana line 16c carrying the GFP reporter gene, which further supported the identification of a new polerovirus. The biologically-active cDNA clone of MaYMV was generated by inserting the full-length cDNA of MaYMV into the binary vector pCB301. RT-PCR and Northern blot analyses showed that this clone was systemically infectious upon agro-inoculation into N. benthamiana. Subsequently, 13 different isolates of MaYMV from field-grown maize plants in different geographical locations of Yunnan and Guizhou provinces of China were sequenced. Analyses of their molecular variation indicate that the 3' half of P3-P5 read-through protein coding region was the most variable, whereas the coat protein- (CP-) and movement protein- (MP-)coding regions were the most conserved.
Liu, Betty R.; Huang, Yue-Wern; Aronstam, Robert S.; Lee, Han-Jung
2016-01-01
Cell-penetrating peptides (CPPs) have been shown to deliver cargos, including protein, DNA, RNA, and nanomaterials, in fully active forms into live cells. Most of the CPP sequences in use today are based on non-native proteins that may be immunogenic. Here we demonstrate that the L5a CPP (RRWQW) from bovine lactoferricin (LFcin), stably and noncovalently complexed with plasmid DNA and prepared at an optimal nitrogen/phosphate ratio of 12, is able to efficiently enter human lung cancer A549 cells. The L5a CPP delivered a plasmid containing the enhanced green fluorescent protein (EGFP) coding sequence that was subsequently expressed in cells, as revealed by real-time PCR and fluorescence microscopy at the mRNA and protein levels, respectively. Treatment with calcium chloride increased the level of gene expression, without affecting CPP-mediated transfection efficiency. Zeta-potential analysis revealed that positively electrostatic interactions of CPP/DNA complexes correlated with CPP-mediated transport. The L5a and L5a/DNA complexes were not cytotoxic. This biomimetic LFcin L5a represents one of the shortest effective CPPs and could be a promising lead peptide with lower immunogenicity for DNA delivery in gene therapy. PMID:26942714
Liu, Betty R; Huang, Yue-Wern; Aronstam, Robert S; Lee, Han-Jung
2016-01-01
Cell-penetrating peptides (CPPs) have been shown to deliver cargos, including protein, DNA, RNA, and nanomaterials, in fully active forms into live cells. Most of the CPP sequences in use today are based on non-native proteins that may be immunogenic. Here we demonstrate that the L5a CPP (RRWQW) from bovine lactoferricin (LFcin), stably and noncovalently complexed with plasmid DNA and prepared at an optimal nitrogen/phosphate ratio of 12, is able to efficiently enter human lung cancer A549 cells. The L5a CPP delivered a plasmid containing the enhanced green fluorescent protein (EGFP) coding sequence that was subsequently expressed in cells, as revealed by real-time PCR and fluorescence microscopy at the mRNA and protein levels, respectively. Treatment with calcium chloride increased the level of gene expression, without affecting CPP-mediated transfection efficiency. Zeta-potential analysis revealed that positively electrostatic interactions of CPP/DNA complexes correlated with CPP-mediated transport. The L5a and L5a/DNA complexes were not cytotoxic. This biomimetic LFcin L5a represents one of the shortest effective CPPs and could be a promising lead peptide with lower immunogenicity for DNA delivery in gene therapy.
Ganeshan, Seedhabadee; Sharma, Pallavi; Young, Lester; Kumar, Ashwani; Fowler, D Brian; Chibbar, Ravindra N
2011-03-01
Low-temperature (LT) tolerance in winter wheat (Triticum aestivum L.) is an economically important but complex trait. Four selected wheat genotypes (a winter-hardy cultivar, Norstar; a tender spring cultivar, Manitou; and two near-isogenic lines with the Vrn-A1 (spring Norstar) and vrn-A1 (winter Manitou) alleles of Manitou and Norstar) were cold-acclimated at 6°C, and crown and leaf tissues were collected at 0, 2, 14, 21, 35, 42, 56 and 70 days of cold acclimation. cDNA-AFLP profiling was used to determine temporal expression profiles of transcripts during cold acclimation in crown and leaf tissues separately, to determine whether LT regulatory circuitries in crown and leaf tissues could be delineated using this approach. Screening 64 primer combinations identified 4,074 and 2,757 differentially expressed transcript-derived fragments (TDFs), of which 38 and 16% were up-regulated as compared to 3 and 6% that were down-regulated in crown and leaf tissues, respectively. DNA sequencing of TDFs revealed sequences common to both tissues, including genes coding for DEAD-box RNA helicase, choline-phosphate cytidylyltransferase and delta-1-pyrroline carboxylate synthetase. TDFs specific to crown tissues included genes coding for phosphatidylinositol kinase, auxin response factor protein and brassinosteroid insensitive 1-associated receptor kinase. In leaf, genes such as methylene tetrahydrofolate reductase, NADH-cytochrome b5 reductase and malate dehydrogenase were identified. However, 30 and 14% of the DNA sequences from the crown and leaf tissues, respectively, corresponded to hypothetical or unknown proteins. Cluster analysis of up-regulated, down-regulated and unique TDFs, together with DNA sequencing and real-time PCR validation, suggests that mechanisms operating in crown and leaf tissue in response to LT are differently regulated and warrant further studies.
Hill, J; McGraw, P; Tzagoloff, A
1985-03-25
The yeast nuclear gene CBP2 was previously proposed to code for a protein necessary for processing of the terminal intron in the cytochrome b pre-mRNA (McGraw, P., and Tzagoloff, A. (1983) J. Biol. Chem. 258, 9459-9468). In the present study we describe a mitochondrial mutation capable of suppressing the respiratory deficiency of cbp2 mutants. The mitochondrial suppressor mutation has been shown to be the result of a precise excision of the last intervening sequence from the cytochrome b gene. Strains with the altered mitochondrial DNA have normal levels of mature cytochrome b mRNA and of cytochrome b and exhibit wild type growth on glycerol. These results confirm that CBP2 codes for a protein specifically required for splicing of the cytochrome b intron and further suggest that absence of the intervening sequence does not noticeably affect the expression of respiratory function in mitochondria.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are enough to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
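The period-3 property refers to a peak in the Fourier power spectrum of coding DNA at a period of three bases. The Python sketch below shows one common way to measure it on an ordinary four-letter sequence, using binary indicator sequences for each base. It is an illustration only: it does not use the authors' extended alphabet or their spectrum calculation, and the function names and the ratio used as a score are assumptions.

```python
import cmath


def spectral_power(seq, k):
    """Total power at DFT index k, summed over the four base indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the 0/1 indicator sequence for this base.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * pos / n)
            for pos, letter in enumerate(seq)
            if letter == base
        )
        total += abs(coeff) ** 2
    return total


def period3_ratio(seq):
    """Power at period 3 relative to the mean power over DFT indices 1..n//2."""
    seq = seq.upper()
    n = len(seq)
    if n < 6 or n % 3 != 0:
        raise ValueError("length must be a multiple of 3 and at least 6")
    powers = [spectral_power(seq, k) for k in range(1, n // 2 + 1)]
    mean_power = sum(powers) / len(powers)
    # Index n/3 corresponds to a period of exactly three bases.
    return spectral_power(seq, n // 3) / mean_power
```

A ratio well above 1 indicates a period-3 peak. A homopolymer has essentially no power away from index 0, so the ratio is not meaningful for such input.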
2011-01-01
Background The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional unscaffolded contigs. In total, 84% of the mitochondrial genome is contained in a single scaffold. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively.
Conclusions Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest mitochondrial genome reported so far. PMID:21854637
Correlation approach to identify coding regions in DNA sequences
NASA Technical Reports Server (NTRS)
Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1994-01-01
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
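The abstract does not spell out the algorithm, so the following Python sketch only illustrates one way such correlations are often quantified: a "DNA walk" that steps up for pyrimidines and down for purines, and the root-mean-square fluctuation F(l) of its displacement over windows of length l. For uncorrelated or short-range correlated sequences F(l) grows roughly as l^0.5, and a larger exponent points to long-range correlations. The step rule and function names are assumptions, and this is not the authors' coding-region finder.

```python
import math

PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine, -1 for a purine, starting at 0."""
    position = 0
    walk = [0]
    for base in seq.upper():
        if base in PYRIMIDINES:
            position += 1
        elif base in PURINES:
            position -= 1
        else:
            raise ValueError("unexpected base: %r" % base)
        walk.append(position)
    return walk


def fluctuation(seq, window):
    """Root-mean-square fluctuation F(window) of the walk displacement."""
    walk = dna_walk(seq)
    if not 0 < window < len(walk):
        raise ValueError("window must be between 1 and len(seq)")
    # Displacement of the walk across every window of the given length.
    deltas = [walk[i + window] - walk[i] for i in range(len(walk) - window)]
    mean = sum(deltas) / len(deltas)
    mean_sq = sum(d * d for d in deltas) / len(deltas)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(mean_sq - mean * mean, 0.0))
```

In practice one would evaluate `fluctuation` over a range of window sizes and fit the slope of log F(l) against log l, possibly within a sliding segment of the genome, to compare candidate coding and noncoding regions.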
RNA Editing in Plant Mitochondria
NASA Astrophysics Data System (ADS)
Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel
1989-12-01
Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N
2014-03-01
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea.
Goncearenco, Alexander; Ma, Bin-Guang; Berezovsky, Igor N.
2014-01-01
DNA, RNA and proteins are major biological macromolecules that coevolve and adapt to environments as components of one highly interconnected system. We explore here sequence/structure determinants of mechanisms of adaptation of these molecules, links between them, and results of their mutual evolution. We complemented statistical analysis of genomic and proteomic sequences with folding simulations of RNA molecules, unraveling causal relations between compositional and sequence biases reflecting molecular adaptation on DNA, RNA and protein levels. We found many compositional peculiarities related to environmental adaptation and the life style. Specifically, thermal adaptation of protein-coding sequences in Archaea is characterized by a stronger codon bias than in Bacteria. Guanine and cytosine load in the third codon position is important for supporting the aerobic life style, and it is highly pronounced in Bacteria. The third codon position also provides a tradeoff between arginine and lysine, which are favorable for thermal adaptation and aerobicity, respectively. Dinucleotide composition provides stability of nucleic acids via strong base-stacking in ApG dinucleotides. In relation to coevolution of nucleic acids and proteins, thermostability-related demands on the amino acid composition affect the nucleotide content in the second codon position in Archaea. PMID:24371267
Subramanian, Sankar; Lingala, Syamala Gowri; Swaminathan, Siva; Huynen, Leon; Lambert, David
2014-08-01
The complete mitochondrial genome of the Chinstrap penguin (Pygoscelis antarcticus) was sequenced and compared with other penguin mitogenomes. The genome is 15,972 bp in length with the number and order of protein coding genes and RNAs being very similar to that of other known penguin mitogenomes. Comparative nucleotide analysis showed the Chinstrap mitogenome shares 94% homology with the mitogenome of its sister species, Pygoscelis adelie (Adélie penguin). Divergence at nonsynonymous nucleotide positions was found to be up to 23 times less than that observed in synonymous positions of protein coding genes, suggesting high selection constraints. The complete mitogenome data will be useful for genetic and evolutionary studies of penguins.
Sequence-based prediction of protein-binding sites in DNA: comparative study of two SVM models.
Park, Byungkyu; Im, Jinyong; Tuvshinjargal, Narankhuu; Lee, Wook; Han, Kyungsook
2014-11-01
As many structures of protein-DNA complexes have been known in the past years, several computational methods have been developed to predict DNA-binding sites in proteins. However, its inverse problem (i.e., predicting protein-binding sites in DNA) has received much less attention. One of the reasons is that the differences between the interaction propensities of nucleotides are much smaller than those between amino acids. Another reason is that DNA exhibits less diverse sequence patterns than protein. Therefore, predicting protein-binding DNA nucleotides is much harder than predicting DNA-binding amino acids. We computed the interaction propensity (IP) of nucleotide triplets with amino acids using an extensive dataset of protein-DNA complexes, and developed two support vector machine (SVM) models that predict protein-binding nucleotides from sequence data alone. One SVM model predicts protein-binding nucleotides using DNA sequence data alone, and the other SVM model predicts protein-binding nucleotides using both DNA and protein sequences. In a 10-fold cross-validation with 1519 DNA sequences, the SVM model that uses DNA sequence data only predicted protein-binding nucleotides with an accuracy of 67.0%, an F-measure of 67.1%, and a Matthews correlation coefficient (MCC) of 0.340. With an independent dataset of 181 DNAs that were not used in training, it achieved an accuracy of 66.2%, an F-measure 66.3% and a MCC of 0.324. Another SVM model that uses both DNA and protein sequences achieved an accuracy of 69.6%, an F-measure of 69.6%, and a MCC of 0.383 in a 10-fold cross-validation with 1519 DNA sequences and 859 protein sequences. With an independent dataset of 181 DNAs and 143 proteins, it showed an accuracy of 67.3%, an F-measure of 66.5% and a MCC of 0.329. Both in cross-validation and independent testing, the second SVM model that used both DNA and protein sequence data showed better performance than the first model that used DNA sequence data. 
To the best of our knowledge, this is the first attempt to predict protein-binding nucleotides in a given DNA sequence from the sequence data alone. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
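The three scores reported above (accuracy, F-measure and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below shows the standard formulas; the counts passed in are hypothetical and are not the confusion matrices of the study, which the abstract does not give.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (accuracy, F-measure, MCC) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # MCC is defined as 0 here when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, f_measure, mcc


# Hypothetical, balanced counts chosen only to illustrate the formulas.
acc, f1, mcc = binary_metrics(tp=67, fp=33, tn=67, fn=33)
```

With balanced classes, an accuracy near 67% corresponds to an MCC near 0.34, which shows why the MCC values look modest next to the accuracy figures.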
Kowalski, Madzia P.; Baylis, Howard A.; Krude, Torsten
2015-01-01
Stem bulge RNAs (sbRNAs) are a family of small non-coding stem-loop RNAs present in Caenorhabditis elegans and other nematodes, the function of which is unknown. Here, we report the first functional characterisation of nematode sbRNAs. We demonstrate that sbRNAs from a range of nematode species are able to reconstitute the initiation of chromosomal DNA replication in the presence of replication proteins in vitro, and that conserved nucleotide sequence motifs are essential for this function. By functionally inactivating sbRNAs with antisense morpholino oligonucleotides, we show that sbRNAs are required for S phase progression, early embryonic development and the viability of C. elegans in vivo. Thus, we demonstrate a new and essential role for sbRNAs during the early development of C. elegans. sbRNAs show limited nucleotide sequence similarity to vertebrate Y RNAs, which are also essential for the initiation of DNA replication. Our results therefore establish that the essential function of small non-coding stem-loop RNAs during DNA replication extends beyond vertebrates. PMID:25908866
Conditional sterility in plants
Meagher, Richard B.; McKinney, Elizabeth; Kim, Tehryung
2010-02-23
The present disclosure provides methods, recombinant DNA molecules, recombinant host cells containing the DNA molecules, and transgenic plant cells, plant tissue and plants which contain and express at least one antisense or interference RNA specific for a thiamine biosynthetic coding sequence or a thiamine binding protein or a thiamine-degrading protein, wherein the RNA or thiamine binding protein is expressed under the regulatory control of a transcription regulatory sequence which directs expression in male and/or female reproductive tissue. These transgenic plants are conditionally sterile; i.e., they are fertile only in the presence of exogenous thiamine. Such plants are especially appropriate for use in the seed industry or in the environment, for example, for use in revegetation of contaminated soils or phytoremediation, especially when those transgenic plants also contain and express one or more chimeric genes which confer resistance to contaminants.
Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru
2007-01-01
The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
ANN modeling of DNA sequences: new strategies using DNA shape code.
Parbhane, R V; Tambe, S S; Kulkarni, B D
2000-09-01
Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
Roux-Rouquie, M; Marilley, M
2000-09-15
We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.
Functional interrogation of non-coding DNA through CRISPR genome editing.
Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H
2017-05-15
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Croteau, Rodney Bruce; Wildung, Mark Raymond; Lange, Bernd Markus; McCaskill, David G.
2001-01-01
cDNAs encoding 1-deoxyxylulose-5-phosphate synthase from peppermint (Mentha piperita) have been isolated and sequenced, and the corresponding amino acid sequences have been determined. Accordingly, isolated DNA sequences (SEQ ID NO:3, SEQ ID NO:5, SEQ ID NO:7) are provided which code for the expression of 1-deoxyxylulose-5-phosphate synthase from plants. In another aspect the present invention provides for isolated, recombinant DXPS proteins, such as the proteins having the sequences set forth in SEQ ID NO:4, SEQ ID NO:6 and SEQ ID NO:8. In other aspects, replicable recombinant cloning vehicles are provided which code for plant 1-deoxyxylulose-5-phosphate synthases, or for a base sequence sufficiently complementary to at least a portion of 1-deoxyxylulose-5-phosphate synthase DNA or RNA to enable hybridization therewith. In yet other aspects, modified host cells are provided that have been transformed, transfected, infected and/or injected with a recombinant cloning vehicle and/or DNA sequence encoding a plant 1-deoxyxylulose-5-phosphate synthase. Thus, systems and methods are provided for the recombinant expression of the aforementioned recombinant 1-deoxyxylulose-5-phosphate synthase that may be used to facilitate its production, isolation and purification in significant amounts. Recombinant 1-deoxyxylulose-5-phosphate synthase may be used to obtain expression or enhanced expression of 1-deoxyxylulose-5-phosphate synthase in plants in order to enhance the production of 1-deoxyxylulose-5-phosphate, or its derivatives such as isopentenyl diphosphate (IPP), or may be otherwise employed for the regulation or expression of 1-deoxyxylulose-5-phosphate synthase, or the production of its products.
On fuzzy semantic similarity measure for DNA coding.
Ahmad, Muneer; Jung, Low Tang; Bhuiyan, Md Al-Amin
2016-02-01
A coding measure scheme numerically translates the DNA sequence to a time domain signal for protein coding regions identification. A number of coding measure schemes based on numerology, geometry, fixed mapping, statistical characteristics and chemical attributes of nucleotides have been proposed in recent decades. Such coding measure schemes lack the biologically meaningful aspects of nucleotide data and hence do not significantly discriminate coding regions from non-coding regions. This paper presents a novel fuzzy semantic similarity measure (FSSM) coding scheme centering on FSSM codons' clustering and genetic code context of nucleotides. Certain natural characteristics of nucleotides i.e. appearance as a unique combination of triplets, preserving special structure and occurrence, and ability to own and share density distributions in codons have been exploited in FSSM. The nucleotides' fuzzy behaviors, semantic similarities and defuzzification based on the center of gravity of nucleotides revealed a strong correlation between nucleotides in codons. The proposed FSSM coding scheme attains a significant enhancement in coding regions identification i.e. 36-133% as compared to other existing coding measure schemes tested over more than 250 benchmarked and randomly taken DNA datasets of different organisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
The agents of natural genome editing.
Witzany, Guenther
2011-06-01
The DNA serves as a stable information storage medium and every protein which is needed by the cell is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. Natural genome editing on one side is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. Natural genome editing on the other side designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code because no natural code codes itself as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins called chromatin that compacts the DNA in the nucleus, strongly restricting access to DNA sequences. As a result, regulatory factors only interact with a small subset of their potential binding elements in a given cell to regulate genes. How factors recognize and select sites in chromatin across the genome is not well understood -- but several discoveries in CCR’s Laboratory of Receptor Biology and Gene Expression (LRBGE) have shed light on the mechanisms that direct factors to DNA.
Genomics dataset on unclassified published organism (patent US 7547531).
Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier
2016-12-01
Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. DNA sequence analysis is therefore crucial for learning about the hierarchical classification of a particular organism. This dataset (patent US 7547531) was chosen to simplify the complex raw data buried in undisclosed DNA sequences, which may help open doors for new collaborations. A total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from the NCBI BioSample database. Quick response (QR) codes for those DNA sequences were constructed with the DNA BarID tool. QR codes are useful for the identification and comparison of isolates with other organisms. The AT/GC content of the DNA sequences was determined using the ENDMEMO GC Content Calculator, which indicates their stability at different temperatures. The highest GC content was observed in GP445188 (62.5%), followed by GP445198 (61.8%) and GP445189 (59.44%), while the lowest was in GP445178 (24.39%). In addition, the New England BioLabs (NEB) database was used to identify the cleavage code, indicating the 5', 3' and blunt ends, and the enzyme code, indicating the methylation sites of the DNA sequences. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
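The GC content figures quoted above are simple base-composition percentages. A minimal Python sketch of that calculation follows; it is not the ENDMEMO tool, and the choice to ignore ambiguous bases such as N is an assumption made here for illustration.

```python
def gc_content(seq):
    """Return the percentage of G and C among the A/C/G/T bases in seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        return 0.0  # empty or fully ambiguous input
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Hypothetical sequence, not one of the GP4451xx entries.
example_gc = gc_content("GGCCAATT")
```

AT content is then simply 100 minus the returned value, under the same assumption about ambiguous bases.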
Sequence similarity is more relevant than species specificity in probabilistic backtranslation.
Ferro, Alfredo; Giugno, Rosalba; Pigola, Giuseppe; Pulvirenti, Alfredo; Di Pietro, Cinzia; Purrello, Michele; Ragusa, Marco
2007-02-21
Backtranslation is the process of decoding a sequence of amino acids into the corresponding codons. All synthetic gene design systems include a backtranslation module. The degeneracy of the genetic code makes backtranslation potentially ambiguous since most amino acids are encoded by multiple codons. The common approach to overcome this difficulty is based on imitation of codon usage within the target species. This paper describes EasyBack, a new parameter-free, fully-automated software for backtranslation using Hidden Markov Models. EasyBack is not based on imitation of codon usage within the target species, but instead uses a sequence-similarity criterion. The model is trained with a set of proteins with known cDNA coding sequences, constructed from the input protein by querying the NCBI databases with BLAST. Unlike existing software, the proposed method allows the quality of prediction to be estimated. When tested on a group of proteins that show different degrees of sequence conservation, EasyBack outperforms other published methods in terms of precision. The prediction quality of a protein backtranslation method is markedly increased by replacing the criterion of most used codon in the same species with a Hidden Markov Model trained with a set of most similar sequences from all species. Moreover, the proposed method allows the quality of prediction to be estimated probabilistically.
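For orientation, the "most used codon" criterion that the abstract contrasts with EasyBack can be sketched in a few lines of Python. This is the baseline approach only, not EasyBack's Hidden Markov Model; the function names and the training sequence are hypothetical, and the codon table is the standard genetic code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(coding_seqs):
    """Count, per amino acid, how often each codon occurs in the sequences."""
    usage = defaultdict(Counter)
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            amino = CODON_TABLE.get(codon)  # skips codons with ambiguous bases
            if amino is not None:
                usage[amino][codon] += 1
    return usage


def backtranslate(protein, usage):
    """Replace each residue by its most frequent codon in the usage table."""
    codons = []
    for amino in protein.upper():
        if amino not in usage:
            raise ValueError(f"no codon observed for residue {amino!r}")
        codons.append(usage[amino].most_common(1)[0][0])
    return "".join(codons)


# Hypothetical training sequence: M K K K L with AAA used twice, AAG once.
usage = codon_usage(["ATGAAAAAGAAACTG"])
dna = backtranslate("MKL", usage)
```

Training the usage table on sequences similar to the input protein rather than on a whole-species table would be a crude nod to the sequence-similarity idea, but it still ignores the positional context that an HMM can model.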
Lopez, M; Eberlé, F; Mattei, M G; Gabert, J; Birg, F; Bardin, F; Maroc, C; Dubreuil, P
1995-04-03
The human poliovirus (PV) receptor (PVR) is a member of the immunoglobulin (Ig) superfamily with unknown cellular function. We have isolated a human PVR-related (PRR) cDNA. The deduced amino acid (aa) sequence of PRR showed, in the extracellular region, 51.7 and 54.3% similarity with human PVR and with the murine PVR homolog, respectively. The cDNA coding sequence is 1.6-kb long and encodes a deduced 57-kDa protein; this protein has a structural organization analogous to that of PVR, that is, one V- and two C-set Ig domains, with a conserved number of aa. Northern blot analysis indicated that a major 5.9-kb transcript is present in all normal human tissues tested. In situ hybridization showed that the PRR gene is located at bands q23-q24 of human chromosome 11.
3-base periodicity in coding DNA is affected by intercodon dinucleotides
Sánchez, Joaquín
2011-01-01
All coding DNAs exhibit 3-base periodicity (TBP), which may be defined as the tendency of nucleotides and higher order n-tuples, e.g. trinucleotides (triplets), to be preferentially spaced by 3, 6, 9, etc. bases, and we have proposed an association between TBP and clustering of same-phase triplets. We here investigated if TBP was affected by intercodon dinucleotide tendencies and whether clustering of same-phase triplets was involved. Under constant protein sequence, intercodon dinucleotide frequencies depend on the distribution of synonymous codons. So, possible effects were revealed by randomly exchanging synonymous codons without altering protein sequences to subsequently document changes in TBP via frequency distribution of distances (FDD) of DNA triplets. A tripartite positive correlation was found between intercodon dinucleotide frequencies, clustering of same-phase triplets and TBP. So, intercodon C|A (where “|” indicates the boundary between codons) was more frequent in native human DNA than in the codon-shuffled sequences; higher C|A frequency occurred along with more frequent clustering of C|AN triplets (where N jointly represents A, C, G and T) and with intense CAN TBP. The opposite was found for C|G, which was less frequent in native than in shuffled sequences; lower C|G frequency occurred together with reduced clustering of C|GN triplets and with less intense CGN TBP. We hence propose that intercodon dinucleotides affect TBP via same-phase triplet clustering. A possible biological relevance of our findings is briefly discussed. PMID:21814388
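The shuffling experiment described above can be sketched as follows: count dinucleotides that span codon boundaries, then permute synonymous codons among the positions of each amino acid so the protein is unchanged. This is one plausible reading of "randomly exchanging synonymous codons", not the author's actual procedure; all function names and the test sequence are hypothetical, and the input is assumed to contain only A, C, G and T.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping a ragged tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds):
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))


def intercodon_dinucleotides(cds):
    """Count dinucleotides formed by codon position 3 and the next position 1."""
    codons = split_codons(cds)
    return Counter(a[2] + b[0] for a, b in zip(codons, codons[1:]))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid."""
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)


native = "ATGGCAGCCGCGAAAAAGTAA"  # hypothetical CDS: M A A A K K stop
shuffled = shuffle_synonymous(native, random.Random(0))
native_counts = intercodon_dinucleotides(native)
shuffled_counts = intercodon_dinucleotides(shuffled)
```

Because codons are only permuted within each amino acid, both the protein sequence and the overall codon usage are preserved, so any difference between `native_counts` and `shuffled_counts` reflects codon order alone.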
Liu, Zhong-Yuan; Wang, Yun; Lü, Guo-Dong; Wang, Xian-Lei; Zhang, Fu-Chun; Ma, Ji
2006-12-01
The partial cDNA sequence coding for the antifreeze proteins in the Tenebrio molitor was obtained by RT-PCR. Sequence analysis revealed nine putative cDNAs with a high degree of homology to Tenebrio molitor antifreeze proteins. The recombinant pGEX-4T-1-tmafp-XJ430 was introduced into E. coli BL21 to induce a GST fusion protein by IPTG. SDS-PAGE of the fusion protein demonstrated that the antifreeze protein migrated at a size of 38 kDa. The immunization was performed by intra-muscular injection of pCDNA3-tmafp-XJ430, and then antiserum was detected by ELISA. The titer of the antibody was 1:2,000. Western blotting analysis showed the antiserum was specific against the antifreeze protein. This finding could lead to further investigation of the properties and function of antifreeze proteins.
Lipinska, B; Rao, A S; Bolten, B M; Balakrishnan, R; Goldberg, E B
1989-01-01
We sequenced bacteriophage T4 genes 2 and 3 and the putative C-terminal portion of gene 50. They were found to have appropriate open reading frames directed counterclockwise on the T4 map. Mutations in genes 2 and 64 were shown to be in the same open reading frame, which we now call gene 2. This gene codes for a protein of 27,068 daltons. The open reading frame corresponding to gene 3 codes for a protein of 20,634 daltons. Appropriate bands on polyacrylamide gels were identified at 30 and 20 kilodaltons, respectively. We found that the product of the cloned gene 2 can protect T4 DNA double-stranded ends from exonuclease V action. PMID:2644202
Watanabe, H; Narai, A; Shimizu, M
1999-06-01
A new protein that decreases transepithelial electrical resistance (TEER) in the human intestinal Caco-2 cell monolayer was found in a water-soluble fraction of the mushroom Flammulina velutipes. This protein, termed TEER-decreasing protein (TDP), is not cytotoxic and does not induce cell detachment, but rapidly increases the tight junctional permeability for water-soluble marker substances such as Lucifer Yellow CH (Mr 457) through the paracellular pathway. TDP was isolated and purified from the aqueous extract of F. velutipes by chromatographic means. Purified TDP was found to be a simple, nonglycosylated protein without intermolecular disulfide bonds, and the apparent molecular mass as estimated by SDS/PAGE and gel filtration is 30 kDa. It was revealed that the N-terminal amino-acid sequence of purified TDP is identical to the recently reported N-terminal sequence of flammutoxin, a membrane-perturbing hemolytic protein, for which the complete primary structure has not yet been reported [Tomita, T., Ishikawa, D., Noguchi, T., Katayama, E., and Hashimoto, Y. (1998) Biochem. J. 333, 24794-24799]. The cDNA coding for TDP was cloned by 5' and 3' rapid amplification of cDNA ends. The ORF encodes a protein with 272 amino-acid residues showing no homology to known proteins. Relevant studies using TDP cDNA will provide insight into the structure-function relationships of membrane pore-forming toxins.
Fricova, Dominika; Valach, Matus; Farkas, Zoltan; Pfeiffer, Ilona; Kucsera, Judit; Tomaska, Lubomir; Nosek, Jozef
2010-01-01
As a part of our initiative aimed at a large-scale comparative analysis of fungal mitochondrial genomes, we determined the complete DNA sequence of the mitochondrial genome of the yeast Candida subhashii and found that it exhibits a number of peculiar features. First, the mitochondrial genome is represented by linear dsDNA molecules of uniform length (29 795 bp), with an unusually high content of guanine and cytosine residues (52.7 %). Second, the coding sequences lack introns; thus, the genome has a relatively compact organization. Third, the termini of the linear molecules consist of long inverted repeats and seem to contain a protein covalently bound to terminal nucleotides at the 5′ ends. This architecture resembles the telomeres in a number of linear viral and plasmid DNA genomes classified as invertrons, in which the terminal proteins serve as specific primers for the initiation of DNA synthesis. Finally, although the mitochondrial genome of C. subhashii contains essentially the same set of genes as other closely related pathogenic Candida species, we identified additional ORFs encoding two homologues of the family B protein-priming DNA polymerases and an unknown protein. The terminal structures and the genes for DNA polymerases are reminiscent of linear mitochondrial plasmids, indicating that this genome architecture might have emerged from fortuitous recombination between an ancestral, presumably circular, mitochondrial genome and an invertron-like element. PMID:20395267
Complete mitochondrial genome sequence of Melipona scutellaris, a Brazilian stingless bee.
Pereira, Ulisses de Padua; Bonetti, Ana Maria; Goulart, Luiz Ricardo; Santos, Anderson Rodrigues Dos; Oliveira, Guilherme Correa de; Cuadros-Orellana, Sara; Ueira-Vieira, Carlos
2016-09-01
Melipona scutellaris is a Brazilian stingless bee species and a highly important native pollinator besides its use in rational rearing for honey production. In this study, we present the whole mitochondrial DNA sequence of M. scutellaris from a haploid male. The mitogenome has a size of 14,862 bp and harbors 13 protein-coding genes (PCGs), 2 rRNA genes and 21 tRNA genes.
Protein and gene structure of a blue laccase from Pleurotus ostreatus.
Giardina, P; Palmieri, G; Scaloni, A; Fontanella, B; Faraco, V; Cennamo, G; Sannia, G
1999-01-01
A new laccase isoenzyme (POXA1b, where POX is phenol oxidase), produced by Pleurotus ostreatus in cultures supplemented with copper sulphate, has been purified and fully characterized. The main characteristics of this protein (molecular mass in native and denaturing conditions, pI and catalytic properties) are almost identical to the previously studied laccase POXA1w. However, POXA1b contains four copper atoms per molecule instead of one copper, two zinc and one iron atom per molecule of POXA1w. Furthermore, POXA1b shows an unusually high stability at alkaline pH. The gene and cDNA coding for POXA1b have been cloned and sequenced. The gene coding sequence contains 1599 bp, interrupted by 15 introns. Comparison of the structure of the poxa1b gene with the two previously studied P. ostreatus laccase genes (pox1 and poxc) suggests that these genes belong to two different subfamilies. The amino acid sequence of POXA1b deduced from the cDNA sequence has been almost completely verified by means of matrix-assisted laser desorption ionization MS. It has been demonstrated that three out of six putative glycosylation sites are post-translationally modified and the structure of the bound glycosidic moieties has been determined, whereas two other putative glycosylation sites are unmodified. PMID:10417329
Morchikh, Mehdi; Cribier, Alexandra; Raffel, Raoul; Amraoui, Sonia; Cau, Julien; Severac, Dany; Dubois, Emeric; Schwartz, Olivier; Bennasser, Yamina; Benkirane, Monsef
2017-08-03
The DNA-mediated innate immune response underpins anti-microbial defenses and certain autoimmune diseases. Here we used immunoprecipitation, mass spectrometry, and RNA sequencing to identify a ribonuclear complex built around HEXIM1 and the long non-coding RNA NEAT1 that we dubbed the HEXIM1-DNA-PK-paraspeckle components-ribonucleoprotein complex (HDP-RNP). The HDP-RNP contains DNA-PK subunits (DNAPKc, Ku70, and Ku80) and paraspeckle proteins (SFPQ, NONO, PSPC1, RBM14, and MATRIN3). We show that binding of HEXIM1 to NEAT1 is required for its assembly. We further demonstrate that the HDP-RNP is required for the innate immune response to foreign DNA, through the cGAS-STING-IRF3 pathway. The HDP-RNP interacts with cGAS and its partner PQBP1, and their interaction is remodeled by foreign DNA. Remodeling leads to the release of paraspeckle proteins, recruitment of STING, and activation of DNAPKc and IRF3. Our study establishes the HDP-RNP as a key nuclear regulator of DNA-mediated activation of innate immune response through the cGAS-STING pathway. Copyright © 2017 Elsevier Inc. All rights reserved.
Ribosomal protein S14 transcripts are edited in Oenothera mitochondria.
Schuster, W; Unseld, M; Wissinger, B; Brennicke, A
1990-01-01
The gene encoding ribosomal protein S14 (rps14) in Oenothera mitochondria is located upstream of the cytochrome b gene (cob). Sequence analysis of independently derived cDNA clones covering the entire rps14 coding region shows two nucleotides edited from the genomic DNA to the mRNA derived sequences by C to U modifications. A third editing event occurs four nucleotides upstream of the AUG initiation codon and improves a potential ribosome binding site. A CGG codon specifying arginine in a position conserved in evolution between chloroplasts and E. coli as a UGG tryptophan codon is not edited in any of the cDNAs analysed. An inverted repeat 3' of an unidentified open reading frame is located upstream of the rps14 gene. The inverted repeat sequence is highly conserved at analogous regions in other Oenothera mitochondrial loci. PMID:2326162
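Detecting C to U editing of the kind described above amounts to comparing a genomic sequence with its aligned cDNA and reporting positions where a genomic C appears as T (or U) in the transcript. The sketch below assumes the two sequences are already aligned and of equal length; the function name and both sequences are hypothetical.

```python
def editing_sites(genomic, cdna):
    """Return 0-based positions where genomic C reads as T or U in the cDNA."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be pre-aligned and of equal length")
    pairs = zip(genomic.upper(), cdna.upper())
    return [i for i, (g, c) in enumerate(pairs) if g == "C" and c in ("T", "U")]


# Hypothetical aligned pair with a single C-to-T difference at index 4.
sites = editing_sites("ATGCCGTCA", "ATGCTGTCA")
```

Comparing several independently derived cDNA clones, as the study did, helps separate genuine editing events from cloning or sequencing errors.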
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.
Borodovsky, M; Rudd, K E; Koonin, E V
1994-01-01
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
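The overlap percentages quoted in the abstract above follow directly from the counts it reports. A minimal arithmetic check is sketched below; the variable and function names are chosen here for illustration and do not come from the paper.

```python
# Counts as stated in the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLAST similarity
both = 182            # ORFs supported by both GeneMark and BLAST
conserved = 73        # putative genes in ancient conserved families


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


share_of_genemark = percent(both, genemark_orfs)     # share of GeneMark hits
share_of_blast = percent(both, blast_orfs)           # share of BLAST hits
conserved_share = percent(conserved, genemark_orfs)  # conserved families

print(share_of_genemark, share_of_blast, conserved_share)
```

The three values reproduce the 51.4%, 87.5% and 20.6% figures given in the abstract.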
Wise, C A; Chiang, L C; Paznekas, W A; Sharma, M; Musy, M M; Ashley, J A; Lovett, M; Jabs, E W
1997-04-01
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.
Kawano, Tomonori
2013-03-01
There have been a wide variety of approaches for handling the pieces of DNA as the "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed.
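The abstract above mentions a run-length encoding rule for storing font images in DNA but does not spell it out. The sketch below is therefore only a generic illustration of the idea, under assumptions made here: pixels are binary, each row starts with a (possibly empty) run of 0s, and each run length is written as two base-4 digits with the hypothetical mapping 0=A, 1=C, 2=G, 3=T. None of these choices is taken from the article.

```python
BASES = "ACGT"  # hypothetical digit-to-base mapping: 0->A, 1->C, 2->G, 3->T


def run_lengths(pixels):
    """Collapse a row of 0/1 pixels into run lengths, starting with a run of 0s."""
    runs, current, count = [], 0, 0
    for p in pixels:
        if p == current:
            count += 1
        else:
            runs.append(count)
            current, count = p, 1
    runs.append(count)
    return runs


def encode_row(pixels, digits=2):
    """Write each run length as `digits` base-4 digits, one nucleotide per digit."""
    out = []
    for run in run_lengths(pixels):
        if run >= 4 ** digits:
            raise ValueError("run too long for the chosen digit width")
        word = ""
        for _ in range(digits):
            word = BASES[run % 4] + word  # most significant digit ends up first
            run //= 4
        out.append(word)
    return "".join(out)


def decode_row(dna, digits=2):
    """Invert encode_row: read run lengths and alternate between 0s and 1s."""
    pixels, value = [], 0
    for i in range(0, len(dna), digits):
        run = 0
        for base in dna[i:i + digits]:
            run = run * 4 + BASES.index(base)
        pixels.extend([value] * run)
        value = 1 - value
    return pixels


row = [0, 0, 1, 1, 1, 1, 1, 0]
dna = encode_row(row)
print(dna, decode_row(dna) == row)
```

A fixed digit width keeps decoding unambiguous at the cost of capping the run length (15 pixels for two digits); a real design would also have to respect biochemical constraints that this sketch ignores.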
Yasuno, Rie; Wada, Hajime
1998-01-01
Lipoic acid is a coenzyme that is essential for the activity of enzyme complexes such as those of pyruvate dehydrogenase and glycine decarboxylase. We report here the isolation and characterization of LIP1 cDNA for lipoic acid synthase of Arabidopsis. The Arabidopsis LIP1 cDNA was isolated using an expressed sequence tag homologous to the lipoic acid synthase of Escherichia coli. This cDNA was shown to code for Arabidopsis lipoic acid synthase by its ability to complement a lipA mutant of E. coli defective in lipoic acid synthase. DNA-sequence analysis of the LIP1 cDNA revealed an open reading frame predicting a protein of 374 amino acids. Comparisons of the deduced amino acid sequence with those of E. coli and yeast lipoic acid synthase homologs showed a high degree of sequence similarity and the presence of a leader sequence presumably required for import into the mitochondria. Southern-hybridization analysis suggested that LIP1 is a single-copy gene in Arabidopsis. Western analysis with an antibody against lipoic acid synthase demonstrated that this enzyme is located in the mitochondrial compartment in Arabidopsis cells as a 43-kD polypeptide. PMID:9808738
Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C
2018-06-01
High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may both affect the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kb plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA. The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
O'Malley, David M.; Porter, Stephanie; Sederoff, Ronald R.
1992-01-01
Cinnamyl alcohol dehydrogenase (CAD, EC 1.1.1.195) has been purified to homogeneity from differentiating xylem tissue and developing seeds of loblolly pine (Pinus taeda L.). The enzyme is a dimer with a native molecular weight of 82,000 and a subunit molecular weight of 44,000, and is the only form of CAD involved in lignification in differentiating xylem. High levels of loblolly pine CAD enzyme were found in nonlignifying seed tissue. Characterization of the enzyme from both seeds and xylem demonstrated that the enzyme is the same in both tissues. The enzyme has a high affinity for coniferaldehyde (Km = 1.7 micromolar) compared with sinapaldehyde (Km in excess of 100 micromolar). Kinetic data strongly suggest that coniferin is a noncompetitive inhibitor of CAD enzyme activity. Protein sequences were obtained for the N-terminus (28 amino acids) and for two other peptides. Degenerate oligonucleotide primers based on the protein sequences were used to amplify by polymerase chain reaction a 1050 base pair DNA fragment from xylem cDNA. Nucleotide sequence from the cloned DNA fragment coded for the N-terminal protein sequence and an internal peptide of CAD. The N-terminal protein sequence has little similarity with the λCAD4 clone isolated from bean (MH Walter, J Grima-Pettenati, C Grand, AM Boudet, CJ Lamb [1988] Proc Natl Acad Sci USA 86:5546-5550), which has homology with malic enzyme. PMID:16668801
Improved detection of DNA-binding proteins via compression technology on PSSM information.
Wang, Yubo; Ding, Yijie; Guo, Fei; Wei, Leyi; Tang, Jijun
2017-01-01
Since the importance of DNA-binding proteins in multiple biomolecular functions has been recognized, an increasing number of researchers are attempting to identify DNA-binding proteins. In recent years, machine learning methods have become increasingly compelling as the volume of protein sequence data soars, because of their favorable speed and accuracy. In this paper, we extract three features from the protein sequence, namely NMBAC (Normalized Moreau-Broto Autocorrelation), PSSM-DWT (Position-specific scoring matrix-Discrete Wavelet Transform), and PSSM-DCT (Position-specific scoring matrix-Discrete Cosine Transform). We also apply a feature selection algorithm to these feature vectors. Then, these features are fed into an SVM (support vector machine) model, trained as a classifier, to predict DNA-binding proteins. Our method uses three datasets, namely PDB1075, PDB594 and PDB186, to evaluate the performance of our approach. The PDB1075 and PDB594 datasets are employed for the Jackknife test and the PDB186 dataset is used for the independent test. Our method achieves the best accuracy in the Jackknife test, from 79.20% to 86.23% and 80.5% to 86.20% on PDB1075 and PDB594 datasets, respectively. In the independent test, the accuracy of our method comes to 76.3%. The performance of the independent test also shows that our method has a certain ability to be effectively used for DNA-binding protein prediction. The data and source code are at https://doi.org/10.6084/m9.figshare.5104084.
The complete chloroplast genome of Sinopodophyllum hexandrum Ying (Berberidaceae).
Meng, Lihua; Liu, Ruijuan; Chen, Jianbing; Ding, Chenxu
2017-05-01
The complete nucleotide sequence of the Sinopodophyllum hexandrum Ying chloroplast genome (cpDNA) was determined based on next-generation sequencing technologies in this study. The genome was 157 203 bp in length, containing a pair of inverted repeat (IRa and IRb) regions of 25 960 bp each, which were separated by a large single-copy (LSC) region of 87 065 bp and a small single-copy (SSC) region of 18 218 bp. The cpDNA contained 148 genes, including 96 protein-coding genes, 8 ribosomal RNA genes, and 44 tRNA genes. Of these genes, eight harbored a single intron, and two (ycf3 and clpP) contained two introns. The AT content of the S. hexandrum cpDNA is 61.5%.
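The region lengths and gene counts in the abstract above can be cross-checked against the reported totals, since a chloroplast genome with this layout consists of one LSC, one SSC and two copies of the inverted repeat. A small check follows, with variable names chosen here for illustration.

```python
# Region lengths in base pairs, as stated in the abstract.
lsc = 87_065   # large single-copy region
ssc = 18_218   # small single-copy region
ir = 25_960    # one inverted repeat; the genome carries two (IRa and IRb)

total_length = lsc + ssc + 2 * ir

# Gene counts by class, as stated in the abstract.
protein_coding, rrna, trna = 96, 8, 44
total_genes = protein_coding + rrna + trna

print(total_length, total_genes)
```

Both sums agree with the figures in the abstract (157 203 bp and 148 genes).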
Analysis of 16S-23S rRNA intergenic spacer regions of Vibrio cholerae and Vibrio mimicus.
Chun, J; Huq, A; Colwell, R R
1999-05-01
Vibrio cholerae identification based on molecular sequence data has been hampered by a lack of sequence variation from the closely related Vibrio mimicus. The two species share many genes coding for proteins, such as ctxAB, and show almost identical 16S DNA coding for rRNA (rDNA) sequences. Primers targeting conserved sequences flanking the 3' end of the 16S and the 5' end of the 23S rDNAs were used to amplify the 16S-23S rRNA intergenic spacer regions of V. cholerae and V. mimicus. Two major (ca. 580 and 500 bp) and one minor (ca. 750 bp) amplicons were consistently generated for both species, and their sequences were determined. The largest fragment contains three tRNA genes (tDNAs) coding for tRNAGlu, tRNALys, and tRNAVal, which has not previously been found in bacteria examined to date. The 580-bp amplicon contained tDNAIle and tDNAAla, whereas the 500-bp fragment had single tDNA coding either tRNAGlu or tRNAAla. Little variation, i.e., 0 to 0.4%, was found among V. cholerae O1 classical, O1 El Tor, and O139 epidemic strains. Slightly more variation was found against the non-O1/non-O139 serotypes (ca. 1% difference) and V. mimicus (2 to 3% difference). A pair of oligonucleotide primers were designed, based on the region differentiating all of V. cholerae strains from V. mimicus. The PCR system developed was subsequently evaluated by using representatives of V. cholerae from environmental and clinical sources, and of other taxa, including V. mimicus. This study provides the first molecular tool for identifying the species V. cholerae.
Is a Genome a Codeword of an Error-Correcting Code?
Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo
2012-01-01
Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
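Identifying a sequence as a codeword amounts to showing that its syndrome under the code's parity-check matrix is zero. The sketch below illustrates that test for the textbook binary Hamming(7,4) code only. It is an assumption-laden illustration: the abstract does not give the code parameters, alphabet, or nucleotide labelling used by the authors, and none of those are reproduced here.

```python
# Parity-check matrix of the binary Hamming(7,4) code.
# Column j (1-based) is the number j written in binary, most significant bit on top.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(word):
    """Multiply the 7-bit word by H over GF(2)."""
    return [sum(h * w for h, w in zip(row, word)) % 2 for row in H]


def is_codeword(word):
    """A word is a codeword exactly when its syndrome is all zeros."""
    return not any(syndrome(word))


def error_position(word):
    """1-based position of a single flipped bit, or 0 if the syndrome is zero."""
    s = syndrome(word)
    return s[0] * 4 + s[1] * 2 + s[2]


codeword = [1, 1, 1, 0, 0, 0, 0]   # bits set at positions 1, 2, 3
corrupted = codeword.copy()
corrupted[4] ^= 1                  # flip the bit at position 5

print(is_codeword(codeword), is_codeword(corrupted), error_position(corrupted))
```

For a single-bit error the syndrome reads out the error position directly, which is what makes Hamming codes single-error-correcting.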
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.
Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred
2018-01-01
The design of highly diverse phage display libraries is based on assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
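The kind of profiling described above (unique DNA sequence counts from FASTQ reads and their translated peptides) can be sketched in a few lines. This is not PuLSE's implementation, which is a C++ program; it is a simplified illustration that assumes four-line FASTQ records, reads already trimmed to the randomized region in frame, and the standard genetic code. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def read_fastq_sequences(lines):
    """Yield the sequence line (second of every four) from FASTQ text lines."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def translate(dna):
    """Translate in frame 0; unknown codons become 'X', trailing bases are dropped."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


fastq_text = """@r1
ATGGCT
+
IIIIII
@r2
ATGGCT
+
IIIIII
@r3
ATGTGG
+
IIIIII
"""

dna_counts = Counter(read_fastq_sequences(fastq_text.splitlines()))
peptide_counts = Counter()
for seq, n in dna_counts.items():
    peptide_counts[translate(seq)] += n

print(dna_counts.most_common(), peptide_counts.most_common())
```

Comparing such observed counts with the frequencies expected from uniform base incorporation is the step that turns the tables into a randomness check; that comparison is left out of this sketch.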
Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.
García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús
2017-08-10
Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min; Na, Seong-In; Yoon, Seok-Hwan; Chun, Jongsik
2017-06-01
Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
Maruyama, Sandra Regina; Castro-Jorge, Luiza Antunes; Ribeiro, José Marcos Chaves; Gardinassi, Luiz Gustavo; Garcia, Gustavo Rocha; Brandão, Lucinda Giampietro; Rodrigues, Aline Rezende; Okada, Marcos Ituo; Abrão, Emiliana Pereira; Ferreira, Beatriz Rossetti; da Fonseca, Benedito Antonio Lopes; de Miranda-Santos, Isabel Kinney Ferreira
2013-01-01
Transcripts similar to those that encode the nonstructural (NS) proteins NS3 and NS5 from flaviviruses were found in a salivary gland (SG) complementary DNA (cDNA) library from the cattle tick Rhipicephalus microplus. Tick extracts were cultured with cells to enable the isolation of viruses capable of replicating in cultured invertebrate and vertebrate cells. Deep sequencing of the viral RNA isolated from culture supernatants provided the complete coding sequences for the NS3 and NS5 proteins and their molecular characterisation confirmed similarity with the NS3 and NS5 sequences from other flaviviruses. Despite this similarity, phylogenetic analyses revealed that this potentially novel virus may be a highly divergent member of the genus Flavivirus. Interestingly, we detected the divergent NS3 and NS5 sequences in ticks collected from several dairy farms widely distributed throughout three regions of Brazil. This is the first report of flavivirus-like transcripts in R. microplus ticks. This novel virus is a potential arbovirus because it replicated in arthropod and mammalian cells; furthermore, it was detected in a cDNA library from tick SGs and therefore may be present in tick saliva. It is important to determine whether and by what means this potential virus is transmissible and to monitor the virus as a potential emerging tick-borne zoonotic pathogen. PMID:24626302
Montandon, P E; Vasserot, A; Stutz, E
1986-01-01
We retrieved a 1.6 kbp intron separating two exons of the psb C gene which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns with an invariant base A as possible branch point for lariat formation.
Freije, J M; Díez-Itza, I; Balbín, M; Sánchez, L M; Blasco, R; Tolivia, J; López-Otín, C
1994-06-17
A cDNA coding for a new human matrix metalloproteinase (MMP) has been cloned from a cDNA library derived from a breast tumor. The isolated cDNA contains an open reading frame coding for a polypeptide of 471 amino acids. The predicted protein sequence displays extensive similarity to the previously known MMPs and presents all the structural features characteristic of the members of this protein family, including the well conserved PRCGXPD motif, involved in the latency of the enzyme and the zinc-binding domain (HEXGHXXXXXHS). In addition, this novel human MMP contains in its amino acid sequence several residues specific to the collagenase subfamily (Tyr-214, Asp-235, and Gly-237) and lacks the 9-residue insertion present in the stromelysins. According to these structural characteristics, the MMP described herein has been tentatively called collagenase-3, since it represents the third member of this subfamily, composed at present of fibroblast and neutrophil collagenases. The collagenase-3 cDNA was expressed in a vaccinia virus system, and the recombinant protein was able to degrade fibrillar collagens, providing support to the hypothesis that the isolated cDNA codes for an authentic collagenase. Northern blot analysis of RNA from normal and pathological tissues demonstrated the existence in breast tumors of three different mRNA species, which seem to be the result of the utilization of different polyadenylation sites present in the 3'-noncoding region of the gene. By contrast, no collagenase-3 mRNA was detected either by Northern blot or RNA polymerase chain reaction analysis with RNA from other human tissues, including normal breast, mammary fibroadenomas, liver, placenta, ovary, uterus, prostate, and parotid gland. On the basis of the increased expression of collagenase-3 in breast carcinomas and the absence of detectable expression in normal tissues, a possible role for this metalloproteinase in the tumoral process is proposed.
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.
Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A
1993-01-01
The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. PMID:7909943
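The consensus stated above (A, A/T, T, A/T, A, T, A/G) translates directly into a seven-position pattern. The sketch below scans one strand of a sequence for it; the test sequence is made up for illustration, and a practical scan would also search the reverse complement.

```python
import re

# A, A/T, T, A/T, A, T, A/G written as a regular expression.
# The lookahead makes overlapping matches visible.
CDXA_CONSENSUS = re.compile(r"(?=(A[AT]T[AT]AT[AG]))")


def find_sites(seq):
    """Return (0-based start, matched 7-mer) for every consensus match on this strand."""
    return [(m.start(), m.group(1)) for m in CDXA_CONSENSUS.finditer(seq.upper())]


sites = find_sites("GGATTTATGCCAATAATACC")  # hypothetical sequence
print(sites)
```

An exact-match scan like this treats all consensus positions as equally important; the different binding affinities reported in the abstract suggest that a weighted model would describe the data better.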
Clément, Nathalie; Avalosse, Bernard; El Bakkouri, Karim; Velu, Thierry; Brandenburger, Annick
2001-01-01
The production of wild-type-free stocks of recombinant parvovirus minute virus of mice [MVM(p)] is difficult due to the presence of homologous sequences in vector and helper genomes that cannot easily be eliminated from the overlapping coding sequences. We have therefore cloned and sequenced spontaneously occurring defective particles of MVM(p) with very small genomes to identify the minimal cis-acting sequences required for DNA amplification and virus production. One of them has lost all capsid-coding sequences but is still able to replicate in permissive cells when nonstructural proteins are provided in trans by a helper plasmid. Vectors derived from this particle produce stocks with no detectable wild-type MVM after cotransfection with new, matched, helper plasmids that present no homology downstream from the transgene. PMID:11152501
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression
Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang
2007-01-01
Background Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
Liu, Changqing; Liu, Dan; Guo, Yu; Lu, Taofeng; Li, Xiangchen; Zhang, Minghai; Ma, Jianzhang; Ma, Yuehui; Guan, Weijun
2013-01-01
In this study, a full-length enriched cDNA library was successfully constructed from the Bengal tiger, Panthera tigris tigris, one of the best-known wild animals. Total RNA was extracted from Bengal tiger fibroblasts cultured in vitro. The titers of the primary and amplified libraries were 1.28 × 10^6 pfu/mL and 1.56 × 10^9 pfu/mL, respectively. The percentage of recombinants in the unamplified library was 90.2% and the average length of exogenous inserts was 0.98 kb. A total of 212 individual ESTs with sizes ranging from 356 to 1108 bp were then analyzed. The BLASTX score revealed that 48.1% of the sequences were classified as a strong match, 45.3% as nominal and 6.6% as a weak match. Among the ESTs with known putative function, 26.4% ESTs were found to be related to all kinds of metabolisms, 19.3% ESTs to information storage and processing, 11.3% ESTs to posttranslational modification, protein turnover, chaperones, 11.3% ESTs to transport, 9.9% ESTs to signal transducer/cell communication, 9.0% ESTs to structure protein, 3.8% ESTs to cell cycle, and only 6.6% ESTs classified as novel genes. By EST sequencing, a full-length gene coding for ferritin was identified and characterized. The recombinant plasmid pET32a-TAT-Ferritin was constructed; it codes for the TAT-Ferritin fusion protein with two 6× His-tags, at the N- and C-termini. By BCA assay, the concentration of soluble Trx-TAT-Ferritin recombinant protein was 2.32 ± 0.12 mg/mL. These results demonstrated that the reliability and representativeness of the cDNA library met the requirements of a standard cDNA library. This library provided a useful platform for the functional genome and transcriptome research of Bengal tigers. PMID:23708105
Otsuki, Tetsuji; Ota, Toshio; Nishikawa, Tetsuo; Hayashi, Koji; Suzuki, Yutaka; Yamamoto, Jun-ichi; Wakamatsu, Ai; Kimura, Kouichi; Sakamoto, Katsuhiko; Hatano, Naoto; Kawai, Yuri; Ishii, Shizuko; Saito, Kaoru; Kojima, Shin-ichi; Sugiyama, Tomoyasu; Ono, Tetsuyoshi; Okano, Kazunori; Yoshikawa, Yoko; Aotsuka, Satoshi; Sasaki, Naokazu; Hattori, Atsushi; Okumura, Koji; Nagai, Keiichi; Sugano, Sumio; Isogai, Takao
2005-01-01
We have developed an in silico method of selection of human full-length cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. Fullness rates were increased to about 80% by combination of the oligo-capping method and ATGpr, software for prediction of translation start point and the coding potential. Then, using 5'-end single-pass sequences, cDNAs having the signal sequence were selected by PSORT ('signal sequence trap'). We also applied 'secretion or membrane protein-related keyword trap' based on the result of BLAST search against the SWISS-PROT database for the cDNAs which could not be selected by PSORT. Using the above procedures, 789 cDNAs were primarily selected and subjected to full-length sequencing, and 334 of these cDNAs were finally selected as novel. Most of the cDNAs (295 cDNAs: 88.3%) were predicted to encode secretion or membrane proteins. In particular, 165(80.5%) of the 205 cDNAs selected by PSORT were predicted to have signal sequences, while 70 (54.2%) of the 129 cDNAs selected by 'keyword trap' preserved the secretion or membrane protein-related keywords. Many important cDNAs were obtained, including transporters, receptors, and ligands, involved in significant cellular functions. Thus, an efficient method of selecting secretion or membrane protein-encoding cDNAs was developed by combining the above four procedures.
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity
NASA Astrophysics Data System (ADS)
Mukherjee, Shashi Bajaj; Sen, Pradip Kumar
2010-10-01
The study of periodic patterns would be expected to be a standard line of attack for gene identification and similar problems in DNA sequence analysis, yet surprisingly little significant work has been done in this direction. This paper studies the statistical properties of DNA sequences from complete genomes using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings, and standard Fourier techniques are applied to study the periodicity. Distinct statistical behaviour of the periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between them. Here, DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
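The mapping-plus-Fourier idea in this abstract can be illustrated with a minimal Python sketch. It assumes the simplest common mapping (four binary indicator sequences, one per base) and measures spectral power at period 3; the paper's own mappings and periodicity parameters are not given in the abstract, so the function names and the toy sequences below are hypothetical.

```python
import cmath
import random


def indicator_sequences(dna):
    """Map a DNA string to four binary indicator sequences, one per base."""
    return {base: [1.0 if ch == base else 0.0 for ch in dna] for base in "ACGT"}


def power_at_period(dna, period=3):
    """Sum over the four bases of |DFT|^2 at frequency 1/period, divided by N^2."""
    n = len(dna)
    total = 0.0
    for seq in indicator_sequences(dna).values():
        coeff = sum(x * cmath.exp(-2j * cmath.pi * t / period)
                    for t, x in enumerate(seq))
        total += abs(coeff) ** 2
    return total / (n * n)


# Toy "coding-like" sequence: every codon starts with G (an assumption made
# purely for illustration), so base usage depends on codon position.
coding_like = "GCTGAAGCCGAT" * 25

# Shuffling keeps the base composition but destroys the positional structure.
letters = list(coding_like)
random.Random(1).shuffle(letters)
shuffled = "".join(letters)

print(power_at_period(coding_like), power_at_period(shuffled))
```

The codon-structured sequence gives a much larger period-3 power than its shuffled counterpart, which is the kind of contrast a coding/non-coding classifier could threshold on.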
2014-01-01
The linear algebraic concept of a subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors utilize the noise subspace concept to find hidden periodicities in DNA sequences. With the vast growth of genomic sequence data, the demand to identify protein-coding regions in DNA accurately is rising. Several DNA feature extraction techniques drawing on various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boore, Jeffrey L.; Medina, Monica; Rosenberg, Lewis A.
2004-01-31
We have determined the complete sequence of the mitochondrial genome of the scaphopod mollusk Graptacme eborea (Conrad, 1846) (14,492 nts) and completed the sequence of the mitochondrial genome of the bivalve mollusk Mytilus edulis Linnaeus, 1758 (16,740 nts). (The name Graptacme eborea is a revision of the species formerly known as Dentalium eboreum.) G. eborea mtDNA contains the 37 genes that are typically found and has the genes divided about evenly between the two strands, but M. edulis contains an extra trnM and is missing atp8, and has all genes on the same strand. Each has a highly rearranged gene order relative to each other and to all other studied mtDNAs. G. eborea mtDNA has almost no strand skew, but the coding strand of M. edulis mtDNA is very rich in G and T. This is reflected in differential codon usage patterns and even in amino acid compositions. G. eborea mtDNA has fewer non-coding nucleotides than any other mtDNA studied to date, with the largest non-coding region being only 24 nt long. Phylogenetic analysis using 2,420 aligned amino acid positions of concatenated proteins weakly supports an association of the scaphopod with gastropods to the exclusion of Bivalvia, Cephalopoda, and Polyplacophora, but is generally unable to convincingly resolve the relationships among major groups of the Lophotrochozoa, in contrast to the good resolution seen for several other major metazoan groups.
Roux-Rouquie, Magali; Marilley, Monique
2000-01-01
We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X.laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) was determined by next-generation sequencing. The assembled mitogenome, consisting of 16,829 bp, has the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% A, 29.3% C, 15.5% G and 27.2% T. The complete mitogenome may provide essential DNA molecular data for further population, phylogenetic and evolutionary analysis of Mugilidae.
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-11-01
In this study, the complete mitogenome sequence of largescale mullet (Teleostei: Mugilidae) has been sequenced by the next-generation sequencing method. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs genes, and a non-coding control region of D-loop. D-loop which has a length of 1094 bp is located between tRNA-Pro and tRNA-Phe. The overall base composition of largescale mullet is 27.8% for A, 30.1% for C, 16.2% for G, and 25.9% for T. The complete mitogenome may provide essential and important DNA molecular data for further phylogenetic and evolutionary analysis for Mugilidae.
New progress in snake mitochondrial gene rearrangement.
Chen, Nian; Zhao, Shujin
2009-08-01
To further understand the evolution of snake mitochondrial genomes, the complete mitochondrial DNA (mtDNA) sequences were determined for representative species from two snake families: the Many-banded krait, the Banded krait, the Chinese cobra, the King cobra, the Hundred-pace viper, the Short-tailed mamushi, and the Chain viper. Thirteen protein-coding genes, 22-23 tRNA genes, 2 rRNA genes, and 2 control regions were identified in these mtDNAs. Duplication of the control region and translocation of the tRNAPro gene were two notable features of the snake mtDNAs. These results from the gene rearrangement comparisons confirm the correctness of traditional classification schemes and validate the utility of comparing complete mtDNA sequences for snake phylogeny reconstruction.
He, Zhang-Ping; Dai, Xia-Bin; Zhang, Shuai; Zhi, Ting-Ting; Lun, Zhao-Rong; Wu, Zhong-Dao; Yang, Ting-Bao
2016-01-01
The whole sequence (15,057 bp) of the mitochondrial DNA (mtDNA) of the terrestrial snail Achatina fulica (order Stylommatophora) was determined. The mitogenome, like typical metazoan mtDNA, contains 13 protein-coding genes (PCG), 2 ribosomal RNA genes (rRNA) and 22 transfer RNA genes (tRNA). The tRNA genes include two trnS genes without the standard secondary structure. Interestingly, for the first time among the known mitogenomes of Pulmonata species, we characterized a lengthy unassigned sequence (551 bp) between cox1 and trnV, which may be the control region (CR) given its AT bias (65.70%) and potential hairpin structure.
Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F
1985-01-01
Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover, a small duplication occurred at the very end of the coding region of the L1b gene, which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively, the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch, as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and that this regulation involves a block of the normal splicing of some intron sequences. PMID:3841512
Exploring Accelerating Science Applications with FPGAs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Storaasli, Olaf O; Strenski, Dave
2007-01-01
FPGA hardware and tools (VHDL, Viva, MitrionC and CHiMPS) are described. FPGA performance is evaluated on two Cray XD1 systems (Virtex-II Pro 50 and Virtex-4 LX160) for human genome (DNA and protein) sequence comparisons for a computational biology code (FASTA). Scalable FPGA speedups of 50X (Virtex-II) and 100X (Virtex-4) over a 2.2 GHz Opteron were achieved. Coding and IO issues faced for human genome data are described.
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Decoding DNA labels by melting curve analysis using real-time PCR.
Balog, József A; Fehér, Liliána Z; Puskás, László G
2017-12-01
Synthetic DNA has been used as an authentication code for a diverse number of applications. However, existing decoding approaches are based on either DNA sequencing or the determination of DNA length variations. Here, we present a simple alternative protocol for labeling different objects using a small number of short DNA sequences that differ in their melting points. Code amplification and decoding can be done in two steps using quantitative PCR (qPCR). To obtain a DNA barcode with high complexity, we defined 8 template groups, each having 4 different DNA templates, yielding 15^8 (>2.5 billion) combinations of different individual melting temperature (Tm) values and corresponding ID codes. The reproducibility and specificity of the decoding was confirmed by using the most complex template mixture, which had 32 different products in 8 groups with different Tm values. The industrial applicability of our protocol was also demonstrated by labeling a drone with an oil-based paint containing a predefined DNA code, which was then successfully decoded. The method presented here consists of a simple code system based on a small number of synthetic DNA sequences and a cost-effective, rapid decoding protocol using a few qPCR reactions, enabling a wide range of authentication applications.
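One way to read the combination count quoted in this abstract is that each of the 8 groups contributes any non-empty subset of its 4 templates, giving 2^4 - 1 = 15 choices per group and 15 to the power 8 codes overall. That reading is an assumption, not something the abstract states. The Python sketch below counts the codes under that assumption and shows a hypothetical base-15 mapping between an integer ID and the templates present in each group; the authors' actual ID scheme may differ.

```python
from itertools import combinations

TEMPLATES_PER_GROUP = 4   # from the abstract
GROUPS = 8                # from the abstract


def nonempty_subsets(n):
    """All non-empty subsets of n templates, as tuples of template indices."""
    return [c for r in range(1, n + 1) for c in combinations(range(n), r)]


# Assumed reading: each group carries one non-empty subset of its templates.
SUBSETS = nonempty_subsets(TEMPLATES_PER_GROUP)
TOTAL_CODES = len(SUBSETS) ** GROUPS


def encode(code_id):
    """Write code_id in base 15; digit g picks the template subset of group g."""
    if not 0 <= code_id < TOTAL_CODES:
        raise ValueError("code_id out of range")
    groups = []
    for _ in range(GROUPS):
        code_id, digit = divmod(code_id, len(SUBSETS))
        groups.append(SUBSETS[digit])
    return groups


def decode(groups):
    """Inverse of encode: recover the integer ID from the per-group subsets."""
    code_id = 0
    for subset in reversed(groups):
        code_id = code_id * len(SUBSETS) + SUBSETS.index(tuple(subset))
    return code_id


print(len(SUBSETS), TOTAL_CODES)
```

In a real decoding run, the subset for each group would come from which melting-temperature peaks are observed in that group's qPCR reaction.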
Samuels, David C.; Boys, Richard J.; Henderson, Daniel A.; Chinnery, Patrick F.
2003-01-01
We applied a hidden Markov model segmentation method to the human mitochondrial genome to identify patterns in the sequence, to compare these patterns to the gene structure of mtDNA and to see whether these patterns reveal additional characteristics important for our understanding of genome evolution, structure and function. Our analysis identified three segmentation categories based upon the sequence transition probabilities. Category 2 segments corresponded to the tRNA and rRNA genes, with a greater strand-symmetry in these segments. Category 1 and 3 segments covered the protein-coding genes and almost all of the non-coding D-loop. Compared to category 1, the mtDNA segments assigned to category 3 had much lower guanine abundance. A comparison to two independent databases of mitochondrial mutations and polymorphisms showed that the high substitution rate of guanine in human mtDNA is largest in the category 3 segments. Analysis of synonymous mutations showed the same pattern. This suggests that this heterogeneity in the mutation rate is partly independent of respiratory chain function and is a direct property of the genome sequence itself. This has important implications for our understanding of mtDNA evolution and its use as a ‘molecular clock’ to determine the rate of population and species divergence. PMID:14530452
The complete chloroplast genome sequence of Dianthus superbus var. longicalycinus.
Gurusamy, Raman; Lee, Do-Hyung; Park, SeonJoo
2016-05-01
The complete chloroplast genome (cpDNA) sequence of Dianthus superbus var. longicalycinus, an economically important traditional Chinese medicine, was reported and characterized. The cpDNA of Dianthus superbus var. longicalycinus is 149,539 bp in length, with 36.3% GC content. A pair of inverted repeats (IRs) of 24,803 bp is separated by a large single-copy region (LSC, 82,805 bp) and a small single-copy region (SSC, 17,128 bp). It encodes 85 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Of the 129 individual genes, 13 contain one intron and three contain two introns.
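The figures in this abstract can be cross-checked with simple arithmetic: a quadripartite chloroplast genome consists of the large single-copy region, the small single-copy region and two copies of the inverted repeat. The short Python sketch below uses only the numbers quoted in the abstract; the function name is illustrative.

```python
# Region lengths in bp, as quoted in the abstract.
LSC = 82_805
SSC = 17_128
IR = 24_803
REPORTED_TOTAL = 149_539

# Gene counts, as quoted in the abstract.
PROTEIN_CODING, TRNA, RRNA = 85, 36, 8
REPORTED_GENES = 129


def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


print(plastome_length(LSC, SSC, IR) == REPORTED_TOTAL)
print(PROTEIN_CODING + TRNA + RRNA == REPORTED_GENES)
```

Both checks agree with the reported totals, so the region sizes and gene counts in the abstract are internally consistent.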
Cloning of cDNA of major antigen of foot and mouth disease virus and expression in E. coli
NASA Astrophysics Data System (ADS)
Küpper, Hans; Keller, Walter; Kurz, Christina; Forss, Sonja; Schaller, Heinz
1981-02-01
Double-stranded DNA copies of the single-stranded genomic RNA of foot and mouth disease virus have been cloned into the Escherichia coli plasmid pBR322. A restriction map of the viral genome was established and aligned with the biochemical map of foot and mouth disease virus. The coding sequence for structural protein VP1, the major antigen of the virus, was identified and inserted into a plasmid vector where the expression of this sequence is under control of the phage λ PL promoter. In an appropriate host the synthesis of antigenic polypeptide can be demonstrated by radioimmunoassay.
Characterization of a periplasmic S1-like nuclease coded by the Mesorhizobium loti symbiosis island
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pimkin, Maxim; Miller, C. Glenn; Blakesley, Lauryn
DNA sequences encoding hypothetical proteins homologous to S1 nuclease from Aspergillus oryzae are found in many organisms including fungi, plants, pathogenic bacteria, and eukaryotic parasites. One of these is the M1 nuclease of Mesorhizobium loti which we demonstrate herein to be an enzymatically active, soluble, and stable S1 homolog that lacks the extensive mannosyl-glycosylation found in eukaryotic S1 nuclease homologs. We have expressed the cloned M1 protein in M. loti and purified recombinant native M1 to near homogeneity and have also isolated a homogeneous M1 carboxy-terminal hexahistidine tag fusion protein. Mass spectrometry and N-terminal Edman degradation sequencing confirmed the protein identity. The enzymatic properties of the purified M1 nuclease are similar to those of S1. At acidic pH M1 is 25 times more active on single-stranded DNA than on double-stranded DNA and 3 times more active on single-stranded DNA than on single-stranded RNA. At neutral pH the RNase activity of M1 exceeds the DNase activity. M1 nicks supercoiled RF-I plasmid DNA and rapidly cuts the phosphodiester bond across from the nick in the resultant relaxed RF-II plasmid DNA. Therefore, M1 represents an active bacterial S1 homolog in spite of great sequence divergence. The biochemical characterization of M1 nuclease supports our sequence alignment that reveals the minimal 21 amino acid residues that are necessarily conserved for the structure and functions of this enzyme family. The ability of M1 to degrade RNA at neutral pH implies previously unappreciated roles of these nucleases in biological systems.
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize; Zhao, Yun; Zhao, Hai
2017-01-01
Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds.
Phylogenic study of Lemnoideae (duckweeds) through complete chloroplast genomes for eight accessions
Ding, Yanqiang; Fang, Yang; Guo, Ling; Li, Zhidan; He, Kaize
2017-01-01
Background Phylogenetic relationship within different genera of Lemnoideae, a kind of small aquatic monocotyledonous plants, was not well resolved, using either morphological characters or traditional markers. Given that rich genetic information in chloroplast genome makes them particularly useful for phylogenetic studies, we used chloroplast genomes to clarify the phylogeny within Lemnoideae. Methods DNAs were sequenced with next-generation sequencing. The duckweeds chloroplast genomes were indirectly filtered from the total DNA data, or directly obtained from chloroplast DNA data. To test the reliability of assembling the chloroplast genome based on the filtration of the total DNA, two methods were used to assemble the chloroplast genome of Landoltia punctata strain ZH0202. A phylogenetic tree was built on the basis of the whole chloroplast genome sequences using MrBayes v.3.2.6 and PhyML 3.0. Results Eight complete duckweeds chloroplast genomes were assembled, with lengths ranging from 165,775 bp to 171,152 bp, and each contains 80 protein-coding sequences, four rRNAs, 30 tRNAs and two pseudogenes. The identity of L. punctata strain ZH0202 chloroplast genomes assembled through two methods was 100%, and their sequences and lengths were completely identical. The chloroplast genome comparison demonstrated that the differences in chloroplast genome sizes among the Lemnoideae primarily resulted from variation in non-coding regions, especially from repeat sequence variation. The phylogenetic analysis demonstrated that the different genera of Lemnoideae are derived from each other in the following order: Spirodela, Landoltia, Lemna, Wolffiella, and Wolffia. Discussion This study demonstrates potential of whole chloroplast genome DNA as an effective option for phylogenetic studies of Lemnoideae. 
It also showed the possibility of using chloroplast DNA data to elucidate those phylogenies which were not yet solved well by traditional methods even in plants other than duckweeds. PMID:29302399
The DNA Methylome of Human Peripheral Blood Mononuclear Cells
Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing
2010-01-01
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-02-20
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Alawad, Abdullah; Alharbi, Sultan; Alhazzaa, Othman; Alagrafi, Faisal; Alkhrayef, Mohammed; Alhamdan, Ziyad; Alenazi, Abdullah; Al-Johi, Hasan; Alanazi, Ibrahim O; Hammad, Mohamed
2016-01-01
Although Sox2 cDNA sequence information is available for many mammals, the Sox2 cDNA of Camelus dromedarius has not yet been characterized. The objective of this study was to sequence and characterize Sox2 cDNA from the brain of C. dromedarius (also known as the Arabian camel). For the first time, a full coding sequence of the Sox2 gene from the brain of C. dromedarius was amplified by reverse transcription PCR and then sequenced using the 3730XL series platform Sequencer (Applied Biosystem). The cDNA sequence displayed an open reading frame of 822 nucleotides, encoding a protein of 273 amino acids. The molecular weight and the isoelectric point of the translated protein were calculated as 29.825 kDa and 10.11, respectively, using bioinformatics analysis. The predicted cSox2 protein sequence exhibited high identity: 99% for Homo sapiens, Mus musculus, Bos taurus, and Vicugna pacos; 98% for Sus scrofa and 93% for Camelus ferus. A 3D structure was built based on the available crystal structure of the HMG-box domain of human stem cell transcription factor Sox2 (PDB: 2LE4) with 81 residues, and with structure prediction software for all 273 amino acid residues. The comparison confirms the presence of the HMG-box domain in the cSox2 protein. The orthologous phylogenetic analysis showed that the Sox2 isoform from C. dromedarius grouped with those of humans, alpacas, cattle, and pigs. We believe that this genetic and structural information will be a helpful source for annotation. Furthermore, Sox2 is one of the transcription factors that contribute to the generation of induced pluripotent stem cells (iPSCs), which in turn will probably help generate camel induced pluripotent stem cells (CiPSCs).
Hiding message into DNA sequence through DNA coding and chaotic maps.
Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman
2014-09-01
The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: the secret message is encoded by DNA coding, encrypted with a pseudo-random sequence, assigned relative hiding locations generated by a piecewise linear chaotic map, and finally embedded into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
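Two of the ingredients named in this abstract, DNA coding and a piecewise linear chaotic map, can be sketched in Python as follows. The 2-bit-per-base rule, the parameter values and the way map values are turned into gaps are all assumptions chosen for illustration; the abstract does not specify them, and the encryption and complementary-rule embedding steps are not reproduced here.

```python
# Assumed DNA coding rule: two bits per base. The paper's rule may differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_message(message: bytes) -> str:
    """Encode bytes as a DNA string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in message)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_message(dna: str) -> bytes:
    """Inverse of encode_message."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def pwlcm(x: float, p: float) -> float:
    """One step of the piecewise linear chaotic map on [0, 1], with 0 < p < 0.5."""
    if x >= 0.5:
        x = 1.0 - x          # the map is symmetric about x = 0.5
    if x < p:
        return x / p
    return (x - p) / (0.5 - p)


def chaotic_gaps(x0: float, p: float, count: int, max_gap: int) -> list:
    """Hypothetical helper: turn successive map values into gaps in 1..max_gap."""
    gaps, x = [], x0
    for _ in range(count):
        x = pwlcm(x, p)
        gaps.append(1 + min(int(x * max_gap), max_gap - 1))
    return gaps


dna = encode_message(b"hi")
print(dna, decode_message(dna), chaotic_gaps(0.37, 0.29, 5, 8))
```

The initial value and control parameter of the map act as part of the secret key in schemes of this kind, since the receiver must regenerate the same gap sequence to locate the hidden bases.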
DNA as a Binary Code: How the Physical Structure of Nucleotide Bases Carries Information
ERIC Educational Resources Information Center
McCallister, Gary
2005-01-01
The DNA triplet code also functions as a binary code. Because double-ring compounds cannot bind to double-ring compounds in the DNA code, the sequence of bases classified simply as purines or pyrimidines can encode for smaller groups of possible amino acids. This is an intuitive approach to teaching the DNA code. (Contains 6 figures.)
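The idea that a purine/pyrimidine pattern narrows the set of possible amino acids can be illustrated with a short sketch. The choice of 1 for purines and 0 for pyrimidines, and the function names, are assumptions made for illustration; the codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

PURINES = {"A", "G"}        # double-ring bases
PYRIMIDINES = {"C", "T"}    # single-ring bases


def to_binary(seq: str) -> str:
    """Map purines to '1' and pyrimidines to '0' (assumed convention)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("1")
        elif base in PYRIMIDINES:
            out.append("0")
        else:
            raise ValueError(f"unexpected base: {base}")
    return "".join(out)


def amino_acids_for_pattern(pattern: str) -> list:
    """Amino acids (or '*' for stop) whose codons match a binary pattern."""
    return sorted({aa for codon, aa in CODON_TABLE.items()
                   if to_binary(codon) == pattern})


print(to_binary("ATGC"))               # 1010
print(amino_acids_for_pattern("000"))  # ['F', 'L', 'P', 'S']
```

An all-pyrimidine codon, for example, can only encode phenylalanine, leucine, proline, or serine, so the binary pattern alone restricts the candidates from twenty amino acids to four.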
Shaw, D R; Richter, H; Giorda, R; Ohmachi, T; Ennis, H L
1989-09-01
A Dictyostelium discoideum repetitive element composed of long repeats of the codon (AAC) is found in developmentally regulated transcripts. The concentration of (AAC) sequences is low in mRNA from dormant spores and growing cells and increases markedly during spore germination and multicellular development. The sequence hybridizes to many different sized Dictyostelium DNA restriction fragments indicating that it is scattered throughout the genome. Four cDNA clones isolated contain (AAC) sequences in the deduced coding region. Interestingly, the (AAC)-rich sequences are present in all three reading frames in the deduced proteins, i.e., AAC (asparagine), ACA (threonine) and CAA (glutamine). Three of the clones contain only one of these in-frame so that the individual proteins carry either asparagine, threonine, or glutamine clusters, not mixtures. However, one clone is both glutamine- and asparagine-rich. The (AAC) portion of the transcripts are reiterated 300 times in the haploid genome while the other portions of the cDNAs represent single copy genes, whose sequences show no similarity other than the (AAC) repeats. The repeated sequence is similar to the opa or M sequence found in Drosophila melanogaster notch and homeo box genes and in fly developmentally regulated transcripts. The transcripts are present on polysomes suggesting that they are translated. Although the function of these repeats is unknown, long amino acid repeats are a characteristic feature of extracellular proteins of lower eukaryotes.
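The observation that an (AAC) repeat yields asparagine, threonine, or glutamine depending on the reading frame can be reproduced with a small translation sketch. The function name is hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons of seq starting at offset frame (0, 1, or 2)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE[seq[i:i + 3]]
        for i in range(frame, len(seq) - 2, 3)
    )


repeat = "AAC" * 5
for frame in range(3):
    print(frame, translate(repeat, frame))
# 0 NNNNN   (AAC = asparagine)
# 1 TTTT    (ACA = threonine)
# 2 QQQQ    (CAA = glutamine)
```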
Kawano, Tomonori
2013-01-01
There have been a wide variety of approaches for handling the pieces of DNA as the “unplugged” tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, water marks and cryptography. In the present article, novel designs of artificial genes as the media for storing the digitally compressed data for images are proposed for bio-computing purpose while natural genes principally encode for proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given “passwords” and/or secret numbers using DNA sequences. The “passwords” of interest were decomposed into single letters and translated into the font image coded on the separate DNA chains with both the coding regions in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original “passwords.” The latter processes require the molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting at the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interests over the image-coding DNA are also discussed. PMID:23750303
Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick
1982-01-01
The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262
Cloning, sequencing, and expression of cDNA for human β-glucuronidase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oshima, A.; Kyle, J.W.; Miller, R.D.
1987-02-01
The authors report here the cDNA sequence for human placental β-glucuronidase (β-D-glucuronoside glucuronosohydrolase, EC 3.2.1.31) and demonstrate expression of the human enzyme in transfected COS cells. They also sequenced a partial cDNA clone from human fibroblasts that contained a 153-base-pair deletion within the coding sequence and found a second type of cDNA clone from placenta that contained the same deletion. Nuclease S1 mapping studies demonstrated two types of mRNAs in human placenta that corresponded to the two types of cDNA clones isolated. The NH2-terminal amino acid sequence determined for human spleen β-glucuronidase agreed with that inferred from the DNA sequence of the two placental clones, beginning at amino acid 23, suggesting a cleaved signal sequence of 22 amino acids. When transfected into COS cells, plasmids containing either placental clone expressed an immunoprecipitable protein that contained N-linked oligosaccharides as evidenced by sensitivity to endoglycosidase F. However, only transfection with the clone containing the 153-base-pair segment led to expression of human β-glucuronidase activity. These studies provide the sequence for the full-length cDNA for human β-glucuronidase, demonstrate the existence of two populations of mRNA for β-glucuronidase in human placenta, only one of which specifies a catalytically active enzyme, and illustrate the importance of expression studies in verifying that a cDNA is functionally full-length.
A murC gene in Porphyromonas gingivalis 381.
Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T
1995-09-01
The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing a gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned 51 kDa protein encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possibility of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to their much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Specific minor groove solvation is a crucial determinant of DNA binding site recognition
Harris, Lydia-Ann; Williams, Loren Dean; Koudelka, Gerald B.
2014-01-01
The DNA sequence preferences of nearly all sequence specific DNA binding proteins are influenced by the identities of bases that are not directly contacted by protein. Discrimination between non-contacted base sequences is commonly based on the differential abilities of DNA sequences to allow narrowing of the DNA minor groove. However, the factors that govern the propensity of minor groove narrowing are not completely understood. Here we show that the differential abilities of various DNA sequences to support formation of a highly ordered and stable minor groove solvation network are a key determinant of non-contacted base recognition by a sequence-specific binding protein. In addition, disrupting the solvent network in the non-contacted region of the binding site alters the protein's ability to recognize contacted base sequences at positions 5–6 bases away. This observation suggests that DNA solvent interactions link contacted and non-contacted base recognition by the protein. PMID:25429976
Comparative genomics of 9 novel Paenibacillus larvae bacteriophages
Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.
2016-01-01
ABSTRACT American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559
Liu, Guo-Hua; Li, Chun; Li, Jia-Yuan; Zhou, Dong-Hui; Xiong, Rong-Chuan; Lin, Rui-Qing; Zou, Feng-Cai; Zhu, Xing-Quan
2012-01-01
Sparganosis, caused by the plerocercoid larvae of members of the genus Spirometra, can cause significant public health problem and considerable economic losses. In the present study, the complete mitochondrial DNA (mtDNA) sequence of Spirometra erinaceieuropaei from China was determined, characterized and compared with that of S. erinaceieuropaei from Japan. The gene arrangement in the mt genome sequences of S. erinaceieuropaei from China and Japan is identical. The identity of the mt genomes was 99.1% between S. erinaceieuropaei from China and Japan, and the complete mtDNA sequence of S. erinaceieuropaei from China is slightly shorter (2 bp) than that from Japan. Phylogenetic analysis of S. erinaceieuropaei with other representative cestodes using two different computational algorithms [Bayesian inference (BI) and maximum likelihood (ML)] based on concatenated amino acid sequences of 12 protein-coding genes, revealed that S. erinaceieuropaei is closely related to Diphyllobothrium spp., supporting classification based on morphological features. The present study determined the complete mtDNA sequences of S. erinaceieuropaei from China that provides novel genetic markers for studying the population genetics and molecular epidemiology of S. erinaceieuropaei in humans and animals. PMID:22553464
Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon
2015-01-01
A previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses of coding region SNP markers and control region mutation motifs. In this study, we examined whether the same triple multiplex analysis of coding region SNPs is also applicable to ancient samples from East Asia as a complement to sequence analysis of the mtDNA control region. In Joseon skeleton samples, the mtDNA haplogroup determined by coding region SNP markers fell within the same haplogroup assigned by sequence analysis of the control region. Considering that control region mtDNA sequencing of ancient samples in previous studies produced a considerable number of errors, coding region SNP analysis can serve as a good complement to conventional haplogroup determination, especially for archaeological human bone samples buried underground over long periods. PMID:26345190
FragIdent--automatic identification and characterisation of cDNA-fragments.
Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin
2009-03-02
Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited for larger number of sequences and so far, no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA-fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
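The core task described here, locating open reading frames within a cDNA fragment, can be sketched in a few lines. This is a generic forward-strand ORF scan written for illustration, not FragIdent's actual algorithm; the function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Return (start, end) slices of ATG...stop ORFs in the three forward frames.

    min_codons is the minimum number of codons before the stop codon.
    The reverse strand is not scanned in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ORF
    return orfs


fragment = "CCATGAAATTTTAGGG"
for start, end in find_orfs(fragment):
    print(start, end, fragment[start:end])     # 2 14 ATGAAATTTTAG
```

A fragment whose ORF lacks a start or stop codon within the insert would be missed by this scan, which is one reason a tool of this kind would also rely on database searches such as BLAST to identify the gene.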
Transposable elements and G-quadruplexes.
Kejnovsky, Eduard; Tokan, Viktor; Lexa, Matej
2015-09-01
A significant part of eukaryotic genomes is formed by transposable elements (TEs) containing not only genes but also regulatory sequences. Some of the regulatory sequences located within TEs can form secondary structures like hairpins or three-stranded (triplex DNA) and four-stranded (quadruplex DNA) conformations. This review focuses on recent evidence showing that G-quadruplex-forming sequences in particular are often present in specific parts of TEs in plants and humans. We discuss the potential role of these structures in the TE life cycle as well as the impact of G-quadruplexes on replication, transcription, translation, chromatin status, and recombination. The aim of this review is to emphasize that TEs may serve as vehicles for the genomic spread of G-quadruplexes. These non-canonical DNA structures and their conformational switches may constitute another regulatory system that, together with small and long non-coding RNA molecules and proteins, contribute to the complex cellular network resulting in the large diversity of eukaryotes.
Liu, Guo-Hua; Wang, Yan; Xu, Min-Jun; Zhou, Dong-Hui; Ye, Yong-Gang; Li, Jia-Yuan; Song, Hui-Qun; Lin, Rui-Qing; Zhu, Xing-Quan
2012-12-01
For many years, whipworms (Trichuris spp.) have been described with a relatively narrow range of both morphological and biometrical features. Moreover, there has been insufficient discrimination between congeners (or closely related species). In the present study, we determined the complete mitochondrial (mt) genomes of two whipworms, Trichuris ovis and Trichuris discolor, compared them and then tested the hypothesis that T. ovis and T. discolor are distinct species by phylogenetic analyses (using Bayesian inference, maximum likelihood and maximum parsimony) based on the deduced amino acid sequences of the mt protein-coding genes. The complete mt genomes of T. ovis and T. discolor were 13,946 bp and 13,904 bp in size, respectively. Both mt genomes are circular, and consist of 37 genes, including 13 genes coding for proteins, 2 genes for rRNA, and 22 genes for tRNA. The gene content and arrangement are identical to those of the human and pig whipworms Trichuris trichiura and Trichuris suis. Taken together, these analyses showed genetic distinctiveness and strongly supported the recent proposal that T. ovis and T. discolor are distinct species using nuclear ribosomal DNA and a portion of the mtDNA sequence dataset. The availability of the complete mtDNA sequences of T. ovis and T. discolor provides novel genetic markers for studying the population genetics, diagnostics and molecular epidemiology of T. ovis and T. discolor. Copyright © 2012 Elsevier B.V. All rights reserved.
Veiga, Ana B. G.; Ribeiro, José M. C.; Guimarães, Jorge A.; Francischetti, Ivo M.B.
2010-01-01
Accidents with the caterpillar Lonomia obliqua are often associated with a coagulation disorder and hemorrhagic syndrome in humans. In the present study, we have constructed cDNA libraries from two venomous structures of the caterpillar, namely the tegument and the bristle. High-throughput sequencing and bioinformatics analyses were performed in parallel. Over one thousand cDNAs were obtained and clustered to produce a database of 538 contigs and singletons (clusters) for the tegument library and 368 for the bristle library. We have thus identified dozens of full-length cDNAs coding for proteins with sequence homology to snake venom prothrombin activator, trypsin-like enzymes, blood coagulation factors and prophenoloxidase cascade activators. We also report cDNA coding for cysteine proteases, Group III phospholipase A2, C-type lectins, lipocalins, in addition to protease inhibitors including serpins, Kazal-type inhibitors, cystatins and trypsin inhibitor-like molecules. Antibacterial proteins and housekeeping genes are also described. A significant number of sequences were devoid of database matches, suggesting that their biologic function remains to be defined. We also report the N-terminus of the most abundant proteins present in the bristle, tegument, hemolymph, and "cryosecretion". Thus, we have created a catalog that contains the predicted molecular weight, isoelectric point, accession number, and putative function for each selected molecule from the venomous structures of L. obliqua. The role of these molecules in the coagulation disorder and hemorrhagic syndrome caused by envenomation with this caterpillar is discussed. All sequence information and the Supplemental Data, including Figures and Tables with hyperlinks to FASTA-formatted files for each contig and the best match to the Databases, are available at http://www.ncbi.nih.gov/projects/omes. PMID:16023793
Fu, Cheng-Jie; Sheikh, Sanea; Miao, Wei; Andersson, Siv G E; Baldauf, Sandra L
2014-08-21
Discoba (Excavata) is an ancient group of eukaryotes with great morphological and ecological diversity. Unlike the other major divisions of Discoba (Jakobida and Euglenozoa), little is known about the mitochondrial DNAs (mtDNAs) of Heterolobosea. We have assembled a complete mtDNA genome from the aggregating heterolobosean amoeba, Acrasis kona, which consists of a single circular highly AT-rich (83.3%) molecule of 51.5 kb. Unexpectedly, A. kona mtDNA is missing roughly 40% of the protein-coding genes and nearly half of the transfer RNAs found in the only other sequenced heterolobosean mtDNAs, those of Naegleria spp. Instead, over a quarter of A. kona mtDNA consists of novel open reading frames. Eleven of the 16 protein-coding genes missing from A. kona mtDNA were identified in its nuclear DNA and polyA RNA, and phylogenetic analyses indicate that at least 10 of these 11 putative nuclear-encoded mitochondrial (NcMt) proteins arose by direct transfer from the mitochondrion. Acrasis kona mtDNA also employs C-to-U type RNA editing, and 12 homologs of DYW-type pentatricopeptide repeat (PPR) proteins implicated in plant organellar RNA editing are found in A. kona nuclear DNA. A mapping of mitochondrial gene content onto a consensus phylogeny reveals a sporadic pattern of relative stasis and rampant gene loss in Discoba. Rampant loss occurred independently in the unique common lineage leading to Heterolobosea + Tsukubamonadida and later in the unique lineage leading to Acrasis. Meanwhile, mtDNA gene content appears to be remarkably stable in the Acrasis sister lineage leading to Naegleria and in their distant relatives Jakobida. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun
2017-10-06
Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.
Small, G J; Hemingway, J
2000-12-01
Widespread resistance to organophosphorus insecticides (OPs) in Nilaparvata lugens is associated with elevation of carboxylesterase activity. A cDNA encoding a carboxylesterase, Nl-EST1, has been isolated from an OP-resistant Sri Lankan strain of N. lugens. The full-length cDNA codes for a 547-amino acid protein with high homology to other esterases/lipases. Nl-EST1 has an N-terminal hydrophobic signal peptide sequence of 24 amino acids which suggests that the mature protein is secreted from cells expressing it. The nucleotide sequence of the homologue of Nl-EST1 in an OP-susceptible, low esterase Sri Lankan strain of N. lugens is identical to Nl-EST1. Southern analysis of genomic DNA from the Sri Lankan OP-resistant and susceptible strains suggests that Nl-EST1 is amplified in the resistant strain. Therefore, resistance to OPs in the Sri Lankan strain is through amplification of a gene identical to that found in the susceptible strain.
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Winkfein, R J; Nishikawa, S; Connor, W; Dixon, G H
1993-07-01
A synthetic oligonucleotide primer, designed from marsupial protamine protein-sequence data [Balhorn, R., Corzett, M., Matrimas, J. A., Cummins, J. & Faden, B. (1989) Analysis of protamines isolated from two marsupials, the ring-tailed wallaby and gray short-tailed opossum, J. Cell. Biol. 107] was used to amplify, via the polymerase chain reaction, protamine sequences from a North American opossum (Didelphis marsupialis) cDNA. Using the amplified sequences as probes, several protamine cDNA clones were isolated. The protein sequence, predicted from the cDNA sequences, consisted of 57 amino acids, contained a large number of arginine residues and exhibited the sequence ARYR at its amino terminus, which is conserved in avian and most eutherian mammal protamines. Like the true protamines of trout and chicken, the opossum protamine lacked cysteine residues, distinguishing it from placental mammalian protamine 1 (P1 or stable) protamines. Examination of the protamine gene, isolated by polymerase-chain-reaction amplification of genomic DNA, revealed the presence of an intron dividing the protamine-coding region, a common characteristic of all mammalian P1 genes. In addition, extensive sequence identity in the 5' and 3' flanking regions between mouse and opossum sequences classify the marsupial protamine as being closely related to placental mammal P1. Protamine transcripts, in both birds and mammals, are present in two size classes, differing by the length of their poly(A) tails (either short or long). Examination of opossum protamine transcripts by Northern hybridization revealed four distinct mRNA species in the total RNA fraction, two of which were enriched in the poly(A)-rich fraction. Northern-blot analysis, using an intron-specific probe, revealed the presence of intron sequences in two of the four protamine transcripts. If expressed, the corresponding protein from intron-containing transcripts would differ from spliced transcripts by length (49 versus 57 amino acids) and would contain a cysteine residue.
Feldhoff, A; Wetzel, T; Peters, D; Kellner, R; Krczal, G
1998-01-01
With the introduction of cutting-grown Petunia x hybrida plants on the European market, a new potyvirus which showed no serological reaction with antisera against any other potyviruses infecting petunias was discovered. Infected leaves contained flexuous rod-shaped virus particles of 750-800 nm in length and inclusion bodies (pinwheel structures) typical for potyviruses in ultrathin leaf sections. The purified coat protein with a Mr of approximately 36 kDa could be detected in Western immunoblots with a specific antibody to the coat protein of the petunia-infecting virus. The 3' end of the viral genome encompassing the 3' non-coding region, the coat protein gene, and part of the NIb gene was amplified from infected leaf material by IC/PCR using degenerate and specific primers. Sequences of PCR-generated cDNA clones were compared to other known sequences of potyviruses. Maximum homology of 56% was found in the 3' non-coding region between the petunia isolate and other potyviruses. A maximum homology of 69% was found between the amino acid sequence of the coat protein of the petunia isolate and corresponding sequences of other potyviruses. These data indicate that the petunia-infecting virus is a previously undescribed potyvirus and the name petunia flower mottle virus (PetFMV) is suggested.
Characterization of an In Vivo Z-DNA Detection Probe Based on a Cell Nucleus Accumulating Intrabody.
Gulis, Galina; Silva, Izabel Cristina Rodrigues; Sousa, Herdson Renney; Sousa, Isabel Garcia; Bezerra, Maryani Andressa Gomes; Quilici, Luana Salgado; Maranhao, Andrea Queiroz; Brigido, Marcelo Macedo
2016-09-01
Left-handed Z-DNA is a physiologically unstable DNA conformation, and its existence in vivo can be attributed to localized torsional distress. Despite evidence for the existence of Z-DNA in vivo, its precise role in the control of gene expression is not fully understood. Here, an in vivo probe based on an anti-Z-DNA intrabody is proposed for native Z-DNA detection. The probe was used for chromatin immunoprecipitation of potential Z-DNA-forming sequences in the human genome. One of the isolated putative Z-DNA-forming sequences was cloned upstream of a reporter gene expression cassette under control of the CMV promoter. The reporter gene encoded an antibody fragment fused to GFP. Transient co-transfection of this vector along with the Z-probe coding vector improved reporter gene expression. This improvement was demonstrated by measuring reporter gene mRNA and protein levels and the amount of fluorescence in co-transfected CHO-K1 cells. These results suggest that the presence of the anti-Z-DNA intrabody can interfere with a Z-DNA-containing reporter gene expression. Therefore, this in vivo probe for the detection of Z-DNA could be used for global correlation of Z-DNA-forming sequences and gene expression regulation.
Brzeziński, K; Janowski, R; Podkowiński, J; Jaskólski, M
2001-01-01
The coding sequences of two S-adenosyl-L-homocysteine hydrolases (SAHases) were identified in yellow lupine by screening of a cDNA library. One of them, corresponding to the complete protein, was sequenced and compared with 52 other SAHase sequences. Phylogenetic analysis of these proteins identified three groups of the enzymes. Group A comprises only bacterial sequences. Group B is subdivided into two subgroups, one of which (B1) is formed by animal sequences. Subgroup B2 consists of two distinct clusters, B2a and B2b. Cluster B2b comprises all known plant sequences, including the yellow lupine enzyme, which are distinguished by a 50-residue insert. Group C is heterogeneous and contains SAHases from Archaea as well as a new class of animal enzymes, distinctly different from those in group B1.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Mitochondrial gene rearrangements confirm the parallel evolution of the crab-like form.
Morrison, C L; Harvey, A W; Lavery, S; Tieu, K; Huang, Y; Cunningham, C W
2002-01-01
The repeated appearance of strikingly similar crab-like forms in independent decapod crustacean lineages represents a remarkable case of parallel evolution. Uncertainty surrounding the phylogenetic relationships among crab-like lineages has hampered evolutionary studies. As is often the case, aligned DNA sequences by themselves were unable to fully resolve these relationships. Four nested mitochondrial gene rearrangements--including one of the few reported movements of an arthropod protein-coding gene--are congruent with the DNA phylogeny and help to resolve a crucial node. A phylogenetic analysis of DNA sequences, and gene rearrangements, supported five independent origins of the crab-like form, and suggests that the evolution of the crab-like form may be irreversible. This result supports the utility of mitochondrial gene rearrangements in phylogenetic reconstruction. PMID:11886621
ScaffoldSeq: Software for characterization of directed evolution populations.
Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J
2016-07-01
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
A gene variation of 14-3-3 zeta isoform in rat hippocampus.
Murakami, K; Situ, S Y; Eshete, F
1996-11-14
A variant form of 14-3-3 zeta was isolated from the rat hippocampal cDNA library. The cloned cDNA is 1687 bp in length and contains an entire ORF (nt = 63-797) with 245 amino acids that is characteristic of the 14-3-3 zeta subtype. By comparing with reported sequences of 14-3-3 zeta, we found three nucleotide substitutions within the coding sequence in our clone: a C<-->T transition at nt = 325 and G<-->C transversions at nt = 387 and 388. Both codon changes are missense mutations, leading to ACG (Thr) to ATG (Met) and CGT (Arg) to GCT (Ala) conversions at residues 88 and 109, respectively. Our results show that at least three different genetic variants of 14-3-3 zeta are present in rat species, resulting in protein variations. Such mutation in the amino acid sequence is an important indication of the diverse functions of this protein and may also contribute to the recent contradictory observations regarding the role of the 14-3-3 zeta subtype.
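The coordinates quoted in this abstract can be checked with simple arithmetic. The following is a minimal sketch, not the authors' method: it assumes that nt 63 is the first base of the first codon, and the function names and the four-entry codon lookup are illustrative only.

```python
# Only the four codons named in the abstract; not a full genetic code table.
CODON_TO_AA = {"ACG": "Thr", "ATG": "Met", "CGT": "Arg", "GCT": "Ala"}

ORF_START, ORF_END = 63, 797  # cDNA coordinates quoted in the abstract


def locate_in_orf(nt, orf_start=ORF_START):
    """Map a cDNA coordinate to (residue number, position in codon), both 1-based."""
    offset = nt - orf_start  # 0-based offset from the first ORF base
    return offset // 3 + 1, offset % 3 + 1


def substitute(codon, changes):
    """Apply {position_in_codon: new_base} changes (1-based) to a codon."""
    bases = list(codon)
    for pos, base in changes.items():
        bases[pos - 1] = base
    return "".join(bases)


# Number of codons spanned by the ORF coordinates.
orf_codons = (ORF_END - ORF_START + 1) // 3

for nt in (325, 387, 388):
    residue, pos = locate_in_orf(nt)
    print(f"nt {nt}: residue {residue}, codon position {pos}")

met_codon = substitute("ACG", {2: "T"})            # change at nt 325
ala_codon = substitute("CGT", {1: "G", 2: "C"})    # changes at nt 387 and 388
print(orf_codons, CODON_TO_AA[met_codon], CODON_TO_AA[ala_codon])
```

Under that assumption, nt 325 falls in the second position of codon 88, and nt 387 and 388 fall in the first and second positions of codon 109, which agrees with the residue numbers and codon changes reported in the abstract.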
The actin multigene family and livestock speciation using the polymerase chain reaction.
Fairbrother, K S; Hopwood, A J; Lockley, A K; Bardsley, R G
1998-01-01
Actins constitute a family of highly-conserved multifunctional intracellular proteins, best known as myofibrillar components in striated muscle fibres. Most vertebrate genomes contain numerous actin genes with high sequence homology in protein coding regions but considerable variability in intron number and sizes. This genetic diversity can be utilised for livestock speciation purposes. The high sequence conservation has enabled a single pair of oligonucleotides to be used to prime the polymerase chain reaction (PCR) with DNA extracted from all animals so far studied. Multiple amplification products were obtained which on gel electrophoresis constituted characteristic species-specific 'fingerprints'. The patterns were reproducible, did not vary between individuals of the same breed or between different breeds within a species, and could be generated even from heat-processed muscle held at 120 degrees C for one hour. Given the capacity of PCR to amplify relatively short sequences in highly-degraded DNA, this approach may be suitable for authentication of processed meat products.
NASA Technical Reports Server (NTRS)
Lacey, J. C., Jr.; Mullins, D. W., Jr.; Watkins, C. L.; Hall, L. M.
1986-01-01
Cellular organisms store information as sequences of nucleotides in double stranded DNA. This information is useless unless it can be converted into the active molecular species, protein. This is done in contemporary creatures first by transcription of one strand to give a complementary strand of mRNA. The sequence of nucleotides is then translated into a specific sequence of amino acids in a protein. Translation is made possible by a genetic coding system in which a sequence of three nucleotides codes for a specific amino acid. The origin and evolution of any chemical system can be understood through elucidation of the properties of the chemical entities which make up the system. There is an underlying logic to the coding system revealed by a correlation of the hydrophobicities of amino acids and their anticodonic nucleotides (i.e., the complement of the codon). Its importance lies in the fact that every amino acid going into protein synthesis must first be activated. This is universally accomplished with ATP. Past studies have concentrated on the chemistry of the adenylates, but more recently we have found, through the use of NMR, that we can observe intramolecular interactions even at low concentrations, between amino acid side chains and nucleotide base rings in these adenylates. The use of this type of compound thus affords a novel way of elucidating the manner in which amino acids and nucleotides interact with each other. In aqueous solution, when a hydrophobic amino acid is attached to the most hydrophobic nucleotide, AMP, a hydrophobic interaction takes place between the amino acid side chain and the adenine ring. The studies to be reported concern these hydrophobic interactions.
Kang, Seokha; Sultana, Tahera; Eom, Keeseon S; Park, Yung Chul; Soonthornpong, Nathan; Nadler, Steven A; Park, Joong-Ki
2009-01-15
The complete mitochondrial genome sequence was determined for the human pinworm Enterobius vermicularis (Oxyurida: Nematoda) and used to infer its phylogenetic relationship to other major groups of chromadorean nematodes. The E. vermicularis genome is a 14,010-bp circular DNA molecule that encodes 36 genes (12 proteins, 22 tRNAs, and 2 rRNAs). This mtDNA genome lacks atp8, as reported for almost all other nematode species investigated. Phylogenetic analyses (maximum parsimony, maximum likelihood, neighbor joining, and Bayesian inference) of nucleotide sequences for the 12 protein-coding genes of 25 nematode species placed E. vermicularis, a representative of the order Oxyurida, as sister to the main Ascaridida+Rhabditida group. Tree topology comparisons using statistical tests rejected an alternative hypothesis favoring a closer relationship among Ascaridida, Spirurida, and Oxyurida, which has been supported by most studies based on nuclear ribosomal DNA sequences. Unlike the relatively conserved gene arrangement found for most chromadorean taxa, the E. vermicularis mtDNA gene order is unique, sharing no similarity with that of any other nematode species reported to date. This lack of gene order similarity may represent idiosyncratic gene rearrangements unique to this specific lineage of the oxyurids. To more fully understand the extent of gene rearrangement and its evolutionary significance within the nematode phylogenetic framework, additional mitochondrial genomes representing a greater evolutionary diversity of species must be characterized.
Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.
2008-01-01
The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry
2018-01-01
The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous work showed that JcDV-based circular plasmids experimentally integrate into insect cell genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV-gfp-based molecules which were transfected into non-permissive Spodoptera frugiperda (Sf9) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long-term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to efficiently drive the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present. 
However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than to the integration of viral sequences.
NASA Astrophysics Data System (ADS)
Kikuchi, Shoshi
2009-02-01
Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
Chang, Vivian Y.; Federman, Noah; Martinez-Agosto, Julian; Tatishchev, Sergei F.; Nelson, Stanley F.
2014-01-01
Background: Gastric adenocarcinoma is a rare diagnosis in childhood. A 14-year-old male patient presented with metastatic gastric adenocarcinoma and a strong family history of colon cancer. Clinical sequencing of CDH1 and APC was negative. Whole exome sequencing was therefore applied to capture the majority of protein-coding regions for the identification of single-nucleotide variants, small insertions/deletions, and copy number abnormalities in the patient’s germline as well as primary tumor. Materials and Methods: DNA was extracted from the patient’s blood, primary tumor, and the unaffected mother’s blood. DNA libraries were constructed and sequenced on Illumina HiSeq2000. Data were post-processed using Picard and Samtools, then analyzed with the Genome Analysis Toolkit. Variants were annotated using an in-house Ensembl-based program. Copy number was assessed using ExomeCNV. Results: Each sample was sequenced to a mean depth of coverage of greater than 120×. A rare non-synonymous coding SNV in TP53 was identified in the germline. There were 10 somatic cancer protein-damaging variants that were not observed in the unaffected mother's genome. ExomeCNV, comparing tumor to the patient’s germline, identified abnormal copy number spanning 6,946 genes. Conclusion: We present an unusual case of Li-Fraumeni detected by whole exome sequencing. There were also likely driver somatic mutations in the gastric adenocarcinoma. These results highlight the need for more thorough and broad-scale germline and cancer analyses to accurately inform patients of inherited risk of cancer and to identify somatic mutations. PMID:23015295
Degenerative minimalism in the genome of a psyllid endosymbiont.
Clark, M A; Baumann, L; Thao, M L; Moran, N A; Baumann, P
2001-03-01
Psyllids, like aphids, feed on plant phloem sap and are obligately associated with prokaryotic endosymbionts acquired through vertical transmission from an ancestral infection. We have sequenced 37 kb of DNA of the genome of Carsonella ruddii, the endosymbiont of psyllids, and found that it has a number of unusual properties revealing a more extreme case of degeneration than was previously reported from studies of eubacterial genomes, including that of the aphid endosymbiont Buchnera aphidicola. Among the unusual properties are an exceptionally low guanine-plus-cytosine content (19.9%), almost complete absence of intergenic spaces, operon fusion, and lack of the usual promoter sequences upstream of 16S rDNA. These features suggest the synthesis of long mRNAs and translational coupling. The most extreme instances of base compositional bias occur in the genes encoding proteins that have less highly conserved amino acid sequences; the guanine-plus-cytosine content of some protein-coding sequences is as low as 10%. The shift in base composition has a large effect on proteins: in polypeptides of C. ruddii, half of the residues consist of five amino acids with codons low in guanine plus cytosine. Furthermore, the proteins of C. ruddii are reduced in size, with an average of about 9% fewer amino acids than in homologous proteins of related bacteria. These observations suggest that the C. ruddii genome is not subject to constraints that limit the evolution of other known eubacteria.
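The guanine-plus-cytosine content quoted in this abstract is a simple base-composition fraction. A minimal sketch of the calculation follows; the function name and the example sequence are illustrative and are not taken from the C. ruddii data.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(base in "GC" for base in counted)
    return gc / len(counted)


# Hypothetical AT-rich fragment, used only to exercise the function.
example = "ATGAAATTTAAAGATTAA"
print(f"{gc_content(example):.1%}")
```

Ambiguity codes such as N are skipped here rather than counted in the denominator; that is one common convention, and other tools may treat them differently.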
Structure of the coding region and mRNA variants of the apyrase gene from pea (Pisum sativum)
NASA Technical Reports Server (NTRS)
Shibata, K.; Abe, S.; Davies, E.
2001-01-01
Partial amino acid sequences of a 49 kDa apyrase (ATP diphosphohydrolase, EC 3.6.1.5) from the cytoskeletal fraction of etiolated pea stems were used to derive oligonucleotide DNA primers to generate a cDNA fragment of pea apyrase mRNA by RT-PCR and these primers were used to screen a pea stem cDNA library. Two almost identical cDNAs differing in just 6 nucleotides within the coding regions were found, and these cDNA sequences were used to clone genomic fragments by PCR. Two nearly identical gene fragments containing 8 exons and 7 introns were obtained. One of them (H-type) encoded the mRNA sequence described by Hsieh et al. (1996) (DDBJ/EMBL/GenBank Z32743), while the other (S-type) differed by the same 6 nucleotides as the mRNAs, suggesting that these genes may be alleles. The six nucleotide differences between these two alleles were found solely in the first exon, and these mutation sites had two types of consensus sequences. These mRNAs were found with varying lengths of 3' untranslated regions (3'-UTR). There are some similarities between the 3'-UTR of these mRNAs and those of actin and actin binding proteins in plants. The putative roles of the 3'-UTR and alternative polyadenylation sites are discussed in relation to their possible role in targeting the mRNAs to different subcellular compartments.
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which are a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
CAPRRESI: Chimera Assembly by Plasmid Recovery and Restriction Enzyme Site Insertion.
Santillán, Orlando; Ramírez-Romero, Miguel A; Dávila, Guillermo
2017-06-25
Here, we present chimera assembly by plasmid recovery and restriction enzyme site insertion (CAPRRESI). CAPRRESI benefits from many strengths of the original plasmid recovery method and introduces restriction enzyme digestion to ease DNA ligation reactions (required for chimera assembly). For this protocol, users clone wildtype genes into the same plasmid (pUC18 or pUC19). After the in silico selection of amino acid sequence regions where chimeras should be assembled, users obtain all the synonym DNA sequences that encode them. Ad hoc Perl scripts enable users to determine all synonym DNA sequences. After this step, another Perl script searches for restriction enzyme sites on all synonym DNA sequences. This in silico analysis is also performed using the ampicillin resistance gene (ampR) found on pUC18/19 plasmids. Users design oligonucleotides inside synonym regions to disrupt wildtype and ampR genes by PCR. After obtaining and purifying complementary DNA fragments, restriction enzyme digestion is accomplished. Chimera assembly is achieved by ligating appropriate complementary DNA fragments. pUC18/19 vectors are selected for CAPRRESI because they offer technical advantages, such as small size (2,686 base pairs), high copy number, advantageous sequencing reaction features, and commercial availability. The usage of restriction enzymes for chimera assembly eliminates the need for DNA polymerases yielding blunt-ended products. CAPRRESI is a fast and low-cost method for fusing protein-coding genes.
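The two in silico steps described above (enumerating synonymous DNA sequences for an amino acid stretch, then searching them for restriction sites) can be sketched as follows. This is an illustration in Python under the standard genetic code, not the authors' Perl scripts; the function names and the two-residue example peptide are assumptions made for the sketch.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: one-letter amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def synonymous_sequences(peptide):
    """Return every DNA sequence that encodes the given one-letter peptide."""
    choices = [CODONS_FOR[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]


def sequences_with_site(peptide, site):
    """Return the synonymous sequences that contain a given recognition site."""
    return [seq for seq in synonymous_sequences(peptide) if site in seq]


# Glu-Phe can be encoded so that it carries an EcoRI site (GAATTC).
print(synonymous_sequences("EF"))
print(sequences_with_site("EF", "GAATTC"))
```

The number of synonymous sequences grows multiplicatively with peptide length (up to six codons per residue), so exhaustive enumeration like this is only practical for short regions.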
Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel.
Meadows, J R S; Hiendleder, S; Kijas, J W
2011-04-01
Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence, such as the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between the haplogroups and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA.
Haplogroup relationships between domestic and wild sheep resolved using a mitogenome panel
Meadows, J R S; Hiendleder, S; Kijas, J W
2011-01-01
Five haplogroups have been identified in domestic sheep through global surveys of mitochondrial (mt) sequence variation; however, these group classifications are often based on small fragments of the complete mtDNA sequence, such as the partial control region or the cytochrome B gene. This study presents the complete mitogenome from representatives of each haplogroup identified in domestic sheep, plus a sample of their wild relatives. Comparison of the sequences successfully resolved the relationships between the haplogroups and provided insight into the relationship with wild sheep. The five haplogroups were characterised as branching independently, a radiation that shared a common ancestor 920,000 ± 190,000 years ago based on protein coding sequence. The utility of various mtDNA components to inform the true relationship between sheep was also examined with Bayesian, maximum likelihood and partitioned Bremer support analyses. The control region was found to be the mtDNA component that contributed the highest amount of support to the tree generated using the complete data set. This study provides the nucleus of a mtDNA mitogenome panel, which can be used to assess additional mitogenomes and serve as a reference set to evaluate small fragments of the mtDNA. PMID:20940734
Whole-exome/genome sequencing and genomics.
Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne
2013-12-01
As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.
GC-rich coding sequences reduce transposon-like, small RNA-mediated transgene silencing.
Sidorenko, Lyudmila V; Lee, Tzuu-Fen; Woosley, Aaron; Moskal, William A; Bevan, Scott A; Merlo, P Ann Owens; Walsh, Terence A; Wang, Xiujuan; Weaver, Staci; Glancy, Todd P; Wang, PoHao; Yang, Xiaozeng; Sriram, Shreedharan; Meyers, Blake C
2017-11-01
The molecular basis of transgene susceptibility to silencing is poorly characterized in plants; thus, we evaluated several transgene design parameters as means to reduce heritable transgene silencing. Analyses of Arabidopsis plants with transgenes encoding a microalgal polyunsaturated fatty acid (PUFA) synthase revealed that small RNA (sRNA)-mediated silencing, combined with the use of repetitive regulatory elements, led to aggressive transposon-like silencing of canola-biased PUFA synthase transgenes. Diversifying regulatory sequences and using native microalgal coding sequences (CDSs) with higher GC content improved transgene expression and resulted in a remarkable trans-generational stability via reduced accumulation of sRNAs and DNA methylation. Further experiments in maize with transgenes individually expressing three crystal (Cry) proteins from Bacillus thuringiensis (Bt) tested the impact of CDS recoding using different codon bias tables. Transgenes with higher GC content exhibited increased transcript and protein accumulation. These results demonstrate that the sequence composition of transgene CDSs can directly impact silencing, providing design strategies for increasing transgene expression levels and reducing risks of heritable loss of transgene expression.
Lourenco-Jaramillo, Diana Lelidett; Sifuentes-Rincón, Ana María; Parra-Bracamonte, Gaspar Manuel; de la Rosa-Reyna, Xochitl Fabiola; Segura-Cabrera, Aldo; Arellano-Vera, Williams
2012-01-01
DNA from four cattle breeds was used to re-sequence all of the exons and 56% of the introns of the bovine tyrosine hydroxylase (TH) gene and 97% and 13% of the bovine dopamine β-hydroxylase (DBH) coding and non-coding sequences, respectively. Two novel single nucleotide polymorphisms (SNPs) and a microsatellite motif were found in the TH sequences. The DBH sequences contained 62 nucleotide changes, including eight non-synonymous SNPs (nsSNPs) that are of particular interest because they may alter protein function and therefore affect the phenotype. These DBH nsSNPs resulted in amino acid substitutions that were predicted to destabilize the protein structure. Six SNPs (one from TH and five from DBH non-synonymous SNPs) were genotyped in 140 animals; all of them were polymorphic and had a minor allele frequency of > 9%. There were significant differences in the intra- and inter-population haplotype distributions. The haplotype differences between Brahman cattle and the three B. t. taurus breeds (Charolais, Holstein and Lidia) were interesting from a behavioural point of view because of the differences in temperament between these breeds. PMID:22888292
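The minor allele frequency threshold quoted above can be illustrated with a minimal Python sketch. It assumes a biallelic SNP and genotypes written as two-letter strings such as "AG"; the function name and the input format are hypothetical and are not taken from the study.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Return the frequency of the rarer allele at a biallelic SNP.

    genotypes: iterable of two-character strings, one per animal,
    e.g. "AA", "AG", "GG" (an assumed, illustrative encoding).
    """
    # Count every allele copy across all diploid genotypes.
    alleles = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(alleles.values())
    if len(alleles) < 2:
        # Only one allele observed: the site is monomorphic in this sample.
        return 0.0
    return min(alleles.values()) / total

# Example: 4 animals, 8 allele copies, 3 of which are G.
sample = ["AA", "AG", "GG", "AA"]
maf = minor_allele_frequency(sample)
```

A SNP would pass a filter like the one described in the abstract when the returned value exceeds 0.09.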
Jézéquel, Laetitia; Loeper, Jacqueline; Pompon, Denis
2008-11-01
Combinatorial libraries coding for mosaic enzymes with predefined crossover points constitute useful tools to address and model structure-function relationships and for functional optimization of enzymes based on multivariate statistics. The presented method, called sequence-independent generation of a chimera-ordered library (SIGNAL), allows easy shuffling of any predefined amino acid segment between two or more proteins. This method is particularly well adapted to the exchange of protein structural modules. The procedure could also be well suited to generate ordered combinatorial libraries independent of sequence similarities in a robotized manner. Sequence segments to be recombined are first extracted by PCR from a single-stranded template coding for an enzyme of interest using a biotin-avidin-based method. This technique allows the reduction of parental template contamination in the final library. Specific PCR primers allow amplification of two complementary mosaic DNA fragments, overlapping in the region to be exchanged. Fragments are finally reassembled using a fusion PCR. The process is illustrated via the construction of a set of mosaic CYP2B enzymes using this highly modular approach.
RNA editing: trypanosomes rewrite the genetic code.
Stuart, K
1998-01-01
The understanding of how genetic information is stored and expressed has advanced considerably since the "central dogma" asserted that genetic information flows from the nucleotide sequence of DNA to that of messenger RNA (mRNA) which in turn specifies the amino acid sequence of a protein. It was found that genetic information can be stored as RNA (e.g. in RNA viruses) and can flow from RNA to DNA by reverse transcriptase enzyme activity. In addition, some genes contain introns, nucleotide sequences that are removed from their RNA (by RNA splicing) and thus are not represented in the resultant protein. Furthermore, alternative splicing was found to produce variant proteins from a single gene. More recently, the study of trypanosome parasites revealed an unexpected and indeed counter-intuitive genetic complexity. Genetic information for a single protein can be dispersed among several (DNA) genes in these organisms. One of these genes specifies an encrypted precursor mRNA that is converted to a functional mRNA by a process called RNA editing that inserts and deletes uridylate nucleotides. The sequence of the edited mRNA is specified by multiple small RNAs, named guide RNAs (gRNAs), each of which is encoded in a separate gene. Thus, edited mRNA sequences are assembled from multiple genes by the transfer of information from one type of RNA to another. The existence of editing was surprising but has stimulated the discovery of other types of RNA editing. The Stuart laboratory has been exploring RNA editing in trypanosomes from the time of its discovery. They found dramatic differences between the mitochondrial gene sequences and those of the corresponding mRNAs, which indicated editing by the insertion and deletion of uridylates. Some editing was modest, simply eliminating shifts in sequence register or minimally extending the protein coding sequence. However, editing of many mRNAs was startlingly extensive.
The RNA sequence was essentially entirely remodeled with its sequence more the result of editing than the gene sequence. The identities of genes for such extensively edited RNA were not recognizable from the DNA sequence but they were readily identifiable from the edited mRNA sequence. Thus, despite the complex and extensive editing the resultant mRNA sequence is precise. Characterization of partially edited RNAs indicated that editing proceeds in the direction opposite to that used to specify the protein which reflects the use of the gRNAs. The numerous gRNAs that are used for editing are encoded in the DNA molecules whose role was previously a mystery. Using information gained in our earlier studies, the Stuart group developed an in vitro system that reproduces the fundamental process of editing in order to resolve the mechanism by which it occurs. They determined that editing entails a series of enzymatic steps rather than the mechanism used in RNA splicing. They also showed that chimeric gRNA-mRNA molecules are aberrant by-products of editing rather than intermediates in the process as had been proposed. Additional studies are exploring precisely how the number of added and deleted uridylates is specified by the gRNA. The Stuart laboratory showed that editing is performed by an aggregation of enzymes that catalyze the separate steps of editing. It also developed a method to purify this multimolecule complex that contains several, perhaps tens of, proteins. This will allow the study of its composition and the functions of its component parts. Indeed, the gene for one component has been identified and its detailed characterization begun. These studies are developing tools to explore related processes. An early finding in the lab was that the various mRNAs are differentially edited during the life cycle of the parasite. The pattern of this editing indicates that editing serves to regulate the alternation between two modes of energy generation. 
This regulation is coordinated with other events that are occurring during the life c
Ludwig, Linda B; Ambrus, Julian L; Krawczyk, Kristie A; Sharma, Sanjay; Brooks, Stephen; Hsiao, Chiu-Bin; Schwartz, Stanley A
2006-01-01
Background While viruses have long been shown to capitalize on their limited genomic size by utilizing both strands of DNA or complementary DNA/RNA intermediates to code for viral proteins, it has been assumed that human retroviruses have all their major proteins translated only from the plus or sense strand of RNA, despite their requirement for a dsDNA proviral intermediate. Several studies, however, have suggested the presence of antisense transcription for both HIV-1 and HTLV-1. More recently an antisense transcript responsible for the HTLV-1 bZIP factor (HBZ) protein has been described. In this study we investigated the possibility of an antisense gene contained within the human immunodeficiency virus type 1 (HIV-1) long terminal repeat (LTR). Results Inspection of published sequences revealed a potential transcription initiator element (INR) situated downstream of, and in reverse orientation to, the usual HIV-1 promoter and transcription start site. This antisense initiator (HIVaINR) suggested the possibility of an antisense gene responsible for RNA and protein production. We show that antisense transcripts are generated, in vitro and in vivo, originating from the TAR DNA of the HIV-1 LTR. To test the possibility that protein(s) could be translated from this novel HIV-1 antisense RNA, recombinant HIV antisense gene-FLAG vectors were designed. Recombinant protein(s) were produced and isolated utilizing carboxy-terminal FLAG epitope (DYKDDDDK) sequences. In addition, affinity-purified antisera to an internal peptide derived from the HIV antisense protein (HAP) sequences identified HAPs from HIV+ human peripheral blood lymphocytes. Conclusion HIV-1 contains an antisense gene in the U3-R regions of the LTR responsible for both an antisense RNA transcript and proteins. This antisense transcript has tremendous potential for intrinsic RNA regulation because of its overlap with the beginning of all HIV-1 sense RNA transcripts by 25 nucleotides. 
The novel HAPs are encoded in a region of the LTR that has already been shown to be deleted in some HIV-infected long-term survivors and represent new potential targets for vaccine development. PMID:17090330
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics
NASA Technical Reports Server (NTRS)
Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
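The two tests named in this abstract, an n-tuple Zipf ranking and an n-gram entropy measurement, can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names are hypothetical, overlapping n-grams are assumed, and redundancy is taken here as one minus the ratio of the observed entropy to the maximum possible entropy for a four-letter alphabet, which may differ from the exact definition used in the paper.

```python
import math
from collections import Counter

def ngram_counts(seq, n):
    """Count overlapping n-grams (n-tuples) in a sequence string."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def ngram_entropy(seq, n):
    """Shannon entropy, in bits, of the n-gram frequency distribution."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ngram_redundancy(seq, n, alphabet_size=4):
    """Assumed definition: 1 - H(n) / (n * log2(alphabet_size))."""
    max_entropy = n * math.log2(alphabet_size)
    return 1.0 - ngram_entropy(seq, n) / max_entropy

def zipf_ranking(seq, n):
    """n-gram frequencies sorted from most to least common (rank 1 first)."""
    return sorted(ngram_counts(seq, n).values(), reverse=True)
```

Plotting the output of `zipf_ranking` against rank on log-log axes is the usual way to look for the power-law regime the abstract mentions; a lower `ngram_entropy` corresponds to a higher `ngram_redundancy` under the definition assumed here.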
BLAST and FASTA similarity searching for multiple sequence alignment.
Pearson, William R
2014-01-01
BLAST, FASTA, and other similarity searching programs seek to identify homologous proteins and DNA sequences based on excess sequence similarity. If two sequences share much more similarity than expected by chance, the simplest explanation for the excess similarity is common ancestry-homology. The most effective similarity searches compare protein sequences, rather than DNA sequences, for sequences that encode proteins, and use expectation values, rather than percent identity, to infer homology. The BLAST and FASTA packages of sequence comparison programs provide programs for comparing protein and DNA sequences to protein databases (the most sensitive searches). Protein and translated-DNA comparisons to protein databases routinely allow evolutionary look back times from 1 to 2 billion years; DNA:DNA searches are 5-10-fold less sensitive. BLAST and FASTA can be run on popular web sites, but can also be downloaded and installed on local computers. With local installation, target databases can be customized for the sequence data being characterized. With today's very large protein databases, search sensitivity can also be improved by searching smaller comprehensive databases, for example, a complete protein set from an evolutionarily neighboring model organism. By default, BLAST and FASTA use scoring strategies targeted for distant evolutionary relationships; for comparisons involving short domains or queries, or searches that seek relatively close homologs (e.g. mouse-human), shallower scoring matrices will be more effective. Both BLAST and FASTA provide very accurate statistical estimates, which can be used to reliably identify protein sequences that diverged more than 2 billion years ago.
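To illustrate why expectation values depend on database size while percent identity does not, the following Python sketch uses the standard Karlin-Altschul relation between a normalized bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m is the query length and n the database length. It is a simplified sketch: the function name is hypothetical, and the effective-length corrections that real BLAST and FASTA implementations apply are ignored.

```python
def expect_value(bit_score, query_length, database_length):
    """Expected number of chance alignments scoring at least bit_score.

    Uses E = m * n * 2**(-S'), with m and n taken as raw lengths
    (an assumption; production tools use effective lengths).
    """
    search_space = query_length * database_length
    return search_space * 2.0 ** (-bit_score)

# The same 50-bit alignment is less significant in a larger database.
e_small = expect_value(50.0, 300, 1_000_000)
e_large = expect_value(50.0, 300, 100_000_000)
```

Under these assumptions, the same alignment score yields a 100-fold larger expectation value in the 100-fold larger database, which is consistent with the abstract's point that searching smaller comprehensive databases can improve sensitivity.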
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Dalla Valle, Luisa; Nardi, Alessia; Belvedere, Paola; Toni, Mattia; Alibardi, Lorenzo
2007-07-01
Beta-keratins of reptilian scales have been recently cloned and characterized in some lizards. Here we report for the first time the sequence of some beta-keratins from the snake Elaphe guttata. Five different cDNAs were obtained using 5'- and 3'-RACE analyses. Four sequences differ by only a few nucleotides in the coding region, whereas the last cDNA shows only 84% identity in this region. The gene corresponding to one of the cDNA sequences has a single intron present in the 5'-untranslated region. This genomic organization is similar to that of birds' beta-keratins. Cloning and Southern blotting analysis suggest that snake beta-keratins belong to a family of highly related genes, as in geckos. PCR analysis suggests a head-to-tail orientation of genes in the same chromosome. In situ hybridization detected beta-keratin transcripts almost exclusively in differentiating oberhautchen and beta-cells of the snake epidermis in renewal phase. This is confirmed by Northern blotting that showed, in this phase, a high expression of two different transcripts whereas only the longer transcript is expressed at a much lower level in resting skin. The cDNA coding sequences encoded putative glycine-proline-serine rich proteins containing 137-139 amino acids, with apparent isoelectric points of 7.5 and 8.2. A central region, rich in proline, shows over 50% homology with avian scale, claw, and feather keratins. The prediction of secondary structure shows mainly a random coil conformation and few beta-strand regions in the central region, likely involved in the formation of a fibrous framework of beta-keratins. This region was possibly present in the basic reptiles that gave rise to reptiles and birds. Copyright 2007 Wiley-Liss, Inc.
Microbial metatranscriptomics in a permanent marine oxygen minimum zone.
Stewart, Frank J; Ulloa, Osvaldo; DeLong, Edward F
2012-01-01
Simultaneous characterization of taxonomic composition, metabolic gene content and gene expression in marine oxygen minimum zones (OMZs) has potential to broaden perspectives on the microbial and biogeochemical dynamics in these environments. Here, we present a metatranscriptomic survey of microbial community metabolism in the Eastern Tropical South Pacific OMZ off northern Chile. Community RNA was sampled in late austral autumn from four depths (50, 85, 110, 200 m) extending across the oxycline and into the upper OMZ. Shotgun pyrosequencing of cDNA yielded 180,000 to 550,000 transcript sequences per depth. Based on functional gene representation, transcriptome samples clustered apart from corresponding metagenome samples from the same depth, highlighting the discrepancies between metabolic potential and actual transcription. BLAST-based characterizations of non-ribosomal RNA sequences revealed a dominance of genes involved with both oxidative (nitrification) and reductive (anammox, denitrification) components of the marine nitrogen cycle. Using annotations of protein-coding genes as proxies for taxonomic affiliation, we observed depth-specific changes in gene expression by key functional taxonomic groups. Notably, transcripts most closely matching the genome of the ammonia-oxidizing archaeon Nitrosopumilus maritimus dominated the transcriptome in the upper three depths, representing one in five protein-coding transcripts at 85 m. In contrast, transcripts matching the anammox bacterium Kuenenia stuttgartiensis dominated at the core of the OMZ (200 m; 1 in 12 protein-coding transcripts). The distribution of N. maritimus-like transcripts paralleled that of transcripts matching ammonia monooxygenase genes, which, despite being represented by both bacterial and archaeal sequences in the community DNA, were dominated (> 99%) by archaeal sequences in the RNA, suggesting a substantial role for archaeal nitrification in the upper OMZ. 
These data, as well as those describing other key OMZ metabolic processes (e.g. sulfur oxidation), highlight gene-specific expression patterns in the context of the entire community transcriptome, as well as identify key functional groups for taxon-specific genomic profiling. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
McCutchen-Maloney, Sandra L.
2002-01-01
DNA mutation binding proteins alone and as chimeric proteins with nucleases are used with solid supports to detect DNA sequence variations, DNA mutations and single nucleotide polymorphisms. The solid supports may be flow cytometry beads, DNA chips, glass slides or DNA dipsticks. DNA molecules are coupled to solid supports to form DNA-support complexes. Labeled DNA is used with unlabeled DNA mutation binding proteins such as TthMutS to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by binding which gives an increase in signal. Unlabeled DNA is utilized with labeled chimeras to detect DNA sequence variations, DNA mutations and single nucleotide length polymorphisms by nuclease activity of the chimera which gives a decrease in signal.
Powell, J. Elijah; Ratnayeke, Nalin; Moran, Nancy A.
2017-01-01
High throughput rRNA amplicon surveys of bacterial communities provide a rapid snapshot of taxonomic composition. But strains with nearly identical rRNA sequences often differ in gene repertoires and metabolic capabilities. To assess strain-level variation within Snodgrassella alvi, a gut symbiont of corbiculate bees, we performed deep sequencing on amplicons of a single copy coding gene (minD) as well as the 16S rDNA V4 region. We surveyed honey bees (Apis mellifera) sampled globally and 12 bumble bee species (Bombus) sampled from two regions of the USA. The minD analyses reveal that S. alvi contains far more strain diversity than is evident from 16S rDNA analysis. Many taxa inferred on the basis of 16S rDNA are shared between A. mellifera and Bombus species, but taxa inferred on the basis of minD are never shared and often are restricted to particular Bombus species. Clustering based on minD revealed that gut communities often reflect host species and geographic location. Both minD and 16S rDNA analyses indicate that strain diversity is higher in A. mellifera than in Bombus species. The minD locus flanks a 16S gene, enabling development of strain-specific 16S fluorescent probes to illuminate the spatial relationship of strains within the bee gut. PMID:27482856
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.
Sheth, Bhavisha P; Thaker, Vrinda S
2015-10-01
Authentic identification of plants is essential for exploiting their medicinal properties as well as to stop the adulteration and malpractices with the trade of the same. To identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. The DNA was extracted from a herbal powder and selected Cassia species, followed by the polymerase chain reaction (PCR) and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) basic local alignment search tool (BLAST) analysis, followed by the protein three-dimensional structure determination of the rbcL protein from the herbal powder and Cassia species, namely Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), Cassia roxburghii, and Cassia abbreviata (sequences retrieved from Genbank). Further, the multiple and pairwise structural alignments were carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to Genbank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed an equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to solve the ambiguities of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica.
A strategy such as the one used here, integrating DNA barcoding and protein structural analyses, could be adopted as a novel, rapid and economical procedure, especially in cases when protein coding loci are considered. A herbal powder was obtained from a herbalist in the local vicinity of Rajkot, Gujarat. An integrated approach using DNA barcoding and structural analyses was carried out to identify the herbal powder. The herbal powder was identified as Cassia javanica L.
Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Y.; Zhang, H.; Madrid, R.
1994-09-01
Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and, when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.
Lichenase and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2000-08-15
The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-β-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
Caruccio, Nicholas
2011-01-01
DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.
URF6, Last Unidentified Reading Frame of Human mtDNA, Codes for an NADH Dehydrogenase Subunit
NASA Astrophysics Data System (ADS)
Chomyn, Anne; Cleeter, Michael W. J.; Ragan, C. Ian; Riley, Marcia; Doolittle, Russell F.; Attardi, Giuseppe
1986-10-01
The polypeptide encoded in URF6, the last unassigned reading frame of human mitochondrial DNA, has been identified with antibodies to peptides predicted from the DNA sequence. Antibodies prepared against highly purified respiratory chain NADH dehydrogenase from beef heart or against the cytoplasmically synthesized 49-kilodalton iron-sulfur subunit isolated from this enzyme complex, when added to a deoxycholate or a Triton X-100 mitochondrial lysate of HeLa cells, specifically precipitated the URF6 product together with the six other URF products previously identified as subunits of NADH dehydrogenase. These results strongly point to the URF6 product as being another subunit of this enzyme complex. Thus, almost 60% of the protein coding capacity of mammalian mitochondrial DNA is utilized for the assembly of the first enzyme complex of the respiratory chain. The absence of such information in yeast mitochondrial DNA dramatizes the variability in gene content of different mitochondrial genomes.
DNA-DNA hybridization values and their relationship to whole-genome sequence similarities.
Goris, Johan; Konstantinidis, Konstantinos T; Klappenbach, Joel A; Coenye, Tom; Vandamme, Peter; Tiedje, James M
2007-01-01
DNA-DNA hybridization (DDH) values have been used by bacterial taxonomists since the 1960s to determine relatedness between strains and are still the most important criterion in the delineation of bacterial species. Since the extent of hybridization between a pair of strains is ultimately governed by their respective genomic sequences, we examined the quantitative relationship between DDH values and genome sequence-derived parameters, such as the average nucleotide identity (ANI) of common genes and the percentage of conserved DNA. A total of 124 DDH values were determined for 28 strains for which genome sequences were available. The strains belong to six important and diverse groups of bacteria for which the intra-group 16S rRNA gene sequence identity was greater than 94 %. The results revealed a close relationship between DDH values and ANI and between DNA-DNA hybridization and the percentage of conserved DNA for each pair of strains. The recommended cut-off point of 70 % DDH for species delineation corresponded to 95 % ANI and 69 % conserved DNA. When the analysis was restricted to the protein-coding portion of the genome, 70 % DDH corresponded to 85 % conserved genes for a pair of strains. These results reveal extensive gene diversity within the current concept of "species". Examination of reciprocal values indicated that the level of experimental error associated with the DDH method is too high to reveal the subtle differences in genome size among the strains sampled. It is concluded that ANI can accurately replace DDH values for strains for which genome sequences are available.
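The correspondence reported above (70 % DDH matching roughly 95 % ANI) suggests a simple computational check, sketched below in Python. The sketch assumes that pairwise fragment alignments between two genomes have already been produced by some alignment tool and are supplied as (percent identity, aligned fraction) pairs; the function names, the input format and the two filtering thresholds are illustrative assumptions, not details given in the abstract. Only the 95 % cut-off comes from the text.

```python
def average_nucleotide_identity(hits, min_identity=30.0, min_coverage=0.70):
    """Mean percent identity over fragment hits that pass both filters.

    hits: iterable of (percent_identity, aligned_fraction) tuples,
    one per query fragment. The default thresholds are assumed values
    chosen for illustration.
    Returns None when no hit passes the filters.
    """
    kept = [identity for identity, coverage in hits
            if identity >= min_identity and coverage >= min_coverage]
    if not kept:
        return None
    return sum(kept) / len(kept)

def same_species_by_ani(ani, cutoff=95.0):
    """Apply the 95 % ANI cut-off that the abstract equates with 70 % DDH."""
    return ani is not None and ani >= cutoff

# Hypothetical fragment hits; the last one is discarded for low coverage.
example_hits = [(98.0, 0.9), (96.0, 0.8), (50.0, 0.2)]
example_ani = average_nucleotide_identity(example_hits)
```

Returning None rather than 0.0 for genomes with no qualifying hits keeps "no evidence" distinct from "very low identity", which matters when many strain pairs are compared in bulk.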
Decoding the non-coding RNAs in Alzheimer's disease.
Schonrock, Nicole; Götz, Jürgen
2012-11-01
Non-coding RNAs (ncRNAs) are integral components of biological networks with fundamental roles in regulating gene expression. They can integrate sequence information from the DNA code, epigenetic regulation and functions of multimeric protein complexes to potentially determine the epigenetic status and transcriptional network in any given cell. Humans potentially contain more ncRNAs than any other species, especially in the brain, where they may well play a significant role in human development and cognitive ability. This review discusses their emerging role in Alzheimer's disease (AD), a human pathological condition characterized by the progressive impairment of cognitive functions. We discuss the complexity of the ncRNA world and how this is reflected in the regulation of the amyloid precursor protein and Tau, two proteins with central functions in AD. By understanding this intricate regulatory network, there is hope for a better understanding of disease mechanisms and ultimately developing diagnostic and therapeutic tools.
Does CTCF mediate between nuclear organization and gene expression?
Ohlsson, Rolf; Lobanenkov, Victor; Klenova, Elena
2010-01-01
The multifunctional zinc-finger protein CCCTC-binding factor (CTCF) is a very strong candidate for the role of coordinating the expression level of coding sequences with their three-dimensional position in the nucleus, apparently responding to a "code" in the DNA itself. Dynamic interactions between chromatin fibers in the context of nuclear architecture have been implicated in various aspects of genome functions. However, the molecular basis of these interactions still remains elusive and is a subject of intense debate. Here we discuss the nature of CTCF-DNA interactions, the CTCF-binding specificity to its binding sites and the relationship between CTCF and chromatin, and we examine data linking CTCF with gene regulation in the three-dimensional nuclear space. We discuss why these features render CTCF a very strong candidate for the role and propose a unifying model, the "CTCF code," explaining the mechanistic basis of how the information encrypted in DNA may be interpreted by CTCF into diverse nuclear functions.
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.
O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F
2011-07-01
To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Sun, Jiajie; Gao, Yuan; Liu, Dong; Ma, Wei; Xue, Jing; Zhang, Chunlei; Lan, Xianyong; Lei, Chuzhao; Chen, Hong
2012-06-01
The insulin-induced gene 1 (INSIG1) gene encodes a protein that blocks proteolytic activation of sterol regulatory element binding proteins, which are transcription factors that activate genes that regulate cholesterol, fatty acid, and glucose metabolism. However, similar research for the bovine INSIG1 gene is lacking. Therefore, in this study, polymorphisms of the bovine INSIG1 gene were detected in 643 individuals from four cattle breeds by DNA pooling, forced PCR-RFLP, PCR-SSCP, and DNA sequencing methods. Only 10 novel SNPs were identified, which included four mutations in the coding region and the others in the introns. In Nanyang individuals, seven common haplotypes were identified based on four coding region SNPs. The haplotype GACT, with a frequency of 75.4%, was the most prevalent haplotype, and the SNPs formed two linkage disequilibrium blocks with strong multi-allelic D' (D' = 1). Additionally, association analysis between mutations of the bovine INSIG1 gene and growth traits in Nanyang cattle at 6, 12, 18, and 24 months old was performed, and the results indicated that the polymorphisms were not significantly associated with body mass.
Shapiro, James A.
2016-01-01
The 21st century genomics-based analysis of evolutionary variation reveals a number of novel features impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than single gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, plus the biochemical abilities cells possess to rearrange DNA molecules, constitute a powerful toolbox for adaptive genome rewriting. That is, cells possess “Read–Write Genomes” they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification. PMID:27338490
Schneider, T D
2001-12-01
The sequence logo for DNA binding sites of the bacteriophage P1 replication protein RepA shows unusually high sequence conservation (approximately 2 bits) at a minor groove that faces RepA. However, B-form DNA can support only 1 bit of sequence conservation via contacts into the minor groove. The high conservation in RepA sites therefore implies a distorted DNA helix with direct or indirect contacts to the protein. Here I show that a high minor groove conservation signature also appears in sequence logos of sites for other replication origin binding proteins (Rts1, DnaA, P4 alpha, EBNA1, ORC) and promoter binding proteins (sigma(70), sigma(D) factors). This finding implies that DNA binding proteins generally use non-B-form DNA distortion such as base flipping to initiate replication and transcription.
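To make the "bits" scale in the abstract above concrete, the following sketch computes per-position information content for a set of aligned binding sites as 2 minus the Shannon entropy of the base frequencies. It is a simplified illustration: the small-sample correction normally applied to sequence logos is omitted, and the function names and toy sites are hypothetical.

```python
import math
from collections import Counter


def position_information(column: str) -> float:
    """Information content (bits) of one alignment column: 2 - H, no sample-size correction."""
    counts = Counter(column.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("column contains no A, C, G or T")
    entropy = 0.0
    for base in "ACGT":
        freq = counts[base] / total
        if freq > 0:
            entropy -= freq * math.log2(freq)
    return 2.0 - entropy


def logo_information(sites):
    """Per-position information for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    return [position_information("".join(site[i] for site in sites)) for i in range(length)]


# Toy, made-up sites: position 0 fully conserved, position 1 split between
# two bases, position 2 uniform over all four bases.
toy_sites = ["AAA", "AAC", "AGG", "AGT"]
toy_bits = logo_information(toy_sites)
```

In this simplified measure a fully conserved position carries 2 bits, a position split evenly between two bases carries 1 bit, and a position with all four bases equally frequent carries 0 bits.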
Next stop for the CRISPR revolution: RNA-guided epigenetic regulators.
Vora, Suhani; Tuttle, Marcelle; Cheng, Jenny; Church, George
2016-09-01
Clustered regularly interspaced short palindromic repeats (CRISPRs) and CRISPR-associated (Cas) proteins offer a breakthrough platform for cheap, programmable, and effective sequence-specific DNA targeting. The CRISPR-Cas system is naturally equipped for targeted DNA cutting through its native nuclease activity. As such, groups researching a broad spectrum of biological organisms have quickly adopted the technology with groundbreaking applications to genomic sequence editing in over 20 different species. However, the biological code of life is encoded not only in genetics but also in epigenetics. While genetic sequence editing is a powerful ability, we must also be able to edit and regulate transcriptional and epigenetic code. Taking inspiration from work on earlier sequence-specific targeting technologies such as zinc fingers (ZFs) and transcription activator-like effectors (TALEs), researchers quickly expanded the CRISPR-Cas toolbox to include transcriptional activation, repression, and epigenetic modification. In this review, we highlight advances that extend the CRISPR-Cas toolkit for transcriptional and epigenetic regulation, as well as best practice guidelines for these tools, and a perspective on future applications. © 2016 The Authors. The FEBS Journal published by John Wiley & Sons Ltd on behalf of Federation of European Biochemical Societies.
Liu, X; Gorovsky, M A
1996-01-01
A truncated cDNA clone encoding Tetrahymena thermophila histone H2A2 was isolated using synthetic degenerate oligonucleotide probes derived from H2A protein sequences of Tetrahymena pyriformis. The cDNA clone was used as a homologous probe to isolate a truncated genomic clone encoding H2A1. The remaining regions of the genes for H2A1 (HTA1) and H2A2 (HTA2) were then isolated using inverse PCR on circularized genomic DNA fragments. These partial clones were assembled into intact HTA1 and HTA2 clones. Nucleotide sequences of the two genes were highly homologous within the coding region but not in the noncoding regions. Comparison of the deduced amino acid sequences with protein sequences of T. pyriformis H2As showed only two and three differences, respectively, in a total of 137 amino acids for H2A1 and 132 amino acids for H2A2, indicating the two genes arose before the divergence of these two species. The HTA2 gene contains a TAA triplet within the coding region, encoding a glutamine residue. In contrast with the T. thermophila HHO and HTA3 genes, no introns were identified within the two genes. The 5'- and 3'-ends of the histone H2A mRNAs were determined by RNase protection and by PCR mapping using RACE and RLM-RACE methods. Both genes encode polyadenylated mRNAs and are highly expressed in vegetatively growing cells but only weakly expressed in starved cultures. With the inclusion of these two genes, T. thermophila is the first organism whose entire complement of known core and linker histones, including replication-dependent and basal variants, has been cloned and sequenced. PMID:8760889
Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip
2015-01-01
The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan
2016-04-20
DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of protein sequences is increasing at an unprecedented rate. Thus it is necessary to develop computational methods that identify DNA-binding proteins from protein sequence information alone. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into a profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence order effect can be effectively captured by this scheme. These vectors are then fed into the SVM to discriminate the DNA-binding proteins from the non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 in a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
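The two performance figures quoted above, overall accuracy and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The sketch below shows the standard formulas; the function names are hypothetical and the counts are invented for illustration, not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when any marginal sum is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts: 100 binders and 100 non-binders, 75 of each classified correctly.
toy_acc = accuracy(tp=75, tn=75, fp=25, fn=25)
toy_mcc = matthews_corrcoef(tp=75, tn=75, fp=25, fn=25)
```

On a balanced set like this toy one, 75 % accuracy corresponds to an MCC of 0.5; on unbalanced data the two measures can diverge, which is one reason both are usually reported.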
Francischetti, Ivo M. B.; My-Pham, Van; Harrison, Jim; Garfield, Mark K.; Ribeiro, José M. C.
2010-01-01
The venom gland of the snake Bitis gabonica (Gaboon viper) was used for the first time to construct a unidirectional cDNA phage library followed by high-throughput sequencing and bioinformatic analysis. Hundreds of cDNAs were obtained and clustered into contigs. We found mostly novel full-length cDNA coding for metalloproteases (P-II and P-III classes), Lys49-phospholipase A2, serine proteases with essential mutations in the active site, Kunitz protease inhibitors, several C-type lectins, bradykinin-potentiating peptide, vascular endothelial growth factor, nucleotidases and nucleases, nerve growth factor, and L-amino acid oxidases. Two new members of the recently described short coding region family of disintegrin, displaying RGD and MLD motifs are reported. In addition, we have identified for the first time a cytokine-like molecule and a multi-Kunitz protease inhibitor in snake venoms. The CLUSTAL alignment and the unrooted cladograms for selected families of B. gabonica venom proteins are also presented. A significant number of sequences were devoid of database matches, suggesting that their biologic function remains to be identified. This paper also reports the N-terminus of the 15 most abundant venom proteins and the sequences matching their corresponding transcripts. The electronic version of this manuscript, available on request, contains spreadsheets with hyperlinks to FASTA-formatted files for each contig and the best match to the GenBank and Conserved Domain Databases, in addition to CLUSTAL alignments of each contig. We have thus generated a comprehensive catalog of the B. gabonica venom gland, containing for each secreted protein: i) the predicted molecular weight, ii) the predicted isoelectric point, iii) the accession number, and iv) the putative function. The role of these molecules is discussed in the context of the envenomation caused by the Gaboon viper. PMID:15276202
Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard
2008-01-10
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Liberek, K; Osipiuk, J; Zylicz, M; Ang, D; Skorko, J; Georgopoulos, C
1990-02-25
The process of initiation of lambda DNA replication requires the assembly of the proper nucleoprotein complex at the origin of replication, ori lambda. The complex is composed of both phage and host-coded proteins. The lambda O initiator protein binds specifically to ori lambda. The lambda P initiator protein binds to both lambda O and the host-coded dnaB helicase, giving rise to an ori lambda DNA.lambda O.lambda P.dnaB structure. The dnaK and dnaJ heat shock proteins have been shown capable of dissociating this complex. The thus freed dnaB helicase unwinds the duplex DNA template at the replication fork. In this report, through cross-linking, size chromatography, and protein affinity chromatography, we document some of the protein-protein interactions occurring at ori lambda. Our results show that the dnaK protein specifically interacts with both lambda O and lambda P, and that the dnaJ protein specifically interacts with the dnaB helicase.
Yáñez, R J; Boursnell, M; Nogal, M L; Yuste, L; Viñuela, E
1993-01-01
A random sequencing strategy applied to two large SalI restriction fragments (SB and SD) of the African swine fever virus (ASFV) genome revealed that they might encode proteins similar to the two largest RNA polymerase subunits of eukaryotes, poxviruses and Escherichia coli. After further mapping by dot-blot hybridization, two large open reading frames (ORFs) were completely sequenced. The first ORF (NP1450L) encodes a protein of 1450 amino acids with extensive similarity to the largest subunit of RNA polymerases. The second one (EP1242L) codes for a protein of 1242 amino acids similar to the second largest RNA polymerase subunit. Proteins NP1450L and EP1242L are more similar to the corresponding subunits of eukaryotic RNA polymerase II than to those of vaccinia virus, the prototype poxvirus, which shares many functional characteristics with ASFV. ORFs NP1450L and EP1242L are mainly expressed late in ASFV infection, after the onset of DNA replication. PMID:8506138
Disruption of SMIM1 causes the Vel− blood type
Ballif, Bryan A; Helias, Virginie; Peyrard, Thierry; Menanteau, Cécile; Saison, Carole; Lucien, Nicole; Bourgouin, Sébastien; Le Gall, Maude; Cartron, Jean-Pierre; Arnaud, Lionel
2013-01-01
Here, we report the biochemical and genetic basis of the Vel blood group antigen, which has been a vexing mystery for decades, especially as anti-Vel regularly causes severe haemolytic transfusion reactions. The protein carrying the Vel blood group antigen was biochemically purified from red blood cell membranes. Mass spectrometry-based de novo peptide sequencing identified this protein to be small integral membrane protein 1 (SMIM1), a previously uncharacterized single-pass membrane protein. Expression of SMIM1 cDNA in Vel− cultured cells generated anti-Vel cell surface reactivity, confirming that SMIM1 encoded the Vel blood group antigen. A cohort of 70 Vel− individuals was found to be uniformly homozygous for a 17 nucleotide deletion in the coding sequence of SMIM1. The genetic homogeneity of the Vel− blood type, likely having a common origin, facilitated the development of two highly specific DNA-based tests for rapid Vel genotyping, which can be easily integrated into blood group genotyping platforms. These results answer a 60-year-old riddle and provide tools of immediate assistance to all clinicians involved in the care of Vel− patients. PMID:23505126
Colonization of heterochromatic genes by transposable elements in Drosophila.
Dimitri, Patrizio; Junakovic, Nikolaj; Arcà, Bruno
2003-04-01
As a further step toward understanding transposable element-host genome interactions, we investigated the molecular anatomy of introns from five heterochromatic and 22 euchromatic protein-coding genes of Drosophila melanogaster. A total of 79 kb of intronic sequences from heterochromatic genes and 355 kb of intronic sequences from euchromatic genes have been used in Blast searches against Drosophila transposable elements (TEs). The results show that TE-homologous sequences belonging to 19 different families represent about 50% of intronic DNA from heterochromatic genes. In contrast, only 0.1% of the euchromatic intron DNA exhibits homology to known TEs. Intraspecific and interspecific size polymorphisms of introns were found, which are likely to be associated with changes in TE-related sequences. Together, the enrichment in TEs and the apparent dynamic state of heterochromatic introns suggest that TEs contribute significantly to the evolution of genes located in heterochromatin.
Identification of presumed ancestral DNA sequences of phaseolin in Phaseolus vulgaris.
Kami, J; Velásquez, V B; Debouck, D G; Gepts, P
1995-01-01
Common bean (Phaseolus vulgaris) consists of two major geographic gene pools, one distributed in Mexico, Central America, and Colombia and the other in the southern Andes (southern Peru, Bolivia, and Argentina). Amplification and sequencing of members of the multigene family coding for phaseolin, the major seed storage protein of the common bean, provide evidence for accumulation of tandem direct repeats in both introns and exons during evolution of the multigene family in this species. The presumed ancestral phaseolin sequences, without tandem repeats, were found in recently discovered but nearly extinct wild common bean populations of Ecuador and northern Peru that are intermediate between the two major gene pools of the species based on geographical and molecular arguments. Our results illustrate the usefulness of tandem direct repeats in establishing the polarity of DNA sequence divergence and therefore in proposing phylogenies. PMID:7862642
de Souza, C R; Aragão, F J; Moreira, E C O; Costa, C N M; Nascimento, S B; Carvalho, L J
2009-03-24
Cassava is one of the most important tropical food crops for more than 600 million people worldwide. Transgenic technologies can be useful for increasing its nutritional value and its resistance to viral diseases and insect pests. However, tissue-specific promoters that guarantee correct expression of transgenes would be necessary. We used inverse polymerase chain reaction to isolate a promoter sequence of the Mec1 gene coding for Pt2L4, a glutamic acid-rich protein differentially expressed in cassava storage roots. In silico analysis revealed putative cis-acting regulatory elements within this promoter sequence, including root-specific elements that may be required for its expression in vascular tissues. Transient expression experiments showed that the Mec1 promoter is functional, since this sequence was able to drive GUS expression in bean embryonic axes. Results from our computational analysis can serve as a guide for functional experiments to identify regions with tissue-specific Mec1 promoter activity. The DNA sequence that we identified is a new promoter that could be a candidate for genetic engineering of cassava roots.
Shows, Kathryn H; Ward, Christy; Summers, Laura; Li, Lin; Ziegler, Gregory R; Hendrickx, Andrew G; Shiang, Rita
2006-02-01
Mutations in the human gene TCOF1 cause a mandibulofacial dysostosis known as Treacher Collins syndrome (TCS). An infant rhesus macaque (Macaca mulatta) that displayed the TCS phenotype was identified at the California National Primate Research Center. The TCOF1 coding region was cloned from a normal rhesus macaque and sequenced. The rhesus macaque homolog of TCOF1 is 91.6% identical in cDNA sequence and 93.8% identical in translated protein sequence compared to human TCOF1. Sequencing of TCOF1 in the TCS-affected rhesus macaque showed no mutations within the coding region or splice sites; however, real-time quantitative PCR showed an 87% reduction of spleen TCOF1 mRNA level in the TCS-affected macaque when compared with normal macaque spleen.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-09-01
In this study, the complete mitogenome of a cryptic species from East Australia (Mugil sp. H) belonging to the worldwide Mugil cephalus species complex (Teleostei: Mugilidae) was sequenced by next-generation sequencing. The assembled mitogenome, consisting of 16,845 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop is 1067 bp long and is located between tRNA-Pro and tRNA-Phe. The overall base composition of East Australia M. cephalus is 28.4% A, 29.3% C, 15.4% G and 26.9% T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Li, Huei-Ying; Chen, Pei-Lung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome of the Northwestern Pacific 2 (NWP2) cryptic species of flathead mullet, Mugil cephalus (Teleostei: Mugilidae), was amplified by long-range PCR and sequenced by next-generation sequencing. The assembled mitogenome, consisting of 16,686 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop was 909 bp long and was located between tRNA-Pro and tRNA-Phe. The overall base composition of NWP2 M. cephalus was 28.4% A, 29.8% C, 26.5% T and 15.3% G. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of the flathead mullet species complex.
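The overall base composition reported in mitogenome records like the one above is a simple per-base percentage. The sketch below shows one way it might be computed from an assembled sequence and how the reported NWP2 values can be sanity-checked; the function name and the toy sequence are hypothetical, and ambiguous bases are simply ignored here.

```python
from collections import Counter


def base_composition(sequence: str) -> dict:
    """Percentage of A, C, G and T among unambiguous bases in the sequence."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy, made-up sequence purely for illustration.
toy_composition = base_composition("AACG")

# Percentages as reported in the NWP2 abstract above.
reported_nwp2 = {"A": 28.4, "C": 29.8, "T": 26.5, "G": 15.3}
reported_total = sum(reported_nwp2.values())
reported_at_content = reported_nwp2["A"] + reported_nwp2["T"]
```

The reported NWP2 percentages sum to 100 % within rounding and imply an A+T content of about 54.9 %, a quick consistency check before using such values in comparative analyses.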
Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji
2007-01-01
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
Rolland, N; Droux, M; Lebrun, M; Douce, R
1993-01-01
The last enzymatic step for L-cysteine biosynthesis is catalyzed by O-acetylserine(thiol)lyase (OASTL, EC 4.2.99.8) which synthesizes L-cysteine from O-acetylserine and "sulfide." We have isolated and characterized a full-length cDNA (1432 bp) from a lambda gt11 library of spinach leaf encoding the complete precursor of the chloroplast isoform. The 1149-nucleotide open reading frame coding for O-acetylserine(thiol)lyase was in the direction opposite that of the lambda gt11 beta-galactosidase gene. The derived amino acid sequence indicates that the protein precursor consists of 383 amino acid residues including an N-terminal presequence peptide of 52 residues. The amino acid sequence of mature spinach chloroplast O-acetylserine(thiol)lyase shows 40 and 57% homology with its bacterial counterparts. Sequence comparison with several pyridoxal 5'-phosphate-containing proteins reveals the presence of a lysine residue assumed to be involved in cofactor binding. A synthetic cDNA was constructed, coding for the entire 331-amino-acid mature O-acetylserine(thiol)lyase and for an initiating methionine. A high level of expression of the active mature chloroplast isoform was achieved in an Escherichia coli strain carrying the T7 RNA polymerase system (F. W. Studier, A. H. Rosenberg, J. J. Dunn, and J. W. Dubendorff, 1990, in Methods in Enzymology, D. V. Goeddel, Ed., Vol. 185, pp. 60-89, Academic Press, San Diego, CA). Addition of pyridoxine to the bacterial growth medium enhanced the enzyme activity due to the recombinant protein. The extent of production is 25-fold higher than in chloroplast from spinach leaves and the recombinant protein presents the relative molecular mass and immunological properties of the natural enzyme from spinach leaf chloroplast. This work, together with our previous biochemical studies, is in accordance with a prokaryotic type enzyme for L-cysteine biosynthesis in higher plant chloroplasts. Southern blot analysis indicated that O-acetylserine(thiol)lyase is encoded by multiple genes in the spinach leaf genomic DNA.
Xia, Hui; Li, Lingling; Yin, Zhouyang; Hou, Xiandeng; Zhu, Jun-Jie
2015-01-14
A dual signal amplification strategy for electrochemiluminescence (ECL) aptasensor was designed based on biobar-coded gold nanoparticles (Au NPs) and DNAzyme. CdSeTe@ZnS quantum dots (QDs) were chosen as the ECL signal probes. To verify the proposed ultrasensitive ECL aptasensor for biomolecules, we detected thrombin (Tb) as a proof-of-principle analyte. The hairpin DNA designed for the recognition of protein consists of two parts: the sequences of catalytical 8-17 DNAzyme and thrombin aptamer. Only in the presence of thrombin could the hairpin DNA be opened, followed by a recycling cleavage of excess substrates by catalytic core of the DNAzyme to induce the first-step amplification. One part of the fragments was captured to open the capture DNA modified on the Au electrode, which further connected with the prepared biobar-coded Au NPs-CdSeTe@ZnS QDs to get the final dual-amplified ECL signal. The limit of detection for Tb was 0.28 fM with excellent selectivity, and this proposed method possessed good performance in real sample analysis. This design introduces the new concept of dual-signal amplification by a biobar-coded system and DNAzyme recycling into ECL determination, and it is promising to be extended to provide a highly sensitive platform for various target biomolecules.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Depto, A.S.; Stenberg, R.M.
1989-03-01
To better understand the regulation of late gene expression in human cytomegalovirus (CMV)-infected cells, the authors examined expression of the gene that codes for the 65-kilodalton lower-matrix phosphoprotein (pp65). Analysis of RNA isolated at 72 h from cells infected with CMV Towne or ts66, a DNA-negative temperature-sensitive mutant, supported the fact that pp65 is expressed at low levels prior to viral DNA replication but maximally expressed after the initiation of viral DNA replication. To investigate promoter activation in a transient expression assay, the pp65 promoter was cloned into the indicator plasmid containing the gene for chloramphenicol acetyltransferase (CAT). Transfection of the promoter-CAT construct and subsequent superinfection with CMV resulted in activation of the promoter at early times after infection. Cotransfection with plasmids capable of expressing immediate-early (IE) proteins demonstrated that the promoter was activated by IE proteins and that both IE regions 1 and 2 were necessary. These studies suggest that interactions between IE proteins and this octamer sequence may be important for the regulation and expression of this CMV gene.
Klein, B; Pawlowski, K; Höricke-Grandpierre, C; Schell, J; Töpfer, R
1992-05-01
A cDNA encoding beta-ketoacyl-ACP reductase (EC 1.1.1.100), an integral part of the fatty acid synthase type II, was cloned from Cuphea lanceolata. This cDNA of 1276 bp codes for a polypeptide of 320 amino acids with 63 N-terminal residues presumably representing a transit peptide and 257 residues corresponding to the mature protein of 27 kDa. The encoded protein shows strong homology with the amino-terminal sequence and two tryptic peptides from avocado mesocarp beta-ketoacyl-ACP reductase, and its total amino acid composition is highly similar to those of the beta-ketoacyl-ACP reductases of avocado and spinach. Amino acid sequence homologies to polyketide synthase, beta-ketoreductases and short-chain alcohol dehydrogenases are discussed. An engineered fusion protein lacking most of the transit peptide, which was produced in Escherichia coli, was isolated and proved to possess beta-ketoacyl-ACP reductase activity. Hybridization studies revealed that in C. lanceolata beta-ketoacyl-ACP reductase is encoded by a small family of at least two genes and that members of this family are expressed in roots, leaves, flowers and seeds.
A deep learning method for lincRNA detection using auto-encoder algorithm.
Yu, Ning; Yu, Zeng; Pan, Yi
2017-12-06
The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by the new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, the relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two lines of evidence give the reasoning for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in the biological process as regulatory units instead of generating proteins. Identifying these transcriptional regions from non-coding regions is the first step towards lincRNA recognition. The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, it is validated through the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has an extensive ability for lincRNA prediction. The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for the lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data.
Driven by those newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application to adopt deep learning techniques for identifying lincRNA transcription sequences.
Kurtz, David T.; Feigelson, Philip
1977-01-01
A procedure is presented for the preparation of a 3H-labeled complementary DNA (cDNA) specific for the mRNA coding for α2u-globulin, a male rat liver protein under multihormonal control that represents approximately 1% of hepatic protein synthesis. Rat liver polysomes are incubated with monospecific rabbit antiserum to α2u-globulin, which binds to the nascent α2u-globulin chains on the polysomes. These antibody-polysome complexes are then adsorbed to goat antiserum to rabbit IgG that is covalently linked to p-aminobenzylcellulose. mRNA preparations are thus obtained that contain 30-40% α2u-globulin mRNA. A labeled cDNA is made to this α2u-globulin-enriched mRNA preparation by using RNA-dependent DNA polymerase (reverse transcriptase). To remove the non-α2u-globulin sequences, this cDNA preparation is hybridized to an RNA concentration × incubation time (R0t) of 1000 mol of ribonucleotide per liter × sec with female rat liver mRNA, which, though it shares the vast majority of mRNA sequences with male liver, contains no α2u-globulin mRNA sequences. The cDNA remaining single-stranded is isolated by hydroxylapatite chromatography and is shown to be specific for α2u-globulin mRNA by several criteria. Good correlation was found in all endocrine states studied between the hepatic level of α2u-globulin, the level of functional α2u-globulin mRNA as assayed in a wheat germ cell-free translational system, and the level of α2u-globulin mRNA sequences as measured by hybridization to the α2u-globulin cDNA. Thus, the hormonal control of hepatic α2u-globulin synthesis by sex steroids and thyroid hormone occurs through modulation of the cellular level of α2u-globulin mRNA sequences, presumably by hormonal control of transcriptive synthesis. PMID:73184
Sato, T; Oeller, P W; Theologis, A
1991-02-25
The key regulatory enzyme in the biosynthetic pathway of the plant hormone ethylene is 1-aminocyclopropane-1-carboxylic acid (ACC) synthase (EC 4.4.1.14). We have partially purified ACC synthase 6,000-fold from Cucurbita fruit tissue treated with indoleacetic acid + benzyladenine + aminooxyacetic acid + LiCl. The enzyme has a specific activity of 35,000 nmol/h/mg protein, a pH optimum of 9.5, an isoelectric point of 5.0, a Km of 17 microM with respect to S-adenosylmethionine, and is a dimer of two identical subunits of approximately 46,000 Da each. The subunit exists in vivo as a 55,000-Da species similar in size to the primary in vitro translation product. DNA sequence analysis of the cDNA clone pACC1 revealed that the coding region of the ACC synthase mRNA spans 493 amino acids corresponding to a 55,779-Da polypeptide; and expression of the coding sequence (pACC1) in Escherichia coli as a COOH terminus hybrid of beta-galactosidase or as a nonhybrid polypeptide catalyzed the conversion of S-adenosylmethionine to ACC (Sato, T., and Theologis, A. (1989) Proc. Natl. Acad. Sci. U.S.A. 86, 6621-6625). Immunoblotting experiments herein show that the molecular mass of the beta-galactosidase hybrid polypeptide is 170,000 Da, and the size of the largest nonhybrid polypeptide is 53,000 Da. The data suggest that the enzyme is post-translationally processed during protein purification.
Statistical and linguistic features of DNA sequences
NASA Technical Reports Server (NTRS)
Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
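The DFA procedure named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the pyrimidine/purine step rule, the box sizes, and the function names `dna_walk`, `fluctuation`, and `dfa_exponent` are all assumptions chosen for the example. The sequence is turned into a cumulative "walk", the walk is cut into boxes of n bases, a straight-line trend is removed from each box, and the root-mean-square residual F(n) is fitted against n on a log-log scale. An exponent near 0.5 indicates no long-range correlation; larger values indicate long-range correlation.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: pyrimidines (C, T) step +1, purines (A, G) step -1 (assumed rule)."""
    walk, total = [], 0
    for base in seq.upper():
        if base in "CT":
            total += 1
        elif base in "AG":
            total -= 1
        else:
            continue  # skip ambiguous symbols
        walk.append(total)
    return walk


def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluctuation(walk, box):
    """RMS deviation of the walk from the local linear trend in each non-overlapping box."""
    xs = list(range(box))
    squared, count = 0.0, 0
    for start in range(0, len(walk) - box + 1, box):
        ys = walk[start:start + box]
        slope, intercept = linear_fit(xs, ys)
        squared += sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
        count += box
    return math.sqrt(squared / count)


def dfa_exponent(seq, boxes=(8, 16, 32, 64, 128)):
    """Slope of log F(n) versus log n over the given box sizes."""
    walk = dna_walk(seq)
    log_n = [math.log(n) for n in boxes]
    log_f = [math.log(fluctuation(walk, n)) for n in boxes]
    return linear_fit(log_n, log_f)[0]


if __name__ == "__main__":
    random.seed(0)
    shuffled = "".join(random.choice("ACGT") for _ in range(20000))
    print(round(dfa_exponent(shuffled), 2))  # uncorrelated input: expect a value near 0.5
```

The sketch needs a sequence several times longer than the largest box, and a perfectly periodic input can give F(n) = 0, for which the logarithm is undefined.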
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in-frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequences, SH3, dbl/CDC24, and PH domains, 7 immunoglobulin (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
Engqvist, Martin K M; Nielsen, Jens
2015-08-21
The Ambiguous Nucleotide Tool (ANT) is a desktop application that generates and evaluates degenerate codons. Degenerate codons are used to represent DNA positions that have multiple possible nucleotide alternatives. This is useful for protein engineering and directed evolution, where primers specified with degenerate codons are used as a basis for generating libraries of protein sequences. ANT is intuitive and can be used in a graphical user interface or by interacting with the code through a defined application programming interface. ANT comes with full support for nonstandard, user-defined, or expanded genetic codes (translation tables), which is important because synthetic biology is being applied to an ever widening range of natural and engineered organisms. The Python source code for ANT is freely distributed so that it may be used without restriction, modified, and incorporated in other software or custom data pipelines.
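To make the idea of a degenerate codon concrete, the following sketch expands one into the concrete codons it stands for and lists the amino acids they encode. It does not use ANT or reproduce its API; the function names `expand_codon` and `encoded_amino_acids` are hypothetical, and only the standard genetic code (NCBI translation table 1) and the IUPAC nucleotide ambiguity codes are assumed.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expand_codon(degenerate):
    """Return every concrete codon represented by a three-letter degenerate codon."""
    choices = [IUPAC[symbol] for symbol in degenerate.upper()]
    return ["".join(codon) for codon in product(*choices)]


def encoded_amino_acids(degenerate):
    """Count how many concrete codons map to each amino acid ('*' marks a stop)."""
    return Counter(CODON_TABLE[codon] for codon in expand_codon(degenerate))


if __name__ == "__main__":
    library = encoded_amino_acids("NNK")
    print(len(expand_codon("NNK")), "codons")
    print(sorted(library.items()))
```

Swapping `AMINO_ACIDS` for another 64-letter string in the same codon order is one simple way to model a nonstandard translation table in this sketch.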
Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang
2014-01-01
Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggests that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1 motif (p<0.01). Taken together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in the chloroplast genome. PMID:24914614
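The two motifs quoted in the abstract above are written with IUPAC ambiguity codes (N = any base, W = A or T, Y = C or T). A short sketch of how such a motif can be located in a sequence follows. It is only an illustration: it is not how the SMRT Portal software predicts motifs, it scans the given strand only, and the function names `motif_to_regex` and `find_motif` and the toy sequences are made up for the example.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular expression of character classes."""
    return "".join("[" + IUPAC[symbol] + "]" for symbol in motif.upper())


def find_motif(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) motif matches."""
    # The lookahead makes the regex engine report overlapping occurrences.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_motif("GGTATACCCATGAGG", "TATANNNATNA"))
    print(find_motif("ACTAGTGAT", "WNYANTGAW"))
```

To cover both strands, the same search can be repeated on the reverse complement of the sequence.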
Vargas-Caro, Carolina; Bustamante, Carlos; Lamilla, Julio; Bennett, Michael B; Ovenden, Jennifer R
2016-07-01
The complete mitochondrial genome of the roughskin skate Dipturus trachyderma is described from 1 455 724 sequences obtained using Illumina NGS technology. Total length of the mitogenome was 16 909 base pairs, comprising 2 rRNAs, 13 protein-coding genes, 22 tRNAs and 2 non-coding regions. Phylogenetic analysis based on mtDNA revealed low genetic divergence among longnose skates, in particular those dwelling on the continental shelf and slope off the coasts of Chile and Argentina.
On the path to genetic novelties: insights from programmed DNA elimination and RNA splicing.
Catania, Francesco; Schmitz, Jürgen
2015-01-01
Understanding how genetic novelties arise is a central goal of evolutionary biology. To this end, programmed DNA elimination and RNA splicing deserve special consideration. While programmed DNA elimination reshapes genomes by eliminating chromatin during organismal development, RNA splicing rearranges genetic messages by removing intronic regions during transcription. Small RNAs help to mediate this class of sequence reorganization, which is not error-free. It is this imperfection that makes programmed DNA elimination and RNA splicing excellent candidates for generating evolutionary novelties. Leveraging a number of these two processes' mechanistic and evolutionary properties, which have been uncovered over the past years, we present recently proposed models and empirical evidence for how splicing can shape the structure of protein-coding genes in eukaryotes. We also chronicle a number of intriguing similarities between the processes of programmed DNA elimination and RNA splicing, and highlight the role that the variation in the population-genetic environment may play in shaping their target sequences. © 2015 Wiley Periodicals, Inc.
Kshirsagar, Rucha; Khan, Krishnendu; Joshi, Mamata V; Hosur, Ramakrishna V; Muniyappa, K
2017-05-23
A plethora of evidence suggests that different types of DNA quadruplexes are widely present in the genome of all organisms. The existence of a growing number of proteins that selectively bind and/or process these structures underscores their biological relevance. Moreover, G-quadruplex DNA has been implicated in the alignment of four sister chromatids by forming parallel guanine quadruplexes during meiosis; however, the underlying mechanism is not well defined. Here we show that a G/C-rich motif associated with a meiosis-specific DNA double-strand break (DSB) in Saccharomyces cerevisiae folds into G-quadruplex, and the C-rich sequence complementary to the G-rich sequence forms an i-motif. The presence of G-quadruplex or i-motif structures upstream of the green fluorescent protein-coding sequence markedly reduces the levels of gfp mRNA expression in S. cerevisiae cells, with a concomitant decrease in green fluorescent protein abundance, and blocks primer extension by DNA polymerase, thereby demonstrating the functional significance of these structures. Surprisingly, although S. cerevisiae Hop1, a component of synaptonemal complex axial/lateral elements, exhibits strong affinity to G-quadruplex DNA, it displays a much weaker affinity for the i-motif structure. However, the Hop1 C-terminal but not the N-terminal domain possesses strong i-motif binding activity, implying that the C-terminal domain has a distinct substrate specificity. Additionally, we found that Hop1 promotes intermolecular pairing between G/C-rich DNA segments associated with a meiosis-specific DSB site. Our results support the idea that the G/C-rich motifs associated with meiosis-specific DSBs fold into intramolecular G-quadruplex and i-motif structures, both in vitro and in vivo, thus revealing an important link between non-B form DNA structures and Hop1 in meiotic chromosome synapsis and recombination. Copyright © 2017 Biophysical Society. Published by Elsevier Inc. All rights reserved.
Schuster, W; Wissinger, B; Unseld, M; Brennicke, A
1990-01-01
A number of cytosines are altered to be recognized as uridines in transcripts of the nad3 locus in mitochondria of the higher plant Oenothera. Such nucleotide modifications can be found at 16 different sites within the nad3 coding region. Most of these alterations in the mRNA sequence change codon identities to specify amino acids better conserved in evolution. Individual cDNA clones differ in their degree of editing at five nucleotide positions, three of which are silent, while two lead to codon alterations specifying different amino acids. None of the cDNA clones analysed is maximally edited at all possible sites, suggesting slow processing or lowered stringency of editing at these nucleotides. Differentially edited transcripts could be editing intermediates or could code for differing polypeptides. Two edited nucleotides in an open reading frame located upstream of nad3 change two amino acids in the deduced polypeptide. Part of the well-conserved ribosomal protein gene rps12, also encoded downstream of nad3 in other plants, is lost in Oenothera mitochondria by recombination events. The functional rps12 protein must be imported from the cytoplasm since the deleted sequences of this gene are not found in the Oenothera mitochondrial genome. The pseudogene sequence is not edited at any nucleotide position. PMID:1688531
Scaling features of noncoding DNA
NASA Technical Reports Server (NTRS)
Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.
1999-01-01
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
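The purine-pyrimidine mapping and the Shannon redundancy mentioned in the abstract above can be illustrated as follows. This is a sketch under stated assumptions rather than the authors' calculation: redundancy is taken here as 1 - H(n)/n, where H(n) is the entropy in bits of overlapping n-letter words over the binary (R/Y) alphabet and n bits is the maximum possible. The function names are hypothetical, and finite-sample corrections to the entropy estimate are ignored.

```python
import math
from collections import Counter


def to_purine_pyrimidine(seq):
    """Binary mapping: purines (A, G) become 'R', pyrimidines (C, T) become 'Y'."""
    mapped = []
    for base in seq.upper():
        if base in "AG":
            mapped.append("R")
        elif base in "CT":
            mapped.append("Y")
    return "".join(mapped)


def block_entropy(symbols, n):
    """Shannon entropy (bits) of the distribution of overlapping n-letter words."""
    words = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def redundancy(seq, n):
    """1 - H(n)/n for the binary mapping; 0 means maximally random, 1 fully predictable."""
    binary = to_purine_pyrimidine(seq)
    return 1.0 - block_entropy(binary, n) / n


if __name__ == "__main__":
    print(redundancy("AGGA" * 10, 2))   # purines only: fully redundant
    print(redundancy("AC" * 50, 2))     # strict alternation: about half redundant
```

Because the mapping only distinguishes purines from pyrimidines, the result does not depend on the CG concentration of the sequence, which is the property the abstract relies on.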
Cell cycle dependent intracellular distribution of two spliced isoforms of TCP/ILF3 proteins.
Xu, You Hai; Leonova, Tatyana; Grabowski, Gregory A
2003-12-01
TCP80 is an approximately 80 kDa mammalian cytoplasmic protein that binds to a set of mRNAs and inhibits their translation in vitro and ex vivo. This protein has high sequence similarity to interleukin-2 enhancer-binding factors (NF90/ILF3) and the M-phase phosphoprotein (MPP4)/DRBP76. A 110 kDa immunologic isoform of TCP80/NF90/MPP4/DRBP76, termed TCP110, also is present in cytoplasm and nuclei of many types of cells. A cDNA sequence coding for TCP110 was derived by 5' RACE. The TCP110 sequence is identical to ILF3. The gene coding for TCP110/ILF3 mapped to human chromosome 19 and the gene organization was analyzed using TCP80 and TCP110/ILF3 cDNA sequences. The TCP/ILF3 gene spans >34.8 kb and contains 21 exons. At least one alternatively spliced product involving exons 19-21 exists and predicts the formation of either TCP80 or TCP110/ILF3. However, the functional relationships of TCP80 and TCP110/ILF3 required elucidation. The metabolic turnover rates and subcellular distribution of TCP80 and TCP110/ILF3 during the cell cycle showed TCP80 to be relatively stable (t1/2 = 5 days) in the cytoplasmic compartment. In comparison, TCP110/ILF3 migrated between the cytoplasmic and nuclear compartments during the cell cycle. The TCP110 C-terminal segment contains an additional nuclear localizing signal that plays a role in its nuclear translocation. This study indicates that the multiple cellular functions, i.e., translation control, interleukin-2 enhancer binding, or cell division, of TCP/ILF3 are fulfilled by alternatively spliced isoforms.
Sequence Polishing Library (SPL) v10.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oberortner, Ernst
The Sequence Polishing Library (SPL) is a suite of software tools that automates "Design for Synthesis and Assembly" workflows. The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL). The SPL "Juggler" tool optimizes the codon usage of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the "Juggler" can translate amino acid sequences into DNA sequences. The SPL "Polisher" verifies DNA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites. In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code. The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, which are mostly determined by the utilized assembly protocol, such as length, GC content, or melting temperature.
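The three kinds of synthesis constraint named in the abstract above (GC content, repeating k-mers, restriction sites) can be checked with very little code. The sketch below is not SPL and does not reflect its interface: the function names, the GC window of 30-70%, the k-mer length of 8, and the choice of the EcoRI (GAATTC) and BsaI (GGTCTC) recognition sites are illustrative assumptions, and it assumes an unambiguous A/C/G/T sequence.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def repeated_kmers(seq, k):
    """Sorted list of k-mers that occur more than once (overlaps allowed)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return sorted(kmer for kmer, count in counts.items() if count > 1)


def find_sites(seq, sites):
    """(enzyme, position) pairs for each recognition site on either strand."""
    hits = []
    for name, site in sites.items():
        # A set avoids double counting palindromic sites such as GAATTC.
        for pattern in {site, reverse_complement(site)}:
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


def check_sequence(seq, gc_min=0.3, gc_max=0.7, k=8, sites=None):
    """Return human-readable constraint violations; an empty list means none found."""
    seq = seq.upper()
    if sites is None:
        sites = {"EcoRI": "GAATTC", "BsaI": "GGTCTC"}  # illustrative selection
    violations = []
    gc = gc_content(seq)
    if not gc_min <= gc <= gc_max:
        violations.append(f"GC content {gc:.2f} outside [{gc_min}, {gc_max}]")
    for kmer in repeated_kmers(seq, k):
        violations.append(f"repeated {k}-mer {kmer}")
    for name, position in find_sites(seq, sites):
        violations.append(f"{name} site at {position}")
    return violations


if __name__ == "__main__":
    for problem in check_sequence("ATGGAATTCAAACGT"):
        print(problem)
```

Returning a list of messages rather than a single pass/fail flag mirrors the reporting behaviour the abstract describes, and leaves the repair step (recoding the offending region) to a separate routine.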
A new age in functional genomics using CRISPR/Cas9 in arrayed library screening.
Agrotis, Alexander; Ketteler, Robin
2015-01-01
CRISPR technology has rapidly changed the face of biological research, such that precise genome editing has now become routine for many labs within several years of its initial development. What makes CRISPR/Cas9 so revolutionary is the ability to target a protein (Cas9) to an exact genomic locus, through designing a specific short complementary nucleotide sequence, that together with a common scaffold sequence, constitute the guide RNA bridging the protein and the DNA. Wild-type Cas9 cleaves both DNA strands at its target sequence, but this protein can also be modified to exert many other functions. For instance, by attaching an activation domain to catalytically inactive Cas9 and targeting a promoter region, it is possible to stimulate the expression of a specific endogenous gene. In principle, any genomic region can be targeted, and recent efforts have successfully generated pooled guide RNA libraries for coding and regulatory regions of human, mouse and Drosophila genomes with high coverage, thus facilitating functional phenotypic screening. In this review, we will highlight recent developments in the area of CRISPR-based functional genomics and discuss potential future directions, with a special focus on mammalian cell systems and arrayed library screening.
Carr, Ian M; Morgan, Joanne; Watson, Christopher; Melnik, Svitlana; Diggle, Christine P; Logan, Clare V; Harrison, Sally M; Taylor, Graham R; Pena, Sergio D J; Markham, Alexander F; Alkuraya, Fowzan S; Black, Graeme C M; Ali, Manir; Bonthron, David T
2013-07-01
Massively parallel ("next generation") DNA sequencing (NGS) has quickly become the method of choice for seeking pathogenic mutations in rare uncharacterized monogenic diseases. Typically, before DNA sequencing, protein-coding regions are enriched from patient genomic DNA, representing either the entire genome ("exome sequencing") or selected mapped candidate loci. Sequence variants, identified as differences between the patient's and the human genome reference sequences, are then filtered according to various quality parameters. Changes are screened against datasets of known polymorphisms, such as dbSNP and the 1000 Genomes Project, in the effort to narrow the list of candidate causative variants. An increasing number of commercial services now offer to both generate and align NGS data to a reference genome. This potentially allows small groups with limited computing infrastructure and informatics skills to utilize this technology. However, the capability to effectively filter and assess sequence variants is still an important bottleneck in the identification of deleterious sequence variants in both research and diagnostic settings. We have developed an approach to this problem comprising a user-friendly suite of programs that can interactively analyze, filter and screen data from enrichment-capture NGS data. These programs ("Agile Suite") are particularly suitable for small-scale gene discovery or for diagnostic analysis. © 2013 WILEY PERIODICALS, INC.
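The filtering workflow outlined in the abstract above (quality filtering, then screening against known polymorphisms) can be sketched as below. This is not the Agile Suite and does not read real dbSNP or 1000 Genomes files: the `Variant` record, the thresholds, and the in-memory set of known variants are all hypothetical stand-ins for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Minimal variant record (hypothetical fields for this sketch)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    quality: float
    depth: int


def filter_variants(variants, known_polymorphisms, min_quality=30.0, min_depth=10):
    """Keep variants that pass quality thresholds and are absent from the known set."""
    candidates = []
    for variant in variants:
        if variant.quality < min_quality or variant.depth < min_depth:
            continue  # fails the quality filter
        key = (variant.chrom, variant.pos, variant.ref, variant.alt)
        if key in known_polymorphisms:
            continue  # already catalogued as a polymorphism
        candidates.append(variant)
    return candidates


if __name__ == "__main__":
    calls = [
        Variant("chr1", 1000, "A", "G", 50.0, 40),  # novel, good quality
        Variant("chr1", 2000, "C", "T", 12.0, 35),  # low quality
        Variant("chr2", 3000, "G", "A", 60.0, 25),  # known polymorphism
    ]
    known = {("chr2", 3000, "G", "A")}
    for variant in filter_variants(calls, known):
        print(variant)
```

Keying the known set on chromosome, position, and both alleles means a different substitution at a catalogued position is still retained as a candidate.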
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tanaka, Yoshiyuki; Matsuoka, Makoto; Yamanoto, Naoki
A cDNA clone for phenylalanine ammonia-lyase (PAL) induced in wounded sweet potato (Ipomoea batatas Lam.) root was obtained by immunoscreening a cDNA library. The protein produced in Escherichia coli cells containing the plasmid pPAL02 was indistinguishable from sweet potato PAL as judged by Ouchterlony double diffusion assays. The Mr of its subunit was 77,000. The cells converted [14C]-L-phenylalanine into [14C]-t-cinnamic acid and PAL activity was detected in the homogenate of the cells. The activity was dependent on the presence of the pPAL02 plasmid DNA. The nucleotide sequence of the cDNA contained a 2,121-base pair (bp) open-reading frame capable of coding for a polypeptide with 707 amino acids (Mr 77,137), a 22-bp 5'-noncoding region and a 207-bp 3'-noncoding region. The results suggest that the insert DNA fully encoded the amino acid sequence for sweet potato PAL that is induced by wounding. Comparison of the deduced amino acid sequence with that of a PAL cDNA fragment from Phaseolus vulgaris revealed 78.9% homology. The sequence from amino acid residues 258 to 494 was highly conserved, showing 90.7% homology.
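The figures quoted in this abstract are internally consistent, which a short arithmetic sketch makes explicit. The function name `cdna_layout` is hypothetical, and the sketch assumes the 2,121-bp reading frame is counted without a stop codon, since 2,121 / 3 gives exactly the 707 residues reported.

```python
def cdna_layout(orf_bp, utr5_bp, utr3_bp):
    """Return codon count and combined length for a cDNA described by three regions.

    Assumes orf_bp counts only sense codons (no stop codon), which is what the
    reported 2,121 bp / 707 amino acids implies.
    """
    if orf_bp % 3 != 0:
        raise ValueError("open reading frame length must be a multiple of 3")
    return {
        "codons": orf_bp // 3,
        "total_bp": utr5_bp + orf_bp + utr3_bp,
    }


layout = cdna_layout(orf_bp=2121, utr5_bp=22, utr3_bp=207)
# 707 codons; the three reported regions together span 2350 bp.

# Mean residue mass implied by the reported Mr of 77,137 for 707 residues.
mean_residue_mass = 77137 / layout["codons"]
```

The mean residue mass comes out near 109 Da, in the range expected for an average protein.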
Gruszka, Damian; Marzec, Marek; Szarejko, Iwona
2012-06-14
The high level of conservation of genes that regulate DNA replication and repair indicates that they may serve as a source of information on the origin and evolution of the species and makes them a reliable system for the identification of cross-species homologs. Studies that had been conducted to date shed light on the processes of DNA replication and repair in bacteria, yeast and mammals. However, there is still much to be learned about the process of DNA damage repair in plants. These studies, which were conducted mainly using bioinformatics tools, enabled the list of genes that participate in various pathways of DNA repair in Arabidopsis thaliana (L.) Heynh to be outlined; however, information regarding these mechanisms in crop plants is still very limited. A similar, functional approach is particularly difficult for a species whose complete genomic sequences are still unavailable. One of the solutions is to apply ESTs (Expressed Sequence Tags) as the basis for gene identification. For the construction of the barley EST DNA Replication and Repair Database (bEST-DRRD), presented here, the Arabidopsis nucleotide and protein sequences involved in DNA replication and repair were used to browse for and retrieve the deposited sequences, derived from four barley (Hordeum vulgare L.) sequence databases, including the "Barley Genome version 0.05" database (encompassing ca. 90% of barley coding sequences) and from two databases covering the complete genomes of two monocot models: Oryza sativa L. and Brachypodium distachyon L. in order to identify homologous genes. Sequences of the categorised Arabidopsis queries are used for browsing the repositories, which are located on the ViroBLAST platform. The bEST-DRRD is currently used in our project during the identification and validation of the barley genes involved in DNA repair. 
The presented database provides information about the Arabidopsis genes involved in DNA replication and repair, their expression patterns and models of protein interactions. It was designed and established to provide an open-access tool for the identification of monocot homologs of known Arabidopsis genes that are responsible for DNA-related processes. The barley genes identified in the project are currently being analysed to validate their function.
Large-Scale Concatenation cDNA Sequencing
Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.
1997-01-01
A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174
Churchill, M E; Jones, D N; Glaser, T; Hefner, H; Searles, M A; Travers, A A
1995-01-01
The high mobility group (HMG) protein HMG-D from Drosophila melanogaster is a highly abundant chromosomal protein that is closely related to the vertebrate HMG domain proteins HMG1 and HMG2. In general, chromosomal HMG domain proteins lack sequence specificity. However, using both NMR spectroscopy and standard biochemical techniques we show that binding of HMG-D to a single DNA site is sequence selective. The preferred duplex DNA binding site comprises at least 5 bp and contains the deformable dinucleotide TG embedded in A/T-rich sequences. The TG motif constitutes a common core element in the binding sites of the well-characterized sequence-specific HMG domain proteins. We show that a conserved aromatic residue in helix 1 of the HMG domain may be involved in recognition of this core sequence. In common with other HMG domain proteins, HMG-D binds preferentially to DNA sites that are stably bent and underwound; therefore, HMG-D can be considered an architecture-specific protein. Finally, we show that HMG-D bends DNA and may confer a superhelical DNA conformation at a natural DNA binding site in the Drosophila fushi tarazu scaffold-associated region. PMID:7720717
Drosophila Melanogaster Mitochondrial DNA: Gene Organization and Evolutionary Considerations
Garesse, R.
1988-01-01
The sequence of an 8351-nucleotide mitochondrial DNA (mtDNA) fragment has been obtained, extending the knowledge of the Drosophila melanogaster mitochondrial genome to 90% of its coding region. The sequence encodes seven polypeptides, 12 tRNAs and the 3' end of the 16S rRNA and CO III genes. The gene organization is strictly conserved with respect to the Drosophila yakuba mitochondrial genome, and different from that found in mammals and Xenopus. The high A + T content of D. melanogaster mitochondrial DNA is reflected in a reiterative codon usage, with more than 90% of the codons ending in T or A, G + C rich codons being practically absent. The average level of homology between the D. melanogaster and D. yakuba sequences is very high (roughly 94%), although insertions and deletions have been detected in protein, tRNA and large ribosomal genes. The analysis of nucleotide changes reveals a similar frequency for transitions and transversions, and reflects a strong bias against G + C on both strands. The predominant type of transition is strand specific. PMID:3130291
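The codon-usage statistic quoted here (the share of codons ending in T or A) is simple to compute for any in-frame coding sequence. The sketch below is illustrative only: `third_position_at_fraction` is a hypothetical helper, it assumes the input starts in frame, and it ignores any trailing partial codon.

```python
def third_position_at_fraction(cds):
    """Fraction of complete codons whose third base is A or T.

    The sequence is assumed to begin in frame; RNA input (U) is treated as T.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        raise ValueError("sequence contains no complete codon")
    return sum(codon[2] in "AT" for codon in codons) / len(codons)
```

For example, the toy sequence `ATTTTAGGC` splits into ATT, TTA and GGC, so two of its three codons end in A or T.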
Design and construction of functional AAV vectors.
Gray, John T; Zolotukhin, Serge
2011-01-01
Using the basic principles of molecular biology and laboratory techniques presented in this chapter, researchers should be able to create a wide variety of AAV vectors for both clinical and basic research applications. Basic vector design concepts are covered for both protein coding gene expression and small non-coding RNA gene expression cassettes. AAV plasmid vector backbones (available via AddGene) are described, along with critical sequence details for a variety of modular expression components that can be inserted as needed for specific applications. Protocols are provided for assembling the various DNA components into AAV vector plasmids in Escherichia coli, as well as for transferring these vector sequences into baculovirus genomes for large-scale production of AAV in the insect cell production system.
Wu, Yueh-Lung; Wu, Carol-P; Huang, Yu-Hui; Huang, Sheng-Ping; Lo, Huei-Ru; Chang, Hao-Shuo; Lin, Pi-Hsiu; Wu, Ming-Cheng; Chang, Chia-Jung; Chao, Yu-Chan
2014-11-01
The p143 gene from Autographa californica multinucleocapsid nucleopolyhedrovirus (AcMNPV) has been found to increase the expression of luciferase, which is driven by the polyhedrin gene promoter, in a plasmid with virus coinfection. Further study indicated that this is due to the presence of a replication origin (ori) in the coding region of this gene. Transient DNA replication assays showed that a specific fragment of the p143 coding sequence, p143-3, underwent virus-dependent DNA replication in Spodoptera frugiperda IPLB-Sf-21 (Sf-21) cells. Deletion analysis of the p143-3 fragment showed that subfragment p143-3.2a contained the essential sequence of this putative ori. Sequence analysis of this region revealed a unique distribution of imperfect palindromes with high AT contents. No sequence homology or similarity between p143-3.2a and any other known ori was detected, suggesting that it is a novel baculovirus ori. Further study showed that the p143-3.2a ori can replicate more efficiently in infected Sf-21 cells than baculovirus homologous regions (hrs), the major baculovirus ori, or non-hr oris during virus replication. Previously, hr on its own was unable to replicate in mammalian cells, and for mammalian viral oris, viral proteins are generally required for their proper replication in host cells. However, the p143-3.2a ori was, surprisingly, found to function as an efficient ori in mammalian cells without the need for any viral proteins. We conclude that p143 contains a unique sequence that can function as an ori to enhance gene expression in not only insect cells but also mammalian cells. Baculovirus DNA replication relies on both hr and non-hr oris; however, so far very little is known about the latter oris. Here we have identified a new non-hr ori, the p143 ori, which resides in the coding region of p143. By developing a novel DNA replication-enhanced reporter system, we have identified and located the core region required for the p143 ori. 
This ori contains a large number of imperfect inverted repeats and is the most active ori in the viral genome during virus infection in insect cells. We also found that it is a unique ori that can replicate in mammalian cells without the assistance of baculovirus gene products. The identification of this ori should contribute to a better understanding of baculovirus DNA replication. Also, this ori is very useful in assisting with gene expression in mammalian cells. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Prevalence of transcription promoters within archaeal operons and coding sequences.
Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S
2009-01-01
Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate to binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
Degenerative Minimalism in the Genome of a Psyllid Endosymbiont
Clark, Marta A.; Baumann, Linda; Thao, MyLo Ly; Moran, Nancy A.; Baumann, Paul
2001-01-01
Psyllids, like aphids, feed on plant phloem sap and are obligately associated with prokaryotic endosymbionts acquired through vertical transmission from an ancestral infection. We have sequenced 37 kb of DNA of the genome of Carsonella ruddii, the endosymbiont of psyllids, and found that it has a number of unusual properties revealing a more extreme case of degeneration than was previously reported from studies of eubacterial genomes, including that of the aphid endosymbiont Buchnera aphidicola. Among the unusual properties are an exceptionally low guanine-plus-cytosine content (19.9%), almost complete absence of intergenic spaces, operon fusion, and lack of the usual promoter sequences upstream of 16S rDNA. These features suggest the synthesis of long mRNAs and translational coupling. The most extreme instances of base compositional bias occur in the genes encoding proteins that have less highly conserved amino acid sequences; the guanine-plus-cytosine content of some protein-coding sequences is as low as 10%. The shift in base composition has a large effect on proteins: in polypeptides of C. ruddii, half of the residues consist of five amino acids with codons low in guanine plus cytosine. Furthermore, the proteins of C. ruddii are reduced in size, with an average of about 9% fewer amino acids than in homologous proteins of related bacteria. These observations suggest that the C. ruddii genome is not subject to constraints that limit the evolution of other known eubacteria. PMID:11222582
Primary structure and mapping of the hupA gene of Salmonella typhimurium.
Higgins, N P; Hillyard, D
1988-01-01
In bacteria, the complex nucleoid structure is folded and maintained by negative superhelical tension and a set of type II DNA-binding proteins, also called histonelike proteins. The most abundant type II DNA-binding protein is HU. Southern blot analysis showed that Salmonella typhimurium contained two HU genes that corresponded to Escherichia coli genes hupA (encoding HU-2 protein) and hupB (encoding HU-1). Salmonella hupA was cloned, and the nucleotide sequence of the gene was determined. Comparison of hupA of E. coli and S. typhimurium revealed that the HU-2 proteins were identical and that there was high conservation of nucleotide sequences outside the coding frames of the genes. A 300-member genomic library of S. typhimurium was constructed by using random transposition of MudP, a specialized chimeric P22-Mu phage that packages chromosomal DNA unidirectionally from its insertion point. Oligonucleotide hybridization against the library identified one MudP insertion that lies within 28 kilobases of hupA; the MudP was 12% linked to purH at 90.5 min on the standard map. Plasmids expressing HU-2 had a surprising phenotype; they caused growth arrest when they were introduced into E. coli strains bearing a himA or hip mutation. These results suggest that IHF and HU have interactive roles in bacteria. PMID:3056912
Resurrection of DNA Function In Vivo from an Extinct Genome
Pask, Andrew J.; Behringer, Richard R.; Renfree, Marilyn B.
2008-01-01
There is a burgeoning repository of information available from ancient DNA that can be used to understand how genomes have evolved and to determine the genetic features that defined a particular species. To assess the functional consequences of changes to a genome, a variety of methods are needed to examine extinct DNA function. We isolated a transcriptional enhancer element from the genome of an extinct marsupial, the Tasmanian tiger (Thylacinus cynocephalus or thylacine), obtained from 100 year-old ethanol-fixed tissues from museum collections. We then examined the function of the enhancer in vivo. Using a transgenic approach, it was possible to resurrect DNA function in transgenic mice. The results demonstrate that the thylacine Col2A1 enhancer directed chondrocyte-specific expression in this extinct mammalian species in the same way as its orthologue does in mice. While other studies have examined extinct coding DNA function in vitro, this is the first example of the restoration of extinct non-coding DNA and examination of its function in vivo. Our method using transgenesis can be used to explore the function of regulatory and protein-coding sequences obtained from any extinct species in an in vivo model system, providing important insights into gene evolution and diversity. PMID:18493600
The complete chloroplast genome sequence of Dendrobium officinale.
Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui
2016-01-01
The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.
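The reported sizes follow the usual quadripartite layout of a chloroplast genome, in which the total length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. A minimal check, using a hypothetical helper name and only the numbers given in the abstract:

```python
def quadripartite_size(lsc_bp, ssc_bp, ir_bp):
    """Total length of a genome made of LSC + SSC + two inverted-repeat copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Values reported for Dendrobium officinale in the abstract above.
total = quadripartite_size(lsc_bp=84_944, ssc_bp=14_506, ir_bp=26_284)
# total == 152018, matching the stated genome size of 152,018 bp
```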
The complete mitochondrial genome sequence of Eimeria magna (Apicomplexa: Coccidia).
Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Liu, Guo-Hua; Wang, Chun-Ren; Zhu, Xing-Quan
2015-01-01
In the present study, we determined the complete mitochondrial DNA (mtDNA) sequence of Eimeria magna from rabbits for the first time, and compared its gene contents and genome organization with those of seven Eimeria spp. from domestic chickens. The size of the complete mt genome sequence of E. magna is 6249 bp, which consists of 3 protein-coding genes (cytb, cox1 and cox3), 12 gene fragments for the large subunit (LSU) rRNA, and 7 gene fragments for the small subunit (SSU) rRNA, without transfer RNA genes, in accordance with that of Eimeria spp. from chickens. The putative direction of translation for three genes (cytb, cox1 and cox3) was the same as those of Eimeria species from domestic chickens. The content of A + T is 65.16% for E. magna mt genome (29.73% A, 35.43% T, 17.09% G and 17.75% C). The E. magna mt genome sequence provides novel mtDNA markers for studying the molecular epidemiology and population genetics of Eimeria spp. and has implications for the molecular diagnosis and control of rabbit coccidiosis.
Zhou, Man; Hu, Qian; Li, Zhigang; Li, Dayong; Chen, Chin-Fu; Luo, Hong
2011-01-01
Background: Turfgrass species are agriculturally and economically important perennial crops. Turfgrass species are highly susceptible to a wide range of fungal pathogens. Dollar spot and brown patch, two important diseases caused by fungal pathogens Sclerotinia homoeocarpa and Rhizoctonia solani, respectively, are among the most severe turfgrass diseases. Currently, turf fungal disease control mainly relies on fungicide treatments, which raises many concerns for human health and the environment. Antimicrobial peptides found in various organisms play an important role in innate immune response. Methodology/Principal Findings: The antimicrobial peptide Penaeidin4-1 (Pen4-1) from the shrimp Litopenaeus setiferus has been reported to possess in vitro antifungal and antibacterial activities against various economically important fungal and bacterial pathogens. In this study, we have studied the feasibility of using this novel peptide for engineering enhanced disease resistance into creeping bentgrass plants (Agrostis stolonifera L., cv. Penn A-4). Two DNA constructs were prepared containing either the coding sequence of a single peptide, Pen4-1, or the DNA sequence coding for the transit signal peptide of the secreted tobacco AP24 protein translationally fused to the Pen4-1 coding sequence. A maize ubiquitin promoter was used in both constructs to drive gene expression. Transgenic turfgrass plants containing different DNA constructs were generated by Agrobacterium-mediated transformation and analyzed for transgene insertion and expression. In replicated in vitro and in vivo experiments under controlled environments, transgenic plants exhibited significantly enhanced resistance to dollar spot and brown patch, the two major fungal diseases in turfgrass. The targeting of Pen4-1 to endoplasmic reticulum by the transit peptide of AP24 protein did not significantly impact disease resistance in transgenic plants.
Conclusion/Significance: Our results demonstrate the effectiveness of Pen4-1 in a perennial species against fungal pathogens and suggest a potential strategy for engineering broad-spectrum fungal disease resistance in crop species. PMID:21931807
Schiavo, G; Strillacci, M G; Ribani, A; Bovo, S; Roman-Ponce, S I; Cerolini, S; Bertolini, F; Bagnato, A; Fontanesi, L
2018-06-01
Mitochondrial DNA (mtDNA) insertions have been detected in the nuclear genome of many eukaryotes. These sequences are pseudogenes that originated by horizontal transfer of mtDNA fragments into the nuclear genome, producing nuclear DNA sequences of mitochondrial origin (numt). In this study we determined the frequency and distribution of mtDNA-originated pseudogenes in the turkey (Meleagris gallopavo) nuclear genome. The turkey reference genome (Turkey_2.01) was aligned with the reference linearized mtDNA sequence using LAST. A total of 32 numt sequences (corresponding to 18 numt regions derived by unique insertional events) were identified in the turkey nuclear genome (size ranging from 66 to 1415 bp; identity against the modern turkey mtDNA corresponding region ranging from 62% to 100%). Numts were distributed in nine chromosomes and in one scaffold. They derived from parts of 10 mtDNA protein-coding genes, ribosomal genes, the control region and 10 tRNA genes. Seven numt regions reported in the turkey genome were identified in orthologous positions in the Gallus gallus genome and therefore were present in the ancestral genome that gave rise, in the Cretaceous, to the lineages of the modern crown Galliformes. Five recently integrated turkey numts were validated by PCR in 168 turkeys of six different domestic populations. None of the analysed numts were polymorphic (i.e. absence of the inserted sequence, as reported in numts of recent integration in other species), suggesting that the reticulate speciation model is not useful for explaining the origin of the domesticated turkey lineage. © 2018 Stichting International Foundation for Animal Genetics.
Dimitrieva, Slavica; Anisimova, Maria
2014-01-01
In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore not to be subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites have been reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play an important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with a larger number of interactions or forming a larger number of structures, those located in intracellular components and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
Cloning and characterization of a DNA polymerase beta gene from Trypanosoma cruzi.
Venegas, Juan A; Aslund, Lena; Solari, Aldo
2009-06-01
A gene coding for a DNA polymerase beta from the Trypanosoma cruzi Miranda clone, belonging to the TcI lineage, was cloned (Miranda Tcpol beta), using the information from eight peptides of the T. cruzi beta-like DNA polymerase purified previously. The gene encodes a protein of 403 amino acids which is very similar to the two T. cruzi CL Brener (TcIIe lineage) sequences published, but has three different residues in highly conserved segments. At the amino acid level, the identity of TcI-pol beta with mitochondrial pol beta and pol beta-PAK from other trypanosomatids was between 68-80% and 22-30%, respectively. Miranda Tc-pol beta protein has an N-terminal sequence similar to that described in the mitochondrial Crithidia fasciculata pol beta, which suggests that the TcI-pol beta plays a role in the organelle. Northern and Western analyses showed that this T. cruzi gene is highly expressed both in proliferative and non-proliferative developmental forms. These results suggest that, in addition to replication of kDNA in proliferative cells, this enzyme may have another function in non-proliferative cells, such as a DNA repair role similar to that which has been extensively described in a vast spectrum of eukaryotic cells.
Improvement and Optimization of Two Engineered Phage Resistance Mechanisms in Lactococcus lactis
McGrath, Stephen; Fitzgerald, Gerald F.; van Sinderen, Douwe
2001-01-01
Homologous replication module genes were identified for four P335 type phages. DNA sequence analysis revealed that all four phages exhibited more than 90% DNA homology for at least two genes, designated rep2009 and orf17. One of these genes, rep2009, codes for a putative replisome organizer protein and contains an assumed origin of phage DNA replication (ori2009), which was identical for all four phages. DNA fragments representing the ori2009 sequence confer a phage-encoded resistance (Per) phenotype on lactococcal hosts when they are supplied on a high-copy-number vector. Furthermore, cloning multiple copies of the ori2009 sequence was found to increase the effectiveness of the Per phenotype conferred. A number of antisense plasmids targeting specific genes of the replication module were constructed. Two separate plasmids targeting rep2009 and orf17 were found to efficiently inhibit proliferation of all four phages by interfering with intracellular phage DNA replication. These results represent two highly effective strategies for inhibiting bacteriophage proliferation, and they also identify a novel gene, orf17, which appears to be important for phage DNA replication. Furthermore, these results indicate that although the actual mechanisms of DNA replication are very similar, if not identical, for all four phages, expression of the replication genes is significantly different in each case. PMID:11157223
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have largely facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes has already been well-established, we still face challenges in robustly finding non-coding RNA genes, such as miRNA and lncRNA. Two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used for training the discriminant function; different combinations of these are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop the related bioinformatic tools.
Role of indirect readout mechanism in TATA box binding protein-DNA interaction.
Mondal, Manas; Choudhury, Devapriya; Chakrabarti, Jaydeb; Bhattacharyya, Dhananjay
2015-03-01
Gene expression generally initiates with binding of the TATA-box binding protein (TBP) to the minor groove of the TATA box sequence, where the DNA structure is significantly different from B-DNA. We have carried out molecular dynamics simulation studies of the TBP-DNA system to understand how the DNA structure alters for efficient binding. We observed a rigid nature of the protein, while the DNA of the TATA box sequence has an inherent flexibility in terms of bending and minor groove widening. The bending analysis of the free DNA and the TBP-bound DNA systems indicates the presence of some similar structures. Principal coordinate ordination analysis also indicates that some structural features of the protein-bound and free DNA are similar. Thus we suggest that the DNA of the TATA box sequence regularly oscillates between several alternate structures and the one suitable for TBP binding is induced further by the protein for proper complex formation.
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST); no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, and their predicted secondary structures resemble those of their putative GenBank homologues. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, the other on circular code analyses of codon contents based on frame redundancy) confirm nucleotide-exchange-encrypted overlapping genes. The methods converge on which genes are most probably active and which are not, for each of the exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
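To make the exchange rules concrete, the sketch below applies a systematic asymmetric exchange to a DNA string. It is an illustration under stated assumptions, not the author's pipeline: the helper names are hypothetical, a rule such as A→G→C→T→A is written as the cycle string "AGCT", each arrow is read as "is replaced by", and bases outside the cycle are left unchanged.

```python
def cycle_to_map(cycle):
    """Build a base-substitution map from a cycle string.

    "AGCT" encodes A->G, G->C, C->T, T->A. Bases not named in the cycle
    (for example A in the three-base cycle "CGT") map to themselves.
    """
    mapping = {base: base for base in "ACGT"}
    for i, base in enumerate(cycle):
        mapping[base] = cycle[(i + 1) % len(cycle)]
    return mapping


def apply_exchange(seq, cycle):
    """Apply the exchange to every base of seq; unknown symbols pass through."""
    mapping = cycle_to_map(cycle)
    return "".join(mapping.get(base, base) for base in seq.upper())
```

Under this reading, `apply_exchange("ACGT", "AGCT")` gives `GTCA`, and the three-base rule C→G→T→C applied to the same input gives `AGTC`. Comparing exchanged sequences against a transcript database would be a separate step that this sketch does not attempt.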
McCutchen-Maloney, Sandra L.
2002-01-01
Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.
Chiu, Chi-Chien; John, Joseph Abraham Christopher; Hseu, Tzong-Hsiung; Chang, Chi-Yao
2002-03-01
The pituitary-specific transcription factor Pit-1 belongs to the family of POU-domain proteins and is known to play an important role in the differentiation of pituitary cells. Here we report the complete nucleotide sequence of cDNA encoding Pit-1 from the brackish water fish, ayu (Plecoglossus altivelis). Nucleotide sequence analysis of 1910 bp of ayu Pit-1 cDNA revealed an open reading frame of 1074 bp that encodes a protein of 358 amino acids containing a POU-specific domain, POU homeodomain, and an STA (Ser/Thr-rich activation) transactivation domain. We inserted the coding region of Pit-1 cDNA, obtained by PCR, into a pET-20b(+) plasmid to produce recombinant Pit-1 in Escherichia coli BL21 (DE3) pLysS cells. Upon induction with isopropyl beta-D-thiogalactopyranoside, Pit-1 was expressed and accumulated as inclusion bodies in E. coli. The protein was then purified in one step by affinity chromatography on a nickel-nitrilotriacetic acid agarose column under denaturing conditions. This method yielded 0.7 mg of highly pure and stable protein per 200 ml of bacterial culture. A band of 40 kDa, resolved as recombinant ayu Pit-1 by sodium dodecyl sulfate-polyacrylamide gel electrophoresis, agrees well with the molecular mass calculated from the translated cDNA sequence. The purified recombinant Pit-1 was confirmed in vitro through Western blot analysis, using its monoclonal antibody. This monoclonal antibody detected Pit-1 in the nuclei of ayu developing pituitary by immunohistochemical reaction. It serves as a good reagent for the detection of ayu Pit-1 in situ. Copyright 2002 Elsevier Science (USA).
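The figures quoted above (a 1074 bp open reading frame, 358 amino acids, a band of about 40 kDa) can be cross-checked with simple arithmetic. The sketch below is only an illustration: the average residue mass of 110 Da is a common rule of thumb and an assumption here, not a value from the abstract, and the estimate ignores any vector-encoded tag.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed rule-of-thumb mass per amino acid residue


def orf_summary(orf_bp: int, n_residues: int) -> dict:
    """Relate an ORF length in base pairs to a protein length in residues."""
    codons, remainder = divmod(orf_bp, 3)
    return {
        "codons": codons,
        "in_frame": remainder == 0,
        # True if the quoted ORF length also counts the stop codon
        "stop_codon_counted": codons == n_residues + 1,
        "approx_mass_kda": n_residues * AVG_RESIDUE_MASS_DA / 1000.0,
    }


if __name__ == "__main__":
    print(orf_summary(1074, 358))
```

With these inputs the ORF length is exactly three times the residue count, so the quoted 1074 bp appears to exclude the stop codon, and the rough mass estimate is close to the 40 kDa band reported.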
Fukuda, Tomoyuki; Ohta, Kunihiro; Ohya, Yoshikazu
2006-01-01
VMA1-derived endonuclease (VDE), a homing endonuclease in Saccharomyces cerevisiae, is encoded by the mobile intein-coding sequence within the nuclear VMA1 gene. VDE recognizes and cleaves DNA at the 31-bp VDE recognition sequence (VRS) in the VMA1 gene lacking the intein-coding sequence during meiosis to insert a copy of the intein-coding sequence at the cleaved site. The mechanism underlying the meiosis specificity of VMA1 intein-coding sequence homing remains unclear. We studied various factors that might influence the cleavage activity in vivo and found that VDE binding to the VRS can be detected only when DNA cleavage by VDE takes place, implying that meiosis-specific DNA cleavage is regulated by the accessibility of VDE to its target site. As a possible candidate for the determinant of this accessibility, we analyzed chromatin structure around the VRS and revealed that local chromatin structure near the VRS is altered during meiosis. Although the meiotic chromatin alteration exhibits correlations with DNA binding and cleavage by VDE at the VMA1 locus, such a chromatin alteration is not necessarily observed when the VRS is embedded in ectopic gene loci. This suggests that nucleosome positioning or occupancy around the VRS by itself is not the sole mechanism for the regulation of meiosis-specific DNA cleavage by VDE and that other mechanisms are involved in the regulation. PMID:16757746
Non-coding RNAs in lung cancer
Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio
2014-01-01
The discovery that protein-coding genes represent less than 2% of the human genome, and the evidence that more than 90% of it is actively transcribed, changed the classical point of view of the central dogma of molecular biology, which was always based on the assumption that RNA functions mainly as an intermediate bridge between DNA sequences and protein synthesis machinery. Accumulating data indicate that non-coding RNAs are involved in different physiological processes, providing for the maintenance of cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes of cancer cells. This evidence, and the prospect of using these molecules as diagnostic markers and therapeutic targets, currently make non-coding RNAs among the most relevant molecules in cancer research. In this paper we will provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996
Zhou, Jinlin; Gong, Haiyan; Zhou, Yongzhi; Xuan, Xuenan; Fujisaki, Kozo
2006-12-01
A cDNA coding for a glycine-rich protein was identified from the Rhipicephalus haemaphysaloides tick. The cDNA, named here RH50, was 1,823 bp long, including a single open reading frame (ORF) of 1,518 nucleotides. The ORF encodes a polypeptide of 506 amino acid residues with a calculated size of 50 kDa. The predicted amino acid sequence of RH50 showed a low homology to sequences of some known extracellular matrix-like proteins. The native protein was identified in both the fed tick salivary gland lysates and extracts of cement material using the serum against the recombinant protein. Reverse transcription polymerase chain reaction results showed that RH50 mRNA was only transcribed in partially fed tick salivary glands, not in unfed tick salivary glands or partially fed tick midgut, fat body, or ovary. The differential expression of RH50 protein in fed tick salivary glands was confirmed by immunofluorescence. A low attachment rate in both adult and nymphal ticks and a high mortality of immature ticks (nymphs) feeding on recombinant RH50-immunized rabbits were found. These results show that the RH50 protein could be a useful candidate for anti-tick vaccine development.
Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G
1995-01-01
The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Adachi, Noboru; Umetsu, Kazuo; Shojo, Hideki
2014-01-01
Mitochondrial DNA (mtDNA) is widely used for DNA analysis of highly degraded samples because of its polymorphic nature and high number of copies in a cell. However, as endogenous mtDNA in deteriorated samples is scarce and highly fragmented, it is not easy to obtain reliable data. In the current study, we report the risks of direct sequencing mtDNA in highly degraded material, and suggest a strategy to ensure the quality of sequencing data. It was observed that direct sequencing data of the hypervariable segment (HVS) 1 by using primer sets that generate an amplicon of 407 bp (long-primer sets) was different from results obtained by using newly designed primer sets that produce an amplicon of 120-139 bp (mini-primer sets). The data aligned with the results of mini-primer sets analysis in an amplicon length-dependent manner; the shorter the amplicon, the more evident the endogenous sequence became. Coding region analysis using multiplex amplified product-length polymorphisms revealed the incongruence of single nucleotide polymorphisms between the coding region and HVS 1 caused by contamination with exogenous mtDNA. Although the sequencing data obtained using long-primer sets turned out to be erroneous, it was unambiguous and reproducible. These findings suggest that PCR primers that produce amplicons shorter than those currently recognized should be used for mtDNA analysis in highly degraded samples. Haplogroup motif analysis of the coding region and HVS should also be performed to improve the reliability of forensic mtDNA data. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Enea, Vincenzo; Ellis, Joan; Zavala, Fidel; Arnot, David E.; Asavanich, Achara; Masuda, Aoi; Quakyi, Isabella; Nussenzweig, Ruth S.
1984-08-01
A clone of complementary DNA encoding the circumsporozoite (CS) protein of the human malaria parasite Plasmodium falciparum has been isolated by screening an Escherichia coli complementary DNA library with a monoclonal antibody to the CS protein. The DNA sequence of the complementary DNA insert encodes a four-amino acid sequence: proline-asparagine-alanine-asparagine, tandemly repeated 23 times. The CS β -lactamase fusion protein specifically binds monoclonal antibodies to the CS protein and inhibits the binding of these antibodies to native Plasmodium falciparum CS protein. These findings provide a basis for the development of a vaccine against Plasmodium falciparum malaria.
USDA-ARS's Scientific Manuscript database
Single-nucleotide Polymorphism (SNP) markers are by far the most common form of DNA polymorphism in a genome. The objectives of this study were to discover SNPs in common bean comparing sequences from coding and non-coding regions obtained from Genbank and genomic DNA and to compare sequencing resu...
NASA Astrophysics Data System (ADS)
Freeman, Gordon Samuel
DNA is of central importance in biology as it is responsible for carrying, copying, and translating the genetic code into the building blocks that comprise life. In order to accomplish these tasks, the DNA molecule must be versatile and robust. Indeed, the underlying molecular interactions that allow DNA to execute these tasks are complex and their origins are only beginning to be understood. While experiments are able to elucidate many key biophysical phenomena, there remain many unanswered questions. Molecular simulation is able to shed light on phenomena at the molecular scale and provide information that is missing from experimental views of DNA behavior. In this dissertation I use state-of-the-art coarse-grained DNA models to address two key problems. In the first, metadynamics calculations are employed to uncover the free energy surface of two complementary DNA strands. This free energy surface takes on the appearance of a hybridization funnel and reveals candidates for intermediate states in the hybridization of short DNA oligomers. Such short oligomers are important building blocks for DNA-driven self-assembly and the mechanism of hybridization in this regime is not well understood. The second problem is that of nucleosome formation. Nucleosomes are the fundamental subunit of genome compaction in the nucleus of a cell. As such, nucleosomes are a key epigenetic factor and affect gene expression and the ability of DNA-binding proteins to locate and bind to the appropriate position in the genome. However, the factors that drive nucleosome positioning are not well understood. While DNA sequence is known to affect nucleosome formation, the mechanism by which it does so has not been established and a number of hypotheses explaining this sequence-dependence exist in the literature. 
I demonstrate that DNA shape dominates this process with contributions arising from both intrinsic DNA curvature as well as DNA-protein interactions driven by sequence-dependent variations in minor groove dimensions.
DNA mimic proteins: functions, structures, and bioinformatic analysis.
Wang, Hao-Ching; Ho, Chun-Han; Hsu, Kai-Cheng; Yang, Jinn-Moon; Wang, Andrew H-J
2014-05-13
DNA mimic proteins have DNA-like negative surface charge distributions, and they function by occupying the DNA binding sites of DNA binding proteins to prevent these sites from being accessed by DNA. DNA mimic proteins control the activities of a variety of DNA binding proteins and are involved in a wide range of cellular mechanisms such as chromatin assembly, DNA repair, transcription regulation, and gene recombination. However, the sequences and structures of DNA mimic proteins are diverse, making them difficult to predict by bioinformatic search. To date, only a few DNA mimic proteins have been reported. These DNA mimics were not found by searching for functional motifs in their sequences but were revealed only by structural analysis of their charge distribution. This review highlights the biological roles and structures of 16 reported DNA mimic proteins. We also discuss approaches that might be used to discover new DNA mimic proteins.
A DEK Domain-Containing Protein Modulates Chromatin Structure and Function in Arabidopsis
Waidmann, Sascha; Kusenda, Branislav; Mayerhofer, Juliane; Mechtler, Karl; Jonak, Claudia
2014-01-01
Chromatin is a major determinant in the regulation of virtually all DNA-dependent processes. Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. The evolutionarily conserved DEK domain-containing protein is implicated in important chromatin-related processes in animals, but little is known about its DNA targets and protein interaction partners. In plants, the role of DEK has remained elusive. In this work, we identified DEK3 as a chromatin-associated protein in Arabidopsis thaliana. DEK3 specifically binds histones H3 and H4. Purification of other proteins associated with nuclear DEK3 also established DNA topoisomerase 1α and proteins of the cohesion complex as in vivo interaction partners. Genome-wide mapping of DEK3 binding sites by chromatin immunoprecipitation followed by deep sequencing revealed enrichment of DEK3 at protein-coding genes throughout the genome. Using DEK3 knockout and overexpressor lines, we show that DEK3 affects nucleosome occupancy and chromatin accessibility and modulates the expression of DEK3 target genes. Furthermore, functional levels of DEK3 are crucial for stress tolerance. Overall, data indicate that DEK3 contributes to modulation of Arabidopsis chromatin structure and function. PMID:25387881
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)
Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan
2014-01-01
Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247 696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus mtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Analysis of evolutionary relationships suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Reading of the non-template DNA by transcription elongation factors.
Svetlov, Vladimir; Nudler, Evgeny
2018-05-14
Unlike transcription initiation and termination, which have easily discernable signals such as promoters and terminators, elongation is regulated through a dynamic network involving RNA/DNA pause signals and states- rather than sequence-specific protein interactions. A report by Nedialkov et al. (in press) provides experimental evidence for sequence-specific recruitment of elongation factor RfaH to transcribing RNA polymerase (RNAP) and outlines the mechanism of gene expression regulation by restraint ("locking") of the DNA non-template strand. According to this model, the elongation complex pauses at the so called "operon polarity sequence" (found in some long bacterial operons coding for virulence genes), when the usually flexible non-template DNA strand adopts a distinct hairpin-loop conformation on the surface of transcribing RNAP. Sequence-specific binding of RfaH to this DNA segment facilitates conversion of RfaH from its inactive closed to its active open conformation. The interaction network formed between RfaH, non-template DNA, and RNAP locks DNA in a conformation that renders the elongation complex resistant to pausing and termination. The effects of such locking on transcript elongation can be mimicked by restraint of the non-template strand due to its shortening. This work advances our understanding of regulation of transcript elongation and has important implications for the action of general transcription factors, such as NusG, which lack apparent sequence-specificity, as well as for the mechanisms of other processes linked to transcription such as transcription-coupled DNA repair. This article is protected by copyright. All rights reserved. © 2018 John Wiley & Sons Ltd.
Multiple tag labeling method for DNA sequencing
Mathies, Richard A.; Huang, Xiaohua C.; Quesada, Mark A.
1995-01-01
A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in said lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radio-isotope labels.
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene at the URL: http://www.itba.mi.cnr.it/webgene and TRADAT (TRAnscription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
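The nucleotide-level figures quoted above can be made concrete with a short sketch. It assumes the measures follow the definitions customary for the cited benchmark, in which "specificity" is the fraction of predicted coding nucleotides that are truly coding, TP/(TP+FP), rather than the usual TN/(TN+FP). The function name and the counts in the example are hypothetical.

```python
import math


def nucleotide_accuracy(tp: int, fp: int, tn: int, fn: int):
    """Return (sensitivity, specificity, correlation coefficient).

    tp/fp/tn/fn count nucleotides: coding and predicted coding (tp),
    non-coding but predicted coding (fp), and so on.
    """
    sn = tp / (tp + fn)          # fraction of true coding bases recovered
    sp = tp / (tp + fp)          # fraction of predicted coding bases that are right
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom
    return sn, sp, cc


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the calculation
    sn, sp, cc = nucleotide_accuracy(tp=89, fp=9, tn=891, fn=11)
    print(f"Sn={sn:.2f} Sp={sp:.2f} CC={cc:.2f}")
```

The correlation coefficient combines all four counts, so it remains informative when coding nucleotides are a small fraction of the sequence.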
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
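The tiering described above amounts to set operations on the calls of the three tools. The sketch below is one possible reading, not the published implementation: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, Tier 1 is taken as the intersection of all three callers, and the "integration" with MuTect2 and SNPiR is interpreted as adding variants reported by both of those two callers.

```python
def tier1(snpir, rvboost, mutect2):
    """Variants reported by all three callers (highest-precision set)."""
    return set(snpir) & set(rvboost) & set(mutect2)


def tier1_plus_pair(snpir, rvboost, mutect2):
    """Tier 1 plus variants reported by both MuTect2 and SNPiR.

    This pairing rule is an assumption; the abstract does not spell it out.
    """
    return tier1(snpir, rvboost, mutect2) | (set(mutect2) & set(snpir))


if __name__ == "__main__":
    # Toy call sets, invented for illustration only
    a = ("chr1", 100, "A", "G")
    b = ("chr2", 200, "C", "T")
    c = ("chr3", 300, "G", "A")
    snpir, rvboost, mutect2 = [a, b], [a, c], [a, b, c]
    print(sorted(tier1(snpir, rvboost, mutect2)))
    print(sorted(tier1_plus_pair(snpir, rvboost, mutect2)))
```

Requiring agreement among more callers shrinks the set and tends to raise precision, while relaxing the rule enlarges it and tends to raise recall, which matches the trade-off reported in the abstract.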
Tan, Li Huey; Xing, Hang; Lu, Yi
2014-06-17
CONSPECTUS: Several properties of nanomaterials, such as morphologies (e.g., shapes and surface structures) and distance dependent properties (e.g., plasmonic and quantum confinement effects), make nanomaterials uniquely qualified as potential choices for future applications from catalysis to biomedicine. To realize the full potential of these nanomaterials, it is important to demonstrate fine control of the morphology of individual nanoparticles, as well as precise spatial control of the position, orientation, and distances between multiple nanoparticles. In addition, dynamic control of nanomaterial assembly in response to multiple stimuli, with minimal or no error, and the reversibility of the assemblies are also required. In this Account, we summarize recent progress of using DNA as a powerful programmable tool to realize the above goals. First, inspired by the discovery of genetic codes in biology, we have discovered DNA sequence combinations to control different morphologies of nanoparticles during their growth process and have shown that these effects are synergistic or competitive, depending on the sequence combination. The DNA, which guides the growth of the nanomaterial, is stable and retains its biorecognition ability. Second, by taking advantage of different reactivities of phosphorothioate and phosphodiester backbone, we have placed phosphorothioate at selective positions on different DNA nanostructures including DNA tetrahedrons. Bifunctional linkers have been used to conjugate phosphorothioate on one end and bind nanoparticles or proteins on the other end. In doing so, precise control of distances between two or more nanoparticles or proteins with nanometer resolution can be achieved. 
Furthermore, by developing facile methods to functionalize two hemispheres of Janus nanoparticles with two different DNA sequences regioselectively, we have demonstrated directional control of nanomaterial assembly, where DNA strands with specific hybridization serve as orthogonal linkers. Third, by using functional DNA that includes DNAzyme, aptamer, and aptazyme, dynamic control of assemblies of gold nanoparticles, quantum dots, carbon nanotubes, and iron oxide nanoparticles in response to one or more stimuli cooperatively have been achieved, resulting in colorimetric, fluorescent, electrochemical, and magnetic resonance signals for a wide range of targets, such as metal ions, small molecules, proteins, and intact cells. Fourth, by mimicking biology, we have employed DNAzymes as proofreading units to remove errors in nanoparticle assembly and further used DNAzyme cascade reactions to modify or repair DNA sequences involved in the assembly. Finally, by taking advantage of different affinities of biotin and desthiobiotin toward streptavidin, we have demonstrated reversible assembly of proteins on DNA origami.
Pierrel, Jérôme
2012-01-01
The importance of viruses as model organisms is well-established in molecular biology and Max Delbrück's phage group set standards in the DNA phage field. In this paper, I argue that RNA phages, discovered in the 1960s, were also instrumental in the making of molecular biology. As part of experimental systems, RNA phages stood for messenger RNA (mRNA), genes and genome. RNA was thought to mediate information transfers between DNA and proteins. Furthermore, RNA was more manageable at the bench than DNA due to the availability of specific RNases, enzymes used as chemical tools to analyse RNA. Finally, RNA phages provided scientists with a pure source of mRNA to investigate the genetic code, genes and even a genome sequence. This paper focuses on Walter Fiers' laboratory at Ghent University (Belgium) and their work on the RNA phage MS2. When setting up his Laboratory of Molecular Biology, Fiers planned a comprehensive study of the virus with a strong emphasis on the issue of structure. In his lab, RNA sequencing, now a little-known technique, evolved gradually from a means to solve the genetic code, to a tool for completing the first genome sequence. Thus, I follow the research pathway of Fiers and his 'RNA phage lab' with their evolving experimental system from 1960 to the late 1970s. This study illuminates two decisive shifts in post-war biology: the emergence of molecular biology as a discipline in the 1960s in Europe and of genomics in the 1990s.
Protein Crystal Eco R1 Endonuclease-DNA Complex
NASA Technical Reports Server (NTRS)
1998-01-01
Type II restriction enzymes, such as Eco R1 endonuclease, present a unique advantage for the study of sequence-specific recognition because they leave a record of where they have been in the form of the cleaved ends of the DNA sites where they were bound. The differential behavior of a sequence-specific protein at sites of differing base sequence is the essence of sequence specificity; the core question is how these proteins discriminate between different DNA sequences, especially when the two sequences are very similar. Principal Investigator: Dan Carter/New Century Pharmaceuticals
2017-01-01
Target search as performed by DNA-binding proteins is a complex process, in which multiple factors contribute to both thermodynamic discrimination of the target sequence from overwhelmingly abundant off-target sites and kinetic acceleration of dynamic sequence interrogation. TRF1, the protein that binds to telomeric tandem repeats, faces an intriguing variant of the search problem where target sites are clustered within short fragments of chromosomal DNA. In this study, we use extensive (>0.5 ms in total) MD simulations to study the dynamical aspects of sequence-specific binding of TRF1 at both telomeric and non-cognate DNA. For the first time, we describe the spontaneous formation of a sequence-specific native protein–DNA complex in atomistic detail, and study the mechanism by which proteins avoid off-target binding while retaining high affinity for target sites. Our calculated free energy landscapes reproduce the thermodynamics of sequence-specific binding, while statistical approaches allow for a comprehensive description of intermediate stages of complex formation. PMID:28633355
Sequences in the intergenic spacer influence RNA Pol I transcription from the human rRNA promoter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, W.M.; Sylvester, J.E.
1994-09-01
In most eucaryotic species, ribosomal genes are tandemly repeated about 100-5000 times per haploid genome. The 43 Kb human rDNA repeat consists of a 13 Kb coding region for the 18S, 5.8S, 28S ribosomal RNAs (rRNAs) and transcribed spacers separated by a 30 Kb intergenic spacer. For species such as frog, mouse and rat, sequences in the intergenic spacer other than the gene promoter have been shown to modulate transcription of the ribosomal gene. These sequences are spacer promoters, enhancers and the terminator for spacer transcription. We are addressing whether the human ribosomal gene promoter is similarly influenced. In-vitro transcription run-off assays have revealed that the 4.5 kb region (CBE), directly upstream of the gene promoter, has cis-stimulation and trans-competition properties. This suggests that the CBE fragment contains an enhancer(s) for ribosomal gene transcription. Further experiments have shown that a fragment (~1.6 kb) within the CBE fragment also has trans-competition function. Deletion subclones of this region are being tested to delineate the exact sequences responsible for these modulating activities. Previous sequence analysis and functional studies have revealed that CBE contains regions of DNA capable of adopting alternative structures such as bent DNA, Z-DNA, and triple-stranded DNA. Whether these structures are required for modulating transcription remains to be determined as does the specific DNA-protein interaction involved.
Arthur, A K; Höss, A; Fanning, E
1988-01-01
The genomic coding sequence of the large T antigen of simian virus 40 (SV40) was cloned into an Escherichia coli expression vector by joining new restriction sites, BglII and BamHI, introduced at the intron boundaries of the gene. Full-length large T antigen, as well as deletion and amino acid substitution mutants, were inducibly expressed from the lac promoter of pUC9, albeit with different efficiencies and protein stabilities. Specific interaction with SV40 origin DNA was detected for full-length T antigen and certain mutants. Deletion mutants lacking T-antigen residues 1 to 130 and 260 to 708 retained specific origin-binding activity, demonstrating that the region between residues 131 and 259 must carry the essential binding domain for DNA-binding sites I and II. A sequence between residues 302 and 320 homologous to a metal-binding "finger" motif is therefore not required for origin-specific binding. However, substitution of serine for either of two cysteine residues in this motif caused a dramatic decrease in origin DNA-binding activity. This region, as well as other regions of the full-length protein, may thus be involved in stabilizing the DNA-binding domain and altering its preference for binding to site I or site II DNA. PMID:2835505
Template-Directed Copolymerization, Random Walks along Disordered Tracks, and Fractals
NASA Astrophysics Data System (ADS)
Gaspard, Pierre
2016-12-01
In biology, template-directed copolymerization is the fundamental mechanism responsible for the synthesis of DNA, RNA, and proteins. More than 50 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of information processing in DNA replication, transcription, and translation remain poorly understood. Challenging issues are the facts that DNA or RNA sequences constitute disordered media for the motion of polymerases or ribosomes while errors occur in copying the template. Here, it is shown that these issues can be addressed and sequence heterogeneity effects can be quantitatively understood within a framework revealing universal aspects of information processing at the molecular scale. In steady growth regimes, the local velocities of polymerases or ribosomes along the template are distributed as the continuous or fractal invariant set of a so-called iterated function system, which determines the copying error probabilities. The growth may become sublinear in time with a scaling exponent that can also be deduced from the iterated function system.
Iwanowicz, L; Densmore, C; Hahn, C; McAllister, P; Odenkirk, J
2013-09-01
The Northern Snakehead Channa argus is an introduced species that now inhabits the Chesapeake Bay. During a preliminary survey for introduced pathogens possibly harbored by these fish in Virginia waters, a filterable agent was isolated from five specimens that produced cytopathic effects in BF-2 cells. Based on PCR amplification and partial sequencing of the major capsid protein (MCP), DNA polymerase (DNApol), and DNA methyltransferase (Mtase) genes, the isolates were identified as Largemouth Bass virus (LMBV). Nucleotide sequences of the MCP (492 bp) and DNApol (419 bp) genes were 100% identical to those of LMBV. The nucleotide sequence of the Mtase (206 bp) gene was 99.5% identical to that of LMBV, and the single nucleotide substitution did not lead to a predicted amino acid coding change. This is the first report of LMBV from the Northern Snakehead, and provides evidence that noncentrarchid fishes may be susceptible to this virus.
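The 99.5% figure reported for the Mtase fragment is consistent with one substitution in 206 aligned positions (205/206). A minimal Python sketch of that arithmetic follows; the sequences are hypothetical placeholders, not the actual LMBV data, and the function name is an assumption made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions at which two equal-length, pre-aligned sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Hypothetical 206-nt reference (NOT the real Mtase sequence).
reference = "ACGT" * 51 + "AC"
# Copy of the reference with a single substitution at position 100 (A -> C).
isolate = reference[:100] + "C" + reference[101:]

print(len(reference))                                   # 206
print(round(percent_identity(reference, isolate), 1))   # 99.5
```

The same function returns 100.0 for identical fragments, as reported for the MCP and DNApol genes.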
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helfenbein, Kevin G.; Fourcade, H. Matthew; Vanjani, Rohit G.
2004-05-01
We report the first complete mitochondrial (mt) DNA sequence from a member of the phylum Chaetognatha (arrow worms). The Paraspadella gotoi mtDNA is highly unusual, missing 23 of the genes commonly found in animal mtDNAs, including atp6, which has otherwise been found universally to be present. Its 14 genes are unusually arranged into two groups, one on each strand. One group is punctuated by numerous non-coding intergenic nucleotides, while the other group is tightly packed, having no non-coding nucleotides, leading to speculation that there are two transcription units with differing modes of expression. The phylogenetic position of the Chaetognatha within the Metazoa has long been uncertain, with conflicting or equivocal results from various morphological analyses and rRNA sequence comparisons. Comparisons here of amino acid sequences from mitochondrially encoded proteins give a single most parsimonious tree that supports a position of Chaetognatha as sister to the protostomes studied here. From this, one can more clearly interpret the patterns of evolution of various developmental features, especially regarding the embryological fate of the blastopore.
Orpinomyces cellulase celf protein and coding sequences
Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.
2000-09-05
A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide and a cellulose binding domain (CBD), followed by an extremely Asn-rich linker region which separates the CBD from the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of celF has features in common with genes from aerobic filamentous fungi.
Maeda, Yasuhiro; Yamaguchi, Terufumi; Ueda, Satomi; Matsuo, Koki; Morita, Yasuyoshi; Naiki, Yoshito; Miyazato, Hajime; Shimada, Takahiro; Miyatake, Jun-Ichi; Matsuda, Mitsuhiro; Kanamaru, Akihisa
2003-07-01
In this study, we observed the expression of the GSTT-1 gene in patients with myelodysplastic syndrome (MDS) at the messenger RNA level. Reverse transcription-polymerase chain reaction (RT-PCR) for GSTT-1 was performed with a pair of primers complementary to the 5' coding section and the 3' coding section of the GSTT-1 cDNA for amplifying the 623-bp band. Among 20 patients with MDS, 8 patients showed the expected 623-bp band on RT-PCR, and 12 patients showed a 500-bp band on RT-PCR, indicating that a 123-bp sequence was deleted in a mutant form of the GSTT-1 gene. Furthermore, a BLAST DNA search showed that the deletion of the 123-bp sequence creates a sequence that is 63% homologous to human FKBP-rapamycin associated protein (FRAP); this protein has been termed a mammalian target of rapamycin (mTOR). We transfected the wild-type and the mutant GSTT-1 genes, each in an expression vector, into two cell lines (K562 and HL-60). Stable transformants for the wild-type and the mutant GSTT-1 genes were obtained by G418 selection. Interestingly, rapamycin induced significant growth inhibition, indicative of apoptosis, in the stable transformants for mutant GSTT-1 but not in those for wild-type GSTT-1. These results suggest that rapamycin could be included in the therapeutic modality for patients with MDS who have the mTOR sequences in the GSTT-1 gene.
Genes Involved in Anaerobic Metabolism of Phenol in the Bacterium Thauera aromatica
Breinig, Sabine; Schiltz, Emile; Fuchs, Georg
2000-01-01
Genes involved in the anaerobic metabolism of phenol in the denitrifying bacterium Thauera aromatica have been studied. The first two committed steps in this metabolism appear to be phosphorylation of phenol to phenylphosphate by an unknown phosphoryl donor (“phenylphosphate synthase”) and subsequent carboxylation of phenylphosphate to 4-hydroxybenzoate under release of phosphate (“phenylphosphate carboxylase”). Both enzyme activities are strictly phenol induced. Two-dimensional gel electrophoresis allowed identification of several phenol-induced proteins. Based on N-terminal and internal amino acid sequences of such proteins, degenerate oligonucleotides were designed to identify the corresponding genes. A chromosomal DNA segment of about 14 kbp was sequenced which contained 10 genes transcribed in the same direction. These are organized in two adjacent gene clusters and include the genes coding for five identified phenol-induced proteins. Comparison with sequences in the databases revealed the following similarities: the gene products of two open reading frames (ORFs) are each similar to either the central part or the N-terminal part of phosphoenolpyruvate synthases. We propose that these ORFs are components of the phenylphosphate synthase system. Three ORFs showed similarity to the ubiD gene product, 3-octaprenyl-4-hydroxybenzoate carboxy-lyase; UbiD catalyzes the decarboxylation of a 4-hydroxybenzoate analogue in ubiquinone biosynthesis. Another ORF was similar to the ubiX gene product, an isoenzyme of UbiD. We propose that (some of) these four proteins are involved in the carboxylation of phenylphosphate. A 700-bp PCR product derived from one of these ORFs cross-hybridized with DNA from different Thauera and Azoarcus strains, even from those which have not been reported to grow with phenol. One ORF showed similarity to the mutT gene product, and three ORFs showed no strong similarities to sequences in the databases.
Upstream of the first gene cluster, an ORF which is transcribed in the opposite direction codes for a protein highly similar to the DmpR regulatory protein of Pseudomonas putida. DmpR controls transcription of the genes of aerobic phenol metabolism, suggesting a similar regulation of anaerobic phenol metabolism by the putative regulator. PMID:11004186
Yao, Q; Fischer, K P; Tyrrell, D L; Gutfreund, K S
2015-04-01
Programmed death ligand-1 (PD-L1) plays an important role in the attenuation of adaptive immune responses in higher vertebrates. Here, we describe the identification of the Pekin duck PD-L1 orthologue (duPD-L1) and its gene structure. The duPD-L1 cDNA encodes a 311-amino acid protein that has an amino acid identity of 78% and 42% with chicken and human PD-L1, respectively. Mapping of the duPD-L1 cDNA with duck genomic sequences revealed an exonic structure of its coding sequence similar to those of other vertebrates but lacked a noncoding exon 1. Homology modelling of the duPD-L1 extracellular domain was compatible with the tandem IgV-like and IgC-like IgSF domain structure of human PD-L1 (PDB ID: 3BIS). Residues known to be important for receptor binding of human PD-L1 were mostly conserved in duPD-L1 within the N-terminus and the G sheet, and partially conserved within the F sheet but not within sheets C and C'. DuPD-L1 mRNA was constitutively expressed in all tissues examined with highest expression levels in lung and spleen and very low levels of expression in muscle, kidney and brain. Mitogen stimulation of duck peripheral blood mononuclear cells transiently increased duPD-L1 mRNA expression. Our observations demonstrate evolutionary conservation of the exonic structure of its coding sequence, the extracellular domain structure and residues implicated in receptor binding, but the role of the longer cytoplasmic tail in avian PD-L1 proteins remains to be determined. © 2014 John Wiley & Sons Ltd.
Ramalho-Ortigão, J M; Temporal, P; de Oliveira, S M; Barbosa, A F; Vilela, M L; Rangel, E F; Brazil, R P; Traub-Cseko, Y M
2001-01-01
Molecular studies of insect disease vectors are of paramount importance for understanding parasite-vector relationships. Advances in this area have led to important findings regarding changes in vectors' physiology upon blood feeding and parasite infection. Mechanisms for interfering with the vectorial capacity of insects responsible for the transmission of diseases such as malaria, Chagas disease and dengue fever are being devised with the ultimate goal of developing transgenic insects. A primary necessity for this goal is information on gene expression and control in the target insect. Our group is investigating molecular aspects of the interaction between Leishmania parasites and Lutzomyia sand flies. As an initial step in our studies we have used random sequencing of cDNA clones from two expression libraries made from head/thorax and abdomen of sugar fed L. longipalpis for the identification of expressed sequence tags (EST). We applied differential display reverse transcriptase-PCR and randomly amplified polymorphic DNA-PCR to characterize differentially expressed mRNA from sugar and blood fed insects, and, in one case, from a L. (V.) braziliensis-infected L. longipalpis. We identified 37 cDNAs that have shown homology to known sequences from GenBank. Of these, 32 cDNAs code for constitutive proteins such as zinc finger protein, glutamine synthetase, G binding protein and ubiquitin conjugating enzyme. Three are putative differentially expressed cDNAs from blood fed and Leishmania-infected midgut: a chitinase, a V-ATPase and a MAP kinase. Finally, two sequences are homologous to Drosophila melanogaster gene products recently discovered through the Drosophila genome initiative.
Complete mitogenome sequencing and phylogenetic analysis of PaLi yak (Bos grunniens).
Bao, Pengjia; Guo, Xian; Pei, Jie; Liang, Chunnian; Ding, Xuezhi; Min, Chu; Wang, Hongbo; Wu, Xiaoyun; Yan, Ping
2016-11-01
PaLi yak is a very important local breed in China; as a year-round grazing animal, it plays a very important role for the local economy and native herdsmen. The complete mitochondrial DNA of PaLi yak was sequenced in this study; its total length is 16,324 bp, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region (D-loop region). The gene order and composition are similar to those of most other vertebrates. The base composition is 33.72% A, 25.80% C, 13.21% G and 27.27% T; A + T (60.99%) was higher than G + C (39.01%). The phylogenetic relationships were analyzed using the complete mitogenome sequence; the results showed that the genetic relationship between yak and cattle is distinct. This information provides useful data for further study on the protection of genetic resources and the taxonomy of Bovinae.
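The A + T and G + C totals quoted for the PaLi yak mitogenome follow directly from the four per-base percentages. The short Python sketch below recomputes them and shows how such percentages could be derived from a raw sequence; the helper name `base_composition` and the toy input are assumptions for illustration, not part of the study.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Percentages as reported in the abstract.
reported = {"A": 33.72, "C": 25.80, "G": 13.21, "T": 27.27}
at_content = round(reported["A"] + reported["T"], 2)
gc_content = round(reported["G"] + reported["C"], 2)
print(at_content, gc_content)  # 60.99 39.01

# Toy sequence (hypothetical) to show the helper on raw data.
print(base_composition("AACGTT"))
```

The same sums reproduce the T + A values reported for the two Sepioteuthis lessoniana lineages in this listing (36.7 + 34.5 = 71.2 and 37.5 + 35.9 = 73.4).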
Shen, Kang-Ning; Yen, Ta-Chi; Chen, Ching-Hung; Ye, Jeng-Jia; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the cryptic "lineage B" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) has been sequenced by a next-generation sequencing method. The assembled mitogenome consists of 16,694 bp and includes 13 protein-coding genes, 25 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of "lineage B" S. lessoniana is 36.7% for A, 18.9% for C, 34.5% for T and 9.8% for G, and it shows 90% identity to "lineage C" S. lessoniana. It also exhibits high T + A content (71.2%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage B" S. lessoniana provides essential and important DNA molecular data for further phylogeography and evolutionary analysis of the big-fin reef squid species complex.
Hsiao, Chung-Der; Shen, Kang-Ning; Ching, Tzu-Yun; Wang, Ya-Hsien; Ye, Jeng-Jia; Tsai, Shiou-Yi; Wu, Shan-Chun; Chen, Ching-Hung; Wang, Chia-Hui
2016-07-01
In this study, the complete mitogenome sequence of the cryptic "lineage A" big-fin reef squid, Sepioteuthis lessoniana (Cephalopoda: Loliginidae) has been sequenced by a next-generation sequencing method. The assembled mitogenome consists of 16,605 bp, which includes 13 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. The overall base composition of "lineage A" S. lessoniana is 37.5% for A, 17.4% for C, 9.1% for G, and 35.9% for T, and it shows 87% identity to "lineage C" S. lessoniana. It is also notable for its high T + A content (73.4%) and two non-coding regions with TA tandem repeats. The complete mitogenome of the cryptic "lineage A" S. lessoniana provides essential and important DNA molecular data for further phylogeography and evolutionary analysis of the big-fin reef squid species complex.
Novel variants of the 5S rRNA genes in Eruca sativa.
Singh, K; Bhatia, S; Lakshmikumaran, M
1994-02-01
The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa.(ABSTRACT TRUNCATED AT 250 WORDS)
Frouco, Gonçalo; Freitas, Ferdinando B; Coelho, João; Leitão, Alexandre; Martins, Carlos; Ferreira, Fernando
2017-06-15
African swine fever virus (ASFV) codes for a putative histone-like protein (pA104R) with extensive sequence homology to bacterial proteins that are implicated in genome replication and packaging. Functional characterization of purified recombinant pA104R revealed that it binds to single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) over a wide range of temperatures, pH values, and salt concentrations and in an ATP-independent manner, with an estimated binding site size of about 14 to 16 nucleotides. Using site-directed mutagenesis, the arginine located in pA104R's DNA-binding domain, at position 69, was found to be relevant for efficient DNA-binding activity. Together, pA104R and ASFV topoisomerase II (pP1192R) display DNA-supercoiling activity, although none of the proteins by themselves do, indicating that the two cooperate in this process. In ASFV-infected cells, A104R transcripts were detected from 2 h postinfection (hpi) onward, reaching a maximum concentration around 16 hpi. pA104R was detected from 12 hpi onward, localizing with viral DNA replication sites and being found exclusively in the Triton-insoluble fraction. Small interfering RNA (siRNA) knockdown experiments revealed that pA104R plays a critical role in viral DNA replication and gene expression, with transfected cells showing lower viral progeny numbers (up to a reduction of 82.0%), lower copy numbers of viral genomes (-78.3%), and reduced transcription of a late viral gene (-47.6%). Taken together, our results strongly suggest that pA104R participates in the modulation of viral DNA topology, probably being involved in viral DNA replication, transcription, and packaging, emphasizing that ASFV mutants lacking the A104R gene could be used as a strategy to develop a vaccine against ASFV. 
IMPORTANCE Recently reintroduced in Europe, African swine fever virus (ASFV) causes a fatal disease in domestic pigs, causing high economic losses in affected countries, as no vaccine or treatment is currently available. Remarkably, ASFV is the only known mammalian virus that putatively codes for a histone-like protein (pA104R) that shares extensive sequence homology with bacterial histone-like proteins. In this study, we characterized the DNA-binding properties of pA104R, analyzed the functional importance of two conserved residues, and showed that pA104R and ASFV topoisomerase II cooperate and display DNA-supercoiling activity. Moreover, pA104R is expressed during the late phase of infection and accumulates in viral DNA replication sites, and its downregulation revealed that pA104R is required for viral DNA replication and transcription. These results suggest that pA104R participates in the modulation of viral DNA topology and genome packaging, indicating that A104R deletion mutants may be a good strategy for vaccine development against ASFV. Copyright © 2017 American Society for Microbiology.
Zhang, Yu; Yao, Youlin; Jiang, Siyuan; Lu, Yilu; Liu, Yunqiang; Tao, Dachang; Zhang, Sizhong; Ma, Yongxin
2015-04-01
To identify protein-protein interaction partners of PER1 (period circadian protein homolog 1), a key component of the molecular oscillation system of the circadian rhythm, in tumors, a bacterial two-hybrid system was used. A human cervical carcinoma HeLa cell library was adopted. The recombinant bait plasmid pBT-PER1 and the pTRG cDNA plasmid library were cotransformed into the two-hybrid system reporter strain, which was cultured in a special selective medium. Target clones were screened. After the positive clones were isolated, the target clones were sequenced and analyzed. Fourteen protein-coding genes were identified, 4 of which were found to contain whole coding regions of genes, including optic atrophy 3 protein (OPA3), associated with mitochondrial dynamics, and Homo sapiens cutA divalent cation tolerance homolog of E. coli (CUTA), associated with copper metabolism. There were also proteins related to cellular events, proteins involved in biochemical reactions, and signal transduction-related proteins. Identification of potential interacting proteins of PER1 in tumors may provide new insights into the functions of the circadian clock protein PER1 during tumorigenesis.
Walker, M D; Park, C W; Rosen, A; Aronheim, A
1990-01-01
Cell specific expression of the insulin gene is achieved through transcriptional mechanisms operating on multiple DNA sequence elements located in the 5' flanking region of the gene. Of particular importance in the rat insulin I gene are two closely similar 9 bp sequences (IEB1 and IEB2): mutation of either of these leads to 5-10 fold reduction in transcriptional activity. We have screened an expression cDNA library derived from mouse pancreatic endocrine beta cells with a radioactive DNA probe containing multiple copies of the IEB1 sequence. A cDNA clone (A1) isolated by this procedure encodes a protein which shows efficient binding to the IEB1 probe, but much weaker binding to either an unrelated DNA probe or to a probe bearing a single base pair insertion within the recognition sequence. DNA sequence analysis indicates a protein belonging to the helix-loop-helix family of DNA-binding proteins. The ability of the protein encoded by clone A1 to recognize a number of wild type and mutant DNA sequences correlates closely with the ability of each sequence element to support transcription in vivo in the context of the insulin 5' flanking DNA. We conclude that the isolated cDNA may encode a transcription factor that participates in control of insulin gene expression. PMID:2181401
Gubser, Caroline; Smith, Geoffrey L
2002-04-01
Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.
We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA, unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) species L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggests that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.
Is “Junk” DNA Mostly Intron DNA?
Wong, Gane Ka-Shu; Passey, Douglas A.; Huang, Ying-zong; Yang, Zhiyong; Yu, Jun
2000-01-01
Among higher eukaryotes, very little of the genome codes for protein. What is in the rest of the genome, or the “junk” DNA, that, in Homo sapiens, is estimated to be almost 97% of the genome? Is it possible that much of this “junk” is intron DNA? This is not a question that can be answered just by looking at the published data, even from the finished genomes. One cannot assume that there are no genes in a sequenced region, just because no genes were annotated. We introduce another approach to this problem, based on an analysis of the cDNA-to-genomic alignments, in all of the complete or nearly-complete genomes from the multicellular organisms. Our conclusion is that, in animals but not in plants, most of the “junk” is intron DNA. PMID:11076852
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
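The idea of combining pairwise alignments against a common reference into a one-to-many gapped alignment can be sketched as follows. This is an illustrative reimplementation of the general technique under simple assumptions (every pairwise alignment shares the same ungapped reference, and `-` marks a gap); it is not CombAlign's actual code or interface, and the name `merge_pairwise` is hypothetical.

```python
def merge_pairwise(pairs):
    """Merge (reference_aligned, other_aligned) pairs into a one-to-many alignment.

    Returns a list of equal-length rows: the gapped reference first, then one
    row per pairwise alignment, in input order.
    """
    ref = pairs[0][0].replace("-", "")
    n = len(ref)
    inserts, matched = [], []
    for ref_aln, other_aln in pairs:
        if len(ref_aln) != len(other_aln) or ref_aln.replace("-", "") != ref:
            raise ValueError("all pairs must align against the same reference")
        ins = [""] * (n + 1)   # residues inserted before reference position i
        mat = [""] * n         # character aligned to reference position i
        i = 0
        for r, o in zip(ref_aln, other_aln):
            if r == "-":
                ins[i] += o    # insertion relative to the reference
            else:
                mat[i] = o
                i += 1
        inserts.append(ins)
        matched.append(mat)

    # Widest insertion seen at each slot decides how many gap columns to open.
    width = [max(len(ins[i]) for ins in inserts) for i in range(n + 1)]

    rows = [""] * (len(pairs) + 1)
    for i in range(n + 1):
        rows[0] += "-" * width[i]
        for k, ins in enumerate(inserts):
            rows[k + 1] += ins[i].ljust(width[i], "-")
        if i < n:
            rows[0] += ref[i]
            for k, mat in enumerate(matched):
                rows[k + 1] += mat[i]
    return rows


# Toy example: one sequence has an insertion, the other a deletion.
merged = merge_pairwise([("AC-GT", "ACTGT"), ("ACGT", "A-GT")])
print("\n".join(merged))
```

In this sketch, insertions from different sequences at the same reference slot are left-justified within shared gap columns rather than aligned to each other, which keeps the residue-residue correspondence with the reference intact.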
Hennebert, Elise; Maldonado, Barbara; Ladurner, Peter; Flammang, Patrick; Santos, Romana
2015-01-01
Adhesive secretions occur in both aquatic and terrestrial animals, in which they perform diverse functions. Biological adhesives can therefore be remarkably complex and involve a large range of components with different functions and interactions. However, being mainly protein based, biological adhesives can be characterized by classical molecular methods. This review compiles experimental strategies that were successfully used to identify, characterize and obtain the full-length sequence of adhesive proteins from nine biological models: echinoderms, barnacles, tubeworms, mussels, sticklebacks, slugs, velvet worms, spiders and ticks. A brief description and practical examples are given for a variety of tools used to study adhesive molecules at different levels from genes to secreted proteins. In most studies, proteins, extracted from secreted materials or from adhesive organs, are analysed for the presence of post-translational modifications and submitted to peptide sequencing. The peptide sequences are then used directly for a BLAST search in genomic or transcriptomic databases, or to design degenerate primers to perform RT-PCR, both allowing the recovery of the sequence of the cDNA coding for the investigated protein. These sequences can then be used for functional validation and recombinant production. In recent years, the dual proteomic and transcriptomic approach has emerged as the best way leading to the identification of novel adhesive proteins and retrieval of their complete sequences. PMID:25657842
Foldability of a Natural De Novo Evolved Protein.
Bungard, Dixie; Copple, Jacob S; Yan, Jing; Chhun, Jimmy J; Kumirov, Vlad K; Foy, Scott G; Masel, Joanna; Wysocki, Vicki H; Cordes, Matthew H J
2017-11-07
The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo. Copyright © 2017 Elsevier Ltd. All rights reserved.
Genome-wide uniformity of human ‘open’ pre-initiation complexes
Lai, William K.M.; Pugh, B. Franklin
2017-01-01
Transcription of protein-coding and noncoding DNA occurs pervasively throughout the mammalian genome. Their sites of initiation are generally inferred from transcript 5′ ends and are thought to be either locally dispersed or focused. How these two modes of initiation relate is unclear. Here, we apply permanganate treatment and chromatin immunoprecipitation (PIP-seq) of initiation factors to identify the precise location of melted DNA separately associated with the preinitiation complex (PIC) and the adjacent paused complex (PC). This approach revealed the two known modes of transcription initiation. However, in contrast to prevailing views, they co-occurred within the same promoter region: initiation originating from a focused PIC, and broad nucleosome-linked initiation. PIP-seq allowed transcriptional orientation of Pol II to be determined, which may be useful near promoters where sufficient sense/anti-sense transcript mapping information is lacking. PIP-seq detected divergently oriented Pol II at both coding and noncoding promoters, as well as at enhancers. Their occupancy levels were not necessarily coupled in the two orientations. DNA sequence and shape analysis of initiation complex sites suggest that both sequence and shape contribute to specificity, but in a context-restricted manner. That is, initiation sites have the locally “best” initiator (INR) sequence and/or shape. These findings reveal a common core to pervasive Pol II initiation throughout the human genome. PMID:27927716
Structure and chromosomal localization of the human PD-1 gene (PDCD1)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shinohara, T.; Ishida, Y.; Kawaichi, M.
1994-10-01
A cDNA encoding mouse PD-1, a member of the immunoglobulin superfamily, was previously isolated from apoptosis-induced cells by subtractive hybridization. To determine the structure and chromosomal location of the human PD-1 gene, we screened a human T cell cDNA library with a mouse PD-1 probe and isolated a cDNA coding for the human PD-1 protein. The deduced amino acid sequence of human PD-1 was 60% identical to the mouse counterpart, and a putative tyrosine kinase-association motif was well conserved. The human PD-1 gene was mapped to 2q37.3 by chromosomal in situ hybridization. 7 refs., 3 figs.
Living Organisms Author Their Read-Write Genomes in Evolution.
Shapiro, James A
2017-12-06
Evolutionary variations generating phenotypic adaptations and novel taxa resulted from complex cellular activities altering genome content and expression: (i) Symbiogenetic cell mergers producing the mitochondrion-bearing ancestor of eukaryotes and chloroplast-bearing ancestors of photosynthetic eukaryotes; (ii) interspecific hybridizations and genome doublings generating new species and adaptive radiations of higher plants and animals; and, (iii) interspecific horizontal DNA transfer encoding virtually all of the cellular functions between organisms and their viruses in all domains of life. Consequently, assuming that evolutionary processes occur in isolated genomes of individual species has become an unrealistic abstraction. Adaptive variations also involved natural genetic engineering of mobile DNA elements to rewire regulatory networks. In the most highly evolved organisms, biological complexity scales with "non-coding" DNA content more closely than with protein-coding capacity. Coincidentally, we have learned how so-called "non-coding" RNAs that are rich in repetitive mobile DNA sequences are key regulators of complex phenotypes. Both biotic and abiotic ecological challenges serve as triggers for episodes of elevated genome change. The intersections of cell activities, biosphere interactions, horizontal DNA transfers, and non-random Read-Write genome modifications by natural genetic engineering provide a rich molecular and biological foundation for understanding how ecological disruptions can stimulate productive, often abrupt, evolutionary transformations.
Yuliana, Tri; Nakajima, Nobuyoshi; Yamamura, Shigeki; Tomita, Masaru; Suzuki, Haruo; Amachi, Seigo
2017-01-01
Roseovarius sp. A-2 is a heterotrophic iodide (I⁻)-oxidizing bacterium isolated from iodide-rich natural gas brine water in Chiba, Japan. This strain oxidizes iodide to molecular iodine (I₂) by means of an extracellular multicopper oxidase. Here we report the draft genome sequence of strain A-2. The draft genome contained 46 tRNA genes, 1 copy of a 16S-23S-5S rRNA operon, and 4,514 protein coding DNA sequences, of which 1,207 (27%) were hypothetical proteins. The genome contained a gene encoding IoxA, a multicopper oxidase previously found to catalyze the oxidation of iodide in Iodidimonas sp. Q-1. This draft genome provides detailed insights into the metabolism and potential application of Roseovarius sp. A-2.
Multiple tag labeling method for DNA sequencing
Mathies, R.A.; Huang, X.C.; Quesada, M.A.
1995-07-25
A DNA sequencing method is described which uses single lane or channel electrophoresis. Sequencing fragments are separated in the lane and detected using a laser-excited, confocal fluorescence scanner. Each set of DNA sequencing fragments is separated in the same lane and then distinguished using a binary coding scheme employing only two different fluorescent labels. Also described is a method of using radioisotope labels. 5 figs.
NASA Astrophysics Data System (ADS)
Liu, Jiao; Li, Xianchao; Tang, Xuexi; Zhou, Bin
2016-03-01
Members of the DnaJ family are proteins that play a pivotal role in various cellular processes, such as protein folding, protein transport and cellular responses to stress. In the present study, we identified and characterized the full-length DnaJ cDNA sequence from expressed sequence tags of Pyropia yezoensis (PyDnaJ) via rapid identification of cDNA ends. This cDNA encoded a protein of 429 amino acids, which shared high sequence similarity with other identified DnaJ proteins, such as a heat shock protein 40/DnaJ from Pyropia haitanensis. The relative mRNA expression level of PyDnaJ was investigated using real-time PCR to determine its specific expression during the algal life cycle and during desiccation. The relative mRNA expression level in sporophytes was higher than that in gametophytes and significantly increased during the whole desiccation process. These results indicate that PyDnaJ is an authentic member of the DnaJ family in plants and red algae and might play a pivotal role in mitigating damage to P. yezoensis during desiccation.
Horizontal gene transfer of chromosomal Type II toxin-antitoxin systems of Escherichia coli.
Ramisetty, Bhaskar Chandra Mohan; Santhosh, Ramachandran Sarojini
2016-02-01
Type II toxin-antitoxin systems (TAs) are small autoregulated bicistronic operons that encode a toxin protein with the potential to inhibit metabolic processes and an antitoxin protein to neutralize the toxin. Most of the bacterial genomes encode multiple TAs. However, the diversity and accumulation of TAs on bacterial genomes and its physiological implications are highly debated. Here we provide evidence that Escherichia coli chromosomal TAs (encoding RNase toxins) are 'acquired' DNA likely originated from heterologous DNA and are the smallest known autoregulated operons with the potential for horizontal propagation. Sequence analyses revealed that integration of TAs into the bacterial genome is unique and contributes to variations in the coding and/or regulatory regions of flanking host genome sequences. Plasmids and genomes encoding identical TAs of natural isolates are mutually exclusive. Chromosomal TAs might play significant roles in the evolution and ecology of bacteria by contributing to host genome variation and by moderation of plasmid maintenance. © FEMS 2015. All rights reserved.
NASA Astrophysics Data System (ADS)
Hamed, Mazen Y.
2018-05-01
Molecular dynamics and MM_GBSA energy calculations on various zinc finger proteins containing three and four fingers bound to their target DNA gave insights into the role of each finger in the DNA binding process as part of the protein structure. The wild type Zif268 (PDB code: 1AAY) gave a ΔG value of −76.1 (14) kcal/mol. Zinc fingers ZF1, ZF2 and ZF3 were mutated in one experiment, and in another experiment one finger was cut and the rest of the protein was studied for binding. The ΔΔG value for the zinc finger protein with both ZF1 and ZF2 mutated was +80 kcal/mol, while with only ZF1 mutated the ΔΔG value was +52 kcal/mol (relative to the wild type). Cutting ZF3 and studying the protein consisting only of ZF1 linked to ZF2 gave a ΔΔG value of +68 kcal/mol. Upon cutting ZF1, the resulting ZF2 linked to ZF3 protein gave a ΔΔG value of +41 kcal/mol. The above results shed light on the importance of each finger in the binding process, especially the role of ZF1 as the anchoring finger, followed in importance by ZF2 and ZF3. The energy difference between the binding of the wild type protein Zif268 (1AAY) and that for individual finger binding to DNA, according to the formula $\Delta\Delta G_{\text{linkers, other structural factors}} = \Delta G_{\text{Zif268}} - \Delta G_{\text{F1+F2+F3}}$, gave a value of −44.5 kcal/mol. This stabilization can be attributed to the contribution of linkers and other structural factors in the intact protein in the DNA binding process. DNA binding energies of variant proteins of the wild type Zif268 which differ in their ZF1 amino acid sequence gave evidence of a good relationship between binding energy and recognition and specificity; this finding confirms the reported vital role of ZF1 in the ZF protein scanning and anchoring to the target DNA sequence. The role of hydrogen bonds in both specific and nonspecific amino acid-DNA contacts is discussed in relation to mutations.
The binding energies of variant Zinc Finger proteins confirmed the role of ZF1 in the recognition, specificity and anchoring of the zinc finger protein to DNA.
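The ΔΔG figures quoted in this abstract can be converted back into absolute binding energies with simple arithmetic. The sketch below assumes the convention ΔΔG = ΔG(variant) − ΔG(wild type), which fits the phrase "relative to the wild type" but is not spelled out in the abstract; the dictionary keys and function names are illustrative only.

```python
# Values quoted in the abstract, in kcal/mol.
DG_WILD_TYPE = -76.1  # Zif268 (PDB 1AAY)
DDG_LINKERS = -44.5   # linkers and other structural factors
DDG = {
    "ZF1+ZF2 mutated": 80.0,
    "ZF1 mutated": 52.0,
    "ZF3 cut (ZF1-ZF2)": 68.0,
    "ZF1 cut (ZF2-ZF3)": 41.0,
}


def variant_dg(ddg, dg_wt=DG_WILD_TYPE):
    """Absolute dG of a variant, ASSUMING ddG = dG(variant) - dG(wild type)."""
    return dg_wt + ddg


def summed_finger_dg(dg_wt=DG_WILD_TYPE, ddg_linkers=DDG_LINKERS):
    """dG(F1+F2+F3), rearranged from ddG_linkers = dG(Zif268) - dG(F1+F2+F3)."""
    return dg_wt - ddg_linkers


# Implied absolute binding energy of each variant, rounded to one decimal.
variant_energies = {name: round(variant_dg(v), 1) for name, v in DDG.items()}
```

Under that assumed convention, the implied sum of the individual finger energies is weaker than the intact-protein value by exactly the quoted 44.5 kcal/mol, which is the stabilization the abstract attributes to linkers and other structural factors.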
NASA Astrophysics Data System (ADS)
Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf
Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.
Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.
2013-01-01
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Link, Gerhard
1984-01-01
A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ~28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic '−35' promoter elements. In the absence of this region, a region ~13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the 'TATA' box of genes transcribed by RNA polymerase II (or B). The region between the 'TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic '−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a 'promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. PMID:16453540
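The three sequence elements named in this abstract (TTGACA, TATATAA, TATACT) can be located in a 5'-flanking sequence with a simple motif scan. The sketch below is illustrative only: `TOY_UPSTREAM` is a made-up 40-nt sequence, not the mustard plastid promoter, and the element labels and function name are assumptions introduced here.

```python
import re

# Sequence elements named in the abstract; the labels are illustrative.
ELEMENTS = {
    "-35-like": "TTGACA",
    "TATA-like": "TATATAA",
    "-10-like": "TATACT",
}

# HYPOTHETICAL upstream sequence; its last base is position -1
# (the base just before the transcription start site).
TOY_UPSTREAM = "GGCATTGACAGCCGTCCGTATATAAGCGTATACTCGGTCA"


def find_elements(upstream, elements=ELEMENTS):
    """Return {label: [start positions relative to the start site]}."""
    seq = upstream.upper()
    n = len(seq)
    hits = {}
    for label, motif in elements.items():
        # A lookahead pattern also reports overlapping occurrences.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[label] = [m.start() - n for m in pattern.finditer(seq)]
    return hits


toy_hits = find_elements(TOY_UPSTREAM)
```

Positions are reported as negative offsets from the start site, so the toy TTGACA hit at −36 covers −36 to −31, roughly the ~28-35 nt window described above.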
Moszczynska, Anna; Burghardt, Kyle J.; Yu, Dongyue
2017-01-01
Short interspersed elements (SINEs) are typically silenced by DNA hypermethylation in somatic cells, but can retrotranspose in proliferating cells during adult neurogenesis. Hypomethylation caused by disease pathology or genotoxic stress leads to genomic instability of SINEs. The goal of the present investigation was to determine whether neurotoxic doses of binge or chronic methamphetamine (METH) trigger retrotransposition of the identifier (ID) element, a member of the rat SINE family, in the dentate gyrus genomic DNA. Adult male Sprague-Dawley rats were treated with saline or high doses of binge or chronic METH and sacrificed at three different time points thereafter. DNA methylation analysis, immunohistochemistry and next-generation sequencing (NGS) were performed on the dorsal dentate gyrus samples. Binge METH triggered hypomethylation, while chronic METH triggered hypermethylation of the CpG-2 site. Both METH regimens were associated with increased intensities in poly(A)-binding protein 1 (PABP1, a SINE regulatory protein)-like immunohistochemical staining in the dentate gyrus. The amplification of several ID element sequences was significantly higher in the chronic METH group than in the control group a week after METH, and they mapped to genes coding for proteins regulating cell growth and proliferation, transcription, protein function as well as for a variety of transporters. The results suggest that chronic METH induces ID element retrotransposition in the dorsal dentate gyrus and may affect hippocampal neurogenesis. PMID:28272323
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schriner, J.E.; Yi, W.; Hofmann, S.L.
Palmitoyl-protein thioesterase (PPT) is a small glycoprotein that removes palmitate groups from cysteine residues in lipid-modified proteins. We recently reported mutations in PPT in patients with infantile neuronal ceroid lipofuscinosis (INCL), a severe neurodegenerative disorder. INCL is characterized by the accumulation of proteolipid storage material in brain and other tissues, suggesting that the disease is a consequence of abnormal catabolism of acylated proteins. In the current paper, we report the sequence of the human PPT cDNA and the structure of the human PPT gene. The cDNA predicts a protein of 306 amino acids that contains a 25-amino-acid signal peptide, three N-linked glycosylation sites, and consensus motifs characteristic of thioesterases. Northern analysis of a human tissue blot revealed ubiquitous expression of a single 2.5-kb mRNA, with highest expression in lung, brain, and heart. The human PPT gene spans 25 kb and is composed of seven coding exons and a large eighth exon, containing the entire 3′-untranslated region of 1388 bp. An Alu repeat and promoter elements corresponding to putative binding sites for several general transcription factors were identified in the 1060 nucleotides upstream of the transcription start site. The human PPT cDNA sequence and gene structure will provide the means for the identification of further causative mutations in INCL and facilitate genetic screening in selected high-risk populations. 31 refs., 5 figs., 1 tab.
Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph
2017-01-01
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, O.; Masters, C.; Lewis, M.B.
1994-09-01
In an 8-year-old girl and her father, both of whom have severe type III OI, we have previously used RNA/RNA hybrid analysis to demonstrate a mismatch in the region of α1(I) mRNA coding for aa 558-861. We used SSCP to further localize the abnormality to a subregion coding for aa 579-679. This region was subcloned and sequenced. Each patient's cDNA has a deletion of the sequences coding for the last residue of exon 34, and all of exons 35 and 36 (aa 604-639), followed by an insertion of 156 nt from the 3′-end of intron 36. PCR amplification of leukocyte DNA from the patients and the clinically normal paternal grandmother yielded two fragments: a 1007 bp fragment predicted from normal genomic sequences and a 445 bp fragment. Subcloning and sequencing of the shorter genomic PCR product confirmed the presence of a 565 bp genomic deletion from the end of exon 34 to the middle of intron 36. The abnormal protein is apparently synthesized and incorporated into helix. The inserted nucleotides are in frame with the collagenous sequence and contain no stop codons. They encode a 52 aa non-collagenous region. The fibroblast procollagen of the patients has both normal and electrophoretically delayed proα(I) bands. The electrophoretically delayed procollagen is very sensitive to pepsin or trypsin digestion, as predicted by its non-collagenous sequence, and cannot be visualized as collagen. This unique OI collagen mutation is an excellent candidate for molecular targeting to "turn off" a dominant mutant allele.
The Genome of the Western Clawed Frog Xenopus tropicalis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.
2009-10-01
The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with human and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.
Nodeomics: Pathogen Detection in Vertebrate Lymph Nodes Using Meta-Transcriptomics
Wittekindt, Nicola E.; Padhi, Abinash; Schuster, Stephan C.; Qi, Ji; Zhao, Fangqing; Tomsho, Lynn P.; Kasson, Lindsay R.; Packard, Michael; Cross, Paul C.; Poss, Mary
2010-01-01
The ongoing emergence of human infections originating from wildlife highlights the need for better knowledge of the microbial community in wildlife species where traditional diagnostic approaches are limited. Here we evaluate the microbial biota in healthy mule deer (Odocoileus hemionus) by analyses of lymph node meta-transcriptomes. cDNA libraries from five individuals and two pools of samples were prepared from retropharyngeal lymph node RNA enriched for polyadenylated RNA and sequenced using Roche-454 Life Sciences technology. Protein-coding and 16S ribosomal RNA (rRNA) sequences were taxonomically profiled using protein and rRNA specific databases. Representatives of all bacterial phyla were detected in the seven libraries based on protein-coding transcripts indicating that viable microbiota were present in lymph nodes. Residents of skin and rumen, and those ubiquitous in mule deer habitat dominated classifiable bacterial species. Based on detection of both rRNA and protein-coding transcripts, we identified two new proteobacterial species; a Helicobacter closely related to Helicobacter cetorum in the Helicobacter pylori/Helicobacter acinonychis complex and an Acinetobacter related to Acinetobacter schindleri. Among viruses, a novel gamma retrovirus and other members of the Poxviridae and Retroviridae were identified. We additionally evaluated bacterial diversity by amplicon sequencing the hypervariable V6 region of 16S rRNA and demonstrate that overall taxonomic diversity is higher with the meta-transcriptomic approach. These data provide the most complete picture to date of the microbial diversity within a wildlife host. Our research advances the use of meta-transcriptomics to study microbiota in wildlife tissues, which will facilitate detection of novel organisms with pathogenic potential to human and animals.
The mitochondrial genome of Moniliophthora roreri, the frosty pod rot pathogen of cacao.
Costa, Gustavo G L; Cabrera, Odalys G; Tiburcio, Ricardo A; Medrano, Francisco J; Carazzolle, Marcelo F; Thomazella, Daniela P T; Schuster, Stephen C; Carlson, John E; Guiltinan, Mark J; Bailey, Bryan A; Mieczkowski, Piotr; Pereira, Gonçalo A G; Meinhardt, Lyndel W
2012-05-01
In this study, we report the sequence of the mitochondrial (mt) genome of the Basidiomycete fungus Moniliophthora roreri, which is the etiologic agent of frosty pod rot of cacao (Theobroma cacao L.). We also compare it to the mtDNA from the closely-related species Moniliophthora perniciosa, which causes witches' broom disease of cacao. The 94 Kb mtDNA genome of M. roreri has a circular topology and codes for the typical 14 mt genes involved in oxidative phosphorylation. It also codes for both rRNA genes, a ribosomal protein subunit, 13 intronic open reading frames (ORFs), and a full complement of 27 tRNA genes. The conserved genes of M. roreri mtDNA are completely syntenic with homologous genes of the 109 Kb mtDNA of M. perniciosa. As in M. perniciosa, M. roreri mtDNA contains a high number of hypothetical ORFs (28), a remarkable feature that makes Moniliophthoras the largest reservoir of hypothetical ORFs among sequenced fungal mtDNA. Additionally, the mt genome of M. roreri has three free invertron-like linear mt plasmids, one of which is very similar to that previously described as integrated into the main M. perniciosa mtDNA molecule. Moniliophthora roreri mtDNA also has a region of suspected plasmid origin containing 15 hypothetical ORFs distributed in both strands. One of these ORFs is similar to an ORF in the mtDNA gene encoding DNA polymerase in Pleurotus ostreatus. The comparison to M. perniciosa showed that the 15 Kb difference in mtDNA sizes is mainly attributed to a lower abundance of repetitive regions in M. roreri (5.8 Kb vs 20.7 Kb). The most notable differences between M. roreri and M. perniciosa mtDNA are attributed to repeats and regions of plasmid origin. These elements might have contributed to the rapid evolution of mtDNA. Since M. roreri is the second species of the genus Moniliophthora whose mtDNA genome has been sequenced, the data presented here contribute valuable information for understanding the evolution of fungal mt genomes among closely-related species. Crown Copyright © 2012. Published by Elsevier Ltd. All rights reserved.
Short segment search method for phylogenetic analysis using nested sliding windows
NASA Astrophysics Data System (ADS)
Iskandar, A. A.; Bustamam, A.; Trimarsanto, H.
2017-10-01
Phylogenetic analysis in bioinformatics is most accurate when it uses the full coding DNA sequence (CDS). However, analysis of the full CDS costs a lot of time and money, so a short segment that represents the CDS, such as the envelope protein segment or the non-structural 3 (NS3) segment, is needed. After a sliding window was applied, a short segment that performs better than the envelope protein and NS3 segments was found. This paper discusses a mathematical method that analyzes sequences using nested sliding windows to find a short segment that is representative of the whole genome. The results show that our method can find a short segment that is about 6.57% more representative of the CDS segment, in topological view, than the envelope or NS3 segment.
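One plausible reading of a nested sliding-window search is sketched below: an outer loop over window lengths and an inner loop over start positions, each candidate segment being scored against the full alignment. The scoring here is a stand-in chosen for brevity (mismatch between pairwise p-distances), not the topological tree comparison the abstract refers to; all function names are assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_vector(alignment, start, end):
    """Pairwise p-distances over alignment columns [start, end)."""
    return [p_distance(s[start:end], t[start:end])
            for s, t in combinations(alignment, 2)]


def mismatch(u, v):
    """Sum of absolute differences between two distance vectors (lower is better)."""
    return sum(abs(x - y) for x, y in zip(u, v))


def nested_sliding_window(alignment, lengths, step=1):
    """Return (score, start, length) of the best-scoring segment.

    STAND-IN score: how closely the segment's pairwise distances track
    those of the full alignment. Ties keep the first window found.
    """
    full_len = len(alignment[0])
    reference = distance_vector(alignment, 0, full_len)
    best = None
    for length in lengths:                                   # outer window: size
        for start in range(0, full_len - length + 1, step):  # inner window: position
            candidate = distance_vector(alignment, start, start + length)
            score = mismatch(candidate, reference)
            if best is None or score < best[0]:
                best = (score, start, length)
    return best
```

To approximate the topological comparison, `mismatch` could be replaced by a tree-distance measure computed on trees built from each segment; that substitution is left out here to keep the sketch self-contained.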
Johnston, Christine; Magaret, Amalia; Roychoudhury, Pavitra; Greninger, Alexander L; Cheng, Anqi; Diem, Kurt; Fitzgibbon, Matthew P; Huang, Meei-Li; Selke, Stacy; Lingappa, Jairam R; Celum, Connie; Jerome, Keith R; Wald, Anna; Koelle, David M
2017-10-01
Understanding the variability in circulating herpes simplex virus type 2 (HSV-2) genomic sequences is critical to the development of HSV-2 vaccines. Genital lesion swabs containing ≥ 10^7 log10 copies HSV DNA collected from Africa, the USA, and South America underwent next-generation sequencing, followed by k-mer based filtering and de novo genomic assembly. Sites of heterogeneity within coding regions in unique long and unique short (UL_US) regions were identified. Phylogenetic trees were created using maximum likelihood reconstruction. Among 46 samples from 38 persons, 1468 intragenic base-pair substitutions were identified. The maximum nucleotide distance between strains for concatenated UL_US segments was 0.4%. Phylogeny did not reveal geographic clustering. The most variable proteins had non-synonymous mutations in < 3% of amino acids. Unenriched HSV-2 DNA can undergo next-generation sequencing to identify intragenic variability. The use of clinical swabs for sequencing expands the information that can be gathered directly from these specimens. Copyright © 2017 Elsevier Inc. All rights reserved.
Signatures of DNA Methylation across Insects Suggest Reduced DNA Methylation Levels in Holometabola
Provataris, Panagiotis; Meusemann, Karen; Niehuis, Oliver; Grath, Sonja; Misof, Bernhard
2018-01-01
It has been experimentally shown that DNA methylation is involved in the regulation of gene expression and the silencing of transposable element activity in eukaryotes. The variable levels of DNA methylation among different insect species indicate an evolutionarily flexible role of DNA methylation in insects, which due to a lack of comparative data is not yet well-substantiated. Here, we use computational methods to trace signatures of DNA methylation across insects by analyzing transcriptomic and genomic sequence data from all currently recognized insect orders. We conclude that: 1) a functional methylation system relying exclusively on DNA methyltransferase 1 is widespread across insects. 2) DNA methylation has potentially been lost or extremely reduced in species belonging to springtails (Collembola), flies and relatives (Diptera), and twisted-winged parasites (Strepsiptera). 3) Holometabolous insects display signs of reduced DNA methylation levels in protein-coding sequences compared with hemimetabolous insects. 4) Evolutionarily conserved insect genes associated with housekeeping functions tend to display signs of heavier DNA methylation in comparison to the genomic/transcriptomic background. With this comparative study, we provide the much needed basis for experimental and detailed comparative analyses required to gain a deeper understanding on the evolution and function of DNA methylation in insects. PMID:29697817
BuD, a helix–loop–helix DNA-binding domain for genome modification
Stella, Stefano; Molina, Rafael; López-Méndez, Blanca; Juillerat, Alexandre; Bertonati, Claudia; Daboussi, Fayza; Campos-Olivas, Ramon; Duchateau, Phillippe; Montoya, Guillermo
2014-01-01
DNA editing offers new possibilities in synthetic biology and biomedicine for modulation or modification of cellular functions to organisms. However, inaccuracy in this process may lead to genome damage. To address this important problem, a strategy allowing specific gene modification has been achieved through the addition, removal or exchange of DNA sequences using customized proteins and the endogenous DNA-repair machinery. Therefore, the engineering of specific protein–DNA interactions in protein scaffolds is key to providing ‘toolkits’ for precise genome modification or regulation of gene expression. In a search for putative DNA-binding domains, BurrH, a protein that recognizes a 19 bp DNA target, was identified. Here, its apo and DNA-bound crystal structures are reported, revealing a central region containing 19 repeats of a helix–loop–helix modular domain (BurrH domain; BuD), which identifies the DNA target by a single residue-to-nucleotide code, thus facilitating its redesign for gene targeting. New DNA-binding specificities have been engineered in this template, showing that BuD-derived nucleases (BuDNs) induce high levels of gene targeting in a locus of the human haemoglobin β (HBB) gene close to mutations responsible for sickle-cell anaemia. Hence, the unique combination of high efficiency and specificity of the BuD arrays can push forward diverse genome-modification approaches for cell or organism redesign, opening new avenues for gene editing. PMID:25004980
Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl
2014-07-04
Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. 
Although a large portion of the sequence context was shared by mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.
The protective function of noncoding DNA in genome defense of eukaryotic male germ cells.
Qiu, Guo-Hua; Huang, Cuiqin; Zheng, Xintian; Yang, Xiaoyan
2018-04-01
Peripheral and abundant noncoding DNA has been hypothesized to protect the genome and the central protein-coding sequences against DNA damage in somatic genome. In the cytosol, invading exogenous nucleic acids may first be deactivated by small RNAs encoded by noncoding DNA via mechanisms similar to the prokaryotic CRISPR-Cas system. In the nucleus, the radicals generated by radiation in the cytosol, radiation energy and invading exogenous nucleic acids are absorbed, blocked and/or reduced by peripheral heterochromatin, and damaged DNA in heterochromatin is removed and excluded from the nucleus to the cytoplasm through nuclear pore complexes. To further strengthen the hypothesis, this review summarizes the experimental evidence supporting the protective function of noncoding DNA in the genome of male germ cells. Based on these data, this review provides evidence supporting the protective role of noncoding DNA in the genome defense of sperm genome through similar mechanisms to those of the somatic genome.
Iyer, Lakshminarayan M; Abhiman, Saraswathi; Aravind, L
2008-01-01
Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases. This article was reviewed by Eugene Koonin and Mark Ragan. PMID:18834537
2010-01-01
Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. 
These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840
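The dyad symmetry mentioned for the DNA encoding the tandem domains can be illustrated with a reverse-complement check. This is a minimal sketch that only recognises perfect palindromes; the repeats described in the abstract may well be imperfect, and the function names are illustrative.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_dyad_symmetry(seq: str) -> bool:
    """True if the sequence reads the same on both strands (perfect palindrome)."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


print(has_dyad_symmetry("GAATTC"))   # classic palindromic site
print(has_dyad_symmetry("GATTACA"))  # not palindromic
```

A tolerant version would count mismatches between the sequence and its reverse complement instead of requiring equality.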
He, Xiao-Lan; Li, Qian; Peng, Wei-Hong; Zhou, Jie; Cao, Xue-Lian; Wang, Di; Huang, Zhong-Qian; Tan, Wei; Li, Yu; Gan, Bing-Cheng
2017-06-26
The internal transcribed spacer (ITS), RNA polymerase II second largest subunit (RPB2), and elongation factor 1-alpha (EF1α) are often used in fungal taxonomy and phylogenetic analysis. An ideal molecular marker for molecular identification and phylogenetic studies is homogeneous within species, with interspecific variation exceeding intraspecific variation. However, while sequencing ITS, RPB2, and EF1α in Pleurotus spp., we found that intra-isolate sequence polymorphism might be present in these genes because direct sequencing of PCR products failed in some isolates. Therefore, in this study we assessed intra- and inter-isolate variation of the three genes in Pleurotus by polymerase chain reaction amplification and cloning. Results showed that intra-isolate variation of ITS was not uncommon but the polymorphic level in each isolate was relatively low in Pleurotus; intra-isolate variations of EF1α and RPB2 sequences were present in an unexpectedly high amount. The polymorphism level differed significantly between ITS, RPB2, and EF1α in the same individual, and the intra-isolate heterogeneity level of each gene varied between isolates within the same species. Intra-isolate and intraspecific variation of ITS in the tested isolates was less than interspecific variation, and intra-isolate and intraspecific variation of RPB2 was probably equal to interspecific divergence. Meanwhile, intra-isolate and intraspecific variation of EF1α could exceed interspecific divergence. These findings suggested that RPB2 and EF1α are not desirable barcoding candidates for Pleurotus. We also discussed why rDNA and protein-coding genes showed variants within a single isolate in Pleurotus, a question that must be addressed in further research. Our study demonstrated that intra-isolate variation of ribosomal and protein-coding genes is likely widespread in fungi. 
This has implications for studies on fungal evolution, taxonomy, phylogenetics, and population genetics. More extensive sampling of these genes and other candidates will be required to ensure reliability as phylogenetic markers and DNA barcodes.
Li, Yu-Ping; Xia, Run-Xi; Wang, Huan; Li, Xi-Sheng; Liu, Yan-Qun; Wei, Zhao-Jun; Lu, Cheng; Xiang, Zhong-Huai
2009-01-01
In this study we successfully constructed a full-length cDNA library from Chinese oak silkworm, Antheraea pernyi, the most well-known wild silkworm used for silk production and insect food. Total RNA was extracted from a single fresh female pupa at the diapause stage. The titer of the library was 5 × 10^5 cfu/ml and the proportion of recombinant clones was approximately 95%. Expressed sequence tag (EST) analysis was used to characterize the library. A total of 175 clustered ESTs consisting of 24 contigs and 151 singlets were generated from 250 effective sequences. Of the 175 unigenes, 97 (55.4%) were known genes but only five from A. pernyi, 37 (21.2%) were known ESTs without function annotation, and 41 (23.4%) were novel ESTs. By EST sequencing, a gene coding KK-42-binding protein in A. pernyi (named as ApKK42-BP; GenBank accession no. FJ744151) was identified and characterized. Protein sequence analysis showed that ApKK42-BP was not a membrane protein but an extracellular protein with a signal peptide at position 1-18, and contained two putative conserved domains, abhydro_lipase and abhydrolase_1, suggesting it may be a member of lipase superfamily. Expression analysis based on number of ESTs showed that ApKK42-BP was an abundant gene in the period of diapause stage, suggesting it may also be involved in pupa-diapause termination. PMID:19564928
Hao, Yan-Zhe; Hou, Wan-Ru; Hou, Yi-Ling; Du, Yu-Jie; Zhang, Tian; Peng, Zheng-Song
2009-11-01
RPS25 is a component of the 40S small ribosomal subunit encoded by the RPS25 gene, which is specific to eukaryotes. Few studies have addressed the RPS25 gene in animals. The Giant Panda (Ailuropoda melanoleuca), known as a "living fossil", is of increasing concern to the world community. Studies on RPS25 of the Giant Panda could provide scientific data for inquiring into the hereditary traits of the gene and formulating a protective strategy for the Giant Panda. The cDNA of RPS25 cloned from the Giant Panda is 436 bp in size, containing an open reading frame of 378 bp encoding 125 amino acids. The length of the genomic sequence is 1,992 bp, which was found to possess four exons and three introns. Alignment by Blast analysis indicated that the nucleotide sequence of the coding sequence shows high homology to those of Homo sapiens, Bos taurus, Mus musculus and Rattus norvegicus (92.6, 94.4, 89.2 and 91.5%, respectively). Primary structure analysis revealed that the molecular weight of the putative RPS25 protein is 13.7421 kDa with a theoretical pI of 10.12. Topology prediction showed there is one N-glycosylation site, one cAMP- and cGMP-dependent protein kinase phosphorylation site, two protein kinase C phosphorylation sites and one tyrosine kinase phosphorylation site in the RPS25 protein of the Giant Panda. The RPS25 gene was overexpressed in E. coli BL21 and Western blotting of the RPS25 protein was also performed. The results indicated that the RPS25 gene can be expressed in E. coli and that the RPS25 protein fused to an N-terminal His tag accumulated as a polypeptide of the expected 17.4 kDa. The cDNA and the genomic sequence of RPS25 were cloned successfully for the first time from the Giant Panda using RT-PCR and Touchdown-PCR, respectively, and both were sequenced and analyzed preliminarily; the cDNA of the RPS25 gene was then overexpressed in E. coli BL21 and immunoblotted. This is the first report on the RPS25 gene from the Giant Panda. The data will enrich and supplement the information about RPS25 and will contribute to the protection of gene resources and the discussion of genetic polymorphism.
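The relation between the 378 bp open reading frame and the 125 encoded amino acids follows from the triplet code once the stop codon is accounted for. The sketch below assumes the reported ORF length includes the stop codon; `residues_from_orf` is an illustrative helper, not something from the study.

```python
def residues_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp.

    Each codon is 3 bp; if the length includes the stop codon, that codon
    does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("an ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 378 bp = 126 codons = 125 residues + 1 stop codon.
print(residues_from_orf(378))
```

The same bookkeeping is a quick sanity check for any reported ORF length and protein length pair.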
Focareta, T; Manning, P A
1987-01-01
The gene encoding the extracellular DNase of Vibrio cholerae was cloned into Escherichia coli K-12. A maximal coding region of 1.2 kb and a minimal region of 0.6 kb were determined by transposon mutagenesis and deletion analysis. The nucleotide sequence of this region contained a single open reading frame of 690 bp corresponding to a protein of Mr 26,389 with a typical N-terminal signal sequence of 18 aa which, when removed, would give a mature protein of Mr 24,163. This is in good agreement with the size of 24 kDa, calculated directly by Coomassie blue staining following sodium dodecyl sulphate-polyacrylamide gel electrophoresis and indirectly via a DNA-hydrolysis assay. The protein is located in the periplasmic space of E. coli K-12 unlike in V. cholerae where it is excreted into the extracellular medium. The introduction of the DNase gene into a periplasmic (tolA) leaky mutant of E. coli K-12 facilitates the release of the protein, further confirming the periplasmic location.
Traverse, Charles C; Ochman, Howard
2017-08-29
Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola , which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. IMPORTANCE The high level of mistakes generated during transcription can result in the accumulation of malfunctioning and misfolded proteins which can alter global gene regulation and in the expenditure of energy to degrade these nonfunctional proteins. The transcriptome-wide occurrence of base substitutions has been elucidated in bacteria, but information on transcription insertions and deletions-errors that potentially have more dire effects on protein function-is limited to reporter gene constructs. Here, we capture the transcriptome-wide spectrum of insertions and deletions in Escherichia coli and Buchnera aphidicola and show that they occur at rates approaching those of base substitutions. 
Knowledge of the full extent of sequences subject to transcription indels supports a new model of bacterial transcription slippage, one that relies on the number of complementary bases between the transcript and the DNA template to which it slipped. Copyright © 2017 Traverse and Ochman.
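Homopolymeric runs, the sequence context in which transcription insertions prevailed, can be located with a back-referencing regular expression. This is a minimal sketch: the minimum run length of 4 is an arbitrary assumption, not a threshold taken from the study, and `homopolymer_runs` is an illustrative name.

```python
import re


def homopolymer_runs(seq: str, min_len: int = 4):
    """Return (start, base, length) for each run of one base of at least min_len."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # ([ACGT]) captures one base; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (match.start(), match.group(1), len(match.group(0)))
        for match in pattern.finditer(seq.upper())
    ]


for start, base, length in homopolymer_runs("GCAAAAAGCTTTTC"):
    print(start, base, length)
```

Filtering the result to bases A and T would give the run class the abstract associates with insertions.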
NASA Astrophysics Data System (ADS)
Xu, Kuipeng; Tang, Xianghai; Bi, Guiqi; Cao, Min; Wang, Lu; Mao, Yunxiang
2017-08-01
Pyropia species grow in the intertidal zone and are cold-water adapted. To date, most of the information about the whole plastid and mitochondrial genomes (ptDNA and mtDNA) of this genus is limited to Northern Hemisphere species. Here, we report the sequencing of the ptDNA and mtDNA of the Antarctic red alga Pyropia endiviifolia using the Illumina platform. The plastid genome (195 784 bp, 33.28% GC content) contains 210 protein-coding genes, 37 tRNA genes and 6 rRNA genes. The mitochondrial genome (34 603 bp, 30.5% GC content) contains 26 protein-coding genes, 25 tRNA genes and 2 rRNA genes. Our results suggest that the organellar genomes of Py. endiviifolia have a compact organization. Although the collinearity of these genomes is conserved compared with other Pyropia species, the genome sizes show significant differences, mainly because of the different copy numbers of rDNA operons in the ptDNA and group II introns in the mtDNA. The other Pyropia species have 2–3 distinct intronic ORFs in their cox1 genes, but Py. endiviifolia has no introns in its cox1 gene. This has led to a smaller mtDNA than in other Pyropia species. The phylogenetic relationships within Pyropia were examined using concatenated gene sets from most of the available organellar genomes with both the maximum likelihood and Bayesian methods. The analysis revealed a sister taxa affiliation between the Antarctic species Py. endiviifolia and the North American species Py. kanakaensis.
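GC content, reported above for both organellar genomes, is the percentage of G and C among the called bases. The sketch below assumes ambiguous characters such as N are excluded from the denominator, which is one common convention and not necessarily the one used in the study; `gc_content` is an illustrative name.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


print(gc_content("GGCCAATT"))
```

Applied to a full genome sequence read from a FASTA file, this yields figures on the same scale as the 33.28% and 30.5% quoted above.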
[Exome sequencing revealed Allan-Herndon-Dudley syndrome underlying multiple disabilities].
Arvio, Maria; Philips, Anju K; Ahvenainen, Minna; Somer, Mirja; Kalscheuer, Vera; Järvelä, Irma
2014-01-01
Normal function of the thyroid gland is the cornerstone of a child's mental development and physical growth. We describe a Finnish family, in which the diagnosis of three brothers became clear after investigations that lasted for more than 30 years. Two of the sons have already died. DNA analysis of the third one, a 16-year-old boy, revealed in exome sequencing of the complete X chromosome a mutation in the SLC16A2 gene, i.e. MCT8, coding for a thyroid hormone transport protein. Allan-Herndon-Dudley syndrome was thus shown to be the cause of multiple disabilities.
The genetic basis of adaptive pigmentation variation in Drosophila melanogaster.
Pool, John E; Aquadro, Charles F
2007-07-01
In a broad survey of Drosophila melanogaster population samples, levels of abdominal pigmentation were found to be highly variable and geographically differentiated. A strong positive correlation was found between dark pigmentation and high altitude, suggesting adaptation to specific environments. DNA sequence polymorphism at the candidate gene ebony revealed a clear association with the pigmentation of homozygous third chromosome lines. The darkest lines sequenced had nearly identical haplotypes spanning 14.5 kb upstream of the protein-coding exons of ebony. Thus, natural selection may have elevated the frequency of an allele that confers dark abdominal pigmentation by influencing the regulation of ebony.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jensen, B.A.; Hahn, M.E.
1995-12-31
The aryl hydrocarbon receptor (AhR) mediates the effects of many common and potentially toxic organic hydrocarbons, including some polychlorinated biphenyls and dioxins. Since small cetaceans often inhabit industrially polluted coastal waters, comparison of the molecular structure and function of this protein in cetaceans with other marine and mammalian species is important for evaluating the sensitivity of cetaceans to these pollutants. An AhR protein has been identified in beluga liver by photoaffinity labeling. In the present study, the authors sought to clone and sequence an AhR cDNA from beluga as a prelude to studying its structure and function. Using reverse-transcription polymerase chain reaction (RT-PCR) and degenerate primers, a 515 base pair fragment was amplified, cloned and sequenced, revealing homology to the PAS domain (ligand binding and dimerization region) of AhRs from terrestrial mammals. This portion of the putative beluga AhR has 82% amino acid and 81% nucleotide sequence identity to the mouse AhR, and 63% amino acid and 64% nucleotide sequence identity to an AhR from the marine fish Fundulus heteroclitus. A beluga cDNA library was synthesized and is currently being screened with the PCR-generated fragment to obtain the complete coding sequence. This is the first molecular evidence of AhR presence in cetaceans.
Martínez-Quintana, José A; Peregrino-Uriarte, Alma B; Gollas-Galván, Teresa; Gómez-Jiménez, Silvia; Yepiz-Plascencia, Gloria
2014-12-01
During hypoxia the shrimp Litopenaeus vannamei accelerates anaerobic glycolysis to obtain energy; therefore, a correct supply of glucose to the cells is needed. Facilitated glucose transport across the cells is mediated by a group of membrane-embedded integral proteins called GLUT, with GLUT1 being the most ubiquitous form. In this work, we report the first cDNA nucleotide and deduced amino acid sequences of a glucose transporter 1 from L. vannamei. A 1619 bp sequence was obtained by RT-PCR and RACE approaches. The 5′ UTR is 161 bp and the poly(A) tail begins immediately after the stop codon in the mRNA. The ORF is 1485 bp and codes for 485 amino acids. The deduced protein sequence has high identity to GLUT1 proteins from several species and contains all the main features of glucose transporter proteins, including twelve transmembrane domains and the conserved motifs and amino acids involved in transport activity, ligand binding and membrane anchoring. Therefore, we decided to name this sequence glucose transporter 1 of L. vannamei (LvGLUT1). A partial gene sequence of 8.87 kbp was also obtained; it contains the complete coding sequence divided into 10 exons. LvGlut1 expression was detected in hemocytes, hepatopancreas, intestine, gills, muscle and pleopods. The highest relative expression was found in gills and the lowest in hemocytes. This indicates that LvGlut1 is ubiquitously expressed but that its levels are tissue-specific. Upon short-term hypoxia, the GLUT1 transcripts increase 3.7-fold in hepatopancreas and gills. To our knowledge, this is the first evidence of expression of GLUT1 in crustaceans.
Complete mitochondrial genome of the moon jellyfish, Aurelia sp. nov. (Cnidaria, Scyphozoa).
Hwang, Dae-Sik; Park, Eunji; Won, Yong-Jin; Lee, Jae-Seong
2014-02-01
We sequenced 16,971 bp of the linear mitochondrial DNA of the moon jellyfish Aurelia sp. nov. and characterized it by comparing with Aurelia aurita. They had 13 protein-coding genes (PCGs), 16S rRNA and 12S rRNA with three tRNAs (tRNA-Leu, tRNA-Ser(TGA), tRNA-Met). Both have another two PCGs, orf969 and orf324 with telomeres at both ends. After comparison of Aurelia sp. nov. with Aurelia aurita, we found low-protein similarity of orf969 (59%) and orf324 (75%), respectively, while the other 13 PCGs showed 80% to 98% protein similarities.
The Dermatophagoides farinae group 22 allergen: cloning and expression in Escherichia coli.
Cui, Yu-bao; Cai, Hong-xing; Zhou, Ying; Wang, Nan; Yu, Li-li; Yang, Li; Zhang, Cheng-bo
2015-09-01
Dermatophagoides farinae (Hughes) (Acari: Pyroglyphidae) and other domestic mites produce allergens that affect people worldwide. Here, the complementary DNA (cDNA) coding for group 22 allergen of D. farinae (Der f 22) from China was cloned, sequenced, and expressed successfully. The cDNA encoding Der f 22 was synthesized by reverse transcription polymerase chain reaction (RT-PCR), then ligated to the pCold-TF for expression in Escherichia coli BL21 cells. The purified recombinant fusion protein was identified by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), Western-blotting, and tandem matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF/TOF). The full-length cDNA comprised 468 nucleotides and was 99.57% (466/468) identical with the reference sequence (GenBank: DQ643992). After the plasmid pCold-TF-Der f 22 was transformed into E. coli BL21 and expressed with the induction of IPTG, SDS-PAGE showed a specific band for the recombinant fusion protein. The recombinant fusion protein, which was purified by chromatography, bound with a His-tagged antibody by Western blotting. MALDI-TOF/TOF mass spectrometry revealed that the structure of the recombinant protein was identical to the predicted Der f 22 structure. The hydrophilic protein contains a signal peptide of 20 amino acids, and the mature Der f 22 consists of 135 amino acid residues with a molecular weight of 14.7 kDa and theoretical isoelectric points (pI) of 6.38. Its secondary structure comprises an alpha helix (38.5%), beta-sheet (45.9%), random coils (11.85%), and beta-turns (11.1%). This work represents the first reported full-length sequence and successful cloning of Der f 22 from D. farinae in China; bioinformatics analysis can be used to further study the allergenicity and clinical utility of the recombinant Der f 22. © 2015 ARS-AAOA, LLC.
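The reported 99.57% (466/468) identity with the reference sequence is a simple ratio of matching positions to aligned positions. The short sketch below reproduces that arithmetic; `percent_identity` is an illustrative helper, and the alignment itself is taken as given.

```python
def percent_identity(matches: int, aligned_length: int) -> float:
    """Percent identity given matching positions and total aligned positions."""
    if aligned_length <= 0 or not 0 <= matches <= aligned_length:
        raise ValueError("need 0 <= matches <= aligned_length and aligned_length > 0")
    return 100.0 * matches / aligned_length


# 466 matching nucleotides out of 468, i.e. 2 mismatches.
print(round(percent_identity(466, 468), 2))
```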
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.
Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A
2010-02-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer
Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.
2010-01-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger sequencing and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Identification and characterization of novel reptile cathelicidins from elapid snakes.
Zhao, Hui; Gan, Tong-Xiang; Liu, Xiao-Dong; Jin, Yang; Lee, Wen-Hui; Shen, Ji-Hong; Zhang, Yun
2008-10-01
Three cDNA sequences coding for elapid cathelicidins were cloned from constructed venom gland cDNA libraries of Naja atra, Bungarus fasciatus and Ophiophagus hannah. The open reading frames of the cloned elapid cathelicidins were all composed of 576 bp and coded for 191 amino acid residue protein precursors. Each of the deduced elapid cathelicidins has a 22 amino acid residue signal peptide, a conserved cathelin domain of 135 amino acid residues and a mature antimicrobial peptide of 34 amino acid residues. Unlike the highly divergent cathelicidins in mammals, the nucleotide and deduced protein sequences of the three cloned elapid cathelicidins were remarkably conserved. All the elapid mature cathelicidins were predicted to be cleaved at Valine157 by elastase. OH-CATH, the deduced mature cathelicidin from king cobra, was chemically synthesized and it showed strong antibacterial activity against various bacteria with a minimal inhibitory concentration of 1-20 µg/ml in the presence of 1% NaCl. Meanwhile, the synthetic peptide showed no haemolytic activity toward human red blood cells even at a high dose of 200 µg/ml. Phylogenetic analysis of cathelicidins from vertebrates suggested that elapid and viperid cathelicidins were grouped together in the tree. Snake cathelicidins were evolutionarily closely related to the neutrophilic granule proteins (NGPs) from mouse, rat and rabbit. Snake cathelicidins also showed a close relationship with avian fowlicidins (1-3) and chicken myeloid antimicrobial peptide 27. Elapid cathelicidins might be used as models for the development of novel therapeutic drugs.
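The precursor architecture described in this abstract can be laid out as residue ranges. The following is a minimal sketch rather than the authors' analysis: the helper name is hypothetical, positions are 1-based, and the ORF length check assumes a single stop codon.

```python
from itertools import accumulate

# Domain lengths (amino acid residues) as reported in the abstract.
DOMAINS = [
    ("signal peptide", 22),
    ("cathelin domain", 135),
    ("mature peptide", 34),
]


def domain_boundaries(domains):
    """Return (name, first_residue, last_residue) using 1-based positions."""
    ends = list(accumulate(length for _, length in domains))
    starts = [1] + [end + 1 for end in ends[:-1]]
    return [(name, start, end)
            for (name, _), start, end in zip(domains, starts, ends)]


boundaries = domain_boundaries(DOMAINS)
precursor_length = boundaries[-1][2]
# Assumption: the ORF includes exactly one stop codon after the last residue.
orf_length_bp = (precursor_length + 1) * 3

for name, start, end in boundaries:
    print(f"{name}: residues {start}-{end}")
```

With these lengths the cathelin domain ends at residue 157, which agrees with the predicted elastase cleavage at Valine157, and the totals match the reported 191-residue precursor and 576 bp open reading frame.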
Cai, Jin; Han, Yu; Ren, Hongmei; Chen, Caiyu; He, Duofen; Zhou, Lin; Eisner, Gilbert M.; Asico, Laureano D.; Jose, Pedro A.; Zeng, Chunyu
2013-01-01
Extracellular vesicles (EVs) carry signals within or at their limiting membranes, providing a mechanism by which cells can exchange more complex information than was previously thought. In addition to mRNAs and microRNAs, there are DNA fragments in EVs. Solexa sequencing indicated the presence of at least 16,434 genomic DNA (gDNA) fragments in the EVs from human plasma. An immunofluorescence study provided direct evidence that acridine orange-stained EV DNAs could be transferred into the cells and localize to and inside the nuclear membrane. However, whether the transferred EV DNAs are functional or not is not clear. We found that EV gDNAs could be homologously or heterologously transferred from donor cells to recipient cells, and increase gDNA-coding mRNA, protein expression, and function (e.g. AT1 receptor). An endogenous promoter of the AT1 receptor, NF-κB, could be recruited to the transferred DNAs in the nucleus, and increase the transcription of AT1 receptor in the recipient cells. Moreover, the transferred EV gDNAs have pathophysiological significance. The BCR/ABL hybrid gene, involved in the pathogenesis of chronic myeloid leukemia, could be transferred from K562 EVs to HEK293 cells or neutrophils. Our present study shows that the gDNAs transferred from EVs to cells have physiological significance, not only to increase the gDNA-coding mRNA and protein levels, but also to influence function in recipient cells. PMID:23580760
Hong, S B; Hwang, I; Dessaux, Y; Guyon, P; Kim, K S; Farrand, S K
1997-01-01
The mechanisms that ensure that Ti plasmid T-DNA genes encoding proteins involved in the biosynthesis of opines in crown gall tumors are always matched by Ti plasmid genes conferring the ability to catabolize that set of opines on the inducing Agrobacterium strains are unknown. The pathway for the biosynthesis of the opine agropine is thought to require an enzyme, mannopine cyclase, coded for by the ags gene located in the T(R) region of octopine-type Ti plasmids. Extracts prepared from agropine-type tumors contained an activity that cyclized mannopine to agropine. Tumor cells containing a T region in which ags was mutated lacked this activity and did not contain agropine. Expression of ags from the lac promoter conferred mannopine-lactonizing activity on Escherichia coli. Agrobacterium tumefaciens strains harboring an octopine-type Ti plasmid exhibit a similar activity which is not coded for by ags. Analysis of the DNA sequence of the gene encoding this activity, called agcA, showed it to be about 60% identical to T-DNA ags genes. Relatedness decreased abruptly in the 5' and 3' untranslated regions of the genes. ags is preceded by a promoter that functions only in the plant. Expression analysis showed that agcA also is preceded by its own promoter, which is active in the bacterium. Translation of agcA yielded a protein of about 45 kDa, consistent with the size predicted from the DNA sequence. Antibodies raised against the agcA product cross-reacted with the anabolic enzyme. These results indicate that the agropine system arose by a duplication of a progenitor gene, one copy of which became associated with the T-DNA and the other copy of which remained associated with the bacterium. PMID:9244272
Traverse, Charles C.
2017-01-01
Advances in sequencing technologies have enabled direct quantification of genome-wide errors that occur during RNA transcription. These errors occur at rates that are orders of magnitude higher than rates during DNA replication, but due to technical difficulties such measurements have been limited to single-base substitutions and have not yet quantified the scope of transcription insertions and deletions. Previous reporter gene assay findings suggested that transcription indels are produced exclusively by elongation complex slippage at homopolymeric runs, so we enumerated indels across the protein-coding transcriptomes of Escherichia coli and Buchnera aphidicola, which differ widely in their genomic base compositions and incidence of repeat regions. As anticipated from prior assays, transcription insertions prevailed in homopolymeric runs of A and T; however, transcription deletions arose in much more complex sequences and were rarely associated with homopolymeric runs. By reconstructing the relocated positions of the elongation complex as inferred from the sequences inserted or deleted during transcription, we show that continuation of transcription after slippage hinges on the degree of nucleotide complementarity within the RNA:DNA hybrid at the new DNA template location. PMID:28851848
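Because this abstract relates insertions to homopolymeric runs, a simple way to locate such runs in a template sequence may help make the idea concrete. This is a hedged sketch, not the authors' pipeline: the function name, the default minimum run length of 4, and the example sequence are all assumptions made for illustration.

```python
import re


def homopolymer_runs(sequence: str, min_length: int = 4):
    """Return (start, base, length) for each run of a single repeated base.

    `start` is a 0-based index; `min_length` is an arbitrary illustrative cutoff.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # One base followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(sequence.upper())]


example = "GCATTTTTTCGAAAACGGTC"  # made-up sequence, not real data
runs = homopolymer_runs(example)
at_runs = [run for run in runs if run[1] in "AT"]
print(at_runs)
```

Filtering on the base, as in `at_runs`, restricts the result to runs of A and T, the context in which the abstract reports that insertions prevailed.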
Márquez, Edna J; Castro, Erick R; Alzate, Juan F
2016-01-01
The queen conch Strombus gigas is an endangered marine gastropod of significant economic importance across the Greater Caribbean region. This work reports for the first time the complete mitochondrial genome of S. gigas, obtained by FLX 454 pyrosequencing. The mtDNA genome encodes 13 proteins, 22 tRNAs and 2 ribosomal RNAs. In addition, the coding sequences and gene synteny were similar to those of previously reported gastropod mitogenomes.
de Moura, Tatiana R; Oliveira, Fabiano; Carneiro, Marcia W; Miranda, José Carlos; Clarêncio, Jorge; Barral-Netto, Manoel; Brodskyn, Cláudia; Barral, Aldina; Ribeiro, José M C; Valenzuela, Jesus G; de Oliveira, Camila I
2013-01-01
Leishmania parasites are transmitted in the presence of sand fly saliva. Together with the parasite, the sand fly injects salivary components that change the environment at the feeding site. Mice immunized with Phlebotomus papatasi salivary gland (SG) homogenate are protected against Leishmania major infection, while immunity to Lutzomyia intermedia SG homogenate exacerbated experimental Leishmania braziliensis infection. In humans, antibodies to Lu. intermedia saliva are associated with risk of acquiring L. braziliensis infection. Despite these important findings, there is no information regarding the repertoire of Lu. intermedia salivary proteins. A cDNA library from the SGs of wild-caught Lu. intermedia was constructed, sequenced, and complemented by a proteomic approach based on 1D SDS-PAGE and tandem mass spectrometry to validate the transcripts present in this cDNA library. We identified the most abundant transcripts and proteins reported in other sand fly species as well as novel proteins such as neurotoxin-like proteins, peptides with ML domain, and three small peptides found so far only in this sand fly species. DNA plasmids coding for ten selected transcripts were constructed and used to immunize BALB/c mice to study their immunogenicity. Plasmid Linb-11, coding for a 4.5-kDa protein, induced a cellular immune response and conferred protection against L. braziliensis infection. This protection correlated with a decreased parasite load and an increased frequency of IFN-γ-producing cells. We identified the most abundant and novel proteins present in the SGs of Lu. intermedia, a vector of cutaneous leishmaniasis in the Americas. We also show for the first time that immunity to a single salivary protein from Lu. intermedia can protect against cutaneous leishmaniasis caused by L. braziliensis.
Greenberg, Jay R.; Perry, Robert P.
1971-01-01
The relationship of the DNA sequences from which polyribosomal messenger RNA (mRNA) and heterogeneous nuclear RNA (NRNA) of mouse L cells are transcribed was investigated by means of hybridization kinetics and thermal denaturation of the hybrids. Hybridization was performed in formamide solutions at DNA excess. Under these conditions most of the hybridizing mRNA and NRNA react at values of D₀t (DNA concentration multiplied by time) expected for RNA transcribed from the nonrepeated or rarely repeated fraction of the genome. However, a fraction of both mRNA and NRNA hybridize at values of D₀t about 10,000 times lower, and therefore must be transcribed from highly redundant DNA sequences. The fraction of NRNA hybridizing to highly repeated sequences is about 1.7 times greater than the corresponding fraction of mRNA. The hybrids formed by the rapidly reacting fractions of both NRNA and mRNA melt over a narrow temperature range with a midpoint about 11°C below that of native L cell DNA. This indicates that these hybrids consist of partially complementary sequences with approximately 11% mismatching of bases. Hybrids formed by the slowly reacting fraction of NRNA melt within 4°–6°C of native DNA, indicating very little, if any, mismatching of bases. Hybrids of the slowly reacting components of mRNA, formed under conditions of sufficiently low RNA input, have a high thermal stability, similar to that observed for hybrids of the slowly reacting NRNA component. However, when higher inputs of mRNA are used, hybrids are formed which have a strikingly lower thermal stability. This observation can be explained by assuming that there is sufficient similarity among the relatively rare DNA sequences coding for mRNA so that under hybridization conditions, in which these DNA sequences are not truly in excess, reversible hybrids exhibiting a considerable amount of mispairing are formed.
The fact that a comparable phenomenon has not been observed for NRNA may mean that there is less similarity among the relatively rare DNA sequences coding for NRNA than there is among the rare sequences coding for mRNA. PMID:4999767
Iwasaki, H; Shiba, T; Makino, K; Nakata, A; Shinagawa, H
1989-01-01
The ruvA and ruvB genes of Escherichia coli constitute an operon which belongs to the SOS regulon. Genetic evidence suggests that the products of the ruv operon are involved in DNA repair and recombination. To begin biochemical characterization of these proteins, we developed a plasmid system that overproduced RuvB protein to 20% of total cell protein. Starting from the overproducing system, we purified RuvB protein. The purified RuvB protein behaved like a monomer in gel filtration chromatography and had an apparent relative molecular mass of 38 kilodaltons in sodium dodecyl sulfate-polyacrylamide gel electrophoresis, which agrees with the value predicted from the DNA sequence. The amino acid sequence of the amino-terminal region of the purified protein was analyzed, and the sequence agreed with the one deduced from the DNA sequence. Since the deduced sequence of RuvB protein contained the consensus sequence for ATP-binding proteins, we examined the ATP-binding and ATPase activities of the purified RuvB protein. RuvB protein had a stronger affinity for ADP than for ATP and weak ATPase activity. The results suggest that the weak ATPase activity of RuvB protein is at least partly due to end product inhibition by ADP. PMID:2529252
The Diversity Present in 5140 Human Mitochondrial Genomes
Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.
2009-01-01
We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustively classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
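The comparison against "all possible transitions and transversions" from a reference can be illustrated by enumerating every single-base substitution of a sequence and classifying it. The sketch below is not mtDNA-GeneSyn and makes no claim about how that tool works: the function names and the six-base toy reference are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a base change as a transition or a transversion."""
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def all_point_substitutions(reference: str):
    """Yield (position, ref, alt, kind) for every possible single-base change."""
    for position, ref in enumerate(reference.upper(), start=1):
        for alt in "ACGT":
            if alt != ref:
                yield position, ref, alt, classify_substitution(ref, alt)


toy_reference = "GATCAC"  # made-up sequence standing in for a reference genome
substitutions = list(all_point_substitutions(toy_reference))
n_transitions = sum(1 for s in substitutions if s[3] == "transition")
n_transversions = len(substitutions) - n_transitions
print(n_transitions, n_transversions)
```

Each site admits one transition and two transversions, so the set of possible changes is three times the sequence length; observed variants can then be compared against this exhaustive set.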
2014-01-01
Background: Nematodirus spp. are among the most common nematodes of ruminants worldwide. N. oiratianus and N. spathiger are distributed worldwide as highly prevalent gastrointestinal nematodes, which cause emerging health problems and economic losses. Accurate identification of Nematodirus species is essential to develop effective control strategies for Nematodirus infection in ruminants. Mitochondrial DNA (mtDNA) could provide powerful genetic markers for identifying these closely related species and resolving phylogenetic relationships at different taxonomic levels. Methods: In the present study, the complete mitochondrial (mt) genomes of N. oiratianus and N. spathiger from small ruminants in China were obtained using long-range PCR and sequencing. Results: The complete mt genomes of N. oiratianus and N. spathiger were 13,765 bp and 13,519 bp in length, respectively. Both mt genomes were circular and consisted of 36 genes, including 12 genes encoding proteins, 2 genes encoding rRNA, and 22 genes encoding tRNA. Phylogenetic analyses based on the concatenated amino acid sequence data of all 12 protein-coding genes by Bayesian inference (BI), maximum likelihood (ML) and maximum parsimony (MP) showed that the two Nematodirus species (Molineidae) were closely related to Dictyocaulidae. Conclusions: The availability of the complete mtDNA sequences of N. oiratianus and N. spathiger not only provides new mtDNA sources for a better understanding of nematode mt genomics and phylogeny, but also provides novel and useful genetic markers for studying diagnosis, population genetics and molecular epidemiology of Nematodirus spp. in small ruminants. PMID:25015379
Zhao, Guang-Hui; Jia, Yan-Qing; Cheng, Wen-Yu; Zhao, Wen; Bian, Qing-Qing; Liu, Guo-Hua
2014-07-11
Nematodirus spp. are among the most common nematodes of ruminants worldwide. N. oiratianus and N. spathiger are distributed worldwide as highly prevalent gastrointestinal nematodes, which cause emerging health problems and economic losses. Accurate identification of Nematodirus species is essential to develop effective control strategies for Nematodirus infection in ruminants. Mitochondrial DNA (mtDNA) could provide powerful genetic markers for identifying these closely related species and resolving phylogenetic relationships at different taxonomic levels. In the present study, the complete mitochondrial (mt) genomes of N. oiratianus and N. spathiger from small ruminants in China were obtained using long-range PCR and sequencing. The complete mt genomes of N. oiratianus and N. spathiger were 13,765 bp and 13,519 bp in length, respectively. Both mt genomes were circular and consisted of 36 genes, including 12 genes encoding proteins, 2 genes encoding rRNA, and 22 genes encoding tRNA. Phylogenetic analyses based on the concatenated amino acid sequence data of all 12 protein-coding genes by Bayesian inference (BI), maximum likelihood (ML) and maximum parsimony (MP) showed that the two Nematodirus species (Molineidae) were closely related to Dictyocaulidae. The availability of the complete mtDNA sequences of N. oiratianus and N. spathiger not only provides new mtDNA sources for a better understanding of nematode mt genomics and phylogeny, but also provides novel and useful genetic markers for studying diagnosis, population genetics and molecular epidemiology of Nematodirus spp. in small ruminants.